mlrun.execution#

class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)[source]#

Bases: object

ML Execution Client Context

the context is generated and injected into the function using function.run(), or created manually using the get_or_create_ctx() call; it provides an interface for working with run params, metadata, inputs, and outputs

base metadata includes: uid, name, project, and iteration (for hyper params). Users can:

  • set labels and annotations using set_label(), set_annotation()

  • access parameters and secrets using get_param(), get_secret()

  • access input data objects using get_input()

  • store results, artifacts, and real-time metrics using the log_result(), log_artifact(), log_dataset(), and log_model() methods

see the documentation of the individual params and methods below for details
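
example (a minimal sketch; the context name "train" and the param/result names are illustrative):

import mlrun

# obtain a context manually (inside a running job it is injected by function.run())
context = mlrun.get_or_create_ctx("train")
p1 = context.get_param("p1", 1)
context.log_result("accuracy", p1 * 2)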

property annotations#

dictionary with annotations (read-only)

artifact_subpath(*subpaths)[source]#

return a subpath under the default artifacts output path

example:

data_path = context.artifact_subpath('data')
property artifacts#

dictionary of artifacts (read-only)

commit(message: str = '', completed=True)[source]#

save run state and optionally add a commit message
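
example (a brief sketch; the message text is illustrative):

context.commit(message="checkpoint after data prep", completed=False)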

Parameters
  • message – commit message to save in the run

  • completed – mark run as completed

classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False, update_db=True)[source]#

create execution context from dict

get_cached_artifact(key)[source]#

return a logged artifact from cache (for potential updates)
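
example (a sketch of the update flow, assuming an artifact was previously logged under the key "my-data"; the label edit is illustrative):

artifact = context.get_cached_artifact("my-data")
artifact.labels["reviewed"] = "true"  # illustrative metadata update
context.update_artifact(artifact)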

get_child_context(with_parent_params=False, **params)[source]#

get child context (iteration)

allow sub experiments (epochs, hyper-params, ..) under a parent run; each call creates a new child iteration, and log_xx calls will update the child only. Use update_child_iterations() to save all the children and mark the best run.

example:

def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
    df = data.as_df()
    best_accuracy = accuracy_sum = 0
    for param in param_list:  # param_list (and child_handler below) are assumed defined elsewhere
        with context.get_child_context(myparam=param) as child:
            accuracy = child_handler(child, df, **child.parameters)
            accuracy_sum += accuracy
            child.log_result('accuracy', accuracy)
            if accuracy > best_accuracy:
                child.mark_as_best()
                best_accuracy = accuracy

    context.log_result('avg_accuracy', accuracy_sum / len(param_list))
Parameters
  • params – extra params for the child context (or overrides of the parent’s params)

  • with_parent_params – child will copy the parent parameters and add to them

Returns

child context

get_dataitem(url, secrets: Optional[dict] = None)[source]#

get mlrun dataitem from url

example:

data = context.get_dataitem("s3://my-bucket/file.csv").as_df()
Parameters
  • url – data-item uri/path

  • secrets – additional secrets to use when accessing the data-item

get_input(key: str, url: str = '')[source]#

get an input DataItem object; data objects have methods such as .get(), .download(), and .url to access the actual data

example:

data = context.get_input("my_data").get()
get_meta() dict[source]#

Reserved for internal use

get_param(key: str, default=None)[source]#

get a run parameter, or use the provided default if not set

example:

p1 = context.get_param("p1", 0)
get_project_param(key: str, default=None)[source]#

get a parameter from the run’s project’s parameters
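
example (a brief sketch; the project parameter name "source_url" is hypothetical):

source_url = context.get_project_param("source_url", "")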

get_secret(key: str)[source]#

get a key-based secret (e.g. a DB password) from the context; secrets can be specified when invoking a run, through vault, files, env vars, etc.

example:

access_key = context.get_secret("ACCESS_KEY")
get_store_resource(url, secrets: Optional[dict] = None)[source]#

get mlrun data resource (feature set/vector, artifact, item) from url

example:

feature_vector = context.get_store_resource("store://feature-vectors/default/myvec")
dataset = context.get_store_resource("store://artifacts/default/mydata")
Parameters
  • url – store resource uri/path in the form store://<type>/<project>/<name>:<version>, where <type> is one of: artifacts | feature-sets | feature-vectors

  • secrets – additional secrets to use when accessing the store resource

property in_path#

default input path for data objects

property inputs#

dictionary of input data items (read-only)

property iteration#

child iteration index, for hyper parameters

kind = 'run'#

property labels#

dictionary with labels (read-only)

log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)[source]#

log an output artifact and optionally upload it to datastore

example:

context.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
Parameters
  • item – artifact key or artifact class object

  • body – will use the body as the artifact content

  • local_path – path to the local file we upload; will also be used as the destination subpath (under “artifact_path”)

  • artifact_path – target artifact path (when not using the default); to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)

  • tag – version tag

  • viewer – kubeflow viewer type

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • src_path – deprecated, use local_path

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • format – optional, format to use (e.g. csv, parquet, ..)

  • db_key – the key to use in the artifact DB table; by default it is the run name + ‘_’ + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=None, db_key=None, target_path='', extra_data=None, label_column: Optional[str] = None, **kwargs)[source]#

log a dataset artifact and optionally upload it to datastore

example:

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
Parameters
  • key – artifact key

  • df – dataframe object

  • label_column – name of the label column (the one holding the target (y) values)

  • local_path – path to a dataframe file that exists locally. The given file extension will be used to save the dataframe to a file. If the file exists, it will be uploaded to the datastore instead of the given df.

  • artifact_path – target artifact path (when not using the default); to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)

  • tag – version tag

  • format – optional, format to use (e.g. csv, parquet, ..)

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • preview – number of lines to store as preview in the artifact metadata

  • stats – calculate and store dataset stats in the artifact metadata

  • extra_data – key/value list of extra files/charts to link with this dataset

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • db_key – the key to use in the artifact DB table; by default it is the run name + ‘_’ + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_iteration_results(best, summary: list, task: dict, commit=False)[source]#

Reserved for internal use

property log_level#

get the logging level, e.g. ‘debug’, ‘info’, ‘error’

log_metric(key: str, value, timestamp=None, labels=None)[source]#

TBD, log a real-time time-series metric

log_metrics(keyvals: dict, timestamp=None, labels=None)[source]#

TBD, log a set of real-time time-series metrics
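
example (a sketch of the intended call pattern; note these methods are marked TBD above, and the metric names and values are illustrative):

context.log_metric("loss", 0.31)
context.log_metrics({"loss": 0.31, "accuracy": 0.79})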

log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column: Optional[Union[str, list]] = None, extra_data=None, db_key=None, **kwargs)[source]#

log a model artifact and optionally upload it to datastore

example:

from pickle import dumps  # assumed import; the model body is pickled here

# training_df and feature_vector_uri are assumed to be defined earlier
context.log_model("model", body=dumps(model),
                  model_file="model.pkl",
                  metrics=context.results,
                  training_set=training_df,
                  label_column='label',
                  feature_vector=feature_vector_uri,
                  labels={"app": "fraud"})
Parameters
  • key – artifact key or artifact class object

  • body – will use the body as the artifact content

  • model_file – path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)

  • model_dir – path to the local dir holding the model file and extra files

  • artifact_path – target artifact path (when not using the default); to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)

  • framework – name of the ML framework

  • algorithm – training algorithm name

  • tag – version tag

  • metrics – key/value dict of model metrics

  • parameters – key/value dict of model parameters

  • inputs – ordered list of model input features (name, type, ..)

  • outputs – ordered list of model output/result elements (name, type, ..)

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])

  • feature_weights – list of feature weights, one per input column

  • training_set – training set dataframe, used to infer inputs & outputs

  • label_column – which columns in the training set are the label (target) columns

  • extra_data – key/value list of extra files/charts to link with this model; each value can be an absolute path | relative path (to model dir) | bytes | artifact object

  • db_key – the key to use in the artifact DB table; by default it is the run name + ‘_’ + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_result(key: str, value, commit=False)[source]#

log a scalar result value

example:

context.log_result('accuracy', 0.85)
Parameters
  • key – result key

  • value – result value

  • commit – commit (write to DB now vs wait for the end of the run)

log_results(results: dict, commit=False)[source]#

log a set of scalar result values

example:

context.log_results({'accuracy': 0.85, 'loss': 0.2})
Parameters
  • results – key/value dict of results

  • commit – commit (write to DB now vs wait for the end of the run)

property logger#

built-in logger interface

example:

context.logger.info("started experiment..", param=5)
mark_as_best()[source]#

mark a child as the best iteration result, see .get_child_context()

property out_path#

default output path for artifacts

property parameters#

dictionary of run parameters (read-only)

property project#

project name; runs can be categorized by projects

property results#

dictionary of results (read-only)

set_annotation(key: str, value, replace: bool = True)[source]#

set/record a specific annotation

example:

context.set_annotation("comment", "some text")
set_hostname(host: str)[source]#

update the hostname, for internal use

set_label(key: str, value, replace: bool = True)[source]#

set/record a specific label

example:

context.set_label("framework", "sklearn")
set_logger_stream(stream)[source]#

set_state(state: Optional[str] = None, error: Optional[str] = None, commit=True)[source]#

modify and store the run state or mark an error
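
example (a brief sketch; the error text is illustrative):

context.set_state("completed")
context.set_state(error="failed to read input data")  # sets the run state to error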

Parameters
  • state – set run state

  • error – error message (if set, the run state will be set to error)

  • commit – will immediately update the state in the DB

property tag#

run tag (the run uid, or the workflow id if one exists)

to_dict()[source]#

convert the run context to a dictionary

to_json()[source]#

convert the run context to a json buffer

to_yaml()[source]#

convert the run context to a yaml buffer
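
example (a minimal sketch, printing the serialized run context):

print(context.to_yaml())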

property uid#

Unique run id

update_artifact(artifact_object)[source]#

update an artifact object in the cache and the DB (see the example under get_cached_artifact())

update_child_iterations(best_run=0, commit_children=False, completed=True)[source]#

update children results in the parent, and optionally mark the best
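
example (a brief sketch, assuming child iterations were created via get_child_context(); the best iteration number is illustrative):

context.update_child_iterations(best_run=2, commit_children=True)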

Parameters
  • best_run – the child iteration number to mark as the best (iteration numbers start from 1)

  • commit_children – commit all child runs to the db

  • completed – mark children as completed