mlrun.execution#

class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)[source]#

Bases: object

ML Execution Client Context

the context object is generated and injected into the function by function.run(), or created manually via the get_or_create_ctx() call; it provides an interface for working with run params, metadata, inputs, and outputs

base metadata includes: uid, name, project, and iteration (for hyper params). users can:

  • set labels and annotations using set_label(), set_annotation()

  • access parameters and secrets using get_param(), get_secret()

  • access input data objects using get_input()

  • store results, artifacts, and real-time metrics using the log_result(), log_artifact(), log_dataset() and log_model() methods

see the documentation of the individual params and methods below
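
example (a minimal sketch of a handler receiving the context; the param and result names are illustrative):

import mlrun

def handler(context: mlrun.MLClientCtx):
    # read a run parameter (with a default), log progress, and store a result
    p1 = context.get_param("p1", 1)
    context.logger.info("starting run", p1=p1)
    context.log_result("accuracy", p1 * 0.1)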

property annotations#

dictionary with annotations (read-only)

artifact_subpath(*subpaths)[source]#

build a path from one or more subpaths under the run's default artifact output path

example:

data_path = context.artifact_subpath('data')
property artifacts#

dictionary of artifacts (read-only)

commit(message: str = '', completed=False)[source]#

save run state and optionally add a commit message
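
example (a sketch; the message text is illustrative):

context.commit("state after data prep", completed=False)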

Parameters
  • message – commit message to save in the run

  • completed – mark run as completed

classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False, store_run=True)[source]#

create execution context from dict

get_cached_artifact(key)[source]#

return a logged artifact from the cache (for potential updates)
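
example (a sketch; assumes an artifact was previously logged under the key "mydf"):

artifact = context.get_cached_artifact("mydf")
artifact.labels["reviewed"] = "yes"  # assumes the artifact exposes a mutable labels dict
context.update_artifact(artifact)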

get_child_context(with_parent_params=False, **params)[source]#

get child context (iteration)

allows sub-experiments (epochs, hyper-params, ..) under a parent run. each call creates a new child iteration; log_xx calls update only the child. use update_child_iterations() to save all the children and specify the best run

example:

def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
    df = data.as_df()
    best_accuracy = accuracy_sum = 0
    # param_list and child_handler are user-defined (omitted here for brevity)
    for param in param_list:
        with context.get_child_context(myparam=param) as child:
            accuracy = child_handler(child, df, **child.parameters)
            accuracy_sum += accuracy
            child.log_result('accuracy', accuracy)
            if accuracy > best_accuracy:
                child.mark_as_best()
                best_accuracy = accuracy

    context.log_result('avg_accuracy', accuracy_sum / len(param_list))
Parameters
  • params – extra params for the child context (or overrides of the parent's params)

  • with_parent_params – child will copy the parent parameters and add to them

Returns

child context

get_dataitem(url, secrets: Optional[dict] = None)[source]#

get mlrun dataitem from url

example:

data = context.get_dataitem("s3://my-bucket/file.csv").as_df()
Parameters
  • url – data-item uri/path

  • secrets – additional secrets to use when accessing the data-item

get_input(key: str, url: str = '')[source]#

get an input DataItem object. data objects have methods such as .get(), .download(), .url, .. to access the actual data

example:

data = context.get_input("my_data").get()
get_meta() dict[source]#

Reserved for internal use

get_param(key: str, default=None)[source]#

get a run parameter, or use the provided default if not set

example:

p1 = context.get_param("p1", 0)
get_project_param(key: str, default=None)[source]#

get a parameter that is defined in the run's project parameters
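
example (a sketch; assumes the project defines a "default_image" parameter):

image = context.get_project_param("default_image", default="python:3.9")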

get_secret(key: str)[source]#

get a key-based secret (e.g. a DB password) from the context. secrets can be specified when invoking a run, through vault, files, env, ..

example:

access_key = context.get_secret("ACCESS_KEY")
get_store_resource(url, secrets: Optional[dict] = None)[source]#

get mlrun data resource (feature set/vector, artifact, item) from url

example:

feature_vector = context.get_store_resource("store://feature-vectors/default/myvec")
dataset = context.get_store_resource("store://artifacts/default/mydata")
Parameters
  • url – store resource uri/path in the form store://<type>/<project>/<name>:<version>, where <type> is one of: artifacts | feature-sets | feature-vectors

  • secrets – additional secrets to use when accessing the store resource

property in_path#

default input path for data objects

property inputs#

dictionary of input data items (read-only)

property iteration#

child iteration index, for hyper parameters

kind = 'run'#

property labels#

dictionary with labels (read-only)

log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)[source]#

log an output artifact and optionally upload it to datastore

example:

context.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
Parameters
  • item – artifact key or artifact class instance

  • body – will use the body as the artifact content

  • local_path – path to the local file we upload; will also be used as the destination subpath (under "artifact_path")

  • artifact_path – target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • tag – version tag

  • viewer – kubeflow viewer type

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • src_path – deprecated, use local_path

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • format – optional, format to use (e.g. csv, parquet, ..)

  • db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=None, db_key=None, target_path='', extra_data=None, label_column: Optional[str] = None, **kwargs)[source]#

log a dataset artifact and optionally upload it to datastore

example:

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
Parameters
  • key – artifact key

  • df – dataframe object

  • label_column – name of the label column (the one holding the target (y) values)

  • local_path – path to a dataframe file that exists locally. the given file extension will be used to save the dataframe to a file. if the file exists, it will be uploaded to the datastore instead of the given df.

  • artifact_path – target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • tag – version tag

  • format – optional, format to use (e.g. csv, parquet, ..)

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • preview – number of lines to store as preview in the artifact metadata

  • stats – calculate and store dataset stats in the artifact metadata

  • extra_data – key/value list of extra files/charts to link with this dataset

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_iteration_results(best, summary: list, task: dict, commit=False)[source]#

Reserved for internal use

property log_level#

get the logging level, e.g. ‘debug’, ‘info’, ‘error’

log_metric(key: str, value, timestamp=None, labels=None)[source]#

TBD, log a real-time time-series metric

log_metrics(keyvals: dict, timestamp=None, labels=None)[source]#

TBD, log a set of real-time time-series metrics

log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column: Optional[Union[str, list]] = None, extra_data=None, db_key=None, **kwargs)[source]#

log a model artifact and optionally upload it to datastore

example:

context.log_model("model", body=dumps(model),
                  model_file="model.pkl",
                  metrics=context.results,
                  training_set=training_df,
                  label_column='label',
                  feature_vector=feature_vector_uri,
                  labels={"app": "fraud"})
Parameters
  • key – artifact key or artifact class instance

  • body – will use the body as the artifact content

  • model_file – path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)

  • model_dir – path to the local dir holding the model file and extra files

  • artifact_path – target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • framework – name of the ML framework

  • algorithm – training algorithm name

  • tag – version tag

  • metrics – key/value dict of model metrics

  • parameters – key/value dict of model parameters

  • inputs – ordered list of model input features (name, type, ..)

  • outputs – ordered list of model output/result elements (name, type, ..)

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])

  • feature_weights – list of feature weights, one per input column

  • training_set – training set dataframe, used to infer inputs & outputs

  • label_column – which columns in the training set are the label (target) columns

  • extra_data – key/value list of extra files/charts to link with this model. value can be absolute path | relative path (to model dir) | bytes | artifact object

  • db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_result(key: str, value, commit=False)[source]#

log a scalar result value

example:

context.log_result('accuracy', 0.85)
Parameters
  • key – result key

  • value – result value

  • commit – commit (write to DB now vs wait for the end of the run)

log_results(results: dict, commit=False)[source]#

log a set of scalar result values

example:

context.log_results({'accuracy': 0.85, 'loss': 0.2})
Parameters
  • results – key/value dict of results

  • commit – commit (write to DB now vs wait for the end of the run)

property logger#

built-in logger interface

example:

context.logger.info("started experiment..", param=5)
mark_as_best()[source]#

mark a child as the best iteration result, see .get_child_context()

property out_path#

default output path for artifacts

property parameters#

dictionary of run parameters (read-only)

property project#

project name, runs can be categorized by projects

property results#

dictionary of results (read-only)

set_annotation(key: str, value, replace: bool = True)[source]#

set/record a specific annotation

example:

context.set_annotation("comment", "some text")
set_hostname(host: str)[source]#

update the hostname, for internal use

set_label(key: str, value, replace: bool = True)[source]#

set/record a specific label

example:

context.set_label("framework", "sklearn")
set_logger_stream(stream)[source]#

set_state(execution_state: Optional[str] = None, error: Optional[str] = None, commit=True)[source]#

Modify and store the execution state, or mark an error and update the run state accordingly. This method allows setting the run state to 'completed' in the DB, which is discouraged; completion of runs should be decided externally to the execution context.
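
example (a sketch; marks the run as failed with an error message):

context.set_state(error="failed to read input data")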

Parameters
  • execution_state – set execution state

  • error – error message (if set, the run state will be changed to error)

  • commit – will immediately update the state in the DB

property state#

execution state

store_run()[source]#

store the run object in the DB; this overwrites the stored run and removes fields that are missing locally. use _update_run for coherent updates

property tag#

run tag (the workflow id if it exists, otherwise the run uid)

to_dict()[source]#

convert the run context to a dictionary

to_json()[source]#

convert the run context to a json buffer

to_yaml()[source]#

convert the run context to a yaml buffer
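
example (a sketch; prints the serialized run context for debugging):

print(context.to_yaml())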

property uid#

Unique run id

update_artifact(artifact_object)[source]#

update an artifact object in the cache and the DB

update_child_iterations(best_run=0, commit_children=False, completed=True)[source]#

update children results in the parent, and optionally mark the best
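
example (a sketch; assumes child iterations were created with get_child_context() and the second child performed best):

context.update_child_iterations(best_run=2, commit_children=True)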

Parameters
  • best_run – the iteration number of the best child run (iterations start from 1)

  • commit_children – commit all child runs to the db

  • completed – mark children as completed