mlrun.execution#

class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)[source]#

Bases: object

ML Execution Client Context

the context object is generated and injected into the function by function.run(), or created manually via the get_or_create_ctx() call; it provides an interface for working with run params, metadata, inputs, and outputs

base metadata includes: uid, name, project, and iteration (for hyper params). users can:

  • set labels and annotations using set_label(), set_annotation()

  • access parameters and secrets using get_param(), get_secret()

  • access input data objects using get_input()

  • store results, artifacts, and real-time metrics using the log_result(), log_artifact(), log_dataset() and log_model() methods

see the documentation of the individual params and methods below
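
example (a minimal sketch of a handler receiving the context; the param and result names are illustrative):

import mlrun

def handler(context: mlrun.MLClientCtx):
    # read a run parameter (with a default), log progress, and store a result
    p1 = context.get_param("p1", 1)
    context.logger.info("starting run", p1=p1)
    context.log_result("accuracy", p1 * 0.1)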

property annotations#

dictionary with annotations (read-only)

artifact_subpath(*subpaths)[source]#

build a path from one or more subpaths under the run's default artifact output path

example:

data_path = context.artifact_subpath('data')
property artifacts#

dictionary of artifacts (read-only)

commit(message: str = '', completed=False)[source]#

save run state and optionally add a commit message
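
example (a sketch; the message text is illustrative):

context.commit("state after data prep", completed=False)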

Parameters
  • message – commit message to save in the run

  • completed – mark run as completed

classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False, store_run=True)[source]#

create execution context from dict

get_cached_artifact(key)[source]#

return a logged artifact from the cache (for potential updates)
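
example (a sketch; assumes an artifact was previously logged under the key "mydf"):

artifact = context.get_cached_artifact("mydf")
artifact.labels["reviewed"] = "yes"  # assumes the artifact exposes a mutable labels dict
context.update_artifact(artifact)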

get_child_context(with_parent_params=False, **params)[source]#

get child context (iteration)

allows sub-experiments (epochs, hyper-params, ..) under a parent run. each call creates a new child iteration; log_xx calls update only the child. use update_child_iterations() to save all the children and specify the best run

example:

def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
    df = data.as_df()
    best_accuracy = accuracy_sum = 0
    # param_list and child_handler are user-defined (omitted here for brevity)
    for param in param_list:
        with context.get_child_context(myparam=param) as child:
            accuracy = child_handler(child, df, **child.parameters)
            accuracy_sum += accuracy
            child.log_result('accuracy', accuracy)
            if accuracy > best_accuracy:
                child.mark_as_best()
                best_accuracy = accuracy

    context.log_result('avg_accuracy', accuracy_sum / len(param_list))
Parameters
  • params – extra params for the child context (or overrides of the parent's params)

  • with_parent_params – child will copy the parent parameters and add to them

Returns

child context

get_dataitem(url, secrets: Optional[dict] = None)[source]#

get mlrun dataitem from url

example:

data = context.get_dataitem("s3://my-bucket/file.csv").as_df()
Parameters
  • url – data-item uri/path

  • secrets – additional secrets to use when accessing the data-item

get_input(key: str, url: str = '')[source]#

get an input DataItem object. data objects have methods such as .get(), .download(), .url, .. to access the actual data

example:

data = context.get_input("my_data").get()
get_meta() dict[source]#

Reserved for internal use

get_param(key: str, default=None)[source]#

get a run parameter, or use the provided default if not set

example:

p1 = context.get_param("p1", 0)
get_project_param(key: str, default=None)[source]#

get a parameter that is defined in the run's project parameters
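
example (a sketch; assumes the project defines a "default_image" parameter):

image = context.get_project_param("default_image", default="python:3.9")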

get_secret(key: str)[source]#

get a key-based secret (e.g. a DB password) from the context. secrets can be specified when invoking a run, through vault, files, env, ..

example:

access_key = context.get_secret("ACCESS_KEY")
get_store_resource(url, secrets: Optional[dict] = None)[source]#

get mlrun data resource (feature set/vector, artifact, item) from url

example:

feature_vector = context.get_store_resource("store://feature-vectors/default/myvec")
dataset = context.get_store_resource("store://artifacts/default/mydata")
Parameters
  • url – store resource uri/path in the form store://<type>/<project>/<name>:<version>, where <type> is one of: artifacts | feature-sets | feature-vectors

  • secrets – additional secrets to use when accessing the store resource

property in_path#

default input path for data objects

property inputs#

dictionary of input data items (read-only)

property iteration#

child iteration index, for hyper parameters

kind = 'run'#

property labels#

dictionary with labels (read-only)

log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)[source]#

log an output artifact and optionally upload it to datastore

example:

context.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
Parameters
  • item – artifact key or artifact class instance

  • body – will use the body as the artifact content

  • local_path – path to the local file we upload; will also be used as the destination subpath (under "artifact_path")

  • artifact_path – target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • tag – version tag

  • viewer – kubeflow viewer type

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • src_path – deprecated, use local_path

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • format – optional, format to use (e.g. csv, parquet, ..)

  • db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=None, db_key=None, target_path='', extra_data=None, label_column: Optional[str] = None, **kwargs)[source]#

log a dataset artifact and optionally upload it to datastore

example:

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
Parameters
  • key – artifact key

  • df – dataframe object

  • label_column – name of the label column (the one holding the target (y) values)

  • local_path – path to a dataframe file that exists locally. the given file extension will be used to save the dataframe to a file. if the file exists, it will be uploaded to the datastore instead of the given df.

  • artifact_path – target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • tag – version tag

  • format – optional, format to use (e.g. csv, parquet, ..)

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • preview – number of lines to store as preview in the artifact metadata

  • stats – calculate and store dataset stats in the artifact metadata

  • extra_data – key/value list of extra files/charts to link with this dataset

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_iteration_results(best, summary: list, task: dict, commit=False)[source]#

Reserved for internal use

property log_level#

get the logging level, e.g. ‘debug’, ‘info’, ‘error’

log_metric(key: str, value, timestamp=None, labels=None)[source]#

TBD, log a real-time time-series metric

log_metrics(keyvals: dict, timestamp=None, labels=None)[source]#

TBD, log a set of real-time time-series metrics

log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column: Optional[Union[str, list]] = None, extra_data=None, db_key=None, **kwargs)[source]#

log a model artifact and optionally upload it to datastore

example:

context.log_model("model", body=dumps(model),
                  model_file="model.pkl",
                  metrics=context.results,
                  training_set=training_df,
                  label_column='label',
                  feature_vector=feature_vector_uri,
                  labels={"app": "fraud"})
Parameters
  • key – artifact key or artifact class instance

  • body – will use the body as the artifact content

  • model_file – path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)

  • model_dir – path to the local dir holding the model file and extra files

  • artifact_path – target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • framework – name of the ML framework

  • algorithm – training algorithm name

  • tag – version tag

  • metrics – key/value dict of model metrics

  • parameters – key/value dict of model parameters

  • inputs – ordered list of model input features (name, type, ..)

  • outputs – ordered list of model output/result elements (name, type, ..)

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])

  • feature_weights – list of feature weights, one per input column

  • training_set – training set dataframe, used to infer inputs & outputs

  • label_column – which columns in the training set are the label (target) columns

  • extra_data – key/value list of extra files/charts to link with this model. value can be absolute path | relative path (to model dir) | bytes | artifact object

  • db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns

artifact object

log_result(key: str, value, commit=False)[source]#

log a scalar result value

example:

context.log_result('accuracy', 0.85)
Parameters
  • key – result key

  • value – result value

  • commit – commit (write to DB now vs wait for the end of the run)

log_results(results: dict, commit=False)[source]#

log a set of scalar result values

example:

context.log_results({'accuracy': 0.85, 'loss': 0.2})
Parameters
  • results – key/value dict of results

  • commit – commit (write to DB now vs wait for the end of the run)

property logger#

built-in logger interface

example:

context.logger.info("started experiment..", param=5)
mark_as_best()[source]#

mark a child as the best iteration result, see .get_child_context()

property out_path#

default output path for artifacts

property parameters#

dictionary of run parameters (read-only)

property project#

project name, runs can be categorized by projects

property results#

dictionary of results (read-only)

set_annotation(key: str, value, replace: bool = True)[source]#

set/record a specific annotation

example:

context.set_annotation("comment", "some text")
set_hostname(host: str)[source]#

update the hostname, for internal use

set_label(key: str, value, replace: bool = True)[source]#

set/record a specific label

example:

context.set_label("framework", "sklearn")
set_logger_stream(stream)[source]#

set_state(execution_state: Optional[str] = None, error: Optional[str] = None, commit=True)[source]#

Modify and store the execution state, or mark an error and update the run state accordingly. This method allows setting the run state to 'completed' in the DB, which is discouraged; completion of runs should be decided externally to the execution context.
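
example (a sketch; marks the run as failed with an error message):

context.set_state(error="failed to read input data")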

Parameters
  • execution_state – set execution state

  • error – error message (if set, the run state will be changed to error)

  • commit – will immediately update the state in the DB

property state#

execution state

store_run()[source]#

store the run object in the DB; this overwrites the stored run and removes fields that are missing locally. use _update_run for coherent updates

property tag#

run tag (the workflow id if it exists, otherwise the run uid)

to_dict()[source]#

convert the run context to a dictionary

to_json()[source]#

convert the run context to a json buffer

to_yaml()[source]#

convert the run context to a yaml buffer
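
example (a sketch; prints the serialized run context for debugging):

print(context.to_yaml())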

property uid#

Unique run id

update_artifact(artifact_object)[source]#

update an artifact object in the cache and the DB

update_child_iterations(best_run=0, commit_children=False, completed=True)[source]#

update children results in the parent, and optionally mark the best
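
example (a sketch; assumes child iterations were created with get_child_context() and the second child performed best):

context.update_child_iterations(best_run=2, commit_children=True)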

Parameters
  • best_run – the iteration number of the best child run (iterations start from 1)

  • commit_children – commit all child runs to the db

  • completed – mark children as completed