mlrun.execution#
- class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)[source]#
Bases: object
ML Execution Client Context
The context is generated and injected into the function using function.run(), or created manually using the get_or_create_ctx() call, and provides an interface to use run params, metadata, inputs, and outputs.
Base metadata includes: uid, name, project, and iteration (for hyper params). Users can:
- set labels and annotations using set_label(), set_annotation()
- access parameters and secrets using get_param(), get_secret()
- access input data objects using get_input()
- store results, artifacts, and real-time metrics using the log_result(), log_artifact(), log_dataset(), and log_model() methods
See the docs for the individual params and methods.
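As a sketch of typical usage (the handler name, parameter, and result key below are illustrative, not part of the API), a function body interacts with the injected context like this:

```python
def handler(context, p1=1):
    # read a run parameter (falling back to the default) and log a result
    v = context.get_param("p1", p1)
    context.log_result("doubled", v * 2)
```

When run through function.run(), mlrun supplies the context automatically; for local experimentation, get_or_create_ctx() returns an equivalent context object.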
- property annotations#
dictionary with annotations (read-only)
- artifact_subpath(*subpaths)[source]#
get a path under the run's default artifacts output path, joined from the given subpaths
example:
data_path=context.artifact_subpath('data')
- property artifacts#
dictionary of artifacts (read-only)
- commit(message: str = '', completed=True)[source]#
save run state and optionally add a commit message
- Parameters
message – commit message to save in the run
completed – mark run as completed
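A minimal sketch of committing mid-run state (the handler, result key, and message are illustrative): logging a result and then persisting the run state to the DB without marking the run as completed.

```python
def handler(context):
    # log an intermediate result, then persist the run state now,
    # leaving the run open (completed=False) for further work
    context.log_result("rmse", 0.12)
    context.commit(message="checkpoint after evaluation", completed=False)
```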
- classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False)[source]#
create execution context from dict
- get_child_context(with_parent_params=False, **params)[source]#
get child context (iteration)
allow sub-experiments (epochs, hyper-params, ..) under a parent run. This creates a new iteration; log_xx calls update the child only. Use commit_children() to save all the children and specify the best run.
example:
def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
    df = data.as_df()
    best_accuracy = accuracy_sum = 0
    for param in param_list:
        with context.get_child_context(myparam=param) as child:
            accuracy = child_handler(child, df, **child.parameters)
            accuracy_sum += accuracy
            child.log_result('accuracy', accuracy)
            if accuracy > best_accuracy:
                child.mark_as_best()
                best_accuracy = accuracy
    context.log_result('avg_accuracy', accuracy_sum / len(param_list))
- Parameters
params – extra (or override) params to parent context
with_parent_params – child will copy the parent parameters and add to them
- Returns
child context
- get_dataitem(url)[source]#
get mlrun dataitem from url
example:
data = context.get_dataitem("s3://my-bucket/file.csv").as_df()
- get_input(key: str, url: str = '')[source]#
get an input DataItem object; data objects have methods such as .get(), .download(), .url, .. to access the actual data
example:
data = context.get_input("my_data").get()
- get_param(key: str, default=None)[source]#
get a run parameter, or use the provided default if not set
example:
p1 = context.get_param("p1", 0)
- get_project_param(key: str, default=None)[source]#
get a parameter from the run’s project’s parameters
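A short sketch (the parameter name and handler are illustrative): project-level parameters behave like run parameters with a fallback default when the key is unset.

```python
def handler(context):
    # read a project-level parameter, falling back to a default when unset
    batch_size = context.get_project_param("batch_size", 32)
    context.log_result("batch_size_used", batch_size)
```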
- get_secret(key: str)[source]#
get a key-based secret (e.g. a DB password) from the context; secrets can be specified when invoking a run through vault, files, env, ..
example:
access_key = context.get_secret("ACCESS_KEY")
- get_store_resource(url)[source]#
get mlrun data resource (feature set/vector, artifact, item) from url
example:
feature_vector = context.get_store_resource("store://feature-vectors/default/myvec")
dataset = context.get_store_resource("store://artifacts/default/mydata")
- Parameters
url – store resource uri/path, store://<type>/<project>/<name>:<version> types: artifacts | feature-sets | feature-vectors
- property in_path#
default input path for data objects
- property inputs#
dictionary of input data items (read-only)
- property iteration#
child iteration index, for hyper parameters
- kind = 'run'#
- property labels#
dictionary with labels (read-only)
- log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)[source]#
log an output artifact and optionally upload it to datastore
example:
context.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
- Parameters
item – artifact key or artifact class ()
body – will use the body as the artifact content
local_path – path to the local file we upload, will also be used as the destination subpath (under "artifact_path")
artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)
tag – version tag
viewer – kubeflow viewer type
target_path – absolute target path (instead of using artifact_path + local_path)
src_path – deprecated, use local_path
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
format – optional, format to use (e.g. csv, parquet, ..)
db_key – the key to use in the artifact DB table, by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=False, db_key=None, target_path='', extra_data=None, label_column: Optional[str] = None, **kwargs)[source]#
log a dataset artifact and optionally upload it to datastore
example:
raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
- Parameters
key – artifact key
df – dataframe object
label_column – name of the label column (the one holding the target (y) values)
local_path – path to the local file we upload, will also be used as the destination subpath (under "artifact_path")
artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)
tag – version tag
format – optional, format to use (e.g. csv, parquet, ..)
target_path – absolute target path (instead of using artifact_path + local_path)
preview – number of lines to store as preview in the artifact metadata
stats – calculate and store dataset stats in the artifact metadata
extra_data – key/value list of extra files/charts to link with this dataset
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
db_key – the key to use in the artifact DB table, by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_iteration_results(best, summary: list, task: dict, commit=False)[source]#
Reserved for internal use
- property log_level#
get the logging level, e.g. ‘debug’, ‘info’, ‘error’
- log_metric(key: str, value, timestamp=None, labels=None)[source]#
TBD, log a real-time time-series metric
- log_metrics(keyvals: dict, timestamp=None, labels=None)[source]#
TBD, log a set of real-time time-series metrics
- log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column: Optional[Union[str, list]] = None, extra_data=None, db_key=None, **kwargs)[source]#
log a model artifact and optionally upload it to datastore
example:
context.log_model(
    "model",
    body=dumps(model),
    model_file="model.pkl",
    metrics=context.results,
    training_set=training_df,
    label_column='label',
    feature_vector=feature_vector_uri,
    labels={"app": "fraud"},
)
- Parameters
key – artifact key or artifact class ()
body – will use the body as the artifact content
model_file – path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)
model_dir – path to the local dir holding the model file and extra files
artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)
framework – name of the ML framework
algorithm – training algorithm name
tag – version tag
metrics – key/value dict of model metrics
parameters – key/value dict of model parameters
inputs – ordered list of model input features (name, type, ..)
outputs – ordered list of model output/result elements (name, type, ..)
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])
feature_weights – list of feature weights, one per input column
training_set – training set dataframe, used to infer inputs & outputs
label_column – which columns in the training set are the label (target) columns
extra_data – key/value list of extra files/charts to link with this model; each value can be an absolute path | relative path (to model dir) | bytes | artifact object
db_key – the key to use in the artifact DB table, by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_result(key: str, value, commit=False)[source]#
log a scalar result value
example:
context.log_result('accuracy', 0.85)
- Parameters
key – result key
value – result value
commit – commit (write to DB now vs wait for the end of the run)
- log_results(results: dict, commit=False)[source]#
log a set of scalar result values
example:
context.log_results({'accuracy': 0.85, 'loss': 0.2})
- Parameters
results – key/value dict of results
commit – commit (write to DB now vs wait for the end of the run)
- property logger#
built-in logger interface
example:
context.logger.info("started experiment..", param=5)
- property out_path#
default output path for artifacts
- property parameters#
dictionary of run parameters (read-only)
- property project#
project name, runs can be categorized by projects
- property results#
dictionary of results (read-only)
- set_annotation(key: str, value, replace: bool = True)[source]#
set/record a specific annotation
example:
context.set_annotation("comment", "some text")
- set_label(key: str, value, replace: bool = True)[source]#
set/record a specific label
example:
context.set_label("framework", "sklearn")
- set_state(state: Optional[str] = None, error: Optional[str] = None, commit=True)[source]#
modify and store the run state or mark an error
- Parameters
state – set run state
error – error message (if set, the run state is set to error)
commit – will immediately update the state in the DB
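A sketch of both paths (the handler, flag, and error message are illustrative): passing error marks the run as failed, while passing a state string records that state.

```python
def handler(context, inputs_valid: bool):
    if not inputs_valid:
        # supplying error= marks the run state as error and stores it
        context.set_state(error="input validation failed")
        return
    context.set_state("completed")
```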
- property tag#
run tag (uid or workflow id if exists)
- property uid#
Unique run id
- update_child_iterations(best_run=0, commit_children=False, completed=True)[source]#
update children results in the parent, and optionally mark the best
- Parameters
best_run – the child iteration number to mark as best (numbering starts at 1)
commit_children – commit all child runs to the db
completed – mark children as completed
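A minimal sketch of finalizing a hyper-param sweep from the parent context (the function name and argument are illustrative, not part of the API):

```python
def finalize_children(context, best_iteration):
    # commit all child runs to the DB and flag the given
    # iteration (1-based) as the best run
    context.update_child_iterations(
        best_run=best_iteration, commit_children=True, completed=True
    )
```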