mlrun.execution

mlrun.execution#

class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)[source]#

Bases: object

ML Execution Client Context

The context is generated and injected into the function by function.run(), or created manually with the get_or_create_ctx() call. It provides an interface to the run params, metadata, inputs, and outputs.

Base metadata includes: uid, name, project, and iteration (for hyperparameters). Users can set labels and annotations using set_label() and set_annotation(), access parameters and secrets using get_param() and get_secret(), and access input data objects using get_input(). Results, artifacts, and real-time metrics are stored using the log_result(), log_artifact(), log_dataset(), and log_model() methods.

See the documentation of the individual params and methods below.
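Example:

A minimal sketch of a handler that uses the interface described above. The context here is a unittest.mock.MagicMock stand-in (an assumption, so the sketch runs without an MLRun installation); in a real run, MLRun injects an MLClientCtx. The parameter name "scale" and the result key "scaled" are hypothetical.

```python
from unittest.mock import MagicMock


def handler(context, p1=1):
    # Read a run parameter, falling back to a default when it is not set.
    scale = context.get_param("scale", 2)
    # Tag the run, then record a scalar result.
    context.set_label("framework", "sklearn")
    context.log_result("scaled", p1 * scale)


# Stand-in context (assumption): a mock that answers get_param() from a dict.
run_params = {"scale": 3}
context = MagicMock()
context.get_param.side_effect = lambda key, default=None: run_params.get(key, default)

handler(context, p1=5)
```

The handler only relies on get_param(), set_label(), and log_result(), so the same function body works unchanged when the real context is injected.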

property annotations#: Dictionary with annotations (read-only)

artifact_subpath(*subpaths)[source]#

Subpaths under the artifacts output path

Example:

data_path = context.artifact_subpath("data")

property artifacts#: Dictionary of artifacts (read-only)

commit(message: str = '', completed=False)[source]#

Save run state and optionally add a commit message

Parameters:

message -- Commit message to save in the run
completed -- Mark run as completed
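Example:

A sketch of one possible pattern, not a prescribed workflow: log intermediate results without committing each one, then save the run state once with a message. The context is a MagicMock stand-in (an assumption), and the halving loss is a placeholder for a real training step.

```python
from unittest.mock import MagicMock


def train(context, epochs=3):
    loss = 1.0
    for epoch in range(1, epochs + 1):
        loss /= 2  # placeholder for a real training step
        # commit=False: keep the result in the context, do not write to the DB yet.
        context.log_result("loss", loss, commit=False)
    # Save the accumulated run state once, with a short message.
    context.commit(message=f"finished {epochs} epochs")
    return loss


context = MagicMock()  # stand-in for an injected MLClientCtx (assumption)
final_loss = train(context)
```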

classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False, store_run=True, include_status=False)[source]#: Create execution context from dict

get_cached_artifact(key)[source]#: Return a logged artifact from cache (for potential updates)

get_child_context(with_parent_params=False, **params)[source]#

Get child context (iteration)

Allows sub-experiments (epochs, hyperparameters, ..) under a parent run. Each call creates a new iteration, and log_xx calls update the child only. Use update_child_iterations(commit_children=True) to save all the children and specify the best run.

Example:

def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
    df = data.as_df()
    best_accuracy = accuracy_sum = 0
    for param in param_list:
        with context.get_child_context(myparam=param) as child:
            accuracy = child_handler(child, df, **child.parameters)
            accuracy_sum += accuracy
            child.log_result("accuracy", accuracy)
            if accuracy > best_accuracy:
                child.mark_as_best()
                best_accuracy = accuracy

    context.log_result("avg_accuracy", accuracy_sum / len(param_list))

Parameters:

params -- Extra (or override) params to parent context
with_parent_params -- Child will copy the parent parameters and add to them

Returns:

Child context

get_dataitem(url, secrets: dict | None = None)[source]#

Get mlrun dataitem from url

Example:

data = context.get_dataitem("s3://my-bucket/file.csv").as_df()

Parameters:

url -- Data-item uri/path
secrets -- Additional secrets to use when accessing the data-item

get_input(key: str, url: str = '')[source]#

Get an input DataItem object. Data objects have methods and attributes such as .get(), .download(), .url, .. to access the actual data. Requires access to the data store secrets if configured.

Example:

data = context.get_input("my_data").get()

Parameters:

key -- The key name for the input url entry.
url -- The url of the input data (file, stream, ..) - optional, saved in the inputs dictionary if the key is not already present.

Returns:

DataItem object

get_meta() → dict[source]#: Reserved for internal use

get_param(key: str, default=None)[source]#

Get a run parameter, or use the provided default if not set

Example:

p1 = context.get_param("p1", 0)

get_project_object()[source]#

Get the MLRun project object by the project name set in the context.

Returns:

The project object, or None if it couldn't be retrieved.

get_project_param(key: str, default=None)[source]#: Get a parameter from the run's project's parameters
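Example:

A sketch of one way to combine get_project_param() with get_param(), so that a run-level parameter overrides a project-level one. The precedence order is a design choice of this sketch, not something MLRun enforces. The key "batch_size", the helper names, and the MagicMock stand-in are assumptions.

```python
from unittest.mock import MagicMock


def resolve_batch_size(context, fallback=32):
    # Project-level value, or the hard-coded fallback if the project has none.
    project_default = context.get_project_param("batch_size", fallback)
    # A run-level parameter, when set, takes precedence over the project value.
    return context.get_param("batch_size", project_default)


def make_context(run_params, project_params):
    # Stand-in context (assumption): both lookups are answered from plain dicts.
    context = MagicMock()
    context.get_param.side_effect = (
        lambda key, default=None: run_params.get(key, default)
    )
    context.get_project_param.side_effect = (
        lambda key, default=None: project_params.get(key, default)
    )
    return context


print(resolve_batch_size(make_context({}, {"batch_size": 64})))  # 64
print(resolve_batch_size(make_context({"batch_size": 8}, {"batch_size": 64})))  # 8
```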

get_secret(key: str)[source]#

Get a key-based secret (e.g. a DB password) from the context. Secrets can be specified when invoking a run through vault, files, env, ..

Example:

access_key = context.get_secret("ACCESS_KEY")

get_store_resource(url, secrets: dict | None = None)[source]#

Get mlrun data resource (feature set/vector, artifact, item) from url.

Example:

feature_vector = context.get_store_resource(
    "store://feature-vectors/default/myvec"
)
dataset = context.get_store_resource("store://artifacts/default/mydata")

Parameters:

url -- Store resource uri/path, store://<type>/<project>/<name>:<version> Types: artifacts | feature-sets | feature-vectors
secrets -- Additional secrets to use when accessing the store resource
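The uri layout described for the url parameter can be illustrated with a small formatter. build_store_uri is a hypothetical helper written for this sketch; it is not part of MLRun and only assembles the string from the documented parts.

```python
STORE_TYPES = ("artifacts", "feature-sets", "feature-vectors")


def build_store_uri(kind, project, name, version=None):
    """Hypothetical helper: format a store:// uri from its parts."""
    if kind not in STORE_TYPES:
        raise ValueError(f"unknown store resource type: {kind!r}")
    uri = f"store://{kind}/{project}/{name}"
    # The version suffix is optional; omit it when no version is given.
    return f"{uri}:{version}" if version else uri


print(build_store_uri("feature-vectors", "default", "myvec"))
# store://feature-vectors/default/myvec
print(build_store_uri("artifacts", "default", "mydata", version="v1"))
# store://artifacts/default/mydata:v1
```

The resulting string is what would be passed as the url argument of get_store_resource().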

property in_path#: Default input path for data objects

property inputs#: Dictionary of input data item urls (read-only)

is_logging_worker()[source]#

Check if the current worker is the logging worker.

Returns:

True if the context belongs to the logging worker and False otherwise.

property iteration#: Child iteration index, for hyperparameters

kind = 'run'#

property labels#: Dictionary with labels (read-only)

log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)[source]#

Log an output artifact and optionally upload it to datastore

Example:

context.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)

Parameters:

item -- Artifact key or artifact object (can be any type, such as dataset, model, feature store)
body -- Will use the body as the artifact content
local_path -- Path to the local file to upload; it is also used as the destination subpath (under "artifact_path")
artifact_path -- Target artifact path (when not using the default). To define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag -- Version tag
viewer -- Kubeflow viewer type
target_path -- Absolute target path (instead of using artifact_path + local_path)
src_path -- Deprecated, use local_path
upload -- Upload to datastore (default is True)
labels -- A set of key/value labels to tag the artifact with
format -- Optional, format to use (e.g. csv, parquet, ..)
db_key -- The key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns:

Artifact object

log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=None, db_key=None, target_path='', extra_data=None, label_column: str | None = None, **kwargs)[source]#

Log a dataset artifact and optionally upload it to datastore

If the dataset exists with the same key and tag, it will be overwritten.

Example:

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(
    raw_data, columns=["first_name", "last_name", "age", "testScore"]
)
context.log_dataset("mydf", df=df, stats=True)

Parameters:

key -- Artifact key
df -- Dataframe object
label_column -- Name of the label column (the one holding the target (y) values)
local_path -- Path to the local dataframe file. The given file extension is used to save the dataframe to a file. If the file exists, it is uploaded to the datastore instead of the given df.
artifact_path -- Target artifact path (when not using the default). To define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag -- Version tag
format -- Optional, format to use (e.g. csv, parquet, ..)
target_path -- Absolute target path (instead of using artifact_path + local_path)
preview -- Number of lines to store as preview in the artifact metadata
stats -- Calculate and store dataset stats in the artifact metadata
extra_data -- Key/value list of extra files/charts to link with this dataset
upload -- Upload to datastore (default is True)
labels -- A set of key/value labels to tag the artifact with
db_key -- The key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns:

Artifact object

log_iteration_results(best, summary: list, task: dict, commit=False)[source]#: Reserved for internal use

property log_level#: Get the logging level, e.g. 'debug', 'info', 'error'

log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: list[mlrun.features.Feature] | None = None, outputs: list[mlrun.features.Feature] | None = None, feature_vector: str | None = None, feature_weights: list | None = None, training_set=None, label_column: str | list | None = None, extra_data=None, db_key=None, **kwargs)[source]#

Log a model artifact and optionally upload it to datastore

Example:

context.log_model(
    "model",
    body=dumps(model),
    model_file="model.pkl",
    metrics=context.results,
    training_set=training_df,
    label_column="label",
    feature_vector=feature_vector_uri,
    labels={"app": "fraud"},
)

Parameters:

key -- Artifact key or artifact class ()
body -- Will use the body as the artifact content
model_file -- Path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)
model_dir -- Path to the local dir holding the model file and extra files
artifact_path -- Target artifact path (when not using the default). To define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
framework -- Name of the ML framework
algorithm -- Training algorithm name
tag -- Version tag
metrics -- Key/value dict of model metrics
parameters -- Key/value dict of model parameters
inputs -- Ordered list of model input features (name, type, ..)
outputs -- Ordered list of model output/result elements (name, type, ..)
upload -- Upload to datastore (default is True)
labels -- A set of key/value labels to tag the artifact with
feature_vector -- Feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])
feature_weights -- List of feature weights, one per input column
training_set -- Training set dataframe, used to infer inputs & outputs
label_column -- Which columns in the training set are the label (target) columns
extra_data -- Key/value list of extra files/charts to link with this model. Each value can be an absolute path, a relative path (to the model dir), bytes, or an artifact object
db_key -- The key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table

Returns:

Artifact object

log_result(key: str, value, commit=False)[source]#

Log a scalar result value

Example:

context.log_result("accuracy", 0.85)

Parameters:

key -- Result key
value -- Result value
commit -- Commit (write to DB now vs wait for the end of the run)

log_results(results: dict, commit=False)[source]#

Log a set of scalar result values

Example:

context.log_results({"accuracy": 0.85, "loss": 0.2})

Parameters:

results -- Key/value dict of results
commit -- Commit (write to DB now vs wait for the end of the run)

property logger#

Built-in logger interface

Example:

context.logger.info("Started experiment..", param=5)

mark_as_best()[source]#: Mark a child as the best iteration result, see .get_child_context()

property out_path#: Default output path for artifacts

property parameters#: Dictionary of run parameters (read-only)

property project#: Project name, runs can be categorized by projects

property results#: Dictionary of results (read-only)

set_annotation(key: str, value, replace: bool = True)[source]#

Set/record a specific annotation

Example:

context.set_annotation("comment", "some text")

set_hostname(host: str)[source]#: Update the hostname, for internal use

set_label(key: str, value, replace: bool = True)[source]#

Set/record a specific label

Example:

context.set_label("framework", "sklearn")

set_logger_stream(stream)[source]#

set_state(execution_state: str | None = None, error: str | None = None, commit=True)[source]#

Modify and store the execution state, or mark an error and update the run state accordingly. This method allows setting the run state to 'completed' in the DB, which is discouraged. Completion of runs should be decided externally to the execution context.

Parameters:

execution_state -- Execution state to set
error -- Error message (if provided, the state is set to error)
commit -- Immediately update the state in the DB
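Example:

A sketch of marking a run as failed from inside a handler, assuming the caller still wants the exception to propagate. run_step and failing_step are hypothetical names, and the context is a MagicMock stand-in rather than a real MLClientCtx.

```python
from unittest.mock import MagicMock


def run_step(context, step):
    try:
        return step()
    except Exception as exc:
        # Record the failure; per the parameter description, passing an
        # error message also moves the run state to error.
        context.set_state(error=str(exc))
        raise


def failing_step():
    raise ValueError("bad input")


context = MagicMock()  # stand-in for an injected MLClientCtx (assumption)
try:
    run_step(context, failing_step)
except ValueError:
    pass  # the error was already recorded on the context
```

Only the error argument is passed here; execution_state is left unset because the text above discourages setting 'completed' from within the execution context.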

property state#: Execution state

store_run()[source]#: Store the run object in the DB - removes missing fields. Use _update_run for coherent updates. Should be called by the logging worker only (see is_logging_worker()).
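Example:

A sketch of the guard suggested above: call store_run() only when is_logging_worker() returns True. store_if_logging_worker is a hypothetical helper, and both contexts are MagicMock stand-ins (an assumption) for a logging and a non-logging worker.

```python
from unittest.mock import MagicMock


def store_if_logging_worker(context):
    # Only the logging worker should write the run object to the DB.
    if context.is_logging_worker():
        context.store_run()
        return True
    return False


# Stand-in contexts (assumption) for a logging worker and a non-logging worker.
logging_ctx = MagicMock()
logging_ctx.is_logging_worker.return_value = True
other_ctx = MagicMock()
other_ctx.is_logging_worker.return_value = False

stored = [store_if_logging_worker(ctx) for ctx in (logging_ctx, other_ctx)]
```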

property tag#: Run tag (uid or workflow id if exists)

to_dict()[source]#: Convert the run context to a dictionary

to_json()[source]#: Convert the run context to a json buffer

to_yaml()[source]#: Convert the run context to a yaml buffer

property uid#: Unique run id

update_artifact(artifact_object)[source]#: Update an artifact object in the cache and the DB

update_child_iterations(best_run=0, commit_children=False, completed=True)[source]#

Update children results in the parent, and optionally mark the best.

Parameters:

best_run -- The iteration number of the best child (starts from 1)
commit_children -- Commit all child runs to the db
completed -- Mark children as completed
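Example:

A sketch of choosing best_run from per-child scores. It assumes child iterations are numbered in the order they were created, which is why the 0-based list index is shifted by one. best_iteration and the accuracy values are hypothetical, and the parent context is a MagicMock stand-in.

```python
from unittest.mock import MagicMock


def best_iteration(accuracies):
    # Iteration numbers start from 1, so shift the 0-based list index.
    best_index = max(range(len(accuracies)), key=accuracies.__getitem__)
    return best_index + 1


context = MagicMock()  # stand-in for the parent MLClientCtx (assumption)
accuracies = [0.71, 0.88, 0.83]  # one hypothetical score per child iteration
context.update_child_iterations(
    best_run=best_iteration(accuracies), commit_children=True
)
```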
