mlrun.execution#
- class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)[source]#
Bases: object
ML Execution Client Context
The context is generated and injected into the function via function.run(), or created manually using the get_or_create_ctx() call, and provides an interface to run params, metadata, inputs, and outputs.
Base metadata includes: uid, name, project, and iteration (for hyper params). Users can set labels and annotations using set_label() and set_annotation(), access parameters and secrets using get_param() and get_secret(), and access input data objects using get_input(). Store results, artifacts, and real-time metrics using the log_result(), log_artifact(), log_dataset(), and log_model() methods.
See the doc for the individual params and methods.
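Example (a minimal handler sketch; the handler name, parameter, and logged values here are illustrative assumptions, not part of the API):
import mlrun

def handler(context: mlrun.MLClientCtx, p1: int = 1):
    # the context is injected by function.run(); use it to read params and log outputs
    context.logger.info("starting", p1=p1)
    context.log_result("doubled", p1 * 2)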
- property annotations#
Dictionary with annotations (read-only)
- artifact_subpath(*subpaths)[source]#
Generate a subpath under the default artifacts output path
Example:
data_path=context.artifact_subpath('data')
- property artifacts#
Dictionary of artifacts (read-only)
- commit(message: str = '', completed=False)[source]#
Save run state and optionally add a commit message
- Parameters:
message -- Commit message to save in the run
completed -- Mark run as completed
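Example (the commit message is an illustrative placeholder):
context.commit("checkpoint after data prep", completed=False)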
- classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False, store_run=True, include_status=False)[source]#
Create execution context from dict
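A minimal sketch, assuming a bare-bones run dict (real run objects carry more metadata and spec fields; store_run=False is passed here to avoid writing to the DB):
ctx = mlrun.execution.MLClientCtx.from_dict(
    {"metadata": {"name": "my-run", "project": "default"}},
    store_run=False,
)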
- get_child_context(with_parent_params=False, **params)[source]#
Get child context (iteration)
Allows sub-experiments (epochs, hyper-params, ..) under a parent run. Each call creates a new child iteration; log_xx calls update the child only. Use commit_children() to save all the children and specify the best run.
Example:
def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
    df = data.as_df()
    best_accuracy = accuracy_sum = 0
    for param in param_list:
        with context.get_child_context(myparam=param) as child:
            accuracy = child_handler(child, df, **child.parameters)
            accuracy_sum += accuracy
            child.log_result('accuracy', accuracy)
            if accuracy > best_accuracy:
                child.mark_as_best()
                best_accuracy = accuracy
    context.log_result('avg_accuracy', accuracy_sum / len(param_list))
- Parameters:
params -- Extra (or override) params to parent context
with_parent_params -- Child will copy the parent parameters and add to them
- Returns:
Child context
- get_dataitem(url, secrets: dict | None = None)[source]#
Get mlrun dataitem from url
Example:
data = context.get_dataitem("s3://my-bucket/file.csv").as_df()
- Parameters:
url -- Data-item uri/path
secrets -- Additional secrets to use when accessing the data-item
- get_input(key: str, url: str = '')[source]#
Get an input DataItem object. Data objects have methods such as .get(), .download(), and .url to access the actual data. Requires access to the data store secrets if configured.
Example:
data = context.get_input("my_data").get()
- Parameters:
key -- The key name for the input url entry.
url -- The url of the input data (file, stream, ..) - optional, saved in the inputs dictionary if the key is not already present.
- Returns:
DataItem
object
- get_param(key: str, default=None)[source]#
Get a run parameter, or use the provided default if not set
Example:
p1 = context.get_param("p1", 0)
- get_project_object()[source]#
Get the MLRun project object by the project name set in the context.
- Returns:
The project object or None if it couldn't be retrieved.
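Example (logging the project name when the object could be retrieved; purely illustrative):
project = context.get_project_object()
if project:
    context.logger.info("resolved project", name=project.name)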
- get_project_param(key: str, default=None)[source]#
Get a parameter from the run's project's parameters
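Example (assuming the project defines a "source_url" parameter; the name and default are illustrative):
source_url = context.get_project_param("source_url", default="")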
- get_secret(key: str)[source]#
Get a key-based secret (e.g. a DB password) from the context. Secrets can be specified when invoking a run, via Vault, files, env vars, etc.
Example:
access_key = context.get_secret("ACCESS_KEY")
- get_store_resource(url, secrets: dict | None = None)[source]#
Get mlrun data resource (feature set/vector, artifact, item) from url.
Example:
feature_vector = context.get_store_resource("store://feature-vectors/default/myvec")
dataset = context.get_store_resource("store://artifacts/default/mydata")
- Parameters:
url -- Store resource uri/path: store://<type>/<project>/<name>:<version>. Types: artifacts | feature-sets | feature-vectors
secrets -- Additional secrets to use when accessing the store resource
- property in_path#
Default input path for data objects
- property inputs#
Dictionary of input data item urls (read-only)
- is_logging_worker()[source]#
Check if the current worker is the logging worker.
- Returns:
True if the context belongs to the logging worker and False otherwise.
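Example (a hedged sketch of gating result logging in a multi-worker run; the key and value are illustrative):
if context.is_logging_worker():
    context.log_result("global_accuracy", 0.9)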
- property iteration#
Child iteration index, for hyperparameters
- kind = 'run'#
- property labels#
Dictionary with labels (read-only)
- log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)[source]#
Log an output artifact and optionally upload it to datastore
Example:
context.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
- Parameters:
item -- Artifact key or artifact object (can be any type, such as dataset, model, feature store)
body -- Will use the body as the artifact content
local_path -- Path to the local file we upload; will also be used as the destination subpath (under "artifact_path")
artifact_path -- Target artifact path (when not using the default) To define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag -- Version tag
viewer -- Kubeflow viewer type
target_path -- Absolute target path (instead of using artifact_path + local_path)
src_path -- Deprecated, use local_path
upload -- Upload to datastore (default is True)
labels -- A set of key/value labels to tag the artifact with
format -- Optional, format to use (e.g. csv, parquet, ..)
db_key -- The key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table
- Returns:
Artifact object
- log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=None, db_key=None, target_path='', extra_data=None, label_column: str | None = None, **kwargs)[source]#
Log a dataset artifact and optionally upload it to datastore
If the dataset exists with the same key and tag, it will be overwritten.
Example:
raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
- Parameters:
key -- Artifact key
df -- Dataframe object
label_column -- Name of the label column (the one holding the target (y) values)
local_path -- Path to a local dataframe file. The given file extension will be used to save the dataframe to a file. If the file exists, it will be uploaded to the datastore instead of the given df.
artifact_path -- Target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag -- Version tag
format -- Optional, format to use (e.g. csv, parquet, ..)
target_path -- Absolute target path (instead of using artifact_path + local_path)
preview -- Number of lines to store as preview in the artifact metadata
stats -- Calculate and store dataset stats in the artifact metadata
extra_data -- Key/value list of extra files/charts to link with this dataset
upload -- Upload to datastore (default is True)
labels -- A set of key/value labels to tag the artifact with
db_key -- The key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table
- Returns:
Artifact object
- log_iteration_results(best, summary: list, task: dict, commit=False)[source]#
Reserved for internal use
- property log_level#
Get the logging level, e.g. 'debug', 'info', 'error'
- log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: List[Feature] | None = None, outputs: List[Feature] | None = None, feature_vector: str | None = None, feature_weights: list | None = None, training_set=None, label_column: str | list | None = None, extra_data=None, db_key=None, **kwargs)[source]#
Log a model artifact and optionally upload it to datastore
Example:
context.log_model(
    "model",
    body=dumps(model),
    model_file="model.pkl",
    metrics=context.results,
    training_set=training_df,
    label_column="label",
    feature_vector=feature_vector_uri,
    labels={"app": "fraud"},
)
- Parameters:
key -- Artifact key or artifact object (e.g. a model artifact)
body -- Will use the body as the artifact content
model_file -- Path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)
model_dir -- Path to the local dir holding the model file and extra files
artifact_path -- Target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
framework -- Name of the ML framework
algorithm -- Training algorithm name
tag -- Version tag
metrics -- Key/value dict of model metrics
parameters -- Key/value dict of model parameters
inputs -- Ordered list of model input features (name, type, ..)
outputs -- Ordered list of model output/result elements (name, type, ..)
upload -- Upload to datastore (default is True)
labels -- A set of key/value labels to tag the artifact with
feature_vector -- Feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])
feature_weights -- List of feature weights, one per input column
training_set -- Training set dataframe, used to infer inputs & outputs
label_column -- Which columns in the training set are the label (target) columns
extra_data -- Key/value list of extra files/charts to link with this model; each value can be an absolute path | relative path (to model dir) | bytes | artifact object
db_key -- The key to use in the artifact DB table; by default it is the run name + '_' + key. db_key=False will not register it in the artifacts table
- Returns:
Artifact object
- log_result(key: str, value, commit=False)[source]#
Log a scalar result value
Example:
context.log_result('accuracy', 0.85)
- Parameters:
key -- Result key
value -- Result value
commit -- Commit (write to DB now vs wait for the end of the run)
- log_results(results: dict, commit=False)[source]#
Log a set of scalar result values
Example:
context.log_results({'accuracy': 0.85, 'loss': 0.2})
- Parameters:
results -- Key/value dict of results
commit -- Commit (write to DB now vs wait for the end of the run)
- property logger#
Built-in logger interface
Example:
context.logger.info("Started experiment..", param=5)
- property out_path#
Default output path for artifacts
- property parameters#
Dictionary of run parameters (read-only)
- property project#
Project name, runs can be categorized by projects
- property results#
Dictionary of results (read-only)
- set_annotation(key: str, value, replace: bool = True)[source]#
Set/record a specific annotation
Example:
context.set_annotation("comment", "some text")
- set_label(key: str, value, replace: bool = True)[source]#
Set/record a specific label
Example:
context.set_label("framework", "sklearn")
- set_state(execution_state: str | None = None, error: str | None = None, commit=True)[source]#
Modify and store the execution state, or mark an error and update the run state accordingly. This method allows setting the run state to 'completed' in the DB, which is discouraged; completion of runs should be decided externally to the execution context.
- Parameters:
execution_state -- The execution state to set
error -- Error message (if set, the run state will be set to 'error')
commit -- Immediately update the state in the DB
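Example (marking the run as errored; the message is illustrative):
context.set_state(error="failed to reach the feature store")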
- property state#
Execution state
- store_run()[source]#
Store the run object in the DB - removes missing fields. Use _update_run for coherent updates. Should be called by the logging worker only (see is_logging_worker()).
- property tag#
Run tag (uid or workflow id if exists)
- property uid#
Unique run id
- update_child_iterations(best_run=0, commit_children=False, completed=True)[source]#
Update children results in the parent, and optionally mark the best.
- Parameters:
best_run -- The child iteration number to mark as best (numbering starts from 1)
commit_children -- Commit all child runs to the db
completed -- Mark children as completed
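Example (a hedged sketch, assuming child iterations were created via get_child_context() and the second performed best):
context.update_child_iterations(best_run=2, commit_children=True)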