mlrun.execution

- class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)
Bases: object
ML Execution Client Context.
The context is generated and injected into the function using function.run(), or manually using the get_or_create_ctx() call, and provides an interface to run params, metadata, inputs, and outputs.
Base metadata includes: uid, name, project, and iteration (for hyper params). Users can set labels and annotations using set_label() and set_annotation(), access parameters and secrets using get_param() and get_secret(), and access input data objects using get_input(). Results, artifacts, and real-time metrics are stored using the log_result(), log_artifact(), log_dataset(), and log_model() methods. See the docs for the individual params and methods.
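example (a minimal handler sketch tying the above together; the handler name, parameter key, and values are illustrative):

    import mlrun

    def handler(context: mlrun.MLClientCtx):
        # read a run parameter, falling back to a default
        p1 = context.get_param("p1", 1)
        context.logger.info("run started", p1=p1)
        # record a scalar result on the run
        context.log_result("accuracy", 0.9)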
- property annotations
dictionary with annotations (read-only)
- artifact_subpath(*subpaths)
create a subpath under the run's default artifacts output path
example:

    data_path = context.artifact_subpath('data')
- property artifacts
dictionary of artifacts (read-only)
- commit(message: str = '', completed=True)
save run state and optionally add a commit message
- Parameters
message – commit message to save in the run
completed – mark run as completed
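example (a sketch; the message text is illustrative):

    # persist the current run state without marking the run completed
    context.commit(message="checkpoint after preprocessing", completed=False)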
- classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False)
create execution context from dict
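example (a sketch; assumes run_attrs holds a serialized run structure, e.g. one produced by the runtime):

    from mlrun.execution import MLClientCtx

    ctx = MLClientCtx.from_dict(run_attrs, autocommit=True)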
- get_child_context(with_parent_params=False, **params)
get child context (iteration)
Allows sub-experiments (epochs, hyper-params, etc.) under a parent run; each call creates a new iteration, and log_xx calls update the child only. Use commit_children() to save all the children and specify the best run.
example:

    def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
        df = data.as_df()
        best_accuracy = accuracy_sum = 0
        for param in param_list:
            with context.get_child_context(myparam=param) as child:
                accuracy = child_handler(child, df, **child.parameters)
                accuracy_sum += accuracy
                child.log_result('accuracy', accuracy)
                if accuracy > best_accuracy:
                    child.mark_as_best()
                    best_accuracy = accuracy
        context.log_result('avg_accuracy', accuracy_sum / len(param_list))

- Parameters
params – extra (or override) params relative to the parent context
with_parent_params – child will copy the parent parameters and add to them
- Returns
child context
- get_dataitem(url)
get an mlrun DataItem from a url
example:

    data = context.get_dataitem("s3://my-bucket/file.csv").as_df()
- get_input(key: str, url: str = '')
get an input DataItem object; data objects have methods such as .get(), .download(), .url, .. to access the actual data
example:

    data = context.get_input("my_data").get()
- get_param(key: str, default=None)
get a run parameter, or use the provided default if not set
example:

    p1 = context.get_param("p1", 0)
- get_project_param(key: str, default=None)
get a parameter from the run's project's parameters
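example (a sketch; the default_image project parameter is illustrative):

    # falls back to the default when the project does not define the parameter
    image = context.get_project_param("default_image", "mlrun/mlrun")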
- get_secret(key: str)
get a key-based secret (e.g. a DB password) from the context; secrets can be specified when invoking a run through vault, files, env, ..
example:

    access_key = context.get_secret("ACCESS_KEY")
- get_store_resource(url)
get mlrun data resource (feature set/vector, artifact, item) from url
example:

    feature_vector = context.get_store_resource("store://feature-vectors/default/myvec")
    dataset = context.get_store_resource("store://artifacts/default/mydata")

- Parameters
url – store resource uri/path, store://<type>/<project>/<name>:<version>; types: artifacts | feature-sets | feature-vectors
- property in_path
default input path for data objects
- property inputs
dictionary of input data items (read-only)
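example (a sketch of iterating over the run inputs; the keys depend on how the run was invoked):

    for key, item in context.inputs.items():
        # each value is a DataItem
        context.logger.info("input received", key=key, url=item.url)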
- property iteration
child iteration index, for hyper parameters
- kind = 'run'
- property labels
dictionary with labels (read-only)
- log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)
log an output artifact and optionally upload it to datastore
example:

    context.log_artifact(
        "some-data",
        body=b"abc is 123",
        local_path="model.txt",
        labels={"framework": "xgboost"},
    )
- Parameters
item – artifact key or artifact class
body – will use the body as the artifact content
local_path – path to the local file we upload; will also be used as the destination subpath (under "artifact_path")
artifact_path – target artifact path (when not using the default); to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag – version tag
viewer – kubeflow viewer type
target_path – absolute target path (instead of using artifact_path + local_path)
src_path – deprecated, use local_path
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
format – optional, format to use (e.g. csv, parquet, ..)
db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=False, db_key=None, target_path='', extra_data=None, **kwargs)
log a dataset artifact and optionally upload it to datastore
example:

    raw_data = {
        "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
        "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
        "age": [42, 52, 36, 24, 73],
        "testScore": [25, 94, 57, 62, 70],
    }
    df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
    context.log_dataset("mydf", df=df, stats=True)
- Parameters
key – artifact key
df – dataframe object
local_path – path to the local file we upload; will also be used as the destination subpath (under "artifact_path")
artifact_path – target artifact path (when not using the default); to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag – version tag
format – optional, format to use (e.g. csv, parquet, ..)
target_path – absolute target path (instead of using artifact_path + local_path)
preview – number of lines to store as preview in the artifact metadata
stats – calculate and store dataset stats in the artifact metadata
extra_data – key/value list of extra files/charts to link with this dataset
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_iteration_results(best, summary: list, task: dict, commit=False)
Reserved for internal use
- property log_level
get the logging level, e.g. 'debug', 'info', 'error'
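example (a sketch; gates extra diagnostics on the configured level):

    if context.log_level == "debug":
        context.logger.info("verbose diagnostics enabled", n_params=len(context.parameters))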
- log_metric(key: str, value, timestamp=None, labels=None)
TBD, log a real-time time-series metric
- log_metrics(keyvals: dict, timestamp=None, labels=None)
TBD, log a set of real-time time-series metrics
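example (a sketch of the intended call shape; both methods are marked TBD above, and the metric names/values are illustrative):

    context.log_metric("loss", 0.07)
    context.log_metrics({"precision": 0.81, "recall": 0.76})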
- log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column: Optional[Union[str, list]] = None, extra_data=None, db_key=None, **kwargs)
log a model artifact and optionally upload it to datastore
example:

    context.log_model(
        "model",
        body=dumps(model),
        model_file="model.pkl",
        metrics=context.results,
        training_set=training_df,
        label_column='label',
        feature_vector=feature_vector_uri,
        labels={"app": "fraud"},
    )
- Parameters
key – artifact key or artifact class
body – will use the body as the artifact content
model_file – path to the local model file we upload (see also model_dir)
model_dir – path to the local dir holding the model file and extra files
artifact_path – target artifact path (when not using the default); to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
framework – name of the ML framework
algorithm – training algorithm name
tag – version tag
metrics – key/value dict of model metrics
parameters – key/value dict of model parameters
inputs – ordered list of model input features (name, type, ..)
outputs – ordered list of model output/result elements (name, type, ..)
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])
feature_weights – list of feature weights, one per input column
training_set – training set dataframe, used to infer inputs & outputs
label_column – which columns in the training set are the label (target) columns
extra_data – key/value list of extra files/charts to link with this dataset; value can be an abs/relative path string | bytes | artifact object
db_key – the key to use in the artifact DB table; by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_result(key: str, value, commit=False)
log a scalar result value
example:

    context.log_result('accuracy', 0.85)
- Parameters
key – result key
value – result value
commit – commit (write to DB now vs wait for the end of the run)
- log_results(results: dict, commit=False)
log a set of scalar result values
example:

    context.log_results({'accuracy': 0.85, 'loss': 0.2})

- Parameters
results – key/value dict of results
commit – commit (write to DB now vs wait for the end of the run)
- property logger
built-in logger interface
example:

    context.logger.info("started experiment..", param=5)
- property out_path
default output path for artifacts
- property parameters
dictionary of run parameters (read-only)
- property project
project name, runs can be categorized by projects
- property results
dictionary of results (read-only)
- set_annotation(key: str, value, replace: bool = True)
set/record a specific annotation
example:

    context.set_annotation("comment", "some text")
- set_label(key: str, value, replace: bool = True)
set/record a specific label
example:

    context.set_label("framework", "sklearn")
- set_state(state: Optional[str] = None, error: Optional[str] = None, commit=True)
modify and store the run state or mark an error
- Parameters
state – set run state
error – error message (if provided, the state is set to error)
commit – will immediately update the state in the DB
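example (a sketch; the train() call is illustrative):

    try:
        train()
    except Exception as exc:
        # an error message also moves the run state to error
        context.set_state(error=str(exc))
        raise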
- property tag
run tag (uid or workflow id if exists)
- property uid
unique run id
- update_child_iterations(best_run=0, commit_children=False, completed=True)
update children results in the parent, and optionally mark the best
- Parameters
best_run – the child iteration number to mark as best (iterations start from 1)
commit_children – commit all child runs to the db
completed – mark children as completed
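example (a sketch; follows the get_child_context() pattern above, the iteration number is illustrative):

    # after running child iterations, mark iteration 2 as best
    # and persist all children to the DB
    context.update_child_iterations(best_run=2, commit_children=True)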