mlrun.execution#
- class mlrun.execution.MLClientCtx(autocommit=False, tmp='', log_stream=None)[source]#
Bases: object
ML Execution Client Context
The context is generated and injected into the function using function.run(), or created manually using the get_or_create_ctx() call, and provides an interface to use run params, metadata, inputs, and outputs.
Base metadata includes: uid, name, project, and iteration (for hyper params). Users can:
- set labels and annotations using set_label(), set_annotation()
- access parameters and secrets using get_param(), get_secret()
- access input data objects using get_input()
- store results, artifacts, and real-time metrics using the log_result(), log_artifact(), log_dataset(), and log_model() methods
See the docs for the individual params and methods.
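As a sketch of typical usage (the handler name, parameter, and result key below are illustrative, not part of the API), a function body interacts with the injected context like this:

```python
def handler(context, p1=1):
    # read a run parameter (falling back to the default) and log a result
    v = context.get_param("p1", p1)
    context.log_result("doubled", v * 2)
```

When run through function.run(), mlrun supplies the context automatically; for local experimentation, get_or_create_ctx() returns an equivalent context object.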
- property annotations#
dictionary with annotations (read-only)
- artifact_subpath(*subpaths)[source]#
get a path under the run's default artifacts output path, joined from the given subpaths
example:
data_path=context.artifact_subpath('data')
- property artifacts#
dictionary of artifacts (read-only)
- commit(message: str = '', completed=True)[source]#
save run state and optionally add a commit message
- Parameters
message – commit message to save in the run
completed – mark run as completed
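A minimal sketch of committing mid-run state (the handler, result key, and message are illustrative): logging a result and then persisting the run state to the DB without marking the run as completed.

```python
def handler(context):
    # log an intermediate result, then persist the run state now,
    # leaving the run open (completed=False) for further work
    context.log_result("rmse", 0.12)
    context.commit(message="checkpoint after evaluation", completed=False)
```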
- classmethod from_dict(attrs: dict, rundb='', autocommit=False, tmp='', host=None, log_stream=None, is_api=False)[source]#
create execution context from dict
- get_child_context(with_parent_params=False, **params)[source]#
get child context (iteration)
allow sub-experiments (epochs, hyper-params, ..) under a parent run. This creates a new iteration; log_xx calls update the child only. Use commit_children() to save all the children and specify the best run.
example:
def handler(context: mlrun.MLClientCtx, data: mlrun.DataItem):
    df = data.as_df()
    best_accuracy = accuracy_sum = 0
    for param in param_list:
        with context.get_child_context(myparam=param) as child:
            accuracy = child_handler(child, df, **child.parameters)
            accuracy_sum += accuracy
            child.log_result('accuracy', accuracy)
            if accuracy > best_accuracy:
                child.mark_as_best()
                best_accuracy = accuracy
    context.log_result('avg_accuracy', accuracy_sum / len(param_list))
- Parameters
params – extra (or override) params to parent context
with_parent_params – child will copy the parent parameters and add to them
- Returns
child context
- get_dataitem(url)[source]#
get mlrun dataitem from url
example:
data = context.get_dataitem("s3://my-bucket/file.csv").as_df()
- get_input(key: str, url: str = '')[source]#
get an input DataItem object; data objects have methods such as .get(), .download(), .url, .. to access the actual data
example:
data = context.get_input("my_data").get()
- get_param(key: str, default=None)[source]#
get a run parameter, or use the provided default if not set
example:
p1 = context.get_param("p1", 0)
- get_project_param(key: str, default=None)[source]#
get a parameter from the run’s project’s parameters
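A short sketch (the parameter name and handler are illustrative): project-level parameters behave like run parameters with a fallback default when the key is unset.

```python
def handler(context):
    # read a project-level parameter, falling back to a default when unset
    batch_size = context.get_project_param("batch_size", 32)
    context.log_result("batch_size_used", batch_size)
```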
- get_secret(key: str)[source]#
get a key-based secret (e.g. a DB password) from the context; secrets can be specified when invoking a run through vault, files, env, ..
example:
access_key = context.get_secret("ACCESS_KEY")
- get_store_resource(url)[source]#
get mlrun data resource (feature set/vector, artifact, item) from url
example:
feature_vector = context.get_store_resource("store://feature-vectors/default/myvec")
dataset = context.get_store_resource("store://artifacts/default/mydata")
- Parameters
url – store resource uri/path, store://<type>/<project>/<name>:<version> types: artifacts | feature-sets | feature-vectors
- property in_path#
default input path for data objects
- property inputs#
dictionary of input data items (read-only)
- property iteration#
child iteration index, for hyper parameters
- kind = 'run'#
- property labels#
dictionary with labels (read-only)
- log_artifact(item, body=None, local_path=None, artifact_path=None, tag='', viewer=None, target_path='', src_path=None, upload=None, labels=None, format=None, db_key=None, **kwargs)[source]#
log an output artifact and optionally upload it to datastore
example:
context.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
- Parameters
item – artifact key or artifact class ()
body – will use the body as the artifact content
local_path – path to the local file we upload, will also be used as the destination subpath (under "artifact_path")
artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)
tag – version tag
viewer – kubeflow viewer type
target_path – absolute target path (instead of using artifact_path + local_path)
src_path – deprecated, use local_path
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
format – optional, format to use (e.g. csv, parquet, ..)
db_key – the key to use in the artifact DB table, by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=False, db_key=None, target_path='', extra_data=None, label_column: Optional[str] = None, **kwargs)[source]#
log a dataset artifact and optionally upload it to datastore
example:
raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
- Parameters
key – artifact key
df – dataframe object
label_column – name of the label column (the one holding the target (y) values)
local_path – path to the local file we upload, will also be used as the destination subpath (under "artifact_path")
artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)
tag – version tag
format – optional, format to use (e.g. csv, parquet, ..)
target_path – absolute target path (instead of using artifact_path + local_path)
preview – number of lines to store as preview in the artifact metadata
stats – calculate and store dataset stats in the artifact metadata
extra_data – key/value list of extra files/charts to link with this dataset
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
db_key – the key to use in the artifact DB table, by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_iteration_results(best, summary: list, task: dict, commit=False)[source]#
Reserved for internal use
- property log_level#
get the logging level, e.g. ‘debug’, ‘info’, ‘error’
- log_metric(key: str, value, timestamp=None, labels=None)[source]#
TBD, log a real-time time-series metric
- log_metrics(keyvals: dict, timestamp=None, labels=None)[source]#
TBD, log a set of real-time time-series metrics
- log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column: Optional[Union[str, list]] = None, extra_data=None, db_key=None, **kwargs)[source]#
log a model artifact and optionally upload it to datastore
example:
context.log_model(
    "model",
    body=dumps(model),
    model_file="model.pkl",
    metrics=context.results,
    training_set=training_df,
    label_column='label',
    feature_vector=feature_vector_uri,
    labels={"app": "fraud"},
)
- Parameters
key – artifact key or artifact class ()
body – will use the body as the artifact content
model_file – path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)
model_dir – path to the local dir holding the model file and extra files
artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)
framework – name of the ML framework
algorithm – training algorithm name
tag – version tag
metrics – key/value dict of model metrics
parameters – key/value dict of model parameters
inputs – ordered list of model input features (name, type, ..)
outputs – ordered list of model output/result elements (name, type, ..)
upload – upload to datastore (default is True)
labels – a set of key/value labels to tag the artifact with
feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])
feature_weights – list of feature weights, one per input column
training_set – training set dataframe, used to infer inputs & outputs
label_column – which columns in the training set are the label (target) columns
extra_data – key/value list of extra files/charts to link with this model; each value can be an absolute path | relative path (to model dir) | bytes | artifact object
db_key – the key to use in the artifact DB table, by default it is the run name + '_' + key; db_key=False will not register it in the artifacts table
- Returns
artifact object
- log_result(key: str, value, commit=False)[source]#
log a scalar result value
example:
context.log_result('accuracy', 0.85)
- Parameters
key – result key
value – result value
commit – commit (write to DB now vs wait for the end of the run)
- log_results(results: dict, commit=False)[source]#
log a set of scalar result values
example:
context.log_results({'accuracy': 0.85, 'loss': 0.2})
- Parameters
results – key/value dict of results
commit – commit (write to DB now vs wait for the end of the run)
- property logger#
built-in logger interface
example:
context.logger.info("started experiment..", param=5)
- property out_path#
default output path for artifacts
- property parameters#
dictionary of run parameters (read-only)
- property project#
project name, runs can be categorized by projects
- property results#
dictionary of results (read-only)
- set_annotation(key: str, value, replace: bool = True)[source]#
set/record a specific annotation
example:
context.set_annotation("comment", "some text")
- set_label(key: str, value, replace: bool = True)[source]#
set/record a specific label
example:
context.set_label("framework", "sklearn")
- set_state(state: Optional[str] = None, error: Optional[str] = None, commit=True)[source]#
modify and store the run state or mark an error
- Parameters
state – set run state
error – error message (if set, the run state is set to error)
commit – will immediately update the state in the DB
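A sketch of both paths (the handler, flag, and error message are illustrative): passing error marks the run as failed, while passing a state string records that state.

```python
def handler(context, inputs_valid: bool):
    if not inputs_valid:
        # supplying error= marks the run state as error and stores it
        context.set_state(error="input validation failed")
        return
    context.set_state("completed")
```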
- property tag#
run tag (uid or workflow id if exists)
- property uid#
Unique run id
- update_child_iterations(best_run=0, commit_children=False, completed=True)[source]#
update children results in the parent, and optionally mark the best
- Parameters
best_run – the child iteration number to mark as best (numbering starts at 1)
commit_children – commit all child runs to the db
completed – mark children as completed
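A minimal sketch of finalizing a hyper-param sweep from the parent context (the function name and argument are illustrative, not part of the API):

```python
def finalize_children(context, best_iteration):
    # commit all child runs to the DB and flag the given
    # iteration (1-based) as the best run
    context.update_child_iterations(
        best_run=best_iteration, commit_children=True, completed=True
    )
```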