Decorators and auto-logging
While it is possible to log results and artifacts using the MLRun execution context, it is often more convenient to use the mlrun.handler() decorator.
Basic example
Assume you have the following code in train.py:
import pandas as pd
from sklearn.svm import SVC
def train_and_predict(train_data, predict_input, label_column="label"):
    x = train_data.drop(label_column, axis=1)
    y = train_data[label_column]
    clf = SVC()
    clf.fit(x, y)
    return list(clf.predict(predict_input))
With the mlrun.handler decorator, the Python function itself does not change, and the inputs and outputs are logged automatically. The resulting code is as follows:
import pandas as pd
from sklearn.svm import SVC
import mlrun
@mlrun.handler(
    labels={"framework": "scikit-learn"},
    outputs=["prediction:dataset"],
    inputs={"train_data": pd.DataFrame, "predict_input": pd.DataFrame},
)
def train_and_predict(train_data, predict_input, label_column="label"):
    x = train_data.drop(label_column, axis=1)
    y = train_data[label_column]
    clf = SVC()
    clf.fit(x, y)
    return list(clf.predict(predict_input))
To run the code, use the following example:
import mlrun
project = mlrun.get_or_create_project("mlrun-example", context="./", user_project=True)
trainer = project.set_function(
    "train.py",
    name="train_and_predict",
    kind="job",
    image="mlrun/mlrun",
    handler="train_and_predict",
)
trainer_run = project.run_function(
    "train_and_predict",
    inputs={
        "train_data": mlrun.get_sample_path("data/iris/iris_dataset.csv"),
        "predict_input": mlrun.get_sample_path("data/iris/iris_to_predict.csv"),
    },
)
The outcome is a run with:
A label with key "framework" and value "scikit-learn".
Two inputs, "train_data" and "predict_input", each parsed as a pandas DataFrame.
An artifact called "prediction" of type "dataset". The dataset contains the function's return value (in this case, the prediction result).
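To make the idea concrete, here is a minimal pure-Python sketch of how a decorator can record labels and a named output while leaving the function body untouched. The names toy_handler, output_name, and last_run are hypothetical and exist only for this illustration; this is not how mlrun.handler is implemented.

```python
import functools


def toy_handler(labels=None, output_name=None):
    """Hypothetical stand-in used only for illustration; this is not mlrun.handler."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # the wrapped function runs unchanged
            # Record what a logging decorator would typically capture.
            wrapper.last_run = {
                "labels": dict(labels or {}),
                "artifacts": {output_name: result} if output_name else {},
            }
            return result

        wrapper.last_run = None  # nothing recorded until the first call
        return wrapper

    return decorator


@toy_handler(labels={"framework": "demo"}, output_name="prediction")
def predict(values):
    return [v * 2 for v in values]


predictions = predict([1, 2, 3])
print(predict.last_run)
```

The caller still receives the plain return value; the recorded metadata lives alongside it rather than changing the function's contract.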
Labels
The decorator gives you the option to set labels for the run. The labels parameter is a dictionary with keys and values to set for the labels.
Input type parsing
The mlrun.handler decorator can also parse the input types from the function's type hints, if they are specified. An equivalent definition is as follows:
@mlrun.handler(labels={"framework": "scikit-learn"}, outputs=["prediction:dataset"])
def train_and_predict(
    train_data: pd.DataFrame, predict_input: pd.DataFrame, label_column="label"
): ...
Note: Type hints from the typing module (e.g., typing.Optional, typing.Union, typing.List) are not currently supported, but support is planned.
Note: If an input does not have a type hint, the decorator assumes the parameter is of type mlrun.datastore.DataItem. If you specify inputs=False, all the run inputs are assumed to be of type mlrun.datastore.DataItem. You also have the option to specify a dictionary where each key is the name of the input and the value is the type.
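The rules in the note above can be sketched in plain Python. This is only an illustration under stated assumptions: DataItemPlaceholder is a hypothetical stand-in for mlrun.datastore.DataItem, resolve_input_types is a hypothetical helper, the sketch treats every parameter as an input, and letting a dictionary entry take precedence over a type hint is an assumption rather than documented MLRun behavior.

```python
import inspect


class DataItemPlaceholder:
    """Hypothetical stand-in for mlrun.datastore.DataItem (illustration only)."""


def resolve_input_types(func, inputs=True):
    """Return {parameter name: expected type} following the rules in the note."""
    parameters = inspect.signature(func).parameters
    if inputs is False:
        # inputs=False: every input is treated as a data item.
        return {name: DataItemPlaceholder for name in parameters}
    explicit = inputs if isinstance(inputs, dict) else {}
    resolved = {}
    for name, parameter in parameters.items():
        if name in explicit:
            # Assumption: an explicit dictionary entry wins over a type hint.
            resolved[name] = explicit[name]
        elif parameter.annotation is not inspect.Parameter.empty:
            resolved[name] = parameter.annotation
        else:
            # No type information: fall back to the data item type.
            resolved[name] = DataItemPlaceholder
    return resolved


def train(train_data: list, predict_input, label_column: str = "label"):
    ...


print(resolve_input_types(train))
print(resolve_input_types(train, inputs={"predict_input": dict}))
```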
Logging return values as artifacts
If you specify the outputs parameter, the return values are logged as the run artifacts. outputs expects a list; the length of the list must match the number of returned values.
The simplest option is to specify a list of strings. Each string contains the name of the artifact. You can also specify the artifact type by adding a colon after the artifact name followed by the type ('name:artifact_type'). The following are valid artifact types:
dataset
directory
file
object
plot
result
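The 'name:artifact_type' convention and the length rule can be illustrated with a short sketch. The helpers parse_output_spec and pair_outputs are hypothetical, written only for this example, and do not reflect MLRun's internal code. The sketch also assumes that a returned tuple represents multiple return values.

```python
VALID_ARTIFACT_TYPES = {"dataset", "directory", "file", "object", "plot", "result"}


def parse_output_spec(spec):
    """Split 'name:artifact_type' into (name, type); type is None when omitted."""
    name, separator, artifact_type = spec.partition(":")
    if not separator:
        return name, None
    if artifact_type not in VALID_ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact_type!r}")
    return name, artifact_type


def pair_outputs(specs, returned):
    """Pair each output spec with one returned value, enforcing equal lengths."""
    # Assumption: a tuple means several return values, anything else means one.
    values = returned if isinstance(returned, tuple) else (returned,)
    if len(specs) != len(values):
        raise ValueError(
            f"{len(specs)} outputs declared but {len(values)} values returned"
        )
    return [(*parse_output_spec(spec), value) for spec, value in zip(specs, values)]


print(parse_output_spec("prediction:dataset"))
print(pair_outputs(["score:result", "predictions"], (0.9, [1, 0])))
```

An output whose type is None here would then fall back to a default chosen from the Python type of the returned value.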
If you use only the name without the type, the following mapping is used:
Python type | Artifact type |
---|---|
pandas.DataFrame | Dataset |
pandas.Series | Dataset |
numpy.ndarray | Dataset |
dict | Result |
list | Result |
tuple | Result |
str | Result |
int | Result |
float | Result |
bytes | Object |
bytearray | Object |
matplotlib.pyplot.Figure | Plot |
plotly.graph_objs.Figure | Plot |
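As a rough illustration of the mapping above, the following sketch looks up a default artifact type from a value's Python type. It covers only the built-in rows of the table so that it runs without third-party packages. DEFAULT_ARTIFACT_TYPES and default_artifact_type are hypothetical names, and returning None for an unmapped type is an assumption made for the sketch, not documented MLRun behavior.

```python
# Built-in rows of the table only; pandas, numpy, and plotting types are omitted
# to keep the sketch free of third-party imports.
DEFAULT_ARTIFACT_TYPES = {
    dict: "result",
    list: "result",
    tuple: "result",
    str: "result",
    int: "result",
    float: "result",
    bytes: "object",
    bytearray: "object",
}


def default_artifact_type(value):
    """Return the default artifact type for a value, or None if it is unmapped."""
    for python_type, artifact_type in DEFAULT_ARTIFACT_TYPES.items():
        if isinstance(value, python_type):
            return artifact_type
    return None  # assumption: the sketch does not guess for unknown types


print(default_artifact_type([0, 1, 1]))
print(default_artifact_type(b"\x00\x01"))
```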
Refer to mlrun.handler() for more details.