Quick Start Tutorial

How to Easily Train and Deploy Models to Production with MLRun

This notebook provides a quick overview of developing and deploying machine learning applications to production using the MLRun MLOps orchestration framework. Watch the video for this tutorial.

Check out MLRun Katacoda Scenarios for interactive tutorials.

Tutorial steps:

1. Install the MLRun package and dependencies
2. Define an MLRun project and ML functions
3. Run the data processing function and log artifacts
4. Use MLRun built-in marketplace functions (data analysis)

Install the MLRun package and dependencies:

Before you start, make sure the MLRun client package is installed (pip install mlrun) and that the environment is set (pointing to a local or Kubernetes-based MLRun service).
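For example, the client can be pointed at a remote MLRun service through an environment variable before the package is imported (a minimal sketch; the URL is a placeholder for your service endpoint):

import os

# point the MLRun client at the MLRun API service (placeholder URL)
os.environ["MLRUN_DBPATH"] = "http://mlrun-api:8080"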

# install a compatible scikit-learn version (run only once; restart the notebook after the install)
!pip install scikit-learn~=1.0
import mlrun

# check if we are attached to k8s for running remote (container) jobs
no_k8s = not mlrun.mlconf.namespace

Define MLRun project and ML functions

An MLRun Project is a container for all your work on a particular activity or application. Projects host functions, workflows, artifacts, secrets, and more. Projects have access control and can be accessed by one or more users; they are usually associated with a Git repository and interact with CI/CD frameworks for automation. See the MLRun Projects documentation.

An MLRun Serverless Function specifies the source code, base image, extra package requirements, runtime engine kind, and desired resources (CPU, GPU, memory, storage, etc.). The runtime engines (local, job, Nuclio, Spark, etc.) automatically transform the function code and spec into fully managed and elastic services that run over Kubernetes. Function source code can come from a single file (.py, .ipynb, etc.) or a full archive (git, zip, tar). MLRun can execute an entire file/notebook or specific function classes/handlers.
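For example, resource requests and limits can be attached to a function spec when it is registered (a minimal sketch; the values are illustrative and not part of this tutorial's setup):

# illustrative: register a function and attach resource requests/limits
fn = project.set_function("trainer.py", "trainer", handler="train", image="mlrun/mlrun")
fn.with_requests(mem="1G", cpu=1)   # minimum resources for the job pod
fn.with_limits(mem="2G", cpu=2)     # resource ceiling for the job pod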

Functions in this project:

- gen_breast_cancer.py: generates the breast cancer dataset
- trainer.py: trains a model (handler: train)
- serving.py: serves the trained model

Registering the function code and basic info in the project:

project = mlrun.new_project("breast-cancer", "./", user_project=True, init_git=True)
project.set_function("gen_breast_cancer.py", "gen-breast-cancer", image="mlrun/mlrun")
project.set_function("trainer.py", "trainer", 
                     handler="train", image="mlrun/mlrun")
project.set_function("serving.py", "serving", image="mlrun/mlrun", kind="serving")
project.save()

The project spec (project.yaml) is saved to the project root dir for use by CI/CD and automation frameworks.
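For example, a CI/CD pipeline (or another user) can later reload the project from that saved spec (a minimal sketch):

# reload the project from its saved spec (project.yaml) in the current directory
project = mlrun.load_project("./")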

Run the data processing function and log artifacts

Functions are executed (using the CLI or SDK run command) with an optional handler, various params, inputs and resource requirements. This generates a run object that can be tracked through the CLI, UI, and SDK. Multiple functions can be executed and tracked as part of a multi-stage pipeline (workflow).

When a function has additional package requirements or needs to include the content of a source archive, you must first build the function using the project.build_function() method.
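For example, building a registered function could look like the following (a minimal sketch; not needed for this tutorial, whose functions use the stock mlrun/mlrun image):

# build the function image (installs extra requirements / loads the source archive)
project.build_function("trainer")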

The local flag indicates whether the function is executed locally or “teleported” and executed in the Kubernetes cluster. The execution progress and results can be viewed in the UI (see hyperlinks below).
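For example, once the client is attached to a cluster, the same call with local=False runs the function as a containerized job instead of in the notebook process (an illustrative sketch of the remote variant of the SDK run shown below):

# the same data-generation step, executed as a containerized job on the cluster
project.run_function("gen-breast-cancer", params={"format": "csv"}, local=False)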


Run using the SDK:

gen_data_run = project.run_function("gen-breast-cancer", params={"format": "csv"}, local=True)
> 2022-05-08 11:11:14,002 [info] starting run gen-breast-cancer uid=9609cb9609734b308c94cd0faf8c915a DB=http://mlrun-api:8080
> 2022-05-08 11:11:15,540 [info] handler was not provided running main (./gen_breast_cancer.py)
> 2022-05-08 11:11:31,359 [info] logging run results to: http://mlrun-api:8080
> 2022-05-08 11:11:31,460 [info] saving breast cancer dataframe
project: breast-cancer-admin | iter: 0 | start: May 08 11:11:31 | state: completed | name: gen-breast-cancer
labels: v3io_user=admin, kind=, owner=admin, host=jupyter-b7945bb6c-4hvj4
parameters: format=csv
results: label_column=label
artifacts: dataset

> to track results use the .show() or .logs() methods or click here to open in UI
> 2022-05-08 11:11:32,511 [info] run executed, status=completed

Run using the CLI (command line):

The functions can also be invoked using the following CLI command (see help with: mlrun run --help):

mlrun run -f gen-breast-cancer --local
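Parameters can be passed from the CLI as well, for example (a sketch mirroring the SDK call above, using the -p flag):

mlrun run -f gen-breast-cancer -p format=csv --local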

Print the run state and outputs:

gen_data_run.state()
'completed'
gen_data_run.outputs
{'label_column': 'label',
 'dataset': 'store://artifacts/breast-cancer-admin/gen-breast-cancer_dataset:d58cbce3b81c4b7c9d557a6f2bc858c8'}

Print the output dataset artifact (DataItem object) as a dataframe:

gen_data_run.artifact("dataset").as_df().head()
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension label
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 ... 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890 0
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 ... 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902 0
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 ... 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758 0
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 ... 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300 0
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 ... 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678 0

5 rows × 31 columns

Use MLRun built-in marketplace functions (data analysis)

You can import ML functions from the MLRun public marketplace or from private repositories and use them in your project. Let’s import and use a data analysis function:

# import the function
describe = mlrun.import_function('hub://describe')

See the describe function usage instructions in the marketplace or by typing describe.doc()
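For example:

# print the function documentation (parameters, outputs, usage)
describe.doc()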


Analyze the dataset using the describe function (run on the Kubernetes cluster):

describe_run = describe.run(params={'label_column': 'label'},
                            inputs={"table": gen_data_run.outputs['dataset']}, local=no_k8s)
> 2022-05-08 10:34:00,643 [info] starting run describe-analyze uid=0bd1b5e6151a49dbbeec91993b725a2c DB=http://mlrun-api:8080
> 2022-05-08 10:34:00,910 [info] Job is running in the background, pod: describe-analyze-qzqpx
> 2022-05-08 10:36:41,374 [info] The data set named dataset is updated
> 2022-05-08 10:36:41,466 [info] run executed, status=completed
final state: completed
project: breast-cancer-admin | iter: 0 | start: May 08 10:34:05 | state: completed | name: describe-analyze
labels: v3io_user=admin, kind=job, owner=admin, mlrun/client_version=1.0.0, host=describe-analyze-qzqpx
inputs: table
parameters: label_column=label
artifacts: describe-csv, histograms-matrix, histograms, violin, imbalance, imbalance-weights-vec, correlation-matrix-csv, correlation

> to track results use the .show() or .logs() methods or click here to open in UI
> 2022-05-08 10:36:44,391 [info] run executed, status=completed

View the results in the MLRun UI:

(Screenshot: the describe run results and generated charts in the MLRun UI)

# view generated artifacts (charts)
describe_run.outputs
{'describe-csv': 'v3io:///projects/breast-cancer-admin/artifacts/plots/describe.csv',
 'histograms-matrix': 'v3io:///projects/breast-cancer-admin/artifacts/plots/hist_mat.html',
 'histograms': 'v3io:///projects/breast-cancer-admin/artifacts/plots/histograms.html',
 'violin': 'v3io:///projects/breast-cancer-admin/artifacts/plots/violin.html',
 'imbalance': 'v3io:///projects/breast-cancer-admin/artifacts/plots/imbalance.html',
 'imbalance-weights-vec': 'v3io:///projects/breast-cancer-admin/artifacts/plots/imbalance-weights-vec.csv',
 'correlation-matrix-csv': 'v3io:///projects/breast-cancer-admin/artifacts/plots/correlation-matrix.csv',
 'correlation': 'v3io:///projects/breast-cancer-admin/artifacts/plots/correlation.html'}
# view an artifact in Jupyter
describe_run.artifact("histograms").show()
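Tabular artifacts can also be loaded back into the notebook; for example, the correlation matrix CSV can be read as a dataframe (a sketch using the artifact keys listed above):

# load the correlation matrix artifact back into a pandas dataframe
corr_df = describe_run.artifact("correlation-matrix-csv").as_df()
corr_df.head()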