Quick-Start Guide

MLRun is an end-to-end open-source MLOps solution for managing and automating your entire analytics and machine learning lifecycle, from data ingestion, through model development, to full pipeline deployment. MLRun runs as a built-in service in the Iguazio Data Science Platform and integrates with the platform's other services. Its primary goal is to ease the development of machine learning pipelines at scale and help organizations build a robust process for moving from the research phase to fully operational production deployments.

Working with MLRun

If you need to install MLRun, refer to the Installation Guide.

Note: If you are using the Iguazio Data Science Platform, MLRun already comes preinstalled and integrated in your system.

If you are not viewing this quick-start guide from a Jupyter Lab instance, open it on your cluster, create a new notebook, and copy the sections below to the notebook to run them.

Set Environment

Before you begin, initialize MLRun by calling set_environment and providing it with the project name:

from mlrun import set_environment

project_name = 'quick-start'
_, artifact_path = set_environment(project=project_name, user_project=True)
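Passing user_project=True makes the project name unique per user by appending the running user's name to it, which is why the runs below report a project such as quick-start-test. A minimal sketch of that naming scheme (illustrative only, not MLRun's exact logic; the environment-variable names are assumptions):

```python
import os

# Illustrative sketch: with user_project=True, MLRun appends the current
# user's name to the project name so each user gets a unique project,
# e.g. 'quick-start' becomes 'quick-start-admin' for user 'admin'.
project_name = 'quick-start'
user = os.environ.get('V3IO_USERNAME') or os.environ.get('USER', 'user')
full_project = f'{project_name}-{user}'
print(full_project)
```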

Train a Model

MLRun introduces the concept of functions. You can run your own code as functions, or use functions from the function marketplace. In the example below, we’ll use the sklearn_classifier from the function marketplace to train a model.

from mlrun import import_function
from mlrun.platforms import auto_mount

train = import_function('hub://sklearn_classifier').apply(auto_mount())

train_run = train.run(name='train',
                      inputs={'dataset': 'https://s3.wasabisys.com/iguazio/data/iris/iris_dataset.csv'},
                      params={'model_pkg_class': 'sklearn.linear_model.LogisticRegression',
                              'label_column': 'label'})
> 2021-05-27 10:40:34,250 [info] starting run train uid=6995ba20ebe04b03b88120af3673a22c DB=http://mlrun-api:8080
> 2021-05-27 10:40:34,468 [info] Job is running in the background, pod: train-xhmp2
> 2021-05-27 10:42:16,454 [info] run executed, status=completed
final state: completed
project uid iter start state name labels inputs parameters results artifacts
quick-start-test 0 May 27 10:42:15 completed train
to track results use .show() or .logs() or in CLI: 
!mlrun get run 6995ba20ebe04b03b88120af3673a22c --project quick-start-test , !mlrun logs 6995ba20ebe04b03b88120af3673a22c --project quick-start-test
> 2021-05-27 10:42:24,081 [info] run executed, status=completed

The run output above contains a link to the MLRun UI. Click it to inspect the various aspects of the jobs you run:


As well as their artifacts:


When running the function in a Jupyter notebook, the output cell for your function execution will contain a table with run information — including the state of the execution, all inputs and parameters, and the execution results and artifacts.

MLRun quick start train output
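The model_pkg_class parameter above is a fully qualified class name that the marketplace function resolves at runtime to pick the estimator. A minimal sketch of how such a string can be turned into a class (illustrative only, not MLRun's exact implementation; a standard-library class is used here so the example is self-contained):

```python
import importlib

def create_class(pkg_class: str):
    """Instantiate a class object from its fully qualified name,
    e.g. 'sklearn.linear_model.LogisticRegression'. This mirrors how a
    parameter like model_pkg_class can be resolved dynamically."""
    module_name, class_name = pkg_class.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Using a standard-library class so the sketch runs without sklearn:
OrderedDict = create_class('collections.OrderedDict')
print(OrderedDict([('a', 1)]))
```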

Test the Model

Now that you have a trained model, you can test it: run a task that uses the test_classifier function from the function marketplace to run the selected trained model against the test dataset. The test dataset was returned from the training task (train_run) in the previous step.

test = import_function('hub://test_classifier').apply(auto_mount())

You can then run the function as part of your project, just as any other function that you have written yourself. To view the function documentation, call the doc method:

test.doc()
function: test-classifier
test a classifier using held-out or new data
default handler: test_classifier
entry points:
  test_classifier: Test one or more classifier models against held-out dataset

Using held-out test features, evaluates the performance of the estimated model

Can be part of a kubeflow pipeline as a test step that is run post EDA and 
training/validation cycles
    context  - the function context, default=
    models_path(DataItem)  - artifact models representing a file or a folder, default=
    test_set(DataItem)  - test features and labels, default=
    label_column(str)  - column name for ground truth labels, default=
    score_method(str)  - for multiclass classification, default=micro
    plots_dest(str)  - dir for test plots, default=
    model_evaluator  - NOT IMPLEMENTED: specific method to generate eval, passed in as string or available in this folder, default=None
    default_model(str)  - , default=model.pkl
    predictions_column(str)  - column name for the predictions column on the resulted artifact, default=yscore
    model_update  - (True) update model, when running as stand alone no need in update, default=True
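The score_method parameter defaults to 'micro', meaning per-class true/false positives and false negatives are pooled before computing the score. A minimal pure-Python sketch of a micro-averaged F1 score (illustrative only; the marketplace function's actual scoring is delegated to scikit-learn):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true positives, false positives, and false
    negatives across all classes, then compute precision and recall on the
    pooled counts. For single-label multiclass predictions every wrong
    prediction is one pooled FP and one pooled FN, so this equals accuracy."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = len(y_pred) - tp  # pooled false positives
    fn = len(y_true) - tp  # pooled false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(micro_f1([0, 1, 2, 2], [0, 1, 2, 1]))  # 0.75
```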

Configure parameters for the test function (params), and provide the selected trained model from the train task as an input artifact (inputs):

test_run = test.run(name="test",
                    params={"label_column": "label"},
                    inputs={"models_path": train_run.outputs['model'],
                            "test_set": train_run.outputs['test_set']})
> 2021-05-27 10:42:24,252 [info] starting run test uid=e4839cc44e14440b8f6e1eaac9ba3cef DB=http://mlrun-api:8080
> 2021-05-27 10:42:24,473 [info] Job is running in the background, pod: test-r8zp7
> 2021-05-27 10:42:31,010 [info] run executed, status=completed
final state: completed
project uid iter start state name labels inputs parameters results artifacts
quick-start-test 0 May 27 10:42:30 completed test
to track results use .show() or .logs() or in CLI: 
!mlrun get run e4839cc44e14440b8f6e1eaac9ba3cef --project quick-start-test , !mlrun logs e4839cc44e14440b8f6e1eaac9ba3cef --project quick-start-test
> 2021-05-27 10:42:33,670 [info] run executed, status=completed

Serve the Model

MLRun serving can take MLRun models or standard model files and produce managed, real-time, serverless functions using the Nuclio real-time serverless framework. Nuclio is built around data, I/O, and compute intensive workloads and is focused on performance and flexibility. Nuclio is also deeply integrated into the MLRun framework. See MLRun Serving documentation to learn more about the rich serving capabilities MLRun has to offer.

To deploy your model using the v2_model_server function, run the following code:

serve = import_function('hub://v2_model_server').apply(auto_mount())
model_name = 'iris'  # the name under which the model is served
serve.add_model(model_name, model_path=train_run.outputs['model'])
addr = serve.deploy()
> 2021-05-27 10:42:33,823 [info] Starting remote function deploy
2021-05-27 10:42:33  (info) Deploying function
2021-05-27 10:42:33  (info) Building
2021-05-27 10:42:33  (info) Staging files and preparing base images
2021-05-27 10:42:33  (info) Building processor image
2021-05-27 10:42:35  (info) Build complete
> 2021-05-27 10:42:43,753 [info] function deployed, address=default-tenant.app.jfiehlfgdotc.iguazio-cd2.com:32645

The invoke method enables you to test the function programmatically:

import json

inputs = [[5.1, 3.5, 1.4, 0.2],
          [7.7, 3.8, 6.7, 2.2]]
my_data = json.dumps({'inputs': inputs})
serve.invoke(f'v2/models/{model_name}/infer', my_data)
{'id': '7ebdf355-72f5-4646-9652-f1cbd70d0001',
 'model_name': 'iris',
 'outputs': [0, 2]}
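Under the hood, invoke issues an HTTP POST against the deployed Nuclio endpoint, so the same inference call can be made with any HTTP client. A sketch of the request construction (the address below is a hypothetical placeholder; in the notebook, addr is returned by serve.deploy() and model_name was set when the model was added):

```python
import json

# Hypothetical values for illustration; substitute your own deployment.
addr = 'http://default-tenant.app.example.com:32645'
model_name = 'iris'

inputs = [[5.1, 3.5, 1.4, 0.2],
          [7.7, 3.8, 6.7, 2.2]]

# serve.invoke(path, body) POSTs the JSON body to this URL:
url = f'{addr}/v2/models/{model_name}/infer'
body = json.dumps({'inputs': inputs})
print(url)

# e.g. with urllib (not executed here, since it needs a live endpoint):
# import urllib.request
# req = urllib.request.Request(url, body.encode(),
#                              {'Content-Type': 'application/json'})
# print(urllib.request.urlopen(req).read())
```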

Open the Nuclio UI to view the function and test it.

Nuclio Functions UI

For a more detailed walk-through, refer to the getting-started tutorial.