Part 3: Serving an ML Model

This part of the MLRun getting-started tutorial walks you through implementing ML model serving using MLRun serving and Nuclio runtimes: creating, deploying, and testing a model-serving function (“a serving function”, a.k.a. “a model server”).

MLRun serving can produce managed real-time serverless pipelines from various tasks, including MLRun models or standard model files. The pipelines use the Nuclio real-time serverless engine, which can be deployed anywhere. Nuclio is a high-performance open-source “serverless” framework that’s focused on data, I/O, and compute-intensive workloads.

Simple model-serving classes can be written in Python or taken from a set of pre-developed ML/DL classes. The code can handle complex data, feature preparation, and binary data (such as images and video files). The Nuclio serving engine supports the full model-serving life cycle, including auto-generation of microservices, APIs, load balancing, model logging and monitoring, and configuration management.

MLRun serving supports more advanced real-time data processing and model serving pipelines. For more details and examples, see the MLRun Serving Graphs documentation.

The tutorial consists of the following steps:

  1. Setup and Configuration — load your project

  2. Writing a Simple Serving Class

  3. Deploying the Model-Serving Function (Service)

  4. Using the Live Model-Serving Function

  5. Viewing the Nuclio Serving Function on the Dashboard

By the end of this tutorial you’ll know how to

  • Create model-serving functions.

  • Deploy models at scale.

  • Test your deployed models.


The following steps continue the previous parts of this getting-started tutorial and rely on their generated outputs. Therefore, make sure to first run parts 1 and 2 of the tutorial.

Step 1: Setup and Configuration

Importing Libraries

Run the following code to import required libraries:

from os import path
import mlrun

Initializing Your MLRun Environment

Use the set_environment MLRun method to configure the working environment and default configuration. Set the project and user_project parameters to the same values that you used in the call to this method in the Part 1: MLRun Basics tutorial notebook.

# Set the base project name
project_name_base = 'getting-started-tutorial'
# Initialize the MLRun environment and save the project name and artifacts path
project_name, artifact_path = mlrun.set_environment(project=project_name_base,
                                                    user_project=True)

Step 2: Writing a Simple Serving Class

The serving class is initialized automatically by the model server. All you need is to implement two mandatory methods:

  • load — downloads the model files and loads the model into memory. This can be done either synchronously or asynchronously.

  • predict — accepts a request payload and returns prediction (inference) results.

For more detailed information on serving classes, see the MLRun documentation.

The following code demonstrates a minimal scikit-learn (a.k.a. sklearn) serving-class implementation:

# nuclio: start-code
from cloudpickle import load
import numpy as np
from typing import List
import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
    def load(self):
        """load and initialize the model and/or other elements"""
        model_file, extra_data = self.get_model('.pkl')
        self.model = load(open(model_file, 'rb'))

    def predict(self, body: dict) -> List:
        """Generate model predictions from sample."""
        feats = np.asarray(body['inputs'])
        result: np.ndarray = self.model.predict(feats)
        return result.tolist()
# nuclio: end-code
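The load method restores a serialized model object, and predict then calls that object’s own predict method on the request inputs. The following minimal, self-contained sketch mimics that round trip using the standard pickle module and a toy stand-in model (TinyModel and its threshold rule are illustrative assumptions, not the tutorial’s trained model):

```python
import pickle

# A stand-in "model" with a predict() method, mimicking what load() restores.
# TinyModel and its threshold rule are illustrative assumptions, not a trained model.
class TinyModel:
    def predict(self, feats):
        # Toy rule: classify by the first feature (sepal length) against a threshold
        return [int(row[0] > 6.0) for row in feats]

# Serialize and reload, as get_model() plus cloudpickle's load() do in the class above
blob = pickle.dumps(TinyModel())
model = pickle.loads(blob)

feats = [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]
result = model.predict(feats)
print(result)  # [0, 1]
```

In the real serving class, get_model downloads the model file and cloudpickle restores the trained scikit-learn estimator in its place.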

Step 3: Deploying the Model-Serving Function (Service)

To provision (deploy) a function for serving the model (“a serving function”) you need to create an MLRun function of type serving. You can do this by using the code_to_function MLRun method from a web notebook, or by importing an existing serving function or template from the MLRun functions marketplace.

Converting a Serving Class to a Serving Function

The following code converts the ClassifierModel class that you defined in the previous step to a serving function. The name of the class to be used by the serving function is set in spec.default_class.

from mlrun import code_to_function
serving_fn = code_to_function('serving', kind='serving',image='mlrun/mlrun')
serving_fn.spec.default_class = 'ClassifierModel'

Add the model that the training function created in the previous notebook, and apply auto_mount so that the deployed function can access the stored model file:

model_file = f'store://{project_name}/train-iris-train_iris_model'
serving_fn.add_model('my_model', model_path=model_file)

from mlrun.platforms import auto_mount
serving_fn = serving_fn.apply(auto_mount())

Testing Your Function Locally

To test your function locally, create a test server (mock server) and test it with sample data.

my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''
server = serving_fn.to_mock_server()
server.test("/v2/models/my_model/infer", body=my_data)
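For reference, the request body follows the V2 inference payload structure: a JSON object whose inputs field holds a batch of feature vectors (here, two iris samples with four features each). A quick sanity check of the payload, using only the standard library:

```python
import json

# The same request body used in the mock-server test above
my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''
payload = json.loads(my_data)

# Two samples per request; each sample has the four iris feature values
print(len(payload["inputs"]))     # 2
print(len(payload["inputs"][0]))  # 4
```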

Building and Deploying the Serving Function

Use the deploy method of the MLRun serving function to build and deploy a Nuclio serving function from your serving-function code.

function_address = serving_fn.deploy()
> 2021-01-25 08:40:23,461 [info] Starting remote function deploy
2021-01-25 08:40:23  (info) Deploying function
2021-01-25 08:40:23  (info) Building
2021-01-25 08:40:23  (info) Staging files and preparing base images
2021-01-25 08:40:23  (info) Building processor image
2021-01-25 08:40:24  (info) Build complete
2021-01-25 08:40:30  (info) Function deploy complete
> 2021-01-25 08:40:31,117 [info] function deployed,

Step 4: Using the Live Model-Serving Function

After the function is deployed successfully, the serving function has a new HTTP endpoint for handling serving requests. The example tutorial serving function receives HTTP prediction (inference) requests on this endpoint; calls the infer method to get the requested predictions; and returns the results on the same endpoint.

print(f'The address for the function is {function_address} \n')

!curl $function_address
The address for the function is 

{"name": "ModelRouter", "version": "v2", "extensions": []}

Testing the Model Server

Test your model server by sending data for inference. The invoke serving-function method enables programmatic testing of the serving function. For model inference (predictions), specify the model name followed by infer:


For complete model-service API commands — such as for list models (models), get model health (ready), and model explanation (explain) — see the MLRun documentation.
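As a sketch, the endpoint paths for these commands follow the same /v2/models/... pattern used for inference. The paths below mirror the commands named above; verify them against the MLRun documentation before relying on them:

```python
model_name = 'my_model'

# Endpoint paths for the model-service API commands mentioned above
# (assumed to follow the V2 pattern; check the MLRun docs for the full API)
endpoints = {
    'infer':   f'/v2/models/{model_name}/infer',    # model inference
    'ready':   f'/v2/models/{model_name}/ready',    # model health check
    'explain': f'/v2/models/{model_name}/explain',  # model explanation
    'models':  '/v2/models/',                       # list models
}
print(endpoints['infer'])  # /v2/models/my_model/infer
```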

serving_fn.invoke('/v2/models/my_model/infer', my_data)
{'id': 'a021aaa7-a335-421e-8158-194d5db8a140',
 'model_name': 'my_model',
 'outputs': [0, 2]}
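The outputs field holds the predicted class ID for each input sample. As an illustration, you might map these IDs back to species names; the class-name order below assumes scikit-learn’s standard iris label order, which is an assumption and not something the service returns:

```python
# Response structure taken from the invoke output above
response = {
    'id': 'a021aaa7-a335-421e-8158-194d5db8a140',
    'model_name': 'my_model',
    'outputs': [0, 2],
}

# Iris class names in scikit-learn's label order (assumed for illustration)
class_names = ['setosa', 'versicolor', 'virginica']
predictions = [class_names[i] for i in response['outputs']]
print(predictions)  # ['setosa', 'virginica']
```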

Step 5: Viewing the Nuclio Serving Function on the Dashboard

On the Projects dashboard page, select the project and then select “Real-time functions (Nuclio)”.



Congratulations! You’ve completed Part 3 of the MLRun getting-started tutorial. Proceed to Part 4 to learn how to create an automated pipeline for your project.