Part 3: Serving an ML Model

This part of the MLRun getting-started tutorial walks you through the steps for implementing ML model serving using MLRun serving and Nuclio runtimes. The tutorial walks you through the steps for creating, deploying, and testing a model-serving function (“a serving function” a.k.a. “a model server”).

MLRun serving can produce managed real-time serverless pipelines from various tasks, including MLRun models or standard model files. The pipelines use the Nuclio real-time serverless engine, which can be deployed anywhere. Nuclio is a high-performance open-source “serverless” framework that’s focused on data, I/O, and compute-intensive workloads.

Simple model serving classes can be written in Python or be taken from a set of pre-developed ML/DL classes. The code can handle complex data, feature preparation, and binary data (such as images and video files). The Nuclio serving engine supports the full model-serving life cycle; this includes auto generation of microservices, APIs, load balancing, model logging and monitoring, and configuration management.

MLRun serving supports more advanced real-time data processing and model serving pipelines. For more details and examples, see the MLRun Serving Graphs documentation.

The tutorial consists of the following steps:

  1. Setup and Configuration — load your project

  2. Writing A Simple Serving Class

  3. Deploying the Model-Serving Function (Service)

  4. Using the Live Model-Serving Function

  5. Viewing the Nuclio Serving Function on the Dashboard

By the end of this tutorial you’ll learn how to

  • Create model-serving functions.

  • Deploy models at scale.

  • Test your deployed models.


The following steps are a continuation of the previous parts of this getting-started tutorial and rely on the generated outputs. Therefore, make sure to first run parts 1—2 of the tutorial.

Step 1: Setup and Configuration

Importing Libraries

Run the following code to import required libraries:

import mlrun

Initializing Your MLRun Environment

Use the get_or_create_project MLRun method to create a new project or fetch it from the DB/repository if it already exists. Set the project and user_project parameters to the same values that you used in the call to this method in the Part 1: MLRun Basics tutorial notebook.

# Set the base project name
project_name_base = 'getting-started'

# Initialize the MLRun project object
project = mlrun.get_or_create_project(project_name_base, context="./", user_project=True)
> 2022-02-08 19:57:17,874 [info] loaded project getting-started from MLRun DB

Step 2: Writing A Simple Serving Class

The serving class is initialized automatically by the model server. All you need is to implement two mandatory methods:

  • load — downloads the model files and loads the model into memory. This can be done either synchronously or asynchronously.

  • predict — accepts a request payload and returns prediction (inference) results.

For more detailed information on serving classes, see the MLRun documentation.

The following code demonstrates a minimal scikit-learn (a.k.a. sklearn) serving-class implementation:

# mlrun: start-code
from cloudpickle import load
import numpy as np
from typing import List
import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
    def load(self):
        """load and initialize the model and/or other elements"""
        model_file, extra_data = self.get_model('.pkl')
        self.model = load(open(model_file, 'rb'))

    def predict(self, body: dict) -> List:
        """Generate model predictions from sample."""
        feats = np.asarray(body['inputs'])
        result: np.ndarray = self.model.predict(feats)
        return result.tolist()
# mlrun: end-code

Step 3: Deploying the Model-Serving Function (Service)

To provision (deploy) a function for serving the model (“a serving function”) you need to create an MLRun function of type serving. You can do this by using the code_to_function MLRun method from a web notebook, or by importing an existing serving function or template from the MLRun functions marketplace.

Converting a Serving Class to a Serving Function

The following code converts the ClassifierModel class that you defined in the previous step to a serving function. The name of the class to be used by the serving function is set in spec.default_class.

serving_fn = mlrun.code_to_function('serving', kind='serving',image='mlrun/mlrun')
serving_fn.spec.default_class = 'ClassifierModel'

Add the model created in previous notebook by the training function

model_file = project.get_artifact_uri('my_model') 
<mlrun.serving.states.TaskStep at 0x7fec77d70390>

Testing Your Function Locally

To test your function locally, create a test server (mock server) and test it with sample data.

my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''
server = serving_fn.to_mock_server()
server.test("/v2/models/my_model/infer", body=my_data)
> 2022-02-08 19:58:44,716 [info] model my_model was loaded
> 2022-02-08 19:58:44,716 [info] Loaded ['my_model']
{'id': '97ed827394c24011bc2d95a001f7c372',
 'model_name': 'my_model',
 'outputs': [0, 2]}

Building and Deploying the Serving Function

Use the deploy method of the MLRun serving function to build and deploy a Nuclio serving function from your serving-function code.

function_address = serving_fn.deploy()
> 2022-02-08 19:58:50,645 [info] Starting remote function deploy
2022-02-08 19:58:51  (info) Deploying function
2022-02-08 19:58:51  (info) Building
2022-02-08 19:58:52  (info) Staging files and preparing base images
2022-02-08 19:58:52  (info) Building processor image
2022-02-08 19:59:47  (info) Build complete
> 2022-02-08 19:59:52,828 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-getting-started-admin-serving.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['']}

Step 4: Using the Live Model-Serving Function

After the function is deployed successfully, the serving function has a new HTTP endpoint for handling serving requests. The example tutorial serving function receives HTTP prediction (inference) requests on this endpoint; calls the infer method to get the requested predictions; and returns the results on the same endpoint.

print (f'The address for the function is {function_address} \n')

!curl $function_address
The address for the function is 

{"name": "ModelRouter", "version": "v2", "extensions": []}

Testing the Model Server

Test your model server by sending data for inference. The invoke serving-function method enables programmatic testing of the serving function. For model inference (predictions), specify the model name followed by infer:


For complete model-service API commands — such as for list models (models), get model health (ready), and model explanation (explain) — see the MLRun documentation.

serving_fn.invoke('/v2/models/my_model/infer', my_data)
> 2022-02-08 19:59:53,584 [info] invoking function: {'method': 'POST', 'path': 'http://nuclio-getting-started-admin-serving.default-tenant.svc.cluster.local:8080/v2/models/my_model/infer'}
{'id': '3fe451d7-20d8-4813-a7d4-292ceeaca98c',
 'model_name': 'my_model',
 'outputs': [0, 2]}

Step 5: Viewing the Nuclio Serving Function on the Dashboard

On the Projects dashboard page, select the project and then select “Real-time functions (Nuclio)”.



Congratulation! You’ve completed Part 3 of the MLRun getting-started tutorial. Proceed to Part 4: ML Pipeline to learn how to create an automated pipeline for your project.