Serving ML/DL models#

This notebook demonstrates how to serve standard ML/DL models using MLRun Serving.

Before you start, make sure you have gone over the basics in the MLRun Quick Start Tutorial.

MLRun serving can produce managed real-time serverless pipelines from various tasks, including MLRun models or standard model files. The pipelines use the Nuclio real-time serverless engine, which can be deployed anywhere. Nuclio is a high-performance open-source “serverless” framework that’s focused on data, I/O, and compute-intensive workloads.

MLRun serving supports advanced real-time data processing and model serving pipelines.
For more details and examples, see the MLRun serving pipelines documentation.

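Tutorial steps:

  • Using pre-built MLRun serving classes and images
  • Create and test the serving function
  • Deploy the serving function
  • Building a custom serving class
  • Building an advanced model serving graph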

MLRun installation and configuration#

Before running this notebook, make sure that the mlrun package is installed (pip install mlrun) and that you have configured access to the MLRun service.

# Install MLRun if it is not already installed. Run this only once, then restart the notebook.
%pip install mlrun

Get or create a new project:

You should create, load, or use (get) an MLRun project. The get_or_create_project() method tries to load the project from the MLRun DB. If the project does not exist, it creates a new one.

import mlrun
project = mlrun.get_or_create_project("tutorial", context="./", user_project=True)
> 2022-06-20 09:07:50,188 [info] loaded project tutorial from MLRun DB

Using pre-built MLRun serving classes and images#

MLRun contains built-in serving functionality for the major ML/DL frameworks (Scikit-Learn, TensorFlow.Keras, ONNX, XGBoost, LightGBM, and PyTorch). In addition, MLRun provides a few container images with the required ML/DL packages pre-installed.

You can overwrite the packages in the images, or provide your own image. If you provide your own image, make sure that the mlrun package is installed in it.

The following table specifies, for each framework, the relevant pre-integrated image and the corresponding MLRun ModelServer serving class:

| Framework | Image | Serving class |
|---|---|---|
| Scikit-Learn | mlrun/mlrun | mlrun.frameworks.sklearn.SklearnModelServer |
| TensorFlow.Keras | mlrun/ml-models | mlrun.frameworks.tf_keras.TFKerasModelServer |
| ONNX | mlrun/ml-models | mlrun.frameworks.onnx.ONNXModelServer |
| XGBoost | mlrun/ml-models | mlrun.frameworks.xgboost.XGBoostModelServer |
| LightGBM | mlrun/ml-models | mlrun.frameworks.lgbm.LGBMModelServer |
| PyTorch | mlrun/ml-models | mlrun.frameworks.pytorch.PyTorchModelServer |

For GPU support, use the mlrun/ml-models-gpu image, which adds GPU drivers and support.
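
The table above can also be kept as a small lookup, so that the image and serving class are selected from a framework name. This is a convenience sketch rather than part of MLRun: the `SERVING_OPTIONS` dictionary, its keys, and the `serving_config` helper are hypothetical names, and the `gpu` switch simply applies the GPU note above.

```python
from typing import Dict, Tuple

# Hypothetical lookup (not an MLRun API): framework key -> (image, serving class),
# copied from the table above. The keys are illustrative names.
SERVING_OPTIONS: Dict[str, Tuple[str, str]] = {
    "sklearn": ("mlrun/mlrun", "mlrun.frameworks.sklearn.SklearnModelServer"),
    "tf_keras": ("mlrun/ml-models", "mlrun.frameworks.tf_keras.TFKerasModelServer"),
    "onnx": ("mlrun/ml-models", "mlrun.frameworks.onnx.ONNXModelServer"),
    "xgboost": ("mlrun/ml-models", "mlrun.frameworks.xgboost.XGBoostModelServer"),
    "lgbm": ("mlrun/ml-models", "mlrun.frameworks.lgbm.LGBMModelServer"),
    "pytorch": ("mlrun/ml-models", "mlrun.frameworks.pytorch.PyTorchModelServer"),
}


def serving_config(framework: str, gpu: bool = False) -> Tuple[str, str]:
    """Return (image, serving_class) for a framework key from the lookup."""
    try:
        image, serving_class = SERVING_OPTIONS[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    if gpu:
        # Per the note above, GPU support uses the mlrun/ml-models-gpu image.
        image = "mlrun/ml-models-gpu"
    return image, serving_class


image, serving_class = serving_config("sklearn")
```

Keeping the mapping in one place avoids repeating image and class strings in every notebook cell that creates a serving function.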

Example using scikit-learn and TF Keras models

The following two examples show how to specify the parameters. They use standard pre-trained models (trained on the iris dataset) stored in the MLRun samples repository. You can use your own models instead.

models_dir = mlrun.get_sample_path('models/serving/')

framework = 'sklearn'  # change to 'keras' to try the 2nd option 
kwargs = {}
if framework == "sklearn":
    serving_class = 'mlrun.frameworks.sklearn.SklearnModelServer'
    model_path = models_dir + 'sklearn.pkl'
    image = 'mlrun/mlrun'
else:
    serving_class = 'mlrun.frameworks.tf_keras.TFKerasModelServer'
    model_path = models_dir + 'keras.h5'
    image = 'mlrun/ml-models'  # or mlrun/ml-models-gpu when using GPUs
    kwargs['labels'] = {'model-format': 'h5'}

Logging the model#

The model and its metadata are first registered in MLRun’s Model Registry. Use the log_model() method to specify the model files and metadata (metrics, schema, parameters, etc.).

model_object = project.log_model(f'{framework}-model', model_file=model_path, **kwargs)

Create and test the serving function#

Create a new serving function, specify its name and the correct image (with your desired framework).

If you want to add specific packages to the base image, specify the requirements attribute. For example:

serving_fn = mlrun.new_function("serving", image=image, kind="serving", requirements=["tensorflow==2.8.1"])

The following example uses a basic topology of a model router and adds a single model behind it. (You can add multiple models to the same function.)

serving_fn = mlrun.new_function("serving", image=image, kind="serving", requirements=[])
serving_fn.add_model(framework, model_path=model_object.uri, class_name=serving_class, to_list=True)

# Plot the serving topology input -> router -> model
serving_fn.plot(rankdir="LR")
[Figure: serving topology graph, input -> router -> model]
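
Conceptually, the router inspects the request path and forwards the event to the model registered under that name. The sketch below illustrates that dispatch in plain Python. It is not MLRun's router implementation: the `ModelRouter` class and its methods are hypothetical, and the registered "model" is a dummy function.

```python
from typing import Callable, Dict, List, Optional


class ModelRouter:
    """Hypothetical, simplified stand-in for a model router (not MLRun code)."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[List], List]] = {}

    def add_model(self, name: str, predict: Callable[[List], List]) -> None:
        """Register a prediction callable under a model name."""
        self._models[name] = predict

    def handle(self, path: str, body: Optional[dict] = None) -> dict:
        """Dispatch a request by its path."""
        parts = [part for part in path.split("/") if part]
        # "/v2/models/" lists the registered models.
        if parts == ["v2", "models"]:
            return {"models": sorted(self._models)}
        # "/v2/models/<name>/infer" runs the named model on body["inputs"].
        if len(parts) == 4 and parts[:2] == ["v2", "models"] and parts[3] == "infer":
            name = parts[2]
            if name not in self._models:
                raise KeyError(f"unknown model: {name}")
            if body is None:
                raise ValueError("an infer request needs a body with 'inputs'")
            return {"model_name": name, "outputs": self._models[name](body["inputs"])}
        raise ValueError(f"unsupported path: {path}")


router = ModelRouter()
router.add_model("sklearn", lambda rows: [0 for _ in rows])  # dummy model: always class 0
listing = router.handle("/v2/models/")
reply = router.handle("/v2/models/sklearn/infer", {"inputs": [[5.1, 3.5, 1.4, 0.2]]})
```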

Simulating the model server locally (using the mock_server):

# create a mock server that represents the serving pipeline
server = serving_fn.to_mock_server()

Test the mock model server endpoint:

  • List the served models

server.test("/v2/models/", method="GET")
{'models': ['sklearn']}

  • Infer using test data

sample = {"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}
server.test(path=f'/v2/models/{framework}/infer',body=sample)
{'id': '1da64557daa843c1a2d6719eea7d4361',
 'model_name': 'sklearn',
 'outputs': [0, 2]}

See more API options and parameters in the Model serving API documentation.
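
The request and response shown above follow a simple JSON layout: the body carries an `inputs` list with one row per sample, and the reply carries an `outputs` list in the same order. The sketch below builds the path and body by hand and translates the numeric outputs into label names. The helper names are hypothetical, and the label mapping assumes the usual iris class order (0 = setosa, 1 = versicolor, 2 = virginica), which this tutorial does not state.

```python
import json
from typing import Dict, List, Sequence, Tuple

# Assumption: the usual iris label order; the tutorial does not state the encoding.
IRIS_LABELS = ("setosa", "versicolor", "virginica")


def build_infer_request(model_name: str, rows: Sequence[Sequence[float]]) -> Tuple[str, str]:
    """Return the (path, JSON body) pair for an infer call, in the layout used above."""
    path = f"/v2/models/{model_name}/infer"
    body = json.dumps({"inputs": [list(row) for row in rows]})
    return path, body


def label_outputs(response: Dict) -> List[str]:
    """Translate numeric class outputs into label names (hypothetical helper)."""
    return [IRIS_LABELS[int(index)] for index in response["outputs"]]


path, body = build_infer_request("sklearn", [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]])

# Response copied from the mock server output above (id omitted).
response = {"model_name": "sklearn", "outputs": [0, 2]}
labels = label_outputs(response)
```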

Deploy the serving function#

Deploy the serving function, then use its invoke() method to test it with the provided sample.

serving_fn.with_code(body=" ") # adds the serving wrapper, not required with MLRun >= 1.0.3
project.deploy_function(serving_fn)
> 2022-06-20 09:07:56,977 [info] Starting remote function deploy
2022-06-20 09:07:57  (info) Deploying function
2022-06-20 09:07:57  (info) Building
2022-06-20 09:07:57  (info) Staging files and preparing base images
2022-06-20 09:07:57  (info) Building processor image
2022-06-20 09:08:32  (info) Build complete
2022-06-20 09:08:44  (info) Function deploy complete
> 2022-06-20 09:08:44,641 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-tutorial-yaron-serving.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['tutorial-yaron-serving-tutorial-yaron.default-tenant.app.yh43.iguazio-cd1.com/']}
DeployStatus(state=ready, outputs={'endpoint': 'http://tutorial-yaron-serving-tutorial-yaron.default-tenant.app.yh43.iguazio-cd1.com/', 'name': 'tutorial-yaron-serving'})

serving_fn.invoke(path=f'/v2/models/{framework}/infer',body=sample)
> 2022-06-20 09:08:44,692 [info] invoking function: {'method': 'POST', 'path': 'http://nuclio-tutorial-yaron-serving.default-tenant.svc.cluster.local:8080/v2/models/sklearn/infer'}
{'id': 'a16f00e8-663a-4031-a04e-e42a7d4dd697',
 'model_name': 'sklearn',
 'outputs': [0, 2]}
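
Once deployed, the function is an HTTP endpoint (the log above shows invoke() issuing a POST), so it can also be called without the MLRun SDK. The sketch below only prepares such a request with Python's standard library. The endpoint value is a placeholder to be replaced with the address reported in your own deploy output, the `make_infer_request` helper is a hypothetical name, and the JSON content-type header is an assumption.

```python
import json
import urllib.request


def make_infer_request(endpoint: str, model_name: str, inputs: list) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the model's infer path."""
    url = f"{endpoint.rstrip('/')}/v2/models/{model_name}/infer"
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


# Placeholder endpoint: replace with the 'endpoint' value from your deploy output.
request = make_infer_request(
    "http://my-serving-endpoint.example/", "sklearn", [[5.1, 3.5, 1.4, 0.2]]
)

# To send it (requires network access to the deployed function):
# with urllib.request.urlopen(request) as reply:
#     print(json.load(reply))
```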

Building a custom serving class#

Model serving classes implement the full model serving functionality, which includes loading models, pre- and post-processing, prediction, explainability, and model monitoring.

Model serving classes must inherit from mlrun.serving.V2ModelServer and, at a minimum, implement the load() method (download the model files and load the model into memory) and the predict() method (accept the request payload and return the prediction/inference results).

For more detailed information on custom serving classes, see Creating a custom model serving class.

The following code demonstrates a minimal scikit-learn (a.k.a. sklearn) serving-class implementation:

from cloudpickle import load
import numpy as np
from typing import List
import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
    def load(self):
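        """Load and initialize the model and/or other elements."""
        # get_model() returns the local model file path; extra_data is not used here
        model_file, extra_data = self.get_model('.pkl')
        # open the file in a context manager so the handle is closed after loading
        with open(model_file, 'rb') as model_fp:
            self.model = load(model_fp)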

    def predict(self, body: dict) -> List:
        """Generate model predictions from sample."""
        feats = np.asarray(body['inputs'])
        result: np.ndarray = self.model.predict(feats)
        return result.tolist()
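
The predict() contract is small enough to check in isolation before deploying: it receives the request body as a dict with an `inputs` list and returns a plain list. The sketch below mirrors that contract in plain Python, with no dependency on MLRun or NumPy. `ThresholdModel` and `predict_from_body` are hypothetical stand-ins for a trained model and for the method body, and the threshold rule is a toy, not a real classifier.

```python
from typing import Dict, List


class ThresholdModel:
    """Hypothetical stand-in for a trained classifier with a predict() method."""

    def predict(self, rows: List[List[float]]) -> List[int]:
        # Toy rule on the third feature (not a real model):
        # class 2 if the value exceeds 5.0, otherwise class 0.
        return [2 if row[2] > 5.0 else 0 for row in rows]


def predict_from_body(model, body: Dict) -> List:
    """Same shape as ClassifierModel.predict: read body['inputs'], return a list."""
    rows = body["inputs"]
    if not isinstance(rows, list) or not rows:
        raise ValueError("body['inputs'] must be a non-empty list of feature rows")
    return list(model.predict(rows))


sample = {"inputs": [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]}
outputs = predict_from_body(ThresholdModel(), sample)
```

Swapping the stand-in for a real fitted model keeps the same call shape, which makes this kind of check convenient in a unit test.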

To create a function that incorporates the code of the new class (in serving.py), use code_to_function:

# model_file is assumed to hold the path or URI of the pickled model file
serving_fn = mlrun.code_to_function('serving', filename='serving.py', kind='serving', image='mlrun/mlrun')
serving_fn.add_model('my_model', model_path=model_file, class_name='ClassifierModel')

Building an advanced model serving graph#

MLRun graphs enable building and running DAGs (directed acyclic graphs). Graphs are composed of individual steps. The first graph element accepts an Event object, transforms/processes the event, and passes the result to the next step in the graph, and so on. The final result can be written to a destination (file, DB, stream, etc.) or returned to the caller (one of the graph steps can be marked with .respond()).

The serving graphs can be composed of pre-defined graph steps, block-type elements (model servers, routers, ensembles, data readers and writers, data engineering tasks, validators, etc.), custom steps, or native Python classes/functions. A graph can have data processing steps, model ensembles, model servers, post-processing, etc. Graphs can auto-scale and span multiple function containers (connected through streaming protocols).
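
The flow described above can be pictured as a chain of callables, each receiving the previous step's output. The sketch below is a conceptual, plain-Python illustration only. It is not MLRun's graph API, it handles a linear chain rather than a general DAG, and the `Step` and `run_flow` names are hypothetical. It shows an event passing through the steps in order, the result of the step marked as responding being returned to the caller, and a final step writing to a destination (here, a list).

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """Hypothetical graph step: a name, a handler, and an optional respond flag."""
    name: str
    handler: Callable[[Any], Any]
    respond: bool = False


def run_flow(steps: List[Step], event: Any) -> Any:
    """Pass the event through each step in order; return the responding step's result."""
    response = None
    for step in steps:
        event = step.handler(event)
        if step.respond:
            response = event
    return response


written: List[Any] = []  # stands in for a file, DB, or stream destination

flow = [
    Step("scale", lambda rows: [[value * 10 for value in row] for row in rows]),
    Step("sum", lambda rows: [sum(row) for row in rows], respond=True),
    Step("write", written.append),
]

result = run_flow(flow, [[1, 2], [3, 4]])
```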

See the Advanced Model Serving Graph Notebook Example.

Done!#

Congratulations! You’ve completed Part 3 of the MLRun getting-started tutorial. Proceed to Part 4: ML Pipeline to learn how to create an automated pipeline for your project.