Serving pre-trained ML/DL models#
This notebook demonstrates how to serve standard ML/DL models using MLRun Serving.
Before you start, make sure you have gone over the basics in the MLRun Quick Start Tutorial.
MLRun serving can produce managed real-time serverless pipelines from various tasks, including MLRun models or standard model files. The pipelines use the Nuclio real-time serverless engine, which can be deployed anywhere. Nuclio is a high-performance open-source “serverless” framework that’s focused on data, I/O, and compute-intensive workloads.
MLRun serving supports advanced real-time data processing and model serving pipelines.
For more details and examples, see the MLRun serving pipelines documentation.
Tutorial steps:
MLRun installation and configuration#
Before running this notebook, make sure the mlrun package is installed (pip install mlrun) and that you have configured access to the MLRun service.
# Install MLRun if not installed, run this only once. Restart the notebook after the install!
%pip install mlrun
Get or create a new project
You should create, load, or use (get) an MLRun Project. The get_or_create_project() method tries to load the project from the MLRun DB. If the project does not exist, it creates a new one.
import mlrun
project = mlrun.get_or_create_project("tutorial", context="./", user_project=True)
> 2022-06-20 09:07:50,188 [info] loaded project tutorial from MLRun DB
Using pre-built MLRun serving classes and images#
MLRun contains built-in serving functionality for the major ML/DL frameworks (Scikit-Learn, TensorFlow.Keras, ONNX, XGBoost, LightGBM, and PyTorch). In addition, MLRun provides a few container images with the required ML/DL packages pre-installed.
You can overwrite the packages in the images, or provide your own image. (You just need to make sure that the mlrun package is installed in it.)
The following table specifies, for each framework, the relevant pre-integrated image and the corresponding MLRun ModelServer serving class:
framework | image | serving class
---|---|---
SciKit-Learn | mlrun/mlrun | mlrun.frameworks.sklearn.SklearnModelServer
TensorFlow.Keras | mlrun/ml-models | mlrun.frameworks.tf_keras.TFKerasModelServer
ONNX | mlrun/ml-models | mlrun.frameworks.onnx.ONNXModelServer
XGBoost | mlrun/ml-models | mlrun.frameworks.xgboost.XGBoostModelServer
LightGBM | mlrun/ml-models | mlrun.frameworks.lgbm.LGBMModelServer
PyTorch | mlrun/ml-models | mlrun.frameworks.pytorch.PyTorchModelServer
For GPU support, use the mlrun/ml-models-gpu image (which adds GPU drivers and support).
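The table can also be kept as a small lookup in code, so that the image and the serving class are always chosen together. The sketch below is illustrative only: SERVING_DEFAULTS and resolve_serving() are hypothetical names, not part of MLRun, and the gpu switch assumes that mlrun/ml-models-gpu is a drop-in replacement for mlrun/ml-models.

```python
# Hypothetical lookup built from the table above (not an MLRun API).
SERVING_DEFAULTS = {
    "sklearn": ("mlrun/mlrun", "mlrun.frameworks.sklearn.SklearnModelServer"),
    "tf_keras": ("mlrun/ml-models", "mlrun.frameworks.tf_keras.TFKerasModelServer"),
    "onnx": ("mlrun/ml-models", "mlrun.frameworks.onnx.ONNXModelServer"),
    "xgboost": ("mlrun/ml-models", "mlrun.frameworks.xgboost.XGBoostModelServer"),
    "lgbm": ("mlrun/ml-models", "mlrun.frameworks.lgbm.LGBMModelServer"),
    "pytorch": ("mlrun/ml-models", "mlrun.frameworks.pytorch.PyTorchModelServer"),
}


def resolve_serving(framework, gpu=False):
    """Return (image, serving_class) for a framework key from the table."""
    try:
        image, serving_class = SERVING_DEFAULTS[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    # Assumption: only the ml-models image has a GPU variant.
    if gpu and image == "mlrun/ml-models":
        image = "mlrun/ml-models-gpu"
    return image, serving_class


image, serving_class = resolve_serving("tf_keras", gpu=True)
```

An unknown framework key raises ValueError instead of silently falling back to a default image.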
Example using SKlearn and TF Keras models
The following two examples show how to specify the parameters. They use standard pre-trained models (trained on the iris dataset) stored in the MLRun samples repository. (You can use your own models instead.)
models_dir = mlrun.get_sample_path('models/serving/')

framework = 'sklearn'  # change to 'keras' to try the 2nd option
kwargs = {}
if framework == "sklearn":
    serving_class = 'mlrun.frameworks.sklearn.SklearnModelServer'
    model_path = models_dir + 'sklearn.pkl'
    image = 'mlrun/mlrun'
else:
    serving_class = 'mlrun.frameworks.tf_keras.TFKerasModelServer'
    model_path = models_dir + 'keras.h5'
    image = 'mlrun/ml-models'  # or mlrun/ml-models-gpu when using GPUs
    kwargs['labels'] = {'model-format': 'h5'}
Log the model#
The model and its metadata are first registered in MLRun’s Model Registry. Use the log_model() method to specify the model files and metadata (metrics, schema, parameters, etc.).
model_object = project.log_model(f'{framework}-model', model_file=model_path, **kwargs)
Create and test the serving function#
Create a new serving function, and specify its name and the correct image (with your desired framework).
If you want to add specific packages to the base image, specify the requirements attribute. For example:
serving_fn = mlrun.new_function("serving", image=image, kind="serving", requirements=["tensorflow==2.8.1"])
The following example uses a basic topology of a model router and adds a single model behind it. (You can add multiple models to the same function.)
# requirements takes a list of package specifiers; an empty list adds nothing to the base image
serving_fn = mlrun.new_function("serving", image=image, kind="serving", requirements=[])
serving_fn.add_model(framework, model_path=model_object.uri, class_name=serving_class, to_list=True)
# Plot the serving topology input -> router -> model
serving_fn.plot(rankdir="LR")
Simulate the model server locally (using the mock_server)
# Create a mock server that represents the serving pipeline
server = serving_fn.to_mock_server()
Test the mock model server endpoint
List the served models
server.test("/v2/models/", method="GET")
{'models': ['sklearn']}
Infer using test data
sample = {"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}
server.test(path=f'/v2/models/{framework}/infer',body=sample)
{'id': '1da64557daa843c1a2d6719eea7d4361',
'model_name': 'sklearn',
'outputs': [0, 2]}
See more API options and parameters in Model serving API.
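The calls above follow a simple pattern: a GET on /v2/models/ lists the served models, and a POST to /v2/models/<name>/infer with an "inputs" list returns a dict with "outputs". The helpers below are a hypothetical convenience layer (infer_path, build_infer_body, and check_infer_response are not MLRun functions) that builds the path and body and sanity-checks a response shaped like the one shown above.

```python
def infer_path(model_name):
    """Build the inference path for a model served behind the router."""
    return f"/v2/models/{model_name}/infer"


def build_infer_body(rows):
    """Wrap feature rows in the {"inputs": [...]} body used above."""
    rows = [[float(value) for value in row] for row in rows]
    if not rows:
        raise ValueError("at least one input row is required")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all input rows must have the same number of features")
    return {"inputs": rows}


def check_infer_response(response, expected_rows):
    """Return the outputs list, verifying one output per input row."""
    outputs = response["outputs"]
    if len(outputs) != expected_rows:
        raise ValueError(f"expected {expected_rows} outputs, got {len(outputs)}")
    return outputs


body = build_infer_body([[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]])
# With the mock server from above, this would be used as:
#   response = server.test(path=infer_path(framework), body=body)
#   outputs = check_infer_response(response, expected_rows=len(body["inputs"]))
```

Validating the row widths up front gives a clearer error than a shape mismatch raised inside the model.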
Deploy the serving function#
Deploy the serving function and use invoke to test it with the provided sample.
project.deploy_function(serving_fn)
> 2022-06-20 09:07:56,977 [info] Starting remote function deploy
2022-06-20 09:07:57 (info) Deploying function
2022-06-20 09:07:57 (info) Building
2022-06-20 09:07:57 (info) Staging files and preparing base images
2022-06-20 09:07:57 (info) Building processor image
2022-06-20 09:08:32 (info) Build complete
2022-06-20 09:08:44 (info) Function deploy complete
> 2022-06-20 09:08:44,641 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-tutorial-yaron-serving.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['tutorial-yaron-serving-tutorial-yaron.default-tenant.app.yh43.iguazio-cd1.com/']}
DeployStatus(state=ready, outputs={'endpoint': 'http://tutorial-yaron-serving-tutorial-yaron.default-tenant.app.yh43.iguazio-cd1.com/', 'name': 'tutorial-yaron-serving'})
serving_fn.invoke(path=f'/v2/models/{framework}/infer',body=sample)
> 2022-06-20 09:08:44,692 [info] invoking function: {'method': 'POST', 'path': 'http://nuclio-tutorial-yaron-serving.default-tenant.svc.cluster.local:8080/v2/models/sklearn/infer'}
{'id': 'a16f00e8-663a-4031-a04e-e42a7d4dd697',
'model_name': 'sklearn',
'outputs': [0, 2]}
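As the log line above shows, the invocation is an HTTP POST to the function URL, so any HTTP client should be able to call the deployed endpoint. The sketch below only builds the request with the Python standard library and does not send it. The endpoint is a placeholder (use the 'endpoint' value printed by your own deployment), build_infer_request is a hypothetical helper, and any authentication headers your cluster may require are not shown.

```python
import json
from urllib.request import Request


def build_infer_request(endpoint, model_name, rows):
    """Build (but do not send) a POST request to the model's infer path."""
    url = f"{endpoint.rstrip('/')}/v2/models/{model_name}/infer"
    data = json.dumps({"inputs": rows}).encode("utf-8")
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint: replace with the endpoint of your deployed function.
req = build_infer_request(
    "http://serving.example.com/", "sklearn", [[5.1, 3.5, 1.4, 0.2]]
)
# To send it: urllib.request.urlopen(req), then json-decode the response body.
```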
Build a custom serving class#
Model serving classes implement the full model serving functionality, which includes loading models, pre- and post-processing, prediction, explainability, and model monitoring.
Model serving classes must inherit from mlrun.serving.V2ModelServer, and at the minimum implement the load() (download the model file(s) and load the model into memory) and predict() (accept request payload and return prediction/inference results) methods.
For more detailed information on custom serving classes, see Build your own model serving class.
The following code demonstrates a minimal scikit-learn (a.k.a. sklearn) serving-class implementation:
from cloudpickle import load
import numpy as np
from typing import List
import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
    def load(self):
        """load and initialize the model and/or other elements"""
        # get_model() downloads the model file and returns its local path
        model_file, extra_data = self.get_model('.pkl')
        with open(model_file, 'rb') as f:
            self.model = load(f)

    def predict(self, body: dict) -> List:
        """Generate model predictions from sample."""
        feats = np.asarray(body['inputs'])
        result: np.ndarray = self.model.predict(feats)
        return result.tolist()
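One way to check the predict() logic before wiring it into a serving function is to run the same steps against a stub model. The sketch below is MLRun-free and uses plain lists instead of NumPy so that it runs anywhere; StubModel and predict_from_body are made-up names for illustration, and the stub's decision rule is arbitrary.

```python
from typing import List


class StubModel:
    """Stand-in for a fitted classifier: predicts 1 when a row sums to more than 15."""

    def predict(self, feats):
        return [1 if sum(row) > 15 else 0 for row in feats]


def predict_from_body(model, body: dict) -> List:
    """Mirror the predict() steps: read 'inputs', call the model, return a list."""
    if "inputs" not in body:
        raise ValueError("request body must contain an 'inputs' key")
    result = model.predict(body["inputs"])
    return list(result)


sample = {"inputs": [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]}
outputs = predict_from_body(StubModel(), sample)
```

Keeping the body-to-prediction logic in a small function like this makes it easy to unit-test separately from model loading.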
To create a function that incorporates the code of the new class (in serving.py), use code_to_function:
serving_fn = mlrun.code_to_function('serving', filename='serving.py', kind='serving', image='mlrun/mlrun')
# model_object is the sklearn model logged earlier with project.log_model()
serving_fn.add_model('my_model', model_path=model_object.uri, class_name='ClassifierModel')
Build an advanced model serving graph#
MLRun graphs enable building and running DAGs (directed acyclic graphs). Graphs are composed of individual steps.
The first graph element accepts an Event object, transforms/processes the event and passes the result to the next step in the graph, and so on. The final result can be written out to a destination (file, DB, stream, etc.) or returned back to the caller (one of the graph steps can be marked with .respond()).
Serving graphs can be composed of pre-defined graph steps, block-type elements (model servers, routers, ensembles, data readers and writers, data engineering tasks, validators, etc.), custom steps, or native Python classes/functions. A graph can have data processing steps, model ensembles, model servers, post-processing, etc. Graphs can auto-scale and span multiple function containers (connected through streaming protocols).
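To make the event flow concrete, here is a toy, MLRun-free sketch of a linear graph in which each step receives the event body and hands its result to the next step. It only illustrates the idea and is not how MLRun implements graphs: run_flow and the step functions are made-up names, and the scoring rule is arbitrary. Real graphs are built on a serving function with MLRun's own graph API, covered in the serving pipelines documentation.

```python
from functools import reduce


def scale(body):
    """Pre-processing step: divide every feature by 10."""
    return {**body, "inputs": [[x / 10 for x in row] for row in body["inputs"]]}


def score(body):
    """Stand-in for a model step: score each row by the sum of its features."""
    return {**body, "outputs": [sum(row) for row in body["inputs"]]}


def to_labels(body):
    """Post-processing step: turn numeric scores into labels."""
    return {"labels": ["high" if s > 1.5 else "low" for s in body["outputs"]]}


def run_flow(steps, event_body):
    """Pass the event body through the steps in order and return the final result."""
    return reduce(lambda body, step: step(body), steps, event_body)


result = run_flow(
    [scale, score, to_labels],
    {"inputs": [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]},
)
```

In this sketch the value returned by the last step plays the role of the response returned to the caller.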
Done!#
Congratulations! You’ve completed Part 3 of the MLRun getting-started tutorial. Proceed to Part 4: ML Pipeline to learn how to create an automated pipeline for your project.