Test and deploy a model server#

Testing the model#

MLRun provides a mock server as part of the serving runtime. This lets you run your serving function in your local environment for testing, before deploying it.

from mlrun import code_to_function

serving_fn = code_to_function(name='myService', kind='serving', image='mlrun/mlrun')
serving_fn.add_model('my_model', model_path=model_file_path)  # model_file_path: path/URL of the model artifact
server = serving_fn.to_mock_server()

You can use test data and programmatically invoke the predict() method of the mock server. In this example, the model expects a Python dictionary as input.

my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''
server.test("/v2/models/my_model/infer", body=my_data)

The data structure used in the body parameter depends on how the predict() method of the model server is defined. For examples of how to define your own model server class, see here.
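
As a rough sketch of such a class (the class name, the .pkl suffix, and the pickled-model assumption are illustrative): a v2 model server subclasses mlrun.serving.V2ModelServer and implements load() and predict(), where predict() receives the parsed request body as a dict:

import mlrun
import numpy as np
from cloudpickle import load

class MyClass(mlrun.serving.V2ModelServer):
    def load(self):
        # get_model() fetches the model artifact and returns (local_path, extra_data)
        model_file, _ = self.get_model('.pkl')
        self.model = load(open(model_file, 'rb'))

    def predict(self, body: dict) -> list:
        # body is the parsed request, e.g. {"inputs": [[5.1, 3.5, 1.4, 0.2], ...]}
        feats = np.asarray(body['inputs'])
        return self.model.predict(feats).tolist()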

To review the mock server API, see here.

Deploying the model#

Deploying models in MLRun uses a dedicated function kind, serving. You can create a serving function using the code_to_function() call from a notebook, or import an existing serving function/template from the marketplace.

This example converts a notebook to a serving function and adds a model to it:

from mlrun import code_to_function
fn = code_to_function('my-function', kind='serving')
fn.add_model('m1', model_path=<model-artifact/dir>, class_name='MyClass', x=100)  # extra kwargs (e.g. x=100) are passed to MyClass

See the .add_model() docstring for help and parameters.

See the full Model Server example.

If you want to serve multiple versions of the same model, use : to separate the name from the version. For example, the key mymodel:v2 means model name mymodel, version v2.
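
For illustration (the artifact paths below are placeholders), registering two versions of the same model could look like this:

fn.add_model('mymodel:v1', model_path='<path-to-v1-artifact>', class_name='MyClass')
fn.add_model('mymodel:v2', model_path='<path-to-v2-artifact>', class_name='MyClass')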

You should specify the model_path (URL of the model artifact/dir) and the class_name (or a full class path, module.submodule.class). Alternatively, you can set the model_url for calling a model that is served by another function (useful for ensembles).
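
As a hedged sketch of the model_url alternative (the endpoint URL is a placeholder): instead of loading the model in this function, requests for that model key are routed to an endpoint served elsewhere:

# hypothetical: 'm2' is served by another function; route requests to its endpoint
fn.add_model('m2', model_url='<remote-model-endpoint-url>')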

The function object (fn) accepts many options. You can specify a replica range (for auto-scaling), CPU/GPU/memory resources, shared volume mounts, secrets, and any other Kubernetes resource through the fn.spec object or fn methods.

For example, fn.gpus(1) means each replica uses one GPU.
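
As an illustrative sketch (the values are arbitrary), scaling and resources might be configured like this:

# auto-scaling bounds for the serving replicas
fn.spec.min_replicas = 1
fn.spec.max_replicas = 4

# per-replica resource requests and limits
fn.with_requests(mem='1G', cpu=1)
fn.with_limits(mem='2G', cpu=2)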

To deploy a model, simply call:

fn.deploy()
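
Once the function is deployed, you can send a request to the live endpoint with the function's invoke() helper, reusing the test payload from the mock-server example above:

fn.invoke('/v2/models/m1/infer', body=my_data)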

You can also deploy a model from within an ML pipeline (check the various demos for details).