Model monitoring tutorial#

This tutorial illustrates the basic model monitoring capabilities of MLRun: deploying a model to a live endpoint and calculating data drift.


Enable model monitoring on the project#

Enable model monitoring for a project with enable_model_monitoring(). By default, the controller runs every 10 minutes, which is also the minimum interval supported in production environments. Set a different frequency with the base_period parameter. To change base_period later, call update_model_monitoring_controller().

import mlrun
import os
import uuid

project_name = "tutorial"
project = mlrun.get_or_create_project(project_name, "./")
> 2024-10-20 08:49:27,495 [info] Project loaded successfully: {"project_name":"tutorial"}
project.set_model_monitoring_credentials(None, "v3io", "v3io")
# Enable monitoring on the project; the warning below implies a base_period under 10 minutes
project.enable_model_monitoring(base_period=...)
> 2024-10-20 08:49:27,599 [warning] enable_model_monitoring: 'base_period' < 10 minutes is not supported in production environments: {"project":"tutorial"}

Log the model with training data#

See the parameter descriptions in log_model(). Download the pickle file used in this example.

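# Download the training set
import pandas as pd

train_set = pd.read_csv(...)  # path or URL of the training set

# Log the model together with its training set;
# model_file is the downloaded pickle file
model_name = "RandomForestClassifier"
project.log_model(model_name, model_file=..., training_set=train_set)
<mlrun.artifacts.model.ModelArtifact at 0x7f83dbbdbbe0>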

Import, enable monitoring, and deploy the serving function#

Use the v2_model_server serving function from the MLRun function hub.

Add the model to the serving function's routing spec (add_model()), enable monitoring on the serving function (set_tracking()), and then deploy the function (deploy_function()).

The result of this step is that the model-monitoring stream pod writes data to Parquet, by model endpoint. Every base period, the controller checks for new data and, if it finds any, sends it to the relevant app.

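# Import the serving function
serving_fn = mlrun.import_function(
    "hub://v2_model_server", project=project_name, new_name="serving"
)

# Add the logged model to the function's routing spec
serving_fn.add_model(
    model_name, model_path="store://models/tutorial/RandomForestClassifier:latest"
)

# Enable monitoring on this serving function
serving_fn.set_tracking()

# Pin the scikit-learn version used when building the serving image
serving_fn.spec.build.requirements = ["scikit-learn~=1.5.1"]

# Deploy the serving function
project.deploy_function(serving_fn)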
> 2024-10-20 08:49:30,631 [info] Starting remote function deploy
2024-10-20 08:49:30  (info) Deploying function
2024-10-20 08:49:30  (info) Building
2024-10-20 08:49:31  (info) Staging files and preparing base images
2024-10-20 08:49:31  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:49:31  (info) Building processor image
2024-10-20 08:50:36  (info) Build complete
2024-10-20 08:50:44  (info) Function deploy complete
> 2024-10-20 08:50:51,999 [info] Successfully deployed function: {"external_invocation_urls":[""],"internal_invocation_urls":["nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': '', 'name': 'tutorial-serving'})
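
The controller behavior described in this section (check for new data every base period and pass it on to the apps, per model endpoint) can be pictured with a small, self-contained sketch. This is only a conceptual model in plain Python, not MLRun's implementation: the function name run_controller_cycle, the record fields, and the callable "apps" are all hypothetical.

```python
from collections import defaultdict


def run_controller_cycle(records, last_seen, apps):
    """One polling cycle: dispatch records newer than the last processed
    timestamp of each endpoint to every registered app."""
    new_by_endpoint = defaultdict(list)
    for rec in records:
        # Keep only records that arrived after the previous cycle
        if rec["timestamp"] > last_seen.get(rec["endpoint_id"], float("-inf")):
            new_by_endpoint[rec["endpoint_id"]].append(rec)

    for endpoint_id, batch in new_by_endpoint.items():
        for app in apps:
            app(endpoint_id, batch)
        # Remember how far this endpoint has been processed
        last_seen[endpoint_id] = max(r["timestamp"] for r in batch)

    return sorted(new_by_endpoint)


# Hypothetical stand-in for a monitoring app: it only records what it receives
received = []
apps = [lambda endpoint_id, batch: received.append((endpoint_id, len(batch)))]

last_seen = {}
records = [
    {"endpoint_id": "ep-1", "timestamp": 1.0, "inputs": [5.1, 3.5, 1.4, 0.2]},
    {"endpoint_id": "ep-1", "timestamp": 2.0, "inputs": [6.2, 2.9, 4.3, 1.3]},
]

first = run_controller_cycle(records, last_seen, apps)   # new data is dispatched
second = run_controller_cycle(records, last_seen, apps)  # nothing new, no dispatch
```

The second cycle dispatches nothing because no record is newer than the last one already processed, which mirrors the "if it finds any" condition in the text.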

Invoke the model#

Invoke the model function with invoke().

import json
from time import sleep
from random import choice

iris_data = pd.read_csv(...)  # path or URL of the data to send for prediction
iris_data = iris_data.to_numpy().tolist()

model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")

for i in range(1000):
    data_point = choice(iris_data)
    # Send one randomly chosen row to the model's infer endpoint
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    sleep(choice([0.01, 0.04]))
> 2024-10-20 08:50:53,016 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,546 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,595 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,645 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,730 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,783 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,835 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
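
To make the request shape used in the loop above easier to see, here is a minimal sketch that builds the same path and JSON body without calling the serving function. The helper name build_infer_request and the sample feature values are hypothetical. The "inputs" list wrapper suggests that several rows can share one request, but that is an assumption here, not something this tutorial demonstrates.

```python
import json


def build_infer_request(model_name, data_points):
    """Return the (path, body) pair for an infer call on the given model."""
    path = f"v2/models/{model_name}/infer"
    # "inputs" holds a list of rows; each row is a list of feature values
    body = json.dumps({"inputs": list(data_points)})
    return path, body


# Hypothetical iris-like row, used only for illustration
path, body = build_infer_request("RandomForestClassifier", [[5.1, 3.5, 1.4, 0.2]])
print(path)  # v2/models/RandomForestClassifier/infer
print(body)  # {"inputs": [[5.1, 3.5, 1.4, 0.2]]}
```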

After invoking the model, you can see the model endpoints and minimal metadata (for example, the last prediction) in the Models | Model Endpoints page.


You can also see the basic statistics in Grafana.

Register and deploy the model monitoring app#

The next step is to deploy the model-monitoring job to generate the full metadata. Add the monitoring function to the project with set_model_monitoring_function(), then deploy it with deploy_function().

This tutorial illustrates two monitoring apps:

  • The first is the default monitoring app.

  • The second integrates Evidently as an MLRun function to create MLRun artifacts.

Learn how to write your own app in Writing a model monitoring application.

After you deploy the jobs, they appear in the UI under Real-time functions (Nuclio).
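
To make the idea of data drift concrete, the following sketch compares the distribution of one feature in the training set with the same feature in live requests. It uses total variation distance over fixed histogram bins, chosen here purely for illustration: this tutorial does not state which metrics the monitoring apps compute, and the function names, bin edges, and sample values are all hypothetical.

```python
def histogram(values, edges):
    """Return the fraction of values in each bin; the last bin includes its
    right edge, and values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def total_variation_distance(p, q):
    """Half the L1 distance between two histograms: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


edges = [0.0, 2.0, 4.0, 6.0, 8.0]
training = [1.0, 1.5, 3.0, 3.5, 5.0, 5.5, 7.0, 7.5]
live_same = list(training)                                 # same distribution
live_shifted = [5.0, 5.5, 6.1, 6.5, 7.0, 7.2, 7.5, 7.9]    # shifted upward

reference = histogram(training, edges)
no_drift = total_variation_distance(reference, histogram(live_same, edges))
drift = total_variation_distance(reference, histogram(live_shifted, edges))
print(no_drift, drift)
```

A score near 0 means the live data looks like the training data, while a larger score signals that the feature distribution has moved.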

Default monitoring app#

First download the demo_app.

my_app = project.set_model_monitoring_function(
    ...  # arguments that point to the downloaded demo_app
)

project.deploy_function(my_app)

> 2024-10-20 08:51:56,847 [info] Starting remote function deploy
2024-10-20 08:51:57  (info) Deploying function
2024-10-20 08:51:57  (info) Building
2024-10-20 08:51:57  (info) Staging files and preparing base images
2024-10-20 08:51:57  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:51:57  (info) Building processor image
2024-10-20 08:53:43  (info) Build complete
2024-10-20 08:54:01  (info) Function deploy complete
> 2024-10-20 08:54:09,038 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-tutorial-myapp.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-tutorial-myapp.default-tenant.svc.cluster.local:8080', 'name': 'tutorial-myapp'})

Evidently app#

You can use the MLRun built-in class, EvidentlyModelMonitoringApplicationBase, to integrate Evidently as an MLRun function and create MLRun artifacts.

First download the evidently_app.

# register the second app named "evidently_app"
import os
import uuid

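my_evidently_app = project.set_model_monitoring_function(
    ...  # arguments that point to the downloaded evidently_app
)

project.deploy_function(my_evidently_app)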

> 2024-10-20 08:54:09,190 [info] Starting remote function deploy
2024-10-20 08:54:09  (info) Deploying function
2024-10-20 08:54:09  (info) Building
2024-10-20 08:54:09  (info) Staging files and preparing base images
2024-10-20 08:54:09  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:54:09  (info) Building processor image
2024-10-20 08:56:34  (info) Build complete
2024-10-20 08:56:57  (info) Function deploy complete
> 2024-10-20 08:57:01,573 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-tutorial-myevidentlyapp.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-tutorial-myevidentlyapp.default-tenant.svc.cluster.local:8080', 'name': 'tutorial-myevidentlyapp'})

Invoke the model again#

Every base_period, the controller checks for new data to send to the apps. Invoking the model a second time ensures that the previous monitoring window has closed, so the data covers a full window. The controller checks the Parquet DB every 10 minutes by default (or at a longer, user-configured interval) and streams any new data to the apps.

model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")

for i in range(150):
    data_point = choice(iris_data)
    # Send one randomly chosen row to the model's infer endpoint
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    sleep(choice([0.01, 0.04]))
> 2024-10-20 09:14:40,877 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,426 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,471 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,545 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,590 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,664 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
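
The reason for the second round of requests can be illustrated with a small sketch of fixed monitoring windows. It assumes that a window counts as closed once a request has arrived at or after the window's end. That rule is an assumption made for this example, not a statement about how the MLRun controller decides. The function name closed_windows and the timestamps (in minutes) are hypothetical.

```python
def closed_windows(event_times, base_period):
    """Group event timestamps into fixed windows of base_period length and
    return only the windows that ended at or before the latest event."""
    if not event_times:
        return {}
    latest = max(event_times)
    windows = {}
    for t in event_times:
        start = (t // base_period) * base_period
        windows.setdefault(start, []).append(t)
    return {
        (start, start + base_period): times
        for start, times in sorted(windows.items())
        if start + base_period <= latest
    }


base_period = 10  # minutes

# After the first round of requests, the only window is still open
after_first_round = closed_windows([1, 3, 8], base_period)

# A later round of requests closes the first window
after_second_round = closed_windows([1, 3, 8, 25, 26], base_period)
print(after_first_round)   # {}
print(after_second_round)  # {(0, 10): [1, 3, 8]}
```

Under this assumption, the first batch of requests only becomes a complete window once later traffic arrives, which is why the tutorial invokes the model again before viewing the results.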

View model monitoring artifacts and drift in the UI#

Now you can view the application results.


And if you've used Evidently:


And an example from the various graphs:


For more information on the UI, see Model monitoring using the platform UI.

View model monitoring artifacts and drift in Grafana#

Monitoring details:


And drift and operational metrics over time:


All of the Grafana dashboards are described in View model monitoring results in Grafana.

Batch infer model-monitoring#

You can use the batch function (stored in the function hub) to evaluate data against your logged model without disturbing the deployed model, for example, for a one-time evaluation of new data.

See more in Batch inference and Batch inference and drift detection.