Model monitoring tutorial#

This tutorial illustrates the basic model monitoring capabilities of MLRun: deploying a model to a live endpoint and calculating data drift.

Prerequisites#

Enable model monitoring on the project#

Enable model monitoring for a project with enable_model_monitoring(). By default, the controller runs every 10 minutes, which is also the minimum interval supported in production. You can modify the frequency with the base_period parameter; to change the base_period after monitoring is enabled, call update_model_monitoring_controller.

import mlrun
import os
import uuid

project_name = "tutorial"
project = mlrun.get_or_create_project(project_name, "./")
> 2024-10-20 08:49:27,495 [info] Project loaded successfully: {"project_name":"tutorial"}
project.set_model_monitoring_credentials(None, "v3io", "v3io")
project.enable_model_monitoring(base_period=2)
> 2024-10-20 08:49:27,599 [warning] enable_model_monitoring: 'base_period' < 10 minutes is not supported in production environments: {"project":"tutorial"}
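If you later need a different controller interval, you can update it in place. A minimal sketch, assuming monitoring was already enabled on the project as above (the base_period value here is only illustrative):

```python
import mlrun

# Load the existing project and set the controller interval back to
# the production-safe default of 10 minutes.
project = mlrun.get_or_create_project("tutorial", "./")
project.update_model_monitoring_controller(base_period=10)
```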

Log the model with training data#

See the parameter descriptions in log_model(). Download the pickle file used in this example.

# Download the training set
import pandas as pd

train_set = pd.read_csv(
    "https://s3.us-east-1.wasabisys.com/iguazio/data/iris/iris_dataset.csv"
)
model_name = "RandomForestClassifier"
project.log_model(
    model_name,
    model_file="src/model.pkl",
    training_set=train_set,
    framework="sklearn",
    label_column="label",
)
<mlrun.artifacts.model.ModelArtifact at 0x7f83dbbdbbe0>

Import, enable monitoring, and deploy the serving function#

Use the v2_model_server serving function from the MLRun function hub.

Add the model to the serving function's routing spec (add_model()), enable monitoring on the serving function (set_tracking()), and then deploy the function (deploy_function()).

The result of this step is that the model-monitoring stream pod writes data to Parquet, by model endpoint. Every base_period, the controller checks for new data and, if any is found, sends it to the relevant app.
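Conceptually, this windowing works like slicing a time-indexed event log into fixed intervals. The sketch below illustrates the idea with pandas; it is only an analogy, not the actual MLRun implementation:

```python
import pandas as pd

# Hypothetical prediction events with timestamps, standing in for the
# records that the stream pod writes to Parquet per model endpoint.
events = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-10-20 08:50", periods=6, freq="1min"),
        "prediction": [0, 1, 1, 2, 0, 1],
    }
)

# Slice the accumulated records into fixed windows of base_period minutes;
# each closed window would be forwarded to the monitoring apps.
base_period = 2  # minutes
for window_start, batch in events.set_index("timestamp").resample(f"{base_period}min"):
    print(window_start, len(batch), "events")
```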

# Import the serving function
serving_fn = mlrun.import_function(
    "hub://v2_model_server", project=project_name, new_name="serving"
)

serving_fn.add_model(
    model_name, model_path="store://models/tutorial/RandomForestClassifier:latest"
)

# enable monitoring on this serving function
serving_fn.set_tracking()

serving_fn.spec.build.requirements = ["scikit-learn~=1.5.1"]

# Deploy the serving function
project.deploy_function(serving_fn)
> 2024-10-20 08:49:30,631 [info] Starting remote function deploy
2024-10-20 08:49:30  (info) Deploying function
2024-10-20 08:49:30  (info) Building
2024-10-20 08:49:31  (info) Staging files and preparing base images
2024-10-20 08:49:31  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:49:31  (info) Building processor image
2024-10-20 08:50:36  (info) Build complete
2024-10-20 08:50:44  (info) Function deploy complete
> 2024-10-20 08:50:51,999 [info] Successfully deployed function: {"external_invocation_urls":["tutorial-serving.default-tenant.app.vmdev94.lab.iguazeng.com/"],"internal_invocation_urls":["nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://tutorial-serving.default-tenant.app.vmdev94.lab.iguazeng.com/', 'name': 'tutorial-serving'})

Invoke the model#

Invoke the model function with invoke().

import json
from time import sleep
from random import choice

iris_data = pd.read_csv(
    "https://s3.us-east-1.wasabisys.com/iguazio/data/iris/iris_to_predict.csv"
)
iris_data = iris_data.to_numpy().tolist()

model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")

for i in range(1000):
    data_point = choice(iris_data)
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    sleep(choice([0.01, 0.04]))
> 2024-10-20 08:50:53,016 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,546 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,595 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,645 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,730 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,783 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,835 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}

After invoking the model, you can see the model endpoints and minimal metadata (for example, the last prediction) in the Models | Model Endpoints page.

../_images/model_endpoint_1.png

You can also see the basic statistics in Grafana.
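The kind of per-endpoint statistics surfaced in the UI and Grafana (request count, average latency, last prediction) can be illustrated with plain Python over a list of invocation records. The records below are hypothetical stand-ins; in MLRun this data is collected automatically per model endpoint:

```python
# Hypothetical invocation records -- an illustration of the statistics
# involved, not the MLRun API.
records = [
    {"latency_ms": 12.0, "prediction": 0},
    {"latency_ms": 15.0, "prediction": 1},
    {"latency_ms": 11.0, "prediction": 1},
    {"latency_ms": 14.0, "prediction": 2},
]

request_count = len(records)
avg_latency_ms = sum(r["latency_ms"] for r in records) / request_count
last_prediction = records[-1]["prediction"]

print(request_count, avg_latency_ms, last_prediction)
```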

Register and deploy the model monitoring app#

The next step is to deploy the model-monitoring apps that generate the full metadata. Add each monitoring function to the project using set_model_monitoring_function(). Then, deploy the function using deploy_function().

This tutorial illustrates two monitoring apps:

  • The first is the default monitoring app.

  • The second integrates Evidently as an MLRun function to create MLRun artifacts.

Learn how to write your own app in Writing a model monitoring application.
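At its core, a drift-detection app compares the distribution of a feature in the current monitoring window against its distribution in the training set. The self-contained sketch below shows two common distance metrics (total variation and Hellinger distance) computed over matching histograms; it illustrates the math only, not the MLRun application API:

```python
import numpy as np


def to_histogram(values, bins):
    """Normalized histogram of `values` over fixed bin edges."""
    counts, _ = np.histogram(values, bins=bins)
    return counts / counts.sum()


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum())


rng = np.random.default_rng(0)
bins = np.linspace(0, 10, 21)

train_feature = rng.normal(5.0, 1.0, size=1000)  # training distribution
live_feature = rng.normal(6.0, 1.0, size=1000)   # shifted live distribution

p = to_histogram(train_feature, bins)
q = to_histogram(live_feature, bins)
print(total_variation(p, q), hellinger(p, q))
```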

After the apps are deployed, they appear in the UI under Real-time functions (Nuclio).

Default monitoring app#

First download the demo_app.

my_app = project.set_model_monitoring_function(
    func="src/demo_app.py",
    application_class="DemoMonitoringApp",
    name="myApp",
)

project.deploy_function(my_app)
> 2024-10-20 08:51:56,847 [info] Starting remote function deploy
2024-10-20 08:51:57  (info) Deploying function
2024-10-20 08:51:57  (info) Building
2024-10-20 08:51:57  (info) Staging files and preparing base images
2024-10-20 08:51:57  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:51:57  (info) Building processor image
2024-10-20 08:53:43  (info) Build complete
2024-10-20 08:54:01  (info) Function deploy complete
> 2024-10-20 08:54:09,038 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-tutorial-myapp.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-tutorial-myapp.default-tenant.svc.cluster.local:8080', 'name': 'tutorial-myapp'})

Evidently app#

You can use the MLRun built-in class, EvidentlyModelMonitoringApplicationBase, to integrate Evidently as an MLRun function and create MLRun artifacts.

First download the evidently_app.

# Register the second app, named "MyEvidentlyApp"
import os
import uuid

my_evidently_app = project.set_model_monitoring_function(
    func="src/evidently_app.py",
    image="mlrun/mlrun",
    requirements=[
        "evidently~=0.4.32",
    ],
    name="MyEvidentlyApp",
    application_class="DemoEvidentlyMonitoringApp",
    evidently_workspace_path=os.path.abspath(
        "/v3io/projects/set_model_monitoring_function/artifacts/evidently_workspace"
    ),
    evidently_project_id=str(uuid.uuid4()),
)

project.deploy_function(my_evidently_app)
> 2024-10-20 08:54:09,190 [info] Starting remote function deploy
2024-10-20 08:54:09  (info) Deploying function
2024-10-20 08:54:09  (info) Building
2024-10-20 08:54:09  (info) Staging files and preparing base images
2024-10-20 08:54:09  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:54:09  (info) Building processor image
2024-10-20 08:56:34  (info) Build complete
2024-10-20 08:56:57  (info) Function deploy complete
> 2024-10-20 08:57:01,573 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-tutorial-myevidentlyapp.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-tutorial-myevidentlyapp.default-tenant.svc.cluster.local:8080', 'name': 'tutorial-myevidentlyapp'})

Invoke the model again#

Every base_period (10 minutes by default, user-configurable), the controller checks the Parquet DB for new data and streams any it finds to the monitoring apps. Invoking the model a second time ensures that the previous window has closed, so the data sent to the apps covers a full monitoring window.

model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")

for i in range(150):
    data_point = choice(iris_data)
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    sleep(choice([0.01, 0.04]))
> 2024-10-20 09:14:40,877 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,426 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,471 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,545 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,590 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,664 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}

View model monitoring artifacts and drift in the UI#

Now you can view the application results.

../_images/mm-myapp.png

And if you've used Evidently:

../_images/mm-logger-dashb-evidently.png

And an example from one of the graphs:

../_images/mm-evidently.png

For more information on the UI, see Model monitoring using the platform UI.

View model monitoring artifacts and drift in Grafana#

Monitoring details:

grafana_dashboard_2

And drift and operational metrics over time:

grafana_dashboard_3

All of the Grafana dashboards are described in View model monitoring results in Grafana.

Batch infer model-monitoring#

You can use the batch inference function (stored in the function hub) to evaluate data against your logged model without disturbing the live model, for example, for a one-time evaluation of new data.
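A sketch of how such a run might look. The hub function name (batch_inference_v2) and its parameter names below are assumptions; check the function hub entry for the exact signature:

```python
import mlrun

# Sketch only: import the batch inference function from the hub and run it
# against the logged model -- the live serving function is not touched.
project = mlrun.get_or_create_project("tutorial", "./")
batch_fn = mlrun.import_function("hub://batch_inference_v2", project="tutorial")

project.run_function(
    batch_fn,
    inputs={
        "dataset": "https://s3.us-east-1.wasabisys.com/iguazio/data/iris/iris_to_predict.csv"
    },
    params={
        "model_path": "store://models/tutorial/RandomForestClassifier:latest",
        "perform_drift_analysis": True,
    },
)
```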

See more in Batch inference and Batch inference and drift detection.