Model monitoring tutorial#
This tutorial illustrates the basic model monitoring capabilities of MLRun: deploying a model to a live endpoint and calculating data drift.
In this tutorial:
- Enable model monitoring on the project
- Log the model with training data
- Import, enable monitoring, and deploy the serving function
- Invoke the model
- Register and deploy the model monitoring apps
- Invoke the model again
- View model monitoring artifacts and drift in the UI and in Grafana
- Batch infer model monitoring
Prerequisites#
Enable model monitoring on the project#
Enable model monitoring for a project with enable_model_monitoring(). The controller runs, by default, every 10 minutes, which is also the minimum interval. You can modify the frequency with the base_period parameter. To change the base_period after the controller is running, call update_model_monitoring_controller().
import mlrun
import os
import uuid
project_name = "tutorial"
project = mlrun.get_or_create_project(project_name, "./")
> 2024-10-20 08:49:27,495 [info] Project loaded successfully: {"project_name":"tutorial"}
project.set_model_monitoring_credentials(None, "v3io", "v3io")
project.enable_model_monitoring(base_period=2)
> 2024-10-20 08:49:27,599 [warning] enable_model_monitoring: 'base_period' < 10 minutes is not supported in production environments: {"project":"tutorial"}
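If you later need to change the frequency on a project where the controller is already running, a minimal sketch (assuming the base_period parameter of update_model_monitoring_controller(), in minutes):

# Raise the controller frequency to the 10-minute production minimum
project.update_model_monitoring_controller(base_period=10)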
Log the model with training data#
See the parameter descriptions in log_model(). Download the pickle file used in this example.
# Download the training set
import pandas as pd

train_set = pd.read_csv(
    "https://s3.us-east-1.wasabisys.com/iguazio/data/iris/iris_dataset.csv"
)

model_name = "RandomForestClassifier"

project.log_model(
    model_name,
    model_file="src/model.pkl",
    training_set=train_set,
    framework="sklearn",
    label_column="label",
)
<mlrun.artifacts.model.ModelArtifact at 0x7f83dbbdbbe0>
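As a quick sanity check, you can list the model artifacts now registered on the project. A minimal sketch (list_models() and the metadata fields shown here are assumed to behave as in recent MLRun versions):

# Verify that the model artifact was registered on the project
for model in project.list_models():
    print(model.metadata.key, model.metadata.tag)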
Import, enable monitoring, and deploy the serving function#
Use the v2_model_server serving function from the MLRun function hub.
Add the model to the serving function's routing spec (add_model()), enable monitoring on the serving function (set_tracking()), and then deploy the function (deploy_function()).

The result of this step is that the model-monitoring stream pod writes data to Parquet, by model endpoint. Every base period, the controller checks for new data and, if it finds any, sends it to the relevant app.
# Import the serving function
serving_fn = mlrun.import_function(
    "hub://v2_model_server", project=project_name, new_name="serving"
)

serving_fn.add_model(
    model_name, model_path=f"store://models/{project_name}/{model_name}:latest"
)
# enable monitoring on this serving function
serving_fn.set_tracking()
serving_fn.spec.build.requirements = ["scikit-learn~=1.5.1"]
# Deploy the serving function
project.deploy_function(serving_fn)
> 2024-10-20 08:49:30,631 [info] Starting remote function deploy
2024-10-20 08:49:30 (info) Deploying function
2024-10-20 08:49:30 (info) Building
2024-10-20 08:49:31 (info) Staging files and preparing base images
2024-10-20 08:49:31 (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:49:31 (info) Building processor image
2024-10-20 08:50:36 (info) Build complete
2024-10-20 08:50:44 (info) Function deploy complete
> 2024-10-20 08:50:51,999 [info] Successfully deployed function: {"external_invocation_urls":["tutorial-serving.default-tenant.app.vmdev94.lab.iguazeng.com/"],"internal_invocation_urls":["nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://tutorial-serving.default-tenant.app.vmdev94.lab.iguazeng.com/', 'name': 'tutorial-serving'})
Invoke the model#
Invoke the model function with invoke().
import json
from time import sleep
from random import choice
iris_data = pd.read_csv(
    "https://s3.us-east-1.wasabisys.com/iguazio/data/iris/iris_to_predict.csv"
)
iris_data = iris_data.to_numpy().tolist()

model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")

for i in range(1000):
    data_point = choice(iris_data)
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    sleep(choice([0.01, 0.04]))
> 2024-10-20 08:50:53,016 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,546 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,595 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,645 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,730 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,783 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 08:50:53,835 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
After invoking the model, you can see the model endpoints and minimal metadata (for example, the last prediction) on the Models | Model Endpoints page. You can also see the basic statistics in Grafana.
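To query the endpoints programmatically, a minimal sketch (the list_model_endpoints() call exists on the run DB client, but the return type and field names shown here vary across MLRun versions and are assumptions):

# List the model endpoints that monitoring tracks for this project
endpoints = mlrun.get_run_db().list_model_endpoints(project=project_name)
for endpoint in endpoints:
    print(endpoint.metadata.uid, endpoint.spec.model)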
Register and deploy the model monitoring app#
The next step is to deploy the model-monitoring apps that generate the full metadata. Add the monitoring function to the project using set_model_monitoring_function(). Then, deploy the function using deploy_function().

This tutorial illustrates two monitoring apps:
- The first is the default monitoring app.
- The second integrates Evidently as an MLRun function to create MLRun artifacts.

Learn how to write your own app in Writing a model monitoring application.

After you deploy the apps, they appear in the UI under Real-time functions (Nuclio).
Default monitoring app#
First, download the demo_app.
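For orientation, here is a minimal sketch of the shape of such an app (a simplified outline, not the exact contents of the downloaded file; the class and enum names follow the ModelMonitoringApplicationBase interface):

# Sketch of a minimal monitoring app; see the downloaded
# src/demo_app.py for the actual implementation.
import mlrun.common.schemas.model_monitoring.constants as mm_constants
from mlrun.model_monitoring.applications import (
    ModelMonitoringApplicationBase,
    ModelMonitoringApplicationResult,
)


class DemoMonitoringApp(ModelMonitoringApplicationBase):
    def do_tracking(self, monitoring_context):
        # monitoring_context exposes the current window's data,
        # e.g. monitoring_context.sample_df
        return [
            ModelMonitoringApplicationResult(
                name="data_drift_test",
                value=2.15,
                kind=mm_constants.ResultKindApp.data_drift,
                status=mm_constants.ResultStatusApp.detected,
            )
        ]

The registration below points set_model_monitoring_function() at the downloaded file and this class name.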
my_app = project.set_model_monitoring_function(
    func="src/demo_app.py",
    application_class="DemoMonitoringApp",
    name="myApp",
)
project.deploy_function(my_app)
> 2024-10-20 08:51:56,847 [info] Starting remote function deploy
2024-10-20 08:51:57 (info) Deploying function
2024-10-20 08:51:57 (info) Building
2024-10-20 08:51:57 (info) Staging files and preparing base images
2024-10-20 08:51:57 (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:51:57 (info) Building processor image
2024-10-20 08:53:43 (info) Build complete
2024-10-20 08:54:01 (info) Function deploy complete
> 2024-10-20 08:54:09,038 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-tutorial-myapp.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-tutorial-myapp.default-tenant.svc.cluster.local:8080', 'name': 'tutorial-myapp'})
Evidently app#
You can use the MLRun built-in class, EvidentlyModelMonitoringApplicationBase, to integrate Evidently as an MLRun function and create MLRun artifacts. First, download the evidently_app.
# register the second app named "evidently_app"
import os
import uuid
my_evidently_app = project.set_model_monitoring_function(
    func="src/evidently_app.py",
    image="mlrun/mlrun",
    requirements=[
        "evidently~=0.4.32",
    ],
    name="MyEvidentlyApp",
    application_class="DemoEvidentlyMonitoringApp",
    evidently_workspace_path=os.path.abspath(
        f"/v3io/projects/{project_name}/artifacts/evidently_workspace"
    ),
    evidently_project_id=str(uuid.uuid4()),
)
project.deploy_function(my_evidently_app)
> 2024-10-20 08:54:09,190 [info] Starting remote function deploy
2024-10-20 08:54:09 (info) Deploying function
2024-10-20 08:54:09 (info) Building
2024-10-20 08:54:09 (info) Staging files and preparing base images
2024-10-20 08:54:09 (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-10-20 08:54:09 (info) Building processor image
2024-10-20 08:56:34 (info) Build complete
2024-10-20 08:56:57 (info) Function deploy complete
> 2024-10-20 08:57:01,573 [info] Successfully deployed function: {"external_invocation_urls":[],"internal_invocation_urls":["nuclio-tutorial-myevidentlyapp.default-tenant.svc.cluster.local:8080"]}
DeployStatus(state=ready, outputs={'endpoint': 'http://nuclio-tutorial-myevidentlyapp.default-tenant.svc.cluster.local:8080', 'name': 'tutorial-myevidentlyapp'})
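As a quick check that both apps were registered on the project, a minimal sketch (assuming list_model_monitoring_functions() returns the registered function objects):

# Confirm both monitoring apps are registered on the project
for fn in project.list_model_monitoring_functions():
    print(fn.metadata.name)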
Invoke the model again#
The controller checks every base_period for new data to send to the app. Invoking the model a second time ensures that the previous window has closed, so the data contains a full monitoring window. By default the controller checks the Parquet DB every 10 minutes (user-configurable to a higher value) and streams any new data to the app.
model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")
for i in range(150):
    data_point = choice(iris_data)
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    sleep(choice([0.01, 0.04]))
> 2024-10-20 09:14:40,877 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,426 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,471 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,545 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,590 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
> 2024-10-20 09:14:41,664 [info] Invoking function: {"method":"POST","path":"http://nuclio-tutorial-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer"}
View model monitoring artifacts and drift in the UI#
Now you can view the application results in the UI, including the Evidently results and the various graphs if you deployed the Evidently app. For more information on the UI, see Model monitoring using the platform UI.
View model monitoring artifacts and drift in Grafana#
The Grafana dashboards show the monitoring details, and drift and operational metrics over time.
All of the Grafana dashboards are described in View model monitoring results in Grafana.
Batch infer model-monitoring#
You can use the batch inference function (stored in the function hub) to evaluate data against your logged model without disturbing the model, for example, for a one-time evaluation of new data, as sketched below. See more in Batch inference and Batch inference and drift detection.
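A minimal sketch of such a run, assuming the hub's batch_inference_v2 function and its model_path, dataset, and perform_drift_analysis parameters:

# Sketch: one-time batch evaluation of new data, with drift analysis
batch_fn = mlrun.import_function("hub://batch_inference_v2")
project.run_function(
    batch_fn,
    inputs={
        "dataset": "https://s3.us-east-1.wasabisys.com/iguazio/data/iris/iris_to_predict.csv"
    },
    params={
        "model_path": f"store://models/{project_name}/{model_name}:latest",
        "perform_drift_analysis": True,
    },
)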