Model monitoring architecture#
Overview#
When you call enable_model_monitoring(), you effectively deploy three components:
- application controller function: handles the monitoring processing and triggers the apps that trigger the writer. The controller is a real-time Nuclio job whose frequency is determined by base_period.
- stream function: monitors the log of the data stream. It is triggered when a new log entry is detected. The monitored data is used to create real-time dashboards, detect drift, and analyze performance.
- writer function: writes the results and metrics output by the model monitoring applications to the databases, and outputs alerts according to the user configuration.
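As a hedged illustration of how these components come up (the project name and context path are placeholders; keyword arguments may differ between MLRun versions):

```python
import mlrun

# "my-project" and the context path are illustrative only.
project = mlrun.get_or_create_project("my-project", context="./")

# Deploys the application controller, stream, and writer functions.
# base_period (in minutes) sets how often the controller checks for new data.
project.enable_model_monitoring(base_period=10)
```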
The model monitoring process flow starts with collecting operational data from a function in the model serving pod. The model monitoring stream pod forwards the data to a Parquet database. MLRun supports integers, floats, strings, and images.
The controller periodically checks the Parquet DB for new data and forwards it to the relevant application.
Each model monitoring application is a separate Nuclio real-time function. Each one listens to a stream that is filled by the monitoring controller at each base_period interval.
The stream function examines the log entry and processes it into statistics, which are then written to the statistics databases (Parquet file, time-series database, and key-value database).
The monitoring stream function writes the Parquet files using a basic storey ParquetTarget. Additionally, there is a monitoring feature set that refers
to the same target.
APIs#
The model monitoring APIs are configured per project. The APIs are:
- enable_model_monitoring() — Brings up the controller, writer, and stream real-time functions, and schedules the controller according to the base_period. You can also deploy the default histogram-based data drift application when you enable model monitoring.
- create_model_monitoring_function() — Creates a monitoring function object without setting it to the project; used for user apps and troubleshooting.
- set_model_monitoring_function() — Updates or sets a monitoring function to the project. (Monitoring does not start until the function is deployed.)
- list_model_monitoring_functions() — Retrieves a list of all the model monitoring functions.
- remove_model_monitoring_function() — Removes the specified model-monitoring-app function from the project and from the DB.
- set_model_monitoring_credentials() — Sets the credentials that are used by the project's model monitoring infrastructure functions.
- disable_model_monitoring() — Disables the controller.
- update_model_monitoring_controller() — Redeploys the model monitoring application controller function.
- get_model_monitoring_file_target_path() — Gets the full path from the configuration based on the provided project and kind.
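As a rough, hedged sketch of how a few of these project-level calls fit together (keyword arguments and return types may differ between MLRun versions; the project name is a placeholder):

```python
import mlrun

project = mlrun.get_or_create_project("my-project", context="./")

# Bring up the controller, writer, and stream functions.
project.enable_model_monitoring(base_period=10)

# Later, change the controller schedule by redeploying it.
project.update_model_monitoring_controller(base_period=5)

# Inspect the monitoring functions registered on the project.
for fn in project.list_model_monitoring_functions():
    print(fn.metadata.name)

# Stop the controller when monitoring is no longer needed.
project.disable_model_monitoring()
```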
Model and model monitoring endpoints#
For each model that is served in a model serving function, there is a model endpoint. The model endpoint is associated with a feature set that manages the model endpoint statistics. See model endpoint.
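For example, you can query a project's model endpoints through the MLRun run-DB client. This is a hedged sketch: the exact return type and attribute names vary between MLRun versions, and "my-project" is a placeholder.

```python
import mlrun

db = mlrun.get_run_db()
endpoints = db.list_model_endpoints(project="my-project")

# Handle both a plain list and a wrapper object with an .endpoints attribute.
for endpoint in getattr(endpoints, "endpoints", endpoints):
    print(endpoint.metadata.uid, endpoint.spec.model)
```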
All model monitoring endpoints are presented in the UI with information about the actual inference, including data on the inputs, outputs, and results. The Model Endpoints tab presents the overall metrics. From there you can select an endpoint and view the Overview, Features Analysis, and Metrics tabs. Metrics are grouped under their applications. After you select the metrics and the timeframe, you get a histogram showing the number of occurrences per value range, and a timeline graph of the metric and the threshold. Any alerts are shown in the upper-right corner of the metrics box.
Selecting the streaming and TSDB platforms#
Model monitoring supports Kafka or V3IO as streaming platforms, and TDEngine or V3IO as TSDB platforms. In addition, internal model-monitoring metadata can be saved in MySQL or V3IO.
We recommend the following versions:
- TDEngine: 3.3.2.0
- MySQL: 8.0.39, or higher 8.0-compatible versions
Before you deploy the model monitoring or serving function, you need to set the credentials.
There are three credentials you can set, and each one can have a different value. For example:
stream_path = "kafka://<some_kafka_broker>:<port>" # or "v3io"
tsdb_connection = "taosws://<username>:<password>@<host>:<port>" # or "v3io"
endpoint_store_connection = (
"mysql+pymysql://<username>:<password>@<host>:<port>/<db_name>" # or "v3io"
)
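These values are typically passed to set_model_monitoring_credentials() before the monitoring and serving functions are deployed. A minimal sketch, reusing the variables above (the project name is a placeholder and keyword names follow recent MLRun versions):

```python
import mlrun

project = mlrun.get_or_create_project("my-project", context="./")

project.set_model_monitoring_credentials(
    stream_path=stream_path,
    tsdb_connection=tsdb_connection,
    endpoint_store_connection=endpoint_store_connection,
)
```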
Model monitoring applications#
When you call enable_model_monitoring on a project, by default MLRun deploys the monitoring app HistogramDataDriftApplication, which is tailored for classical ML models (not LLMs, gen AI, deep-learning models, etc.). It includes:
- Total Variation Distance (TVD) — The statistical difference between the actual predictions and the model's trained predictions.
- Hellinger Distance — A type of f-divergence that quantifies the similarity between the actual predictions and the model's trained predictions.
- The average of TVD & Hellinger as the general drift result.
- Kullback–Leibler Divergence (KLD) — The measure of how the probability distribution of the actual predictions differs from the model's trained reference probability distribution.
You can create your own model monitoring applications for LLMs, gen AI, deep-learning models, etc., based on the class mlrun.model_monitoring.applications.ModelMonitoringApplicationBaseV2(). See Writing a model monitoring application.
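As a hedged sketch of such an application (the do_tracking() hook, the sample_df context attribute, and the result classes shown here follow recent MLRun versions and may differ in yours; the metric computation is a placeholder):

```python
from mlrun.model_monitoring.applications import (
    ModelMonitoringApplicationBaseV2,
    ModelMonitoringApplicationResult,
    ResultKindApp,
    ResultStatusApp,
)


class MyCustomApp(ModelMonitoringApplicationBaseV2):
    def do_tracking(self, monitoring_context) -> ModelMonitoringApplicationResult:
        # monitoring_context exposes the sampled inference data for the current
        # window (e.g. as a DataFrame) plus the reference statistics.
        sample_df = monitoring_context.sample_df
        my_metric = float(len(sample_df))  # placeholder computation

        return ModelMonitoringApplicationResult(
            name="my_metric",
            value=my_metric,
            kind=ResultKindApp.data_drift,
            status=ResultStatusApp.no_detection,
        )
```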
You can also integrate Evidently
as an MLRun function and create MLRun artifacts, using the built-in class EvidentlyModelMonitoringApplicationBase
. See an example in Model monitoring tutorial.
Projects are used to group functions that use the same model monitoring application. You first need to create a project for a specific application. Then you disable the default app, enable your custom app, and create and run the functions.
The basic flow for classic ML and other models is the same, but the apps and the infer requests are different.
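A hedged sketch of that flow (the project name, my_app.py, and MyCustomApp are hypothetical; argument names may differ between MLRun versions):

```python
import mlrun

project = mlrun.get_or_create_project("my-genai-project", context="./")

# Enable the monitoring infrastructure without the default histogram drift app.
project.enable_model_monitoring(deploy_histogram_data_drift_app=False)

# Register the custom application and deploy it so it starts consuming
# the controller's application stream.
app_fn = project.set_model_monitoring_function(
    func="my_app.py",
    application_class="MyCustomApp",
    name="my-custom-app",
)
project.deploy_function(app_fn)
```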
Multi-port predictions#
Multi-port predictions involve generating multiple outputs or predictions at the same time from a single model or system. Each "port" can be thought of as a separate output channel that provides a distinct prediction or piece of information. This capability is particularly useful in scenarios where multiple related predictions are needed simultaneously. Multi-port predictions increase efficiency, reducing the time and computational resources required, and they provide a more holistic view of the data, enabling better decision-making and more accurate forecasting. For example, in a gen AI model, one port returns responses to prompts, one returns metadata, and a third returns images (see the sketch after the list below).
Multi-port predictions can be applied in several ways:
- Multi-task learning — A single model is trained to perform multiple tasks simultaneously, such as predicting different attributes of an object. For example, a model could predict both the age and gender of a person from a single image.
- Ensemble methods — Multiple models are combined to make predictions, and each model's output can be considered a separate port. The final prediction is often an aggregation of these individual outputs.
- Time series forecasting — In time series analysis, multi-port predictions can be used to forecast multiple future time points simultaneously, providing a more comprehensive view of future trends.
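The following toy sketch (not an MLRun API; all names are illustrative) shows the idea of one request producing several named output ports:

```python
from dataclasses import dataclass, field


@dataclass
class MultiPortPrediction:
    response: str                              # port 1: generated text
    metadata: dict                             # port 2: metadata about the generation
    image_refs: list[str] = field(default_factory=list)  # port 3: generated images


def predict(prompt: str) -> MultiPortPrediction:
    # Placeholder logic; a real model would populate all ports from one pass.
    return MultiPortPrediction(
        response=f"Answer to: {prompt}",
        metadata={"tokens": len(prompt.split())},
    )
```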
Batch inputs#
Processing data in batches allows for parallel computation, significantly speeding up the training and inference processes. This is especially important for large-scale models that require substantial computational resources. Batch inputs are used with CPUs and GPUs. For gen AI models, batch input is typically a list of prompts. For classic ML models, batch input is a list of features.
See an example of batch input in the Serving pre-trained ML/DL models tutorial.
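As a hedged illustration of a classic-ML batch request (the project name, function name "serving", model name "my-model", and feature values are placeholders):

```python
import mlrun

project = mlrun.get_or_create_project("my-project", context="./")
serving_fn = project.get_function("serving")  # a previously deployed serving function

# Each inner list is one sample's feature vector; the outer list is the batch.
batch_body = {"inputs": [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]}
response = serving_fn.invoke(path="/v2/models/my-model/infer", body=batch_body)
print(response)
```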
Alerts and notifications#
You can set up Alerts to inform you about suspected and detected issues in the model monitoring functions. And you can use Notifications to notify about the status of runs and pipelines.