MLRun CE development notes

This page contains notes for configuring your development system (after installation).

Change the deployment and jobs default PVC

A default PVC is created during the MLRun installation. If you set environment variables to change the PVC before importing MLRun, those values are overwritten. To change the PVC, run this code after importing MLRun:

import mlrun
mlrun.mlconf.storage.auto_mount_type = "pvc"
# Replace the placeholder strings with your own values
pvc_params = {
    "pvc_name": "<your-pvc-name>",
    "volume_name": "<volume-name>",
    "volume_mount_path": "<container mount path>",
}
mlrun.mlconf.storage.auto_mount_params = ",".join(
    [f"{key}={value}" for key, value in pvc_params.items()]
)
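
To see the exact string that the join above assigns to auto_mount_params, the following minimal sketch builds it without importing MLRun. The values data-pvc, data-volume, and /mnt/data are hypothetical placeholders, not MLRun defaults, and the helper name format_auto_mount_params is introduced here for illustration only.

```python
# Hypothetical example values; replace them with your own PVC details.
pvc_params = {
    "pvc_name": "data-pvc",
    "volume_name": "data-volume",
    "volume_mount_path": "/mnt/data",
}


def format_auto_mount_params(params: dict) -> str:
    """Join key=value pairs with commas, the same way as the snippet above."""
    return ",".join(f"{key}={value}" for key, value in params.items())


auto_mount_params = format_auto_mount_params(pvc_params)
print(auto_mount_params)
# pvc_name=data-pvc,volume_name=data-volume,volume_mount_path=/mnt/data
```

Because the pairs are separated by commas, this simple format assumes that none of the values contains a comma or an equals sign.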

Configuring the user Jupyter conda environment

The default Jupyter comes with a conda environment named mlrun. This environment is not persistent: any packages that you install in it are lost when the Jupyter pod is restarted or deleted.

To create a new, persistent environment, run the following commands in your Jupyter terminal, replacing <myenv> with the name of your environment:

# Create the virtual environment
conda create -n <myenv> python==3.11 -y

# Activate the virtual environment
conda activate <myenv>

# Make sure that ipykernel is installed
pip install --user ipykernel

# Add the new virtual environment to Jupyter
python -m ipykernel install --user --name <myenv> --display-name "Python (<myenv>)"
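
After selecting the new kernel in a notebook, you can check that the notebook is actually running in the new environment. The sketch below uses only the Python standard library. The helper name active_env_name is hypothetical, and it assumes the usual conda layout, in which the environment name is the last component of the interpreter prefix.

```python
import os
import sys


def active_env_name() -> str:
    """Return the last path component of the running interpreter's prefix.

    For a conda environment this is normally the environment name
    (an assumption about the standard conda directory layout).
    """
    return os.path.basename(os.path.normpath(sys.prefix))


# Run this in a notebook cell that uses the new kernel.
print("Interpreter:", sys.executable)
print("Environment:", active_env_name())
```

If the printed environment is still mlrun, the notebook is most likely using the default kernel rather than the newly registered one.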

Configuring TimescaleDB and Kafka for model monitoring

TimescaleDB and Kafka are part of the default CE installation and are used for model monitoring.

TimescaleDB is a PostgreSQL-based time-series database used as the TSDB backend for model monitoring. Default connection values for CE:

  • Host: timescaledb.<namespace>.svc.cluster.local (where <namespace> is the Kubernetes namespace of your installation)

  • Port: 5432

  • Database: postgres

  • User: postgres

  • Password: postgres

Kafka is the streaming platform used for data flow between model monitoring components. Default connection values for CE:

  • Brokers: kafka-stream.<namespace>.svc.cluster.local:9092
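
These addresses follow the standard Kubernetes pattern for in-cluster service DNS names, <service>.<namespace>.svc.cluster.local. As a minimal sketch, the helpers below assemble the addresses and a PostgreSQL connection URI from the default values listed above. The namespace mlrun and the helper names service_host and postgres_uri are assumptions for illustration; use the namespace of your own installation.

```python
from urllib.parse import quote

NAMESPACE = "mlrun"  # assumption: replace with the namespace of your installation


def service_host(service: str, namespace: str) -> str:
    """Build the in-cluster DNS name of a Kubernetes service."""
    return f"{service}.{namespace}.svc.cluster.local"


def postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL connection URI, URL-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


tsdb_host = service_host("timescaledb", NAMESPACE)
kafka_brokers = f"{service_host('kafka-stream', NAMESPACE)}:9092"
tsdb_uri = postgres_uri("postgres", "postgres", tsdb_host, 5432, "postgres")

print(tsdb_uri)
# postgresql://postgres:postgres@timescaledb.mlrun.svc.cluster.local:5432/postgres
print(kafka_brokers)
# kafka-stream.mlrun.svc.cluster.local:9092
```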

Configuring data store profiles

Connections are managed through data store profiles, which store the connection credentials securely.

from mlrun.datastore.datastore_profile import (
    DatastoreProfileKafkaStream,
    DatastoreProfilePostgreSQL,
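)

# Assumes that `project` is an existing MLRun project object, and that
# tsdb_profile_name and stream_profile_name are strings of your choice.
# Create and register TSDB profile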
tsdb_profile = DatastoreProfilePostgreSQL(
    name=tsdb_profile_name,
    user="postgres",
    password="postgres",
    host="timescaledb",
    port=5432,
    database="postgres",
)
project.register_datastore_profile(tsdb_profile)
# Create and register stream profile
stream_profile = DatastoreProfileKafkaStream(
    name=stream_profile_name,
    brokers="kafka-stream:9092",
    topics=[],
)
project.register_datastore_profile(stream_profile)
# Set model monitoring credentials and enable the infrastructure
project.set_model_monitoring_credentials(
    tsdb_profile_name=tsdb_profile.name,
    stream_profile_name=stream_profile.name,
)
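
If the profiles register but model monitoring cannot connect, a quick first check is whether the TimescaleDB and Kafka services are reachable from the notebook. The sketch below is a generic TCP check that uses only the Python standard library; it is not an MLRun API. The helper name is_reachable is hypothetical, and the host names are the short service names used in the profiles above.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Short service names, as used in the profiles above; these resolve
    # only from inside the same Kubernetes namespace.
    for name, host, port in [
        ("TimescaleDB", "timescaledb", 5432),
        ("Kafka", "kafka-stream", 9092),
    ]:
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{name} ({host}:{port}): {status}")
```

An open TCP port only shows that the service is listening; it does not validate the credentials stored in the profiles.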

See more details, including additional configuration options, in set_model_monitoring_credentials.