Using MLRun

MLRun is an open MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications. MLRun significantly reduces engineering efforts, time to production, and computation resources. With MLRun, you can choose any IDE on your local machine or on the cloud. MLRun breaks the silos between data, ML, software, and DevOps/MLOps teams, enabling collaboration and fast continuous improvements.

Get started with the MLRun Tutorials and examples or the Installation and setup guide.

This page explains how MLRun addresses the MLOps tasks and presents the MLRun core components.

MLOps tasks

Project management and CI/CD automation
Ingest and process data
Develop and train models
Deploy models and applications
Monitor and alert

The MLOps development workflow section describes the different tasks and stages in detail. MLRun can be used to automate and orchestrate all the different tasks or just specific tasks (and integrate them with what you have already deployed).

Project management and CI/CD automation

In MLRun, the assets, metadata, and services (data, functions, jobs, artifacts, models, secrets, etc.) are organized into projects. Projects can be imported or exported as a whole and mapped to Git repositories or IDE projects (in PyCharm, VSCode, etc.), which enables versioning, collaboration, and CI/CD. Project access can be restricted to a set of users and roles.

Docs: Projects and automation, CI/CD integration; Tutorials: Quick start, Automated ML pipeline; Videos: Quick start

Ingest and process data

MLRun provides abstract interfaces to various offline and online data sources, and supports batch or real-time data processing at scale, data lineage and versioning, structured and unstructured data, and more. In addition, the MLRun Feature store automates the collection, transformation, storage, catalog, serving, and monitoring of data features across the ML lifecycle and enables feature reuse and sharing.

Docs: Feature store, Data & artifacts; Tutorials: Quick start, Feature store
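
The offline/online split described above can be illustrated with a small in-memory sketch: each ingested record is transformed once, appended to an offline history (for training), and written to an online key-value view (for low-latency serving). This is plain Python and not the MLRun Feature store API; `ToyFeatureStore`, its methods, and the `order_features` transform are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


class ToyFeatureStore:
    """Hypothetical in-memory store; illustrates the concept only."""

    def __init__(self, transform: Callable[[dict], Features]) -> None:
        self._transform = transform
        self._offline: List[dict] = []          # full history, used for training sets
        self._online: Dict[str, Features] = {}  # latest features per entity, used for serving

    def ingest(self, entity_id: str, raw: dict) -> None:
        # Apply the same transformation for both targets so training and
        # serving see identically computed features.
        features = self._transform(raw)
        self._offline.append({"entity_id": entity_id, **features})
        self._online[entity_id] = features

    def get_offline_features(self) -> List[dict]:
        return list(self._offline)

    def get_online_features(self, entity_id: str) -> Features:
        return self._online[entity_id]


def order_features(raw: dict) -> Features:
    # Assumed raw schema for this example: {"amount_cents": int}
    return {
        "amount_usd": raw["amount_cents"] / 100,
        "is_large": float(raw["amount_cents"] >= 10_000),
    }


store = ToyFeatureStore(order_features)
store.ingest("user-1", {"amount_cents": 2500})
store.ingest("user-2", {"amount_cents": 15000})
store.ingest("user-1", {"amount_cents": 12000})

print(len(store.get_offline_features()))
print(store.get_online_features("user-1"))
```

The offline view keeps every row, while the online view keeps only the most recent feature vector per entity. A real feature store adds persistence, time-based joins, and aggregations on top of this idea.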

Develop and train models

MLRun allows you to easily build ML pipelines that take data from various sources or the Feature store and process it, train models at scale with multiple parameters, test models, track each experiment, and register, version, and deploy models. MLRun provides scalable built-in or custom model training services that integrate with any framework and can work with third-party training/auto-ML services. You can also bring your own pre-trained model and use it in the pipeline.

Docs: Model training and tracking, Batch runs and workflows; Tutorials: Train & eval models, Automated ML pipeline; Videos: Train & compare models

Deploy models and applications

MLRun rapidly deploys and manages production-grade real-time or batch application pipelines using elastic and resilient serverless functions. MLRun addresses the entire ML application: intercepting application/user requests, running data processing tasks, inferencing using one or more models, driving actions, and integrating with the application logic.

Docs: Realtime pipelines, Batch inference; Tutorials: Realtime serving, Batch inference, Advanced pipeline; Videos: Serve pre-trained models

Monitor and alert

Observability is built into the different MLRun objects (data, functions, jobs, models, pipelines, etc.), eliminating the need for complex integrations and code instrumentation. With MLRun, you can observe the application/model resource usage and model behavior (drift, performance, etc.), define custom app metrics, and trigger alerts or retraining jobs.

Docs: Model monitoring overview; Tutorials: Model monitoring & drift detection
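
As a rough illustration of what "drift" means in this context, the sketch below compares a feature's histogram at training time with its histogram in production, using total variation distance, and flags drift when the score exceeds a threshold. It is plain Python rather than MLRun's monitoring API; the function names, the bin edges, and the 0.2 threshold are assumptions made for the example.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Return the fraction of values falling in each bin defined by sorted edges."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def total_variation_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def detect_drift(
    reference: Sequence[float],
    current: Sequence[float],
    edges: Sequence[float],
    threshold: float = 0.2,  # assumed cutoff, tune per feature
) -> Tuple[float, bool]:
    score = total_variation_distance(histogram(reference, edges), histogram(current, edges))
    return score, score > threshold


training_sample = list(range(1, 11))    # values seen at training time
production_sample = list(range(6, 16))  # production values shifted upward
score, drifted = detect_drift(training_sample, production_sample, edges=[2.5, 5.0, 7.5])
print(f"drift score={score:.2f}, drifted={drifted}")
```

In a monitoring setup, a result such as `drifted == True` is the kind of signal that would trigger an alert or a retraining job.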

MLRun core components

MLRun includes the following major components:

Project management & automation (SDK, API, etc.)
Serverless functions
Data & artifacts
Feature store
Batch runs & workflows
Real-time pipelines
Monitoring

Project management: A service (API, SDK, DB, UI) that manages the different project assets (data, functions, jobs, workflows, secrets, etc.) and provides a central control and metadata layer.

Serverless functions: An automatically deployed software package with one or more methods and runtime-specific attributes (such as image, libraries, command, arguments, resources, etc.).
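
To make the "package plus runtime attributes" idea concrete, the sketch below models such a function as a plain data structure: named code entry points alongside the attributes listed above (image, libraries, command, arguments, resources). The `FunctionSpec` class and its fields are hypothetical and only mirror this description; they are not MLRun's runtime classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FunctionSpec:
    """Hypothetical descriptor: code entry points plus runtime attributes."""

    name: str
    image: str
    handlers: Dict[str, Callable[..., dict]]
    requirements: List[str] = field(default_factory=list)
    command: str = ""
    args: List[str] = field(default_factory=list)
    resources: Dict[str, str] = field(default_factory=dict)

    def invoke(self, handler: str, **params) -> dict:
        # Dispatch locally to one of the packaged methods. A real platform
        # would instead build the image and run the handler on a cluster.
        if handler not in self.handlers:
            raise KeyError(f"unknown handler: {handler}")
        return self.handlers[handler](**params)


def train(learning_rate: float = 0.1) -> dict:
    return {"learning_rate": learning_rate, "status": "trained"}


def evaluate(threshold: float = 0.5) -> dict:
    return {"threshold": threshold, "status": "evaluated"}


trainer = FunctionSpec(
    name="trainer",
    image="python:3.11",            # assumed image name
    handlers={"train": train, "evaluate": evaluate},
    requirements=["scikit-learn"],  # assumed library list
    resources={"cpu": "1", "memory": "2Gi"},
)
result = trainer.invoke("train", learning_rate=0.05)
print(result)
```

Keeping the code and its runtime attributes in one object is what lets the same function be versioned, moved between environments, and invoked with different handlers.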

Data & artifacts: Glueless connectivity to various data sources, metadata management, catalog, and versioning for structured/unstructured artifacts.

Feature store: Automatically collects, prepares, catalogs, and serves production data features for development (offline) and real-time (online) deployment using minimal engineering effort.

Batch runs & workflows: Execute one or more functions with specific parameters and collect, track, and compare all their results and artifacts.
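
The pattern described here, running one function over several parameter sets and then comparing the tracked results, can be sketched in plain Python as follows. `RunRecord`, `run_grid`, `best_run`, and the toy `toy_train` handler are hypothetical names used for illustration; MLRun's own run tracking records far more (artifacts, logs, run state) and is not shown.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class RunRecord:
    params: Dict[str, Any]
    results: Dict[str, float]


def run_grid(
    handler: Callable[..., Dict[str, float]],
    grid: Dict[str, Sequence[Any]],
) -> List[RunRecord]:
    """Execute handler once per parameter combination and record each outcome."""
    names = list(grid)
    records = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        records.append(RunRecord(params=params, results=handler(**params)))
    return records


def best_run(records: List[RunRecord], metric: str, maximize: bool = True) -> RunRecord:
    pick = max if maximize else min
    return pick(records, key=lambda record: record.results[metric])


def toy_train(learning_rate: float, depth: int) -> Dict[str, float]:
    # Stand-in for real training: a deterministic score that peaks
    # at learning_rate=0.1 and depth=4.
    return {"accuracy": 1.0 - abs(learning_rate - 0.1) - 0.01 * abs(depth - 4)}


records = run_grid(toy_train, {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4]})
best = best_run(records, "accuracy")
print(len(records), best.params)
```

Each record pairs the inputs with the outputs of one execution, which is the minimum needed to compare runs and pick a winner.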

Real-time serving pipeline: Rapid deployment of scalable data and ML pipelines using real-time serverless technology, including API handling, data preparation/enrichment, model serving, ensembles, driving and measuring actions, etc.

Real-time monitoring: Monitors data, models, resources, and production components and provides a feedback loop for exploring production data, identifying drift, alerting on anomalies or data quality issues, triggering retraining jobs, measuring business impact, etc.