What is MLRun?

The challenge

Most data science solutions and platforms on the market today start from, and therefore emphasize, the research workflow. When it comes time to integrate the resulting models into real-world AI applications, they have significant functionality gaps.

These types of solutions tend to be useful only for the model development flow and contribute very little to the production pipeline: automated data collection and preparation, automated training and evaluation pipelines, real-time application pipelines, data quality and model monitoring, feedback loops, etc.

To address the full ML application lifecycle, organizations commonly combine many different tools. This forces them to develop and maintain complex glue layers and introduce manual processes, and it creates technology silos that slow down developers and data scientists.

With this disjointed approach, the ML team must re-engineer the entire flow to fit production environments and methodologies while consuming excessive resources. Organizations need a way to streamline the process, automate as many tasks as possible, and break the silos between data, ML, and DevOps/MLOps teams.

MLRun: The Open Source MLOps Orchestration

Instead of this siloed, complex, and manual process, MLRun enables a modular approach to production pipeline design: the different parts combine into a continuous, automated, and far simpler path from research and development to scalable production pipelines, without refactoring code, adding glue logic, or spending significant effort on data and ML engineering.

MLRun uses serverless function technology: write the code once, using your preferred development environment and simple "local" semantics, and then run it as-is on different platforms and at scale. MLRun automates the build process, execution, data movement, scaling, versioning, parameterization, output tracking, CI/CD integration, deployment to production, monitoring, and more.

These data or ML "functions", which are easy to develop, can then be published to or loaded from a marketplace and later used to form offline or real-time production pipelines with minimal engineering effort.
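The "write once with local semantics" idea can be illustrated with a minimal, self-contained sketch: the user code is a plain handler that receives a context object for parameters and outputs, and a runner injects parameters and captures results. `LocalContext`, `train`, and `run_local` are hypothetical stand-ins written for this illustration only; they are not MLRun's API, and a real orchestrator would additionally build, scale, and version the function.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class LocalContext:
    """Hypothetical stand-in for the context an orchestrator hands to a function."""
    params: Dict[str, Any]
    results: Dict[str, Any] = field(default_factory=dict)

    def log_result(self, key: str, value: Any) -> None:
        # Record an output so it can be tracked after the run finishes.
        self.results[key] = value


def train(context: LocalContext, learning_rate: float = 0.1, epochs: int = 3) -> None:
    """User code written with plain local semantics: no cluster or I/O glue."""
    loss = 1.0
    for _ in range(epochs):
        loss *= 1 - learning_rate  # toy "training" that shrinks the loss
    context.log_result("final_loss", round(loss, 4))


def run_local(handler: Callable[..., None], params: Dict[str, Any]) -> LocalContext:
    """Hypothetical runner: injects parameters and captures the logged outputs.

    The point of the pattern is that the same handler could be handed,
    unchanged, to a different (e.g. remote) runner.
    """
    context = LocalContext(params=dict(params))
    handler(context, **params)
    return context


ctx = run_local(train, {"learning_rate": 0.5, "epochs": 2})
print(ctx.params, ctx.results)
```

Because the handler only talks to the context object, swapping `run_local` for another runner does not require touching the user code.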

Data preparation, model development, model and application delivery, and end-to-end monitoring are tightly connected: they cannot be managed in silos. This is where MLRun MLOps orchestration comes in. ML, data, and DevOps/MLOps teams collaborate using the same set of tools, practices, APIs, metadata, and version control.

MLRun simplifies and accelerates the path to production.

Architecture

MLRun is composed of the following layers:

  • Feature Store: collects, prepares, catalogs, and serves data features for development (offline) and real-time (online) use, for both real-time and batch data.

  • ML CI/CD Pipeline: automatically trains, tests, optimizes, and deploys or updates models using a snapshot of the production data (generated by the feature store) and code from source control (Git).

  • Real-Time Serving Pipeline: rapidly deploys scalable data and ML pipelines using real-time serverless technology, including API handling, data preparation and enrichment, model serving, ensembles, driving and measuring actions, etc.

  • Real-Time Monitoring and Retraining: monitors data, models, and production components, and provides a feedback loop for exploring production data, identifying drift, alerting on anomalies or data quality issues, triggering retraining jobs, measuring business impact, etc.
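Drift identification in the monitoring layer can be illustrated with a small sketch that compares the distribution of one binned feature in the training (reference) data against production data. It uses total variation distance, which is one common drift metric; the metric choice, the 0.2 threshold, and all names here are assumptions for illustration, not a description of how MLRun computes drift.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def distribution(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Turn observed (already binned) values into a normalized frequency map."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def total_variation_distance(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """TVD = 0.5 * sum |p(x) - q(x)| over the union of bins; ranges from 0 to 1."""
    bins = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in bins)


DRIFT_THRESHOLD = 0.2  # hypothetical alerting threshold

# Reference distribution from training data vs. what production is seeing.
reference = distribution(["low"] * 50 + ["mid"] * 30 + ["high"] * 20)
production = distribution(["low"] * 20 + ["mid"] * 30 + ["high"] * 50)

tvd = total_variation_distance(reference, production)
print(f"TVD={tvd:.2f}, drift={'yes' if tvd > DRIFT_THRESHOLD else 'no'}")
```

In a full feedback loop, a drift flag like this is what would trigger an alert or a retraining job.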

Although each of these layers is independent, integrating them provides much greater value and simplicity. For example:

  • Training jobs obtain features from the feature store and update it with metadata, which is later used for serving or monitoring.

  • The real-time pipeline enriches incoming events with features stored in the feature store, and can also use feature metadata (policies, statistics, schema, etc.) to impute missing data or validate data quality.

  • The monitoring layer collects real-time inputs and outputs from the real-time pipeline and compares them with the feature data and metadata from the feature store or with model metadata generated by the training layer. It writes all the fresh production data back to the feature store so it can be used for tasks such as data analysis, model retraining (on fresh data), and model improvements.
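The enrichment example above can be made concrete with a minimal sketch of a real-time pipeline step that uses stored feature statistics to impute missing values and flag out-of-range inputs. `FEATURE_STATS` and `enrich_event` are hypothetical stand-ins for feature-store metadata and a serving-pipeline step; they are not MLRun APIs, and mean imputation with a min/max range check is just one simple policy.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical feature metadata, as a feature store might record it at ingestion time.
FEATURE_STATS: Dict[str, Dict[str, float]] = {
    "age": {"mean": 41.0, "min": 18.0, "max": 95.0},
    "income": {"mean": 52000.0, "min": 0.0, "max": 500000.0},
}


def enrich_event(event: Dict[str, Optional[float]]) -> Tuple[Dict[str, float], List[str]]:
    """Impute missing features with the stored mean and flag out-of-range values."""
    clean: Dict[str, float] = {}
    issues: List[str] = []
    for name, stats in FEATURE_STATS.items():
        value = event.get(name)
        if value is None:
            clean[name] = stats["mean"]  # impute from feature statistics
            issues.append(f"{name}: imputed")
            continue
        if not stats["min"] <= value <= stats["max"]:
            # Keep the value but report a data-quality issue.
            issues.append(f"{name}: out of range ({value})")
        clean[name] = value
    return clean, issues


features, problems = enrich_event({"age": None, "income": 750000.0})
print(features)
print(problems)
```

Keeping the statistics in the feature store, rather than hard-coding them in the serving code, is what lets training and serving share the same view of the data.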

When one of the components described above is updated, the change immediately affects feature generation, the model serving pipeline, and monitoring. MLRun applies versioning to each component, as well as versioning and rolling upgrades across components.

Basic components

MLRun has the following main components that are used throughout the system:

  • Project: a container for organizing all of your work on a particular activity. Projects consist of metadata, source code, workflows, data and artifacts, models, triggers, and member management for user collaboration. Read more in Projects.

  • Function: a software package with one or more methods and runtime-specific attributes (such as image, command, arguments, and environment). Read more in MLRun serverless functions and Functions.

  • Run: an object that contains information about an executed function. The run object is created as a result of running a function and contains the function attributes (such as arguments, inputs, and outputs), as well as the execution status and results (including links to output artifacts). Read more in Running a job.

  • Artifact: versioned data artifacts (such as data sets, files, and models) that are produced or consumed by functions, runs, and workflows. Read more in Artifacts.

  • Workflow: defines a pipeline of functions, or a directed acyclic graph (DAG), to execute using Kubeflow Pipelines or MLRun real-time serving pipelines. Read more in Workflows.

  • UI: a graphical user interface (dashboard) for displaying and managing projects and their contained experiments, artifacts, and code.
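To show how these components relate, here is a toy, in-memory model: a project holds functions, runs, and versioned artifacts, and a workflow is a DAG of functions executed in dependency order. Every class and method here is hypothetical and written only for illustration; it is not MLRun's object model. The DAG is run locally with Python's standard `graphlib.TopologicalSorter` (Python 3.9+) rather than Kubeflow Pipelines, and the UI is not modeled.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Artifact:
    key: str
    version: int
    data: object


@dataclass
class Run:
    function: str
    inputs: Dict[str, object]
    outputs: Dict[str, Artifact] = field(default_factory=dict)
    status: str = "created"


@dataclass
class Project:
    name: str
    functions: Dict[str, Callable[..., object]] = field(default_factory=dict)
    artifacts: Dict[str, List[Artifact]] = field(default_factory=dict)
    runs: List[Run] = field(default_factory=list)

    def log_artifact(self, key: str, data: object) -> Artifact:
        # Each new artifact under the same key gets the next version number.
        versions = self.artifacts.setdefault(key, [])
        artifact = Artifact(key, len(versions) + 1, data)
        versions.append(artifact)
        return artifact

    def run_workflow(self, dag: Dict[str, List[str]]) -> List[Run]:
        """Execute functions in dependency order; dag maps step -> upstream steps.

        Each function receives its upstream results as keyword arguments
        named after the upstream steps.
        """
        produced: Dict[str, object] = {}
        runs: List[Run] = []
        for step in TopologicalSorter(dag).static_order():
            inputs = {dep: produced[dep] for dep in dag.get(step, [])}
            run = Run(function=step, inputs=inputs)
            result = self.functions[step](**inputs)
            run.outputs[step] = self.log_artifact(step, result)
            run.status = "completed"
            produced[step] = result
            runs.append(run)
        self.runs.extend(runs)
        return runs


project = Project("demo")
project.functions = {
    "prep": lambda: [1, 2, 3],
    "train": lambda prep: sum(prep) / len(prep),
    "evaluate": lambda train: train > 1.0,
}
runs = project.run_workflow({"train": ["prep"], "evaluate": ["train"]})
print([(r.function, r.status) for r in runs])
```

Running the same workflow again would append new runs and bump each artifact key to version 2, which mirrors the idea that runs and artifacts are tracked and versioned per project.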