mlrun

class mlrun.ArtifactType: Possible artifact types to pack objects as and log using a mlrun.Packager.

class mlrun.Client(credentials: Credentials)

A per-session MLRun client owning its own HTTPRunDB.

The backend URL is taken from mlrun.mlconf.dbpath (already populated by import mlrun); a single MLRun cluster per Python process is assumed. Only credentials vary per Client.

Example:

client = mlrun.Client(credentials=mlrun.Credentials(token="..."))
with client.session():
    project = mlrun.get_or_create_project("my-proj")

session() → Iterator[Client]: Bind this client as active for the current contextvars scope.
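
The scoping behavior described for session() can be illustrated with a minimal sketch built on the standard contextvars module. This is not MLRun's implementation: SketchClient, current_client and the module-level variable are hypothetical names used only to show how a client can be bound for one scope and restored afterwards.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical holder for the "active" client; not part of mlrun.
_active_client = contextvars.ContextVar("active_client", default=None)


class SketchClient:
    """Illustrative stand-in for a per-session client."""

    def __init__(self, token):
        self.token = token

    @contextmanager
    def session(self):
        # Bind this client for the current context and keep the reset token
        # so the previously active client is restored on exit.
        reset_token = _active_client.set(self)
        try:
            yield self
        finally:
            _active_client.reset(reset_token)


def current_client():
    """Return the client bound in the current context, or None."""
    return _active_client.get()
```

Because the binding lives in a context variable, nested sessions restore the outer client when the inner block exits.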

class mlrun.Credentials(token: str | None = None, username: str | None = None, password: str | None = None, use_env: bool = False, extra_headers: dict[str, str] | None = None)

User credentials for MLRun API access.

One of: token=, username=/password=, or use_env=True for legacy env/config/file resolution.

extra_headers adds default headers for this client's requests (for example {"X-IGZ-Authenticator-Kind": "sa"}). Per-call headers= and Authorization always override defaults. Excluded from equality/hashing to keep the frozen dataclass hashable.
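
A minimal sketch of why excluding a dict field from comparison keeps a frozen dataclass hashable, and of how default headers can be layered under per-call headers. SketchCredentials and merge_headers are hypothetical names, and both the "Bearer" scheme and the exact merge order are assumptions made for illustration, not MLRun's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class SketchCredentials:
    token: Optional[str] = None
    # compare=False drops the field from __eq__ and __hash__, so the
    # unhashable dict does not break hashing of the frozen dataclass.
    extra_headers: Optional[Dict[str, str]] = field(default=None, compare=False)


def merge_headers(creds, call_headers=None):
    """Layer headers: defaults first, then Authorization, then per-call headers."""
    merged = dict(creds.extra_headers or {})
    if creds.token:
        # Assumed scheme; overrides any Authorization given in the defaults.
        merged["Authorization"] = f"Bearer {creds.token}"
    merged.update(call_headers or {})
    return merged
```

With this layout, two credentials objects that differ only in extra_headers compare equal and share a hash.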

class mlrun.RuntimeConfigurationContext(auth_token_name: str | None = None)

Context manager for runtime configuration options. Settings here override any function-level configuration.

Usage Example:

with mlrun.RuntimeConfigurationContext(auth_token_name="my-token"):
    func.run()
    project.run(name="my-pipeline")
    project.enable_model_monitoring()

Parameters: auth_token_name -- Name of the authentication token to use for operations.

static get_auth_token_name() → str | None

Get auth token name from context manager.

Returns: The auth token name if set in the current context, None otherwise.
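
The pairing of a context manager with a static getter can be sketched as follows. This is an illustration of the pattern only, assuming the value is held in a context variable; SketchRuntimeConfigurationContext is a hypothetical class, not the MLRun one.

```python
import contextvars
from typing import Optional

# Hypothetical storage for the setting; not part of mlrun.
_auth_token_name = contextvars.ContextVar("auth_token_name", default=None)


class SketchRuntimeConfigurationContext:
    def __init__(self, auth_token_name: Optional[str] = None):
        self._auth_token_name = auth_token_name
        self._reset_token = None

    def __enter__(self):
        # Set the value for the duration of the with-block.
        self._reset_token = _auth_token_name.set(self._auth_token_name)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore whatever was set before entering the block.
        _auth_token_name.reset(self._reset_token)
        return False

    @staticmethod
    def get_auth_token_name() -> Optional[str]:
        # Returns None when no context is active.
        return _auth_token_name.get()
```

Code running inside the with-block can call the static getter without holding a reference to the context object.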

mlrun.VolumeMount: alias of Mount

mlrun.auto_mount(pvc_name: str = '', volume_mount_path: str = '', volume_name: str | None = None) → Callable[[KubeResource], KubeResource]

Choose the mount based on env variables and params.

Volume will be selected by the following order:

k8s PVC volume when both pvc_name and volume_mount_path are set
k8s PVC volume when env var is set: MLRUN_PVC_MOUNT=<pvc-name>:<mount-path>
k8s PVC volume if it's configured as the auto mount type
S3 credentials when configured as the auto mount type
Secret-based env vars when configured as the auto mount type
iguazio v3io volume when V3IO_ACCESS_KEY and V3IO_USERNAME env vars are set
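
The selection order above can be sketched as a plain decision function. This is a simplified illustration, not MLRun's code: choose_mount, the auto_mount_type labels ("pvc", "s3", "env"), the returned strings, the ValueError on a malformed variable and the final "none" fallback are all assumptions.

```python
import os


def choose_mount(pvc_name="", volume_mount_path="", auto_mount_type=None, env=None):
    """Return a label describing which mount would be selected."""
    env = os.environ if env is None else env

    # 1. Explicit PVC parameters win.
    if pvc_name and volume_mount_path:
        return f"pvc:{pvc_name}:{volume_mount_path}"

    # 2. MLRUN_PVC_MOUNT=<pvc-name>:<mount-path>
    pvc_env = env.get("MLRUN_PVC_MOUNT", "")
    if pvc_env:
        name, sep, path = pvc_env.partition(":")
        if not (sep and name and path):
            raise ValueError("MLRUN_PVC_MOUNT must be <pvc-name>:<mount-path>")
        return f"pvc:{name}:{path}"

    # 3-5. The configured auto mount type (labels are hypothetical).
    if auto_mount_type == "pvc":
        return "pvc:configured"
    if auto_mount_type == "s3":
        return "s3-credentials"
    if auto_mount_type == "env":
        return "secret-env-vars"

    # 6. v3io, when both env vars are present.
    if env.get("V3IO_ACCESS_KEY") and env.get("V3IO_USERNAME"):
        return "v3io"

    return "none"
```

Passing env explicitly keeps the sketch testable without touching the process environment.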

mlrun.code_to_function

Convenience function to insert code and configure an mlrun runtime.

Easiest way to construct a runtime type object. Provides the most often used configuration options for all runtimes as parameters.

Instantiated runtimes are considered 'functions' in mlrun, but they are anything from nuclio functions to generic kubernetes pods to spark jobs. Functions are meant to be focused, and as such limited in scope and size. Typically, a function can be expressed in a single python module with added support from custom docker images and commands for the environment. The returned runtime object can be further configured if more customization is required.

One of the most important parameters is 'kind'. This is what is used to specify the chosen runtimes. The options are:

local: execute a local python or shell script
job: insert the code into a Kubernetes pod and execute it
nuclio: insert the code into a real-time serverless nuclio function
serving: insert code into orchestrated nuclio function(s) forming a DAG
dask: run the specified python code / script as Dask Distributed job
mpijob: run distributed Horovod jobs over the MPI job operator
spark: run distributed Spark job using Spark Kubernetes Operator
remote-spark: run distributed Spark job on remote Spark service
databricks: run code on Databricks cluster (python scripts, Spark etc.)
application: run a long living application (e.g. a web server, UI, etc.)

Learn more about Kinds of functions (runtimes)

Parameters:

name -- function name, typically best to use hyphen-case
project -- project used to namespace the function, defaults to the active project
tag -- function tag to track multiple versions of the same function, defaults to 'latest'
filename -- path to .py/.ipynb file, defaults to current jupyter notebook
handler -- the default function handler to call for the job or nuclio function. In batch functions (job, mpijob, ..) the handler can also be specified in the .run() command; when not specified, the entire file is executed (as main). For nuclio functions the handler takes the form module:function and defaults to 'main:handler'
kind -- function runtime type string - nuclio, job, etc. (see docstring for all options)
image -- base docker image to use for building the function container, defaults to None
code_output -- specify '.' to generate python module from the current jupyter notebook
embed_code -- indicates whether or not to inject the code directly into the function runtime spec, defaults to True
description -- short function description, defaults to ''
requirements -- a list of python packages
requirements_file -- path to a python requirements file
categories -- list of categories for MLRun Hub, defaults to None
labels -- name/value pairs dict to tag the function with useful metadata, defaults to None
with_doc -- indicates whether to document the function parameters, defaults to True
ignored_tags -- notebook cells to ignore when converting notebooks to py code (separated by ';')

Returns:

pre-configured function object from a mlrun runtime class

example:

import mlrun

# create job function object from notebook code and add doc/metadata
fn = mlrun.code_to_function(
    "file_utils",
    kind="job",
    handler="open_archive",
    image="mlrun/mlrun",
    description="this function opens a zip archive into a local/mounted folder",
    categories=["fileutils"],
    labels={"author": "me"},
)

example:

import mlrun
from pathlib import Path

# create file
Path("mover.py").touch()

# create nuclio function object from a python module called mover.py
fn = mlrun.code_to_function(
    "nuclio-mover",
    kind="nuclio",
    filename="mover.py",
    image="python:3.11",
    description="this function moves files from one system to another",
    requirements=["pandas"],
    labels={"author": "me"},
)

mlrun.get_secret_or_env

Retrieve the value of a secret, either from a user-provided secret store or from environment variables. The sources are tried in the following order:

If secret_provider was provided, will attempt to retrieve the secret from it
If an MLRun SecretsStore was provided, query it for the secret key
An environment variable with the same key
An MLRun-generated env. variable, mounted from a project secret (to be used in MLRun runtimes)
The default value

Also supports discovering the value inside any environment variable that contains a JSON-encoded list of dicts with fields: {'name': 'KEY', 'value': 'VAL', 'value_from': ...}. This fallback is applied after checking normal environment variables and before returning the default.

Example:

from mlrun import get_secret_or_env

# Using a dictionary as the secret provider
secrets = {"KEY1": "VALUE1"}
secret = get_secret_or_env("KEY1", secret_provider=secrets)


# Using a function to retrieve a secret
def my_secret_provider(key):
    # some internal logic to retrieve the secret; here, a lookup in the dict above
    return secrets.get(key)


secret = get_secret_or_env(
    "KEY1", secret_provider=my_secret_provider, default="TOO-MANY-SECRETS"
)

Parameters:

key -- Secret key to look for
secret_provider -- Dictionary, callable or SecretsStore to extract the secret value from. If using a callable, it must use the signature callable(key:str)
default -- Default value to return if secret was not available through any other means
prefix -- When passed, the prefix is added to the secret key.

Returns:

The secret value if found in any of the sources, or default if provided.
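
The lookup order described for this function can be sketched in plain Python. This is an approximation for illustration, not the library's implementation: resolve_secret is a hypothetical name, the SecretsStore step is omitted, the prefix parameter is not modeled, and the prefix of the MLRun-generated variable ("MLRUN_K8S_SECRET__") is an assumption exposed as a parameter.

```python
import json
import os


def resolve_secret(key, secret_provider=None, default=None, env=None,
                   generated_prefix="MLRUN_K8S_SECRET__"):
    env = os.environ if env is None else env

    # 1. User-provided provider: a callable or a dictionary.
    value = None
    if callable(secret_provider):
        value = secret_provider(key)
    elif isinstance(secret_provider, dict):
        value = secret_provider.get(key)
    if value is not None:
        return value

    # 2. Environment variable with the same key.
    if key in env:
        return env[key]

    # 3. Generated variable mounted from a project secret (assumed prefix).
    if generated_prefix + key in env:
        return env[generated_prefix + key]

    # 4. Any variable holding a JSON list of {'name': ..., 'value': ...} dicts.
    for raw in env.values():
        try:
            items = json.loads(raw)
        except (TypeError, ValueError):
            continue
        if not isinstance(items, list):
            continue
        for item in items:
            if isinstance(item, dict) and item.get("name") == key and "value" in item:
                return item["value"]

    # 5. Fall back to the default.
    return default
```

Each step returns as soon as it finds a value, so an earlier source always shadows a later one.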

mlrun.get_version(): Get the current mlrun version.

mlrun.import_function(url='', secrets=None, db='', project=None, new_name=None)

Create function object from DB or local/remote YAML file

Functions can be imported from function repositories (MLRun Hub or local db), or be read from a remote URL (http(s), s3, git, v3io, ..) containing the function YAML.

special URLs:

function hub:       hub://[{source}/]{name}[:{tag}]
local mlrun db:     db://{project-name}/{name}[:{tag}]

examples:

function = mlrun.import_function("hub://auto-trainer")
function = mlrun.import_function("./func.yaml")
function = mlrun.import_function(
    "https://raw.githubusercontent.com/org/repo/func.yaml"
)

Parameters:

url -- path/url to MLRun Hub, db or function YAML file
secrets -- optional, credentials dict for DB or URL (s3, v3io, ...)
db -- optional, mlrun api/db path
project -- optional, target project for the function
new_name -- optional, override the imported function name

Returns:

function object
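
The two special URL forms listed above can be broken into their parts with a small parser. This is a sketch of the URL grammar as written in this page, not the parsing MLRun performs internally; parse_function_url and the returned dictionary layout are hypothetical.

```python
import re

# hub://[{source}/]{name}[:{tag}]
_HUB = re.compile(
    r"^hub://(?:(?P<source>[^/:]+)/)?(?P<name>[^/:]+)(?::(?P<tag>[^/:]+))?$"
)
# db://{project-name}/{name}[:{tag}]
_DB = re.compile(
    r"^db://(?P<project>[^/:]+)/(?P<name>[^/:]+)(?::(?P<tag>[^/:]+))?$"
)


def parse_function_url(url):
    """Return the URL parts as a dict, or None for ordinary paths/URLs."""
    for scheme, pattern in (("hub", _HUB), ("db", _DB)):
        match = pattern.match(url)
        if match:
            # Keep only the optional parts that were actually present.
            parts = {k: v for k, v in match.groupdict().items() if v is not None}
            return {"scheme": scheme, **parts}
    return None
```

Anything that does not match either form, such as a local YAML path, yields None and would be treated as a plain file or remote URL.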

mlrun.mount_v3io(name: str = 'v3io', remote: str = '', access_key: str = '', user: str = '', secret: str | None = None, volume_mounts: list[Mount] | None = None) → Callable[[KubeResource], KubeResource]

Modifier function to apply to a Container Op to volume mount a v3io path

Parameters:

name -- the volume name
remote -- the v3io path to use for the volume (~/ prefix will be replaced with /users/<username>/)
access_key -- the access key used to auth against v3io (default: V3IO_ACCESS_KEY env var)
user -- the username used to auth against v3io (default: V3IO_USERNAME env var)
secret -- k8s secret name for the username and access key
volume_mounts -- list of VolumeMount; if empty, defaults to mounting /v3io and /User
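
The ~/ prefix rule and the V3IO_USERNAME default described for remote and user can be sketched as below. expand_v3io_remote is a hypothetical helper written from the parameter descriptions only; raising ValueError when no user is available is an assumption.

```python
import os


def expand_v3io_remote(remote, user="", env=None):
    """Replace a leading '~/' with '/users/<username>/'."""
    env = os.environ if env is None else env
    # Fall back to the V3IO_USERNAME env var when no user is given.
    user = user or env.get("V3IO_USERNAME", "")
    if remote.startswith("~/"):
        if not user:
            raise ValueError("a user is required to expand a '~/' prefix")
        return f"/users/{user}/{remote[2:]}"
    return remote
```

Paths without the ~/ prefix pass through unchanged.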

mlrun.set_environment

Set and test default config for: api path, artifact_path and project.

This function tries to read the configuration from the environment/api and merges it with the user-provided project name, artifact path, or api path/access_key. It returns the active project name and the configured artifact path, which can be used to define sub paths.

Note: the artifact path is an mlrun data uri (e.g. s3://bucket/path) and cannot be used with file utils.

example:

from mlrun import set_environment

project_name, artifact_path = set_environment()
set_environment("http://localhost:8080", artifact_path="./")
set_environment(env_file="mlrun.env")
set_environment("<remote-service-url>", access_key="xyz", username="joe")

Parameters:

api_path -- location/url of mlrun api service
artifact_path -- path/url for storing experiment artifacts
access_key -- set the remote cluster access key (V3IO_ACCESS_KEY)
username -- name of the user to authenticate
env_file -- path/url to .env file (holding MLRun config and other env vars), see: set_env_from_file()
mock_functions -- set to True to create local/mock functions instead of real containers, set to "auto" to auto determine based on the presence of k8s/Nuclio

Returns:

active project name
actual artifact path/url, can be used to create subpaths per task or group of artifacts

mlrun.sync_secret_tokens() → None

Synchronize local secret tokens with the backend. Doesn't sync when running from a runtime.

This function:

Reads the local token file (defaults to mlrun.mlconf.auth_with_oauth_token.token_file value).
Validates its content and resolves the token currently in use into a SecretToken object.
Uploads the token to the backend.
Logs a warning if the token was updated on the backend due to a newer expiration time found locally.

mlrun.v3io_cred(api: str = '', user: str = '', access_key: str = '') → Callable[[KubeResource], KubeResource]

Modifier function to copy local v3io env vars to container

Usage:

train = train_op(...)
train.apply(v3io_cred())
