mlrun#

class mlrun.ArtifactType(value)[source]#

Possible artifact types to log using the MLRun context decorator.
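For example, an enum member can be passed in the outputs argument of the mlrun.handler decorator documented below; a minimal sketch (the handler and artifact names here are illustrative):

import pandas as pd

import mlrun

# log the returned DataFrame as a dataset artifact
@mlrun.handler(outputs=[("clean_data", mlrun.ArtifactType.DATASET)])
def prep(data: pd.DataFrame):
    return data.dropna()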

mlrun.code_to_function(name: str = '', project: str = '', tag: str = '', filename: str = '', handler: str = '', kind: str = '', image: Optional[str] = None, code_output: str = '', embed_code: bool = True, description: str = '', requirements: Optional[Union[str, List[str]]] = None, categories: Optional[List[str]] = None, labels: Optional[Dict[str, str]] = None, with_doc: bool = True, ignored_tags=None) Union[mlrun.runtimes.mpijob.v1alpha1.MpiRuntimeV1Alpha1, mlrun.runtimes.mpijob.v1.MpiRuntimeV1, mlrun.runtimes.function.RemoteRuntime, mlrun.runtimes.serving.ServingRuntime, mlrun.runtimes.daskjob.DaskCluster, mlrun.runtimes.kubejob.KubejobRuntime, mlrun.runtimes.local.LocalRuntime, mlrun.runtimes.sparkjob.spark2job.Spark2Runtime, mlrun.runtimes.sparkjob.spark3job.Spark3Runtime, mlrun.runtimes.remotesparkjob.RemoteSparkRuntime][source]#

Convenience function to insert code and configure an mlrun runtime.

Easiest way to construct a runtime type object. Provides the most often used configuration options for all runtimes as parameters.

Instantiated runtimes are considered ‘functions’ in mlrun, but they can be anything from nuclio functions to generic kubernetes pods to spark jobs. Functions are meant to be focused, and as such limited in scope and size. Typically a function can be expressed in a single python module with added support from custom docker images and commands for the environment. The returned runtime object can be further configured if more customization is required.

One of the most important parameters is ‘kind’. This is used to specify the chosen runtime. The options are:

  • local: execute a local python or shell script

  • job: insert the code into a Kubernetes pod and execute it

  • nuclio: insert the code into a real-time serverless nuclio function

  • serving: insert code into orchestrated nuclio function(s) forming a DAG

  • dask: run the specified python code / script as a Dask Distributed job

  • mpijob: run distributed Horovod jobs over the MPI job operator

  • spark: run a distributed Spark job using the Spark Kubernetes Operator

  • remote-spark: run a distributed Spark job on a remote Spark service

Learn more about function runtimes here: https://docs.mlrun.org/en/latest/runtimes/functions.html#function-runtimes

Parameters
  • name – function name, typically best to use hyphen-case

  • project – project used to namespace the function, defaults to ‘default’

  • tag – function tag to track multiple versions of the same function, defaults to ‘latest’

  • filename – path to .py/.ipynb file, defaults to current jupyter notebook

  • handler – The default function handler to call for the job or nuclio function. For batch functions (job, mpijob, ..) the handler can also be specified in the .run() command; when not specified, the entire file will be executed (as main). For nuclio functions the handler is in the form of module:function, defaults to ‘main:handler’

  • kind – function runtime type string - nuclio, job, etc. (see docstring for all options)

  • image – base docker image to use for building the function container, defaults to None

  • code_output – specify ‘.’ to generate a python module from the current jupyter notebook

  • embed_code – indicates whether or not to inject the code directly into the function runtime spec, defaults to True

  • description – short function description, defaults to ‘’

  • requirements – list of python packages or pip requirements file path, defaults to None

  • categories – list of categories for mlrun function marketplace, defaults to None

  • labels – immutable name/value pairs to tag the function with useful metadata, defaults to None

  • with_doc – indicates whether to document the function parameters, defaults to True

  • ignored_tags – notebook cells to ignore when converting notebooks to py code (separated by ‘;’)

Returns

pre-configured function object from an mlrun runtime class

example:

import mlrun

# create job function object from notebook code and add doc/metadata
fn = mlrun.code_to_function("file_utils", kind="job",
                            handler="open_archive", image="mlrun/mlrun",
                            description = "this function opens a zip archive into a local/mounted folder",
                            categories = ["fileutils"],
                            labels = {"author": "me"})

example:

import mlrun
from pathlib import Path

# create file
Path("mover.py").touch()

# create nuclio function object from python module called mover.py
fn = mlrun.code_to_function("nuclio-mover", kind="nuclio",
                            filename="mover.py", image="python:3.7",
                            description="this function moves files from one system to another",
                            requirements=["pandas"],
                            labels={"author": "me"})
mlrun.get_version()[source]#

get current mlrun version
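
For example, the returned version string can be printed directly:

import mlrun

print(mlrun.get_version())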

mlrun.handler(labels: Optional[Dict[str, str]] = None, outputs: Optional[List[Optional[Union[Tuple[str, mlrun.run.ArtifactType], Tuple[str, str], Tuple[str, mlrun.run.ArtifactType, Dict[str, Any]], Tuple[str, str, Dict[str, Any]], str]]]] = None, inputs: Union[bool, Dict[str, Type]] = True)[source]#

MLRun’s handler is a decorator to wrap a function and enable setting labels, automatic mlrun.DataItem parsing and outputs logging.

Parameters
  • labels – Labels to add to the run. Expecting a dictionary with the labels names as keys. Default: None.

  • outputs

    Logging configurations for the function’s returned values. Expecting a list of strings, tuples, or None values:

    • str - A string in the format of ‘{key}:{artifact_type}’. If a string was given without ‘:’ it will indicate the key, and the artifact type will be defaulted according to the returned value type.

    • tuple - A tuple of:

      • [0]: str - The key (name) of the artifact to use for the logged output.

      • [1]: Union[ArtifactType, str] = “result” - An ArtifactType enum or an equivalent string that indicates how to log the returned value. The artifact types can be one of:

        • DATASET = “dataset”

        • DIRECTORY = “directory”

        • FILE = “file”

        • OBJECT = “object”

        • PLOT = “plot”

        • RESULT = “result”.

      • [2]: Optional[Dict[str, Any]] - A keyword arguments dictionary with the properties to pass to the relevant logging function (one of context.log_artifact, context.log_result, context.log_dataset).

    • None - Do not log the output.

    The list length must be equal to the total number of values returned from the function. Defaults to None, meaning no outputs will be logged.

  • inputs

    Parsing configurations for the arguments passed as inputs via the run method of an MLRun function. Can be passed as a boolean value or a dictionary:

    • True - Parse all found inputs to the assigned type hint in the function’s signature. If there is no type hint assigned, the value will remain an mlrun.DataItem.

    • False - Do not parse inputs, leaving the inputs as mlrun.DataItem.

    • Dict[str, Type] - A dictionary with the argument name as key and the expected type to parse the mlrun.DataItem to.

    Defaults to True.

Example:

import mlrun
import numpy as np

@mlrun.handler(outputs=["my_array", None, "my_multiplier"])
def my_handler(array: np.ndarray, m: int):
    array = array * m
    m += 1
    return array, "I won't be logged", m

>>> mlrun_function = mlrun.code_to_function(filename="my_code.py", kind="job")
>>> run_object = mlrun_function.run(
...     handler="my_handler",
...     inputs={"array": "store://my_array_Artifact"},
...     params={"m": 2}
... )
>>> run_object.outputs
{'my_multiplier': 3, 'my_array': 'store://...'}
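The inputs parameter can also be given as a dictionary to control parsing per argument; a minimal sketch (the argument and output names are illustrative):

import pandas as pd

import mlrun

@mlrun.handler(inputs={"data": pd.DataFrame}, outputs=["row_count"])
def count_rows(data):
    # when run via MLRun, the "data" input (an mlrun.DataItem) is parsed to a pandas DataFrame
    return len(data)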
mlrun.import_function(url='', secrets=None, db='', project=None, new_name=None)[source]#

Create function object from DB or local/remote YAML file

A function can be imported from function repositories (mlrun marketplace or local db), or be read from a remote URL (http(s), s3, git, v3io, ..) containing the function YAML.

special URLs:

function marketplace: hub://{name}[:{tag}]
local mlrun db:       db://{project-name}/{name}[:{tag}]

examples:

function = mlrun.import_function("hub://sklearn_classifier")
function = mlrun.import_function("./func.yaml")
function = mlrun.import_function("https://raw.githubusercontent.com/org/repo/func.yaml")
Parameters
  • url – path/url to marketplace, db or function YAML file

  • secrets – optional, credentials dict for DB or URL (s3, v3io, …)

  • db – optional, mlrun api/db path

  • project – optional, target project for the function

  • new_name – optional, override the imported function name

Returns

function object
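
The optional parameters can be combined, for example (the new name and target project here are illustrative):

function = mlrun.import_function("hub://sklearn_classifier",
                                 new_name="my-classifier",
                                 project="my-project")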

mlrun.set_environment(api_path: Optional[str] = None, artifact_path: str = '', project: str = '', access_key: Optional[str] = None, user_project=False, username: Optional[str] = None)[source]#

set and test default config for: api path, artifact_path and project

This function will try to read the configuration from the environment/api and merge it with the user-provided project name, artifacts path, or api path/access_key. It returns the configured artifacts path, which can be used to define subpaths.

Note: the artifact path is an mlrun data uri (e.g. s3://bucket/path) and cannot be used with file utils.

example:

from mlrun import set_environment

project_name, artifact_path = set_environment(project='my-project')
set_environment("http://localhost:8080", artifact_path="./")
set_environment("<remote-service-url>", access_key="xyz", username="joe")
Parameters
  • api_path – location/url of mlrun api service

  • artifact_path – path/url for storing experiment artifacts

  • project – default project name

  • access_key – set the remote cluster access key (V3IO_ACCESS_KEY)

  • user_project – add the current user name to the provided project name (making it unique per user)

  • username – name of the user to authenticate

Returns

default project name and the actual artifact path/url; these can be used to create subpaths per task or group of artifacts
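
Since the returned artifact path is a data URI, per-task subpaths are typically built by string composition rather than with file utilities; a minimal sketch (the project name and subpath are illustrative):

import mlrun

# returns the default project name and the configured artifact path
project_name, artifact_path = mlrun.set_environment(project="my-project", user_project=True)

# build a per-task subpath from the returned data URI (do not use os.path on it)
training_path = artifact_path + "/training"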