mlrun.projects

class mlrun.projects.MlrunProject(name=None, description=None, params=None, functions=None, workflows=None, artifacts=None, artifact_path=None, conda=None, metadata=None, spec=None)[source]

Bases: mlrun.model.ModelObj

property artifact_path

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

property artifacts

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

build_function(function: Union[str, mlrun.runtimes.base.BaseRuntime], with_mlrun: bool = True, skip_deployed: bool = False, image=None, base_image=None, commands: Optional[list] = None, secret_name='', mlrun_version_specifier=None, builder_env: Optional[dict] = None)[source]

deploy ML function, build container with its dependencies

Parameters
  • function – name of the function (in the project) or function object

  • with_mlrun – add the current mlrun package to the container build

  • skip_deployed – skip the build if we already have an image for the function

  • image – target image name/path

  • base_image – base image name/path (commands and source code will be added to it)

  • commands – list of docker build (RUN) commands, e.g. ['pip install pandas']

  • secret_name – k8s secret for accessing the docker registry

  • mlrun_version_specifier – which mlrun package version to include (if not current)

  • builder_env – Kaniko builder pod env vars dict (for config/credentials), e.g. builder_env={"GIT_TOKEN": token}; does not work yet in KFP
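
A minimal sketch of a build_function call that combines the parameters listed above. The RecordingProject class and the "train" function name are hypothetical; the class only stands in for a real project object so that the snippet runs without a cluster, and it records the call instead of building anything.

```python
class RecordingProject:
    """Hypothetical stand-in for an MlrunProject; it only records the call."""

    def __init__(self):
        self.calls = []

    def build_function(self, function, **kwargs):
        # A real project would start a container build here.
        self.calls.append((function, kwargs))
        return True


def build_trainer(project, git_token):
    # Every keyword below appears in the documented build_function signature.
    return project.build_function(
        "train",  # hypothetical name of a function registered in the project
        with_mlrun=True,  # add the current mlrun package to the image
        skip_deployed=True,  # skip the build if an image already exists
        base_image="mlrun/mlrun",
        commands=["pip install pandas"],
        builder_env={"GIT_TOKEN": git_token},
    )


project = RecordingProject()
build_trainer(project, git_token="dummy-token")
name, kwargs = project.calls[0]
print(name, sorted(kwargs))
```

With a real project, the same keyword arguments would be passed to project.build_function() unchanged.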

clear_context()[source]

delete all files and clear the context dir

property context

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

create_remote(url, name='origin')[source]

create remote for the project git

Parameters
  • url – remote git url

  • name – name for the remote (default is ‘origin’)

create_vault_secrets(secrets)[source]
deploy_function(function: Union[str, mlrun.runtimes.base.BaseRuntime], dashboard: str = '', models: Optional[list] = None, env: Optional[dict] = None, tag: Optional[str] = None, verbose: Optional[bool] = None)[source]

deploy real-time (nuclio based) functions

Parameters
  • function – name of the function (in the project) or function object

  • dashboard – url of the remote Nuclio dashboard (when not local)

  • models – list of model items

  • env – dict of extra environment variables

  • tag – extra version tag

  • verbose – add verbose prints/logs

property description

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

export(filepath=None)[source]

save the project object into a file (default to project.yaml)

func(key, sync=False)mlrun.runtimes.base.BaseRuntime[source]

get function object by name

Parameters

  • key – name of the function (under the project)

  • sync – will reload/reinit the function

Returns

function object

property functions

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

get_artifact_uri(key, category='artifact')str[source]

return the project artifact uri (store://..) from the artifact key

Parameters
  • key – artifact key/name

  • category – artifact category (artifact, model, feature-vector, ..)

get_function(key, sync=False, enrich=False)mlrun.runtimes.base.BaseRuntime[source]

get function object by name

Parameters
  • key – name of the function (under the project)

  • sync – will reload/reinit the function

  • enrich – add project info/config/source info to the function object

Returns

function object

get_function_objects()Dict[str, mlrun.runtimes.base.BaseRuntime][source]

get a virtual dict with all the project functions ready for use in a pipeline

get_param(key: str, default=None)[source]

get project param by key
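
A small sketch of reading project parameters with fallbacks through get_param. ParamsOnlyProject is a hypothetical stand-in, and its dict-style lookup (returning the default when the key is missing) is an assumption based on the signature above, not a statement of how MLRun implements it. The parameter names are illustrative.

```python
class ParamsOnlyProject:
    """Hypothetical stand-in exposing only params and get_param."""

    def __init__(self, params=None):
        self.params = params or {}

    def get_param(self, key: str, default=None):
        # Assumed behavior: return the stored value, or the default if absent.
        return self.params.get(key, default)


def resolve_settings(project):
    # Collect the settings a workflow needs, with explicit fallbacks.
    return {
        "label_column": project.get_param("label_column", "label"),
        "test_size": project.get_param("test_size", 0.2),
    }


settings = resolve_settings(ParamsOnlyProject({"label_column": "target"}))
print(settings)
```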

get_run_status(run, timeout=3600, expected_statuses=None, notifiers: Optional[mlrun.utils.helpers.RunNotifications] = None)[source]
get_secret(key: str)[source]

get a key-based secret (e.g. DB password) from the context. Secrets can be specified when invoking a run through files, env, ..

get_store_resource(uri)[source]

get store resource object by uri

get_vault_secrets(secrets=None, local=False)[source]
kind = 'project'
log_artifact(item, body=None, tag='', local_path='', artifact_path=None, format=None, upload=True, labels=None, target_path=None)[source]
log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=False, target_path='', extra_data=None, **kwargs)[source]

log a dataset artifact and optionally upload it to datastore

example:

import pandas as pd

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
Parameters
  • key – artifact key

  • df – dataframe object

  • local_path – path to the local file we upload, will also be used as the destination subpath (under "artifact_path")

  • artifact_path – target artifact path (when not using the default). To define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • tag – version tag

  • format – optional, format to use (e.g. csv, parquet, ..)

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • preview – number of lines to store as preview in the artifact metadata

  • stats – calculate and store dataset stats in the artifact metadata

  • extra_data – key/value list of extra files/charts to link with this dataset

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

Returns

artifact object

log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column=None, extra_data=None, **kwargs)[source]

log a model artifact and optionally upload it to datastore

example:

context.log_model("model", body=dumps(model),
                  model_file="model.pkl",
                  metrics=context.results,
                  training_set=training_df,
                  label_column='label',
                  feature_vector=feature_vector_uri,
                  labels={"app": "fraud"})
Parameters
  • key – artifact key or artifact class

  • body – will use the body as the artifact content

  • model_file – path to the local model file we upload (see also model_dir)

  • model_dir – path to the local dir holding the model file and extra files

  • artifact_path – target artifact path (when not using the default). To define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • framework – name of the ML framework

  • algorithm – training algorithm name

  • tag – version tag

  • metrics – key/value dict of model metrics

  • parameters – key/value dict of model parameters

  • inputs – ordered list of model input features (name, type, ..)

  • outputs – ordered list of model output/result elements (name, type, ..)

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])

  • feature_weights – list of feature weights, one per input column

  • training_set – training set dataframe, used to infer inputs & outputs

  • label_column – which columns in the training set are the label (target) columns

  • extra_data – key/value list of extra files/charts to link with this dataset; each value can be an absolute/relative path string, bytes, or an artifact object

Returns

artifact object
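
The feature_vector parameter above expects a uri of the form store://feature-vectors/<project>/<name>[:tag]. The sketch below splits such a uri into its parts. parse_feature_vector_uri and FeatureVectorRef are hypothetical helpers written for illustration (they are not part of MLRun), and the allowed characters in each part are an assumption.

```python
import re
from typing import NamedTuple, Optional


class FeatureVectorRef(NamedTuple):
    project: str
    name: str
    tag: Optional[str]


# Pattern follows the documented form store://feature-vectors/<project>/<name>[:tag]
_FV_URI = re.compile(
    r"^store://feature-vectors/(?P<project>[^/:]+)/(?P<name>[^/:]+)(?::(?P<tag>[^/:]+))?$"
)


def parse_feature_vector_uri(uri: str) -> FeatureVectorRef:
    match = _FV_URI.match(uri)
    if match is None:
        raise ValueError(f"not a feature vector uri: {uri!r}")
    # The tag group is None when the optional ":tag" suffix is absent.
    return FeatureVectorRef(match["project"], match["name"], match["tag"])


ref = parse_feature_vector_uri("store://feature-vectors/fraud/transactions:v1")
print(ref)
```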

property metadata
property mountdir

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

property name

This is a property of the metadata; see the metadata for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

property notifiers
property params

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

pull(branch=None, remote=None)[source]

pull/update sources from git or tar into the context dir

Parameters
  • branch – git branch, if not the current one

  • remote – git remote, if other than origin

push(branch, message=None, update=True, remote=None, add: Optional[list] = None)[source]

update spec and push updates to remote git repo

Parameters
  • branch – target git branch

  • message – git commit message

  • update – update files (git add update=True)

  • remote – git remote, default to origin

  • add – list of files to add

register_artifacts()[source]

register the artifacts in the MLRun DB (under this project)

reload(sync=False, context=None)mlrun.projects.project.MlrunProject[source]

reload the project and function objects from the project yaml/specs

Parameters
  • sync – set to True to load functions objects

  • context – context directory (where the yaml and code exist)

Returns

project object

remove_function(name)[source]

remove a function from a project

Parameters

name – name of the function (under the project)

run(name=None, workflow_path=None, arguments=None, artifact_path=None, workflow_handler=None, namespace=None, sync=False, watch=False, dirty=False, ttl=None, engine=None, local=False)mlrun.projects.pipelines._PipelineRunStatus[source]

run a workflow using kubeflow pipelines

Parameters
  • name – name of the workflow

  • workflow_path – url to a workflow file, if not a project workflow

  • arguments – kubeflow pipelines arguments (parameters)

  • artifact_path – target path/url for workflow artifacts, the string ‘{{workflow.uid}}’ will be replaced by workflow id

  • workflow_handler – workflow function handler (for running workflow function directly)

  • namespace – kubernetes namespace if other than default

  • sync – force functions sync before run

  • watch – wait for pipeline completion

  • dirty – allow running the workflow when the git repo is dirty

  • ttl – pipeline ttl in secs (after that the pods will be removed)

  • engine – workflow engine running the workflow. Only supported value is ‘kfp’ (also used if None)

  • local – run local pipeline with local functions (set local=True in function.run())

Returns

run id
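
The artifact_path parameter above accepts the literal string '{{workflow.uid}}', which is replaced by the workflow id. A minimal sketch of that substitution follows. resolve_artifact_path, the bucket path and the uid value are hypothetical and only illustrate the documented behavior; this is not MLRun's own implementation.

```python
WORKFLOW_UID_TOKEN = "{{workflow.uid}}"


def resolve_artifact_path(artifact_path: str, workflow_uid: str) -> str:
    # Replace every occurrence of the placeholder with the concrete workflow id.
    return artifact_path.replace(WORKFLOW_UID_TOKEN, workflow_uid)


template = "s3://my-bucket/pipelines/{{workflow.uid}}/artifacts"  # hypothetical path
resolved = resolve_artifact_path(template, "abc123")
print(resolved)
```

Keeping the placeholder in the path gives every workflow run its own output location without editing the project configuration between runs.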

run_function(function: Union[str, mlrun.runtimes.base.BaseRuntime], handler: Optional[str] = None, name: str = '', params: Optional[dict] = None, hyperparams: Optional[dict] = None, hyper_param_options: Optional[mlrun.model.HyperParamOptions] = None, inputs: Optional[dict] = None, outputs: Optional[List[str]] = None, workdir: str = '', labels: Optional[dict] = None, base_task: Optional[mlrun.model.RunTemplate] = None, watch: bool = True, local: bool = False, verbose: Optional[bool] = None)Union[mlrun.model.RunObject, kfp.dsl._container_op.ContainerOp][source]

Run a local or remote task as part of a local/kubeflow pipeline

example (use with project):

# create a project with two functions (local and from marketplace)
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://sklearn_classifier", "train")

# run functions (refer to them by name)
run1 = project.run_function("myfunc", params={"x": 7})
run2 = project.run_function("train", params={"data": run1.outputs["data"]})
Parameters
  • function – name of the function (in the project) or function object

  • handler – name of the function handler

  • name – execution name

  • params – input parameters (dict)

  • hyperparams – hyper parameters

  • hyper_param_options – hyper param options (selector, early stop, strategy, ..) see: HyperParamOptions

  • inputs – input objects (dict of key: path)

  • outputs – list of outputs which can pass in the workflow

  • workdir – default input artifacts path

  • labels – labels to tag the job/run with ({key:val, ..})

  • base_task – task object to use as base

  • watch – watch/follow run log, True by default

  • local – run the function locally vs on the runtime/cluster

  • verbose – add verbose prints/logs

Returns

MLRun RunObject or KubeFlow containerOp

save(filepath=None)[source]
save_to_db()[source]
save_workflow(name, target, artifact_path=None, ttl=None)[source]

create and save a workflow as a yaml or archive file

Parameters
  • name – workflow name

  • target – target file path (can end with .yaml or .zip)

  • artifact_path – target path/url for workflow artifacts, the string ‘{{workflow.uid}}’ will be replaced by workflow id

  • ttl – pipeline ttl in secs (after that the pods will be removed)

set_artifact(key, artifact)[source]

add/set an artifact in the project spec (will be registered on load)

example:

project.set_artifact('data', Artifact(target_path=data_url))
Parameters
  • key – artifact key/name

  • artifact – mlrun Artifact object (or its subclasses)

set_function(func, name='', kind='', image=None, handler=None, with_repo=None)[source]

update or add a function object to the project

function can be provided as an object (func) or as a .py/.ipynb/.yaml url. Supported url prefixes:

  • object storage (s3://, v3io://, ..)
  • MLRun DB, e.g. db://project/func:ver
  • functions hub/market, e.g. hub://sklearn_classifier:master

examples:

proj.set_function(func_object)
proj.set_function('./src/mycode.py', 'ingest',
                  image='myrepo/ing:latest', with_repo=True)
proj.set_function('http://.../mynb.ipynb', 'train')
proj.set_function('./func.yaml')
proj.set_function('hub://get_toy_data', 'getdata')
Parameters
  • func – function object or spec/code url

  • name – name of the function (under the project)

  • kind – runtime kind e.g. job, nuclio, spark, dask, mpijob default: job

  • image – docker image to be used, can also be specified in the function object/yaml

  • handler – default function handler to invoke (can only be set with .py/.ipynb files)

  • with_repo – add (clone) the current repo to the build source

Returns

project object

set_model_monitoring_credentials(access_key: str)[source]

Set the credentials that will be used by the project's model monitoring infrastructure functions. The supplied credentials must have data access.

Parameters

access_key – Model Monitoring access key for managing user permissions.

set_source(source, pull_at_runtime=False)[source]

set the project source code path (can be a git/tar/zip archive)

Parameters
  • source – valid path to a git, zip, or tar file (or None for current), e.g. git://github.com/mlrun/something.git or http://some/url/file.zip

  • pull_at_runtime – load the archive into the container at job runtime vs on build/deploy
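
The source parameter accepts git, zip, or tar locations. The sketch below shows one way to tell them apart before calling set_source. classify_source is a hypothetical helper, and its prefix/suffix rules are an assumption made for illustration; MLRun may detect source types differently.

```python
def classify_source(source: str) -> str:
    """Return 'git', 'zip' or 'tar' for a project source string."""
    lowered = source.lower()
    if lowered.startswith("git://") or lowered.endswith(".git"):
        return "git"
    if lowered.endswith(".zip"):
        return "zip"
    if lowered.endswith((".tar", ".tar.gz", ".tgz")):
        return "tar"
    raise ValueError(f"unrecognized source type: {source!r}")


for src in ("git://github.com/mlrun/something.git", "http://some/url/file.zip"):
    print(src, "->", classify_source(src))
```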

set_workflow(name, workflow_path: str, embed=False, engine=None, args_schema: Optional[List[mlrun.model.EntrypointParam]] = None, handler=None, **args)[source]

add or update a workflow, specify a name and the code path

Parameters
  • name – name of the workflow

  • workflow_path – url/path for the workflow file

  • embed – add the workflow code into the project.yaml

  • engine – workflow processing engine (“kfp” or “local”)

  • args_schema – list of arg schema definitions (mlrun.model.EntrypointParam)

  • handler – workflow function handler

  • args – argument values (key=value, ..)

property source

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

property spec
property status
sync_functions(names: Optional[list] = None, always=True, save=False)[source]

reload function objects from specs and files

with_secrets(kind, source, prefix='')[source]

register a secrets source (file, env or dict)

read secrets from a source provider to be used in workflows. Example:

proj.with_secrets('file', 'file.txt')
proj.with_secrets('inline', {'key': 'val'})
proj.with_secrets('env', 'ENV1,ENV2', prefix='PFX_')

Vault secret source has several options:

proj.with_secrets('vault', {'user': <user name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', {'project': <proj. name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', ['secret1', 'secret2' ...])

The 2nd option uses the current project name as context. Can also use empty secret list:

proj.with_secrets('vault', [])

This will enable access to all secrets in vault registered to the current project.

Parameters
  • kind – secret type (file, inline, env, vault)

  • source – secret data or link (see example)

  • prefix – add a prefix to the keys in this source

Returns

project object
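
The 'env' example above passes a comma-separated list of variable names together with a prefix. The sketch below is one reading of "add a prefix to the keys in this source". collect_env_secrets is a hypothetical helper, and both the prefixing rule and the choice to skip unset variables are assumptions, not MLRun's documented behavior.

```python
from typing import Dict, Mapping


def collect_env_secrets(
    names: str, environ: Mapping[str, str], prefix: str = ""
) -> Dict[str, str]:
    """Pick the named variables from environ and prepend prefix to each key."""
    secrets = {}
    for raw_name in names.split(","):
        name = raw_name.strip()
        # Assumption: names that are empty or not set are skipped silently.
        if name and name in environ:
            secrets[prefix + name] = environ[name]
    return secrets


fake_environ = {"ENV1": "a", "ENV2": "b", "OTHER": "c"}
secrets = collect_env_secrets("ENV1,ENV2", fake_environ, prefix="PFX_")
print(secrets)
```

Passing a plain mapping instead of os.environ keeps the sketch independent of the local environment.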

property workflows

This is a property of the spec; see the spec for documentation. It is kept here for backward compatibility with user code that used MlrunProjectLegacy.

class mlrun.projects.ProjectMetadata(name=None, created=None, labels=None, annotations=None)[source]

Bases: mlrun.model.ModelObj

property name

Project name

static validate_project_name(name: str, raise_on_failure: bool = True)bool[source]
class mlrun.projects.ProjectSpec(description=None, params=None, functions=None, workflows=None, artifacts=None, artifact_path=None, conda=None, source=None, subpath=None, origin_url=None, goals=None, load_source_on_run=None, desired_state='online', owner=None, disable_auto_mount=False)[source]

Bases: mlrun.model.ModelObj

property artifacts

list of artifacts used in this project

property functions

list of function object/specs used in this project

property mountdir

specify to mount the context dir inside the function container; use '.' to use the same path as in the client (e.g. Jupyter)

remove_artifact(key)[source]
remove_function(name)[source]
remove_workflow(name)[source]
set_artifact(key, artifact)[source]
set_function(name, function_object, function_dict)[source]
set_workflow(name, workflow)[source]
property source

source url or git repo

property workflows

list of workflows specs used in this project

class mlrun.projects.ProjectStatus(state=None)[source]

Bases: mlrun.model.ModelObj

mlrun.projects.build_function(function: Union[str, mlrun.runtimes.base.BaseRuntime], with_mlrun: bool = True, skip_deployed: bool = False, image=None, base_image=None, commands: Optional[list] = None, secret_name='', requirements: Optional[Union[str, List[str]]] = None, mlrun_version_specifier=None, builder_env: Optional[dict] = None, project_object=None)[source]

deploy ML function, build container with its dependencies

Parameters
  • function – name of the function (in the project) or function object

  • with_mlrun – add the current mlrun package to the container build

  • skip_deployed – skip the build if we already have an image for the function

  • image – target image name/path

  • base_image – base image name/path (commands and source code will be added to it)

  • commands – list of docker build (RUN) commands, e.g. ['pip install pandas']

  • secret_name – k8s secret for accessing the docker registry

  • requirements – list of python packages or pip requirements file path, defaults to None

  • mlrun_version_specifier – which mlrun package version to include (if not current)

  • builder_env – Kaniko builder pod env vars dict (for config/credentials), e.g. builder_env={"GIT_TOKEN": token}; does not work yet in KFP

mlrun.projects.deploy_function(function: Union[str, mlrun.runtimes.base.BaseRuntime], dashboard: str = '', models: Optional[list] = None, env: Optional[dict] = None, tag: Optional[str] = None, verbose: Optional[bool] = None, project_object=None)[source]

deploy real-time (nuclio based) functions

Parameters
  • function – name of the function (in the project) or function object

  • dashboard – url of the remote Nuclio dashboard (when not local)

  • models – list of model items

  • env – dict of extra environment variables

  • tag – extra version tag

  • verbose – add verbose prints/logs

mlrun.projects.get_or_create_project(name, context, url=None, secrets=None, init_git=False, subpath='', clone=False, user_project=False, from_template=None)mlrun.projects.project.MlrunProject[source]

Load a project from MLRun DB, or create/import it if it doesn't exist

example:

# load project from the DB (if exist) or the source repo
project = get_or_create_project("myproj", "./", "git://github.com/mlrun/demo-xgb-project.git")
project.pull("development")  # pull the latest code from git
project.run("main", arguments={'data': data_url})  # run the workflow "main"
Parameters
  • context – project local directory path

  • url – name (in DB) or git or tar.gz or .zip sources archive path, e.g. git://github.com/mlrun/demo-xgb-project.git or http://mysite/archived-project.zip

  • name – project name

  • secrets – key:secret dict or SecretsStore used to download sources

  • init_git – if True, will git init the context dir

  • subpath – project subpath (within the archive)

  • clone – if True, always clone (delete any existing content)

  • user_project – add the current user name to the project name (for db:// prefixes)

  • from_template – path to a project YAML file that will be used as a template (for new projects)

Returns

project object

mlrun.projects.load_project(context, url=None, name=None, secrets=None, init_git=False, subpath='', clone=False, user_project=False)mlrun.projects.project.MlrunProject[source]

Load an MLRun project from git or tar or dir

example:

# load the project and run the 'main' workflow
project = load_project("./", "git://github.com/mlrun/project-demo.git")
project.run("main", arguments={'data': data_url})
Parameters
  • context – project local directory path

  • url – name (in DB) or git or tar.gz or .zip sources archive path, e.g. git://github.com/mlrun/demo-xgb-project.git, http://mysite/archived-project.zip, or <project-name>

  • name – project name

  • secrets – key:secret dict or SecretsStore used to download sources

  • init_git – if True, will git init the context dir

  • subpath – project subpath (within the archive)

  • clone – if True, always clone (delete any existing content)

  • user_project – add the current user name to the project name (for db:// prefixes)

Returns

project object

mlrun.projects.new_project(name, context=None, init_git=False, user_project=False, remote=None, from_template=None, secrets=None, description=None)mlrun.projects.project.MlrunProject[source]

Create a new MLRun project, optionally load it from a yaml/zip/git template

example:

# create a project with local and marketplace functions, a workflow, and an artifact
project = mlrun.new_project("myproj", "./", init_git=True, description="my new project")
project.set_function('prep_data.py', 'prep-data', image='mlrun/mlrun', handler='prep_data')
project.set_function('hub://sklearn_classifier', 'train')
project.set_artifact('data', Artifact(target_path=data_url))
project.set_workflow('main', "./myflow.py")
project.save()

# run the "main" workflow (watch=True to wait for run completion)
project.run("main", watch=True)

example (load from template):

# create a new project from a zip template (can also use yaml/git templates)
# initialize a local git, and register the git remote path
project = mlrun.new_project("myproj", "./", init_git=True,
                            remote="git://github.com/mlrun/project-demo.git",
                            from_template="http://mysite/proj.zip")
project.run("main", watch=True)
Parameters
  • name – project name

  • context – project local directory path

  • init_git – if True, will git init the context dir

  • user_project – add the current user name to the provided project name (making it unique per user)

  • remote – remote Git url

  • from_template – path to project YAML/zip file that will be used as a template

  • secrets – key:secret dict or SecretsStore used to download sources

  • description – text describing the project

Returns

project object

mlrun.projects.run_function(function: Union[str, mlrun.runtimes.base.BaseRuntime], handler: Optional[str] = None, name: str = '', params: Optional[dict] = None, hyperparams: Optional[dict] = None, hyper_param_options: Optional[mlrun.model.HyperParamOptions] = None, inputs: Optional[dict] = None, outputs: Optional[List[str]] = None, workdir: str = '', labels: Optional[dict] = None, base_task: Optional[mlrun.model.RunTemplate] = None, watch: bool = True, local: bool = False, verbose: Optional[bool] = None, project_object=None)Union[mlrun.model.RunObject, kfp.dsl._container_op.ContainerOp][source]

Run a local or remote task as part of a local/kubeflow pipeline

run_function() allows you to execute a function locally, on a remote cluster, or as part of an automated workflow. The function can be specified as an object or by name (str). When the function is specified by name it is looked up in the current project, eliminating the need to redefine/edit functions.

When functions run as part of a workflow/pipeline (project.run()), some attributes can be set at the run level, e.g. local=True will run all the functions locally, and setting artifact_path will direct all outputs to the same path. Project runs provide additional notifications/reporting and exception handling. Inside a Kubeflow pipeline (KFP), run_function() generates KFP "ContainerOps", which are used to form a DAG. Some behavior may differ between regular runs and deferred KFP runs.

example (use with function object):

function = mlrun.import_function("hub://sklearn_classifier")
run1 = run_function(function, params={"data": url})

example (use with project):

# create a project with two functions (local and from marketplace)
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://sklearn_classifier", "train")

# run functions (refer to them by name)
run1 = run_function("myfunc", params={"x": 7})
run2 = run_function("train", params={"data": run1.outputs["data"]})

example (use in pipeline):

from kfp import dsl

@dsl.pipeline(name="test pipeline", description="test")
def my_pipe(url=""):
    run1 = run_function("loaddata", params={"url": url})
    run2 = run_function("train", params={"data": run1.outputs["data"]})

project.run(workflow_handler=my_pipe, arguments={"param1": 7})
Parameters
  • function – name of the function (in the project) or function object

  • handler – name of the function handler

  • name – execution name

  • params – input parameters (dict)

  • hyperparams – hyper parameters

  • hyper_param_options – hyper param options (selector, early stop, strategy, ..) see: HyperParamOptions

  • inputs – input objects (dict of key: path)

  • outputs – list of outputs which can pass in the workflow

  • workdir – default input artifacts path

  • labels – labels to tag the job/run with ({key:val, ..})

  • base_task – task object to use as base

  • watch – watch/follow run log, True by default

  • local – run the function locally vs on the runtime/cluster

  • verbose – add verbose prints/logs

Returns

MLRun RunObject or KubeFlow containerOp