mlrun.projects

class mlrun.projects.MlrunProject(name=None, description=None, params=None, functions=None, workflows=None, artifacts=None, artifact_path=None, conda=None, metadata=None, spec=None)[source]

Bases: mlrun.model.ModelObj

property artifact_path

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

property artifacts

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

clear_context()[source]

delete all files and clear the context dir

property context

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

create_remote(url, name='origin')[source]

create a remote for the project git repo

Parameters
  • url – remote git url

  • name – name for the remote (default is ‘origin’)
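
example (the remote url is illustrative):

project.create_remote('git://github.com/myorg/myproj.git')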

create_vault_secrets(secrets)[source]

property description

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

export(filepath=None)[source]

save the project object into a file (defaults to project.yaml)

func(key, sync=False)[source]

get function object by name

Parameters
  • key – name of the function (under the project)

  • sync – will reload/reinit the function object

Returns

function object
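
example (the function name 'trainer' is illustrative and assumes it was added via set_function):

trainer = project.func('trainer', sync=True)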

property functions

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

get_param(key: str, default=None)[source]

get project param by key
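
example (the 'model_type' param is illustrative):

project.spec.params = {'model_type': 'xgboost'}
model_type = project.get_param('model_type', default='sklearn')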

get_run_status(workflow_id, timeout=3600, expected_statuses=None, notifiers: Optional[mlrun.utils.helpers.RunNotifications] = None)[source]

get_secret(key: str)[source]

get a key-based secret (e.g. a DB password) from the context; secrets can be specified when invoking a run through files, env, etc.
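
example (the secret key 'DB_PASSWORD' is hypothetical and assumes it was registered via with_secrets):

db_password = project.get_secret('DB_PASSWORD')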

get_vault_secrets(secrets=None, local=False)[source]

kind = 'project'

log_artifact(item, body=None, tag='', local_path='', artifact_path=None, format=None, upload=True, labels=None, target_path=None)[source]

log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=True, labels=None, format='', preview=None, stats=False, target_path='', extra_data=None, **kwargs)[source]

log a dataset artifact and optionally upload it to datastore

example:

import pandas as pd

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
context.log_dataset("mydf", df=df, stats=True)
Parameters
  • key – artifact key

  • df – dataframe object

  • local_path – path to the local file we upload, will also be used as the destination subpath (under “artifact_path”)

  • artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)

  • tag – version tag

  • format – optional, format to use (e.g. csv, parquet, ..)

  • target_path – absolute target path (instead of using artifact_path + local_path)

  • preview – number of lines to store as preview in the artifact metadata

  • stats – calculate and store dataset stats in the artifact metadata

  • extra_data – key/value list of extra files/charts to link with this dataset

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

Returns

artifact object

log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=True, labels=None, inputs: Optional[List[mlrun.features.Feature]] = None, outputs: Optional[List[mlrun.features.Feature]] = None, feature_vector: Optional[str] = None, feature_weights: Optional[list] = None, training_set=None, label_column=None, extra_data=None)[source]

log a model artifact and optionally upload it to datastore

example:

from pickle import dumps

# model, training_df, and feature_vector_uri are assumed to be defined earlier in the run
context.log_model("model", body=dumps(model),
                  model_file="model.pkl",
                  metrics=context.results,
                  training_set=training_df,
                  label_column='label',
                  feature_vector=feature_vector_uri,
                  labels={"app": "fraud"})
Parameters
  • key – artifact key or artifact class

  • body – will use the body as the artifact content

  • model_file – path to the local model file we upload (see also model_dir)

  • model_dir – path to the local dir holding the model file and extra files

  • artifact_path – target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath(‘data’)

  • framework – name of the ML framework

  • algorithm – training algorithm name

  • tag – version tag

  • metrics – key/value dict of model metrics

  • parameters – key/value dict of model parameters

  • inputs – ordered list of model input features (name, type, ..)

  • outputs – ordered list of model output/result elements (name, type, ..)

  • upload – upload to datastore (default is True)

  • labels – a set of key/value labels to tag the artifact with

  • feature_vector – feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])

  • feature_weights – list of feature weights, one per input column

  • training_set – training set dataframe, used to infer inputs & outputs

  • label_column – which columns in the training set are the label (target) columns

  • extra_data – key/value list of extra files/charts to link with this model; value can be an abs/relative path string, bytes, or an artifact object

Returns

artifact object

property metadata

property mountdir

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

property name

This is a property of the metadata; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

property params

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

pull(branch=None, remote=None)[source]

pull/update sources from git or tar into the context dir

Parameters
  • branch – git branch, if not the current one

  • remote – git remote, if other than origin
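
example (branch and remote are illustrative):

project.pull(branch='dev', remote='origin')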

push(branch, message=None, update=True, remote=None, add: Optional[list] = None)[source]

update spec and push updates to remote git repo

Parameters
  • branch – target git branch

  • message – git commit message

  • update – stage changes to tracked files before the commit (git add with update=True)

  • remote – git remote, default to origin

  • add – list of files to add
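
example (branch, message, and file list are illustrative):

project.push('master', message='update ingestion code', add=['src/ingest.py'])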

register_artifacts()[source]

register the artifacts in the MLRun DB (under this project)

reload(sync=False)[source]

reload the project and function objects from yaml/specs

Parameters

sync – set to True to load function objects

Returns

project object

remove_function(name)[source]

remove a function from a project

Parameters

name – name of the function (under the project)

run(name=None, workflow_path=None, arguments=None, artifact_path=None, namespace=None, sync=False, watch=False, dirty=False, ttl=None)[source]

run a workflow using kubeflow pipelines

Parameters
  • name – name of the workflow

  • workflow_path – url to a workflow file, if not a project workflow

  • arguments – kubeflow pipelines arguments (parameters)

  • artifact_path – target path/url for workflow artifacts, the string ‘{{workflow.uid}}’ will be replaced by workflow id

  • namespace – kubernetes namespace if other than default

  • sync – force functions sync before run

  • watch – wait for pipeline completion

  • dirty – allow running the workflow when the git repo is dirty

  • ttl – pipeline ttl in secs (after that the pods will be removed)

Returns

run id
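
example (the workflow name and arguments are illustrative, and assume a workflow was registered via set_workflow):

run_id = project.run('main',
                     arguments={'model_type': 'xgboost'},
                     watch=True)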

save(filepath=None)[source]

save_to_db()[source]

save_workflow(name, target, artifact_path=None, ttl=None)[source]

create and save a workflow as a yaml or archive file

Parameters
  • name – workflow name

  • target – target file path (can end with .yaml or .zip)

  • artifact_path – target path/url for workflow artifacts, the string ‘{{workflow.uid}}’ will be replaced by workflow id

  • ttl – pipeline ttl in secs (after that the pods will be removed)
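
example (assuming a workflow named 'main' was registered via set_workflow):

project.save_workflow('main', 'workflow.yaml')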

set_function(func, name='', kind='', image=None, with_repo=None)[source]

update or add a function object to the project

the function can be provided as an object (func) or as a .py/.ipynb/.yaml url. supported url prefixes:

  • object storage (s3://, v3io://, ..)

  • MLRun DB e.g. db://project/func:ver

  • functions hub/market e.g. hub://sklearn_classifier:master

examples:

proj.set_function(func_object)
proj.set_function('./src/mycode.py', 'ingest',
                  image='myrepo/ing:latest', with_repo=True)
proj.set_function('http://.../mynb.ipynb', 'train')
proj.set_function('./func.yaml')
proj.set_function('hub://get_toy_data', 'getdata')
Parameters
  • func – function object or spec/code url

  • name – name of the function (under the project)

  • kind – runtime kind, e.g. job, nuclio, spark, dask, mpijob (default: job)

  • image – docker image to be used, can also be specified in the function object/yaml

  • with_repo – add (clone) the current repo to the build source

Returns

project object

set_workflow(name, workflow_path: str, embed=False, **args)[source]

add or update a workflow, specify a name and the code path
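
example (the workflow file path is illustrative; embed=True stores the workflow code inside the project spec):

project.set_workflow('main', './src/workflow.py', embed=True)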

property source

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

property spec

property status

sync_functions(names: Optional[list] = None, always=True, save=False)[source]

reload function objects from specs and files
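
example:

project.sync_functions()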

with_secrets(kind, source, prefix='')[source]

register a secrets source (file, env or dict)

read secrets from a source provider to be used in workflows, example:

proj.with_secrets('file', 'file.txt')
proj.with_secrets('inline', {'key': 'val'})
proj.with_secrets('env', 'ENV1,ENV2', prefix='PFX_')

Vault secret source has several options:

proj.with_secrets('vault', {'user': <user name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', {'project': <proj. name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', ['secret1', 'secret2' ...])

The last option (a plain secrets list) uses the current project name as context. The list can also be empty:

proj.with_secrets('vault', [])

This will enable access to all secrets in vault registered to the current project.

Parameters
  • kind – secret type (file, inline, env, vault)

  • source – secret data or link (see example)

  • prefix – add a prefix to the keys in this source

Returns

project object

property workflows

This is a property of the spec; see there for documentation. Kept here for backwards compatibility with user code that used MlrunProjectLegacy.

mlrun.projects.load_project(context, url=None, name=None, secrets=None, init_git=False, subpath='', clone=False, user_project=False)[source]

Load an MLRun project from a git repo, an archive (tar), or a local dir

Parameters
  • context – project local directory path

  • url – git or tar.gz source archive path, e.g. git://github.com/mlrun/demo-xgb-project.git, or a DB reference, e.g. db://<project-name>

  • name – project name

  • secrets – key:secret dict or SecretsStore used to download sources

  • init_git – if True, will git init the context dir

  • subpath – project subpath (within the archive)

  • clone – if True, always clone (delete any existing content)

  • user_project – add the current user name to the project name (for db:// prefixes)

Returns

project object
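
example (the context dir './demo-xgb' is illustrative):

import mlrun

project = mlrun.projects.load_project(
    './demo-xgb', url='git://github.com/mlrun/demo-xgb-project.git', clone=True
)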

mlrun.projects.new_project(name, context=None, init_git=False, user_project=False)[source]

Create a new MLRun project

Parameters
  • name – project name

  • context – project local directory path

  • init_git – if True, will git init the context dir

  • user_project – add the current user name to the provided project name (making it unique per user)

Returns

project object
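
example (the project name and context dir are illustrative):

import mlrun

project = mlrun.projects.new_project('myproj', context='./myproj', init_git=True)
project.set_function('hub://get_toy_data', 'getdata')
project.save()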