Using projects

You can add or update a project's functions, artifacts, and workflows using set_function(), set_artifact(), and set_workflow(), and set various project attributes (parameters, secrets, etc.).

Use the project run() method to run a registered workflow using a pipeline engine (e.g. Kubeflow Pipelines). The workflow executes its registered functions in a sequence/graph (DAG), and can reference project parameters, secrets, and artifacts by name.
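For example, a minimal sketch of running a workflow (assuming a workflow named "main" was previously registered with set_workflow()):

    # run the registered "main" workflow with parameter overrides and wait for completion
    run_id = project.run("main", arguments={"model_name": "my-model"}, watch=True)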

Projects can also be loaded, and workflows/pipelines executed, using the CLI (with the mlrun project command).

In this section

  • Updating and using project functions

  • Run, build, and deploy functions

Updating and using project functions

Projects host or link to functions that are used in job or workflow runs. You add functions to a project using set_function(), which registers them as part of the project definition (and YAML file). Alternatively, you can create functions using methods like code_to_function() and save them to the DB (under the same project). The preferred approach is set_function(), since it also records the functions in the project spec.
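For example, a minimal sketch of the code_to_function() alternative (the file path and handler name are hypothetical):

    import mlrun

    # create a function object from a code file, then register it in the project
    fn = mlrun.code_to_function("train", filename="./src/train.py",
                                kind="job", image="mlrun/mlrun", handler="train")
    project.set_function(fn)  # also records the function in the project spec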

The set_function() method allows you to add or update many types of functions:

  • marketplace functions - load/register a marketplace function into the project (func="hub://…")

  • notebook file - convert a notebook file into a function (func="path/to/file.ipynb")

  • Python file - convert a Python file into a function (func="path/to/file.py")

  • database function - a function stored in the MLRun DB (func="db://project/func-name:version")

  • function YAML file - read the function object from a YAML file (func="path/to/file.yaml")

  • inline function spec - save the full function spec in the project definition file (func=func_object); not recommended

When loading a function from a code file (py, ipynb), you should also specify a container image and the runtime kind (the job kind is used by default). You can optionally specify the function handler (the entry-point function to invoke) and a name.

If the function is not a single-file function and it requires access to multiple files/libraries in the project, you should set with_repo=True to add the entire repo code into the destination container during build or run time.

Note

When using with_repo=True the functions need to be deployed (function.deploy()) to build a container, unless you set project.spec.load_source_on_run=True, which instructs MLRun to load the git/archive repo into the function container at run time and does not require a build (this is simpler when developing; for production it is preferred to build the image with the code).
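For example, a minimal sketch of the development-time setup (the git URL and file path are hypothetical):

    # point the project at its source repo and load it at run time (no build needed)
    project.set_source("git://github.com/myorg/myrepo.git#main")
    project.spec.load_source_on_run = True
    project.set_function("./src/ingest.py", "ingest",
                         image="mlrun/mlrun", with_repo=True)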

Examples:

    project.set_function('hub://sklearn_classifier', 'train')                   # marketplace function
    project.set_function('http://.../mynb.ipynb', 'test', image="mlrun/mlrun")  # notebook file
    project.set_function('./src/mycode.py', 'ingest',
                         image='myrepo/ing:latest', with_repo=True)             # python file + repo code
    project.set_function('db://project/func-name:version')                      # function from the MLRun DB
    project.set_function('./func.yaml')                                         # function yaml file
    project.set_function(func_object)                                           # inline function spec

Once functions are registered or saved in the project, you can get their function object using project.get_function(key).

Example:

    # get the data-prep function, add volume mount and run it with data input
    project.get_function("data-prep").apply(v3io_mount())
    run = project.run_function("data-prep", inputs={"data": data_url})
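The returned run object can then be used to track the execution and fetch its outputs, for example:

    # check the run state and list its results/artifact URIs
    print(run.state())
    print(run.outputs)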

Run, build, and deploy functions

There is a set of methods used to deploy and run project functions. These can be used interactively or inside a pipeline (inside a pipeline, each call is automatically mapped to the relevant pipeline engine command).

  • run_function() - run a local or remote task as part of a local run or pipeline

  • build_function() - build a container image with the function's dependencies, for use in runs

  • deploy_function() - deploy real-time/online (Nuclio or serving based) functions

You can use those methods as project methods, or as global (mlrun.) methods; in the latter case the current (active) project is assumed.

    run = myproject.run_function("train", inputs={"data": data_url})  # run "train" in myproject
    run = mlrun.run_function("train", inputs={"data": data_url})      # run "train" in the current/active project
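Similarly, a minimal sketch of building and deploying project functions (assuming functions named "ingest" and "serving" are registered in the project):

    # build a container image for the "ingest" job function
    project.build_function("ingest")

    # deploy the real-time "serving" function to the cluster
    project.deploy_function("serving")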

The first parameter in those three methods is the function name (in the project), or it can be a function object if you want to use functions you imported/created ad hoc. For example:

    from mlrun import import_function, deploy_function

    # import a serving function from the marketplace and deploy a trained model over it
    serving = import_function("hub://v2_model_server", new_name="serving")
    deploy = deploy_function(
        serving,
        models=[{"key": "mymodel", "model_path": train.outputs["model"]}],
    )
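Once deployed, the real-time function can be tested directly; a minimal sketch (the route follows the v2 model server convention, and the input body is illustrative):

    # invoke the deployed endpoint with a sample inference request
    serving.invoke("/v2/models/mymodel/infer", body={"inputs": [[1, 2, 3, 4]]})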