Create and load projects

Projects refer to a context directory that holds all the project code and configuration. The context dir is usually mapped to a git repository and/or to an IDE (PyCharm, VSCode, etc.) project.

There are three ways to create/load a project object:

  • new_project() — Create a new MLRun project and optionally load it from a yaml/zip/git template.

  • load_project() — Load a project from a context directory or remote git/zip/tar archive.

  • get_or_create_project() — Load a project from the MLRun DB if it exists, or from a specified context/archive.

You can also load projects and run workflows/pipelines from the CLI, using the mlrun project command.


Creating a new project

To define a new project from scratch, use new_project(). You must specify a name and a location for the context directory (e.g. ./), and you can pass other optional parameters (see below). The context dir holds the configuration, code, and workflow files. File paths in the project are relative to the context root.

    import mlrun
    from mlrun.artifacts import Artifact

    # create a project with local and marketplace functions
    project = mlrun.new_project("myproj", "./", init_git=True, description="my new project")
    project.set_function('prep_data.py', 'prep-data', image='mlrun/mlrun', handler='prep_data')
    project.set_function('hub://sklearn_classifier', 'train')

    # register a simple named artifact in the project (to be used in workflows)
    data_url = 'https://s3.wasabisys.com/iguazio/data/iris/iris.data.raw.csv'
    project.set_artifact('data', Artifact(target_path=data_url))

    # add a multi-stage workflow (./myflow.py) to the project with the name 'main' and save the project
    project.set_workflow('main', "./myflow.py")
    project.save()

    # run the "main" workflow (watch=True to wait for run completion)
    project.run("main", watch=True)
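
The example registers a prep-data function from prep_data.py with handler='prep_data', but the file itself is not shown. Below is a minimal sketch of what it might contain, assuming the usual MLRun handler convention: the first argument is the run context, and data inputs arrive as DataItem objects. The source_url parameter name, the dropna() cleaning step, and the cleaned_data artifact key are illustrative assumptions, not part of the original example.

```python
# prep_data.py: hypothetical content for the 'prep-data' function registered above.
# Only the file name and the handler name come from the example; the rest is illustrative.


def prep_data(context, source_url):
    """Load the raw dataset, drop incomplete rows, and log the cleaned result."""
    # MLRun passes data inputs as DataItem objects; as_df() loads the data into a pandas DataFrame
    df = source_url.as_df()

    # minimal cleaning step (assumption): drop rows that contain missing values
    df = df.dropna()
    context.logger.info(f"cleaned dataset has {len(df)} rows")

    # log the cleaned data as a dataset artifact so later workflow steps can consume it
    context.log_dataset("cleaned_data", df=df, index=False, format="csv")
```

In a workflow step, source_url would typically be bound to an input such as the registered data artifact or data_url.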

When a project is saved, a project.yaml file with the project definitions is written to the context dir. Alternatively, you can manually create the project.yaml file and load it using load_project() or the from_template parameter. The generated project.yaml for the above project looks like this:

    kind: project
    metadata:
      name: myproj
    spec:
      description: my new project
      functions:
      - url: prep_data.py
        name: prep-data
        image: mlrun/mlrun
        handler: prep_data
      - url: hub://sklearn_classifier
        name: train
      workflows:
      - name: main
        path: ./myflow.py
        engine: kfp
      artifacts:
      - kind: ''
        target_path: https://s3.wasabisys.com/iguazio/data/iris/iris.data.raw.csv
        key: data
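
If you prefer to create the project.yaml manually, you can write the definitions into the context dir yourself. Below is a minimal stdlib sketch. write_project_yaml is a hypothetical helper, and the YAML text is the generated file shown above, kept as a plain string so that no YAML library is needed.

```python
from pathlib import Path

# The project definition shown above, stored as plain text
PROJECT_YAML = """\
kind: project
metadata:
  name: myproj
spec:
  description: my new project
  functions:
  - url: prep_data.py
    name: prep-data
    image: mlrun/mlrun
    handler: prep_data
  - url: hub://sklearn_classifier
    name: train
  workflows:
  - name: main
    path: ./myflow.py
    engine: kfp
  artifacts:
  - kind: ''
    target_path: https://s3.wasabisys.com/iguazio/data/iris/iris.data.raw.csv
    key: data
"""


def write_project_yaml(context: str) -> Path:
    """Write project.yaml into the context dir (created if missing) and return its path."""
    context_dir = Path(context)
    context_dir.mkdir(parents=True, exist_ok=True)
    yaml_path = context_dir / "project.yaml"
    yaml_path.write_text(PROJECT_YAML)
    return yaml_path
```

After writing the file, you could load the project with mlrun.load_project(context). The prep_data.py and myflow.py files it references must also exist in the context dir before you run anything.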

Projects can also be created from a template (yaml file, zip file, or git repo), allowing users to create reusable skeletons. The content of the zip/tar/git archive is copied into the context dir.

The init_git flag initializes git in the context dir, the remote attribute registers the URL of the remote git repository, and the user_project flag makes the project name unique to the user.
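
The effect of user_project can be pictured with a small sketch. It assumes, based on our understanding of recent MLRun versions, that the current user name (from the V3IO_USERNAME environment variable, else the OS login name) is lowercased, dasherized, and appended to the project name. user_project_name is a hypothetical helper, so check the exact naming rule in your MLRun version.

```python
import getpass
import os
from typing import Optional


def user_project_name(name: str, username: Optional[str] = None) -> str:
    """Append a normalized user name to a project name (sketch of user_project=True)."""
    if not name:
        raise ValueError("a project name is required when user_project=True")
    # assumption: the user comes from V3IO_USERNAME, falling back to the OS login name
    user = username or os.environ.get("V3IO_USERNAME") or getpass.getuser()
    # project names end up in URLs and resource names, so keep them lowercase with dashes
    normalized = user.lower().replace("_", "-")
    return f"{name}-{normalized}"
```

For example, user_project_name("myproj", "jane_doe") returns "myproj-jane-doe", so two users running the same new_project() code get separate projects.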

Example of creating a new project from a zip template:

    import mlrun

    # create a project from zip, initialize a local git, and register the git remote path
    project = mlrun.new_project("myproj", "./", init_git=True, user_project=True,
                                remote="git://github.com/mlrun/demo-xgb-project.git",
                                from_template="http://mysite/proj.zip")
    # add another marketplace function and save
    project.set_function('hub://test_classifier', 'test')
    project.save()
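
For a zip template, the copy step amounts to extracting the archive into the context dir. Below is a minimal stdlib sketch for a local zip file. copy_zip_template is a hypothetical helper. MLRun itself also downloads remote templates (such as the http URL above) and handles tar archives and git repos.

```python
import zipfile
from pathlib import Path
from typing import List


def copy_zip_template(archive_path: str, context: str) -> List[str]:
    """Extract a local zip template into the context dir and return the extracted names."""
    context_dir = Path(context)
    context_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # only extract archives you trust: member names control where files are written
        archive.extractall(context_dir)
        return sorted(archive.namelist())
```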

Note

  • Projects are visible in the MLRun dashboard only after they’re saved to the MLRun database (with .save()) or after the workflows are executed (with .run()).

  • You can ensure the project name is unique per user by setting the user_project parameter to True.

Load and run projects from context, git or archive

When a project is already created and stored in git or in an archive, you can quickly load and use it with the load_project() method. load_project() uses a local context directory (with an initialized git repo) or clones a remote repo into the local dir, and returns a project object.

You need to provide the path to the context dir and the git/zip/tar archive URL. The project name can be specified or taken from the project object. You can also specify secrets (repo credentials), the init_git flag (to initialize git in the context dir), the clone flag (to force a fresh clone, ignoring/removing the local copy), and the user_project flag (to make the project name unique to the user).

Example of loading a project from git and running the main workflow:

    import mlrun

    data_url = 'https://s3.wasabisys.com/iguazio/data/iris/iris.data.raw.csv'
    project = mlrun.load_project("./", "git://github.com/mlrun/project-demo.git")
    project.run("main", arguments={'data': data_url})

Note

If the url parameter is not specified, load_project() searches for a Git repo inside the context dir and uses its metadata. Alternatively, set init_git=True to initialize a Git repo in the target context directory.
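
The lookup described in the note can be illustrated with a stdlib sketch that checks whether the context dir is a git repo and reads its origin URL from .git/config. local_git_remote is a hypothetical helper, not MLRun's implementation. It ignores git features such as worktrees, include directives, and remotes configured outside the repo's own config file.

```python
import configparser
from pathlib import Path
from typing import Optional


def local_git_remote(context: str, remote: str = "origin") -> Optional[str]:
    """Return the URL of `remote` from <context>/.git/config, or None if it is not found."""
    config_path = Path(context) / ".git" / "config"
    if not config_path.is_file():
        # no git repo in the context dir: you would pass a url or init_git=True instead
        return None
    # git config files are INI-like; tolerate repeated keys and keys without values
    parser = configparser.ConfigParser(interpolation=None, strict=False, allow_no_value=True)
    parser.read(config_path)
    section = f'remote "{remote}"'
    if not parser.has_section(section):
        return None
    return parser.get(section, "url", fallback=None)
```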

Load and run using the CLI

Loading a project from git into ./:

    mlrun project -n myproj -u "git://github.com/mlrun/project-demo.git" .

Running a specific workflow (main) from the project stored in . (the current dir):

    mlrun project -r main -w .

CLI usage details:

    Usage: mlrun project [OPTIONS] [CONTEXT]

    Options:
      -n, --name TEXT           project name
      -u, --url TEXT            remote git or archive url
      -r, --run TEXT            run workflow name or .py file
      -a, --arguments TEXT      pipeline arguments name and value tuples (with -r flag),
                                e.g. -a x=6

      -p, --artifact-path TEXT  output artifacts path if not default
      -x, --param TEXT          mlrun project parameter name and value tuples,
                                e.g. -x x=37 -x y='text'

      -s, --secrets TEXT        secrets file=<filename> or env=ENV_KEY1,..
      --init-git                for new projects init git context
      -c, --clone               force override/clone into the context dir
      --sync                    sync functions into db
      -w, --watch               wait for pipeline completion (with -r flag)
      -d, --dirty               allow run with uncommitted git changes
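
The -a and -x options take name=value tuples; in Python, the -a values play the role of the arguments dict passed to project.run(). Below is a minimal sketch of that conversion. parse_name_value_tuples is a hypothetical helper, and its literal_eval-based type conversion is an assumption: the MLRun CLI's own parsing rules may differ.

```python
import ast
from typing import Any, Dict, Iterable


def parse_name_value_tuples(pairs: Iterable[str]) -> Dict[str, Any]:
    """Turn ["x=6", "y=text"] style tuples into a dict such as {"x": 6, "y": "text"}."""
    result: Dict[str, Any] = {}
    for pair in pairs:
        # split on the first '=' only, so values such as URLs may contain '=' themselves
        name, sep, raw = pair.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        try:
            # assumption: values that look like Python literals (6, True, 'text') are converted
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            # anything else (for example a bare word or a URL) is kept as a plain string
            value = raw
        result[name] = value
    return result
```

The shell strips the quotes in -x y='text', so the program receives y=text. With this sketch, both forms end up as the string "text".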

Get a project from DB or create it (get_or_create_project)

If you already have a project saved in the DB and you need to access or use it (for example, from a different notebook or file), use the get_or_create_project() method. It first tries to read the project from the DB, and only if the project doesn't exist there does it load or create it.

Note

If you update the project object from different files, notebooks, or users, make sure you call .save() on your project after each change, and run get_or_create_project() to load changes made by others.

Example:

    import mlrun

    data_url = 'https://s3.wasabisys.com/iguazio/data/iris/iris.data.raw.csv'
    # load the project from the DB (if it exists) or from the source repo
    project = mlrun.get_or_create_project("myproj", "./", "git://github.com/mlrun/demo-xgb-project.git")
    project.pull("development")  # pull the latest code from git
    project.run("main", arguments={'data': data_url})  # run the workflow "main"
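
The lookup order can be illustrated with a minimal sketch in which a plain dict stands in for the MLRun DB and a load_or_create callable stands in for loading from the context dir or URL. Both names, and the step that stores a newly created project, are assumptions for illustration. The sketch shows that once a project is in the DB, its source is not consulted again, which is presumably why the example calls project.pull() before running the workflow.

```python
from typing import Callable, Dict, TypeVar

Project = TypeVar("Project")


def get_or_create(name: str, db: Dict[str, Project],
                  load_or_create: Callable[[str], Project]) -> Project:
    """Return the project stored in `db`, or load/create it and store it (lookup-order sketch)."""
    # 1. the DB copy wins: if the project exists there, the context dir and URL are not used
    if name in db:
        return db[name]
    # 2. otherwise load it from the context dir / archive, or create it from scratch
    project = load_or_create(name)
    # 3. assumption: the new project is saved, so the next lookup finds it in the DB
    db[name] = project
    return project
```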

Working with Git

You can update the code using the standard Git process (commit, push). If you update or edit the project object, run project.save() to update the project.yaml file in your context directory, and then push your updates.

You can use the standard git CLI to pull, commit, push, etc.; the MLRun project syncs with the local git state. You can also use project methods with the same functionality. These methods simplify common tasks but do not expose the full git functionality.

  • pull() — pull/update sources from git or tar into the context dir

  • create_remote() — create remote for the project git

  • push() — save project state and commit/push updates to remote git repo

For example, project.push(branch, commit_message, add=[]) saves the project state to the DB and to project.yaml, commits the updates, and pushes them to the remote repo.
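
In plain git terms, the commit-and-push part of push() is roughly a stage, commit, and push sequence. The sketch below only builds the command lines so you can inspect them. push_commands is a hypothetical helper; the exact files MLRun stages are an assumption, and the DB save that push() also performs has no git equivalent.

```python
from typing import List, Optional


def push_commands(branch: str, message: str, add: Optional[List[str]] = None,
                  remote: str = "origin") -> List[List[str]]:
    """Build git commands that roughly match the commit/push part of project.push()."""
    # assumption: project.yaml is staged along with any extra files listed in `add`
    files = ["project.yaml"] + list(add or [])
    return [
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]
```

To actually run them, you could loop over the list and call subprocess.run(cmd, cwd=context, check=True) for each command.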

Note

You must push updates before you build functions or run workflows that use code from git, since the builder and the containers pull the code from the git repo.

If you are using containerized Jupyter, you might first need to set your Git parameters, e.g. using the following commands:

    git config --global user.email "<my@email.com>"
    git config --global user.name "<name>"
    git config --global credential.helper store

After that, log in to git once with your password and restart the notebook. You can then push project updates from the notebook, for example:

    project.push('master', 'some edits')