mlrun.projects

class mlrun.projects.MlrunProject(metadata: ProjectMetadata | dict | None = None, spec: ProjectSpec | dict | None = None)

Bases: ModelObj

add_custom_packager(packager: str, is_mandatory: bool)

Add a custom packager to the custom packagers list. All of the project's custom packagers are added to each of the project's functions.

Note that to run a function with the custom packagers included, you must set a source for the project (using the project.set_source method) with the parameter pull_at_runtime=True, so that the source code of the packagers can be imported.

Parameters:
  • packager -- The packager module path to add. For example, if a packager MyPackager is in the project's source at my_module.py, then the module path is: "my_module.MyPackager".

  • is_mandatory -- Whether this packager must be collected during a run. If False, failing to collect it won't raise an error during the packagers collection phase.
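A packager module path is resolved by importing the module and looking up the class, which is why the packager source must be importable at run time. The sketch below illustrates that resolution and the is_mandatory rule using only the standard library; resolve_packager and collect_packagers are hypothetical helpers, not MLRun's collection code, and a stdlib class stands in for a real packager. In a project you would instead call project.set_source(<source>, pull_at_runtime=True) and then project.add_custom_packager("my_module.MyPackager", is_mandatory=True).

```python
from __future__ import annotations

import importlib


def resolve_packager(path: str) -> type:
    """Import a "module.ClassName" path and return the class (hypothetical helper)."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ImportError(f"{path!r} is not a 'module.ClassName' path")
    # Fails with ModuleNotFoundError (an ImportError) if the source is not importable
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError as err:
        raise ImportError(f"{class_name!r} not found in {module_name!r}") from err


def collect_packagers(packagers: list[tuple[str, bool]]) -> list[type]:
    """Collect (module path, is_mandatory) pairs.

    A mandatory packager that cannot be imported raises ImportError;
    an optional one is skipped silently.
    """
    collected = []
    for path, is_mandatory in packagers:
        try:
            collected.append(resolve_packager(path))
        except ImportError:
            if is_mandatory:
                raise
    return collected


# collections.OrderedDict stands in for a real packager such as "my_module.MyPackager".
found = collect_packagers([
    ("collections.OrderedDict", True),
    ("no_such_module.MyPackager", False),  # optional and missing -> skipped
])
print([cls.__name__ for cls in found])  # ['OrderedDict']
```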

property artifact_path: str
build_config(image: str | None = None, set_as_default: bool = False, with_mlrun: bool | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, builder_env: dict | None = None, extra_args: str | None = None, source_code_target_dir: str | None = None)

Specify the builder configuration for the project.

Parameters:
  • image -- target image name/path. If not specified, the project's existing default_image name is used. If that is not set either, the mlconf.default_project_image_name value is used.

  • set_as_default -- set image to be the project's default image (default False)

  • with_mlrun -- add the current mlrun package to the container build

  • base_image -- base image name/path

  • commands -- list of docker build (RUN) commands e.g. ['pip install pandas']

  • secret_name -- k8s secret for accessing the docker registry

  • requirements -- a list of packages to install on the built image

  • requirements_file -- requirements file to install on the built image

  • overwrite_build_params -- Overwrite the existing build configuration (currently applies to requirements and commands):
      * False: the new params are merged with the existing ones
      * True: the existing params are replaced by the new ones

  • builder_env -- Kaniko builder pod env vars dict (for config/credentials), e.g. builder_env={"GIT_TOKEN": token}. Not yet supported in KFP.

  • extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"

  • source_code_target_dir -- Path on the image where source code would be extracted (by default /home/mlrun_code)
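To make the two overwrite_build_params modes concrete, here is a plain-Python sketch of how a list-valued setting such as requirements or commands changes. merge_build_list is a hypothetical helper; dropping duplicates while preserving order in the merge case is an assumption about how the lists are combined, not documented MLRun behavior.

```python
from __future__ import annotations


def merge_build_list(existing: list[str] | None, new: list[str] | None, overwrite: bool) -> list[str]:
    """Sketch of overwrite_build_params for requirements/commands (hypothetical helper)."""
    existing = list(existing or [])
    new = list(new or [])
    if overwrite:
        # True: the existing params are replaced by the new ones
        return new
    # False: the new params are merged into the existing ones
    # (assumption: duplicates are dropped and the original order is kept)
    merged = existing[:]
    for item in new:
        if item not in merged:
            merged.append(item)
    return merged


current = ["pandas", "scikit-learn"]
print(merge_build_list(current, ["scikit-learn", "xgboost"], overwrite=False))
# ['pandas', 'scikit-learn', 'xgboost']
print(merge_build_list(current, ["xgboost"], overwrite=True))
# ['xgboost']
```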

build_function(function: str | BaseRuntime, with_mlrun: bool | None = None, skip_deployed: bool = False, image: str | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, mlrun_version_specifier: str | None = None, builder_env: dict | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, extra_args: str | None = None, force_build: bool = False) BuildStatus | ContainerOp

Deploy an ML function by building a container image with its dependencies.

Parameters:
  • function -- name of the function (in the project) or function object

  • with_mlrun -- add the current mlrun package to the container build

  • skip_deployed -- skip the build if we already have an image for the function

  • image -- target image name/path

  • base_image -- base image name/path (commands and source code will be added to it)

  • commands -- list of docker build (RUN) commands e.g. ['pip install pandas']

  • secret_name -- k8s secret for accessing the docker registry

  • requirements -- list of python packages, defaults to None

  • requirements_file -- pip requirements file path, defaults to None

  • mlrun_version_specifier -- which mlrun package version to include (if not current)

  • builder_env -- Kaniko builder pod env vars dict (for config/credentials), e.g. builder_env={"GIT_TOKEN": token}. Not yet supported in KFP.

  • overwrite_build_params -- Overwrite the existing build configuration (currently applies to requirements and commands):
      * False: the new params are merged with the existing ones
      * True: the existing params are replaced by the new ones

  • extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"

  • force_build -- force building the image, even when no changes were made
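extra_args is a single string of command-line options, so quoting and option/value pairing follow shell rules. The sketch below uses shlex to show how the documented example string tokenizes; split_builder_args is a hypothetical helper, and MLRun's own validation of these arguments may differ (for instance, this sketch does not handle option values that begin with '-').

```python
from __future__ import annotations

import shlex


def split_builder_args(extra_args: str) -> list[tuple[str, str | None]]:
    """Split an extra_args string into (option, value) pairs (hypothetical helper).

    An option followed by a token that does not start with '-' takes that token
    as its value; otherwise the option is treated as a boolean flag.
    """
    tokens = shlex.split(extra_args)  # honours shell quoting, e.g. "B=two words"
    pairs: list[tuple[str, str | None]] = []
    i = 0
    while i < len(tokens):
        option = tokens[i]
        if not option.startswith("-"):
            raise ValueError(f"expected an option, got {option!r}")
        value = None
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            value = tokens[i + 1]
            i += 1
        pairs.append((option, value))
        i += 1
    return pairs


print(split_builder_args("--skip-tls-verify --build-arg A=val"))
# [('--skip-tls-verify', None), ('--build-arg', 'A=val')]
```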

build_image(image: str | None = None, set_as_default: bool = True, with_mlrun: bool | None = None, skip_deployed: bool = False, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, mlrun_version_specifier: str | None = None, builder_env: dict | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, extra_args: str | None = None, target_dir: str | None = None) BuildStatus | ContainerOp

Build a Docker image for the project, based on the project's build config. The parameters override the corresponding build config values. If the project has a source configured and pull_at_runtime is not set, the source is cloned into the built image. The target_dir parameter specifies the path on the image where the code is extracted.

Parameters:
  • image -- target image name/path. If not specified, the project's existing default_image name is used. If that is not set either, the mlconf.default_project_image_name value is used.

  • set_as_default -- set image to be the project's default image (default True)

  • with_mlrun -- add the current mlrun package to the container build

  • skip_deployed -- Deprecated; this parameter is ignored.

  • base_image -- base image name/path (commands and source code will be added to it) defaults to mlrun.mlconf.default_base_image

  • commands -- list of docker build (RUN) commands e.g. ['pip install pandas']

  • secret_name -- k8s secret for accessing the docker registry

  • requirements -- list of python packages, defaults to None

  • requirements_file -- pip requirements file path, defaults to None

  • mlrun_version_specifier -- which mlrun package version to include (if not current)

  • builder_env -- Kaniko builder pod env vars dict (for config/credentials), e.g. builder_env={"GIT_TOKEN": token}. Not yet supported in KFP.

  • overwrite_build_params -- Overwrite the existing build configuration (currently applies to requirements and commands):
      * False: the new params are merged with the existing ones
      * True: the existing params are replaced by the new ones

  • extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"

  • target_dir -- Path on the image where source code would be extracted (by default /home/mlrun_code)
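The image and set_as_default descriptions define a precedence order for the build target. A minimal sketch of that order, assuming plain strings for the three sources: resolve_build_image is a hypothetical helper, and the global default string below is a placeholder, not the real mlconf.default_project_image_name value.

```python
from __future__ import annotations


def resolve_build_image(
    image: str | None,
    project_default_image: str | None,
    global_default_image: str,
    set_as_default: bool = True,
) -> tuple[str, str | None]:
    """Return (target image, project default image after the build)."""
    # explicit image -> project's default_image -> mlconf.default_project_image_name
    target = image or project_default_image or global_default_image
    # build_image defaults set_as_default to True; build_config defaults it to False
    new_default = target if set_as_default else project_default_image
    return target, new_default


print(resolve_build_image(None, None, "placeholder-global-default"))
# ('placeholder-global-default', 'placeholder-global-default')
print(resolve_build_image("repo/app:v2", "repo/app:v1", "placeholder-global-default", set_as_default=False))
# ('repo/app:v2', 'repo/app:v1')
```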

property context: str
create_model_monitoring_function(func: str | None = None, application_class: str | ModelMonitoringApplicationBase | None = None, name: str | None = None, image: str | None = None, handler: str | None = None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '', **application_kwargs) BaseRuntime

Create a monitoring function object without setting it in the project.

Example:

project.create_model_monitoring_function(
    application_class="MyApp", image="mlrun/mlrun", name="myApp"
)

Parameters:
  • func -- Code url, None refers to current Notebook

  • name -- Name of the function. Can be specified with a tag to support versions (e.g. myfunc:v1). Default: job

  • image -- Docker image to be used, can also be specified in the function object/yaml

  • handler -- Default function handler to invoke (can only be set with .py/.ipynb files)

  • with_repo -- Add (clone) the current repo to the build source

  • tag -- Function version tag (none for 'latest'; can only be set with .py/.ipynb files). If tag is specified and name is empty, the function key (under the project) is enriched with the tag value (i.e. 'function-name:tag').

  • requirements -- A list of python packages

  • requirements_file -- Path to a python requirements file

  • application_class -- Name or instance of a class that implements the monitoring application.

  • application_kwargs -- Additional keyword arguments to be passed to the monitoring application's constructor.

create_remote(url, name='origin', branch=None)

Create remote for the project git

This method creates a new remote repository associated with the project's Git repository. If a remote with the specified name already exists, it will not be overwritten.

If you wish to update the URL of an existing remote, use the set_remote method instead.

Parameters:
  • url -- remote git url

  • name -- name for the remote (default is 'origin')

  • branch -- Git branch to use as source

property default_image: str
delete_api_gateway(name: str)

Deletes an API gateway by name.

Parameters:

name -- The name of the API gateway to delete.

delete_datastore_profile(profile: str)
deploy_function(function: str | BaseRuntime, models: list | None = None, env: dict | None = None, tag: str | None = None, verbose: bool | None = None, builder_env: dict | None = None, mock: bool | None = None) DeployStatus | ContainerOp

Deploy real-time (Nuclio-based) functions.

Parameters:
  • function -- name of the function (in the project) or function object

  • models -- list of model items

  • env -- dict of extra environment variables

  • tag -- extra version tag

  • verbose -- add verbose prints/logs

  • builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}

  • mock -- deploy mock server vs a real Nuclio function (for local simulations)

deploy_histogram_data_drift_app(*, image: str = 'mlrun/mlrun', db: RunDBInterface | None = None) None

Deploy the histogram data drift application.

Parameters:
  • image -- The image on which the application will run.

  • db -- An optional DB object.

property description: str
disable_model_monitoring(*, delete_histogram_data_drift_app: bool = True) None

Disable model monitoring by deleting the underlying functions' infrastructure from the MLRun database. Note: this method is currently not advised for use; see ML-3432.

Parameters:

delete_histogram_data_drift_app -- Whether to delete the histogram data drift app.

enable_model_monitoring(default_controller_image: str = 'mlrun/mlrun', base_period: int = 10, image: str = 'mlrun/mlrun', deploy_histogram_data_drift_app: bool = True) None

Deploy the model monitoring application controller, writer, and stream functions. The main goal of the controller function is to handle the monitoring processing and trigger the applications. The goal of the writer function is to write all the monitoring application results to the databases. The goal of the stream function is to monitor the log of the data stream: it is triggered when a new log entry is detected, and it processes the new events into statistics that are then written to the statistics databases.

Parameters:
  • default_controller_image -- Deprecated.

  • base_period -- The time period in minutes in which the model monitoring controller function is triggered. By default, the base period is 10 minutes.

  • image -- The image of the model monitoring controller, writer, monitoring stream, and histogram data drift functions, which are real-time Nuclio functions. By default, the image is mlrun/mlrun.

  • deploy_histogram_data_drift_app -- If True, deploy the default histogram-based data drift application.

export(filepath=None, include_files: str | None = None)

Save the project object to a YAML file or a zip archive (defaults to project.yaml).

By default, the project object is exported to a YAML file. When the filepath suffix is '.zip', the project context dir (code files) is also copied into the zip. The archive path can include DataItem URLs (for remote object storage, e.g. s3://<bucket>/<path>).

Parameters:
  • filepath -- path to store project .yaml or .zip (with the project dir content)

  • include_files -- glob filter string for selecting files to include in the zip archive
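For the .zip case, the include_files filter decides which context-dir files go into the archive. A rough stdlib sketch follows; zip_context_dir is a hypothetical helper that assumes the glob is matched against paths relative to the context dir with fnmatch semantics (MLRun's actual matching may differ). The real export also stores the project spec and can write to remote DataItem URLs, which this sketch does not cover. With MLRun itself the call would be project.export("project.zip", include_files="*.py").

```python
from __future__ import annotations

import fnmatch
import tempfile
import zipfile
from pathlib import Path


def zip_context_dir(context_dir: str, zip_path: str, include_files: str | None = None) -> list[str]:
    """Zip files under context_dir, keeping only those matching include_files.

    Returns the archived names (POSIX paths relative to context_dir).
    Note: fnmatch's '*' also matches '/', so "*.py" selects files in subdirectories too.
    """
    root = Path(context_dir)
    names = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    if include_files:
        names = [n for n in names if fnmatch.fnmatch(n, include_files)]
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name in names:
            archive.write(root / name, arcname=name)
    return names


def demo(include_files: str | None = None) -> list[str]:
    """Build a throwaway context dir and zip it into a separate temp dir."""
    with tempfile.TemporaryDirectory() as ctx, tempfile.TemporaryDirectory() as out:
        root = Path(ctx)
        (root / "project.yaml").write_text("kind: project\n")
        (root / "src").mkdir()
        (root / "src" / "train.py").write_text("print('train')\n")
        (root / "notes.txt").write_text("scratch\n")
        zip_path = Path(out) / "project.zip"
        names = zip_context_dir(ctx, str(zip_path), include_files)
        with zipfile.ZipFile(zip_path) as archive:
            assert archive.namelist() == names  # archive holds exactly the selected files
        return names


print(demo())        # ['notes.txt', 'project.yaml', 'src/train.py']
print(demo("*.py"))  # ['src/train.py']
```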

get_api_gateway(name: str) APIGateway

Retrieves an API gateway instance by name.

Parameters:

name -- The name of the API gateway to retrieve.

Returns:

An instance of APIGateway.

Return type:

mlrun.runtimes.nuclio.APIGateway

get_artifact(key, tag=None, iter=None, tree=None)

Return an artifact object

Parameters:
  • key -- artifact key

  • tag -- version tag

  • iter -- iteration number (for hyper-param tasks)

  • tree -- the producer id (tree)

Returns:

Artifact object

get_artifact_uri(key: str, category: str = 'artifact', tag: str | None = None, iter: int | None = None) str

Return the project artifact URI (store://..) for the given artifact key.

example:

uri = project.get_artifact_uri("my_model", category="model", tag="prod", iter=0)
Parameters:
  • key -- artifact key/name

  • category -- artifact category (artifact, model, feature-vector, ..)

  • tag -- artifact version tag, defaults to the latest version

  • iter -- iteration number, defaults to no iteration
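The returned URI follows the store:// layout that the log_model feature_vector parameter documents (store://feature-vectors/<project>/<name>[:tag]). The sketch below builds such strings offline, assuming that the category is pluralized into the path and that a non-zero iteration is appended as #<iter> before the tag; both are assumptions, so prefer get_artifact_uri itself for real URIs. build_store_uri is a hypothetical helper.

```python
from __future__ import annotations


def build_store_uri(
    project: str,
    key: str,
    category: str = "artifact",
    tag: str | None = None,
    iteration: int | None = None,
) -> str:
    """Assemble a store:// URI (hypothetical helper, layout assumed)."""
    uri = f"store://{category}s/{project}/{key}"  # assumption: plural category path
    if iteration:  # assumption: iteration 0 (the root iteration) adds no suffix
        uri += f"#{iteration}"
    if tag:
        uri += f":{tag}"
    return uri


print(build_store_uri("my-project", "my_model", category="model", tag="prod"))
# store://models/my-project/my_model:prod
print(build_store_uri("my-project", "fv", category="feature-vector"))
# store://feature-vectors/my-project/fv
```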

get_custom_packagers() list[tuple[str, bool]]

Get the custom packagers registered in the project.

Returns:

A list of (module path, is_mandatory) tuples, one per custom packager.

get_datastore_profile(profile: str) DatastoreProfile
get_function(key, sync=False, enrich=False, ignore_cache=False, copy_function=True, tag: str = '') BaseRuntime

Get a function object by name.

Parameters:
  • key -- name (key) of the function to look up

  • sync -- will reload/reinit the function from the project spec

  • enrich -- add project info/config/source info to the function object

  • ignore_cache -- read the function object from the DB (ignore the local cache)

  • copy_function -- return a copy of the function object

  • tag -- provide if the function key is tagged under the project (function was set with a tag)

Returns:

function object

get_function_names() list[str]

Get a list of all the project function names.

get_function_objects() FunctionsDict

Get a virtual dict with all the project functions, ready for use in a pipeline.

get_item_absolute_path(url: str, check_path_in_context: bool = False) tuple[str, bool]

Get the absolute path of the artifact or function file.

Parameters:
  • url -- remote url, absolute path or relative path

  • check_path_in_context -- if True, will check if the path exists when in the context (temporary parameter to allow for backwards compatibility)

Returns:

absolute path / url, whether the path is in the project context

get_param(key: str, default=None)

Get a project param by key.

get_run_status(run, timeout=None, expected_statuses=None, notifiers: CustomNotificationPusher | None = None)
get_secret(key: str)

Get a key-based secret (e.g. a DB password) from the context. Secrets can be specified when invoking a run, through files, env, ..

get_store_resource(uri)

get store resource object by uri

import_artifact(item_path: str, new_key=None, artifact_path=None, tag=None)

Import an artifact object/package from .yaml, .json, or .zip file

Parameters:
  • item_path -- dataitem url or file path to the file/package

  • new_key -- overwrite the artifact key/name

  • artifact_path -- target artifact path (when not using the default)

  • tag -- artifact tag to set

Returns:

artifact object

kind = 'project'
list_api_gateways() list[mlrun.runtimes.nuclio.api_gateway.APIGateway]

Retrieves a list of Nuclio API gateways associated with the project.

Returns:

List of APIGateway objects representing the Nuclio API gateways associated with the project.

list_artifacts(name=None, tag=None, labels: dict[str, str] | list[str] | None = None, since=None, until=None, iter: int | None = None, best_iteration: bool = False, kind: str | None = None, category: str | ArtifactCategories | None = None, tree: str | None = None) ArtifactList

List artifacts filtered by various parameters.

The returned result is an ArtifactList (list of dict). Use .to_objects() to convert it to a list of artifact objects, .show() to view it graphically in Jupyter, and .to_df() to convert it to a DataFrame.

Examples:

# Get latest version of all artifacts in project
latest_artifacts = project.list_artifacts('', tag='latest')
# check different artifact versions for a specific artifact, return as objects list
result_versions = project.list_artifacts('results', tag='*').to_objects()
Parameters:
  • name -- Name of artifacts to retrieve. Name with '~' prefix is used as a like query, and is not case-sensitive. This means that querying for ~name may return artifacts named my_Name_1 or surname.

  • tag -- Return artifacts assigned this tag.

  • labels -- Return artifacts that have these labels. Labels can either be a dictionary {"label": "value"} or a list of "label=value" (match label key and value) or "label" (match just label key) strings.

  • since -- Not in use in HTTPRunDB.

  • until -- Not in use in HTTPRunDB.

  • iter -- Return artifacts from a specific iteration (where iter=0 means the root iteration). If None (default) return artifacts from all iterations.

  • best_iteration -- Returns the artifact which belongs to the best iteration of a given run, in the case of artifacts generated from a hyper-param run. If only a single iteration exists, will return the artifact from that iteration. If using best_iteration, the iter parameter must not be used.

  • kind -- Return artifacts of the requested kind.

  • category -- Return artifacts of the requested category.

  • tree -- Return artifacts of the requested tree.
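The name and labels filters can be read as simple predicates. The sketch below restates them in plain Python so the matching rules are easy to check; name_matches and labels_match are hypothetical helpers rather than how MLRun implements the query, and treating several label filters as all-must-match is an assumption. The same label syntax appears in list_models and, as a string or list, in list_runs.

```python
from __future__ import annotations


def name_matches(query: str | None, name: str) -> bool:
    """'~' prefix -> case-insensitive substring ("like") match; otherwise exact match."""
    if not query:  # None or '' -> no name filter
        return True
    if query.startswith("~"):
        return query[1:].lower() in name.lower()
    return query == name


def labels_match(filters: dict[str, str] | list[str] | str | None, labels: dict[str, str]) -> bool:
    """Every filter must match: "key=value" checks the value, "key" checks existence."""
    if not filters:
        return True
    if isinstance(filters, str):
        filters = [filters]
    elif isinstance(filters, dict):
        filters = [f"{key}={value}" for key, value in filters.items()]
    for item in filters:
        key, has_value, value = item.partition("=")
        if key not in labels:
            return False
        if has_value and labels[key] != value:
            return False
    return True


print([n for n in ["my_Name_1", "surname", "other"] if name_matches("~name", n)])
# ['my_Name_1', 'surname']
print(labels_match(["kind=job", "owner"], {"kind": "job", "owner": "admin"}))  # True
```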

list_datastore_profiles() list[mlrun.datastore.datastore_profile.DatastoreProfile]

Returns a list of datastore profiles associated with the project. The information excludes private details, showcasing only public data.

list_functions(name=None, tag=None, labels=None)

Retrieve a list of functions, filtered by specific criteria.

example:

functions = project.list_functions(tag="latest")
Parameters:
  • name -- Return only functions with a specific name.

  • tag -- Return function versions with specific tags.

  • labels -- Return functions that have specific labels assigned to them.

Returns:

List of function objects.

list_model_monitoring_functions(name: str | None = None, tag: str | None = None, labels: list[str] | None = None) list | None

Retrieve a list of all the model monitoring functions. Example:

functions = project.list_model_monitoring_functions()
Parameters:
  • name -- Return only functions with a specific name.

  • tag -- Return function versions with specific tags.

  • labels -- Return functions that have specific labels assigned to them.

Returns:

List of function objects.

list_models(name=None, tag=None, labels: dict[str, str] | list[str] | None = None, since=None, until=None, iter: int | None = None, best_iteration: bool = False, tree: str | None = None)

List models in project, filtered by various parameters.

Examples:

# Get latest version of all models in project
latest_models = project.list_models('', tag='latest')
Parameters:
  • name -- Name of artifacts to retrieve. Name with '~' prefix is used as a like query, and is not case-sensitive. This means that querying for ~name may return artifacts named my_Name_1 or surname.

  • tag -- Return artifacts assigned this tag.

  • labels -- Return artifacts that have these labels. Labels can either be a dictionary {"label": "value"} or a list of "label=value" (match label key and value) or "label" (match just label key) strings.

  • since -- Not in use in HTTPRunDB.

  • until -- Not in use in HTTPRunDB.

  • iter -- Return artifacts from a specific iteration (where iter=0 means the root iteration). If None (default) return artifacts from all iterations.

  • best_iteration -- Returns the artifact which belongs to the best iteration of a given run, in the case of artifacts generated from a hyper-param run. If only a single iteration exists, will return the artifact from that iteration. If using best_iteration, the iter parameter must not be used.

  • tree -- Return artifacts of the requested tree.

list_runs(name: str | None = None, uid: str | list[str] | None = None, labels: str | list[str] | None = None, state: str | None = None, sort: bool = True, last: int = 0, iter: bool = False, start_time_from: datetime | None = None, start_time_to: datetime | None = None, last_update_time_from: datetime | None = None, last_update_time_to: datetime | None = None, **kwargs) RunList

Retrieve a list of runs, filtered by various options.

The returned result is a RunList (list of dict). Use .to_objects() to convert it to a list of RunObjects, .show() to view it graphically in Jupyter, .to_df() to convert it to a DataFrame, and .compare() to generate a comparison table and a PCP plot.

Example:

# return a list of runs matching the name and label and compare
runs = project.list_runs(name='download', labels='owner=admin')
runs.compare()

# multi-label filter can also be provided
runs = project.list_runs(name='download', labels=["kind=job", "owner=admin"])

# If running in Jupyter, can use the .show() function to display the results
project.list_runs(name='').show()
Parameters:
  • name -- Name of the run to retrieve.

  • uid -- Unique ID of the run.

  • labels -- A list of labels to filter by. Label filters work either by filtering on a specific value of a label (i.e. "key=value") or by looking for the existence of a given key (i.e. "key").

  • state -- List only runs whose state is specified.

  • sort -- Whether to sort the result according to their start time. Otherwise, results will be returned by their internal order in the DB (order will not be guaranteed).

  • last -- Deprecated - currently not used (will be removed in 1.8.0).

  • iter -- If True return runs from all iterations. Otherwise, return only runs whose iter is 0.

  • start_time_from -- Filter by run start time in [start_time_from, start_time_to].

  • start_time_to -- Filter by run start time in [start_time_from, start_time_to].

  • last_update_time_from -- Filter by run last update time in (last_update_time_from, last_update_time_to).

  • last_update_time_to -- Filter by run last update time in (last_update_time_from, last_update_time_to).

log_artifact(item, body=None, tag='', local_path='', artifact_path=None, format=None, upload=None, labels=None, target_path=None, **kwargs)

Log an output artifact and optionally upload it to the datastore.

If the artifact already exists with the same key and tag, it will be overwritten.

example:

project.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
Parameters:
  • item -- artifact key or artifact object (can be any type, such as dataset, model, feature store)

  • body -- will use the body as the artifact content

  • local_path -- path to the local file to upload; it is also used as the destination subpath (under "artifact_path")

  • artifact_path -- target artifact path (when not using the default). To define a subpath under the default location, use: artifact_path=context.artifact_subpath('data')

  • format -- artifact file format: csv, png, ..

  • tag -- version tag

  • target_path -- absolute target path (instead of using artifact_path + local_path)

  • upload -- upload to datastore (default is True)

  • labels -- a set of key/value labels to tag the artifact with

Returns:

artifact object
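The local_path, artifact_path, and target_path descriptions imply a simple rule for where an uploaded file lands: target_path wins when given; otherwise the file goes to artifact_path joined with local_path. A minimal sketch of that rule, assuming local_path is a relative file name and that URL-style paths join with '/'; artifact_destination is a hypothetical helper, not MLRun's path logic.

```python
from __future__ import annotations

import posixpath


def artifact_destination(artifact_path: str, local_path: str, target_path: str | None = None) -> str:
    """Where an uploaded artifact file would land (hypothetical helper)."""
    if target_path:
        # absolute target path, used instead of artifact_path + local_path
        return target_path
    # local_path doubles as the destination subpath under artifact_path
    # (assumes a relative local_path; an absolute one would replace artifact_path)
    return posixpath.join(artifact_path, local_path)


print(artifact_destination("s3://my-bucket/artifacts", "model.txt"))
# s3://my-bucket/artifacts/model.txt
```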

log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=None, labels=None, format='', preview=None, stats=None, target_path='', extra_data=None, label_column: str | None = None, **kwargs) DatasetArtifact

Log a dataset artifact and optionally upload it to datastore.

If the dataset already exists with the same key and tag, it will be overwritten.

example:

import pandas as pd

raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
project.log_dataset("mydf", df=df, stats=True)
Parameters:
  • key -- artifact key

  • df -- dataframe object

  • label_column -- name of the label column (the one holding the target (y) values)

  • local_path -- path to a local dataframe file. The file extension determines the format used to save the dataframe. If the file already exists, it is uploaded to the datastore instead of the given df.

  • artifact_path -- target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • tag -- version tag

  • format -- optional, format to use (csv, parquet, pq, tsdb, kv)

  • target_path -- absolute target path (instead of using artifact_path + local_path)

  • preview -- number of lines to store as preview in the artifact metadata

  • stats -- calculate and store dataset stats in the artifact metadata

  • extra_data -- key/value list of extra files/charts to link with this dataset

  • upload -- upload to datastore (default is True)

  • labels -- a set of key/value labels to tag the artifact with

Returns:

artifact object

log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=None, labels=None, inputs: list[mlrun.features.Feature] | None = None, outputs: list[mlrun.features.Feature] | None = None, feature_vector: str | None = None, feature_weights: list | None = None, training_set=None, label_column=None, extra_data=None, **kwargs)

Log a model artifact and optionally upload it to the datastore.

If the model already exists with the same key and tag, it will be overwritten.

example:

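from pickle import dumps

# model, training_df and feature_vector_uri are assumed to be defined earlier;
# context is the run's MLRun context (MLClientCtx)
project.log_model("model", body=dumps(model),
                  model_file="model.pkl",
                  metrics=context.results,
                  training_set=training_df,
                  label_column='label',
                  feature_vector=feature_vector_uri,
                  labels={"app": "fraud"})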
Parameters:
  • key -- artifact key or artifact class

  • body -- will use the body as the artifact content

  • model_file -- path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)

  • model_dir -- path to the local dir holding the model file and extra files

  • artifact_path -- target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')

  • framework -- name of the ML framework

  • algorithm -- training algorithm name

  • tag -- version tag

  • metrics -- key/value dict of model metrics

  • parameters -- key/value dict of model parameters

  • inputs -- ordered list of model input features (name, type, ..)

  • outputs -- ordered list of model output/result elements (name, type, ..)

  • upload -- upload to datastore (if not specified, defaults to True (uploads artifact))

  • labels -- a set of key/value labels to tag the artifact with

  • feature_vector -- feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])

  • feature_weights -- list of feature weights, one per input column

  • training_set -- training set dataframe, used to infer inputs & outputs

  • label_column -- which columns in the training set are the label (target) columns

  • extra_data -- key/value list of extra files/charts to link with this model. The value can be an absolute path | relative path (to model dir) | bytes | artifact object

Returns:

artifact object

property metadata: ProjectMetadata
property mountdir: str
property name: str

Project name (a property of the project metadata).

property notifiers
property params: dict
pull(branch: str | None = None, remote: str | None = None, secrets: SecretsStore | dict | None = None)

pull/update sources from git or tar into the context dir

Parameters:
  • branch -- git branch, if not the current one

  • remote -- git remote, if other than origin

  • secrets -- dict or SecretsStore with Git credentials e.g. secrets={"GIT_TOKEN": token}

push(branch, message=None, update=True, remote: str | None = None, add: list | None = None, author_name: str | None = None, author_email: str | None = None, secrets: SecretsStore | dict | None = None)

update spec and push updates to remote git repo

Parameters:
  • branch -- target git branch

  • message -- git commit message

  • update -- update files (git add update=True)

  • remote -- git remote, default to origin

  • add -- list of files to add

  • author_name -- author's git user name to be used on this commit

  • author_email -- author's git user email to be used on this commit

  • secrets -- dict or SecretsStore with Git credentials e.g. secrets={"GIT_TOKEN": token}

register_artifacts()

register the artifacts in the MLRun DB (under this project)

register_datastore_profile(profile: DatastoreProfile)
reload(sync=False, context=None) MlrunProject

Reload the project and function objects from the project yaml/specs.

Parameters:
  • sync -- set to True to load function objects

  • context -- context directory (where the yaml and code exist)

Returns:

project object

remove_custom_packager(packager: str)

Remove a custom packager from the custom packagers list.

Parameters:

packager -- The packager module path to remove.

Raises:

MLRunInvalidArgumentError -- In case the packager was not in the list.

remove_function(name)

remove the specified function from the project

Parameters:

name -- name of the function (under the project)

remove_model_monitoring_function(name)

remove the specified model-monitoring-app function from the project and from the db

Parameters:

name -- name of the model-monitoring-app function (under the project)

remove_remote(name)

Remove a remote from the project's Git repository.

This method removes the remote repository associated with the specified name from the project's Git repository.

Parameters:

name -- Name of the remote to remove.

run(name: str | None = None, workflow_path: str | None = None, arguments: dict[str, Any] | None = None, artifact_path: str | None = None, workflow_handler: str | Callable | None = None, namespace: str | None = None, sync: bool = False, watch: bool = False, dirty: bool = False, engine: str | None = None, local: bool | None = None, schedule: str | ScheduleCronTrigger | bool | None = None, timeout: int | None = None, source: str | None = None, cleanup_ttl: int | None = None, notifications: list[mlrun.model.Notification] | None = None) _PipelineRunStatus

Run a workflow using Kubeflow Pipelines.

Parameters:
  • name -- Name of the workflow

  • workflow_path -- URL to a workflow file, if not a project workflow

  • arguments -- Kubeflow pipelines arguments (parameters)

  • artifact_path -- Target path/URL for workflow artifacts, the string '{{workflow.uid}}' will be replaced by workflow id.

  • workflow_handler -- Workflow function handler (for running workflow function directly)

  • namespace -- Kubernetes namespace if other than default

  • sync -- Force functions sync before run

  • watch -- Wait for pipeline completion

  • dirty -- Allow running the workflow when the git repo is dirty

  • engine -- Workflow engine running the workflow. Supported values are 'kfp' (default), 'local' or 'remote'. For setting engine for remote running use 'remote:local' or 'remote:kfp'.

  • local -- Run local pipeline with local functions (set local=True in function.run())

  • schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor). To use the pre-defined workflow's schedule, set schedule=True. For help with crontab expressions, see https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron

  • timeout -- Timeout in seconds to wait for pipeline completion (watch will be activated)

  • source --

    Source to use instead of the actual project.spec.source (used when engine is remote). Can be one of:

    1. Remote URL which is loaded dynamically to the workflow runner.

    2. A path to the project's context on the workflow runner's image.

    Path can be absolute or relative to project.spec.build.source_code_target_dir if defined (enriched when building a project image with source, see MlrunProject.build_image). For other engines the source is used to validate that the code is up-to-date.

  • cleanup_ttl -- Pipeline cleanup ttl in secs (time to wait after workflow completion, at which point the Workflow and all its resources are deleted)

  • notifications -- List of notifications to send for workflow completion

Returns:

Run id
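The engine parameter packs two choices into one string: whether the workflow runs remotely, and which engine executes it. A minimal sketch of how such a string can be split, assuming only the documented values ('kfp', 'local', 'remote', 'remote:local', 'remote:kfp') are valid; parse_engine is a hypothetical helper, not part of the MLRun API:

```python
def parse_engine(engine=None):
    """Split a workflow engine string into (is_remote, inner_engine).

    Hypothetical helper that only knows the documented values:
    'kfp' (default), 'local', 'remote', 'remote:local' and 'remote:kfp'.
    """
    engine = engine or "kfp"  # 'kfp' is the documented default
    if engine == "remote" or engine.startswith("remote:"):
        _, _, inner = engine.partition(":")
        if inner and inner not in ("kfp", "local"):
            raise ValueError(f"unsupported remote engine: {engine!r}")
        # a bare 'remote' leaves the inner engine unspecified (None)
        return True, inner or None
    if engine not in ("kfp", "local"):
        raise ValueError(f"unsupported engine: {engine!r}")
    return False, engine


print(parse_engine("remote:local"))  # (True, 'local')
```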

run_function(function: str | BaseRuntime, handler: str | None = None, name: str = '', params: dict | None = None, hyperparams: dict | None = None, hyper_param_options: HyperParamOptions | None = None, inputs: dict | None = None, outputs: list[str] | None = None, workdir: str = '', labels: dict | None = None, base_task: RunTemplate | None = None, watch: bool = True, local: bool | None = None, verbose: bool | None = None, selector: str | None = None, auto_build: bool | None = None, schedule: str | ScheduleCronTrigger | None = None, artifact_path: str | None = None, notifications: list[mlrun.model.Notification] | None = None, returns: list[Union[str, dict[str, str]]] | None = None, builder_env: dict | None = None) RunObject | ContainerOp[source]#

Run a local or remote task as part of a local/kubeflow pipeline

example (use with project):

import mlrun

project_name = "my-project"  # hypothetical project name
LABELS = "label"  # hypothetical label column name

# create a project with two functions (local and from hub)
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://auto-trainer", "train")

# run functions (refer to them by name)
run1 = project.run_function("myfunc", params={"x": 7})
run2 = project.run_function("train", params={"label_columns": LABELS},
                            inputs={"dataset": run1.outputs["data"]})
Parameters:
  • function -- name of the function (in the project) or function object

  • handler -- name of the function handler

  • name -- execution name

  • params -- input parameters (dict)

  • hyperparams -- hyper parameters

  • selector -- selection criteria for hyper params e.g. "max.accuracy"

  • hyper_param_options -- hyper param options (selector, early stop, strategy, ..) see: HyperParamOptions

  • inputs -- Input objects to pass to the handler. Type hints can be given so the input will be parsed during runtime from mlrun.DataItem to the given type hint. The type hint can be given in the key field of the dictionary after a colon, e.g: "<key> : <type_hint>".

  • outputs -- list of outputs which can pass in the workflow

  • workdir -- default input artifacts path

  • labels -- labels to tag the job/run with ({key:val, ..})

  • base_task -- task object to use as base

  • watch -- watch/follow run log, True by default

  • local -- run the function locally vs on the runtime/cluster

  • verbose -- add verbose prints/logs

  • auto_build -- when set to True and the function requires a build, it will be built on the first function run; use only if you don't plan on changing the build config between runs

  • schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron

  • artifact_path -- path to store artifacts, when running in a workflow this will be set automatically

  • notifications -- list of notifications to push when the run is completed

  • returns --

    List of log hints, i.e. configurations for how to log the values returned from the handler's run (as artifacts or results). The list's length must be equal to the number of returned objects. A log hint may be given as:

    • A string of the key to use to log the returned value as a result or as an artifact. To specify the artifact type, pass a string in the following structure: "<key> : <type>". Available artifact types can be seen in mlrun.ArtifactType. If no artifact type is specified, the object's default artifact type will be used.

    • A dictionary of configurations to use when logging. Further info per object type and artifact type can be given there. The artifact key must appear in the dictionary as "key": "the_key".

  • builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}

Returns:

MLRun RunObject or KubeFlow containerOp
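Both inputs ("<key> : <type_hint>") and returns ("<key> : <type>") use the same colon syntax, and returns must have exactly one log hint per returned value. A minimal sketch of that normalization, assuming whitespace around the colon is ignored; parse_log_hint, pair_returns and the "artifact_type" field name are hypothetical and not MLRun's internal representation:

```python
def parse_log_hint(hint):
    """Normalize a log hint into a dict with at least a "key" entry.

    Hypothetical helper: "artifact_type" is an illustrative field name.
    """
    if isinstance(hint, dict):
        if "key" not in hint:
            raise ValueError('a dict log hint must contain a "key" entry')
        return dict(hint)
    key, sep, artifact_type = hint.partition(":")
    parsed = {"key": key.strip()}
    if sep:  # "<key> : <type>" form: an explicit artifact type was given
        parsed["artifact_type"] = artifact_type.strip()
    return parsed


def pair_returns(returns, values):
    """Match log hints to returned values; the counts must be equal."""
    if len(returns) != len(values):
        raise ValueError(
            f"got {len(values)} returned values but {len(returns)} log hints"
        )
    return {parse_log_hint(h)["key"]: v for h, v in zip(returns, values)}
```

For example, returns=["accuracy", "my_df : dataset"] pairs a handler's two return values with the keys "accuracy" and "my_df", the second with an explicit artifact type.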

save(filepath=None, store=True)[source]#

export project to yaml file and save project in database

Parameters:

store -- if True, allow updating in case the project already exists

save_to_db(store=True)[source]#

save project to database

Parameters:

store -- if True, allow updating in case the project already exists

save_workflow(name, target, artifact_path=None, ttl=None)[source]#

create and save a workflow as a yaml or archive file

Parameters:
  • name -- workflow name

  • target -- target file path (can end with .yaml or .zip)

  • artifact_path -- target path/url for workflow artifacts, the string '{{workflow.uid}}' will be replaced by workflow id

  • ttl -- pipeline ttl (time to live) in secs (after that the pods will be removed)

set_artifact(key, artifact: str | dict | Artifact | None = None, target_path: str | None = None, tag: str | None = None)[source]#

add/set an artifact in the project spec (will be registered on load)

example:

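from mlrun.artifacts import ModelArtifact

data_url = "s3://my-bucket/data.csv"  # hypothetical data location
model_dir_url = "s3://my-bucket/models/"  # hypothetical model directory

# register a simple file artifact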
project.set_artifact('data', target_path=data_url)
# register a model artifact
project.set_artifact('model', ModelArtifact(model_file="model.pkl"), target_path=model_dir_url)

# register a path to artifact package (will be imported on project load)
# to generate such package use `artifact.export(target_path)`
project.set_artifact('model', 'https://mystuff.com/models/mymodel.zip')
Parameters:
  • key -- artifact key/name

  • artifact -- mlrun Artifact object/dict (or its subclasses) or path to artifact file to import (yaml/json/zip), relative paths are relative to the context path

  • target_path -- absolute target path url (point to the artifact content location)

  • tag -- artifact tag

set_default_image(default_image: str)[source]#

Set the default image to be used for running runtimes (functions) in this project. This image will be used if an image was not provided for a runtime. In case the default image is replaced, functions already registered with the project that used the previous default image will have their image replaced on next execution.

Parameters:

default_image -- Default image to use

set_function(func: str | BaseRuntime | None = None, name: str = '', kind: str = 'job', image: str | None = None, handler: str | None = None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '') BaseRuntime[source]#

update or add a function object to the project

The function can be provided as an object (func) or as a .py/.ipynb/.yaml URL. Supported URL prefixes:

  • object store (s3://, v3io://, ..)

  • MLRun DB, e.g. db://project/func:ver

  • functions hub/market, e.g. hub://auto-trainer:master

examples:

proj.set_function(func_object)
proj.set_function('./src/mycode.py', 'ingest',
                  image='myrepo/ing:latest', with_repo=True)
proj.set_function('http://.../mynb.ipynb', 'train')
proj.set_function('./func.yaml')
proj.set_function('hub://get_toy_data', 'getdata')

# set function requirements

# by providing a list of packages
proj.set_function('my.py', requirements=["requests", "pandas"])

# by providing a path to a pip requirements file
proj.set_function('my.py', requirements="requirements.txt")
Parameters:
  • func -- Function object or spec/code url, None refers to current Notebook

  • name -- Name of the function (under the project), can be specified with a tag to support versions (e.g. myfunc:v1). If the tag parameter is provided, the tag in the name must match the tag parameter. Specifying a tag in the name will update the project's tagged function (myfunc:v1)

  • kind -- Runtime kind e.g. job, nuclio, spark, dask, mpijob Default: job

  • image -- Docker image to be used, can also be specified in the function object/yaml

  • handler -- Default function handler to invoke (can only be set with .py/.ipynb files)

  • with_repo -- Add (clone) the current repo to the build source

  • tag -- Function version tag to set (None for current or 'latest'). Specifying a tag as a parameter will update the project's tagged function (myfunc:v1) and the untagged function (myfunc)

  • requirements -- A list of python packages

  • requirements_file -- Path to a python requirements file

Returns:

function object
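The name and tag rules above determine which project entries set_function updates. A minimal sketch of those documented rules; resolve_function_keys is a hypothetical helper written only to illustrate them, not MLRun code:

```python
def resolve_function_keys(name, tag=None):
    """Return the project function keys updated for a given name/tag pair.

    Sketch of the documented rules:
    - a tag in the name must match the tag parameter (if both are given)
    - a tag only in the name updates the tagged entry (myfunc:v1)
    - a tag parameter updates the tagged and the untagged entry
    """
    base, sep, name_tag = name.partition(":")
    if sep and tag and name_tag != tag:
        raise ValueError(
            f"tag in name ({name_tag!r}) does not match tag parameter ({tag!r})"
        )
    effective_tag = tag or (name_tag if sep else None)
    if effective_tag is None:
        return [base]  # no tag at all: only the untagged entry
    if tag:
        return [f"{base}:{effective_tag}", base]
    return [f"{base}:{effective_tag}"]
```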

set_model_monitoring_credentials(access_key: str | None = None, endpoint_store_connection: str | None = None, stream_path: str | None = None)[source]#

Set the credentials that will be used by the project's model monitoring infrastructure functions.

Parameters:
  • access_key -- Model Monitoring access key for managing user permissions

  • endpoint_store_connection -- Endpoint store connection string

  • stream_path -- Path to the model monitoring stream

set_model_monitoring_function(func: str | BaseRuntime | None = None, application_class: str | ModelMonitoringApplicationBase | None = None, name: str | None = None, image: str | None = None, handler=None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '', **application_kwargs) BaseRuntime[source]#

Update or add a monitoring function to the project. Note: to deploy the function after linking it to the project, call fn.deploy() where fn is the object returned by this method.

example:

project.set_model_monitoring_function(
    name="myApp", application_class="MyApp", image="mlrun/mlrun"
)

Parameters:
  • func -- Function object or spec/code url, None refers to current Notebook

  • name -- Name of the function (under the project), can be specified with a tag to support versions (e.g. myfunc:v1)

  • image -- Docker image to be used, can also be specified in the function object/yaml

  • handler -- Default function handler to invoke (can only be set with .py/.ipynb files)

  • with_repo -- Add (clone) the current repo to the build source

  • tag -- Function version tag (None for 'latest'; can only be set with .py/.ipynb files). If tag is specified and name is empty, the function key (under the project) will be enriched with the tag value (i.e. 'function-name:tag')

  • requirements -- A list of python packages

  • requirements_file -- Path to a python requirements file

  • application_class -- Name or an instance of a class that implements the monitoring application.

  • application_kwargs -- Additional keyword arguments to be passed to the monitoring application's constructor.

set_remote(url, name='origin', branch=None, overwrite=True)[source]#

Create or update a remote for the project git repository.

This method allows you to manage remote repositories associated with the project. It checks if a remote with the specified name already exists.

If a remote with the same name does not exist, it will be created. If a remote with the same name already exists, the behavior depends on the value of the 'overwrite' flag.

Parameters:
  • url -- remote git url

  • name -- name for the remote (default is 'origin')

  • branch -- Git branch to use as source

  • overwrite -- If True (default), updates the existing remote with the given URL if it already exists. If False, raises an error when attempting to create a remote with a name that already exists.

Raises:

MLRunConflictError -- If a remote with the same name already exists and overwrite is set to False.
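The overwrite flag only matters when the remote name is already taken. A minimal sketch of that decision, modeling the remotes as a plain dict of name to URL (a hypothetical stand-in for the Git repository; the branch argument and real Git calls are omitted):

```python
class RemoteConflictError(Exception):
    """Stand-in for MLRunConflictError in this sketch."""


def set_remote(remotes, url, name="origin", overwrite=True):
    """Create or update a named remote in a dict (hypothetical model)."""
    if name in remotes and not overwrite:
        raise RemoteConflictError(f"remote {name!r} already exists")
    remotes[name] = url  # creates the remote, or replaces its URL
    return remotes
```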

set_secrets(secrets: dict | None = None, file_path: str | None = None, provider: str | SecretProviderName | None = None)[source]#

set project secrets from a dict or a secrets env file. When using a secrets file, it should have lines in the form KEY=VALUE; comment lines start with "#". V3IO paths/credentials and the MLRun service API address are dropped from the secrets

example secrets file:

# this is an env file
AWS_ACCESS_KEY_ID=XXXX
AWS_SECRET_ACCESS_KEY=YYYY

usage:

# read env vars from dict or file and set as project secrets
project.set_secrets({"SECRET1": "value"})
project.set_secrets(file_path="secrets.env")
Parameters:
  • secrets -- dict with secrets key/value

  • file_path -- path to secrets file

  • provider -- MLRun secrets provider
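The secrets-file format described above (KEY=VALUE lines, "#" comments) can be parsed with a few lines of standard Python. This is a minimal sketch, not MLRun's parser: parse_secrets_file is hypothetical, and it does not drop the V3IO/MLRun API entries that set_secrets filters out.

```python
def parse_secrets_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blank and '#' comment lines."""
    secrets = {}
    for line_no, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected KEY=VALUE, got {raw_line!r}")
        secrets[key.strip()] = value.strip()
    return secrets


example = """\
# this is an env file
AWS_ACCESS_KEY_ID=XXXX
AWS_SECRET_ACCESS_KEY=YYYY
"""
print(parse_secrets_file(example))
# {'AWS_ACCESS_KEY_ID': 'XXXX', 'AWS_SECRET_ACCESS_KEY': 'YYYY'}
```

A line such as AWS_ACCESS_KEY_ID-XXXX (no "=") is rejected rather than silently ignored.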

set_source(source: str = '', pull_at_runtime: bool = False, workdir: str | None = None)[source]#

set the project source code path (can be a git/tar/zip archive)

Parameters:
  • source -- valid absolute path or URL to a git, zip, or tar file (or None for current), e.g. git://github.com/mlrun/something.git or http://some/url/file.zip. Note that a path source must exist on the image, or exist locally when the run is local (it is recommended to use 'workdir' when the source is a filepath instead)

  • pull_at_runtime -- load the archive into the container at job runtime vs on build/deploy

  • workdir -- workdir path relative to the context dir or absolute

set_workflow(name, workflow_path: str, embed: bool = False, engine: str | None = None, args_schema: list[mlrun.model.EntrypointParam] | None = None, handler: str | None = None, schedule: str | ScheduleCronTrigger | None = None, ttl: int | None = None, image: str | None = None, **args)[source]#

Add or update a workflow, specify a name and the code path

Parameters:
  • name -- Name of the workflow

  • workflow_path -- URL (remote) / Path (absolute or relative to the project code path i.e. <project.spec.get_code_path()>/<workflow_path>) for the workflow file.

  • embed -- Add the workflow code into the project.yaml

  • engine -- Workflow processing engine ("kfp", "local", "remote" or "remote:local")

  • args_schema -- List of arg schema definitions (mlrun.model.EntrypointParam)

  • handler -- Workflow function handler

  • schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron Note that "local" engine does not support this argument

  • ttl -- Pipeline ttl in secs (after that the pods will be removed)

  • image -- Image for workflow runner job, only for scheduled and remote workflows

  • args -- Argument values (key=value, ..)
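The workflow_path rules above (remote URL, absolute path, or a path relative to the project code path) can be sketched as follows. resolve_workflow_path is hypothetical, and code_path stands in for the value of project.spec.get_code_path(); the sketch assumes POSIX paths:

```python
import os
from urllib.parse import urlparse


def resolve_workflow_path(code_path, workflow_path):
    """Resolve a workflow file location per the documented rules (sketch)."""
    if urlparse(workflow_path).scheme:  # remote URL, e.g. https:// or git://
        return workflow_path
    if os.path.isabs(workflow_path):  # absolute local path: use as-is
        return workflow_path
    # relative path: <code_path>/<workflow_path>
    return os.path.normpath(os.path.join(code_path, workflow_path))
```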

setup(save: bool = True) MlrunProject[source]#

Run the project setup file if found

When loading a project, MLRun looks for a project_setup.py file. If it is found, MLRun executes its setup(project) handler, which can enrich the project with additional objects, functions, artifacts, etc.

Parameters:

save -- save the project after the setup
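The setup hook is an ordinary Python module with a setup(project) function. As a rough sketch of how such a hook can be located and called with the standard importlib machinery (run_project_setup is hypothetical; MLRun's actual loader is not shown in this reference):

```python
import importlib.util
import os


def run_project_setup(project, context_dir, filename="project_setup.py"):
    """Load <context_dir>/project_setup.py and call its setup(project) handler."""
    path = os.path.join(context_dir, filename)
    if not os.path.isfile(path):
        return project  # no setup file: the project is returned unchanged
    spec = importlib.util.spec_from_file_location("project_setup", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    result = module.setup(project)
    # assumption: fall back to the original object if setup() returns None
    return project if result is None else result
```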

property source: str#
property spec: ProjectSpec#
property status: ProjectStatus#
store_api_gateway(api_gateway: APIGateway, wait_for_readiness=True, max_wait_time=90) APIGateway[source]#

Creates or updates a Nuclio API Gateway using the provided APIGateway object.

This method interacts with the MLRun service to create/update a Nuclio API Gateway based on the provided APIGateway object. Once done, it returns the updated APIGateway object containing all fields propagated on the MLRun and Nuclio sides, such as the 'host' attribute. Nuclio docs: https://docs.nuclio.io/en/latest/reference/api-gateway/http.html

Parameters:
  • api_gateway -- An instance of APIGateway representing the configuration of the API Gateway to be created or updated.

  • wait_for_readiness -- (Optional) A boolean indicating whether to wait for the API Gateway to become ready after creation or update (default is True)

  • max_wait_time -- (Optional) Maximum time to wait for API Gateway readiness in seconds (default is 90s)

Returns:

An instance of APIGateway with all fields populated based on the information retrieved from the Nuclio API
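wait_for_readiness and max_wait_time describe a bounded polling loop. A minimal sketch of that pattern, assuming a hypothetical is_ready callable that checks the gateway state; whether store_api_gateway raises or returns on timeout is not stated here, so this sketch simply returns False:

```python
import time


def wait_for_ready(is_ready, max_wait_time=90.0, interval=1.0):
    """Poll is_ready() until it returns True or max_wait_time seconds pass."""
    deadline = time.monotonic() + max_wait_time
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False  # timed out without becoming ready
        time.sleep(interval)
```

time.monotonic() is used instead of time.time() so that wall-clock adjustments cannot shorten or extend the wait.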

sync_functions(names: list | None = None, always=True, save=False)[source]#

reload function objects from specs and files

update_model_monitoring_controller(base_period: int = 10, image: str = 'mlrun/mlrun') None[source]#

Redeploy model monitoring application controller functions.

Parameters:
  • base_period -- The time period in minutes in which the model monitoring controller function is triggered. By default, the base period is 10 minutes.

  • image -- The image of the model monitoring controller, writer & monitoring stream functions, which are real time nuclio functions. By default, the image is mlrun/mlrun.

Returns:

model monitoring controller job as a dictionary.

with_secrets(kind, source, prefix='')[source]#

register a secrets source (file, env or dict)

read secrets from a source provider to be used in workflows, example:

proj.with_secrets('file', 'file.txt')
proj.with_secrets('inline', {'key': 'val'})
proj.with_secrets('env', 'ENV1,ENV2', prefix='PFX_')

Vault secret source has several options:

proj.with_secrets('vault', {'user': <user name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', {'project': <proj.name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', ['secret1', 'secret2' ...])

The 2nd option uses the current project name as context. An empty secret list can also be used:

proj.with_secrets('vault', [])

This will enable access to all secrets in vault registered to the current project.

Parameters:
  • kind -- secret type (file, inline, env, vault)

  • source -- secret data or link (see example)

  • prefix -- add a prefix to the keys in this source

Returns:

project object

property workflows: list#
class mlrun.projects.ProjectMetadata(name=None, created=None, labels=None, annotations=None)[source]#

Bases: ModelObj

property name: str#

Project name

static validate_project_labels(labels: dict, raise_on_failure: bool = True) bool[source]#

Validate the project labels against the Kubernetes label syntax and character set rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
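The linked Kubernetes rules are: a key is an optional DNS-subdomain prefix plus "/" and a name of at most 63 characters that starts and ends with an alphanumeric character (with "-", "_", "." allowed in between); a value is empty or follows the same name rules. A minimal sketch of those rules, not MLRun's implementation (its exact checks may differ); all helper names are hypothetical:

```python
import re

# name segment / value: alphanumeric at both ends, [-_.] allowed in between
_NAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9_.-]*[A-Za-z0-9])?$")
# one DNS label of the optional key prefix (lowercase)
_DNS_LABEL_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def is_valid_label_key(key):
    prefix, sep, name = key.rpartition("/")
    if sep:  # a prefix is present: must be a DNS subdomain of <= 253 chars
        if not prefix or len(prefix) > 253:
            return False
        if not all(len(p) <= 63 and _DNS_LABEL_RE.match(p) for p in prefix.split(".")):
            return False
    return len(name) <= 63 and bool(_NAME_RE.match(name))


def is_valid_label_value(value):
    return value == "" or (len(value) <= 63 and bool(_NAME_RE.match(value)))


def validate_labels(labels, raise_on_failure=True):
    for key, value in labels.items():
        if not (is_valid_label_key(key) and is_valid_label_value(str(value))):
            if raise_on_failure:
                raise ValueError(f"invalid label {key!r}={value!r}")
            return False
    return True
```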

static validate_project_name(name: str, raise_on_failure: bool = True) bool[source]#
class mlrun.projects.ProjectSpec(description=None, params=None, functions=None, workflows=None, artifacts=None, artifact_path=None, conda=None, source=None, subpath=None, origin_url=None, goals=None, load_source_on_run=None, default_requirements: str | list[str] | None = None, desired_state='online', owner=None, disable_auto_mount=None, workdir=None, default_image=None, build=None, custom_packagers: list[tuple[str, bool]] | None = None, default_function_node_selector=None)[source]#

Bases: ModelObj

add_custom_packager(packager: str, is_mandatory: bool)[source]#

Add a custom packager to the custom packagers list.

Parameters:
  • packager -- The packager module path to add. For example, if a packager MyPackager is in the project's source at my_module.py, then the module path is: "my_module.MyPackager".

  • is_mandatory -- Whether this packager must be collected during a run. If False, failing to collect it won't raise an error during the packagers collection phase.

property artifacts: list#

list of artifacts used in this project

property build: ImageBuilder#
property functions: list#

list of function object/specs used in this project

get_code_path()[source]#

Get the path to the code root/workdir

property mountdir: str#

specify where to mount the context dir inside the function container; use '.' to use the same path as in the client (e.g. Jupyter)

remove_artifact(key)[source]#
remove_custom_packager(packager: str)[source]#

Remove a custom packager from the custom packagers list.

Parameters:

packager -- The packager module path to remove.

Raises:

MLRunInvalidArgumentError -- In case the packager was not in the list.

remove_function(name)[source]#
remove_workflow(name)[source]#
set_artifact(key, artifact)[source]#
set_function(name, function_object, function_dict)[source]#
set_workflow(name, workflow)[source]#
property source: str#

source url or git repo

property workflows: list[dict]#

list of workflows specs dicts used in this project


class mlrun.projects.ProjectStatus(state=None)[source]#

Bases: ModelObj

mlrun.projects.build_function(function: str | BaseRuntime, with_mlrun: bool | None = None, skip_deployed: bool = False, image=None, base_image=None, commands: list | None = None, secret_name=None, requirements: str | list[str] | None = None, requirements_file: str | None = None, mlrun_version_specifier=None, builder_env: dict | None = None, project_object=None, overwrite_build_params: bool = False, extra_args: str | None = None, force_build: bool = False) BuildStatus | ContainerOp[source]#

deploy ML function, build container with its dependencies

Parameters:
  • function -- Name of the function (in the project) or function object

  • with_mlrun -- Add the current mlrun package to the container build

  • skip_deployed -- Skip the build if we already have an image for the function

  • image -- Target image name/path

  • base_image -- Base image name/path (commands and source code will be added to it)

  • commands -- List of docker build (RUN) commands e.g. ['pip install pandas']

  • secret_name -- K8s secret for accessing the docker registry

  • requirements -- List of python packages, defaults to None

  • requirements_file -- pip requirements file path, defaults to None

  • mlrun_version_specifier -- which mlrun package version to include (if not current)

  • builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP

  • project_object -- Override the project object to use, will default to the project set in the runtime context.

  • overwrite_build_params --

    Overwrite existing build configuration (currently applies to requirements and commands)

    • False: The new params are merged with the existing

    • True: The existing params are replaced by the new ones

  • extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"

  • force_build -- Force building the image, even when no changes were made
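The two overwrite_build_params modes can be pictured on a requirements list. A minimal sketch, assuming merging keeps the existing order and skips duplicates (the de-duplication is an assumption, not documented behavior); apply_build_params is hypothetical:

```python
def apply_build_params(existing, new, overwrite=False):
    """Combine build params (e.g. requirements or commands) per overwrite_build_params."""
    if overwrite:
        return list(new)  # True: existing params are replaced by the new ones
    merged = list(existing)
    for item in new:  # False: new params are merged into the existing ones
        if item not in merged:  # assumption: duplicates are skipped, order kept
            merged.append(item)
    return merged
```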

mlrun.projects.deploy_function(function: str | BaseRuntime, models: list | None = None, env: dict | None = None, tag: str | None = None, verbose: bool | None = None, builder_env: dict | None = None, project_object=None, mock: bool | None = None) DeployStatus | ContainerOp[source]#

deploy real-time (nuclio based) functions

Parameters:
  • function -- name of the function (in the project) or function object

  • models -- list of model items

  • env -- dict of extra environment variables

  • tag -- extra version tag

  • verbose -- add verbose prints/logs

  • builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}

  • mock -- deploy mock server vs a real Nuclio function (for local simulations)

  • project_object -- override the project object to use, will default to the project set in the runtime context.

mlrun.projects.get_or_create_project(name: str, context: str = './', url: str | None = None, secrets: dict | None = None, init_git=False, subpath: str | None = None, clone: bool = False, user_project: bool = False, from_template: str | None = None, save: bool = True, parameters: dict | None = None) MlrunProject[source]#

Load a project from MLRun DB, or create/import if it does not exist

MLRun looks for a project.yaml file with the project definition and objects in the project root path and uses it to initialize the project. In addition, it runs the project_setup.py file (if it exists) for further customization.

Usage example:

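from mlrun import get_or_create_project

data_url = "s3://my-bucket/data.csv"  # hypothetical input data location

# load the project from the DB (if it exists) or the source repo
project = get_or_create_project("myproj", "./", "git://github.com/mlrun/demo-xgb-project.git")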
project.pull("development")  # pull the latest code from git
project.run("main", arguments={'data': data_url})  # run the workflow "main"

project_setup.py example:

def setup(project):
    train_function = project.set_function(
        "src/trainer.py",
        name="mpi-training",
        kind="mpijob",
        image="mlrun/mlrun",
    )
    # Set the number of replicas for the training from the project parameter
    train_function.spec.replicas = project.spec.params.get("num_replicas", 1)
    return project
Parameters:
  • name -- project name

  • context -- project local directory path (default value = "./")

  • url -- name (in DB) or git or tar.gz or .zip sources archive path e.g.: git://github.com/mlrun/demo-xgb-project.git http://mysite/archived-project.zip

  • secrets -- key:secret dict or SecretsStore used to download sources

  • init_git -- if True, will execute git init on the context dir

  • subpath -- project subpath (within the archive/context)

  • clone -- if True, always clone (delete any existing content)

  • user_project -- add the current username to the project name (for db:// prefixes)

  • from_template -- path to a project YAML file that will be used as a template (for new projects)

  • save -- whether to save the created project in the DB

  • parameters -- key/value pairs to add to the project.spec.params

Returns:

project object

mlrun.projects.load_project(context: str = './', url: str | None = None, name: str | None = None, secrets: dict | None = None, init_git: bool = False, subpath: str | None = None, clone: bool = False, user_project: bool = False, save: bool = True, sync_functions: bool = False, parameters: dict | None = None) MlrunProject[source]#

Load an MLRun project from git or tar or dir

MLRun looks for a project.yaml file with the project definition and objects in the project root path and uses it to initialize the project. In addition, it runs the project_setup.py file (if it exists) for further customization.

Usage example:

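from mlrun import load_project

data_url = "s3://my-bucket/data.csv"  # hypothetical input data location

# Load the project and run the 'main' workflow.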
# When using git as the url source the context directory must be an empty or
# non-existent folder as the git repo will be cloned there
project = load_project("./demo_proj", "git://github.com/mlrun/project-demo.git")
project.run("main", arguments={'data': data_url})

project_setup.py example:

def setup(project):
    train_function = project.set_function(
        "src/trainer.py",
        name="mpi-training",
        kind="mpijob",
        image="mlrun/mlrun",
    )
    # Set the number of replicas for the training from the project parameter
    train_function.spec.replicas = project.spec.params.get("num_replicas", 1)
    return project
Parameters:
  • context -- project local directory path (default value = "./")

  • url -- name (in DB) or git or tar.gz or .zip sources archive path, e.g.: git://github.com/mlrun/demo-xgb-project.git, http://mysite/archived-project.zip or <project-name>. The git project should include the project YAML file. If the project YAML file is in a sub-directory, you must specify the sub-directory.

  • name -- project name

  • secrets -- key:secret dict or SecretsStore used to download sources

  • init_git -- if True, will git init the context dir

  • subpath -- project subpath (within the archive)

  • clone -- if True, always clone (delete any existing content)

  • user_project -- add the current username to the project name (for db:// prefixes)

  • save -- whether to save the created project and artifact in the DB

  • sync_functions -- sync the project's functions into the project object (will be saved to the DB if save=True)

  • parameters -- key/value pairs to add to the project.spec.params

Returns:

project object

mlrun.projects.new_project(name, context: str = './', init_git: bool = False, user_project: bool = False, remote: str | None = None, from_template: str | None = None, secrets: dict | None = None, description: str | None = None, subpath: str | None = None, save: bool = True, overwrite: bool = False, parameters: dict | None = None, default_function_node_selector: dict | None = None) MlrunProject[source]#

Create a new MLRun project, optionally load it from a yaml/zip/git template

A new project is created and returned. You can customize the project by placing a project_setup.py file in the project root dir; it will be executed upon project creation or loading.

example:

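import mlrun
from mlrun.artifacts import Artifact

data_url = "s3://my-bucket/data.csv"  # hypothetical data location

# create a project with local and hub functions, a workflow, and an artifact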
project = mlrun.new_project("myproj", "./", init_git=True, description="my new project")
project.set_function('prep_data.py', 'prep-data', image='mlrun/mlrun', handler='prep_data')
project.set_function('hub://auto-trainer', 'train')
project.set_artifact('data', Artifact(target_path=data_url))
project.set_workflow('main', "./myflow.py")
project.save()

# run the "main" workflow (watch=True to wait for run completion)
project.run("main", watch=True)

example (load from template):

# create a new project from a zip template (can also use yaml/git templates)
# initialize a local git, and register the git remote path
project = mlrun.new_project("myproj", "./", init_git=True,
                            remote="git://github.com/mlrun/project-demo.git",
                            from_template="http://mysite/proj.zip")
project.run("main", watch=True)

example using project_setup.py to init the project objects:

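from mlrun.artifacts import Artifact

data_url = "s3://my-bucket/data.csv"  # hypothetical data location


def setup(project):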
    project.set_function('prep_data.py', 'prep-data', image='mlrun/mlrun', handler='prep_data')
    project.set_function('hub://auto-trainer', 'train')
    project.set_artifact('data', Artifact(target_path=data_url))
    project.set_workflow('main', "./myflow.py")
    return project
Parameters:
  • name -- project name

  • context -- project local directory path (default value = "./")

  • init_git -- if True, will git init the context dir

  • user_project -- add the current username to the provided project name (making it unique per user)

  • remote -- remote Git url

  • from_template -- path to project YAML/zip file that will be used as a template

  • secrets -- key:secret dict or SecretsStore used to download sources

  • description -- text describing the project

  • subpath -- project subpath (relative to the context dir)

  • save -- whether to save the created project in the DB

  • overwrite -- if a project with the same name exists, overwrite it using the 'cascade' deletion strategy (deletes project resources)

  • parameters -- key/value pairs to add to the project.spec.params

  • default_function_node_selector -- defines the default node selector for scheduling functions within the project

Returns:

project object

mlrun.projects.run_function(function: str | BaseRuntime, handler: str | None = None, name: str = '', params: dict | None = None, hyperparams: dict | None = None, hyper_param_options: HyperParamOptions | None = None, inputs: dict | None = None, outputs: list[str] | None = None, workdir: str = '', labels: dict | None = None, base_task: RunTemplate | None = None, watch: bool = True, local: bool | None = None, verbose: bool | None = None, selector: str | None = None, project_object=None, auto_build: bool | None = None, schedule: str | ScheduleCronTrigger | None = None, artifact_path: str | None = None, notifications: list[mlrun.model.Notification] | None = None, returns: list[Union[str, dict[str, str]]] | None = None, builder_env: list | None = None) RunObject | ContainerOp[source]#

Run a local or remote task as part of a local/Kubeflow pipeline

run_function() allows you to execute a function locally, on a remote cluster, or as part of an automated workflow. The function can be specified as an object or by name (str). When it is specified by name, it is looked up in the current project, which eliminates the need to redefine or edit functions.
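
The name-or-object lookup described above can be modeled as a small dispatch. The sketch below is a toy model, not MLRun's implementation. A plain dict named registry stands in for the project's function definitions, and resolve_function and prep_data are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Union

# Hypothetical stand-in for a project's functions (name -> function object).
Handler = Callable[..., dict]


def resolve_function(function: Union[str, Handler], registry: Dict[str, Handler]) -> Handler:
    """Toy model: function objects pass through; names are looked up in the project."""
    if isinstance(function, str):
        if function not in registry:
            raise KeyError(f"function {function!r} is not defined in the project")
        return registry[function]
    return function


def prep_data(x: int) -> dict:
    # Hypothetical handler used only for this illustration.
    return {"y": x * 2}


registry = {"prep-data": prep_data}
assert resolve_function("prep-data", registry) is prep_data  # looked up by name
assert resolve_function(prep_data, registry) is prep_data  # object passed through
```

Referring to functions by name keeps workflow code decoupled from each function's definition. Changing the definition in the project is then enough for every caller to pick it up.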

When functions run as part of a workflow/pipeline (project.run()), some attributes can be set at the run level. For example, local=True runs all the functions locally, and setting artifact_path directs all outputs to the same path. Project runs also provide additional notifications/reporting and exception handling. Inside a Kubeflow pipeline (KFP), run_function() generates KFP "ContainerOps", which are used to form a DAG, so some behavior may differ between regular runs and deferred KFP runs.
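
The run-level behavior described above (one local=True or artifact_path applied to every step) can be pictured as a precedence rule: a value set for the whole run overrides the step's own value. The sketch below is a toy model under that assumption only. StepSettings and apply_run_level are hypothetical names, and MLRun's actual precedence logic is not shown in this section and may differ.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class StepSettings:
    """Hypothetical per-step settings; None means "not set"."""
    name: str
    local: Optional[bool] = None
    artifact_path: Optional[str] = None


def apply_run_level(step: StepSettings, local: Optional[bool] = None,
                    artifact_path: Optional[str] = None) -> StepSettings:
    """Toy precedence rule: values set at the run level override the step's own values."""
    return replace(
        step,
        local=step.local if local is None else local,
        artifact_path=step.artifact_path if artifact_path is None else artifact_path,
    )


steps = [StepSettings("prep-data", local=False), StepSettings("train", artifact_path="/tmp/train")]
# A workflow run with local=True and one shared artifact path.
resolved = [apply_run_level(s, local=True, artifact_path="/tmp/run-42") for s in steps]
for s in resolved:
    print(s)
```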

example (use with function object):

import mlrun
from mlrun.projects import run_function

LABELS = "is_error"
MODEL_CLASS = "sklearn.ensemble.RandomForestClassifier"
DATA_PATH = "s3://bigdata/data.parquet"
function = mlrun.import_function("hub://auto-trainer")
run1 = run_function(function, params={"label_columns": LABELS, "model_class": MODEL_CLASS},
                    inputs={"dataset": DATA_PATH})

example (use with project):

# create a project with two functions (local and from hub)
project_name = "my-project"  # placeholder project name
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://auto-trainer", "train")

# run functions (refer to them by name); assumes myfunc logs an output named "data"
run1 = run_function("myfunc", params={"x": 7})
run2 = run_function("train", params={"label_columns": LABELS, "model_class": MODEL_CLASS},
                    inputs={"dataset": run1.outputs["data"]})

example (use in pipeline):

from kfp import dsl

@dsl.pipeline(name="test pipeline", description="test")
def my_pipe(url=""):
    run1 = run_function("loaddata", params={"url": url}, outputs=["data"])
    run2 = run_function("train", params={"label_columns": LABELS, "model_class": MODEL_CLASS},
                        inputs={"dataset": run1.outputs["data"]})

# workflow arguments are passed to the pipeline's parameters (here: url)
project.run(workflow_handler=my_pipe, arguments={"url": DATA_PATH})
Parameters:
  • function -- name of the function (in the project) or function object

  • handler -- name of the function handler

  • name -- execution name

  • params -- input parameters (dict)

  • hyperparams -- hyperparameters

  • selector -- selection criteria for hyperparameters, e.g. "max.accuracy"

  • hyper_param_options -- hyperparameter options (selector, early stop, strategy, ...); see HyperParamOptions

  • inputs -- Input objects to pass to the handler. Type hints can be given so that, at runtime, the input is parsed from mlrun.DataItem into the hinted type. The type hint is given in the dictionary key, after a colon, e.g. "<key> : <type_hint>".

  • outputs -- list of outputs that can be passed to other steps in the workflow

  • workdir -- default input artifacts path

  • labels -- labels to tag the job/run with ({key:val, ..})

  • base_task -- task object to use as base

  • watch -- watch/follow run log, True by default

  • local -- run the function locally rather than on the runtime/cluster

  • verbose -- add verbose prints/logs

  • project_object -- override the project object to use; defaults to the project set in the runtime context

  • auto_build -- when set to True and the function requires a build, it is built on the first function run. Use only if you do not plan on changing the build config between runs

  • schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron

  • artifact_path -- path to store artifacts; when running in a workflow, this is set automatically

  • notifications -- list of notifications to push when the run is completed

  • returns --

    List of log hints: configurations for how to log the values returned from the handler's run (as artifacts or results). The list's length must equal the number of returned objects. A log hint may be given as:

    • A string key under which to log the returned value as a result or an artifact. To specify the artifact type, pass a string in the form "<key> : <type>". Available artifact types are listed in mlrun.ArtifactType. If no artifact type is specified, the object's default artifact type is used.

    • A dictionary of configurations to use when logging. Further info per object type and artifact type can be given there. The artifact key must appear in the dictionary as "key": "the_key".

  • builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
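
Both the inputs keys and the string returns log hints above use the same "<key> : <hint>" form. The sketch below shows how such a string splits into its two parts. split_key_hint is a hypothetical helper that illustrates the syntax only and is not MLRun's own parser.

```python
from typing import Optional, Tuple


def split_key_hint(key: str) -> Tuple[str, Optional[str]]:
    """Split a "<key> : <hint>" string into (key, hint); hint is None if absent.

    Hypothetical helper illustrating the syntax of `inputs` keys
    (e.g. "dataset : pandas.DataFrame") and string `returns` log hints
    (e.g. "cleaned : dataset"); it is not MLRun's own parser.
    """
    name, sep, hint = key.partition(":")
    return (name.strip(), hint.strip()) if sep else (key.strip(), None)


print(split_key_hint("dataset : pandas.DataFrame"))  # ('dataset', 'pandas.DataFrame')
print(split_key_hint("cleaned : dataset"))  # ('cleaned', 'dataset')
print(split_key_hint("accuracy"))  # ('accuracy', None)
```

In a call, such keys might appear as inputs={"dataset : pandas.DataFrame": DATA_PATH} and returns=["cleaned : dataset", "accuracy"]. Here the key names are illustrative, and "dataset" is assumed to be one of the mlrun.ArtifactType values.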

Returns:

MLRun RunObject or Kubeflow ContainerOp
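
The hyperparams and selector parameters work together. The first lists candidate values per parameter, and the second (e.g. "max.accuracy") picks the winning iteration by one of its results. The sketch below models only the expansion and selection logic, assuming a plain grid-search strategy. grid and select_best are hypothetical helpers, not MLRun APIs, and the scores are made up for illustration.

```python
import itertools
from typing import Any, Dict, List


def grid(hyperparams: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand {"param": [values, ...]} into every combination (plain grid search)."""
    keys = list(hyperparams)
    return [dict(zip(keys, combo)) for combo in itertools.product(*hyperparams.values())]


def select_best(results: List[Dict[str, Any]], selector: str) -> Dict[str, Any]:
    """Return the result chosen by a "max.<key>" or "min.<key>" selector."""
    op, _, key = selector.partition(".")
    if op not in ("max", "min") or not key:
        raise ValueError(f"unsupported selector: {selector!r}")
    pick = max if op == "max" else min
    return pick(results, key=lambda r: r[key])


iterations = grid({"n_estimators": [10, 100], "max_depth": [2, 4]})
# Made-up scores, one per iteration, purely for illustration.
results = [{**p, "accuracy": a} for p, a in zip(iterations, [0.71, 0.74, 0.80, 0.78])]
print(select_best(results, "max.accuracy"))  # {'n_estimators': 100, 'max_depth': 2, 'accuracy': 0.8}
```

A grid grows multiplicatively with each added parameter (here 2 x 2 = 4 iterations). This is why options such as early stopping or other strategies, mentioned under hyper_param_options, matter for larger searches.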