mlrun.projects#
- class mlrun.projects.MlrunProject(metadata: ProjectMetadata | dict | None = None, spec: ProjectSpec | dict | None = None)[source]#
Bases:
ModelObj
- add_custom_packager(packager: str, is_mandatory: bool)[source]#
Add a custom packager to the project's custom packagers list (see the example below). All of the project's custom packagers are added to each project function.
Notice that in order to run a function with the custom packagers included, you must set a source for the project (using the project.set_source method) with the parameter pull_at_runtime=True, so that the packagers' source code can be imported.
- Parameters:
packager -- The packager module path to add. For example, if a packager MyPackager is in the project's source at my_module.py, then the module path is: "my_module.MyPackager".
is_mandatory -- Whether this packager must be collected during a run. If False, failing to collect it won't raise an error during the packagers collection phase.
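Example (a minimal sketch; the source URL and the module path "my_module.MyPackager" are illustrative placeholders):
project.set_source("git://github.com/org/repo.git#main", pull_at_runtime=True)
project.add_custom_packager("my_module.MyPackager", is_mandatory=False)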
- property artifact_path: str#
- build_config(image: str | None = None, set_as_default: bool = False, with_mlrun: bool | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, builder_env: dict | None = None, extra_args: str | None = None, source_code_target_dir: str | None = None)[source]#
Specify the builder configuration for the project (see the example below).
- Parameters:
image -- target image name/path. If not specified, the project's existing default_image name is used; if that is also not set, the mlconf.default_project_image_name value is used
set_as_default -- set image to be the project's default image (default False)
with_mlrun -- add the current mlrun package to the container build
base_image -- base image name/path
commands -- list of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- k8s secret for accessing the docker registry
requirements -- a list of packages to install on the built image
requirements_file -- requirements file to install on the built image
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands). False: the new params are merged with the existing ones; True: the existing params are replaced by the new ones
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
source_code_target_dir -- Path on the image where source code would be extracted (by default /home/mlrun_code)
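Example (a hedged sketch; the image name and requirements are illustrative):
project.build_config(
    image=".my-project-image",
    base_image="mlrun/mlrun",
    requirements=["pandas", "scikit-learn"],
    set_as_default=True,
)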
- build_function(function: str | BaseRuntime, with_mlrun: bool | None = None, skip_deployed: bool = False, image: str | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, mlrun_version_specifier: str | None = None, builder_env: dict | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, extra_args: str | None = None, force_build: bool = False) BuildStatus | ContainerOp [source]#
Deploy an ML function: build a container image with its dependencies (see the example below).
- Parameters:
function -- name of the function (in the project) or function object
with_mlrun -- add the current mlrun package to the container build
skip_deployed -- skip the build if we already have an image for the function
image -- target image name/path
base_image -- base image name/path (commands and source code will be added to it)
commands -- list of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- k8s secret for accessing the docker registry
requirements -- list of python packages, defaults to None
requirements_file -- pip requirements file path, defaults to None
mlrun_version_specifier -- which mlrun package version to include (if not current)
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands). False: the new params are merged with the existing ones; True: the existing params are replaced by the new ones
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
force_build -- force building the image, even when no changes were made
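Example (a minimal sketch, assuming a function named "trainer" was already set on the project; the requirements are illustrative):
project.build_function(
    "trainer",
    with_mlrun=True,
    requirements=["xgboost"],
    overwrite_build_params=True,
)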
- build_image(image: str | None = None, set_as_default: bool = True, with_mlrun: bool | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, mlrun_version_specifier: str | None = None, builder_env: dict | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, extra_args: str | None = None, target_dir: str | None = None) BuildStatus | ContainerOp [source]#
Build a docker image for the project, based on the project's build config (see the example below). Parameters allow overriding the build config. If the project has a source configured and pull_at_runtime is not configured, this source will be cloned into the built image. The target_dir parameter allows specifying the target path where the code will be extracted.
- Parameters:
image -- target image name/path. If not specified, the project's existing default_image name is used; if that is also not set, the mlconf.default_project_image_name value is used
set_as_default -- set image to be the project's default image (default True)
with_mlrun -- add the current mlrun package to the container build
base_image -- base image name/path (commands and source code will be added to it) defaults to mlrun.mlconf.default_base_image
commands -- list of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- k8s secret for accessing the docker registry
requirements -- list of python packages, defaults to None
requirements_file -- pip requirements file path, defaults to None
mlrun_version_specifier -- which mlrun package version to include (if not current)
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands). False: the new params are merged with the existing ones; True: the existing params are replaced by the new ones
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
target_dir -- Path on the image where source code would be extracted (by default /home/mlrun_code)
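Example (a hedged sketch; the source URL and requirements file are illustrative):
project.set_source("git://github.com/org/repo.git#main")
project.build_image(
    base_image="mlrun/mlrun",
    requirements_file="requirements.txt",
)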
- property context: str#
- create_model_monitoring_function(func: str | None = None, application_class: str | ModelMonitoringApplicationBase | None = None, name: str | None = None, image: str | None = None, handler: str | None = None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '', **application_kwargs) BaseRuntime [source]#
Create a monitoring function object without adding it to the project
examples:
project.create_model_monitoring_function(
    application_class="MyApp", image="mlrun/mlrun", name="myApp"
)
- Parameters:
func -- Code url, None refers to current Notebook
name -- Name of the function, can be specified with a tag to support versions (e.g. myfunc:v1) Default: job
image -- Docker image to be used, can also be specified in the function object/yaml
handler -- Default function handler to invoke (can only be set with .py/.ipynb files)
with_repo -- Add (clone) the current repo to the build source
tag -- Function version tag (none for 'latest'; can only be set with .py/.ipynb files). If a tag is specified and name is empty, the function key (under the project) will be enriched with the tag value (i.e. 'function-name:tag').
requirements -- A list of python packages
requirements_file -- Path to a python requirements file
application_class -- Name or an instance of a class implementing the monitoring application.
application_kwargs -- Additional keyword arguments to be passed to the monitoring application's constructor.
- create_remote(url, name='origin', branch=None)[source]#
Create remote for the project git
This method creates a new remote repository associated with the project's Git repository. If a remote with the specified name already exists, it will not be overwritten.
If you wish to update the URL of an existing remote, use the set_remote method instead.
- Parameters:
url -- remote git url
name -- name for the remote (default is 'origin')
branch -- Git branch to use as source
- property default_function_node_selector: dict#
- property default_image: str#
- delete_alert_config(alert_data: AlertConfig | None = None, alert_name: str | None = None)[source]#
Delete an alert.
- Parameters:
alert_data -- The data of the alert.
alert_name -- The name of the alert to delete.
- delete_api_gateway(name: str)[source]#
Deletes an API gateway by name.
- Parameters:
name -- The name of the API gateway to delete.
- delete_artifact(item: Artifact, deletion_strategy: ArtifactsDeletionStrategies = ArtifactsDeletionStrategies.metadata_only, secrets: dict | None = None)[source]#
Delete an artifact object in the DB and optionally delete the artifact data
- Parameters:
item -- Artifact object (can be any type, such as dataset, model, feature store).
deletion_strategy -- The artifact deletion strategy types.
secrets -- Credentials needed to access the artifact data.
- delete_model_monitoring_function(name: str | list[str])[source]#
Delete the specified model monitoring application function(s).
- Parameters:
name -- name of the model-monitoring-function/s (under the project)
- deploy_function(function: str | BaseRuntime, models: list | None = None, env: dict | None = None, tag: str | None = None, verbose: bool | None = None, builder_env: dict | None = None, mock: bool | None = None) DeployStatus | ContainerOp [source]#
Deploy real-time (Nuclio-based) functions (see the example below).
- Parameters:
function -- name of the function (in the project) or function object
models -- list of model items
env -- dict of extra environment variables
tag -- extra version tag
verbose -- add verbose prints/logs
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
mock -- deploy mock server vs a real Nuclio function (for local simulations)
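Example (a minimal sketch, assuming a serving function named "serving" is registered in the project; the model item keys and model_uri are placeholders):
project.deploy_function(
    "serving",
    models=[{"key": "my-model", "model_path": model_uri}],
    env={"LOG_LEVEL": "debug"},
)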
- deploy_histogram_data_drift_app(*, image: str = 'mlrun/mlrun', db: RunDBInterface | None = None, wait_for_deployment: bool = False) None [source]#
Deploy the histogram data drift application.
- Parameters:
image -- The image on which the application will run.
db -- An optional DB object.
wait_for_deployment -- If true, return only after the deployment is done on the backend. Otherwise, deploy the application in the background.
- property description: str#
- disable_model_monitoring(*, delete_resources: bool = True, delete_stream_function: bool = False, delete_histogram_data_drift_app: bool = True, delete_user_applications: bool = False, user_application_list: list[str] | None = None) None [source]#
Disable the model monitoring application controller, writer, stream, histogram data drift application, and the user's application functions, according to the given params.
- Parameters:
delete_resources -- If True, delete the model monitoring controller & writer functions. Default True.
delete_stream_function -- If True, delete the model monitoring stream function. Use with care: deleting this function can cause data loss if you later re-enable model monitoring for the project. Default False.
delete_histogram_data_drift_app -- If True, delete the default histogram-based data drift application. Default True.
delete_user_applications -- If True, delete the user's model monitoring applications according to user_application_list. Default False.
user_application_list -- List of the user's model monitoring applications to disable. Defaults to all applications. Note: delete_user_applications must be set to True in order to delete the listed applications.
- enable_model_monitoring(default_controller_image: str = 'mlrun/mlrun', base_period: int = 10, image: str = 'mlrun/mlrun', *, deploy_histogram_data_drift_app: bool = True, wait_for_deployment: bool = False, rebuild_images: bool = False, fetch_credentials_from_sys_config: bool = False) None [source]#
Deploy the model monitoring application controller, writer, and stream functions (see the example below). The controller function handles the monitoring processing and triggers the applications; the writer function writes all the monitoring application results to the databases. The stream function monitors the log of the data stream: it is triggered when a new log entry is detected, and processes the new events into statistics that are then written to statistics databases.
- Parameters:
default_controller_image -- Deprecated.
base_period -- The time period in minutes in which the model monitoring controller function is triggered. By default, the base period is 10 minutes (which is also the minimum value for production environments).
image -- The image of the model monitoring controller, writer, monitoring stream & histogram data drift functions, which are real time nuclio functions. By default, the image is mlrun/mlrun.
deploy_histogram_data_drift_app -- If true, deploy the default histogram-based data drift application.
wait_for_deployment -- If true, return only after the deployment is done on the backend. Otherwise, deploy the model monitoring infrastructure in the background, including the histogram data drift app if selected.
rebuild_images -- If true, force rebuild of model monitoring infrastructure images.
fetch_credentials_from_sys_config -- If true, fetch the credentials from the system configuration.
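Example (a hedged sketch, assuming model monitoring credentials were already set for the project):
project.enable_model_monitoring(
    base_period=10,
    image="mlrun/mlrun",
    deploy_histogram_data_drift_app=True,
    wait_for_deployment=True,
)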
- export(filepath=None, include_files: str | None = None)[source]#
Save the project object to a yaml file or zip archive (defaults to project.yaml); see the example below.
By default, the project object is exported to a yaml file. When the filepath suffix is '.zip', the project context dir (code files) is also copied into the zip. The archive path can include DataItem urls (for remote object storage, e.g. s3://<bucket>/<path>).
- Parameters:
filepath -- path to store project .yaml or .zip (with the project dir content)
include_files -- glob filter string for selecting files to include in the zip archive
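Example (a short sketch; the paths and the glob filter string are illustrative):
# export only the project spec
project.export("project.yaml")
# export the spec plus selected context files into a zip on remote storage
project.export("s3://my-bucket/my-project.zip", include_files="**/*.py")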
- get_alert_config(alert_name: str) AlertConfig [source]#
Retrieve an alert.
- Parameters:
alert_name -- The name of the alert to retrieve.
- Returns:
The alert object.
- get_alert_template(template_name: str) AlertTemplate [source]#
Retrieve a specific alert template.
- Parameters:
template_name -- The name of the template to retrieve.
- Returns:
The template object.
- get_api_gateway(name: str) APIGateway [source]#
Retrieve an API gateway instance by name.
- Parameters:
name -- The name of the API gateway to retrieve.
- Returns:
An instance of APIGateway.
- Return type:
mlrun.runtimes.nuclio.APIGateway
- get_artifact(key, tag=None, iter=None, tree=None)[source]#
Return an artifact object
- Parameters:
key -- artifact key
tag -- version tag
iter -- iteration number (for hyper-param tasks)
tree -- the producer id (tree)
- Returns:
Artifact object
- get_artifact_uri(key: str, category: str = 'artifact', tag: str | None = None, iter: int | None = None) str [source]#
return the project artifact uri (store://..) from the artifact key
example:
uri = project.get_artifact_uri("my_model", category="model", tag="prod", iter=0)
- Parameters:
key -- artifact key/name
category -- artifact category (artifact, model, feature-vector, ..)
tag -- artifact version tag, default to latest version
iter -- iteration number, default to no iteration
- get_custom_packagers() list[tuple[str, bool]] [source]#
Get the custom packagers registered in the project.
- Returns:
A list of the custom packagers module paths.
- get_datastore_profile(profile: str) DatastoreProfile [source]#
- get_function(key, sync=False, enrich=False, ignore_cache=False, copy_function=True, tag: str = '') BaseRuntime [source]#
Get a function object by name (see the example below).
- Parameters:
key -- name of key for search
sync -- will reload/reinit the function from the project spec
enrich -- add project info/config/source info to the function object
ignore_cache -- read the function object from the DB (ignore the local cache)
copy_function -- return a copy of the function object
tag -- provide if the function key is tagged under the project (function was set with a tag)
- Returns:
function object
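Example (a minimal sketch; the function name "trainer" is illustrative):
trainer = project.get_function("trainer", enrich=True, ignore_cache=True)
print(trainer.metadata.name, trainer.kind)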
- get_function_objects() FunctionsDict [source]#
"get a virtual dict with all the project functions ready for use in a pipeline
- get_item_absolute_path(url: str, check_path_in_context: bool = False) tuple[str, bool] [source]#
Get the absolute path of the artifact or function file
- Parameters:
url -- remote url, absolute path or relative path
check_path_in_context -- if True, check whether the path exists when in the context (temporary parameter to allow for backwards compatibility)
- Returns:
absolute path / url, whether the path is in the project context
- get_run_status(run, timeout=None, expected_statuses=None, notifiers: CustomNotificationPusher | None = None)[source]#
- get_secret(key: str)[source]#
Get a key-based secret (e.g. a DB password) from the context. Secrets can be specified when invoking a run, through files, env vars, etc.
- import_artifact(item_path: str, new_key=None, artifact_path=None, tag=None)[source]#
Import an artifact object/package from .yaml, .json, or .zip file
- Parameters:
item_path -- dataitem url or file path to the file/package
new_key -- overwrite the artifact key/name
artifact_path -- target artifact path (when not using the default)
tag -- artifact tag to set
- Returns:
artifact object
- kind = 'project'#
- list_alert_templates() list[mlrun.common.schemas.alert.AlertTemplate] [source]#
Retrieve list of all alert templates.
- Returns:
All the alert template objects in the database.
- list_alerts_configs() list[mlrun.alerts.alert.AlertConfig] [source]#
Retrieve list of alerts of a project.
- Returns:
All the alerts objects of the project.
- list_api_gateways() list[mlrun.runtimes.nuclio.api_gateway.APIGateway] [source]#
Retrieves a list of Nuclio API gateways associated with the project.
- Returns:
List of APIGateway objects representing the Nuclio API gateways associated with the project.
- list_artifacts(name=None, tag=None, labels: dict[str, str] | list[str] | None = None, since=None, until=None, iter: int | None = None, best_iteration: bool = False, kind: str | None = None, category: str | ArtifactCategories | None = None, tree: str | None = None, limit: int | None = None) ArtifactList [source]#
List artifacts filtered by various parameters.
The returned result is an ArtifactList (list of dict); use .to_objects() to convert it to a list of artifact objects, .show() to view graphically in Jupyter, and .to_df() to convert to a DataFrame.
Examples:
# Get latest version of all artifacts in project
latest_artifacts = project.list_artifacts("", tag="latest")
# check different artifact versions for a specific artifact, return as objects list
result_versions = project.list_artifacts("results", tag="*").to_objects()
- Parameters:
name -- Name of artifacts to retrieve. Name with '~' prefix is used as a like query, and is not case-sensitive. This means that querying for ~name may return artifacts named my_Name_1 or surname.
tag -- Return artifacts assigned this tag.
labels -- Return artifacts that have these labels. Labels can either be a dictionary {"label": "value"} or a list of "label=value" (match label key and value) or "label" (match just label key) strings.
since -- Not in use in HTTPRunDB.
until -- Not in use in HTTPRunDB.
iter -- Return artifacts from a specific iteration (where iter=0 means the root iteration). If None (default), return artifacts from all iterations.
best_iteration -- Returns the artifact which belongs to the best iteration of a given run, in the case of artifacts generated from a hyper-param run. If only a single iteration exists, will return the artifact from that iteration. If using best_iteration, the iter parameter must not be used.
kind -- Return artifacts of the requested kind.
category -- Return artifacts of the requested category.
tree -- Return artifacts of the requested tree.
limit -- Maximum number of artifacts to return.
- list_datastore_profiles() list[mlrun.datastore.datastore_profile.DatastoreProfile] [source]#
Returns a list of datastore profiles associated with the project. The information excludes private details, showcasing only public data.
- list_functions(name=None, tag=None, labels=None)[source]#
Retrieve a list of functions, filtered by specific criteria.
example:
functions = project.list_functions(tag="latest")
- Parameters:
name -- Return only functions with a specific name.
tag -- Return function versions with specific tags. To return only tagged functions, set tag to "*".
labels -- Return functions that have specific labels assigned to them.
- Returns:
List of function objects.
- list_model_monitoring_functions(name: str | None = None, tag: str | None = None, labels: list[str] | None = None) list | None [source]#
Retrieve a list of all the model monitoring functions. Example:
functions = project.list_model_monitoring_functions()
- Parameters:
name -- Return only functions with a specific name.
tag -- Return function versions with specific tags.
labels -- Return functions that have specific labels assigned to them.
- Returns:
List of function objects.
- list_models(name=None, tag=None, labels: dict[str, str] | list[str] | None = None, since=None, until=None, iter: int | None = None, best_iteration: bool = False, tree: str | None = None)[source]#
List models in project, filtered by various parameters.
Examples:
# Get latest version of all models in project
latest_models = project.list_models("", tag="latest")
- Parameters:
name -- Name of artifacts to retrieve. Name with '~' prefix is used as a like query, and is not case-sensitive. This means that querying for ~name may return artifacts named my_Name_1 or surname.
tag -- Return artifacts assigned this tag.
labels -- Return artifacts that have these labels. Labels can either be a dictionary {"label": "value"} or a list of "label=value" (match label key and value) or "label" (match just label key) strings.
since -- Not in use in HTTPRunDB.
until -- Not in use in HTTPRunDB.
iter -- Return artifacts from a specific iteration (where iter=0 means the root iteration). If None (default), return artifacts from all iterations.
best_iteration -- Returns the artifact which belongs to the best iteration of a given run, in the case of artifacts generated from a hyper-param run. If only a single iteration exists, will return the artifact from that iteration. If using best_iteration, the iter parameter must not be used.
tree -- Return artifacts of the requested tree.
- list_runs(name: str | None = None, uid: str | list[str] | None = None, labels: str | list[str] | None = None, state: RunStates | None = None, states: list[mlrun.common.runtimes.constants.RunStates] | None = None, sort: bool = True, last: int = 0, iter: bool = False, start_time_from: datetime | None = None, start_time_to: datetime | None = None, last_update_time_from: datetime | None = None, last_update_time_to: datetime | None = None, **kwargs) RunList [source]#
Retrieve a list of runs, filtered by various options.
The returned result is a RunList (list of dict); use .to_objects() to convert it to a list of RunObjects, .show() to view graphically in Jupyter, .to_df() to convert to a DataFrame, and .compare() to generate a comparison table and PCP plot.
Example:
# return a list of runs matching the name and label and compare
runs = project.list_runs(name="download", labels="owner=admin")
runs.compare()
# multi-label filter can also be provided
runs = project.list_runs(name="download", labels=["kind=job", "owner=admin"])
# If running in Jupyter, can use the .show() function to display the results
project.list_runs(name="").show()
- Parameters:
name -- Name of the run to retrieve.
uid -- Unique ID of the run.
labels -- A list of labels to filter by. Label filters work by either filtering a specific value of a label (i.e. list("key=value")) or by looking for the existence of a given key (i.e. "key").
state -- Deprecated - List only runs whose state is specified.
states -- List only runs whose state is one of the provided states.
sort -- Whether to sort the result according to their start time. Otherwise, results will be returned by their internal order in the DB (order will not be guaranteed).
last -- Deprecated - currently not used (will be removed in 1.9.0).
iter -- If True, return runs from all iterations. Otherwise, return only runs whose iter is 0.
start_time_from -- Filter by run start time in [start_time_from, start_time_to].
start_time_to -- Filter by run start time in [start_time_from, start_time_to].
last_update_time_from -- Filter by run last update time in (last_update_time_from, last_update_time_to).
last_update_time_to -- Filter by run last update time in (last_update_time_from, last_update_time_to).
- log_artifact(item, body=None, tag: str = '', local_path: str = '', artifact_path: str | None = None, format: str | None = None, upload: bool | None = None, labels: dict[str, str] | None = None, target_path: str | None = None, **kwargs) Artifact [source]#
Log an output artifact and optionally upload it to datastore
If the artifact already exists with the same key and tag, it will be overwritten.
example:
project.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
- Parameters:
item -- artifact key or artifact object (can be any type, such as dataset, model, feature store)
body -- will use the body as the artifact content
local_path -- path to the local file we upload; will also be used as the destination subpath (under "artifact_path")
artifact_path -- target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
format -- artifact file format: csv, png, ..
tag -- version tag
target_path -- absolute target path (instead of using artifact_path + local_path)
upload -- Whether to upload the artifact to the datastore. If not provided, and the local_path is not a directory, upload occurs by default. Directories are uploaded only when this flag is explicitly set to True.
labels -- a set of key/value labels to tag the artifact with
- Returns:
artifact object
- log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=None, labels=None, format='', preview=None, stats=None, target_path='', extra_data=None, label_column: str | None = None, **kwargs) DatasetArtifact [source]#
Log a dataset artifact and optionally upload it to datastore.
If the dataset already exists with the same key and tag, it will be overwritten.
example:
raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
project.log_dataset("mydf", df=df, stats=True)
- Parameters:
key -- artifact key
df -- dataframe object
label_column -- name of the label column (the one holding the target (y) values)
local_path -- path to a local dataframe file. The given file extension is used to save the dataframe to a file. If the file exists, it is uploaded to the datastore instead of the given df.
artifact_path -- target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag -- version tag
format -- optional, format to use (csv, parquet, pq, tsdb, kv)
target_path -- absolute target path (instead of using artifact_path + local_path)
preview -- number of lines to store as preview in the artifact metadata
stats -- calculate and store dataset stats in the artifact metadata
extra_data -- key/value list of extra files/charts to link with this dataset
upload -- upload to datastore (default is True)
labels -- a set of key/value labels to tag the artifact with
- Returns:
artifact object
- log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=None, labels=None, inputs: list[mlrun.features.Feature] | None = None, outputs: list[mlrun.features.Feature] | None = None, feature_vector: str | None = None, feature_weights: list | None = None, training_set=None, label_column=None, extra_data=None, **kwargs) ModelArtifact [source]#
Log a model artifact and optionally upload it to datastore
If the model already exists with the same key and tag, it will be overwritten.
example:
project.log_model(
    "model",
    body=dumps(model),
    model_file="model.pkl",
    metrics=context.results,
    training_set=training_df,
    label_column="label",
    feature_vector=feature_vector_uri,
    labels={"app": "fraud"},
)
- Parameters:
key -- artifact key or artifact object
body -- will use the body as the artifact content
model_file -- path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)
model_dir -- path to the local dir holding the model file and extra files
artifact_path -- target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
framework -- name of the ML framework
algorithm -- training algorithm name
tag -- version tag
metrics -- key/value dict of model metrics
parameters -- key/value dict of model parameters
inputs -- ordered list of model input features (name, type, ..)
outputs -- ordered list of model output/result elements (name, type, ..)
upload -- upload to datastore (if not specified, defaults to True (uploads artifact))
labels -- a set of key/value labels to tag the artifact with
feature_vector -- feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])
feature_weights -- list of feature weights, one per input column
training_set -- training set dataframe, used to infer inputs & outputs
label_column -- which columns in the training set are the label (target) columns
extra_data -- key/value list of extra files/charts to link with this dataset value can be absolute path | relative path (to model dir) | bytes | artifact object
- Returns:
artifact object
- property metadata: ProjectMetadata#
- property mountdir: str#
- property name: str#
Project name, this is a property of the project metadata
- property notifiers#
- property params: dict#
- pull(branch: str | None = None, remote: str | None = None, secrets: SecretsStore | dict | None = None)[source]#
Pull/update sources from git or tar into the context dir (see the example below).
- Parameters:
branch -- git branch, if not the current one
remote -- git remote, if other than origin
secrets -- dict or SecretsStore with Git credentials e.g. secrets={"GIT_TOKEN": token}
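Example (a minimal sketch, assuming the project source is a private git repo; the env var name is a placeholder):
import os

project.pull(branch="main", secrets={"GIT_TOKEN": os.environ["GIT_TOKEN"]})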
- push(branch, message=None, update=True, remote: str | None = None, add: list | None = None, author_name: str | None = None, author_email: str | None = None, secrets: SecretsStore | dict | None = None)[source]#
Update the spec and push updates to the remote git repo (see the example below).
- Parameters:
branch -- target git branch
message -- git commit message
update -- update files (git add update=True)
remote -- git remote, default to origin
add -- list of files to add
author_name -- author's git user name to be used on this commit
author_email -- author's git user email to be used on this commit
secrets -- dict or SecretsStore with Git credentials e.g. secrets={"GIT_TOKEN": token}
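Example (a hedged sketch; the branch, message, file list, and author details are illustrative):
project.push(
    branch="main",
    message="update project spec and workflow",
    add=["project.yaml", "src/workflow.py"],
    author_name="jane",
    author_email="jane@example.com",
)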
- register_datastore_profile(profile: DatastoreProfile)[source]#
- reload(sync=False, context=None) MlrunProject [source]#
reload the project and function objects from the project yaml/specs
- Parameters:
sync -- set to True to load functions objects
context -- context directory (where the yaml and code exist)
- Returns:
project object
- remove_custom_packager(packager: str)[source]#
Remove a custom packager from the custom packagers list.
- Parameters:
packager -- The packager module path to remove.
- Raises:
MLRunInvalidArgumentError -- In case the packager was not in the list.
- remove_function(name)[source]#
remove the specified function from the project
- Parameters:
name -- name of the function (under the project)
- remove_model_monitoring_function(name: str | list[str])[source]#
Delete the specified model monitoring application function(s).
- Parameters:
name -- name of the model-monitoring-function/s (under the project)
- remove_remote(name)[source]#
Remove a remote from the project's Git repository.
This method removes the remote repository associated with the specified name from the project's Git repository.
- Parameters:
name -- Name of the remote to remove.
- reset_alert_config(alert_data: AlertConfig | None = None, alert_name: str | None = None)[source]#
Reset an alert.
- Parameters:
alert_data -- The data of the alert.
alert_name -- The name of the alert to reset.
- run(name: str | None = None, workflow_path: str | None = None, arguments: dict[str, Any] | None = None, artifact_path: str | None = None, workflow_handler: str | Callable | None = None, namespace: str | None = None, sync: bool = False, watch: bool = False, dirty: bool = False, engine: str | None = None, local: bool | None = None, schedule: str | ScheduleCronTrigger | bool | None = None, timeout: int | None = None, source: str | None = None, cleanup_ttl: int | None = None, notifications: list[mlrun.model.Notification] | None = None, workflow_runner_node_selector: dict[str, str] | None = None) _PipelineRunStatus [source]#
Run a workflow using Kubeflow Pipelines (see the example below).
- Parameters:
name -- Name of the workflow
workflow_path -- URL to a workflow file, if not a project workflow
arguments -- Kubeflow pipelines arguments (parameters)
artifact_path -- Target path/URL for workflow artifacts, the string '{{workflow.uid}}' will be replaced by workflow id.
workflow_handler -- Workflow function handler (for running workflow function directly)
namespace -- Kubernetes namespace if other than default
sync -- Force functions sync before run
watch -- Wait for pipeline completion
dirty -- Allow running the workflow when the git repo is dirty
engine -- Workflow engine running the workflow. Supported values are 'kfp' (default), 'local' or 'remote'. For setting engine for remote running use 'remote:local' or 'remote:kfp'.
local -- Run local pipeline with local functions (set local=True in function.run())
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron For using the pre-defined workflow's schedule, set schedule=True
timeout -- Timeout in seconds to wait for pipeline completion (watch will be activated)
source -- Source to use instead of the actual project.spec.source (used when the engine is remote). Can be one of: a remote URL which is loaded dynamically to the workflow runner, or a path to the project's context on the workflow runner's image (absolute or relative to project.spec.build.source_code_target_dir if defined; enriched when building a project image with source, see MlrunProject.build_image). For other engines the source is used to validate that the code is up-to-date.
cleanup_ttl -- Pipeline cleanup ttl in secs (time to wait after workflow completion, at which point the workflow and all its resources are deleted)
notifications -- List of notifications to send for workflow completion
workflow_runner_node_selector -- Defines the node selector for the workflow runner pod when using a remote engine. This allows you to control and specify where the workflow runner pod will be scheduled. This setting is only relevant when the engine is set to 'remote' or for scheduled workflows, and it will be ignored if the workflow is not run on a remote engine.
- Returns:
mlrun.projects.pipelines._PipelineRunStatus instance
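Example (a minimal sketch; the workflow name "main" and its arguments are illustrative):
run_status = project.run(
    "main",
    arguments={"model_name": "my-model"},
    engine="remote",
    watch=True,
)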
- run_function(function: str | BaseRuntime, handler: str | None = None, name: str = '', params: dict | None = None, hyperparams: dict | None = None, hyper_param_options: HyperParamOptions | None = None, inputs: dict | None = None, outputs: list[str] | None = None, workdir: str = '', labels: dict | None = None, base_task: RunTemplate | None = None, watch: bool = True, local: bool | None = None, verbose: bool | None = None, selector: str | None = None, auto_build: bool | None = None, schedule: str | ScheduleCronTrigger | None = None, artifact_path: str | None = None, notifications: list[mlrun.model.Notification] | None = None, returns: list[Union[str, dict[str, str]]] | None = None, builder_env: dict | None = None, reset_on_run: bool | None = None) RunObject | ContainerOp [source]#
Run a local or remote task as part of a local/kubeflow pipeline
example (use with project):
# create a project with two functions (local and from hub)
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://auto-trainer", "train")
# run functions (refer to them by name)
run1 = project.run_function("myfunc", params={"x": 7})
run2 = project.run_function(
    "train",
    params={"label_columns": LABELS},
    inputs={"dataset": run1.outputs["data"]},
)
- Parameters:
function -- name of the function (in the project) or function object
handler -- name of the function handler
name -- execution name
params -- input parameters (dict)
hyperparams -- hyper parameters
selector -- selection criteria for hyper params e.g. "max.accuracy"
hyper_param_options -- hyper param options (selector, early stop, strategy, ..) see:
HyperParamOptions
inputs -- Input objects to pass to the handler. Type hints can be given so the input will be parsed during runtime from mlrun.DataItem to the given type hint. The type hint can be given in the key field of the dictionary after a colon, e.g: "<key> : <type_hint>".
outputs -- list of outputs which can pass in the workflow
workdir -- default input artifacts path
labels -- labels to tag the job/run with ({key:val, ..})
base_task -- task object to use as base
watch -- watch/follow run log, True by default
local -- run the function locally vs on the runtime/cluster
verbose -- add verbose prints/logs
auto_build -- when set to True and the function requires a build, it will be built on the first function run. Use only if you don't plan on changing the build config between runs.
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron
artifact_path -- path to store artifacts, when running in a workflow this will be set automatically
notifications -- list of notifications to push when the run is completed
returns --
List of log hints - configurations for how to log the returning values from the handler's run (as artifacts or results). The list's length must be equal to the amount of returning objects. A log hint may be given as:
A string of the key to use to log the returning value as a result or as an artifact. To specify the artifact type, it is possible to pass a string in the following structure: "<key> : <type>". Available artifact types can be seen in mlrun.ArtifactType. If no artifact type is specified, the object's default artifact type will be used.
A dictionary of configurations to use when logging. Further info per object type and artifact type can be given there. The artifact key must appear in the dictionary as "key": "the_key".
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
reset_on_run -- When True, the function's python modules are reloaded prior to code execution. This ensures the latest code changes are executed. This argument must be used in conjunction with local=True.
- Returns:
MLRun RunObject or PipelineNodeWrapper
- save(filepath=None, store=True)[source]#
export project to yaml file and save project in database
- Parameters:
store -- if True, allow updating in case the project already exists
- save_to_db(store=True)[source]#
save project to database
- Parameters:
store -- if True, allow updating in case the project already exists
- save_workflow(name, target, artifact_path=None, ttl=None)[source]#
Create and save a workflow as a yaml or archive file (see the example below).
- Parameters:
name -- workflow name
target -- target file path (can end with .yaml or .zip)
artifact_path -- target path/url for workflow artifacts, the string '{{workflow.uid}}' will be replaced by workflow id
ttl -- pipeline ttl (time to live) in secs (after that the pods will be removed)
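Example (a minimal sketch; the workflow name and target path are illustrative):
project.save_workflow("main", "./artifacts/main-workflow.yaml")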
- set_artifact(key, artifact: str | dict | Artifact | None = None, target_path: str | None = None, tag: str | None = None)[source]#
add/set an artifact in the project spec (will be registered on load)
example:
# register a simple file artifact
project.set_artifact("data", target_path=data_url)
# register a model artifact
project.set_artifact(
    "model", ModelArtifact(model_file="model.pkl"), target_path=model_dir_url
)
# register a path to artifact package (will be imported on project load)
# to generate such package use `artifact.export(target_path)`
project.set_artifact("model", "https://mystuff.com/models/mymodel.zip")
- Parameters:
key -- artifact key/name
artifact -- mlrun Artifact object/dict (or its subclasses) or path to artifact file to import (yaml/json/zip), relative paths are relative to the context path
target_path -- absolute target path url (point to the artifact content location)
tag -- artifact tag
- set_default_image(default_image: str)[source]#
Set the default image to be used for running runtimes (functions) in this project. This image will be used if an image was not provided for a runtime. In case the default image is replaced, functions already registered with the project that used the previous default image will have their image replaced on next execution.
- Parameters:
default_image -- Default image to use
- set_function(func: str | BaseRuntime | None = None, name: str = '', kind: str = 'job', image: str | None = None, handler: str | None = None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '') BaseRuntime [source]#
Update or add a function object to the project.
A function can be provided as an object (func) or as a .py/.ipynb/.yaml URL.
Creating a function from a single file is done by specifying func and disabling with_repo.
Creating a function with the project source (specify with_repo=True):
1. Specify a relative func path.
2. Specify a module handler (e.g. handler=package.package.func) without func.
Creating a function with a non-project source is done by specifying a module handler and setting the source on the returned function with function.with_source_archive(<source>).
Supported URL prefixes:
Object storage (s3://, v3io://, ..)
MLRun DB, e.g. db://project/func:ver
Functions hub/market, e.g. hub://auto-trainer:master
Examples:
proj.set_function(func_object)
proj.set_function("http://.../mynb.ipynb", "train")
proj.set_function("./func.yaml")
proj.set_function("hub://get_toy_data", "getdata")

# Create a function from a single file
proj.set_function("./src/mycode.py", "ingest")

# Creating a function with project source
proj.set_function(
    "./src/mycode.py", "ingest", image="myrepo/ing:latest", with_repo=True
)
proj.set_function("ingest", handler="package.package.func", with_repo=True)

# Creating a function with non project source
func = proj.set_function("ingest", handler="package.package.func", with_repo=False)
func.with_source_archive("git://github.com/mlrun/something.git")

# Set function requirements
# By providing a list of packages
proj.set_function("my.py", requirements=["requests", "pandas"])
# By providing a path to a pip requirements file
proj.set_function("my.py", requirements="requirements.txt")
- Parameters:
func -- Function object or spec/code url, None refers to current Notebook
name -- Name of the function (under the project), can be specified with a tag to support Versions (e.g. myfunc:v1). If the tag parameter is provided, the tag in the name must match the tag parameter. Specifying a tag in the name will update the project's tagged function (myfunc:v1)
kind -- Runtime kind e.g. job, nuclio, spark, dask, mpijob Default: job
image -- Docker image to be used, can also be specified in the function object/yaml
handler -- Default function handler to invoke (can only be set with .py/.ipynb files)
with_repo -- Add (clone) the current repo to the build source - use when the function code is in the project repo (project.spec.source).
tag -- Function version tag to set (none for current or 'latest') Specifying a tag as a parameter will update the project's tagged function (myfunc:v1) and the untagged function (myfunc)
requirements -- A list of python packages
requirements_file -- Path to a python requirements file
- Returns:
function object
- set_model_monitoring_credentials(access_key: str | None = None, endpoint_store_connection: str | None = None, stream_path: str | None = None, tsdb_connection: str | None = None, replace_creds: bool = False)[source]#
Set the credentials that will be used by the project's model monitoring infrastructure functions (see the example below). Note that you have to set the credentials before deploying any model monitoring or serving function.
- Parameters:
access_key -- Model monitoring access key for managing user permissions.
endpoint_store_connection --
Endpoint store connection string. By default, None. Options:
None - will be set from the system configuration.
v3io - for v3io endpoint store, pass v3io and the system will generate the exact path.
MySQL/SQLite - for SQL endpoint store, provide the full connection string, for example: mysql+pymysql://<username>:<password>@<host>:<port>/<db_name>
stream_path --
Path to the model monitoring stream. By default, None. Options:
None - will be set from the system configuration.
v3io - for v3io stream, pass v3io and the system will generate the exact path.
Kafka - for Kafka stream, provide the full connection string without custom topic, for example kafka://<some_kafka_broker>:<port>.
tsdb_connection --
Connection string to the time series database. By default, None. Options:
None - will be set from the system configuration.
v3io - for v3io stream, pass v3io and the system will generate the exact path.
TDEngine - for TDEngine tsdb, provide the full websocket connection URL, for example taosws://<username>:<password>@<host>:<port>.
replace_creds -- If True, override the existing credentials. Keep in mind that if you have already enabled model monitoring on your project, this action can cause data loss and will require redeploying all model monitoring functions, the model monitoring infrastructure, and tracked model servers.
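Example (a hedged sketch using the v3io shortcuts described above; consult your deployment for the real connection strings):
project.set_model_monitoring_credentials(
    endpoint_store_connection="v3io",
    stream_path="v3io",
    tsdb_connection="v3io",
)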
- set_model_monitoring_function(func: str | BaseRuntime | None = None, application_class: str | ModelMonitoringApplicationBase | None = None, name: str | None = None, image: str | None = None, handler=None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '', **application_kwargs) BaseRuntime [source]#
Update or add a monitoring function to the project. Note: to deploy the function after linking it to the project, call fn.deploy() where fn is the object returned by this method.
examples:
project.set_model_monitoring_function(
    name="myApp", application_class="MyApp", image="mlrun/mlrun"
)
- Parameters:
func -- Function object or spec/code url, None refers to current Notebook
name -- Name of the function (under the project), can be specified with a tag to support versions (e.g. myfunc:v1) Default: job
image -- Docker image to be used, can also be specified in the function object/yaml
handler -- Default function handler to invoke (can only be set with .py/.ipynb files)
with_repo -- Add (clone) the current repo to the build source
tag -- Function version tag (none for 'latest'; can only be set with .py/.ipynb files). If a tag is specified and name is empty, the function key (under the project) will be enriched with the tag value (i.e. 'function-name:tag').
requirements -- A list of python packages
requirements_file -- Path to a python requirements file
application_class -- Name or an Instance of a class that implements the monitoring application.
application_kwargs -- Additional keyword arguments to be passed to the monitoring application's constructor.
- set_remote(url, name='origin', branch=None, overwrite=True)[source]#
Create or update a remote for the project git repository (see the example below).
This method allows you to manage remote repositories associated with the project. It checks if a remote with the specified name already exists.
If a remote with the same name does not exist, it will be created. If a remote with the same name already exists, the behavior depends on the value of the 'overwrite' flag.
- Parameters:
url -- remote git url
name -- name for the remote (default is 'origin')
branch -- Git branch to use as source
overwrite -- if True (default), updates the existing remote with the given URL if it already exists. if False, raises an error when attempting to create a remote with a name that already exists.
- Raises:
MLRunConflictError -- If a remote with the same name already exists and overwrite is set to False.
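Example (a minimal sketch; the repository URL is illustrative):
project.set_remote("https://github.com/org/my-project.git", name="origin", overwrite=True)
project.push("main", message="initial project commit")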
- set_secrets(secrets: dict | None = None, file_path: str | None = None, provider: str | SecretProviderName | None = None)[source]#
Set project secrets from a dict or a secrets env file. When using a secrets file, it should have lines in the form KEY=VALUE, and comment lines start with "#". V3IO paths/credentials and the MLRun service API address are dropped from the secrets.
example secrets file:
# this is an env file
AWS_ACCESS_KEY_ID=XXXX
AWS_SECRET_ACCESS_KEY=YYYY
usage:
# read env vars from dict or file and set as project secrets
project.set_secrets({"SECRET1": "value"})
project.set_secrets(file_path="secrets.env")
- Parameters:
secrets -- dict with secrets key/value
file_path -- path to secrets file
provider -- MLRun secrets provider
- set_source(source: str = '', pull_at_runtime: bool = False, workdir: str | None = None)[source]#
Set the project source code path (can be a git/tar/zip archive); see the example below.
- Parameters:
source -- valid absolute path or URL to a git, zip, or tar file (or None for the current source), e.g. git://github.com/mlrun/something.git or http://some/url/file.zip. Note that a path source must exist on the image, or exist locally when the run is local (it is recommended to use 'workdir' instead when the source is a filepath).
pull_at_runtime -- load the archive into the container at job runtime vs on build/deploy
workdir -- workdir path relative to the context dir or absolute
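Example (a hedged sketch; the URL and workdir are illustrative):
project.set_source(
    "git://github.com/org/repo.git#main",
    pull_at_runtime=True,
    workdir="./src",
)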
- set_workflow(name, workflow_path: str, embed: bool = False, engine: str | None = None, args_schema: list[mlrun.model.EntrypointParam] | None = None, handler: str | None = None, schedule: str | ScheduleCronTrigger | None = None, ttl: int | None = None, image: str | None = None, **args)[source]#
Add or update a workflow; specify a name and the code path (see the example below).
- Parameters:
name -- Name of the workflow
workflow_path -- URL (remote) / Path (absolute or relative to the project code path i.e. <project.spec.get_code_path()>/<workflow_path>) for the workflow file.
embed -- Add the workflow code into the project.yaml
engine -- Workflow processing engine ("kfp", "local", "remote" or "remote:local")
args_schema -- List of arg schema definitions (:py:class`~mlrun.model.EntrypointParam`)
handler -- Workflow function handler
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron Note that "local" engine does not support this argument
ttl -- Pipeline ttl in secs (after that the pods will be removed)
image -- Image for workflow runner job, only for scheduled and remote workflows
args -- Argument values (key=value, ..)
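Example (a minimal sketch; the workflow name, file path, handler, and arguments are illustrative):
project.set_workflow("main", "./src/workflow.py", engine="kfp", handler="pipeline")
project.run("main", arguments={"dataset": "iris"})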
- setup(save: bool = True) MlrunProject [source]#
Run the project setup file if found
When loading a project, MLRun looks for a project_setup.py file. If found, it executes the setup(project) handler, which can enrich the project with additional objects, functions, artifacts, etc.
- Parameters:
save -- save the project after the setup
- property source: str#
- property spec: ProjectSpec#
- property status: ProjectStatus#
- store_alert_config(alert_data: AlertConfig, alert_name: str | None = None) AlertConfig [source]#
Create/modify an alert.
- Parameters:
alert_data -- The data of the alert.
alert_name -- The name of the alert.
- Returns:
the created/modified alert.
- store_api_gateway(api_gateway: APIGateway, wait_for_readiness=True, max_wait_time=90) APIGateway [source]#
Creates or updates a Nuclio API Gateway using the provided APIGateway object.
This method interacts with the MLRun service to create/update a Nuclio API Gateway based on the provided APIGateway object. Once done, it returns the updated APIGateway object containing all fields propagated on MLRun and Nuclio sides, such as the 'host' attribute. Nuclio docs here: https://docs.nuclio.io/en/latest/reference/api-gateway/http.html
- Parameters:
api_gateway -- An instance of APIGateway representing the configuration of the API Gateway to be created or updated.
wait_for_readiness -- (Optional) A boolean indicating whether to wait for the API Gateway to become ready after creation or update (default is True).
max_wait_time -- (Optional) Maximum time to wait for API Gateway readiness in seconds (default is 90s)
- Returns:
An instance of APIGateway with all fields populated based on the information retrieved from the Nuclio API.
- sync_functions(names: list | None = None, always: bool = True, save: bool = False, silent: bool = False)[source]#
Reload function objects from specs and files. The function objects are synced against the definitions spec in self.spec._function_definitions. Referenced files/URLs in the function spec will be reloaded. Function definitions are parsed by the following precedence:
Contains runtime spec.
Contains module in the project's context.
Contains path to function definition (yaml, DB, Hub).
Contains path to .ipynb or .py files.
Contains a Nuclio/Serving function image / an 'Application' kind definition.
If a function definition is already an object, some project metadata updates will apply; however, it will not be reloaded.
- Parameters:
names -- Names of functions to reload, defaults to self.spec._function_definitions.keys().
always -- Force reloading the functions.
save -- Whether to save the loaded functions or not.
silent -- Whether to raise an exception when a function fails to load.
- Returns:
Dictionary of function objects
- update_model_monitoring_controller(base_period: int = 10, image: str = 'mlrun/mlrun', *, wait_for_deployment: bool = False) None [source]#
Redeploy model monitoring application controller functions.
- Parameters:
base_period -- The time period in minutes in which the model monitoring controller function is triggered. By default, the base period is 10 minutes.
image -- The image of the model monitoring controller, writer & monitoring stream functions, which are real time nuclio functions. By default, the image is mlrun/mlrun.
wait_for_deployment -- If true, return only after the deployment is done on the backend. Otherwise, deploy the controller in the background.
- with_secrets(kind, source, prefix='')[source]#
register a secrets source (file, env or dict)
read secrets from a source provider to be used in workflows, example:
proj.with_secrets("file", "file.txt") proj.with_secrets("inline", {"key": "val"}) proj.with_secrets("env", "ENV1,ENV2", prefix="PFX_")
Vault secret source has several options:
proj.with_secrets('vault', {'user': <user name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', {'project': <proj.name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', ['secret1', 'secret2' ...])
The 2nd option uses the current project name as context. Can also use empty secret list:
proj.with_secrets("vault", [])
This will enable access to all secrets in vault registered to the current project.
- Parameters:
kind -- secret type (file, inline, env, vault)
source -- secret data or link (see example)
prefix -- add a prefix to the keys in this source
- Returns:
project object
- property workflows: list#
- class mlrun.projects.ProjectMetadata(name=None, created=None, labels=None, annotations=None)[source]#
Bases:
ModelObj
- property name: str#
Project name
- static validate_project_labels(labels: dict, raise_on_failure: bool = True) bool [source]#
Validate that the project labels conform to the Kubernetes label syntax and character set: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
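A minimal usage sketch (the label keys and values are illustrative):
from mlrun.projects import ProjectMetadata

# returns True only if every label key/value satisfies the Kubernetes label rules
ok = ProjectMetadata.validate_project_labels(
    {"team": "ml", "env": "dev"}, raise_on_failure=False
)
print(ok)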
- class mlrun.projects.ProjectSpec(description=None, params=None, functions=None, workflows=None, artifacts=None, artifact_path=None, conda=None, source=None, subpath=None, origin_url=None, goals=None, load_source_on_run=None, default_requirements: str | list[str] | None = None, desired_state='online', owner=None, disable_auto_mount=None, workdir=None, default_image=None, build=None, custom_packagers: list[tuple[str, bool]] | None = None, default_function_node_selector=None)[source]#
Bases:
ModelObj
- add_custom_packager(packager: str, is_mandatory: bool)[source]#
Add a custom packager to the custom packagers list.
- Parameters:
packager -- The packager module path to add. For example, if a packager MyPackager is in the project's source at my_module.py, then the module path is: "my_module.MyPackager".
is_mandatory -- Whether this packager must be collected during a run. If False, failing to collect it won't raise an error during the packagers collection phase.
- property artifacts: list#
list of artifacts used in this project
- property build: ImageBuilder#
- property default_function_node_selector#
- property functions: list#
list of function object/specs used in this project
- property mountdir: str#
specify a directory for mounting the context dir inside the function container; use '.' to use the same path as in the client (e.g. Jupyter)
- remove_custom_packager(packager: str)[source]#
Remove a custom packager from the custom packagers list.
- Parameters:
packager -- The packager module path to remove.
- Raises:
MLRunInvalidArgumentError -- In case the packager was not in the list.
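A minimal usage sketch for both methods (the module path "my_module.MyPackager" and the project name are illustrative):
import mlrun

project = mlrun.get_or_create_project("myproj", "./")
# register the packager as optional, so a collection failure does not raise
project.spec.add_custom_packager("my_module.MyPackager", is_mandatory=False)
project.save()

# later, remove it again; raises MLRunInvalidArgumentError if it is not in the list
project.spec.remove_custom_packager("my_module.MyPackager")
project.save()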
- property source: str#
source url or git repo
- property workflows: list[dict]#
list of workflow spec dicts used in this project
- mlrun.projects.build_function(function: str | BaseRuntime, with_mlrun: bool | None = None, skip_deployed: bool = False, image=None, base_image=None, commands: list | None = None, secret_name=None, requirements: str | list[str] | None = None, requirements_file: str | None = None, mlrun_version_specifier=None, builder_env: dict | None = None, project_object=None, overwrite_build_params: bool = False, extra_args: str | None = None, force_build: bool = False) BuildStatus | ContainerOp [source]#
deploy ML function, build container with its dependencies
- Parameters:
function -- Name of the function (in the project) or function object
with_mlrun -- Add the current mlrun package to the container build
skip_deployed -- Skip the build if we already have an image for the function
image -- Target image name/path
base_image -- Base image name/path (commands and source code will be added to it)
commands -- List of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- K8s secret for accessing the docker registry
requirements -- List of python packages, defaults to None
requirements_file -- pip requirements file path, defaults to None
mlrun_version_specifier -- which mlrun package version to include (if not current)
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
project_object -- Override the project object to use, will default to the project set in the runtime context.
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands) * False: The new params are merged with the existing * True: The existing params are replaced by the new ones
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
force_build -- Force building the image, even when no changes were made
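A minimal usage sketch (the function name "trainer" and the requirement are illustrative, and the function is assumed to already be set on the project):
import mlrun
from mlrun.projects import build_function

project = mlrun.get_or_create_project("myproj", "./")
# build an image for the "trainer" function, adding mlrun and pandas on top of its base image
build_function(
    "trainer",
    with_mlrun=True,
    requirements=["pandas"],
    project_object=project,
)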
- mlrun.projects.deploy_function(function: str | BaseRuntime, models: list | None = None, env: dict | None = None, tag: str | None = None, verbose: bool | None = None, builder_env: dict | None = None, project_object=None, mock: bool | None = None) DeployStatus | ContainerOp [source]#
deploy real-time (nuclio based) functions
- Parameters:
function -- name of the function (in the project) or function object
models -- list of model items
env -- dict of extra environment variables
tag -- extra version tag
verbose -- add verbose prints/logs
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
mock -- deploy mock server vs a real Nuclio function (for local simulations)
project_object -- override the project object to use, will default to the project set in the runtime context.
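A minimal usage sketch (the function name "serving" and the env values are illustrative, and the function is assumed to already be set on the project):
import mlrun
from mlrun.projects import deploy_function

project = mlrun.get_or_create_project("myproj", "./")
# deploy the real-time "serving" function with an extra environment variable and a version tag
deploy_function(
    "serving",
    env={"LOG_LEVEL": "debug"},
    tag="v1",
    project_object=project,
)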
- mlrun.projects.get_or_create_project(name: str, context: str = './', url: str | None = None, secrets: dict | None = None, init_git=False, subpath: str | None = None, clone: bool = False, user_project: bool = False, from_template: str | None = None, save: bool = True, parameters: dict | None = None, allow_cross_project: bool | None = None) MlrunProject [source]#
Load a project from MLRun DB, or create/import if it does not exist
MLRun looks for a project.yaml file with the project definition and objects in the project root path and uses it to initialize the project. In addition, it runs the project_setup.py file (if it exists) for further customization.
Usage example:
# load project from the DB (if it exists) or the source repo
project = get_or_create_project(
    "myproj", "./", "git://github.com/mlrun/demo-xgb-project.git"
)
project.pull("development")  # pull the latest code from git
project.run("main", arguments={"data": data_url})  # run the workflow "main"
project_setup.py example:
def setup(project):
    train_function = project.set_function(
        "src/trainer.py",
        name="mpi-training",
        kind="mpijob",
        image="mlrun/mlrun",
    )
    # Set the number of replicas for the training from the project parameter
    train_function.spec.replicas = project.spec.params.get("num_replicas", 1)
    return project
- Parameters:
name -- project name
context -- project local directory path (default value = "./")
url -- name (in DB) or git or tar.gz or .zip sources archive path e.g.: git://github.com/mlrun/demo-xgb-project.git http://mysite/archived-project.zip
secrets -- key:secret dict or SecretsStore used to download sources
init_git -- if True, will execute git init on the context dir
subpath -- project subpath (within the archive/context)
clone -- if True, always clone (delete any existing content)
user_project -- add the current username to the project name (for db:// prefixes)
from_template -- path to a project YAML file that will be used as a template (for new projects)
save -- whether to save the created project in the DB
parameters -- key/value pairs to add to the project.spec.params
allow_cross_project -- if True, override the loaded project name. This flag ensures awareness of loading an existing project yaml as a baseline for a new project with a different name
- Returns:
project object
- mlrun.projects.load_project(context: str = './', url: str | None = None, name: str | None = None, secrets: dict | None = None, init_git: bool = False, subpath: str | None = None, clone: bool = False, user_project: bool = False, save: bool = True, sync_functions: bool = False, parameters: dict | None = None, allow_cross_project: bool | None = None) MlrunProject [source]#
Load an MLRun project from git or tar or dir
MLRun looks for a project.yaml file with the project definition and objects in the project root path and uses it to initialize the project. In addition, it runs the project_setup.py file (if it exists) for further customization.
Usage example:
# Load the project and run the 'main' workflow.
# When using git as the url source the context directory must be an empty or
# non-existent folder as the git repo will be cloned there
project = load_project("./demo_proj", "git://github.com/mlrun/project-demo.git")
project.run("main", arguments={"data": data_url})
project_setup.py example:
def setup(project):
    train_function = project.set_function(
        "src/trainer.py",
        name="mpi-training",
        kind="mpijob",
        image="mlrun/mlrun",
    )
    # Set the number of replicas for the training from the project parameter
    train_function.spec.replicas = project.spec.params.get("num_replicas", 1)
    return project
- Parameters:
context -- project local directory path (default value = "./")
url -- name (in DB) or git or tar.gz or .zip sources archive path e.g.: git://github.com/mlrun/demo-xgb-project.git http://mysite/archived-project.zip <project-name> The git project should include the project yaml file. If the project yaml file is in a sub-directory, you must specify the sub-directory.
name -- project name
secrets -- key:secret dict or SecretsStore used to download sources
init_git -- if True, will git init the context dir
subpath -- project subpath (within the archive)
clone -- if True, always clone (delete any existing content)
user_project -- add the current username to the project name (for db:// prefixes)
save -- whether to save the created project and artifact in the DB
sync_functions -- sync the project's functions into the project object (will be saved to the DB if save=True)
parameters -- key/value pairs to add to the project.spec.params
allow_cross_project -- if True, override the loaded project name. This flag ensures awareness of loading an existing project yaml as a baseline for a new project with a different name
- Returns:
project object
- mlrun.projects.new_project(name, context: str = './', init_git: bool = False, user_project: bool = False, remote: str | None = None, from_template: str | None = None, secrets: dict | None = None, description: str | None = None, subpath: str | None = None, save: bool = True, overwrite: bool = False, parameters: dict | None = None, default_function_node_selector: dict | None = None) MlrunProject [source]#
Create a new MLRun project, optionally load it from a yaml/zip/git template
A new project is created and returned. You can customize the project by placing a project_setup.py file in the project root dir; it will be executed upon project creation or loading.
example:
# create a project with local and hub functions, a workflow, and an artifact
project = mlrun.new_project(
    "myproj", "./", init_git=True, description="my new project"
)
project.set_function(
    "prep_data.py", "prep-data", image="mlrun/mlrun", handler="prep_data"
)
project.set_function("hub://auto-trainer", "train")
project.set_artifact("data", Artifact(target_path=data_url))
project.set_workflow("main", "./myflow.py")
project.save()

# run the "main" workflow (watch=True to wait for run completion)
project.run("main", watch=True)
example (load from template):
# create a new project from a zip template (can also use yaml/git templates)
# initialize a local git, and register the git remote path
project = mlrun.new_project(
    "myproj",
    "./",
    init_git=True,
    remote="git://github.com/mlrun/project-demo.git",
    from_template="http://mysite/proj.zip",
)
project.run("main", watch=True)
example using project_setup.py to init the project objects:
def setup(project):
    project.set_function(
        "prep_data.py", "prep-data", image="mlrun/mlrun", handler="prep_data"
    )
    project.set_function("hub://auto-trainer", "train")
    project.set_artifact("data", Artifact(target_path=data_url))
    project.set_workflow("main", "./myflow.py")
    return project
- Parameters:
name -- project name
context -- project local directory path (default value = "./")
init_git -- if True, will git init the context dir
user_project -- add the current username to the provided project name (making it unique per user)
remote -- remote Git url
from_template -- path to project YAML/zip file that will be used as a template
secrets -- key:secret dict or SecretsStore used to download sources
description -- text describing the project
subpath -- project subpath (relative to the context dir)
save -- whether to save the created project in the DB
overwrite -- overwrite the project using a 'cascade' deletion strategy (deletes project resources) if a project with the same name exists
parameters -- key/value pairs to add to the project.spec.params
default_function_node_selector -- defines the default node selector for scheduling functions within the project
- Returns:
project object
- mlrun.projects.run_function(function: str | BaseRuntime, handler: str | None = None, name: str = '', params: dict | None = None, hyperparams: dict | None = None, hyper_param_options: HyperParamOptions | None = None, inputs: dict | None = None, outputs: list[str] | None = None, workdir: str = '', labels: dict | None = None, base_task: RunTemplate | None = None, watch: bool = True, local: bool | None = None, verbose: bool | None = None, selector: str | None = None, project_object=None, auto_build: bool | None = None, schedule: str | ScheduleCronTrigger | None = None, artifact_path: str | None = None, notifications: list[mlrun.model.Notification] | None = None, returns: list[Union[str, dict[str, str]]] | None = None, builder_env: list | None = None, reset_on_run: bool | None = None) RunObject | ContainerOp [source]#
Run a local or remote task as part of a local/kubeflow pipeline
run_function() allows you to execute a function locally, on a remote cluster, or as part of an automated workflow. The function can be specified as an object or by name (str); when specified by name, it is looked up in the current project, eliminating the need to redefine/edit functions.
When functions run as part of a workflow/pipeline (project.run()), some attributes can be set at the run level, e.g. local=True will run all the functions locally, and setting artifact_path will direct all outputs to the same path. Project runs provide additional notifications/reporting and exception handling. Inside a Kubeflow pipeline (KFP), run_function() generates a KFP node (see PipelineNodeWrapper) which forms a DAG; some behavior may differ between regular runs and deferred KFP runs.
example (use with function object):
LABELS = "is_error" MODEL_CLASS = "sklearn.ensemble.RandomForestClassifier" DATA_PATH = "s3://bigdata/data.parquet" function = mlrun.import_function("hub://auto-trainer") run1 = run_function( function, params={"label_columns": LABELS, "model_class": MODEL_CLASS}, inputs={"dataset": DATA_PATH}, )
example (use with project):
# create a project with two functions (local and from hub)
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://auto-trainer", "train")

# run functions (refer to them by name)
run1 = run_function("myfunc", params={"x": 7})
run2 = run_function(
    "train",
    params={"label_columns": LABELS, "model_class": MODEL_CLASS},
    inputs={"dataset": run1.outputs["data"]},
)
example (use in pipeline):
@dsl.pipeline(name="test pipeline", description="test")
def my_pipe(url=""):
    run1 = run_function("loaddata", params={"url": url}, outputs=["data"])
    run2 = run_function(
        "train",
        params={"label_columns": LABELS, "model_class": MODEL_CLASS},
        inputs={"dataset": run1.outputs["data"]},
    )

project.run(workflow_handler=my_pipe, arguments={"param1": 7})
- Parameters:
function -- name of the function (in the project) or function object
handler -- name of the function handler
name -- execution name
params -- input parameters (dict)
hyperparams -- hyper parameters
selector -- selection criteria for hyper params e.g. "max.accuracy"
hyper_param_options -- hyper param options (selector, early stop, strategy, ..) see:
HyperParamOptions
inputs -- Input objects to pass to the handler. Type hints can be given so the input will be parsed during runtime from mlrun.DataItem to the given type hint. The type hint can be given in the key field of the dictionary after a colon, e.g: "<key> : <type_hint>".
outputs -- list of outputs that can be passed on to other steps in the workflow
workdir -- default input artifacts path
labels -- labels to tag the job/run with ({key:val, ..})
base_task -- task object to use as base
watch -- watch/follow run log, True by default
local -- run the function locally vs on the runtime/cluster
verbose -- add verbose prints/logs
project_object -- override the project object to use, will default to the project set in the runtime context.
auto_build -- when set to True and the function requires a build, it will be built on the first function run; use only if you do not plan on changing the build config between runs
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron
artifact_path -- path to store artifacts, when running in a workflow this will be set automatically
notifications -- list of notifications to push when the run is completed
returns --
List of log hints - configurations for how to log the returning values from the handler's run (as artifacts or results). The list's length must be equal to the amount of returning objects. A log hint may be given as:
A string of the key to use to log the returning value as a result or as an artifact. To specify the artifact type, it is possible to pass a string in the following structure: "<key> : <type>". Available artifact types can be seen in mlrun.ArtifactType. If no artifact type is specified, the object's default artifact type will be used.
A dictionary of configurations to use when logging. Further info per object type and artifact type can be given there. The artifact key must appear in the dictionary as "key": "the_key".
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
reset_on_run -- When True, the function's python modules are reloaded prior to code execution. This ensures the latest code changes are executed. This argument must be used in conjunction with the local=True argument.
- Returns:
MLRun RunObject or PipelineNodeWrapper