mlrun.projects#
- class mlrun.projects.MlrunProject(metadata: ProjectMetadata | dict | None = None, spec: ProjectSpec | dict | None = None)[source]#
Bases:
ModelObj
- add_custom_packager(packager: str, is_mandatory: bool)[source]#
Add a custom packager to the project's custom packagers list (see the example below). All of the project's custom packagers are added to each project function.
Notice that in order to run a function with the custom packagers included, you must set a source for the project (using the project.set_source method) with the parameter pull_at_runtime=True, so that the packagers' source code can be imported.
- Parameters:
packager -- The packager module path to add. For example, if a packager MyPackager is in the project's source at my_module.py, then the module path is: "my_module.MyPackager".
is_mandatory -- Whether this packager must be collected during a run. If False, failing to collect it won't raise an error during the packagers collection phase.
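Example (a minimal sketch; the source URL and the module path "my_module.MyPackager" are illustrative placeholders):
project.set_source("git://github.com/org/repo.git#main", pull_at_runtime=True)
project.add_custom_packager("my_module.MyPackager", is_mandatory=False)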
- property artifact_path: str#
- build_config(image: str | None = None, set_as_default: bool = False, with_mlrun: bool | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, builder_env: dict | None = None, extra_args: str | None = None, source_code_target_dir: str | None = None)[source]#
Specify the builder configuration for the project (see the example below).
- Parameters:
image -- target image name/path. If not specified, the project's existing default_image name is used; if that is also not set, the mlconf.default_project_image_name value is used
set_as_default -- set image to be the project's default image (default False)
with_mlrun -- add the current mlrun package to the container build
base_image -- base image name/path
commands -- list of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- k8s secret for accessing the docker registry
requirements -- a list of packages to install on the built image
requirements_file -- requirements file to install on the built image
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands). False: the new params are merged with the existing ones; True: the existing params are replaced by the new ones
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
source_code_target_dir -- Path on the image where source code would be extracted (by default /home/mlrun_code)
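Example (a hedged sketch; the image name and requirements are illustrative):
project.build_config(
    image=".my-project-image",
    base_image="mlrun/mlrun",
    requirements=["pandas", "scikit-learn"],
    set_as_default=True,
)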
- build_function(function: str | BaseRuntime, with_mlrun: bool | None = None, skip_deployed: bool = False, image: str | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, mlrun_version_specifier: str | None = None, builder_env: dict | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, extra_args: str | None = None, force_build: bool = False) BuildStatus | ContainerOp [source]#
Deploy an ML function: build a container image with its dependencies (see the example below).
- Parameters:
function -- name of the function (in the project) or function object
with_mlrun -- add the current mlrun package to the container build
skip_deployed -- skip the build if we already have an image for the function
image -- target image name/path
base_image -- base image name/path (commands and source code will be added to it)
commands -- list of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- k8s secret for accessing the docker registry
requirements -- list of python packages, defaults to None
requirements_file -- pip requirements file path, defaults to None
mlrun_version_specifier -- which mlrun package version to include (if not current)
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands). False: the new params are merged with the existing ones; True: the existing params are replaced by the new ones
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
force_build -- force building the image, even when no changes were made
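Example (a minimal sketch, assuming a function named "trainer" was already set on the project; the requirements are illustrative):
project.build_function(
    "trainer",
    with_mlrun=True,
    requirements=["xgboost"],
    overwrite_build_params=True,
)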
- build_image(image: str | None = None, set_as_default: bool = True, with_mlrun: bool | None = None, base_image: str | None = None, commands: list | None = None, secret_name: str | None = None, requirements: str | list[str] | None = None, mlrun_version_specifier: str | None = None, builder_env: dict | None = None, overwrite_build_params: bool = False, requirements_file: str | None = None, extra_args: str | None = None, target_dir: str | None = None) BuildStatus | ContainerOp [source]#
Build a docker image for the project, based on the project's build config (see the example below). Parameters allow overriding the build config. If the project has a source configured and pull_at_runtime is not configured, this source will be cloned into the built image. The target_dir parameter allows specifying the target path where the code will be extracted.
- Parameters:
image -- target image name/path. If not specified, the project's existing default_image name is used; if that is also not set, the mlconf.default_project_image_name value is used
set_as_default -- set image to be the project's default image (default True)
with_mlrun -- add the current mlrun package to the container build
base_image -- base image name/path (commands and source code will be added to it) defaults to mlrun.mlconf.default_base_image
commands -- list of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- k8s secret for accessing the docker registry
requirements -- list of python packages, defaults to None
requirements_file -- pip requirements file path, defaults to None
mlrun_version_specifier -- which mlrun package version to include (if not current)
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands). False: the new params are merged with the existing ones; True: the existing params are replaced by the new ones
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
target_dir -- Path on the image where source code would be extracted (by default /home/mlrun_code)
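Example (a hedged sketch; the source URL and requirements file are illustrative):
project.set_source("git://github.com/org/repo.git#main")
project.build_image(
    base_image="mlrun/mlrun",
    requirements_file="requirements.txt",
)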
- property context: str#
- create_model_monitoring_function(func: str | None = None, application_class: str | ModelMonitoringApplicationBase | None = None, name: str | None = None, image: str | None = None, handler: str | None = None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '', **application_kwargs) BaseRuntime [source]#
Create a monitoring function object without adding it to the project
examples:
project.create_model_monitoring_function(
    application_class="MyApp", image="mlrun/mlrun", name="myApp"
)
- Parameters:
func -- Code url, None refers to current Notebook
name -- Name of the function, can be specified with a tag to support versions (e.g. myfunc:v1) Default: job
image -- Docker image to be used, can also be specified in the function object/yaml
handler -- Default function handler to invoke (can only be set with .py/.ipynb files)
with_repo -- Add (clone) the current repo to the build source
tag -- Function version tag (none for 'latest'; can only be set with .py/.ipynb files). If a tag is specified and name is empty, the function key (under the project) will be enriched with the tag value (i.e. 'function-name:tag').
requirements -- A list of python packages
requirements_file -- Path to a python requirements file
application_class -- Name or an instance of a class implementing the monitoring application.
application_kwargs -- Additional keyword arguments to be passed to the monitoring application's constructor.
- create_remote(url, name='origin', branch=None)[source]#
Create remote for the project git
This method creates a new remote repository associated with the project's Git repository. If a remote with the specified name already exists, it will not be overwritten.
If you wish to update the URL of an existing remote, use the set_remote method instead.
- Parameters:
url -- remote git url
name -- name for the remote (default is 'origin')
branch -- Git branch to use as source
- property default_function_node_selector: dict#
- property default_image: str#
- delete_alert_config(alert_data: AlertConfig | None = None, alert_name: str | None = None)[source]#
Delete an alert.
- Parameters:
alert_data -- The data of the alert.
alert_name -- The name of the alert to delete.
- delete_api_gateway(name: str)[source]#
Deletes an API gateway by name.
- Parameters:
name -- The name of the API gateway to delete.
- delete_artifact(item: Artifact, deletion_strategy: ArtifactsDeletionStrategies = ArtifactsDeletionStrategies.metadata_only, secrets: dict | None = None)[source]#
Delete an artifact object in the DB and optionally delete the artifact data
- Parameters:
item -- Artifact object (can be any type, such as dataset, model, feature store).
deletion_strategy -- The artifact deletion strategy types.
secrets -- Credentials needed to access the artifact data.
- delete_model_monitoring_function(name: str | list[str])[source]#
Delete the specified model monitoring application function(s).
- Parameters:
name -- name of the model-monitoring-function/s (under the project)
- deploy_function(function: str | BaseRuntime, models: list | None = None, env: dict | None = None, tag: str | None = None, verbose: bool | None = None, builder_env: dict | None = None, mock: bool | None = None) DeployStatus | ContainerOp [source]#
Deploy real-time (Nuclio-based) functions (see the example below).
- Parameters:
function -- name of the function (in the project) or function object
models -- list of model items
env -- dict of extra environment variables
tag -- extra version tag
verbose -- add verbose prints/logs
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
mock -- deploy mock server vs a real Nuclio function (for local simulations)
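Example (a minimal sketch, assuming a serving function named "serving" is registered in the project; the model item keys and model_uri are placeholders):
project.deploy_function(
    "serving",
    models=[{"key": "my-model", "model_path": model_uri}],
    env={"LOG_LEVEL": "debug"},
)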
- deploy_histogram_data_drift_app(*, image: str = 'mlrun/mlrun', db: RunDBInterface | None = None, wait_for_deployment: bool = False) None [source]#
Deploy the histogram data drift application.
- Parameters:
image -- The image on which the application will run.
db -- An optional DB object.
wait_for_deployment -- If true, return only after the deployment is done on the backend. Otherwise, deploy the application in the background.
- property description: str#
- disable_model_monitoring(*, delete_resources: bool = True, delete_stream_function: bool = False, delete_histogram_data_drift_app: bool = True, delete_user_applications: bool = False, user_application_list: list[str] | None = None) None [source]#
Disable the model monitoring application controller, writer, stream, histogram data drift application, and the user's application functions, according to the given params.
- Parameters:
delete_resources -- If True, delete the model monitoring controller & writer functions. Default True.
delete_stream_function -- If True, delete the model monitoring stream function. Use with care: deleting this function can cause data loss if you later re-enable model monitoring for the project. Default False.
delete_histogram_data_drift_app -- If True, delete the default histogram-based data drift application. Default True.
delete_user_applications -- If True, delete the user's model monitoring applications according to user_application_list. Default False.
user_application_list -- List of the user's model monitoring applications to disable. Defaults to all applications. Note: delete_user_applications must be set to True in order to delete the listed applications.
- enable_model_monitoring(default_controller_image: str = 'mlrun/mlrun', base_period: int = 10, image: str = 'mlrun/mlrun', *, deploy_histogram_data_drift_app: bool = True, wait_for_deployment: bool = False, rebuild_images: bool = False, fetch_credentials_from_sys_config: bool = False) None [source]#
Deploy the model monitoring application controller, writer, and stream functions (see the example below). The controller function handles the monitoring processing and triggers the applications; the writer function writes all the monitoring application results to the databases. The stream function monitors the log of the data stream: it is triggered when a new log entry is detected, and processes the new events into statistics that are then written to statistics databases.
- Parameters:
default_controller_image -- Deprecated.
base_period -- The time period in minutes in which the model monitoring controller function is triggered. By default, the base period is 10 minutes (which is also the minimum value for production environments).
image -- The image of the model monitoring controller, writer, monitoring stream & histogram data drift functions, which are real time nuclio functions. By default, the image is mlrun/mlrun.
deploy_histogram_data_drift_app -- If true, deploy the default histogram-based data drift application.
wait_for_deployment -- If true, return only after the deployment is done on the backend. Otherwise, deploy the model monitoring infrastructure in the background, including the histogram data drift app if selected.
rebuild_images -- If true, force rebuild of model monitoring infrastructure images.
fetch_credentials_from_sys_config -- If true, fetch the credentials from the system configuration.
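Example (a hedged sketch, assuming model monitoring credentials were already set for the project):
project.enable_model_monitoring(
    base_period=10,
    image="mlrun/mlrun",
    deploy_histogram_data_drift_app=True,
    wait_for_deployment=True,
)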
- export(filepath=None, include_files: str | None = None)[source]#
Save the project object to a yaml file or zip archive (defaults to project.yaml); see the example below.
By default, the project object is exported to a yaml file. When the filepath suffix is '.zip', the project context dir (code files) is also copied into the zip. The archive path can include DataItem urls (for remote object storage, e.g. s3://<bucket>/<path>).
- Parameters:
filepath -- path to store project .yaml or .zip (with the project dir content)
include_files -- glob filter string for selecting files to include in the zip archive
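Example (a short sketch; the paths and the glob filter string are illustrative):
# export only the project spec
project.export("project.yaml")
# export the spec plus selected context files into a zip on remote storage
project.export("s3://my-bucket/my-project.zip", include_files="**/*.py")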
- get_alert_config(alert_name: str) AlertConfig [source]#
Retrieve an alert.
- Parameters:
alert_name -- The name of the alert to retrieve.
- Returns:
The alert object.
- get_alert_template(template_name: str) AlertTemplate [source]#
Retrieve a specific alert template.
- Parameters:
template_name -- The name of the template to retrieve.
- Returns:
The template object.
- get_api_gateway(name: str) APIGateway [source]#
Retrieve an API gateway instance by name.
- Parameters:
name -- The name of the API gateway to retrieve.
- Returns:
An instance of APIGateway.
- Return type:
mlrun.runtimes.nuclio.APIGateway
- get_artifact(key, tag=None, iter=None, tree=None)[source]#
Return an artifact object
- Parameters:
key -- artifact key
tag -- version tag
iter -- iteration number (for hyper-param tasks)
tree -- the producer id (tree)
- Returns:
Artifact object
- get_artifact_uri(key: str, category: str = 'artifact', tag: str | None = None, iter: int | None = None) str [source]#
return the project artifact uri (store://..) from the artifact key
example:
uri = project.get_artifact_uri("my_model", category="model", tag="prod", iter=0)
- Parameters:
key -- artifact key/name
category -- artifact category (artifact, model, feature-vector, ..)
tag -- artifact version tag, default to latest version
iter -- iteration number, default to no iteration
- get_custom_packagers() list[tuple[str, bool]] [source]#
Get the custom packagers registered in the project.
- Returns:
A list of the custom packagers module paths.
- get_datastore_profile(profile: str) DatastoreProfile [source]#
- get_function(key, sync=False, enrich=False, ignore_cache=False, copy_function=True, tag: str = '') BaseRuntime [source]#
Get a function object by name (see the example below).
- Parameters:
key -- name of key for search
sync -- will reload/reinit the function from the project spec
enrich -- add project info/config/source info to the function object
ignore_cache -- read the function object from the DB (ignore the local cache)
copy_function -- return a copy of the function object
tag -- provide if the function key is tagged under the project (function was set with a tag)
- Returns:
function object
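Example (a minimal sketch; the function name "trainer" is illustrative):
trainer = project.get_function("trainer", enrich=True, ignore_cache=True)
print(trainer.metadata.name, trainer.kind)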
- get_function_objects() FunctionsDict [source]#
"get a virtual dict with all the project functions ready for use in a pipeline
- get_item_absolute_path(url: str, check_path_in_context: bool = False) tuple[str, bool] [source]#
Get the absolute path of the artifact or function file
- Parameters:
url -- remote url, absolute path or relative path
check_path_in_context -- if True, check whether the path exists when in the context (temporary parameter to allow for backwards compatibility)
- Returns:
absolute path / url, whether the path is in the project context
- get_run_status(run, timeout=None, expected_statuses=None, notifiers: CustomNotificationPusher | None = None)[source]#
- get_secret(key: str)[source]#
Get a key-based secret (e.g. a DB password) from the context. Secrets can be specified when invoking a run, through files, env vars, etc.
- import_artifact(item_path: str, new_key=None, artifact_path=None, tag=None)[source]#
Import an artifact object/package from .yaml, .json, or .zip file
- Parameters:
item_path -- dataitem url or file path to the file/package
new_key -- overwrite the artifact key/name
artifact_path -- target artifact path (when not using the default)
tag -- artifact tag to set
- Returns:
artifact object
- kind = 'project'#
- list_alert_templates() list[mlrun.common.schemas.alert.AlertTemplate] [source]#
Retrieve list of all alert templates.
- Returns:
All the alert template objects in the database.
- list_alerts_configs() list[mlrun.alerts.alert.AlertConfig] [source]#
Retrieve list of alerts of a project.
- Returns:
All the alerts objects of the project.
- list_api_gateways() list[mlrun.runtimes.nuclio.api_gateway.APIGateway] [source]#
Retrieves a list of Nuclio API gateways associated with the project.
- Returns:
List of APIGateway objects representing the Nuclio API gateways associated with the project.
- list_artifacts(name=None, tag=None, labels: dict[str, str] | list[str] | None = None, since=None, until=None, iter: int | None = None, best_iteration: bool = False, kind: str | None = None, category: str | ArtifactCategories | None = None, tree: str | None = None, limit: int | None = None) ArtifactList [source]#
List artifacts filtered by various parameters.
The returned result is an ArtifactList (list of dict); use .to_objects() to convert it to a list of artifact objects, .show() to view graphically in Jupyter, and .to_df() to convert to a DataFrame.
Examples:
# Get latest version of all artifacts in project
latest_artifacts = project.list_artifacts("", tag="latest")
# check different artifact versions for a specific artifact, return as objects list
result_versions = project.list_artifacts("results", tag="*").to_objects()
- Parameters:
name -- Name of artifacts to retrieve. Name with '~' prefix is used as a like query, and is not case-sensitive. This means that querying for ~name may return artifacts named my_Name_1 or surname.
tag -- Return artifacts assigned this tag.
labels -- Return artifacts that have these labels. Labels can either be a dictionary {"label": "value"} or a list of "label=value" (match label key and value) or "label" (match just label key) strings.
since -- Not in use in HTTPRunDB.
until -- Not in use in HTTPRunDB.
iter -- Return artifacts from a specific iteration (where iter=0 means the root iteration). If None (default), return artifacts from all iterations.
best_iteration -- Returns the artifact which belongs to the best iteration of a given run, in the case of artifacts generated from a hyper-param run. If only a single iteration exists, will return the artifact from that iteration. If using best_iteration, the iter parameter must not be used.
kind -- Return artifacts of the requested kind.
category -- Return artifacts of the requested category.
tree -- Return artifacts of the requested tree.
limit -- Maximum number of artifacts to return.
- list_datastore_profiles() list[mlrun.datastore.datastore_profile.DatastoreProfile] [source]#
Returns a list of datastore profiles associated with the project. The information excludes private details, showcasing only public data.
- list_functions(name=None, tag=None, labels=None)[source]#
Retrieve a list of functions, filtered by specific criteria.
example:
functions = project.list_functions(tag="latest")
- Parameters:
name -- Return only functions with a specific name.
tag -- Return function versions with specific tags. To return only tagged functions, set tag to "*".
labels -- Return functions that have specific labels assigned to them.
- Returns:
List of function objects.
- list_model_monitoring_functions(name: str | None = None, tag: str | None = None, labels: list[str] | None = None) list | None [source]#
Retrieve a list of all the model monitoring functions. Example:
functions = project.list_model_monitoring_functions()
- Parameters:
name -- Return only functions with a specific name.
tag -- Return function versions with specific tags.
labels -- Return functions that have specific labels assigned to them.
- Returns:
List of function objects.
- list_models(name=None, tag=None, labels: dict[str, str] | list[str] | None = None, since=None, until=None, iter: int | None = None, best_iteration: bool = False, tree: str | None = None)[source]#
List models in project, filtered by various parameters.
Examples:
# Get latest version of all models in project
latest_models = project.list_models("", tag="latest")
- Parameters:
name -- Name of artifacts to retrieve. Name with '~' prefix is used as a like query, and is not case-sensitive. This means that querying for ~name may return artifacts named my_Name_1 or surname.
tag -- Return artifacts assigned this tag.
labels -- Return artifacts that have these labels. Labels can either be a dictionary {"label": "value"} or a list of "label=value" (match label key and value) or "label" (match just label key) strings.
since -- Not in use in HTTPRunDB.
until -- Not in use in HTTPRunDB.
iter -- Return artifacts from a specific iteration (where iter=0 means the root iteration). If None (default), return artifacts from all iterations.
best_iteration -- Returns the artifact which belongs to the best iteration of a given run, in the case of artifacts generated from a hyper-param run. If only a single iteration exists, will return the artifact from that iteration. If using best_iteration, the iter parameter must not be used.
tree -- Return artifacts of the requested tree.
- list_runs(name: str | None = None, uid: str | list[str] | None = None, labels: str | list[str] | None = None, state: RunStates | None = None, states: list[mlrun.common.runtimes.constants.RunStates] | None = None, sort: bool = True, last: int = 0, iter: bool = False, start_time_from: datetime | None = None, start_time_to: datetime | None = None, last_update_time_from: datetime | None = None, last_update_time_to: datetime | None = None, **kwargs) RunList [source]#
Retrieve a list of runs, filtered by various options.
The returned result is a RunList (list of dict); use .to_objects() to convert it to a list of RunObjects, .show() to view graphically in Jupyter, .to_df() to convert to a DataFrame, and .compare() to generate a comparison table and PCP plot.
Example:
# return a list of runs matching the name and label and compare
runs = project.list_runs(name="download", labels="owner=admin")
runs.compare()
# multi-label filter can also be provided
runs = project.list_runs(name="download", labels=["kind=job", "owner=admin"])
# If running in Jupyter, can use the .show() function to display the results
project.list_runs(name="").show()
- Parameters:
name -- Name of the run to retrieve.
uid -- Unique ID of the run.
labels -- A list of labels to filter by. Label filters work by either filtering a specific value of a label (i.e. list("key=value")) or by looking for the existence of a given key (i.e. "key").
state -- Deprecated - List only runs whose state is specified.
states -- List only runs whose state is one of the provided states.
sort -- Whether to sort the result according to their start time. Otherwise, results will be returned by their internal order in the DB (order will not be guaranteed).
last -- Deprecated - currently not used (will be removed in 1.9.0).
iter -- If True, return runs from all iterations. Otherwise, return only runs whose iter is 0.
start_time_from -- Filter by run start time in [start_time_from, start_time_to].
start_time_to -- Filter by run start time in [start_time_from, start_time_to].
last_update_time_from -- Filter by run last update time in (last_update_time_from, last_update_time_to).
last_update_time_to -- Filter by run last update time in (last_update_time_from, last_update_time_to).
- log_artifact(item, body=None, tag: str = '', local_path: str = '', artifact_path: str | None = None, format: str | None = None, upload: bool | None = None, labels: dict[str, str] | None = None, target_path: str | None = None, **kwargs) Artifact [source]#
Log an output artifact and optionally upload it to datastore
If the artifact already exists with the same key and tag, it will be overwritten.
example:
project.log_artifact(
    "some-data",
    body=b"abc is 123",
    local_path="model.txt",
    labels={"framework": "xgboost"},
)
- Parameters:
item -- artifact key or artifact object (can be any type, such as dataset, model, feature store)
body -- will use the body as the artifact content
local_path -- path to the local file we upload; will also be used as the destination subpath (under "artifact_path")
artifact_path -- target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
format -- artifact file format: csv, png, ..
tag -- version tag
target_path -- absolute target path (instead of using artifact_path + local_path)
upload -- Whether to upload the artifact to the datastore. If not provided, and the local_path is not a directory, upload occurs by default. Directories are uploaded only when this flag is explicitly set to True.
labels -- a set of key/value labels to tag the artifact with
- Returns:
artifact object
- log_dataset(key, df, tag='', local_path=None, artifact_path=None, upload=None, labels=None, format='', preview=None, stats=None, target_path='', extra_data=None, label_column: str | None = None, **kwargs) DatasetArtifact [source]#
Log a dataset artifact and optionally upload it to datastore.
If the dataset already exists with the same key and tag, it will be overwritten.
example:
raw_data = {
    "first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
    "age": [42, 52, 36, 24, 73],
    "testScore": [25, 94, 57, 62, 70],
}
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "age", "testScore"])
project.log_dataset("mydf", df=df, stats=True)
- Parameters:
key -- artifact key
df -- dataframe object
label_column -- name of the label column (the one holding the target (y) values)
local_path -- path to a local dataframe file. The given file extension is used to save the dataframe to a file. If the file exists, it is uploaded to the datastore instead of the given df.
artifact_path -- target artifact path (when not using the default). to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
tag -- version tag
format -- optional, format to use (csv, parquet, pq, tsdb, kv)
target_path -- absolute target path (instead of using artifact_path + local_path)
preview -- number of lines to store as preview in the artifact metadata
stats -- calculate and store dataset stats in the artifact metadata
extra_data -- key/value list of extra files/charts to link with this dataset
upload -- upload to datastore (default is True)
labels -- a set of key/value labels to tag the artifact with
- Returns:
artifact object
- log_model(key, body=None, framework='', tag='', model_dir=None, model_file=None, algorithm=None, metrics=None, parameters=None, artifact_path=None, upload=None, labels=None, inputs: list[mlrun.features.Feature] | None = None, outputs: list[mlrun.features.Feature] | None = None, feature_vector: str | None = None, feature_weights: list | None = None, training_set=None, label_column=None, extra_data=None, **kwargs) ModelArtifact [source]#
Log a model artifact and optionally upload it to datastore
If the model already exists with the same key and tag, it will be overwritten.
example:
project.log_model(
    "model",
    body=dumps(model),
    model_file="model.pkl",
    metrics=context.results,
    training_set=training_df,
    label_column="label",
    feature_vector=feature_vector_uri,
    labels={"app": "fraud"},
)
- Parameters:
key -- artifact key or artifact object
body -- will use the body as the artifact content
model_file -- path to the local model file we upload (see also model_dir) or to a model file data url (e.g. http://host/path/model.pkl)
model_dir -- path to the local dir holding the model file and extra files
artifact_path -- target artifact path (when not using the default) to define a subpath under the default location use: artifact_path=context.artifact_subpath('data')
framework -- name of the ML framework
algorithm -- training algorithm name
tag -- version tag
metrics -- key/value dict of model metrics
parameters -- key/value dict of model parameters
inputs -- ordered list of model input features (name, type, ..)
outputs -- ordered list of model output/result elements (name, type, ..)
upload -- upload to datastore (if not specified, defaults to True (uploads artifact))
labels -- a set of key/value labels to tag the artifact with
feature_vector -- feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])
feature_weights -- list of feature weights, one per input column
training_set -- training set dataframe, used to infer inputs & outputs
label_column -- which columns in the training set are the label (target) columns
extra_data -- key/value list of extra files/charts to link with this dataset value can be absolute path | relative path (to model dir) | bytes | artifact object
- Returns:
artifact object
- property metadata: ProjectMetadata#
- property mountdir: str#
- property name: str#
Project name, this is a property of the project metadata
- property notifiers#
- property params: dict#
- pull(branch: str | None = None, remote: str | None = None, secrets: SecretsStore | dict | None = None)[source]#
Pull/update sources from git or tar into the context dir (see the example below).
- Parameters:
branch -- git branch, if not the current one
remote -- git remote, if other than origin
secrets -- dict or SecretsStore with Git credentials e.g. secrets={"GIT_TOKEN": token}
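Example (a minimal sketch, assuming the project source is a private git repo; the env var name is a placeholder):
import os

project.pull(branch="main", secrets={"GIT_TOKEN": os.environ["GIT_TOKEN"]})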
- push(branch, message=None, update=True, remote: str | None = None, add: list | None = None, author_name: str | None = None, author_email: str | None = None, secrets: SecretsStore | dict | None = None)[source]#
Update the spec and push updates to the remote git repo (see the example below).
- Parameters:
branch -- target git branch
message -- git commit message
update -- update files (git add update=True)
remote -- git remote, default to origin
add -- list of files to add
author_name -- author's git user name to be used on this commit
author_email -- author's git user email to be used on this commit
secrets -- dict or SecretsStore with Git credentials e.g. secrets={"GIT_TOKEN": token}
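Example (a hedged sketch; the branch, message, file list, and author details are illustrative):
project.push(
    branch="main",
    message="update project spec and workflow",
    add=["project.yaml", "src/workflow.py"],
    author_name="jane",
    author_email="jane@example.com",
)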
- register_datastore_profile(profile: DatastoreProfile)[source]#
- reload(sync=False, context=None) MlrunProject [source]#
reload the project and function objects from the project yaml/specs
- Parameters:
sync -- set to True to load functions objects
context -- context directory (where the yaml and code exist)
- Returns:
project object
- remove_custom_packager(packager: str)[source]#
Remove a custom packager from the custom packagers list.
- Parameters:
packager -- The packager module path to remove.
- Raises:
MLRunInvalidArgumentError -- In case the packager was not in the list.
- remove_function(name)[source]#
remove the specified function from the project
- Parameters:
name -- name of the function (under the project)
- remove_model_monitoring_function(name: str | list[str])[source]#
Delete the specified model monitoring application function(s).
- Parameters:
name -- name of the model-monitoring-function/s (under the project)
- remove_remote(name)[source]#
Remove a remote from the project's Git repository.
This method removes the remote repository associated with the specified name from the project's Git repository.
- Parameters:
name -- Name of the remote to remove.
- reset_alert_config(alert_data: AlertConfig | None = None, alert_name: str | None = None)[source]#
Reset an alert.
- Parameters:
alert_data -- The data of the alert.
alert_name -- The name of the alert to reset.
- run(name: str | None = None, workflow_path: str | None = None, arguments: dict[str, Any] | None = None, artifact_path: str | None = None, workflow_handler: str | Callable | None = None, namespace: str | None = None, sync: bool = False, watch: bool = False, dirty: bool = False, engine: str | None = None, local: bool | None = None, schedule: str | ScheduleCronTrigger | bool | None = None, timeout: int | None = None, source: str | None = None, cleanup_ttl: int | None = None, notifications: list[mlrun.model.Notification] | None = None, workflow_runner_node_selector: dict[str, str] | None = None) _PipelineRunStatus [source]#
Run a workflow using Kubeflow Pipelines (see the example below).
- Parameters:
name -- Name of the workflow
workflow_path -- URL to a workflow file, if not a project workflow
arguments -- Kubeflow pipelines arguments (parameters)
artifact_path -- Target path/URL for workflow artifacts, the string '{{workflow.uid}}' will be replaced by workflow id.
workflow_handler -- Workflow function handler (for running workflow function directly)
namespace -- Kubernetes namespace if other than default
sync -- Force functions sync before run
watch -- Wait for pipeline completion
dirty -- Allow running the workflow when the git repo is dirty
engine -- Workflow engine running the workflow. Supported values are 'kfp' (default), 'local' or 'remote'. For setting engine for remote running use 'remote:local' or 'remote:kfp'.
local -- Run local pipeline with local functions (set local=True in function.run())
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron For using the pre-defined workflow's schedule, set schedule=True
timeout -- Timeout in seconds to wait for pipeline completion (watch will be activated)
source -- Source to use instead of the actual project.spec.source (used when the engine is remote). Can be one of: a remote URL which is loaded dynamically to the workflow runner, or a path to the project's context on the workflow runner's image (absolute or relative to project.spec.build.source_code_target_dir if defined; enriched when building a project image with source, see MlrunProject.build_image). For other engines the source is used to validate that the code is up-to-date.
cleanup_ttl -- Pipeline cleanup ttl in secs (time to wait after workflow completion, at which point the workflow and all its resources are deleted)
notifications -- List of notifications to send for workflow completion
workflow_runner_node_selector -- Defines the node selector for the workflow runner pod when using a remote engine. This allows you to control and specify where the workflow runner pod will be scheduled. This setting is only relevant when the engine is set to 'remote' or for scheduled workflows, and it will be ignored if the workflow is not run on a remote engine.
- Returns:
mlrun.projects.pipelines._PipelineRunStatus instance
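Example (a minimal sketch; the workflow name "main" and its arguments are illustrative):
run_status = project.run(
    "main",
    arguments={"model_name": "my-model"},
    engine="remote",
    watch=True,
)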
- run_function(function: str | BaseRuntime, handler: str | None = None, name: str = '', params: dict | None = None, hyperparams: dict | None = None, hyper_param_options: HyperParamOptions | None = None, inputs: dict | None = None, outputs: list[str] | None = None, workdir: str = '', labels: dict | None = None, base_task: RunTemplate | None = None, watch: bool = True, local: bool | None = None, verbose: bool | None = None, selector: str | None = None, auto_build: bool | None = None, schedule: str | ScheduleCronTrigger | None = None, artifact_path: str | None = None, notifications: list[mlrun.model.Notification] | None = None, returns: list[Union[str, dict[str, str]]] | None = None, builder_env: dict | None = None, reset_on_run: bool | None = None) RunObject | ContainerOp [source]#
Run a local or remote task as part of a local/kubeflow pipeline
example (use with project):
# create a project with two functions (local and from hub)
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://auto-trainer", "train")
# run functions (refer to them by name)
run1 = project.run_function("myfunc", params={"x": 7})
run2 = project.run_function(
    "train",
    params={"label_columns": LABELS},
    inputs={"dataset": run1.outputs["data"]},
)
- Parameters:
function -- name of the function (in the project) or function object
handler -- name of the function handler
name -- execution name
params -- input parameters (dict)
hyperparams -- hyper parameters
selector -- selection criteria for hyper params e.g. "max.accuracy"
hyper_param_options -- hyper param options (selector, early stop, strategy, ..) see:
HyperParamOptions
inputs -- Input objects to pass to the handler. Type hints can be given so the input will be parsed during runtime from mlrun.DataItem to the given type hint. The type hint can be given in the key field of the dictionary after a colon, e.g: "<key> : <type_hint>".
outputs -- list of outputs which can pass in the workflow
workdir -- default input artifacts path
labels -- labels to tag the job/run with ({key:val, ..})
base_task -- task object to use as base
watch -- watch/follow run log, True by default
local -- run the function locally vs on the runtime/cluster
verbose -- add verbose prints/logs
auto_build -- when set to True and the function requires a build, it will be built on the first function run. Use only if you don't plan on changing the build config between runs.
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron
artifact_path -- path to store artifacts, when running in a workflow this will be set automatically
notifications -- list of notifications to push when the run is completed
returns --
List of log hints - configurations for how to log the returning values from the handler's run (as artifacts or results). The list's length must be equal to the amount of returning objects. A log hint may be given as:
A string of the key to use to log the returning value as a result or as an artifact. To specify the artifact type, it is possible to pass a string in the following structure: "<key> : <type>". Available artifact types can be seen in mlrun.ArtifactType. If no artifact type is specified, the object's default artifact type will be used.
A dictionary of configurations to use when logging. Further info per object type and artifact type can be given there. The artifact key must appear in the dictionary as "key": "the_key".
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
reset_on_run -- When True, the function's python modules are reloaded prior to code execution. This ensures the latest code changes are executed. This argument must be used in conjunction with local=True.
- Returns:
MLRun RunObject or PipelineNodeWrapper
- save(filepath=None, store=True)[source]#
export project to yaml file and save project in database
- Parameters:
store -- if True, allow updating in case the project already exists
- save_to_db(store=True)[source]#
save project to database
- Parameters:
store -- if True, allow updating in case the project already exists
- save_workflow(name, target, artifact_path=None, ttl=None)[source]#
Create and save a workflow as a yaml or archive file (see the example below).
- Parameters:
name -- workflow name
target -- target file path (can end with .yaml or .zip)
artifact_path -- target path/url for workflow artifacts, the string '{{workflow.uid}}' will be replaced by workflow id
ttl -- pipeline ttl (time to live) in secs (after that the pods will be removed)
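Example (a minimal sketch; the workflow name and target path are illustrative):
project.save_workflow("main", "./artifacts/main-workflow.yaml")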
- set_artifact(key, artifact: str | dict | Artifact | None = None, target_path: str | None = None, tag: str | None = None)[source]#
add/set an artifact in the project spec (will be registered on load)
example:
# register a simple file artifact
project.set_artifact("data", target_path=data_url)
# register a model artifact
project.set_artifact(
    "model", ModelArtifact(model_file="model.pkl"), target_path=model_dir_url
)
# register a path to artifact package (will be imported on project load)
# to generate such package use `artifact.export(target_path)`
project.set_artifact("model", "https://mystuff.com/models/mymodel.zip")
- Parameters:
key -- artifact key/name
artifact -- mlrun Artifact object/dict (or its subclasses) or path to artifact file to import (yaml/json/zip), relative paths are relative to the context path
target_path -- absolute target path url (point to the artifact content location)
tag -- artifact tag
- set_default_image(default_image: str)[source]#
Set the default image to be used for running runtimes (functions) in this project. This image will be used if an image was not provided for a runtime. In case the default image is replaced, functions already registered with the project that used the previous default image will have their image replaced on next execution.
- Parameters:
default_image -- Default image to use
- set_function(func: str | BaseRuntime | None = None, name: str = '', kind: str = 'job', image: str | None = None, handler: str | None = None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '') BaseRuntime [source]#
Update or add a function object to the project.
A function can be provided as an object (func) or as a .py/.ipynb/.yaml URL.
Creating a function from a single file is done by specifying func and disabling with_repo.
Creating a function with the project source (specify with_repo=True):
1. Specify a relative func path.
2. Specify a module handler (e.g. handler=package.package.func) without func.
Creating a function with a non-project source is done by specifying a module handler and setting the source on the returned function with function.with_source_archive(<source>).
Supported URL prefixes:
Object storage (s3://, v3io://, ..)
MLRun DB, e.g. db://project/func:ver
Functions hub/market, e.g. hub://auto-trainer:master
Examples:
proj.set_function(func_object)
proj.set_function("http://.../mynb.ipynb", "train")
proj.set_function("./func.yaml")
proj.set_function("hub://get_toy_data", "getdata")

# Create a function from a single file
proj.set_function("./src/mycode.py", "ingest")

# Creating a function with project source
proj.set_function(
    "./src/mycode.py", "ingest", image="myrepo/ing:latest", with_repo=True
)
proj.set_function("ingest", handler="package.package.func", with_repo=True)

# Creating a function with non project source
func = proj.set_function("ingest", handler="package.package.func", with_repo=False)
func.with_source_archive("git://github.com/mlrun/something.git")

# Set function requirements
# By providing a list of packages
proj.set_function("my.py", requirements=["requests", "pandas"])
# By providing a path to a pip requirements file
proj.set_function("my.py", requirements="requirements.txt")
- Parameters:
func -- Function object or spec/code url, None refers to current Notebook
name -- Name of the function (under the project), can be specified with a tag to support Versions (e.g. myfunc:v1). If the tag parameter is provided, the tag in the name must match the tag parameter. Specifying a tag in the name will update the project's tagged function (myfunc:v1)
kind -- Runtime kind e.g. job, nuclio, spark, dask, mpijob Default: job
image -- Docker image to be used, can also be specified in the function object/yaml
handler -- Default function handler to invoke (can only be set with .py/.ipynb files)
with_repo -- Add (clone) the current repo to the build source - use when the function code is in the project repo (project.spec.source).
tag -- Function version tag to set (none for current or 'latest') Specifying a tag as a parameter will update the project's tagged function (myfunc:v1) and the untagged function (myfunc)
requirements -- A list of python packages
requirements_file -- Path to a python requirements file
- Returns:
function object
- set_model_monitoring_credentials(access_key: str | None = None, endpoint_store_connection: str | None = None, stream_path: str | None = None, tsdb_connection: str | None = None, replace_creds: bool = False)[source]#
Set the credentials that will be used by the project's model monitoring infrastructure functions (see the example below). Note that you have to set the credentials before deploying any model monitoring or serving function.
- Parameters:
access_key -- Model monitoring access key for managing user permissions.
endpoint_store_connection --
Endpoint store connection string. By default, None. Options:
None - will be set from the system configuration.
v3io - for v3io endpoint store, pass v3io and the system will generate the exact path.
MySQL/SQLite - for SQL endpoint store, provide the full connection string, for example: mysql+pymysql://<username>:<password>@<host>:<port>/<db_name>
stream_path --
Path to the model monitoring stream. By default, None. Options:
None - will be set from the system configuration.
v3io - for v3io stream, pass v3io and the system will generate the exact path.
Kafka - for Kafka stream, provide the full connection string without custom topic, for example kafka://<some_kafka_broker>:<port>.
tsdb_connection --
Connection string to the time series database. By default, None. Options:
None - will be set from the system configuration.
v3io - for v3io stream, pass v3io and the system will generate the exact path.
TDEngine - for TDEngine tsdb, provide the full websocket connection URL, for example taosws://<username>:<password>@<host>:<port>.
replace_creds -- If True, override the existing credentials. Keep in mind that if you have already enabled model monitoring on your project, this action can cause data loss and will require redeploying all model monitoring functions, the model monitoring infrastructure, and tracked model servers.
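Example (a hedged sketch using the v3io shortcuts described above; consult your deployment for the real connection strings):
project.set_model_monitoring_credentials(
    endpoint_store_connection="v3io",
    stream_path="v3io",
    tsdb_connection="v3io",
)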
- set_model_monitoring_function(func: str | BaseRuntime | None = None, application_class: str | ModelMonitoringApplicationBase | None = None, name: str | None = None, image: str | None = None, handler=None, with_repo: bool | None = None, tag: str | None = None, requirements: str | list[str] | None = None, requirements_file: str = '', **application_kwargs) BaseRuntime [source]#
Update or add a monitoring function to the project. Note: to deploy the function after linking it to the project, call fn.deploy() where fn is the object returned by this method.
examples:
project.set_model_monitoring_function(
    name="myApp", application_class="MyApp", image="mlrun/mlrun"
)
- Parameters:
func -- Function object or spec/code url, None refers to current Notebook
name -- Name of the function (under the project), can be specified with a tag to support versions (e.g. myfunc:v1) Default: job
image -- Docker image to be used, can also be specified in the function object/yaml
handler -- Default function handler to invoke (can only be set with .py/.ipynb files)
with_repo -- Add (clone) the current repo to the build source
tag -- Function version tag (none for 'latest'; can only be set with .py/.ipynb files). If a tag is specified and name is empty, the function key (under the project) will be enriched with the tag value (i.e. 'function-name:tag').
requirements -- A list of python packages
requirements_file -- Path to a python requirements file
application_class -- Name or an Instance of a class that implements the monitoring application.
application_kwargs -- Additional keyword arguments to be passed to the monitoring application's constructor.
- set_remote(url, name='origin', branch=None, overwrite=True)[source]#
Create or update a remote for the project git repository (see the example below).
This method allows you to manage remote repositories associated with the project. It checks if a remote with the specified name already exists.
If a remote with the same name does not exist, it will be created. If a remote with the same name already exists, the behavior depends on the value of the 'overwrite' flag.
- Parameters:
url -- remote git url
name -- name for the remote (default is 'origin')
branch -- Git branch to use as source
overwrite -- if True (default), updates the existing remote with the given URL if it already exists. if False, raises an error when attempting to create a remote with a name that already exists.
- Raises:
MLRunConflictError -- If a remote with the same name already exists and overwrite is set to False.
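Example (a minimal sketch; the repository URL is illustrative):
project.set_remote("https://github.com/org/my-project.git", name="origin", overwrite=True)
project.push("main", message="initial project commit")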
- set_secrets(secrets: dict | None = None, file_path: str | None = None, provider: str | SecretProviderName | None = None)[source]#
Set project secrets from a dict or a secrets env file. When using a secrets file, it should have lines in the form KEY=VALUE, and comment lines start with "#". V3IO paths/credentials and the MLRun service API address are dropped from the secrets.
example secrets file:
# this is an env file
AWS_ACCESS_KEY_ID=XXXX
AWS_SECRET_ACCESS_KEY=YYYY
usage:
# read env vars from dict or file and set as project secrets
project.set_secrets({"SECRET1": "value"})
project.set_secrets(file_path="secrets.env")
- Parameters:
secrets -- dict with secrets key/value
file_path -- path to secrets file
provider -- MLRun secrets provider
- set_source(source: str = '', pull_at_runtime: bool = False, workdir: str | None = None)[source]#
Set the project source code path (can be a git/tar/zip archive); see the example below.
- Parameters:
source -- valid absolute path or URL to a git, zip, or tar file (or None for the current source), e.g. git://github.com/mlrun/something.git or http://some/url/file.zip. Note that a path source must exist on the image, or exist locally when the run is local (it is recommended to use 'workdir' instead when the source is a filepath).
pull_at_runtime -- load the archive into the container at job runtime vs on build/deploy
workdir -- workdir path relative to the context dir or absolute
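Example (a hedged sketch; the URL and workdir are illustrative):
project.set_source(
    "git://github.com/org/repo.git#main",
    pull_at_runtime=True,
    workdir="./src",
)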
- set_workflow(name, workflow_path: str, embed: bool = False, engine: str | None = None, args_schema: list[mlrun.model.EntrypointParam] | None = None, handler: str | None = None, schedule: str | ScheduleCronTrigger | None = None, ttl: int | None = None, image: str | None = None, **args)[source]#
Add or update a workflow; specify a name and the code path (see the example below).
- Parameters:
name -- Name of the workflow
workflow_path -- URL (remote) / Path (absolute or relative to the project code path i.e. <project.spec.get_code_path()>/<workflow_path>) for the workflow file.
embed -- Add the workflow code into the project.yaml
engine -- Workflow processing engine ("kfp", "local", "remote" or "remote:local")
args_schema -- List of arg schema definitions (:py:class`~mlrun.model.EntrypointParam`)
handler -- Workflow function handler
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron Note that "local" engine does not support this argument
ttl -- Pipeline ttl in secs (after that the pods will be removed)
image -- Image for workflow runner job, only for scheduled and remote workflows
args -- Argument values (key=value, ..)
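Example (a minimal sketch; the workflow name, file path, handler, and arguments are illustrative):
project.set_workflow("main", "./src/workflow.py", engine="kfp", handler="pipeline")
project.run("main", arguments={"dataset": "iris"})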
- setup(save: bool = True) MlrunProject [source]#
Run the project setup file if found
When loading a project, MLRun looks for a project_setup.py file. If found, it executes the setup(project) handler, which can enrich the project with additional objects, functions, artifacts, etc.
- Parameters:
save -- save the project after the setup
- property source: str#
- property spec: ProjectSpec#
- property status: ProjectStatus#
- store_alert_config(alert_data: AlertConfig, alert_name: str | None = None) AlertConfig [source]#
Create/modify an alert.
- Parameters:
alert_data -- The data of the alert.
alert_name -- The name of the alert.
- Returns:
the created/modified alert.
- store_api_gateway(api_gateway: APIGateway, wait_for_readiness=True, max_wait_time=90) APIGateway [source]#
Creates or updates a Nuclio API Gateway using the provided APIGateway object.
This method interacts with the MLRun service to create/update a Nuclio API Gateway based on the provided APIGateway object. Once done, it returns the updated APIGateway object containing all fields propagated on MLRun and Nuclio sides, such as the 'host' attribute. Nuclio docs here: https://docs.nuclio.io/en/latest/reference/api-gateway/http.html
- Parameters:
api_gateway -- An instance of APIGateway representing the configuration of the API Gateway to be created or updated.
wait_for_readiness -- (Optional) A boolean indicating whether to wait for the API Gateway to become ready after creation or update (default is True).
max_wait_time -- (Optional) Maximum time to wait for API Gateway readiness in seconds (default is 90s)
- Returns:
An instance of APIGateway with all fields populated based on the information retrieved from the Nuclio API.
- sync_functions(names: list | None = None, always: bool = True, save: bool = False, silent: bool = False)[source]#
Reload function objects from specs and files. The function objects are synced against the definitions spec in self.spec._function_definitions. Referenced files/URLs in the function spec will be reloaded. Function definitions are parsed by the following precedence:
Contains runtime spec.
Contains module in the project's context.
Contains path to function definition (yaml, DB, Hub).
Contains path to .ipynb or .py files.
Contains a Nuclio/Serving function image / an 'Application' kind definition.
If a function definition is already an object, some project metadata updates will apply; however, it will not be reloaded.
- Parameters:
names -- Names of functions to reload, defaults to self.spec._function_definitions.keys().
always -- Force reloading the functions.
save -- Whether to save the loaded functions or not.
silent -- Whether to raise an exception when a function fails to load.
- Returns:
Dictionary of function objects
- update_model_monitoring_controller(base_period: int = 10, image: str = 'mlrun/mlrun', *, wait_for_deployment: bool = False) None [source]#
Redeploy model monitoring application controller functions.
- Parameters:
base_period -- The time period in minutes in which the model monitoring controller function is triggered. By default, the base period is 10 minutes.
image -- The image of the model monitoring controller, writer & monitoring stream functions, which are real time nuclio functions. By default, the image is mlrun/mlrun.
wait_for_deployment -- If true, return only after the deployment is done on the backend. Otherwise, deploy the controller in the background.
- with_secrets(kind, source, prefix='')[source]#
register a secrets source (file, env or dict)
read secrets from a source provider to be used in workflows, example:
proj.with_secrets("file", "file.txt") proj.with_secrets("inline", {"key": "val"}) proj.with_secrets("env", "ENV1,ENV2", prefix="PFX_")
Vault secret source has several options:
proj.with_secrets('vault', {'user': <user name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', {'project': <proj.name>, 'secrets': ['secret1', 'secret2' ...]})
proj.with_secrets('vault', ['secret1', 'secret2' ...])
The 2nd option uses the current project name as context. Can also use empty secret list:
proj.with_secrets("vault", [])
This will enable access to all secrets in vault registered to the current project.
- Parameters:
kind -- secret type (file, inline, env, vault)
source -- secret data or link (see example)
prefix -- add a prefix to the keys in this source
- Returns:
project object
- property workflows: list#
- class mlrun.projects.ProjectMetadata(name=None, created=None, labels=None, annotations=None)[source]#
Bases:
ModelObj
- property name: str#
Project name
- static validate_project_labels(labels: dict, raise_on_failure: bool = True) bool [source]#
Validate that the project labels conform to the Kubernetes label syntax and character set: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
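A minimal usage sketch (the label keys and values are illustrative):
from mlrun.projects import ProjectMetadata

# returns True only if every label key/value satisfies the Kubernetes label rules
ok = ProjectMetadata.validate_project_labels(
    {"team": "ml", "env": "dev"}, raise_on_failure=False
)
print(ok)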
- class mlrun.projects.ProjectSpec(description=None, params=None, functions=None, workflows=None, artifacts=None, artifact_path=None, conda=None, source=None, subpath=None, origin_url=None, goals=None, load_source_on_run=None, default_requirements: str | list[str] | None = None, desired_state='online', owner=None, disable_auto_mount=None, workdir=None, default_image=None, build=None, custom_packagers: list[tuple[str, bool]] | None = None, default_function_node_selector=None)[source]#
Bases:
ModelObj
- add_custom_packager(packager: str, is_mandatory: bool)[source]#
Add a custom packager to the custom packagers list.
- Parameters:
packager -- The packager module path to add. For example, if a packager MyPackager is in the project's source at my_module.py, then the module path is: "my_module.MyPackager".
is_mandatory -- Whether this packager must be collected during a run. If False, failing to collect it won't raise an error during the packagers collection phase.
- property artifacts: list#
list of artifacts used in this project
- property build: ImageBuilder#
- property default_function_node_selector#
- property functions: list#
list of function object/specs used in this project
- property mountdir: str#
specify a directory for mounting the context dir inside the function container; use '.' to use the same path as in the client (e.g. Jupyter)
- remove_custom_packager(packager: str)[source]#
Remove a custom packager from the custom packagers list.
- Parameters:
packager -- The packager module path to remove.
- Raises:
MLRunInvalidArgumentError -- In case the packager was not in the list.
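A minimal usage sketch for both methods (the module path "my_module.MyPackager" and the project name are illustrative):
import mlrun

project = mlrun.get_or_create_project("myproj", "./")
# register the packager as optional, so a collection failure does not raise
project.spec.add_custom_packager("my_module.MyPackager", is_mandatory=False)
project.save()

# later, remove it again; raises MLRunInvalidArgumentError if it is not in the list
project.spec.remove_custom_packager("my_module.MyPackager")
project.save()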
- property source: str#
source url or git repo
- property workflows: list[dict]#
list of workflow spec dicts used in this project
- mlrun.projects.build_function(function: str | BaseRuntime, with_mlrun: bool | None = None, skip_deployed: bool = False, image=None, base_image=None, commands: list | None = None, secret_name=None, requirements: str | list[str] | None = None, requirements_file: str | None = None, mlrun_version_specifier=None, builder_env: dict | None = None, project_object=None, overwrite_build_params: bool = False, extra_args: str | None = None, force_build: bool = False) BuildStatus | ContainerOp [source]#
deploy ML function, build container with its dependencies
- Parameters:
function -- Name of the function (in the project) or function object
with_mlrun -- Add the current mlrun package to the container build
skip_deployed -- Skip the build if we already have an image for the function
image -- Target image name/path
base_image -- Base image name/path (commands and source code will be added to it)
commands -- List of docker build (RUN) commands e.g. ['pip install pandas']
secret_name -- K8s secret for accessing the docker registry
requirements -- List of python packages, defaults to None
requirements_file -- pip requirements file path, defaults to None
mlrun_version_specifier -- which mlrun package version to include (if not current)
builder_env -- Kaniko builder pod env vars dict (for config/credentials) e.g. builder_env={"GIT_TOKEN": token}, does not work yet in KFP
project_object -- Override the project object to use, will default to the project set in the runtime context.
overwrite_build_params -- Overwrite existing build configuration (currently applies to requirements and commands) * False: The new params are merged with the existing * True: The existing params are replaced by the new ones
extra_args -- A string containing additional builder arguments in the format of command-line options, e.g. extra_args="--skip-tls-verify --build-arg A=val"
force_build -- Force building the image, even when no changes were made
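A minimal usage sketch (the function name "trainer" and the requirement are illustrative, and the function is assumed to already be set on the project):
import mlrun
from mlrun.projects import build_function

project = mlrun.get_or_create_project("myproj", "./")
# build an image for the "trainer" function, adding mlrun and pandas on top of its base image
build_function(
    "trainer",
    with_mlrun=True,
    requirements=["pandas"],
    project_object=project,
)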
- mlrun.projects.deploy_function(function: str | BaseRuntime, models: list | None = None, env: dict | None = None, tag: str | None = None, verbose: bool | None = None, builder_env: dict | None = None, project_object=None, mock: bool | None = None) DeployStatus | ContainerOp [source]#
deploy real-time (nuclio based) functions
- Parameters:
function -- name of the function (in the project) or function object
models -- list of model items
env -- dict of extra environment variables
tag -- extra version tag
verbose -- add verbose prints/logs
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
mock -- deploy mock server vs a real Nuclio function (for local simulations)
project_object -- override the project object to use, will default to the project set in the runtime context.
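A minimal usage sketch (the function name "serving" and the env values are illustrative, and the function is assumed to already be set on the project):
import mlrun
from mlrun.projects import deploy_function

project = mlrun.get_or_create_project("myproj", "./")
# deploy the real-time "serving" function with an extra environment variable and a version tag
deploy_function(
    "serving",
    env={"LOG_LEVEL": "debug"},
    tag="v1",
    project_object=project,
)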
- mlrun.projects.get_or_create_project(name: str, context: str = './', url: str | None = None, secrets: dict | None = None, init_git=False, subpath: str | None = None, clone: bool = False, user_project: bool = False, from_template: str | None = None, save: bool = True, parameters: dict | None = None, allow_cross_project: bool | None = None) MlrunProject [source]#
Load a project from MLRun DB, or create/import if it does not exist
MLRun looks for a project.yaml file with the project definition and objects in the project root path and uses it to initialize the project. In addition, it runs the project_setup.py file (if it exists) for further customization.
Usage example:
# load project from the DB (if it exists) or the source repo
project = get_or_create_project(
    "myproj", "./", "git://github.com/mlrun/demo-xgb-project.git"
)
project.pull("development")  # pull the latest code from git
project.run("main", arguments={"data": data_url})  # run the workflow "main"
project_setup.py example:
def setup(project):
    train_function = project.set_function(
        "src/trainer.py",
        name="mpi-training",
        kind="mpijob",
        image="mlrun/mlrun",
    )
    # Set the number of replicas for the training from the project parameter
    train_function.spec.replicas = project.spec.params.get("num_replicas", 1)
    return project
- Parameters:
name -- project name
context -- project local directory path (default value = "./")
url -- name (in DB) or git or tar.gz or .zip sources archive path e.g.: git://github.com/mlrun/demo-xgb-project.git http://mysite/archived-project.zip
secrets -- key:secret dict or SecretsStore used to download sources
init_git -- if True, will execute git init on the context dir
subpath -- project subpath (within the archive/context)
clone -- if True, always clone (delete any existing content)
user_project -- add the current username to the project name (for db:// prefixes)
from_template -- path to a project YAML file that will be used as a template (for new projects)
save -- whether to save the created project in the DB
parameters -- key/value pairs to add to the project.spec.params
allow_cross_project -- if True, override the loaded project name. This flag ensures awareness of loading an existing project yaml as a baseline for a new project with a different name
- Returns:
project object
- mlrun.projects.load_project(context: str = './', url: str | None = None, name: str | None = None, secrets: dict | None = None, init_git: bool = False, subpath: str | None = None, clone: bool = False, user_project: bool = False, save: bool = True, sync_functions: bool = False, parameters: dict | None = None, allow_cross_project: bool | None = None) MlrunProject [source]#
Load an MLRun project from git or tar or dir
MLRun looks for a project.yaml file with the project definition and objects in the project root path and uses it to initialize the project. In addition, it runs the project_setup.py file (if it exists) for further customization.
Usage example:
# Load the project and run the 'main' workflow.
# When using git as the url source the context directory must be an empty or
# non-existent folder as the git repo will be cloned there
project = load_project("./demo_proj", "git://github.com/mlrun/project-demo.git")
project.run("main", arguments={"data": data_url})
project_setup.py example:
def setup(project):
    train_function = project.set_function(
        "src/trainer.py",
        name="mpi-training",
        kind="mpijob",
        image="mlrun/mlrun",
    )
    # Set the number of replicas for the training from the project parameter
    train_function.spec.replicas = project.spec.params.get("num_replicas", 1)
    return project
- Parameters:
context -- project local directory path (default value = "./")
url -- name (in DB) or git or tar.gz or .zip sources archive path e.g.: git://github.com/mlrun/demo-xgb-project.git http://mysite/archived-project.zip <project-name> The git project should include the project yaml file. If the project yaml file is in a sub-directory, you must specify the sub-directory.
name -- project name
secrets -- key:secret dict or SecretsStore used to download sources
init_git -- if True, will git init the context dir
subpath -- project subpath (within the archive)
clone -- if True, always clone (delete any existing content)
user_project -- add the current username to the project name (for db:// prefixes)
save -- whether to save the created project and artifact in the DB
sync_functions -- sync the project's functions into the project object (will be saved to the DB if save=True)
parameters -- key/value pairs to add to the project.spec.params
allow_cross_project -- if True, override the loaded project name. This flag ensures awareness of loading an existing project yaml as a baseline for a new project with a different name
- Returns:
project object
- mlrun.projects.new_project(name, context: str = './', init_git: bool = False, user_project: bool = False, remote: str | None = None, from_template: str | None = None, secrets: dict | None = None, description: str | None = None, subpath: str | None = None, save: bool = True, overwrite: bool = False, parameters: dict | None = None, default_function_node_selector: dict | None = None) MlrunProject [source]#
Create a new MLRun project, optionally load it from a yaml/zip/git template
A new project is created and returned. You can customize the project by placing a project_setup.py file in the project root dir; it will be executed upon project creation or loading.
example:
# create a project with local and hub functions, a workflow, and an artifact
project = mlrun.new_project(
    "myproj", "./", init_git=True, description="my new project"
)
project.set_function(
    "prep_data.py", "prep-data", image="mlrun/mlrun", handler="prep_data"
)
project.set_function("hub://auto-trainer", "train")
project.set_artifact("data", Artifact(target_path=data_url))
project.set_workflow("main", "./myflow.py")
project.save()

# run the "main" workflow (watch=True to wait for run completion)
project.run("main", watch=True)
example (load from template):
# create a new project from a zip template (can also use yaml/git templates)
# initialize a local git, and register the git remote path
project = mlrun.new_project(
    "myproj",
    "./",
    init_git=True,
    remote="git://github.com/mlrun/project-demo.git",
    from_template="http://mysite/proj.zip",
)
project.run("main", watch=True)
example using project_setup.py to init the project objects:
def setup(project):
    project.set_function(
        "prep_data.py", "prep-data", image="mlrun/mlrun", handler="prep_data"
    )
    project.set_function("hub://auto-trainer", "train")
    project.set_artifact("data", Artifact(target_path=data_url))
    project.set_workflow("main", "./myflow.py")
    return project
- Parameters:
name -- project name
context -- project local directory path (default value = "./")
init_git -- if True, will git init the context dir
user_project -- add the current username to the provided project name (making it unique per user)
remote -- remote Git url
from_template -- path to project YAML/zip file that will be used as a template
secrets -- key:secret dict or SecretsStore used to download sources
description -- text describing the project
subpath -- project subpath (relative to the context dir)
save -- whether to save the created project in the DB
overwrite -- overwrite the project using a 'cascade' deletion strategy (deletes project resources) if a project with the same name exists
parameters -- key/value pairs to add to the project.spec.params
default_function_node_selector -- defines the default node selector for scheduling functions within the project
- Returns:
project object
- mlrun.projects.run_function(function: str | BaseRuntime, handler: str | None = None, name: str = '', params: dict | None = None, hyperparams: dict | None = None, hyper_param_options: HyperParamOptions | None = None, inputs: dict | None = None, outputs: list[str] | None = None, workdir: str = '', labels: dict | None = None, base_task: RunTemplate | None = None, watch: bool = True, local: bool | None = None, verbose: bool | None = None, selector: str | None = None, project_object=None, auto_build: bool | None = None, schedule: str | ScheduleCronTrigger | None = None, artifact_path: str | None = None, notifications: list[mlrun.model.Notification] | None = None, returns: list[Union[str, dict[str, str]]] | None = None, builder_env: list | None = None, reset_on_run: bool | None = None) RunObject | ContainerOp [source]#
Run a local or remote task as part of a local/kubeflow pipeline
run_function() allows you to execute a function locally, on a remote cluster, or as part of an automated workflow. The function can be specified as an object or by name (str); when specified by name, it is looked up in the current project, eliminating the need to redefine/edit functions.
When functions run as part of a workflow/pipeline (project.run()), some attributes can be set at the run level, e.g. local=True will run all the functions locally, and setting artifact_path will direct all outputs to the same path. Project runs provide additional notifications/reporting and exception handling. Inside a Kubeflow pipeline (KFP), run_function() generates a KFP node (see PipelineNodeWrapper) which forms a DAG; some behavior may differ between regular runs and deferred KFP runs.
example (use with function object):
LABELS = "is_error" MODEL_CLASS = "sklearn.ensemble.RandomForestClassifier" DATA_PATH = "s3://bigdata/data.parquet" function = mlrun.import_function("hub://auto-trainer") run1 = run_function( function, params={"label_columns": LABELS, "model_class": MODEL_CLASS}, inputs={"dataset": DATA_PATH}, )
example (use with project):
# create a project with two functions (local and from hub)
project = mlrun.new_project(project_name, "./proj")
project.set_function("mycode.py", "myfunc", image="mlrun/mlrun")
project.set_function("hub://auto-trainer", "train")

# run functions (refer to them by name)
run1 = run_function("myfunc", params={"x": 7})
run2 = run_function(
    "train",
    params={"label_columns": LABELS, "model_class": MODEL_CLASS},
    inputs={"dataset": run1.outputs["data"]},
)
example (use in pipeline):
@dsl.pipeline(name="test pipeline", description="test")
def my_pipe(url=""):
    run1 = run_function("loaddata", params={"url": url}, outputs=["data"])
    run2 = run_function(
        "train",
        params={"label_columns": LABELS, "model_class": MODEL_CLASS},
        inputs={"dataset": run1.outputs["data"]},
    )

project.run(workflow_handler=my_pipe, arguments={"param1": 7})
- Parameters:
function -- name of the function (in the project) or function object
handler -- name of the function handler
name -- execution name
params -- input parameters (dict)
hyperparams -- hyper parameters
selector -- selection criteria for hyper params e.g. "max.accuracy"
hyper_param_options -- hyper param options (selector, early stop, strategy, ..) see:
HyperParamOptions
inputs -- Input objects to pass to the handler. Type hints can be given so the input will be parsed during runtime from mlrun.DataItem to the given type hint. The type hint can be given in the key field of the dictionary after a colon, e.g: "<key> : <type_hint>".
outputs -- list of outputs that can be passed on to other steps in the workflow
workdir -- default input artifacts path
labels -- labels to tag the job/run with ({key:val, ..})
base_task -- task object to use as base
watch -- watch/follow run log, True by default
local -- run the function locally vs on the runtime/cluster
verbose -- add verbose prints/logs
project_object -- override the project object to use, will default to the project set in the runtime context.
auto_build -- when set to True and the function requires a build, it will be built on the first function run; use only if you do not plan on changing the build config between runs
schedule -- ScheduleCronTrigger class instance or a standard crontab expression string (which will be converted to the class using its from_crontab constructor), see this link for help: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/cron.html#module-apscheduler.triggers.cron
artifact_path -- path to store artifacts, when running in a workflow this will be set automatically
notifications -- list of notifications to push when the run is completed
returns --
List of log hints - configurations for how to log the returning values from the handler's run (as artifacts or results). The list's length must be equal to the amount of returning objects. A log hint may be given as:
A string of the key to use to log the returning value as a result or as an artifact. To specify the artifact type, it is possible to pass a string in the following structure: "<key> : <type>". Available artifact types can be seen in mlrun.ArtifactType. If no artifact type is specified, the object's default artifact type will be used.
A dictionary of configurations to use when logging. Further info per object type and artifact type can be given there. The artifact key must appear in the dictionary as "key": "the_key".
builder_env -- env vars dict for source archive config/credentials e.g. builder_env={"GIT_TOKEN": token}
reset_on_run -- When True, the function's python modules are reloaded prior to code execution. This ensures the latest code changes are executed. This argument must be used in conjunction with the local=True argument.
- Returns:
MLRun RunObject or PipelineNodeWrapper