Attach storage to functions

In the vast majority of cases, an MLRun function requires access to storage. This storage might provide inputs to the function, such as datasets to process or data streams that contain input events. It typically also holds function outputs and result artifacts, for example trained models or processed datasets.

Since MLRun functions can be distributed and executed in Kubernetes pods, the storage is typically shared, and the execution pods need additional configuration so that the function code can access the designated storage. This configuration might include Kubernetes volume mounts, environment variables that hold configuration and credentials, and security settings. These storage configurations do not apply to functions that run locally in the development environment, since those execute in the local context.

The common types of shared storage are:

v3io storage through API: When running as part of the Iguazio system, MLRun can access the system's v3io storage through paths such as v3io:///projects/my_projects/file.csv. To enable this access, the pod needs several environment variables that provide the v3io API URL and access keys.
v3io storage through FUSE mount: Some tools cannot use the v3io API and need basic filesystem semantics. For these, v3io provides a FUSE (Filesystem in Userspace) driver that mounts v3io containers at specific paths in the pod, for example /User. To enable this, several volume mount configurations need to be applied to the pod spec.
NFS storage access: When MLRun is deployed as open source, independent of Iguazio, the deployment automatically adds a pod running NFS storage. To access this NFS storage from function pods, a Kubernetes pvc (persistent volume claim) mount is needed.
S3-compatible object storage: When MLRun is deployed on Kubernetes without Iguazio (for example on IG4 systems using MinIO), S3 credentials need to be injected into pods as environment variables so that functions can access object storage. MLRun supports this via the s3 auto-mount type, which reads credentials from a Kubernetes secret and sets the standard AWS environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT_URL_S3) on each pod.
Others: As use cases evolve, other types of storage access may be needed, each requiring its own configuration on the function execution pods.

MLRun attempts to offload this storage configuration task from the user by automatically applying the most common storage configuration to functions. As a result, most cases do not require any additional storage configurations before executing a function as a Kubernetes pod. The configurations applied by MLRun are:

In an Iguazio system, apply configurations for v3io access through the API.
In an open-source deployment where NFS is configured, apply configurations for pvc access to NFS storage.

This MLRun logic is referred to as auto-mount.

In this section

Disabling auto-mount
Modifying the auto-mount default configuration

Disabling auto-mount

In cases where the default storage configuration does not fit the function's needs, MLRun allows function spec modifiers to be applied manually. These modifiers add configuration to the function spec, such as environment variables, mounts, and other settings. MLRun also provides a set of common modifiers for storage configuration. To apply a modifier, pass it to the function's .apply() method. You can see some examples of this later in this page.

When a different storage configuration is manually applied to a function, MLRun's auto-mount logic is disabled. This prevents conflicts between configurations. The auto-mount logic can also be disabled by setting func.spec.disable_auto_mount = True on any MLRun function.

Modifying the auto-mount default configuration

The default auto-mount behavior is controlled by MLRun configuration parameters. For example, the logic can be set to automatically mount the v3io FUSE driver on all functions, or to perform a pvc mount for NFS storage on all functions. The following code demonstrates how to apply the v3io FUSE driver by default:

# Change the MLRun auto-mount configuration.
# mlconf is an attribute of the mlrun package, so import the package itself.
import mlrun

mlrun.mlconf.storage.auto_mount_type = "v3io_fuse"

Each of the auto-mount supported methods applies a specific modifier function. The supported methods are:

v3io_credentials: apply the v3io credentials needed for v3io API usage. Applies the v3io_cred() modifier.
v3io_fuse: create a FUSE driver mount. Applies the mount_v3io() modifier.
pvc: create a pvc mount. Applies the mount_pvc() modifier.
s3: inject S3 credentials as environment variables from a Kubernetes secret. Applies the mount_s3() modifier. Use this for S3-compatible object storage (AWS S3, MinIO) on Kubernetes deployments without Iguazio v3io. Requires secret_name in auto_mount_params, pointing to a secret that contains the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys (and optionally AWS_ENDPOINT_URL_S3 for non-AWS endpoints).
secret_env: inject Kubernetes secret keys as environment variables into pods, and optionally also inject plain (non-secret) key=value environment variables in the same call. Applies the set_env_vars_from_secret() modifier. Requires secret_name in auto_mount_params. Optionally accepts keys (a semicolon-separated list of secret keys to expose; omit it to mount all keys in the secret via envFrom) and cleartext_env (plain key=value pairs to inject alongside the secret-backed variables; see the secret_env example below).
auto: the default auto-mount logic as described above (either v3io_credentials or pvc).
none: perform no auto-mount (same as setting disable_auto_mount = True).
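
The list above amounts to a lookup from auto-mount type to modifier. The following is a minimal standalone sketch of that mapping, not MLRun's internal implementation: MODIFIER_BY_TYPE, resolve_modifier, and the on_iguazio flag are hypothetical names, and the flag is an assumed stand-in for however MLRun detects the deployment type.

```python
from typing import Optional

# Auto-mount type -> name of the modifier it applies, as listed above.
MODIFIER_BY_TYPE = {
    "v3io_credentials": "v3io_cred",
    "v3io_fuse": "mount_v3io",
    "pvc": "mount_pvc",
    "s3": "mount_s3",
    "secret_env": "set_env_vars_from_secret",
}


def resolve_modifier(auto_mount_type: str, on_iguazio: bool) -> Optional[str]:
    """Return the modifier name for an auto-mount type, or None for no mount."""
    if auto_mount_type == "none":
        return None
    if auto_mount_type == "auto":
        # Simplification: any non-Iguazio deployment is treated as the NFS/pvc case.
        auto_mount_type = "v3io_credentials" if on_iguazio else "pvc"
    return MODIFIER_BY_TYPE[auto_mount_type]
```

An unknown type raises KeyError in this sketch; how MLRun itself reacts to an unsupported value is not covered here.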

The modifier functions executed by auto-mount can be further configured by specifying their parameters. These are provided in the storage.auto_mount_params configuration parameter, as a string of comma-separated key=value pairs. For example, the following code runs a pvc mount with specific parameters:

mlrun.mlconf.storage.auto_mount_type = "pvc"
pvc_params = {
    "pvc_name": "my_pvc_mount",
    "volume_name": "pvc_volume",
    "volume_mount_path": "/mnt/storage/nfs",
}
mlrun.mlconf.storage.auto_mount_params = ",".join(
    [f"{key}={value}" for key, value in pvc_params.items()]
)
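
To make the string format concrete, the sketch below builds the key=value string and parses it back into a dictionary. Both helpers are hypothetical and written only for illustration; parse_mount_params is not MLRun's own parser. It assumes that neither keys nor values contain commas and that keys contain no "=" character.

```python
def build_mount_params(params: dict) -> str:
    """Join a flat dict into a comma-separated key=value string."""
    return ",".join(f"{key}={value}" for key, value in params.items())


def parse_mount_params(params_str: str) -> dict:
    """Split a comma-separated key=value string back into a dict."""
    parsed = {}
    for pair in params_str.split(","):
        # Split on the first "=" only, so a value may itself contain "=".
        key, value = pair.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


pvc_params = {
    "pvc_name": "my_pvc_mount",
    "volume_name": "pvc_volume",
    "volume_mount_path": "/mnt/storage/nfs",
}
params_str = build_mount_params(pvc_params)
print(params_str)
```

A value that contains a comma would be split into two pairs by this kind of parsing, which is the main reason to prefer an encoded format for anything beyond simple values.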

Alternatively, the parameters can be provided as a base64-encoded JSON object, which can be useful when passing complex parameters or strings that contain special characters:

import base64
import json
pvc_params_str = base64.b64encode(json.dumps(pvc_params).encode()).decode()
mlrun.mlconf.storage.auto_mount_params = pvc_params_str
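
The encoding step can be checked in isolation with a round trip. This is a minimal sketch using only the standard library; encode_mount_params and decode_mount_params are hypothetical helper names, and the decoding side only mirrors the encoding shown above rather than reproducing MLRun's own code. The sample path is made up to include characters that the plain key=value format cannot carry.

```python
import base64
import json


def encode_mount_params(params: dict) -> str:
    """Serialize params to JSON and base64-encode the result as text."""
    return base64.b64encode(json.dumps(params).encode()).decode()


def decode_mount_params(encoded: str) -> dict:
    """Reverse encode_mount_params."""
    return json.loads(base64.b64decode(encoded).decode())


# The comma and "=" in this value would be ambiguous in a key=value string.
params = {"pvc_name": "my_pvc_mount", "volume_mount_path": "/mnt/a,b=c"}
encoded = encode_mount_params(params)
decoded = decode_mount_params(encoded)
```

Because the base64 alphabet contains no commas, the encoded string passes through unchanged wherever a comma-separated value would otherwise be split.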

The following code demonstrates how to configure S3-compatible object storage (for example MinIO on an IG4 system):

import mlrun

mlrun.mlconf.storage.auto_mount_type = "s3"
mlrun.mlconf.storage.auto_mount_params = "secret_name=minio-credentials,endpoint_url=http://minio.iguazio.svc:9000"

secret_env example#

The secret_env auto-mount type is useful when pods need a mix of sensitive credentials (stored in a Kubernetes Secret) and non-sensitive configuration values. A common example is Azure Blob Storage, where the client credentials must be kept secret but the storage account name is not sensitive:

import base64
import json
import mlrun

mlrun.mlconf.storage.auto_mount_type = "secret_env"

# Recommended: base64-encoded JSON — supports colons and special characters in values
params = {
    "secret_name": "azure-credentials",
    "keys": ["AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID"],
    "cleartext_env": {
        "AZURE_STORAGE_ACCOUNT": "mystorageaccount",
    },
}
mlrun.mlconf.storage.auto_mount_params = base64.b64encode(
    json.dumps(params).encode()
).decode()

This mounts AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID from the azure-credentials secret as pod environment variables, and also sets AZURE_STORAGE_ACCOUNT=mystorageaccount as a plain environment variable — all in a single auto-mount call.

The keys parameter is optional. If omitted, all keys in the secret are mounted as environment variables (via Kubernetes envFrom):

params = {
    "secret_name": "azure-credentials",
    "cleartext_env": {
        "AZURE_STORAGE_ACCOUNT": "mystorageaccount",
    },
}
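
For orientation, the two cases correspond to two different fields of a Kubernetes container spec: per-key env entries that use valueFrom.secretKeyRef, and a single envFrom entry that uses secretRef. The sketch below builds those fragments as plain dictionaries. secret_env_fragment is a hypothetical helper for illustration; the exact spec that MLRun generates may differ.

```python
from typing import Optional


def secret_env_fragment(
    secret_name: str,
    keys: Optional[list] = None,
    cleartext_env: Optional[dict] = None,
) -> dict:
    """Build the env/envFrom part of a container spec as a plain dict."""
    fragment = {"env": [], "envFrom": []}
    if keys:
        # One env entry per selected key, each referencing the secret.
        for key in keys:
            fragment["env"].append(
                {
                    "name": key,
                    "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
                }
            )
    else:
        # No keys given: expose every key in the secret.
        fragment["envFrom"].append({"secretRef": {"name": secret_name}})
    # Plain variables are set with literal values.
    for name, value in (cleartext_env or {}).items():
        fragment["env"].append({"name": name, "value": value})
    return fragment
```

Selecting keys explicitly keeps unrelated entries in a shared secret out of the pod environment, while omitting keys is convenient when the secret is dedicated to a single purpose.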

In the plain key=value params format, cleartext_env is given as a semicolon-separated list of key:value items. Colons in values are not supported in this string form; use the base64-JSON format if values may contain colons:

mlrun.mlconf.storage.auto_mount_type = "secret_env"
mlrun.mlconf.storage.auto_mount_params = (
    "secret_name=azure-credentials,"
    "keys=AZURE_CLIENT_ID;AZURE_CLIENT_SECRET;AZURE_TENANT_ID,"
    "cleartext_env=AZURE_STORAGE_ACCOUNT:mystorageaccount"
)
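
The colon restriction is easier to see with a parser in hand. The sketch below is a hypothetical one, not MLRun's implementation: it assumes that keys is split on semicolons and that each cleartext_env item must split into exactly one key and one value on a colon.

```python
def parse_keys(keys_str: str) -> list:
    """Split a semicolon-separated list of secret keys."""
    return [key for key in keys_str.split(";") if key]


def parse_cleartext_env(env_str: str) -> dict:
    """Split semicolon-separated key:value items into a dict."""
    env = {}
    for item in env_str.split(";"):
        parts = item.split(":")
        if len(parts) != 2:
            # A colon inside the value makes the item ambiguous.
            raise ValueError(f"expected key:value, got {item!r}")
        env[parts[0]] = parts[1]
    return env
```

Under these assumptions, an item such as ENDPOINT:https://example.com is rejected because the URL itself contains a colon, which is the situation the base64-JSON format avoids.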

The modifier can also be applied directly to an individual function without using auto-mount:

import mlrun.runtimes.mounts

function.apply(
    mlrun.runtimes.mounts.set_env_vars_from_secret(
        secret_name="azure-credentials",
        keys=["AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID"],
        cleartext_env={"AZURE_STORAGE_ACCOUNT": "mystorageaccount"},
    )
)
