Application runtime#
You can use the ApplicationRuntime() to provide an image that runs on top of your deployed model.
The application runtime deploys a container image (for example, a web application) that is exposed on a specific port, together with a command that runs the HTTP server. The runtime is built on top of Nuclio: it adds the application as a sidecar to a Nuclio function pod, while the function itself acts as a reverse proxy to that application.
You can set an existing image to run in the application, or let the application runtime build the sidecar image for you, by specifying the source code - either baked into the image at build time, or pulled dynamically at runtime.
By default, an API Gateway sits in front of the application and can enforce one of several authentication methods, or none.
Typical use cases are:
Deploy a Vizro dashboard that communicates with an external source (for example, a serving model) to display graphs, data, and inference.
Deploy a model and a UI: the model serving is the backend and the UI is the sidecar.
Deploy a FastAPI web server with an MLRun model. In this case, the Nuclio function is a reverse proxy and the user web app is the sidecar.
Note
When using an application runtime with min-replicas set to 0 (Scale-To-Zero enabled) and direct port access enabled, be aware of a known limitation in metric collection. Scale-to-Zero currently relies on metrics collected only from the main Nuclio container that runs the reverse proxy. Traffic sent directly to the application runtime sidecar bypasses these metrics, so such requests are not counted. As a result, the function may scale down to zero even while it is actively serving traffic through the direct port. In addition, once scaled to zero, the function can only be scaled back up by invoking the main Nuclio port, not the direct port. To avoid unexpected behavior, do not rely on direct port traffic alone when using Scale-To-Zero.
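The limitation can be illustrated with a toy model. The sketch below is not Nuclio's autoscaler; `PodTraffic` and `looks_idle` are hypothetical names, and the only assumption is the one stated in the note: the scale-down decision sees requests that pass through the reverse proxy and nothing else.

```python
from dataclasses import dataclass


@dataclass
class PodTraffic:
    """Requests seen in one observation window (hypothetical model)."""

    via_proxy: int = 0  # requests through the main Nuclio port
    via_direct_port: int = 0  # requests sent straight to the sidecar


def looks_idle(traffic: PodTraffic) -> bool:
    # Only proxy traffic is visible to the metric, mirroring the note above.
    return traffic.via_proxy == 0


busy_on_direct_port = PodTraffic(via_proxy=0, via_direct_port=500)
# The pod is serving 500 direct requests, yet it appears idle.
print(looks_idle(busy_on_direct_port))
```

In this model a single request through the main port is enough to make the pod look active again, which matches the note's advice not to rely on direct port traffic alone.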
Usage examples#
Deploy an application from a single Python file#
This flow supports local development workflows, for example a user working in a Jupyter notebook with a single Python handler file, or with a source that is already saved in the artifact store (for example, S3). It runs a single code file, without having to put it in a shared folder or package it in an archive.
Under the hood, an MLRun init container prepares the application source code on a shared volume within the pod. It pulls and extracts the code at runtime, without requiring an image rebuild for code change. It runs once per pod startup and blocks the sidecar execution until completion. The sidecar then mounts the shared volume and runs the application directly.
By default, the source is extracted to the local path /home/mlrun_code in the pod. You can specify a custom location with set_source_target(), for example if your sidecar image expects code at a specific path, or if the image has environment variables or configs that point to a hardcoded path.
By default, MLRun maintains a single source artifact per function, overwritten on each deployment with the latest source content. The artifact is a system-generated internal mechanism used to deliver the local file to the init container at runtime.
If you want to manage artifact versions manually (for example, to preserve code lineage or pin a specific version), set spec.build.source to an explicit artifact URI:
app.spec.build.source = "store://artifacts/<project>/<artifact-key>:<tag>"
When using an explicit artifact URI, the automatic upload is skipped and it is the user's responsibility to manage artifact versions and redeploy or restart the function to pick up changes.
Note
If, after the first run, you change requirements, commands, or base image, you must pass force_build=True.
# Create an application from a single local Python file
# In this example, my_server.py is a Flask app that listens on port 8050.
application = project.set_function(
    func="./my_server.py",
    name="my-app",
    kind="application",
    requirements=["flask"],
)
application.set_internal_application_port(8050)
application.spec.command = "python"
application.spec.args = ["my-app-source.py"]
# The local file is automatically uploaded as an artifact and loaded by an init container at pod startup
application.deploy(with_mlrun=False)
application.invoke("/")
Pull at runtime from Git and source archives#
With pull-at-runtime, the application source is either in a Git repository or a source archive. This approach is intended primarily for development and iterative workflows, allowing fast redeployments when only the source code changes, without rebuilding the sidecar image. Be aware that if the source changes between pod startups, different pods may run different code versions.
The source code is specified by one of:
A remote archive URL (.zip or .tar.gz)
A Git URL (starts with git://)
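The two source forms above can be told apart from the string alone. The helper below is a minimal sketch of that distinction; `classify_source` is a hypothetical name, not an MLRun function, and MLRun's own validation may accept or reject more than this.

```python
def classify_source(source: str) -> str:
    """Return "git" or "archive" for a source string (hypothetical helper)."""
    if source.startswith("git://"):
        return "git"
    if source.endswith((".zip", ".tar.gz")):
        return "archive"
    raise ValueError(f"unsupported source: {source!r}")


print(classify_source("git://github.com/org/repo.git#main"))
print(classify_source("s3://my-bucket/my-source.tar.gz"))
```

The Git check comes first so that a Git URL is never mistaken for an archive because of its suffix.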
If your source code is constant, set pull_at_runtime=False when you deploy the function: MLRun builds the sidecar image with the source baked in, so no init container is needed.
If you want MLRun to fetch the code dynamically at pod startup (without rebuilding the image), set pull_at_runtime=True when you deploy the function.
By default, the source is extracted to the local path /home/mlrun_code in the pod. You can specify a custom location with mlrun.runtimes.ApplicationRuntime.set_source_target(), for example if your sidecar image expects code at a specific path, or if the image has environment variables or configs that point to a hardcoded path.
Note
If, after the first run, you change requirements, commands, or base image, you must pass force_build=True.
Deploy from a Git repository
# Deploy an application from a Git repository.
# The repo is cloned by an init container at pod startup — no image rebuild on code changes.
application = project.set_function(
    name="my-git-app",
    kind="application",
    image="python:3.11",
)
application.set_internal_application_port(8050)
application.spec.command = "python"
application.spec.args = ["-m", "http.server", "8050"]
application.with_source_archive(
    source="git://github.com/org/repo.git#main",
    pull_at_runtime=True,
)
application.deploy(with_mlrun=False)
application.invoke("/")
Deploy from a remote archive
# Deploy an application from a remote archive (.zip or .tar.gz).
# The archive is extracted by an init container at pod startup — no image rebuild on code changes.
application = project.set_function(
    name="my-archive-app",
    kind="application",
    image="python:3.11",
)
application.set_internal_application_port(8050)
application.spec.command = "python"
application.spec.args = ["-m", "http.server", "8050"]
application.with_source_archive(
    source="s3://my-bucket/my-source.tar.gz",
    pull_at_runtime=True,
)
application.deploy(with_mlrun=False)
application.invoke("/")
Deploy a Vizro dashboard from a pre-built image#
# Create an application runtime (with pre-built image)
application = project.set_function(
    name="my-vizro-dashboard", kind="application", image="repo/my-vizro-image:latest"
)
# Set the port that the side-car listens on
application.set_internal_application_port(port=8050)
# Deploy
application.deploy()
Deploy a Vizro dashboard from a source archive or git#
# Specify the source to be loaded at build-time or run-time
application = project.set_function(
    name="my-vizro-dashboard", kind="application", requirements=["vizro"]
)
application.set_internal_application_port(8050)
application.spec.command = "gunicorn"
application.spec.args = [
    "<my-app>:<my-server>",
    "--bind",
    "0.0.0.0:8050",
]
# Provide code artifacts
application.with_source_archive(
    "git://github.com/org/repo#my-branch", pull_at_runtime=False
)
# Build the application image via MLRun and deploy the Nuclio function
# Optionally add mlrun
application.deploy(with_mlrun=False)
An already built reverse proxy image is reused when:
Redeploying a function that built the reverse proxy once and has application.status.container_image enriched.
The image was already built manually with mlrun.runtimes.ApplicationRuntime.deploy_reverse_proxy_image().
Either option helps minimize redundant builds and streamline the development cycle.
Authentication modes#
An application runtime can be accessed through an API Gateway that supports various authentication methods.
The default authentication mode is none for open source and access-key for the Iguazio platform.
The different authentication modes can be configured as follows (see APIGateway() for further information):
from mlrun.common.schemas.api_gateway import APIGatewayAuthenticationMode
# Unless disabled, the default API gateway is created when the application is deployed
application.deploy(create_default_api_gateway=False)
# Create API gateway without authentication
application.create_api_gateway(
    authentication_mode=APIGatewayAuthenticationMode.none,
)
# Basic authentication mode.
# The application can be invoked only with the provided credentials
application.create_api_gateway(
    authentication_mode=APIGatewayAuthenticationMode.basic,
    authentication_creds=("my-username", "my-password"),
)
# Access-key authentication mode.
# The application can be invoked only with a valid session
application.create_api_gateway(
    authentication_mode=APIGatewayAuthenticationMode.access_key,
)
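For a client that calls a gateway in basic mode from outside MLRun, the credentials travel in an `Authorization` header. The sketch below builds that header with the standard library only. It assumes the gateway follows the standard HTTP Basic scheme; `basic_auth_header` is a hypothetical helper name, and the credentials are the placeholder values from the example above.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header defined by HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Pass these headers with any HTTP client when calling the gateway URL.
headers = basic_auth_header("my-username", "my-password")
```

Base64 is an encoding, not encryption, so basic mode should be used only over HTTPS.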
API gateway configurations#
If you want to deploy an application with the default API Gateway, simply run app.deploy().
If you don’t need the default API Gateway, or if you prefer to create your own custom one, set create_default_api_gateway=False when calling deploy() and then manually create a custom API Gateway.
If you specify a custom port, you must set direct_port_access=True; otherwise, the value is ignored and the internal application port is used instead.
Note
Setting direct_port_access=True bypasses the reverse proxy and exposes the pod sidecar directly. In addition, the Nuclio configuration (scale to zero, and so on) is not applied.
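The custom-port rule can be written down as a small decision function. This is a toy model of the behavior described above, not MLRun code; `effective_gateway_port` and its parameters are hypothetical names chosen for illustration.

```python
from typing import Optional


def effective_gateway_port(
    internal_port: int, port: Optional[int], direct_port_access: bool
) -> int:
    """Port a gateway would target under the rule described above (toy model)."""
    if port is not None and direct_port_access:
        return port
    # A custom port without direct_port_access=True is ignored.
    return internal_port


print(effective_gateway_port(8010, port=8020, direct_port_access=True))
print(effective_gateway_port(8010, port=8020, direct_port_access=False))
```

The second call shows the case that is easy to miss: the custom port is silently dropped and the internal application port is used.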
There are additional parameters that can be configured when creating an API gateway:
application.create_api_gateway(
    # The name of the API gateway
    name="my-api-gateway",
    # Optional path of the API gateway, default value is "/". The given path should be supported by the deployed application
    path="/",
    # Set to True to allow direct port access to the application sidecar
    direct_port_access=False,
    # Set to True to force SSL redirect, False to disable. Defaults to mlrun.mlconf.force_api_gateway_ssl_redirect()
    ssl_redirect=True,
    # Set the API gateway as the default for the application (`status.api_gateway`)
    set_as_default=False,
)
Expose multiple ports#
Note
Requires Nuclio 1.14.14 or higher.
By default, MLRun creates an API gateway for each application that forwards the events to the internal port, in this example 8010. Some applications use different ports, for example, for the application API and the dashboard. This example uses the default port 8010 for the application and adds port 8020 for other uses.
from mlrun.common.schemas.api_gateway import APIGatewayAuthenticationMode

# Declare both ports on the sidecar
app.with_sidecar(ports=[8010, 8020])
# Set the internal (main) port
app.set_internal_application_port(8010)
# Add an API gateway for the second port.
# A custom port takes effect only with direct_port_access=True.
url_direct = app.create_api_gateway(
    port=8020,
    name="port-8020",
    direct_port_access=True,
    authentication_mode=APIGatewayAuthenticationMode.none,
)
Application runtime in a dark environment#
To use application runtime in a dark (air-gapped) environment, you need to build the reverse proxy image and push it to a private registry, following the steps below:
# 1. Create the reverse proxy image in a non air-gapped system
import mlrun.runtimes
mlrun.runtimes.ApplicationRuntime.deploy_reverse_proxy_image()
# 2. The created image name is saved on the ApplicationRuntime class:
mlrun.runtimes.ApplicationRuntime.reverse_proxy_image
# 3. Push the created image to the system’s docker registry
# 4. On the air-gapped environment, set the image on the class property:
mlrun.runtimes.ApplicationRuntime.reverse_proxy_image = (
    "registry/reverse-proxy-image:<tag>"
)
# 5. When creating application functions, this image will be used as the reverse proxy image, and it won’t be built again.
Application and serving function integration#
This example demonstrates deploying a serving function and using it with the application.
Serving creation: First, create the model file and save it as a .py file.
%%writefile add_ten_model.py
from mlrun.serving import V2ModelServer

class AddTenModel(V2ModelServer):
    def load(self):
        # No actual model to load, just a demo
        pass

    def predict(self, request):
        # Take the first input and add ten to it
        value = request["inputs"][0]
        result = value + 10
        return {"outputs": [result]}
Now, deploy this model as a serving function:
import mlrun
project_name = "app-demo-flask"
model_name = "add-ten-model"
model_path = "add_ten_model.py"
project = mlrun.get_or_create_project(project_name)
# Create the serving function
function = project.set_function(
    name="add-ten-serving",
    kind="serving",
    func=model_path,
)
project.save()
# Add the model to the function (even though there's no real model in this case)
function.add_model(model_name, model_path=model_path, class_name="AddTenModel")
function.deploy()
# An example of invoke:
function.invoke(f"/v2/models/{model_name}/infer", {"inputs": [20]})["outputs"][
    "outputs"
]
Application creation: First, create the Flask server application. The application exposes two endpoints that check access to the serving function, one through MLRun and one through its external URL:
%%writefile flask_app_example.py
from flask import Flask
import requests
import mlrun

app = Flask(__name__)

@app.route("/internal")
def internal():
    # Test access to the serving function with MLRun.
    project_name = "app-demo-flask"
    project = mlrun.get_or_create_project(project_name)
    function = project.get_function("add-ten-serving", ignore_cache=True)
    response = function.invoke("/v2/models/add-ten-model/infer", {"inputs": [20]})
    output = response["outputs"]["outputs"]
    return {"result": output}

@app.route("/external")
def external():
    # Test access to the serving function without MLRun (externally).
    project_name = "app-demo-flask"
    project = mlrun.get_or_create_project(project_name)
    function = project.get_function("add-ten-serving", ignore_cache=True)
    url = f"https://{function.status.external_invocation_urls[0]}"
    response = requests.post(url, json={"inputs": [50]}).json()
    output = response["outputs"]["outputs"]
    return {"result": output}
Archive the application code into a .tar.gz file:
!tar -czvf archive.tar.gz flask_app_example.py
Now, deploy this flask application:
import mlrun
project = mlrun.get_or_create_project("app-demo-flask")
# Create a demo secret for testing
project.set_secrets(secrets={"secret-example": "project_secret_example"})
# Specify source to be loaded on build time
# The image or the requirements should include mlrun, flask and gunicorn.
application = project.set_function(
    name="flask_app",
    kind="application",
    requirements_file="/your/path/requirements.txt",  # if needed
)
# Provide code artifacts
application.with_source_archive(
    "v3io:///your/path/archive.tar.gz", pull_at_runtime=False
)
application.set_internal_application_port(5000)
application.spec.command = "gunicorn"
application.spec.args = [
    "flask_app_example:app",
    "--bind",
    "127.0.0.1:5000",
    "--log-level",
    "debug",
]
application.deploy(with_mlrun=True)
# Test the deployment:
application.invoke("/internal").json()
application.invoke("/external").json()
Configure sidecar Kubernetes probes#
MLRun supports liveness, readiness, and startup probes for the sidecar container, as described in the Kubernetes Configure Probes documentation (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes). You need to understand your app so that you can identify and configure the specific Kubernetes features required for your use case.
Configure probes for the sidecar container using the set_probe() and delete_probe() methods.
Set probes#
To configure probes, use the set_probe() method.
Basic HTTP probe example:
application.set_probe(
    type="liveness",
    http_path="/health",
    http_port=8080,
    http_scheme="HTTPS",
    initial_delay_seconds=15,
    period_seconds=10,
    failure_threshold=3,
    timeout_seconds=5,
)
For probe types not covered by the simple parameters (e.g., TCP socket, exec, or gRPC probes), you can use the config parameter to provide a full Kubernetes probe configuration:
application.set_probe(
    type="liveness",
    initial_delay_seconds=15,
    period_seconds=10,
    config={
        "tcpSocket": {"port": 8080},
        "terminationGracePeriodSeconds": 90,
    },
)
Usage#
Parameter precedence: When both config and explicit parameters are provided, the explicit parameters override values in the config. Example:
application.set_probe(
    type="startup",
    initial_delay_seconds=15,  # This will be used (not 20 from config)
    config={
        "tcpSocket": {"port": 8080},
        "initialDelaySeconds": 20,  # Overridden by parameter above
        "periodSeconds": 30,
    },
)
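The precedence rule amounts to overlaying the explicit keyword values on top of the raw config. The sketch below is a standalone illustration of that overlay, not MLRun's implementation; `merge_probe` and `_FIELD_NAMES` are hypothetical names, and the snake_case to camelCase mapping covers only the timing parameters shown in this section.

```python
from typing import Any, Dict, Optional

# snake_case keyword -> Kubernetes probe field (illustrative subset)
_FIELD_NAMES = {
    "initial_delay_seconds": "initialDelaySeconds",
    "period_seconds": "periodSeconds",
    "failure_threshold": "failureThreshold",
    "timeout_seconds": "timeoutSeconds",
}


def merge_probe(
    config: Optional[Dict[str, Any]] = None, **explicit: Any
) -> Dict[str, Any]:
    """Overlay explicit keyword values on top of a raw probe config."""
    merged = dict(config or {})
    for keyword, value in explicit.items():
        if value is not None:
            merged[_FIELD_NAMES[keyword]] = value
    return merged


probe = merge_probe(
    {"tcpSocket": {"port": 8080}, "initialDelaySeconds": 20, "periodSeconds": 30},
    initial_delay_seconds=15,
)
print(probe)
```

Here `initialDelaySeconds` ends up as 15 while `periodSeconds` keeps the value 30 from the config, matching the example above.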
Port enrichment: If you provide http_path without http_port, the port is automatically enriched from the internal_application_port just before deployment. Example:
application.set_internal_application_port(8050)
application.set_probe(
    type="readiness",
    http_path="/health",
    # Port will be enriched to 8050 before deployment
)
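Port enrichment can be pictured as a late fill-in step on the probe's `httpGet` handler. The following is a minimal sketch under that assumption; `enrich_http_port` is a hypothetical helper that operates on a plain Kubernetes-style dictionary and does not reflect how MLRun stores probes internally.

```python
from typing import Any, Dict


def enrich_http_port(probe: Dict[str, Any], internal_port: int) -> Dict[str, Any]:
    """Fill httpGet.port from the internal port when it was left unset."""
    http_get = probe.get("httpGet")
    if http_get is None or "port" in http_get:
        # Not an HTTP probe, or the port was given explicitly: leave as is.
        return probe
    return {**probe, "httpGet": {**http_get, "port": internal_port}}


readiness = enrich_http_port({"httpGet": {"path": "/health"}}, internal_port=8050)
print(readiness)
```

An explicitly set port is never overwritten, so a probe can still target a port other than the internal application port.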
Set override: Calling set_probe repeatedly with the same probe type replaces the existing configuration.
application.set_probe(
    type="liveness",
    initial_delay_seconds=15,
    http_path="/health",
)
application.set_probe(type="liveness", http_path="/health", period_seconds=10)
# The liveness probe now has only http_path="/health" and period_seconds=10;
# initial_delay_seconds=15 from the first call is discarded
Health checks: Each probe must define exactly one health check handler (httpGet, tcpSocket, grpc or exec) as required by Kubernetes; this rule is enforced defensively to ensure configuration compliance.
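The single-handler rule is easy to check on a Kubernetes-style probe dictionary. The sketch below shows one way to express it; `validate_single_handler` is a hypothetical name, and the actual MLRun check may differ in its error type and message.

```python
from typing import Any, Dict

HANDLER_KEYS = ("httpGet", "tcpSocket", "grpc", "exec")


def validate_single_handler(probe: Dict[str, Any]) -> str:
    """Return the probe's handler key, or raise if there is not exactly one."""
    found = [key for key in HANDLER_KEYS if key in probe]
    if len(found) != 1:
        raise ValueError(f"expected exactly one handler, found {found or 'none'}")
    return found[0]


print(validate_single_handler({"tcpSocket": {"port": 8080}, "periodSeconds": 10}))
```

A probe that defines both `httpGet` and `tcpSocket`, or that defines only timing fields, raises `ValueError` in this sketch.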
Delete probes#
To remove a probe configuration, use the delete_probe() method:
application.delete_probe(type="readiness")