Build function image#
As discussed in Images and their usage in MLRun, MLRun provides pre-built images which contain the components necessary to execute an MLRun runtime. In some cases, however, custom images need to be created. This page details this process and the available options.
When is a build required?#
In many cases an MLRun runtime can be executed without building an image. This is the case when the basic MLRun images fulfill all the requirements of the code. A build is required if any of the following is true:
The code uses additional Python packages, OS packages, scripts or other configurations that need to be applied
The code uses different base-images or different versions of MLRun images than provided by default
The code runs nuclio functions, which are packaged as images (the build is triggered by MLRun and executed by nuclio)
The build process in MLRun is based on Kaniko and is automated by MLRun: MLRun generates the dockerfile for the build and configures Kaniko with the parameters needed for the build.
Building images is done through functions provided by the MlrunProject class. By using project functions, the same process is used to build and deploy a stand-alone function or functions serving as steps in a pipeline.
Automatically building images#
MLRun can auto-detect when a function image must be built before the function runs. The following example requires an image build:
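project = mlrun.new_project(project_name, "./proj")
# Placeholder function definition: the file name, function name, and package are illustrative.
project.set_function("train_code.py", name="trainer", kind="job", image="mlrun/mlrun", requirements=["pandas"])
# auto_build will trigger building the image before running,
# due to the additional requirements.
project.run_function("trainer", auto_build=True)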
The auto_build option is only suitable when the build configuration does not change between runs of the runtime. For example, if new requirements are added during development, the auto_build parameter should not be used, and a manual build is needed to re-trigger a build of the image.
In the example above, the requirements parameter specifies a list of additional Python packages required by the code. This option directly affects the image build process: each requirement is installed using pip as part of the docker-build process. The requirements parameter can also contain a path to a requirements file, which makes it easier to reuse an existing configuration than to list packages individually.
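As a rough sketch of the requirements-file variant described above, the helper below registers a function whose requirements argument is a file path rather than a package list. The helper name, file names, and function name are hypothetical, and some MLRun releases may expect the path in a separate argument, so treat the exact call as an assumption and check the API reference for your version.

```python
def register_with_requirements_file(project, requirements_path="requirements.txt"):
    """Register a job whose extra packages come from a requirements file.

    `project` is assumed to be an MlrunProject; every name here is a placeholder.
    """
    return project.set_function(
        "train_code.py",
        name="trainer",
        kind="job",
        image="mlrun/mlrun",
        # A path instead of a list, following the text above
        # (assumed to be accepted as-is by set_function).
        requirements=requirements_path,
    )
```

Calling `register_with_requirements_file(project)` only registers the function; the image is still built later, either automatically or manually.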
Manually building an image#
To manually build an image, use the build_function() function, which provides multiple options that control and configure the build process.
Specifying base image#
To use an existing image as the base image for the build, set the image name in the base_image parameter. Note that this image serves as the base (dockerfile FROM property), and should not be confused with the resulting image name, as specified in the image parameter.
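A minimal sketch of such a build, assuming base_image and image are the relevant keyword arguments of build_function(). The helper name, function name, and both image names are placeholders.

```python
def build_with_base_image(project, function_name="trainer"):
    """Build `function_name` on top of a custom base image.

    `project` is assumed to be an MlrunProject.
    """
    return project.build_function(
        function_name,
        # Image used for the FROM line of the generated dockerfile (placeholder).
        base_image="myrepo/my_base_image:latest",
        # Name under which the resulting image is pushed (placeholder).
        image="myrepo/trainer_image:latest",
    )
```

Keeping the two names distinct avoids overwriting the base image with the build result.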
Running commands#
To run arbitrary commands during the image build, pass them in the commands parameter of build_function(). For example:
github_repo = "myusername/myrepo.git@mybranch"
# "trainer" is a placeholder function name.
project.build_function("trainer", commands=[
    "pip install git+https://github.com/" + github_repo,
    "mkdir -p /some/path && chmod 0777 /some/path",
])
These commands are added as RUN operations to the dockerfile generating the image.
MLRun package deployment#
The with_mlrun and mlrun_version_specifier parameters control the inclusion of the MLRun package in the build process. Depending on the base image used for the build, the MLRun package may already be available, in which case use with_mlrun=False. If not specified, MLRun attempts to detect this situation: if the image used is one of the default MLRun images released with MLRun, with_mlrun is automatically set to False.
If the code execution requires a different version of MLRun than the one used to deploy the function, set mlrun_version_specifier to point at the specific version needed. This uses the published MLRun images of the specified version instead.
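The two options could be used roughly as follows. This is a sketch that assumes both are keyword arguments of build_function(); the helper names are hypothetical, and the accepted format of the version specifier is not stated here, so it is left as a caller-supplied value.

```python
def build_without_mlrun_install(project, function_name, base_image):
    """Skip installing MLRun; `base_image` is assumed to already contain it."""
    return project.build_function(
        function_name,
        base_image=base_image,
        with_mlrun=False,
    )


def build_with_specific_mlrun(project, function_name, version_specifier):
    """Request a specific MLRun version for the built image.

    The format of `version_specifier` is an assumption left to the caller;
    consult the build_function() reference for the expected form.
    """
    return project.build_function(
        function_name,
        mlrun_version_specifier=version_specifier,
    )
```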
Working with code repository#
As the code matures and evolves, it is usually stored in a git code repository.
When the MLRun project is associated with a git repo (see Create, save, and use projects for details), functions can be added by calling set_function() and setting with_repo=True. This indicates that the code of the function should be retrieved from the project code repository.
In this case, the entire code repository will be retrieved from git as part of the image-building process, and cloned into the built image. This is recommended when the function relies on code spread across multiple files and also is usually preferred for production code, since it means that the code of the function is stable, and further modifications to the code will not cause instability in deployed images.
During the development phase it may be preferable to retrieve the code at runtime, rather than rebuild the function image every time the code changes. To enable this, use set_source(), which gets a path to the source (a git repository or a tar or zip file), and set pull_at_runtime=True.
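Both modes can be sketched as below, assuming with_repo is a keyword argument of set_function() and pull_at_runtime is a keyword argument of set_source(). The helper names, file name, and function name are hypothetical.

```python
def register_function_from_repo(project):
    """Register a function whose code is cloned into the image at build time."""
    return project.set_function(
        "train_code.py",  # path inside the project repository (placeholder)
        name="trainer",
        kind="job",
        image="mlrun/mlrun",
        with_repo=True,
    )


def pull_source_at_runtime(project, source):
    """Point the project at `source` and fetch it when the function runs.

    `source` is a git repository URL or a tar/zip archive path supplied by the caller.
    """
    return project.set_source(source, pull_at_runtime=True)
```

The first form favors stable production images; the second avoids rebuilding during development.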
Using a private Docker registry#
By default, images are pushed to the registry configured during MLRun deployment, using the configured registry credentials.
To push resulting images to a different registry, specify the registry URL in the image parameter. If the registry requires credentials, create a k8s secret containing these credentials, and pass its name in the secret_name parameter.
When using ECR as registry, MLRun uses Kaniko's ECR credentials helper, in which case the secret provided should contain AWS credentials needed to create ECR repositories, as described here. MLRun detects automatically that the registry is an ECR registry based on its URL and configures Kaniko to use the ECR helper. For example:
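# AWS credentials stored in a k8s secret -
# kubectl create secret generic ecr-credentials --from-file=<path to .aws/credentials>
# The function name and image path below are placeholders.
project.build_function("trainer", image="<aws_account_id>.dkr.ecr.<region>.amazonaws.com/trainer_image:latest", secret_name="ecr-credentials")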
When using an ECR registry and not providing a secret name, MLRun assumes that an EC2 instance role is used to authorize access to ECR. In this case MLRun clears out AWS credentials provided by project-secrets or environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) from the Kaniko pod used for building the image. Otherwise Kaniko would attempt to use these credentials for ECR access instead of using the instance role. This means it's not possible to build an image with both ECR access via instance role and S3 access using a different set of credentials. To build this image, the instance role that has access to ECR must have the permissions required to access S3.
Using self-signed registry#
If you need to build your function and push the resulting container image to an external Docker registry that uses a self-signed SSL certificate, you can use Kaniko with the --skip-tls-verify flag. With this flag, Kaniko skips SSL certificate verification while pulling base images and pushing the final built image to the registry over HTTPS.
Caution: Using the --skip-tls-verify flag poses security risks since it bypasses SSL certificate validation. Only use this flag in trusted environments or with private registries where you are confident in the security of the network connections.
To use this flag, pass it in the extra_args parameter of build_function().
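A minimal sketch of passing the flag, assuming extra_args is forwarded to Kaniko as a command-line string. The helper name is hypothetical, and the function name and target image are supplied by the caller.

```python
def build_for_self_signed_registry(project, function_name, image):
    """Build and push to a registry whose certificate cannot be verified.

    Only use this against registries on a trusted network.
    """
    return project.build_function(
        function_name,
        image=image,
        # Disables TLS certificate verification for both pull and push.
        extra_args="--skip-tls-verify",
    )
```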
Build environment variables#
You can pass environment variables that are set in the Kaniko pod that executes the build. This is useful for passing information the build process needs. The variables are passed as a dictionary in the builder_env parameter.
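For instance, a git token could be forwarded to the build pod roughly like this. The sketch assumes builder_env is a keyword argument of build_function(); the helper name and the choice to read GIT_TOKEN from the local environment are illustrative.

```python
import os


def build_with_git_token(project, function_name="trainer"):
    """Forward the local GIT_TOKEN value to the Kaniko build pod."""
    builder_env = {
        # Falls back to an empty string if the variable is not set locally.
        "GIT_TOKEN": os.environ.get("GIT_TOKEN", ""),
    }
    return project.build_function(function_name, builder_env=builder_env)
```

Reading the token from the environment keeps it out of source files.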
Extra arguments#
You can also pass custom arguments and flags to Kaniko. The extra_args parameter can be used in build_function() or during the deployment of the function, and provides a way to fine-tune the Kaniko build process according to your specific needs.
Provide extra_args as a string in the format of a CLI command line, just as you would when using Kaniko directly, for example:
project.build_function("trainer", extra_args="--build-arg GIT_TOKEN=token --skip-tls-verify")
Note that when building an image in MLRun, project secrets are automatically passed to the builder pod as environment variables whose name is the secret key.
Deploying nuclio functions#
When using nuclio functions, the image build is done by nuclio as part of the deployment of the function. Most of the configurations mentioned on this page are available for nuclio functions as well. To deploy a nuclio function, use deploy_function() instead of using build_function() and run_function().
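A short sketch of that flow, assuming a function of kind "nuclio" registered on the project. The helper name, file name, and function name are placeholders.

```python
def deploy_nuclio_function(project):
    """Register a nuclio function and deploy it; nuclio performs the image build."""
    project.set_function(
        "serving_code.py",
        name="serving",
        kind="nuclio",
        image="mlrun/mlrun",
    )
    # No separate build step: deployment triggers the build.
    return project.deploy_function("serving")
```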
Creating default Spark runtime images#
When using Spark to execute code, either using a Spark service (remote-spark) or the Spark operator, an image is required that contains both Spark binaries and dependencies, and MLRun code and dependencies. This image is used in the following scenarios:
For remote-spark, the image is used to run the initial MLRun code which will submit the Spark job using the remote Spark service
For Spark operator, the image is used for both the driver and the executor pods used to execute the Spark job
This image needs to be created any time a new version of Spark or MLRun is being used, to ensure that jobs are executed with the correct versions of both products.
To prepare this image, MLRun provides the following facilities:
# For remote Spark
from mlrun.runtimes import RemoteSparkRuntime
RemoteSparkRuntime.deploy_default_image()
# For Spark operator
from mlrun.runtimes import Spark3Runtime
Spark3Runtime.deploy_default_image()