Build function image#
As discussed in Images and their usage in MLRun, MLRun provides pre-built images which contain the components necessary to execute an MLRun runtime. In some cases, however, custom images need to be created. This page details this process and the available options.
When is a build required?#
In many cases an MLRun runtime can be executed without having to build an image. This will be true when the basic MLRun images fulfill all the requirements for the code to execute. It is required to build an image if one of the following is true:
The code uses additional Python packages, OS packages, scripts or other configurations that need to be applied
The code uses different base-images or different versions of MLRun images than provided by default
Executed source code has changed, and the image has the code packaged in it - see here for more details on source code, and using
with_code()
to avoid re-building the image when the code has changedThe code runs nuclio functions, which are packaged as images (the build is triggered by MLRun and executed by nuclio)
The build process in MLRun is based on Kaniko and automated by MLRun - MLRun generates the dockerfile for the build process, and configures Kaniko with parameters needed for the build.
Building images is done through functions provided by the MlrunProject
class. By using
project functions, the same process is used to build and deploy a stand-alone function or functions serving as steps
in a pipeline.
Automatically building images#
MLRun has the capability to auto-detect when a function image needs to first be built. Following is an example that will require building of the image:
project = mlrun.new_project(project_name, "./proj")
project.set_function(
"train_code.py",
name="trainer",
kind="job",
image="mlrun/mlrun",
handler="train_func",
requirements=["pandas"]
)
# auto_build will trigger building the image before running,
# due to the additional requirements.
project.run_function("trainer", auto_build=True)
Using the auto_build
option is only suitable when the build configuration does not change between runs of the
runtime. For example, if during the development process new requirements were added, the auto_build
parameter should
not be used, and manual build is needed to re-trigger a build of the image.
In the example above, the requirements
parameter was used to specify a list of additional Python packages required by
the code. This option directly affects the image build process - each requirement is installed using pip
as
part of the docker-build process. The requirements
parameter can also contain a path to a requirements file, making
it easier to reuse an existing configuration rather than specify a list of packages.
Manually building an image#
To manually build an image, use the build_function()
function, which provides multiple
options that control and configure the build process.
Specifying base image#
To use an existing image as the base image for building the image, set the image name in the base_image
parameter.
Note that this image serves as the base (dockerfile FROM
property), and should not to be confused with the
resulting image name, as specified in the image
parameter.
project.build_function(
"trainer",
base_image="myrepo/my_base_image:latest",
)
Running commands#
To run arbitrary commands during the image build, pass them in the commands
parameter of
build_function()
. For example:
github_repo = "myusername/myrepo.git@mybranch"
project.build_function(
"trainer",
base_image="myrepo/base_image:latest",
commands= [
"pip install git+https://github.com/" + github_repo,
"mkdir -p /some/path && chmod 0777 /some/path",
]
)
These commands are added as RUN
operations to the dockerfile generating the image.
MLRun package deployment#
The with_mlrun
and mlrun_version_specifier
parameters allow control over the inclusion of the MLRun package in the
build process. Depending on the base-image used for the build, the MLRun package may already be available in which
case use with_mlrun=False
. If not specified, MLRun will attempt to detect this situation - if the image used is one
of the default MLRun images released with MLRun, with_mlrun
is automatically set to False
.
If the code execution requires a different version of MLRun than the one used to deploy the function,
set the mlrun_version_specifier
to point at the specific version needed. This uses the published MLRun images
of the specified version instead.
For example:
project.build_function(
"trainer",
with_mlrun=True,
mlrun_version_specifier="1.0.0"
)
Working with code repository#
As the code matures and evolves, the code will usually be stored in a git code repository.
When the MLRun project is associated with a git repo (see Create, save, and use projects for details), functions can be added
by calling set_function()
and setting with_repo=True
. This indicates that the
code of the function should be retrieved from the project code repository.
In this case, the entire code repository will be retrieved from git as part of the image-building process, and cloned into the built image. This is recommended when the function relies on code spread across multiple files and also is usually preferred for production code, since it means that the code of the function is stable, and further modifications to the code will not cause instability in deployed images.
During the development phase it may be desired to retrieve the code in runtime, rather than re-build the function
image every time the code changes. To enable this, use set_source()
which
gets a path to the source (can be a git repository or a tar or zip file) and set pull_at_runtime=True
.
Using a private Docker registry#
By default, images are pushed to the registry configured during MLRun deployment, using the configured registry credentials.
To push resulting images to a different registry, specify the registry URL in the image
parameter. If
the registry requires credentials, create a k8s secret containing these credentials, and pass its name in the
secret_name
parameter.
When using ECR as registry, MLRun uses Kaniko’s ECR credentials helper, in which case the secret provided should contain AWS credentials needed to create ECR repositories, as described here. MLRun detects automatically that the registry is an ECR registry based on its URL and configures Kaniko to use the ECR helper. For example:
# AWS credentials stored in a k8s secret -
# kubectl create secret generic ecr-credentials --from-file=<path to .aws/credentials>
project.build_function(
"trainer",
image="<aws_account_id>.dkr.ecr.us-east-2.amazonaws.com/myrepo/image:v1",
secret_name="ecr-credentials",
)
Using self-signed registry#
If you need to build your function and push the resulting container image to an external Docker registry that uses a self-signed SSL certificate,
you can use Kaniko with the --skip-tls-verify
flag.
When using this flag, Kaniko ignores the SSL certificate verification while pulling base images and/or pushing the final built image to the registry over HTTPS.
Caution: Using the --skip-tls-verify
flag poses security risks since it bypasses SSL certificate validation.
Only use this flag in trusted environments or with private registries where you are confident in the security of the network connections.
To use this flag, pass it in the extra_args parameter, for example:
project.build_function(
...
extra_args="--skip-tls-verify",
)
Build environment variables#
It is possible to pass environment variables that will be set in the Kaniko pod that executes the build. This
may be useful to pass important information needed for the build process. The variables are passed as a dictionary in
the builder_env
parameter, for example:
project.build_function(
...
builder_env={"GIT_TOKEN": token},
)
Extra arguments#
It is also possible to pass custom arguments and flags to Kaniko.
The extra_args
parameter can be utilized in build_image()
,
build_function()
, or during the deployment of the function. It provides a way to fine-tune
the Kaniko build process according to your specific needs.
You can provide the extra_args
as a string in the format of a CLI command line, just as you would when using
Kaniko directly, for example:
project.build_function(
...
extra_args="--build arg GIT_TOKEN=token --skip-tls-verify",
)
Note that when building an image in MLRun, project secrets are automatically passed to the builder pod as environment variables whose name is the secret key.
Deploying nuclio functions#
When using nuclio functions, the image build process is done by nuclio as part of the deployment of the function.
Most of the configurations mentioned in this page are available for nuclio functions as well. To deploy a nuclio
function, use deploy_function()
instead of using
build_function()
and run_function()
.
Creating default Spark runtime images#
When using Spark to execute code, either using a Spark service (remote-spark) or the Spark operator, an image is required that contains both Spark binaries and dependencies, and MLRun code and dependencies. This image is used in the following scenarios:
For remote-spark, the image is used to run the initial MLRun code which will submit the Spark job using the remote Spark service
For Spark operator, the image is used for both the driver and the executor pods used to execute the Spark job
This image needs to be created any time a new version of Spark or MLRun is being used, to ensure that jobs are executed with the correct versions of both products.
To prepare this image, MLRun provides the following facilities:
# For remote Spark
from mlrun.runtimes import RemoteSparkRuntime
RemoteSparkRuntime.deploy_default_image()
# For Spark operator
from mlrun.runtimes import Spark3Runtime
Spark3Runtime.deploy_default_image()