Custom packager tutorials overview#

Learn when custom packagers are required, and how to create them.

When to write a custom packager#

Write a custom packager when:

  • Your type isn't handled by a built-in packager — for example, a PIL Image, a LangChain prompt template, or a domain-specific data class

  • You want human-readable serialization — save as JSON, PNG, CSV, etc. instead of an opaque pickle file

  • You need bundling support — your type is a collection that should be decomposed into individual artifacts when unbundled with "*key"

Choosing a base class: DefaultPackager vs Packager#

MLRun provides two base classes for custom packagers. In most cases you should use DefaultPackager.

Packager (full control)#

The base Packager class gives you complete control. You override pack() and unpack() directly and manage artifact-type routing, validation, and fallback behavior yourself. Use this only when DefaultPackager's convention-based approach doesn't fit your needs.

The four patterns#

Custom packagers follow one of four patterns, depending on what your type needs:

| Pattern | When to use | What to implement |
|---|---|---|
| Pack-only | The type is produced as output but never consumed as a typed input. The framework automatically excludes pack-only artifact types from unpacking validation. | pack_<artifact_type> methods only. |
| Unpack-only | Legacy/migration support: reading artifacts from an older format while new writes use a different artifact type. | unpack_<artifact_type> methods only. |
| Round-trip (pack + unpack) | The type needs to be saved and loaded back in a later function. | Both pack_<artifact_type> and unpack_<artifact_type> methods. |
| Bundling & unbundling | The type is a collection that should decompose into separate artifacts when unbundled. | pack_<artifact_type>/unpack_<artifact_type> plus bundle/unbundle methods and BUNDLE_FROM_LIST/BUNDLE_FROM_DICT flags. |

Step-by-step guide#

1. Subclass DefaultPackager#

from mlrun.package import ArtifactType
from mlrun.package.packagers.default_packager import DefaultPackager


class MyTypePackager(DefaultPackager): ...

2. Set class variables#

At a minimum, set the type your packager handles and the default artifact type:

class MyTypePackager(DefaultPackager):
    PACKABLE_OBJECT_TYPE = MyType
    DEFAULT_PACKING_ARTIFACT_TYPE = ArtifactType.FILE
    DEFAULT_UNPACKING_ARTIFACT_TYPE = ArtifactType.FILE

If your type has subclasses that should also be handled by this packager, set PACK_SUBCLASSES = True.
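
The snippets in this guide refer to MyType without defining it. As a minimal sketch, assuming a simple data class whose save() and load() methods match the calls used in the packing and unpacking examples, it could look like the following. The fields (name, score) and the two supported formats are illustrative assumptions, not part of MLRun.

```python
import csv
import json
from dataclasses import asdict, dataclass


@dataclass
class MyType:
    """Hypothetical domain type used only for illustration."""

    name: str
    score: float

    def save(self, path: str) -> None:
        # Choose the format from the file extension: ".csv", otherwise JSON
        if path.endswith(".csv"):
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=["name", "score"])
                writer.writeheader()
                writer.writerow(asdict(self))
        else:
            with open(path, "w") as f:
                json.dump(asdict(self), f)

    @classmethod
    def load(cls, path: str, format: str = "json") -> "MyType":
        # Rebuild the object from a file written by save()
        if format == "csv":
            with open(path, newline="") as f:
                row = next(csv.DictReader(f))
            return cls(name=row["name"], score=float(row["score"]))
        with open(path) as f:
            return cls(**json.load(f))
```

With a type like this, PACKABLE_OBJECT_TYPE = MyType tells the packager which objects it is responsible for.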

3. Implement pack_<artifact_type>() methods#

Each packing method serializes the object and returns a tuple of (Artifact, instructions_dict). The instructions_dict carries the metadata needed to reconstruct the object when unpacking. The returned artifact can be the base Artifact or any of its common subclasses, including ModelArtifact, DatasetArtifact, and LLMPromptArtifact.

from mlrun import Artifact


def pack_file(
    self, obj: MyType, key: str, file_format: str = "json"
) -> tuple[Artifact, dict]:
    # Serialize to a temporary file
    path = f"/tmp/{key}.{file_format}"
    obj.save(path)

    # Create the artifact
    artifact = Artifact(key=key, src_path=path)

    # Clean up the temp file after upload
    self.add_future_clearing_path(path)

    # Return artifact + instructions for unpacking
    return artifact, {"file_format": file_format}

Note

Inside a pack_<artifact_type> method you create and return an Artifact object — you do not call context.log_artifact() or context.log_dataset(). The packager manager handles the actual logging and uploading; the pack method's job is only to serialize the data and describe the artifact.

The method name determines the artifact type: pack_file handles artifact_type="file", pack_plot handles "plot", and so on.
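
To make the naming convention concrete, the following sketch shows one way a base class could route an artifact type to a pack_<artifact_type> method by name. It is an illustration only, not MLRun's implementation: ConventionPackager, TextPackager, and their return values are hypothetical stand-ins.

```python
class ConventionPackager:
    """Hypothetical stand-in that routes by method name."""

    DEFAULT_PACKING_ARTIFACT_TYPE = "file"

    def pack(self, obj, key, artifact_type=None, **kwargs):
        # Fall back to the class default when no artifact type is given
        artifact_type = artifact_type or self.DEFAULT_PACKING_ARTIFACT_TYPE
        method = getattr(self, f"pack_{artifact_type}", None)
        if method is None:
            raise ValueError(f"Unsupported artifact type: {artifact_type!r}")
        return method(obj, key, **kwargs)


class TextPackager(ConventionPackager):
    def pack_file(self, obj, key, file_format="txt"):
        # A real packager would write a file and return an Artifact here
        return f"{key}.{file_format}", {"file_format": file_format}

    def pack_result(self, obj, key):
        return {key: obj}


packager = TextPackager()
file_output = packager.pack("hello", key="greeting")
result_output = packager.pack("hello", key="greeting", artifact_type="result")
```

The sketch also shows why the default artifact type has to match an existing method: with no pack_file defined, the first call would have nothing to dispatch to.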

Important: Extra parameters like file_format above become packing kwargs that users can pass via log hints:

returns = ['my_output : file[file_format="csv"]']

All packing kwargs must have default values so users aren't forced to specify them.

The class variable DEFAULT_PACKING_ARTIFACT_TYPE must be equal to one of the artifact types defined by your pack_<artifact_type> methods, so that when users log without an explicit artifact type, the packager knows which method to call.

Result artifact type#

A special case of a pack_<artifact_type>() method is the result artifact type — a scalar or simple value (int, float, str, bool) stored directly in run metadata (visible in run.status.results and in the MLRun UI without downloading anything). For result types, the pack method returns a plain dict with the key and value instead of an (Artifact, instructions) tuple:

def pack_result(self, obj: MyType, key: str) -> dict:
    # Stored as run metadata, not as a file artifact
    return {key: obj.score}

DefaultPackager already provides a generic pack_result implementation, so you only need to override it if you want custom extraction logic (e.g. pulling a specific field from your type). The "result" type is always available for packing.

4. Implement unpack_<artifact_type>() methods#

Each unpacking method takes a DataItem and the instructions that were stored during packing, and returns the reconstructed object:

import mlrun


def unpack_file(self, data_item: mlrun.DataItem, file_format: str = "json") -> MyType:
    # Download the artifact to a local path
    local_path = data_item.local()

    # Reconstruct the object
    return MyType.load(local_path, format=file_format)

Each instruction parameter (e.g. file_format) must be optional (have a default value) so that the method can also handle objects that were logged manually rather than through this packager.
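
The reason for the defaults can be sketched without MLRun: if the stored instructions are applied as keyword arguments, an artifact that was logged manually arrives with no instructions, and only the default values let the call succeed. The function and dictionaries below are hypothetical stand-ins, and how MLRun passes the instructions internally is an assumption here.

```python
def unpack_file(local_path: str, file_format: str = "json") -> str:
    # Stand-in for an unpacking method; it only describes what it would load
    return f"load {local_path} as {file_format}"


# Instructions stored by pack_file during packing
packed_instructions = {"file_format": "csv"}
# A manually logged artifact carries no instructions
manual_instructions = {}

from_packager = unpack_file("data.bin", **packed_instructions)
from_manual_log = unpack_file("data.bin", **manual_instructions)
```

Without the default, the second call would raise a TypeError for the missing file_format argument.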

The class variable DEFAULT_UNPACKING_ARTIFACT_TYPE must be equal to one of the artifact types defined by your unpack_<artifact_type> methods, so that when an input is unpacked without an explicit artifact type, the packager knows which method to call.

For pack-only packagers, you can skip implementing unpack_<artifact_type> methods entirely; the artifact type is automatically excluded from unpacking validation, so no extra configuration is needed. Real-world examples of one-directional packagers:

  • Pack-only: PIL Image → PNG (no need to reconstruct the original PIL object from the logged PNG)

  • Unpack-only: reading a legacy serialization format that should no longer be written (e.g. unpack_v1 for backward compatibility while new outputs use pack_v2)

5. Clean up temporary files#

If your pack_<artifact_type> or unpack_<artifact_type> methods write files to disk, call self.add_future_clearing_path(path) so MLRun deletes them after the artifact is uploaded. This prevents temporary files from accumulating on the worker.
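
The pack_file example above writes to a fixed path under /tmp, which can collide when two runs use the same key. One option, sketched here with only the standard library, is to let tempfile pick a unique file name. The helper name write_to_temp_file is hypothetical; inside a packager, the returned path is what would be passed to self.add_future_clearing_path().

```python
import json
import os
import tempfile


def write_to_temp_file(data: dict, key: str, file_format: str = "json") -> str:
    """Hypothetical helper: serialize data to a uniquely named temporary file."""
    # mkstemp picks a unique name, so concurrent runs do not overwrite each other
    fd, path = tempfile.mkstemp(prefix=f"{key}_", suffix=f".{file_format}")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    return path


path = write_to_temp_file({"score": 0.9}, key="my_output")
# Inside a pack method, this is where the path would be registered:
#     self.add_future_clearing_path(path)
```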

6. Set the priority (optional)#

The PRIORITY class variable (an integer from 1 to 10, where 1 is the highest priority) controls which packager is selected when multiple packagers can handle the same type. Custom packagers default to priority 3, which ranks above the built-in packagers at 5. You rarely need to change this unless multiple custom packagers compete for the same type, which is not how packagers are intended to be used: each type should be handled by a single dedicated packager (for example, XPackager handles type X).

class MyTypePackager(DefaultPackager):
    PRIORITY = 2  # Higher priority than other custom packagers
    ...
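
The selection rule can be pictured with a short sketch: among the packagers that can handle a type, the one with the lowest PRIORITY number wins. The classes and the select_packager helper are hypothetical, and MLRun's actual selection logic may consider more than this.

```python
class BuiltinStrPackager:
    PACKABLE_OBJECT_TYPE = str
    PRIORITY = 5  # Default for built-in packagers, per the text above


class CustomStrPackager:
    PACKABLE_OBJECT_TYPE = str
    PRIORITY = 3  # Default for custom packagers


class CustomIntPackager:
    PACKABLE_OBJECT_TYPE = int
    PRIORITY = 3


def select_packager(obj, packagers):
    # Keep only packagers whose declared type matches the object's type exactly
    candidates = [p for p in packagers if type(obj) is p.PACKABLE_OBJECT_TYPE]
    if not candidates:
        return None
    # 1 is the highest priority, so take the smallest number
    return min(candidates, key=lambda p: p.PRIORITY)


packagers = [BuiltinStrPackager, CustomStrPackager, CustomIntPackager]
chosen = select_packager("hello", packagers)
```

Here chosen is CustomStrPackager, because its priority of 3 outranks the built-in packager's 5.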

7. Register the packager in your project#

Use add_custom_packager() to register your packager:

project.add_custom_packager(packager="my_module.MyTypePackager", is_mandatory=True)

The is_mandatory flag controls what happens when the packager fails to import on a remote worker:

  • True — the run fails immediately with an import error

  • False — the packager is silently skipped and the fallback pickle behavior is used
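
The two outcomes can be sketched with the standard library. This is an illustration of the described behavior, not MLRun's code: load_packager is a hypothetical helper, and returning None stands in for falling back to pickle.

```python
import importlib
from typing import Optional


def load_packager(path: str, is_mandatory: bool) -> Optional[type]:
    """Hypothetical helper: import a class from a 'module.ClassName' path."""
    module_name, _, class_name = path.rpartition(".")
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError):
        if is_mandatory:
            # Mandatory packager: surface the failure so the run stops
            raise
        # Optional packager: skip it; the caller falls back to pickle
        return None


skipped = load_packager("no_such_module_xyz.MyTypePackager", is_mandatory=False)
loaded = load_packager("collections.OrderedDict", is_mandatory=True)
```

Setting is_mandatory=True is the safer choice when downstream functions depend on the packager's format, since a silent fallback would change what gets stored.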

To remove a registered packager:

project.remove_custom_packager("my_module.MyTypePackager")

8. Make the packager importable on the remote worker#

When running remotely, the worker must be able to import your packager module. There are several ways to achieve this:

  • Pull at runtime (simplest) — set the project source with pull_at_runtime=True so the code is fetched before execution:

    project.set_source(source="./", pull_at_runtime=True)
    
  • Build into the function image — include the packager source in the function's build so it is baked into the container image

  • Shared storage — place the packager module on a shared volume and configure the function's working directory to point there

If the packager module is missing at runtime, the run fails immediately when is_mandatory=True, or falls back to pickle when is_mandatory=False.