Creating a custom packager - itemization (bundling & unbundling) tutorial

This tutorial walks through creating a custom packager that supports bundling and unbundling. The packager handles EvalSuite — a custom collection type that wraps dict[str, pd.DataFrame], where each key is an evaluation name and each value is a results DataFrame.

You will learn how to:

Define a custom collection type (EvalSuite)
Implement unbundle() so the "*" log-hint prefix decomposes the suite into individual dataset artifacts
Implement bundle() to reconstruct an EvalSuite from a dict of DataFrames
Implement pack_file() / unpack_file() for saving/loading the suite as a single artifact
Test all three flows: unbundled output, single-file output, and consuming the unbundled artifacts as a typed EvalSuite input in a downstream handler

In this section

The problem
Setup
Create the custom packager
Test 1: unbundling with "*"
Test 2: packing as a single file (no unbundling)
Test 3: consuming unbundled artifacts as an EvalSuite input
Summary of all three patterns

The problem#

Imagine a function that evaluates an AI system on multiple benchmarks and returns all results in a single object. Without bundling support, you have two options:

Return the whole object as one artifact — opaque, hard to browse individual benchmarks in the UI
Manually break it apart — return each DataFrame separately and wire up the log hints by hand

With a bundling packager, you can return the single EvalSuite object and use "*eval_suite" in the log hint. The packager automatically unbundles it — each benchmark becomes its own DatasetArtifact. And when a downstream function needs the full suite, the packager re-bundles the individual artifacts back into an EvalSuite.
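
The round trip described above can be sketched without MLRun. The snippet below is a minimal illustration, not the packager manager's implementation: it uses a stand-in EvalSuite and plain lists in place of DataFrames, and the helper names unbundle_suite and bundle_suite are hypothetical labels for the decompose and reassemble steps.

```python
class EvalSuite(dict):
    """Stand-in for the tutorial's EvalSuite: a dict with its own type."""


def unbundle_suite(suite: EvalSuite) -> dict:
    # Hypothetical helper: a shallow copy as a plain dict, one entry per benchmark.
    return dict(suite)


def bundle_suite(items: dict) -> EvalSuite:
    # Hypothetical helper: wrap the plain dict back into the custom type.
    return EvalSuite(items)


# Plain lists stand in for the per-benchmark results DataFrames.
suite = EvalSuite(accuracy=[0.95, 0.88], latency=[120, 95])

items = unbundle_suite(suite)
restored = bundle_suite(items)

print(type(items).__name__)     # dict
print(type(restored).__name__)  # EvalSuite
print(restored == suite)        # True
print(items["accuracy"] is suite["accuracy"])  # True
```

Because dict(suite) is a shallow copy, the values are shared rather than duplicated; only the container type changes in each direction.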

Setup#

In this section, you'll create a project, define the EvalSuite type, and write a handler that returns one.

Create project#

import mlrun

project = mlrun.get_or_create_project("itemization-tutorial", "./")

Define the EvalSuite type#

EvalSuite is a thin wrapper around dict[str, pd.DataFrame]. It inherits from dict so it behaves like a normal dictionary but has its own type — which is what the packager matches on.

%%writefile eval_suite.py

import pandas as pd


class EvalSuite(dict):
    """
    A collection of named evaluation results.

    Each key is a benchmark name (str) and each value is a
    results DataFrame (pd.DataFrame).
    """

    def summary(self) -> pd.DataFrame:
        """Return a summary DataFrame with mean scores per benchmark."""
        rows = []
        for name, df in self.items():
            numeric_cols = df.select_dtypes(include="number")
            row = {"benchmark": name, **numeric_cols.mean().to_dict()}
            rows.append(row)
        return pd.DataFrame(rows)
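
To see what summary() produces, the class can be exercised directly, without MLRun. This is a small sketch: the class body is repeated so the snippet is self-contained, and the two sample DataFrames are made-up illustration data.

```python
import pandas as pd


class EvalSuite(dict):
    """Copy of the EvalSuite class above, repeated to keep this snippet self-contained."""

    def summary(self) -> pd.DataFrame:
        rows = []
        for name, df in self.items():
            numeric_cols = df.select_dtypes(include="number")
            rows.append({"benchmark": name, **numeric_cols.mean().to_dict()})
        return pd.DataFrame(rows)


# Made-up illustration data.
suite = EvalSuite()
suite["accuracy"] = pd.DataFrame({"correct": [True, False], "score": [0.5, 1.0]})
suite["latency"] = pd.DataFrame({"response_ms": [100, 200]})

summary = suite.summary()

# One row per benchmark; boolean columns are not selected by include="number".
print(summary["benchmark"].tolist())  # ['accuracy', 'latency']
print(sorted(summary.columns))        # ['benchmark', 'response_ms', 'score']
```

A column that a benchmark does not have appears as NaN in that benchmark's row, since the summary takes the union of all numeric columns.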

Write a function#

This handler simulates running three benchmarks and returns the results as an EvalSuite.

%%writefile run_benchmarks.py

import pandas as pd

from eval_suite import EvalSuite


def run_benchmarks(num_samples: int = 5) -> EvalSuite:
    """
    Simulate running multiple evaluation benchmarks.

    :param num_samples: Number of samples per benchmark (at most 5, since
                        each simulated column holds only five values).

    :returns: An EvalSuite with results from three benchmarks.
    """
    suite = EvalSuite()

    # Benchmark 1: accuracy test
    suite["accuracy"] = pd.DataFrame({
        "sample_id": range(num_samples),
        "correct": [True, True, False, True, True][:num_samples],
        "score": [0.95, 0.88, 0.42, 0.91, 0.87][:num_samples],
    })

    # Benchmark 2: latency test
    suite["latency"] = pd.DataFrame({
        "sample_id": range(num_samples),
        "response_ms": [120, 95, 340, 110, 88][:num_samples],
        "tokens": [45, 32, 128, 51, 29][:num_samples],
    })

    # Benchmark 3: safety test
    suite["safety"] = pd.DataFrame({
        "sample_id": range(num_samples),
        "passed": [True, True, True, False, True][:num_samples],
        "risk_score": [0.01, 0.03, 0.02, 0.85, 0.04][:num_samples],
    })

    return suite

fn = project.set_function(
    "run_benchmarks.py",
    name="run-benchmarks",
    kind="job",
    image="mlrun/mlrun",
    handler="run_benchmarks",
)

Create the custom packager#

Write the packager#

The packager supports three modes:

As a single file (pack_file / unpack_file) — saves all DataFrames to a single JSON file
Unbundling (unbundle) — decomposes the suite into its inner dict so each DataFrame is packed separately by the Pandas packager
Bundling (bundle) — reconstructs an EvalSuite from a dict of DataFrames

%%writefile suite_packager.py

import json
import os
import tempfile

import pandas as pd

from mlrun.artifacts import Artifact
from mlrun.package.packagers.default_packager import DefaultPackager
from mlrun.package.utils import ArtifactType

from eval_suite import EvalSuite


class EvalSuitePackager(DefaultPackager):
    """
    A custom packager for EvalSuite objects that supports bundling
    and unbundling.

    - ``pack_file`` / ``unpack_file``: save/load the entire suite as
      a single JSON artifact.
    - ``unbundle``: decompose into a dict of DataFrames for per-benchmark
      artifact logging.
    - ``bundle``: reconstruct an EvalSuite from a dict of DataFrames.
    """

    PACKABLE_OBJECT_TYPE = EvalSuite
    DEFAULT_PACKING_ARTIFACT_TYPE = ArtifactType.FILE
    DEFAULT_UNPACKING_ARTIFACT_TYPE = ArtifactType.FILE
    BUNDLE_FROM_DICT = True  # Enable bundling from dicts (the manager will use bundle()/unbundle())

    def pack_file(
        self,
        obj: EvalSuite,
        key: str,
    ) -> tuple[Artifact, dict]:
        """
        Save the entire EvalSuite as a single JSON file.

        Each benchmark's DataFrame is serialized using pandas' ``to_dict()``
        with orient='records'.

        :param obj: The EvalSuite to pack.
        :param key: The artifact key.

        :returns: The artifact and unpacking instructions.
        """
        data = {
            name: df.to_dict(orient="records")
            for name, df in obj.items()
        }

        temp_dir = tempfile.mkdtemp()
        file_path = os.path.join(temp_dir, f"{key}.json")
        with open(file_path, "w") as f:
            json.dump(data, f, indent=2)

        self.add_future_clearing_path(temp_dir)

        artifact = Artifact(key=key, src_path=file_path)
        return artifact, {}

    def unpack_file(
        self,
        data_item,
    ) -> EvalSuite:
        """
        Load an EvalSuite from a JSON file.

        :param data_item: The data item pointing to the JSON artifact.

        :returns: The reconstructed EvalSuite.
        """
        local_path = self.get_data_item_local_path(data_item=data_item)
        with open(local_path) as f:
            data = json.load(f)

        return EvalSuite({
            name: pd.DataFrame(records)
            for name, records in data.items()
        })

    def unbundle(
        self,
        bundled_object: EvalSuite,
    ) -> dict:
        """
        Unbundle an EvalSuite into its inner dict.

        Each value (a pd.DataFrame) will be packed separately by the
        Pandas packager — so each benchmark becomes its own DatasetArtifact.

        :param bundled_object: The EvalSuite to unbundle.

        :returns: The inner dict of DataFrames.
        """
        return dict(bundled_object)

    def bundle(
        self,
        collection: dict | list,
    ) -> EvalSuite:
        """
        Bundle a dict of DataFrames back into an EvalSuite.

        :param collection: A dict mapping benchmark names to DataFrames.

        :returns: The reconstructed EvalSuite.
        """
        return EvalSuite(collection)

Notice:

BUNDLE_FROM_DICT = True tells the packager manager that EvalSuite can be constructed from a dict. This enables both the "*" unbundling operator on output and dict-based input bundling.
unbundle() simply returns dict(bundled_object) — the plain dict of DataFrames. The manager then packs each DataFrame individually using the built-in Pandas packager (as DatasetArtifacts).
bundle() wraps a dict back into EvalSuite(collection). The manager calls this when a function declares suite: EvalSuite as a type hint and receives a dict of data items as input.
pack_file() / unpack_file() handle the case where the suite is logged as a single file artifact (without the "*" prefix).
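
The serialization used by pack_file() and unpack_file() can be tried on its own. The sketch below repeats only the JSON round trip, without MLRun or artifacts. The file name suite.json and the sample data are made up for illustration, and tempfile.TemporaryDirectory stands in for the cleanup that add_future_clearing_path() arranges in the packager.

```python
import json
import os
import tempfile

import pandas as pd

# Made-up sample data in the shape of an EvalSuite (a plain dict is enough here).
original = {
    "accuracy": pd.DataFrame({"sample_id": range(2), "score": [0.95, 0.42]}),
    "latency": pd.DataFrame({"sample_id": range(2), "response_ms": [120, 340]}),
}

with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "suite.json")

    # Same serialization as pack_file(): one list of row records per benchmark.
    with open(file_path, "w") as f:
        json.dump({name: df.to_dict(orient="records") for name, df in original.items()}, f)

    # Same deserialization as unpack_file(): rebuild each DataFrame from its records.
    with open(file_path) as f:
        restored = {name: pd.DataFrame(records) for name, records in json.load(f).items()}

for name, df in original.items():
    # Raises AssertionError if values, column order, or dtypes differ.
    pd.testing.assert_frame_equal(restored[name], df)
```

Two limitations of the records format are worth keeping in mind: a custom index is not stored, and a DataFrame with no rows serializes to an empty list, so its column names are not restored.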

Register the packager#

project.add_custom_packager(
    packager="suite_packager.EvalSuitePackager", is_mandatory=True
)

Test 1: unbundling with "*"#

Using "*eval_suite" in the log hint triggers unbundling. The manager calls the packager's unbundle() to get the inner dict, then packs each DataFrame individually, creating three separate DatasetArtifacts.

run_unbundled = fn.run(
    local=True,
    params={"num_samples": 5},
    returns=["*eval_suite"],  # unbundle into separate artifacts
)

run_unbundled.outputs

Each benchmark is now its own artifact. You can inspect them individually:

# List the unbundled artifact keys
[key for key in run_unbundled.outputs if key.startswith("eval_suite")]

Test 2: packing as a single file (no unbundling)#

Without the "*" prefix, the entire suite is packed as a single JSON file artifact via pack_file().

run_single = fn.run(
    local=True,
    params={"num_samples": 5},
    returns=["eval_suite"],  # no * → single file artifact
)

run_single.outputs

Test 3: consuming unbundled artifacts as an EvalSuite input#

The real payoff of bundling is round-trip consumption. The unbundled artifacts from Test 1 (three separate DatasetArtifacts) can be passed as input to a downstream handler that expects an EvalSuite. The packager's bundle() method automatically re-assembles the individual DataFrames back into an EvalSuite.

%%writefile summarize_suite.py

import pandas as pd
from eval_suite import EvalSuite


def summarize(suite: EvalSuite) -> pd.DataFrame:
    """
    Summarize an EvalSuite into a single DataFrame.

    :param suite: The EvalSuite to summarize (may arrive as unbundled artifacts).

    :returns: A summary DataFrame with mean scores per benchmark.
    """
    return suite.summary()

summarize_fn = project.set_function(
    "summarize_suite.py",
    name="summarize-suite",
    kind="job",
    image="mlrun/mlrun",
    handler="summarize",
)

# Pass the unbundled artifacts — the packager re-bundles them into an EvalSuite
summarize_run = summarize_fn.run(
    local=True,
    inputs={"suite": run_unbundled.outputs["eval_suite"]},
    returns=["summary : dataset"],
)

summarize_run.artifact("summary").as_df()
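
How a type hint can drive bundling may be easier to see in a stripped-down form. The sketch below is not MLRun's packager manager: call_with_bundling is a hypothetical helper that reads a handler's type hints with typing.get_type_hints and, when a plain dict arrives for a parameter hinted with a dict subclass, wraps it in that type, which is the role bundle() plays in this tutorial. Lists of scores stand in for DataFrames.

```python
from typing import get_type_hints


class EvalSuite(dict):
    """Stand-in for the tutorial's EvalSuite type."""


def describe(suite: EvalSuite) -> str:
    # The handler only ever sees the custom type, never the plain dict.
    return f"{type(suite).__name__} with {sorted(suite)}"


def call_with_bundling(handler, **inputs):
    """Hypothetical helper: wrap plain-dict inputs in the hinted dict subclass."""
    hints = get_type_hints(handler)
    prepared = {}
    for name, value in inputs.items():
        hinted = hints.get(name)
        needs_bundling = (
            isinstance(hinted, type)
            and issubclass(hinted, dict)
            and isinstance(value, dict)
            and not isinstance(value, hinted)
        )
        # Calling the type on the dict mirrors what bundle() does in the packager.
        prepared[name] = hinted(value) if needs_bundling else value
    return handler(**prepared)


# Plain lists stand in for the unbundled per-benchmark DataFrames.
result = call_with_bundling(describe, suite={"latency": [120, 95], "accuracy": [0.95, 0.88]})
print(result)  # EvalSuite with ['accuracy', 'latency']
```

An input that already has the hinted type is passed through unchanged, so the same handler works whether it receives a ready-made suite or its unbundled parts.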

Summary of all three patterns#

Across these tutorials you've seen the three main patterns for custom packagers:

| Pattern | Tutorial | Key methods | Use case |
| --- | --- | --- | --- |
| Pack-only | PIL Image | `pack_file`, `pack_plot` | Types that are produced but rarely consumed as typed inputs |
| Round-trip | LangChain | `pack_file`, `unpack_file` | Types that need to be saved and loaded back across functions |
| Bundling | EvalSuite (this tutorial) | `pack_file`, `unpack_file`, `bundle`, `unbundle` | Collection types that should decompose into individual artifacts |

All three follow the same core recipe:

Subclass DefaultPackager
Set PACKABLE_OBJECT_TYPE and DEFAULT_PACKING_ARTIFACT_TYPE
Implement pack_*() and optionally unpack_*() methods
Clean up temp files with add_future_clearing_path()
Register with project.add_custom_packager(packager="module.Class", is_mandatory=True)

For the full API reference, see DefaultPackager and Packager.