Creating a custom packager - itemization (bundling & unbundling) tutorial#
This tutorial walks through creating a custom packager that supports
bundling and unbundling. The packager handles EvalSuite — a custom
collection type that wraps dict[str, pd.DataFrame], where each key is an
evaluation name and each value is a results DataFrame.
You will learn how to:
- Define a custom collection type (EvalSuite)
- Implement unbundle() so the "*" log-hint prefix decomposes the suite into individual dataset artifacts
- Implement bundle() to reconstruct an EvalSuite from a dict of DataFrames
- Implement pack_file() / unpack_file() for saving/loading the suite as a single artifact
- Test all three flows: unbundled output, single-file output, and consuming the unbundled artifacts as a typed EvalSuite input in a downstream handler
The problem#
Imagine a function that evaluates an AI system on multiple benchmarks and returns all results in a single object. Without bundling support, you have two options:
- Return the whole object as one artifact: opaque, and hard to browse individual benchmarks in the UI
- Manually break it apart: return each DataFrame separately and wire up the log hints by hand
With a bundling packager, you can return the single EvalSuite object and use
"*eval_suite" in the log hint. The packager automatically unbundles it — each
benchmark becomes its own DatasetArtifact. And when a downstream function needs
the full suite, the packager re-bundles the individual artifacts back into an
EvalSuite.
Setup#
In this section, you'll create a project and define the EvalSuite type.
Create project#
import mlrun
project = mlrun.get_or_create_project("itemization-tutorial", "./")
Define the EvalSuite type#
EvalSuite is a thin wrapper around dict[str, pd.DataFrame]. It inherits
from dict so it behaves like a normal dictionary but has its own type — which
is what the packager matches on.
%%writefile eval_suite.py
import pandas as pd


class EvalSuite(dict):
    """
    A collection of named evaluation results.

    Each key is a benchmark name (str) and each value is a
    results DataFrame (pd.DataFrame).
    """

    def summary(self) -> pd.DataFrame:
        """Return a summary DataFrame with mean scores per benchmark."""
        rows = []
        for name, df in self.items():
            # Only numeric columns contribute to the per-benchmark means
            numeric_cols = df.select_dtypes(include="number")
            row = {"benchmark": name, **numeric_cols.mean().to_dict()}
            rows.append(row)
        return pd.DataFrame(rows)
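As a quick local check of the type described above, the following sketch repeats the class definition so that it runs on its own, then builds a small suite from made-up data. The two DataFrames and their values are illustrative assumptions, not part of the tutorial's benchmarks.

```python
import pandas as pd


class EvalSuite(dict):
    """Same definition as eval_suite.py, repeated so this snippet is self-contained."""

    def summary(self) -> pd.DataFrame:
        rows = []
        for name, df in self.items():
            numeric_cols = df.select_dtypes(include="number")
            rows.append({"benchmark": name, **numeric_cols.mean().to_dict()})
        return pd.DataFrame(rows)


# Illustrative data only (not the tutorial's benchmark results)
suite = EvalSuite()
suite["accuracy"] = pd.DataFrame({"correct": [True, False], "score": [0.5, 1.0]})
suite["latency"] = pd.DataFrame({"response_ms": [100, 200]})

# The subclass still behaves like a dict, but its type is distinct from dict.
print(isinstance(suite, dict), type(suite) is dict)  # prints: True False

# One row per benchmark; columns a benchmark does not have are filled with NaN.
summary = suite.summary()
print(summary)
```

Note that boolean columns such as correct are not selected by include="number", so they do not appear in the summary.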
Write a function#
This handler simulates running three benchmarks and returns the results as
an EvalSuite.
%%writefile run_benchmarks.py
import pandas as pd

from eval_suite import EvalSuite


def run_benchmarks(num_samples: int = 5) -> EvalSuite:
    """
    Simulate running multiple evaluation benchmarks.

    :param num_samples: Number of samples per benchmark. The simulated data
                        holds five rows per benchmark, so values above 5 make
                        the column lengths differ and pandas raises an error.

    :returns: An EvalSuite with results from three benchmarks.
    """
    suite = EvalSuite()

    # Benchmark 1: accuracy test
    suite["accuracy"] = pd.DataFrame({
        "sample_id": range(num_samples),
        "correct": [True, True, False, True, True][:num_samples],
        "score": [0.95, 0.88, 0.42, 0.91, 0.87][:num_samples],
    })

    # Benchmark 2: latency test
    suite["latency"] = pd.DataFrame({
        "sample_id": range(num_samples),
        "response_ms": [120, 95, 340, 110, 88][:num_samples],
        "tokens": [45, 32, 128, 51, 29][:num_samples],
    })

    # Benchmark 3: safety test
    suite["safety"] = pd.DataFrame({
        "sample_id": range(num_samples),
        "passed": [True, True, True, False, True][:num_samples],
        "risk_score": [0.01, 0.03, 0.02, 0.85, 0.04][:num_samples],
    })

    return suite
fn = project.set_function(
    "run_benchmarks.py",
    name="run-benchmarks",
    kind="job",
    image="mlrun/mlrun",
    handler="run_benchmarks",
)
Create the custom packager#
Write the packager#
The packager supports three modes:
- Single file (pack_file / unpack_file): saves all DataFrames to a single JSON file
- Unbundling (unbundle): decomposes the suite into its inner dict so each DataFrame is packed separately by the Pandas packager
- Bundling (bundle): reconstructs an EvalSuite from a dict of DataFrames
%%writefile suite_packager.py
import json
import os
import tempfile

import pandas as pd
from mlrun.artifacts import Artifact
from mlrun.package.packagers.default_packager import DefaultPackager
from mlrun.package.utils import ArtifactType

from eval_suite import EvalSuite


class EvalSuitePackager(DefaultPackager):
    """
    A custom packager for EvalSuite objects that supports bundling
    and unbundling.

    - ``pack_file`` / ``unpack_file``: save/load the entire suite as
      a single JSON artifact.
    - ``unbundle``: decompose into a dict of DataFrames for per-benchmark
      artifact logging.
    - ``bundle``: reconstruct an EvalSuite from a dict of DataFrames.
    """

    PACKABLE_OBJECT_TYPE = EvalSuite
    DEFAULT_PACKING_ARTIFACT_TYPE = ArtifactType.FILE
    DEFAULT_UNPACKING_ARTIFACT_TYPE = ArtifactType.FILE
    BUNDLE_FROM_DICT = True  # Enable bundling from dicts (the manager will use bundle()/unbundle())

    def pack_file(
        self,
        obj: EvalSuite,
        key: str,
    ) -> tuple[Artifact, dict]:
        """
        Save the entire EvalSuite as a single JSON file.

        Each benchmark's DataFrame is serialized using pandas' ``to_dict()``
        with orient='records'.

        :param obj: The EvalSuite to pack.
        :param key: The artifact key.

        :returns: The artifact and unpacking instructions.
        """
        data = {
            name: df.to_dict(orient="records")
            for name, df in obj.items()
        }

        temp_dir = tempfile.mkdtemp()
        file_path = os.path.join(temp_dir, f"{key}.json")
        with open(file_path, "w") as f:
            json.dump(data, f, indent=2)

        # Have the temp directory removed once the artifact has been logged
        self.add_future_clearing_path(temp_dir)

        artifact = Artifact(key=key, src_path=file_path)
        return artifact, {}

    def unpack_file(
        self,
        data_item,
    ) -> EvalSuite:
        """
        Load an EvalSuite from a JSON file.

        :param data_item: The data item pointing to the JSON artifact.

        :returns: The reconstructed EvalSuite.
        """
        local_path = self.get_data_item_local_path(data_item=data_item)
        with open(local_path) as f:
            data = json.load(f)

        return EvalSuite({
            name: pd.DataFrame(records)
            for name, records in data.items()
        })

    def unbundle(
        self,
        bundled_object: EvalSuite,
    ) -> dict:
        """
        Unbundle an EvalSuite into its inner dict.

        Each value (a pd.DataFrame) will be packed separately by the
        Pandas packager, so each benchmark becomes its own DatasetArtifact.

        :param bundled_object: The EvalSuite to unbundle.

        :returns: The inner dict of DataFrames.
        """
        return dict(bundled_object)

    def bundle(
        self,
        collection: dict | list,
    ) -> EvalSuite:
        """
        Bundle a dict of DataFrames back into an EvalSuite.

        :param collection: A dict mapping benchmark names to DataFrames.

        :returns: The reconstructed EvalSuite.
        """
        return EvalSuite(collection)
Notice:
- BUNDLE_FROM_DICT = True tells the packager manager that EvalSuite can be constructed from a dict. This enables both the "*" unbundling operator on output and dict-based input bundling.
- unbundle() simply returns dict(bundled_object), the plain dict of DataFrames. The manager then packs each DataFrame individually using the built-in Pandas packager (as DatasetArtifacts).
- bundle() wraps a dict back into EvalSuite(collection). The manager calls this when a function declares suite: EvalSuite as a type hint and receives a dict of data items as input.
- pack_file() / unpack_file() handle the case where the suite is logged as a single file artifact (without the "*" prefix).
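The single-file mode can be tried outside MLRun. The sketch below reproduces only the serialization step used by pack_file() and unpack_file(): to_dict(orient="records") on the way out, pd.DataFrame(records) on the way back. The file name and the sample values are assumptions made for illustration, and a temporary directory stands in for the artifact store.

```python
import json
import os
import tempfile

import pandas as pd

# Illustrative frames; any dict of DataFrames with JSON-friendly values works
frames = {
    "accuracy": pd.DataFrame({"sample_id": [0, 1], "score": [0.95, 0.88]}),
    "latency": pd.DataFrame({"sample_id": [0, 1], "response_ms": [120, 95]}),
}

# Serialize: one list of row records per benchmark, as in pack_file()
payload = {name: df.to_dict(orient="records") for name, df in frames.items()}

with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "eval_suite.json")  # hypothetical file name
    with open(file_path, "w") as f:
        json.dump(payload, f, indent=2)
    with open(file_path) as f:
        loaded = json.load(f)

# Deserialize: rebuild each DataFrame from its records, as in unpack_file()
restored = {name: pd.DataFrame(records) for name, records in loaded.items()}

for name in frames:
    print(name, restored[name].equals(frames[name]))
```

This format keeps column names and row order but not a custom index, and values that json cannot encode by default (timestamps, for example) would need extra handling before json.dump().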
Register the packager#
project.add_custom_packager(
    packager="suite_packager.EvalSuitePackager", is_mandatory=True
)
Test 1: unbundling with "*"#
Using "*eval_suite" in the log hint triggers unbundling. The packager calls
unbundle() to get the inner dict, then the manager packs each DataFrame
individually — creating three separate DatasetArtifacts.
run_unbundled = fn.run(
    local=True,
    params={"num_samples": 5},
    returns=["*eval_suite"],  # unbundle into separate artifacts
)
run_unbundled.outputs
Each benchmark is now its own artifact. You can inspect them individually:
# List the unbundled artifact keys
[key for key in run_unbundled.outputs if key.startswith("eval_suite")]
Test 2: packing as a single file (no unbundling)#
Without the "*" prefix, the entire suite is packed as a single JSON
file artifact via pack_file().
run_single = fn.run(
    local=True,
    params={"num_samples": 5},
    returns=["eval_suite"],  # no * → single file artifact
)
run_single.outputs
Test 3: consuming unbundled artifacts as an EvalSuite input#
The real payoff of bundling is round-trip consumption. The unbundled artifacts
from Test 1 (three separate DatasetArtifacts) can be passed as input to a
downstream handler that expects an EvalSuite. The packager's bundle() method
automatically re-assembles the individual DataFrames back into an EvalSuite.
%%writefile summarize_suite.py
import pandas as pd

from eval_suite import EvalSuite


def summarize(suite: EvalSuite) -> pd.DataFrame:
    """
    Summarize an EvalSuite into a single DataFrame.

    :param suite: The EvalSuite to summarize (may arrive as unbundled artifacts).

    :returns: A summary DataFrame with mean scores per benchmark.
    """
    return suite.summary()
summarize_fn = project.set_function(
    "summarize_suite.py",
    name="summarize-suite",
    kind="job",
    image="mlrun/mlrun",
    handler="summarize",
)
# Pass the unbundled artifacts — the packager re-bundles them into an EvalSuite
summarize_run = summarize_fn.run(
    local=True,
    inputs={"suite": run_unbundled.outputs["eval_suite"]},
    returns=["summary : dataset"],
)
summarize_run.artifact("summary").as_df()
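The object-level round trip that this test relies on can be checked in plain Python. In the sketch below, the two functions mirror the bodies of the packager's unbundle() and bundle() methods, written as free functions so no MLRun installation is needed. The step where the manager packs and unpacks each DataFrame is skipped, and the sample data is an assumption for illustration.

```python
import pandas as pd


class EvalSuite(dict):
    """Stand-in for the tutorial's EvalSuite (summary() omitted for brevity)."""


def unbundle(bundled_object: EvalSuite) -> dict:
    # Same body as EvalSuitePackager.unbundle(): a plain dict with the same items
    return dict(bundled_object)


def bundle(collection: dict) -> EvalSuite:
    # Same body as EvalSuitePackager.bundle(): wrap the dict back into the typed suite
    return EvalSuite(collection)


# Illustrative data only
suite = EvalSuite(
    accuracy=pd.DataFrame({"score": [0.95, 0.88]}),
    safety=pd.DataFrame({"risk_score": [0.01, 0.03]}),
)

parts = unbundle(suite)   # plain dict: each value can be handled on its own
rebuilt = bundle(parts)   # typed collection again

print(type(parts).__name__, type(rebuilt).__name__, list(rebuilt))
# prints: dict EvalSuite ['accuracy', 'safety']
```

Both conversions are shallow copies: the rebuilt suite holds the same DataFrame objects as the original, and key order is preserved.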
Summary of all three patterns#
Across these tutorials you've seen the three main patterns for custom packagers:
| Pattern | Tutorial | Key methods | Use case |
|---|---|---|---|
| Pack-only | | | Types that are produced but rarely consumed as typed inputs |
| Round-trip | | | Types that need to be saved and loaded back across functions |
| Bundling | EvalSuite (this tutorial) | | Collection types that should decompose into individual artifacts |
All three follow the same core recipe:
- Subclass DefaultPackager
- Set PACKABLE_OBJECT_TYPE and DEFAULT_PACKING_ARTIFACT_TYPE
- Implement pack_*() and optionally unpack_*() methods
- Clean up temp files with add_future_clearing_path()
- Register with project.add_custom_packager(packager="module.Class", is_mandatory=True)
For the full API reference, see DefaultPackager
and Packager.