Using packagers to automate I/O in a gen AI agent pipeline#
This tutorial demonstrates how MLRun packagers automate input parsing and output logging in a mock scenario: evaluating multiple gen-AI agents on a set of test prompts.
To spare the need for API keys, the "agents" are simple string-formatting heuristics rather than real LLM calls. The focus is on how packagers handle mixed output types and input parsing, not on the evaluation logic itself.
You will learn how to:
Log mixed output types (DataFrames, dicts, strings) using log hints
Use LogHint objects for fine-grained control (labels, artifact types)
Itemize (unbundle) a dict of responses into separate artifacts with the * prefix
Pass a previously logged artifact as a typed input to a downstream function
Understand the difference between params (direct values) and inputs (DataItems parsed by packagers)
Setup#
Import the required packages and create (or load) an MLRun project.
import mlrun
from mlrun import LogHint
project = mlrun.get_or_create_project("packagers-tutorial", "./")
Define the evaluation handler#
The handler below simulates evaluating multiple gen AI agents. Each "agent" is just a string-formatting heuristic — no real LLM calls are made, so the notebook runs anywhere without API keys.
Notice the function is pure Python — no MLRun imports, no context object.
The returns log hints (shown later) tell packagers how to log each returned value.
%%writefile eval_agents.py
import pandas as pd


def evaluate_agents(
    agents_config: dict,
    prompts: list,
) -> tuple[pd.DataFrame, dict, dict, str]:
    """
    Evaluate simulated gen AI agents on a set of prompts.

    :param agents_config: Mapping of agent name to its configuration dict.
                          Each config has keys like 'style' and 'max_words'.
    :param prompts:       List of test prompt strings.

    :returns: A tuple of (evaluation DataFrame, best agent dict,
              all responses dict, summary string).
    """
    scores = []
    all_responses = {}

    for agent_name, config in agents_config.items():
        style = config.get("style", "neutral")
        max_words = config.get("max_words", 20)

        responses = []
        for prompt in prompts:
            # Simulated response: no real LLM call
            response = f"[{style}] Re: {prompt[:40]}... (max {max_words} words)"
            responses.append(response)

        # Simulated scoring heuristic
        relevance = len(style) * 7 % 100  # deterministic pseudo-score
        clarity = (max_words * 3 + 10) % 100
        overall = round((relevance + clarity) / 2, 1)

        scores.append({
            "agent": agent_name,
            "style": style,
            "relevance": relevance,
            "clarity": clarity,
            "overall": overall,
        })
        all_responses[agent_name] = responses

    # Build the evaluation DataFrame
    evaluation = pd.DataFrame(scores)

    # Identify the best agent
    best_idx = evaluation["overall"].idxmax()
    best_agent = {
        "name": evaluation.loc[best_idx, "agent"],
        "overall_score": evaluation.loc[best_idx, "overall"],
        "config": agents_config[evaluation.loc[best_idx, "agent"]],
    }

    # Human-readable summary
    summary = (
        f"Evaluated {len(agents_config)} agents on {len(prompts)} prompts. "
        f"Best agent: {best_agent['name']} (score: {best_agent['overall_score']})."
    )

    return evaluation, best_agent, all_responses, summary
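Because the scoring heuristic is deterministic, its results can be worked out without MLRun or pandas. The sketch below copies only the arithmetic from the handler into a hypothetical helper, score_agent (not part of the tutorial code), and applies it to three sample configurations, the same values used in the run later in this tutorial. Treat it as a sanity check rather than part of the pipeline.

```python
# Hypothetical helper that mirrors the handler's scoring arithmetic.
def score_agent(style: str, max_words: int) -> dict:
    relevance = (len(style) * 7) % 100
    clarity = (max_words * 3 + 10) % 100
    overall = round((relevance + clarity) / 2, 1)
    return {"relevance": relevance, "clarity": clarity, "overall": overall}


# Sample configurations (same values as the run later in this tutorial).
sample_configs = {
    "concise-bot": {"style": "concise", "max_words": 15},
    "verbose-bot": {"style": "verbose", "max_words": 50},
    "formal-bot": {"style": "formal", "max_words": 30},
}

scores = {
    name: score_agent(cfg["style"], cfg["max_words"])
    for name, cfg in sample_configs.items()
}

# The agent with the highest overall score, as the handler picks with idxmax().
best = max(scores, key=lambda name: scores[name]["overall"])

for name, score in scores.items():
    print(name, score)
print("best:", best)
```

With these values, formal-bot's clarity wraps to 0 because (30 * 3 + 10) % 100 is 0, so it scores lowest, while verbose-bot comes out on top.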
Create the MLRun function#
Register the handler file as an MLRun function. Setting kind="job" means it can run
locally or on a Kubernetes cluster.
eval_agents = project.set_function(
    "eval_agents.py",
    name="eval-agents",
    kind="job",
    image="mlrun/mlrun",
    handler="evaluate_agents",
)
Run with mixed log hints#
Note that agents_config and prompts are passed via params={} — they arrive as
plain Python objects (a dict and a list) directly, with no packager involvement.
Packagers only parse values passed via inputs={}, which flow through DataItems.
The second handler below shows input parsing in action.
The returns list uses four different log-hint styles to demonstrate the full range of
packager output capabilities:
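| Return value | Log hint | What happens |
|---|---|---|
| `evaluation` | `"evaluation : dataset"` | String shortcut: logged as a DatasetArtifact |
| `best_agent` | `LogHint(key="best_agent", labels={"stage": "eval"})` | LogHint object: logged as a result, with labels |
| `all_responses` | `"*all_responses"` | Unbundled: each agent's response list becomes a separate artifact |
| `summary` | `"summary"` | Key only: artifact type inferred from the value type (a str is logged as a result) |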
eval_agents_run = eval_agents.run(
    local=True,
    params={
        "agents_config": {
            "concise-bot": {"style": "concise", "max_words": 15},
            "verbose-bot": {"style": "verbose", "max_words": 50},
            "formal-bot": {"style": "formal", "max_words": 30},
        },
        "prompts": [
            "Explain the benefits of retrieval-augmented generation.",
            "Compare fine-tuning vs. prompt engineering.",
            "Summarize best practices for LLM evaluation.",
        ],
    },
    returns=[
        "evaluation : dataset",                               # string shortcut
        LogHint(key="best_agent", labels={"stage": "eval"}),  # LogHint with labels
        "*all_responses",                                     # unbundled dict
        "summary",                                            # key only
    ],
)
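To make the string log-hint styles above easier to read, here is a minimal sketch of how such a string could be split into its parts. The parse_log_hint function is hypothetical and written only for illustration; it is not MLRun's internal parser, and the dictionary field names are assumptions.

```python
# Illustrative only: a hypothetical parser, not MLRun's internal code.
def parse_log_hint(hint: str) -> dict:
    """Split a string log hint into key, artifact type, and unbundle flag."""
    text = hint.strip()

    # A leading "*" asks for the value to be unbundled into separate items.
    unbundle = text.startswith("*")
    if unbundle:
        text = text[1:]

    # "key : artifact_type" carries an explicit type; a bare key does not.
    key, sep, artifact_type = text.partition(":")
    return {
        "key": key.strip(),
        # None stands for "infer the artifact type from the value".
        "artifact_type": artifact_type.strip() if sep else None,
        "unbundle": unbundle,
    }


for hint in ["evaluation : dataset", "*all_responses", "summary"]:
    print(parse_log_hint(hint))
```

The LogHint object in the returns list expresses the same information as explicit fields, plus extras such as labels that the string shortcut has no room for.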
Inspect the results#
The run's outputs dictionary contains all logged artifacts and results.
Here's a look at each one.
eval_agents_run.outputs
Evaluation DataFrame#
The evaluation output was logged as a DatasetArtifact. You can retrieve it as a
DataFrame directly.
eval_agents_run.artifact("evaluation").as_df()
Best agent (result)#
The best_agent dict was logged as a result — a lightweight value stored
directly in the run's metadata (no artifact file created).
eval_agents_run.outputs["best_agent"]
Summary (result)#
The summary string was also logged as a result.
eval_agents_run.outputs["summary"]
Itemized responses#
Because the log hint was "*all_responses", the packager unbundled the dict:
each key (concise-bot, verbose-bot, formal-bot) became a separate artifact.
You can see them in the outputs with a prefix of all_responses_.
# List all unbundled response keys
[key for key in eval_agents_run.outputs if key.startswith("all_responses")]
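Conceptually, unbundling turns one dict into several separately keyed items. The sketch below shows the idea in plain Python with a hypothetical unbundle helper. It assumes each item's key is the all_responses_ prefix followed directly by the dict key; the response strings are shortened sample values, and the real packager may name or store items differently.

```python
# Illustrative only: hypothetical helper, not MLRun's implementation.
def unbundle(prefix: str, values: dict) -> dict:
    """Flatten a dict into one entry per key, named '<prefix>_<key>'."""
    return {f"{prefix}_{key}": value for key, value in values.items()}


# Shortened sample responses, one list per agent.
all_responses = {
    "concise-bot": ["[concise] Re: prompt one", "[concise] Re: prompt two"],
    "verbose-bot": ["[verbose] Re: prompt one", "[verbose] Re: prompt two"],
    "formal-bot": ["[formal] Re: prompt one", "[formal] Re: prompt two"],
}

items = unbundle("all_responses", all_responses)
for key in items:
    print(key)
```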
Consuming packaged artifacts as inputs#
This is where packager input parsing comes into play. When you pass a previously
logged artifact via inputs={}, it arrives as a DataItem. The packager sees the
type hint on the function parameter and automatically converts it to the declared
Python type — no manual .as_df() or json.loads() needed.
This is distinct from params={}, which passes plain JSON-serializable Python values directly to the
function with no packager involvement.
Here's a second handler that takes the evaluation DataFrame as input and returns a filtered version.
%%writefile filter_agents.py
import pandas as pd


def filter_top_agents(
    evaluation: pd.DataFrame,
    min_score: float = 40.0,
) -> pd.DataFrame:
    """
    Filter agents whose overall score meets the threshold.

    :param evaluation: The evaluation DataFrame produced by evaluate_agents.
    :param min_score:  Minimum overall score to keep.

    :returns: Filtered DataFrame.
    """
    return evaluation[evaluation["overall"] >= min_score]
Register this second handler as an MLRun function, in the same way as the first:
filter_agents = project.set_function(
    "filter_agents.py",
    name="filter-agents",
    kind="job",
    image="mlrun/mlrun",
    handler="filter_top_agents",
)
Now pass the evaluation artifact from the previous run as an input.
The packager sees the type hint evaluation: pd.DataFrame and automatically
converts the DataItem to a DataFrame — no .as_df() call needed inside the function.
filter_agents_run = filter_agents.run(
    local=True,
    inputs={"evaluation": eval_agents_run.outputs["evaluation"]},
    params={"min_score": 40.0},
    returns=["top_agents : dataset"],
)
filter_agents_run.artifact("top_agents").as_df()
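The type-hint-driven conversion described in this section can be pictured with a small, dependency-free sketch. Everything here is hypothetical: FakeDataItem stands in for a DataItem, CONVERTERS stands in for the packagers, and a dict replaces the DataFrame so the example runs without pandas. It shows the general idea of dispatching on a parameter's annotation, not how MLRun implements it.

```python
import inspect
import json


class FakeDataItem:
    """Hypothetical stand-in for a DataItem: holds raw serialized content."""

    def __init__(self, raw: str):
        self.raw = raw


# Hypothetical converters keyed by the declared parameter type.
CONVERTERS = {
    dict: json.loads,
    list: json.loads,
    str: lambda raw: raw,
}


def call_with_parsed_inputs(handler, inputs: dict, params: dict):
    """Convert each input to its type-hinted type; pass params through as-is."""
    signature = inspect.signature(handler)
    kwargs = dict(params)  # params are plain values: no conversion
    for name, item in inputs.items():
        hinted_type = signature.parameters[name].annotation
        kwargs[name] = CONVERTERS[hinted_type](item.raw)
    return handler(**kwargs)


def top_agents(evaluation: dict, min_score: float = 40.0) -> list:
    """Pure-Python handler: receives an already-parsed dict, not a data item."""
    return [name for name, score in evaluation.items() if score >= min_score]


item = FakeDataItem('{"concise-bot": 52.0, "verbose-bot": 54.5, "formal-bot": 21.0}')
result = call_with_parsed_inputs(top_agents, {"evaluation": item}, {"min_score": 40.0})
print(result)
```

The handler never touches the data item itself: the wrapper reads the annotation, picks a converter, and hands over a ready-to-use value, while min_score passes straight through.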
What the packagers did automatically#
Here's a summary of what happened behind the scenes — and what you would have had to do manually without packagers:
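| Step | With packagers | Without packagers |
|---|---|---|
| Pass agents_config and prompts | `params={...}` | Same: `params` bypass packagers either way |
| Log evaluation DataFrame | `"evaluation : dataset"` | Manual logging call inside the handler |
| Log best_agent dict with labels | `LogHint(key="best_agent", labels={"stage": "eval"})` | Manual logging call inside the handler |
| Unbundle responses per agent | `"*all_responses"` | Manual loop: one logging call per agent |
| Log summary string | `"summary"` | Manual logging call inside the handler |
| Parse DataFrame input in 2nd function | Type hint `evaluation: pd.DataFrame` | Manual `.as_df()` call on the DataItem |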
Key distinction: params pass plain Python values — no packager processing.
inputs pass DataItem references — packagers parse them into the type-hinted type.
Packager output logging applies to returns regardless of how the inputs were provided.
With packagers, the handler functions contain zero MLRun-specific code — they are pure Python functions that can also be tested and debugged outside of MLRun.
Next steps#
Read the full packagers guide for details on all built-in packagers, the LogHint fields, and artifact types
See the custom packagers tutorials to learn how to write packagers for your own types