Integrating a Hugging Face image classification model with MLRun#
This notebook demonstrates how to set up and test Hugging Face model integration with MLRun: configuring the Hugging Face profile, creating a model artifact, deploying a serving function with the dedicated_process execution mechanism, and testing model inference on a sample image. It uses a custom invoke method for non-LLM operations, demonstrated here with image classification.
After completing the steps described here, the model is ready for use with the configured execution mechanism.
Import the dependencies#
The MLRun imports used in this notebook, including those in the serving function file, are:
ModelProvider: abstract base class for integrating with external model providers, primarily generative AI (GenAI) services.
ModelRunnerStep: runs multiple models on each event.
LLModel: wraps a model to handle LLM (large language model) prompt-based inference.
HuggingFaceProfile: a datastore profile for managing Hugging Face credentials.
import os
from dotenv import load_dotenv
import mlrun
import mlrun.artifacts
import mlrun.serving
from mlrun.serving import ModelRunnerStep
from mlrun.datastore.datastore_profile import HuggingFaceProfile
load_dotenv("secrets.env")
> 2025-09-17 13:58:56,766 [warning] Failed resolving version info. Ignoring and using defaults
> 2025-09-17 13:59:00,360 [warning] Server or client version is unstable. Assuming compatible: {"client_version":"0.0.0+unstable","server_version":"0.0.0+unstable"}
True
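The profile created later in this notebook reads HF_TOKEN, HF_DEVICE, HF_DEVICE_MAP, and HF_TRUST_REMOTE_CODE from the environment, so a quick check after load_dotenv can catch a missing entry early. The following is a minimal sketch: the helper name missing_env_vars is hypothetical and not part of MLRun, and which variables are actually required depends on your setup (for example, a token may be unnecessary for public models).

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Variable names taken from the HuggingFaceProfile call in this notebook.
expected = ["HF_TOKEN", "HF_DEVICE", "HF_DEVICE_MAP", "HF_TRUST_REMOTE_CODE"]
missing = missing_env_vars(expected)
if missing:
    print(f"Not set (may be optional): {', '.join(missing)}")
```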
Configure the project#
The MLRun project is a container for all your work on this gen AI application. Read more about Projects and automation.
First you configure the project, then initialize it a few steps further on.
# Project configuration
project_name = "hf-image-classification"
image = "mlrun/mlrun"
profile_name = "huggingface_image_classification"
image_classification_model = "microsoft/resnet-50"
execution_mechanism = "dedicated_process"
mlrun_model_name = "sync_invoke_model"
Download an image of a cat to be used for testing:
# Download an image
# This notebook uses an Unsplash image of a cat
# Source: https://unsplash.com
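The cell above does not include the download code itself. One possible way to fetch the image with the Python standard library is sketched below. The helper name download_image is hypothetical, and the direct image URL is not given in this notebook, so you need to supply your own; the file name cat.jpg matches the path used later when logging the artifact.

```python
import shutil
import urllib.request


def download_image(url, dest="cat.jpg"):
    """Download the file at `url`, save it to `dest`, and return `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Hypothetical usage: replace the placeholder with the direct URL of your image.
# download_image("<direct-image-url>", "cat.jpg")
```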
Create the project and the Hugging Face profile#
The HuggingFaceProfile is a datastore profile for credentials management. Read more about Data store profiles.
# Initialize the MLRun project
project = mlrun.get_or_create_project(project_name)
# Create the HuggingFace data store profile with environment variables
profile = HuggingFaceProfile(
    name=profile_name,
    task="image-classification",
    token=os.environ.get("HF_TOKEN"),
    device=os.environ.get("HF_DEVICE"),
    device_map=os.environ.get("HF_DEVICE_MAP"),
    trust_remote_code=os.environ.get("HF_TRUST_REMOTE_CODE"),
)
# Register the profile with the project
project.register_datastore_profile(profile)
# Set up model URL
url_prefix = f"ds://{profile_name}/"
model_url = url_prefix + image_classification_model
print(f"Project: {project_name}")
print(f"Profile: {profile_name}")
print(f"Model URL: {model_url}")
print(f"Execution Mechanism: {execution_mechanism}")
> 2025-09-17 13:59:15,958 [info] Created and saved project: {"context":"./","from_template":null,"name":"hf-image-detection","overwrite":false,"save":true}
> 2025-09-17 13:59:15,961 [info] Project created successfully: {"project_name":"hf-image-detection","stored_in_db":true}
Project: hf-image-detection
Profile: huggingface_image_detection
Model URL: ds://huggingface_image_detection/microsoft/resnet-50
Execution Mechanism: dedicated_process
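The model URL printed above combines the ds:// scheme, the datastore profile name, and the Hugging Face model ID. The sketch below splits such a URL into its parts to make the structure explicit. It is an illustration only, not MLRun's own resolution logic, and the helper name split_ds_url is hypothetical.

```python
from urllib.parse import urlparse


def split_ds_url(url):
    """Split 'ds://<profile>/<model-id>' into (profile_name, model_id)."""
    parsed = urlparse(url)
    if parsed.scheme != "ds":
        raise ValueError(f"Expected a ds:// URL, got: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


parsed_profile, parsed_model = split_ds_url(
    "ds://huggingface_image_detection/microsoft/resnet-50"
)
print(parsed_profile)  # huggingface_image_detection
print(parsed_model)  # microsoft/resnet-50
```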
Log the image artifact to V3IO#
The cat image that you downloaded must be logged to V3IO so that the model can access it.
artifact = project.log_artifact("image_artifact", local_path="cat.jpg", upload=True)
v3io_path = artifact.get_target_path()
Create the model artifact#
# Log the model artifact
model_artifact = project.log_model(
    mlrun_model_name,
    model_url=model_url,
    default_config={"top_k": 2},
)
print(f"Model artifact created: {model_artifact}")
Model artifact created: {'metadata': {'tree': '3670dd4f-aa82-4230-ac14-ae4d5bb0f892', 'key': 'sync_invoke_model', 'uid': '9c848f2fb9a768c0d7693de8eb2daab4c78ec6f7', 'project': 'hf-image-detection', 'iter': 0}, 'status': {'state': 'created'}, 'spec': {'has_children': False, 'license': '', 'parameters': {'default_config': {'top_k': 2}}, 'framework': '', 'producer': {'kind': 'project', 'name': 'hf-image-detection', 'tag': '3670dd4f-aa82-4230-ac14-ae4d5bb0f892', 'owner': 'admin'}, 'model_url': 'ds://huggingface_image_detection/microsoft/resnet-50', 'model_file': '', 'db_key': 'sync_invoke_model'}, 'kind': 'model'}
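The default_config value {"top_k": 2} is why the inference test at the end of this notebook returns two predictions. Conceptually, top-k selection converts the raw class scores to probabilities and keeps the k most likely labels. The pure-Python sketch below illustrates that idea under simplified assumptions; it is not the Hugging Face pipeline's implementation, and the labels and logits are made up.

```python
import math


def top_k_predictions(logits, labels, k):
    """Apply softmax to the logits and return the k highest-scoring labels."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scored = [
        {"label": label, "score": e / total} for label, e in zip(labels, exps)
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:k]


# Made-up logits for three classes; only the two best are kept.
predictions = top_k_predictions([2.0, 0.5, -1.0], ["cat", "tub", "dog"], k=2)
print([p["label"] for p in predictions])  # ['cat', 'tub']
```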
Create and configure the serving function#
First, define the model class that the serving function uses:
%%writefile image_classification_model.py
from typing import Any

import mlrun.serving
from mlrun.datastore.model_provider.model_provider import ModelProvider
from mlrun.serving.states import LLModel  # noqa


class ImageDetectionModel(mlrun.serving.states.Model):
    """Custom MLRun model wrapper for Hugging Face image classification that loads an image
    from a given path and returns predictions via HuggingFaceProvider."""

    def predict(self, body: Any, **kwargs) -> Any:
        if isinstance(self.model_provider, ModelProvider):
            # Imported here to avoid requiring Pillow in environments where it's not needed
            from PIL import Image

            dataitem = mlrun.get_dataitem(body["input"])
            with dataitem.open("rb") as f:
                image = Image.open(f)
                image.load()  # ensure image is fully read into memory
            result = self.model_provider.custom_invoke(
                inputs=image,
            )
            body["result"] = result
        return body
Overwriting image_detection_model.py
Now create the serving function, using the model class you just defined in image_classification_model.py. Read more about set_function.
# Create serving function
function = project.set_function(
    name="huggingface-model-test",
    kind="serving",
    tag="latest",
    image=image,
    func="image_classification_model.py",
    requirements=[
        "--extra-index-url",
        "https://download.pytorch.org/whl/cpu",
        "torch==2.7.1+cpu",
        "transformers==4.53.2",
        "pillow~=11.3",
    ],
)
Set up the serving graph#
The Flow topology is a full graph/DAG. In this example it uses the async engine, which is based on storey.transformations and an asynchronous event loop.
This notebook uses a ModelRunnerStep to run the model as a step in the graph.
graph = function.set_topology("flow", engine="async")
model_runner_step = ModelRunnerStep(name="my_model_runner")
model_runner_step.add_model(
    model_class="ImageDetectionModel",
    endpoint_name="my_endpoint",
    execution_mechanism=execution_mechanism,
    model_artifact=model_artifact,
    result_path="output",
)
graph.to(model_runner_step).respond()
print("Serving graph configured with dedicated_process execution mechanism")
Serving graph configured with dedicated_process execution mechanism
Deploy the function#
# Larger Hugging Face models may require extended resources:
#
# function.spec.resources = {
# "limits": {"cpu": "5", "memory": "30Gi"},
# "requests": {"cpu": "3", "memory": "1Mi"},
# }
# function.spec.max_replicas = (
# 1 # prevents allocating extended resources to multiple replicas
# )
# Deploy the function
print("Deploying function...")
function.deploy()
print("Function deployed successfully!")
Deploying function...
> 2025-09-17 13:59:16,353 [info] Starting remote function deploy
2025-09-17 13:59:16 (info) Deploying function
2025-09-17 13:59:16 (info) Building
2025-09-17 13:59:16 (info) Staging files and preparing base images
2025-09-17 13:59:16 (warn) Using user provided base image, runtime interpreter version is provided by the base image
2025-09-17 13:59:16 (info) Building processor image
2025-09-17 14:06:52 (info) Build complete
2025-09-17 14:08:10 (info) Function deploy complete
> 2025-09-17 14:08:18,161 [info] Model endpoint creation task completed with state succeeded
> 2025-09-17 14:08:18,162 [info] Successfully deployed function: {"external_invocation_urls":["hf-image-detection-huggingface-model-test.default-tenant.app.vmdev25.lab.iguazeng.com/"],"internal_invocation_urls":["nuclio-hf-image-detection-huggingface-model-test.default-tenant.svc.cluster.local:8080"]}
Function deployed successfully!
Test the model inference#
# Test the model with the input data
results = function.invoke(
    f"v2/models/{mlrun_model_name}/infer",
    {"input": v3io_path},
)["result"]
print("Response received:")
print(f"Number of results: {len(results)}")
print(results)
Response received:
Number of results: 2
[{'label': 'tabby, tabby cat', 'score': 0.8296310901641846}, {'label': 'tub, vat', 'score': 0.0689203143119812}]
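The response is a list of label and score dictionaries, so downstream code usually needs to pick one prediction. The sketch below selects the highest-scoring label and rejects it if the score falls below a threshold. The helper name best_label and the 0.5 threshold are assumptions for illustration, not part of MLRun or the model's API.

```python
def best_label(results, min_score=0.5):
    """Return the highest-scoring label, or None if it is below min_score."""
    if not results:
        return None
    best = max(results, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else None


# The response shown above, copied verbatim.
results = [
    {"label": "tabby, tabby cat", "score": 0.8296310901641846},
    {"label": "tub, vat", "score": 0.0689203143119812},
]
print(best_label(results))  # tabby, tabby cat
```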