Integrating a Hugging Face image classification model with MLRun#

This notebook demonstrates how to set up and test Hugging Face model integration with MLRun: configuring the Hugging Face profile, creating a model artifact, deploying a serving function with the dedicated_process execution mechanism, and testing model inference on a sample image. It uses a custom invoke method for non-LLM operations, demonstrated here with image classification.

After completing the steps described here, the model is ready for use with the configured execution mechanism.

Import the dependencies#

The MLRun imports used in this notebook include:

  • ModelRunnerStep(): to run multiple models on each event.

  • HuggingFaceProfile(): a datastore profile that manages the Hugging Face credentials and settings.

  • ModelProvider(): abstract base for integrating with external model providers, primarily generative AI (GenAI) services. Imported later, in the model class file.

  • LLModel(): to wrap a model for handling LLM (Large Language Model) prompt-based inference. Imported later, in the model class file.

import os
from dotenv import load_dotenv

import mlrun
import mlrun.artifacts
import mlrun.serving
from mlrun.serving import ModelRunnerStep
from mlrun.datastore.datastore_profile import HuggingFaceProfile

load_dotenv("secrets.env")
> 2025-09-17 13:58:56,766 [warning] Failed resolving version info. Ignoring and using defaults
> 2025-09-17 13:59:00,360 [warning] Server or client version is unstable. Assuming compatible: {"client_version":"0.0.0+unstable","server_version":"0.0.0+unstable"}
True

Configure the project#

The MLRun project is a container for all your work on this AI application. Read more about Projects and automation.

First you configure the project, then initialize it a few steps further on.

# Project configuration
project_name = "hf-image-classification"
image = "mlrun/mlrun"
profile_name = "huggingface_image_classification"
image_classification_model = "microsoft/resnet-50"
execution_mechanism = "dedicated_process"
mlrun_model_name = "sync_invoke_model"

Download an image of a cat to use for testing, and save it as cat.jpg:

# Download an image
# This notebook uses an Unsplash image of a cat
# Source: https://unsplash.com
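
The cell above does not include the download code itself. A minimal sketch, assuming you supply the image URL yourself (the URL in the usage comment is a placeholder, not the one used by this notebook) and save the file as cat.jpg, the name used when the image artifact is logged. The helper name download_image is hypothetical:

```python
import shutil
import urllib.request


def download_image(image_url: str, dest_path: str = "cat.jpg") -> str:
    """Download image_url to dest_path and return dest_path (hypothetical helper)."""
    # Stream the response to disk rather than holding the whole file in memory
    with urllib.request.urlopen(image_url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Hypothetical usage; replace the placeholder with the URL of the image you chose:
# download_image("https://<your-image-url>", "cat.jpg")
```

Check the license terms of the image source before redistributing the file.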

Create the project and the Hugging Face profile#

The HuggingFaceProfile is a datastore profile for credentials management. Read more about Data store profiles.

# Initialize the MLRun project
project = mlrun.get_or_create_project(project_name)

# Create the HuggingFace data store profile with environment variables
profile = HuggingFaceProfile(
    name=profile_name,
    task="image-classification",
    token=os.environ.get("HF_TOKEN"),
    device=os.environ.get("HF_DEVICE"),
    device_map=os.environ.get("HF_DEVICE_MAP"),
    trust_remote_code=os.environ.get("HF_TRUST_REMOTE_CODE"),
)

# Register the profile with the project
project.register_datastore_profile(profile)

# Set up model URL
url_prefix = f"ds://{profile_name}/"
model_url = url_prefix + image_classification_model

print(f"Project: {project_name}")
print(f"Profile: {profile_name}")
print(f"Model URL: {model_url}")
print(f"Execution Mechanism: {execution_mechanism}")
> 2025-09-17 13:59:15,958 [info] Created and saved project: {"context":"./","from_template":null,"name":"hf-image-detection","overwrite":false,"save":true}
> 2025-09-17 13:59:15,961 [info] Project created successfully: {"project_name":"hf-image-detection","stored_in_db":true}
Project: hf-image-detection
Profile: huggingface_image_detection
Model URL: ds://huggingface_image_detection/microsoft/resnet-50
Execution Mechanism: dedicated_process
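
Note that os.environ.get returns strings (or None), so a setting such as HF_TRUST_REMOTE_CODE=false arrives as the non-empty string "false". Whether HuggingFaceProfile converts such values itself is not stated here. The sketch below shows one way to parse the flag explicitly before passing it to the profile. The helper name env_flag and the accepted spellings are assumptions for illustration, not part of MLRun:

```python
import os
from typing import Optional


def env_flag(name: str, default: Optional[bool] = None) -> Optional[bool]:
    """Read a boolean-like environment variable (hypothetical helper)."""
    raw = os.environ.get(name)
    # Treat an unset or empty variable as "not configured"
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


# Hypothetical usage when building the profile:
# trust_remote_code=env_flag("HF_TRUST_REMOTE_CODE", default=False)
```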

Log the image artifact to V3IO#

The cat image that you downloaded must be logged to V3IO so that the model can access it.

artifact = project.log_artifact("image_artifact", local_path="cat.jpg", upload=True)
v3io_path = artifact.get_target_path()

Create the model artifact#

# Log the model artifact
model_artifact = project.log_model(
    mlrun_model_name,
    model_url=model_url,
    default_config={"top_k": 2},
)

print(f"Model artifact created: {model_artifact}")
Model artifact created: {'metadata': {'tree': '3670dd4f-aa82-4230-ac14-ae4d5bb0f892', 'key': 'sync_invoke_model', 'uid': '9c848f2fb9a768c0d7693de8eb2daab4c78ec6f7', 'project': 'hf-image-detection', 'iter': 0}, 'status': {'state': 'created'}, 'spec': {'has_children': False, 'license': '', 'parameters': {'default_config': {'top_k': 2}}, 'framework': '', 'producer': {'kind': 'project', 'name': 'hf-image-detection', 'tag': '3670dd4f-aa82-4230-ac14-ae4d5bb0f892', 'owner': 'admin'}, 'model_url': 'ds://huggingface_image_detection/microsoft/resnet-50', 'model_file': '', 'db_key': 'sync_invoke_model'}, 'kind': 'model'}
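
The top_k entry in default_config has the same name as the top_k parameter of the Hugging Face image classification pipeline, which limits how many labels are returned. Presumably MLRun forwards it to the pipeline call, which would be consistent with the two results returned in the inference test at the end of this notebook. As an illustration only (this is not the pipeline's implementation, and the scores are made-up values), top-k selection over label scores can be sketched as:

```python
from typing import Dict, List


def top_k_labels(scores: Dict[str, float], top_k: int) -> List[dict]:
    """Return the top_k highest-scoring labels, best first (illustrative only)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [{"label": label, "score": score} for label, score in ranked[:top_k]]


# Made-up scores for illustration; a real model produces one score per class
example_scores = {"tabby": 0.7, "tub": 0.2, "lynx": 0.1}
print(top_k_labels(example_scores, top_k=2))
```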

Create and configure the serving function#

First, define the model class that the serving function will run:

%%writefile image_classification_model.py
from typing import Any
import mlrun.serving
from mlrun.datastore.model_provider.model_provider import ModelProvider
from mlrun.serving.states import LLModel  # noqa


class ImageDetectionModel(mlrun.serving.states.Model):
    """Custom MLRun model wrapper for Hugging Face image classification that loads an image
    from a given path and returns predictions via HuggingFaceProvider."""

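    def predict(self, body: Any, **kwargs) -> Any:
        if not isinstance(self.model_provider, ModelProvider):
            # Fail loudly instead of silently returning None
            raise TypeError("ImageDetectionModel requires a model provider")

        # Imported here to avoid requiring Pillow in environments where it's not needed
        from PIL import Image

        # Read the image from the path passed in the request body
        dataitem = mlrun.get_dataitem(body["input"])
        with dataitem.open("rb") as f:
            image = Image.open(f)
            image.load()  # ensure image is fully read into memory

        # Run image classification through the provider's custom invoke method
        result = self.model_provider.custom_invoke(
            inputs=image,
        )
        body["result"] = result
        return body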
Overwriting image_detection_model.py

Now create the serving function, using the model class you just defined in image_classification_model.py. Read more about set_function.

# Create serving function
function = project.set_function(
    name="huggingface-model-test",
    kind="serving",
    tag="latest",
    image=image,
    func="image_classification_model.py",
    requirements=[
        "--extra-index-url",
        "https://download.pytorch.org/whl/cpu",
        "torch==2.7.1+cpu",
        "transformers==4.53.2",
        "pillow~=11.3",
    ],
)

Set up the serving graph#

The Flow topology is a full graph/DAG. In this example it uses the async engine, which is based on storey.transformations and an asynchronous event loop. This notebook uses the ModelRunnerStep to run the model as a step in the graph.

graph = function.set_topology("flow", engine="async")
model_runner_step = ModelRunnerStep(name="my_model_runner")
model_runner_step.add_model(
    model_class="ImageDetectionModel",
    endpoint_name="my_endpoint",
    execution_mechanism=execution_mechanism,
    model_artifact=model_artifact,
    result_path="output",
)
graph.to(model_runner_step).respond()

print("Serving graph configured with dedicated_process execution mechanism")
Serving graph configured with dedicated_process execution mechanism

Deploy the function#

# Larger Hugging Face models may require extended resources:
#
# function.spec.resources = {
#     "limits": {"cpu": "5", "memory": "30Gi"},
#     "requests": {"cpu": "3", "memory": "1Mi"},
# }
# function.spec.max_replicas = (
#     1  # prevents allocating extended resources to multiple replicas
# )
# Deploy the function
print("Deploying function...")
function.deploy()
print("Function deployed successfully!")
Deploying function...
> 2025-09-17 13:59:16,353 [info] Starting remote function deploy
2025-09-17 13:59:16  (info) Deploying function
2025-09-17 13:59:16  (info) Building
2025-09-17 13:59:16  (info) Staging files and preparing base images
2025-09-17 13:59:16  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2025-09-17 13:59:16  (info) Building processor image
2025-09-17 14:06:52  (info) Build complete
2025-09-17 14:08:10  (info) Function deploy complete
> 2025-09-17 14:08:18,161 [info] Model endpoint creation task completed with state succeeded
> 2025-09-17 14:08:18,162 [info] Successfully deployed function: {"external_invocation_urls":["hf-image-detection-huggingface-model-test.default-tenant.app.vmdev25.lab.iguazeng.com/"],"internal_invocation_urls":["nuclio-hf-image-detection-huggingface-model-test.default-tenant.svc.cluster.local:8080"]}
Function deployed successfully!

Test the model inference#

# Test the model with the input data
results = function.invoke(
    f"v2/models/{mlrun_model_name}/infer",
    {"input": v3io_path},
)["result"]

print("Response received:")
print(f"Number of results: {len(results)}")
print(results)
Response received:
Number of results: 2
[{'label': 'tabby, tabby cat', 'score': 0.8296310901641846}, {'label': 'tub, vat', 'score': 0.0689203143119812}]
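
Each entry in the response is a dictionary with a label and a score. A small post-processing sketch, using the response printed above as input. The helper name best_prediction and the 0.5 threshold are assumptions chosen for illustration:

```python
from typing import List, Optional

# The response printed above, copied here so the sketch runs on its own
results = [
    {"label": "tabby, tabby cat", "score": 0.8296310901641846},
    {"label": "tub, vat", "score": 0.0689203143119812},
]


def best_prediction(results: List[dict], min_score: float = 0.5) -> Optional[dict]:
    """Return the highest-scoring entry, or None if it is below min_score."""
    if not results:
        return None
    best = max(results, key=lambda entry: entry["score"])
    return best if best["score"] >= min_score else None


best = best_prediction(results)
print(best["label"] if best else "no confident prediction")
```

Returning None below the threshold lets the caller decide how to handle low-confidence images instead of always trusting the first label.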