Integrating an OpenAI LLM with MLRun

This notebook demonstrates how to set up and test an OpenAI model integration with MLRun, including profile setup, model deployment, and inference testing. After you run this notebook, the model is ready for production use with the configured execution mechanism and token limits.

This notebook uses an OpenAI Large Language Model. You create a connection to the model, query the model, and receive responses.

You could run a similar flow with a Hugging Face LLM, but it would need more resources, since Hugging Face models must be downloaded.

In this section

Import the dependencies
Configure the project and test the data
Set up the project and the OpenAI profile
Create/log the model artifact
Create/log the LLM prompt artifact
Create the serving function
Set up the serving graph
Deploy the function
Test the model inference
Analyze the token usage

See also

Remote models

Import the dependencies

The MLRun imports include:

ModelProvider(): abstract base for integrating with external model providers, primarily generative AI (GenAI) services.
ModelRunnerStep(): to run multiple models on each event.
LLModel(): to wrap a model for handling LLM (Large Language Model) prompt-based inference.
OpenAIProfile(): a datastore profile that holds the OpenAI credentials and connection settings.

import os
import json
from dotenv import load_dotenv
import mlrun.serving
from mlrun.datastore.model_provider.model_provider import UsageResponseKeys
from mlrun.serving import ModelRunnerStep
from mlrun.datastore.datastore_profile import OpenAIProfile

load_dotenv("secrets.env")

> 2025-10-29 16:59:58,732 [warning] Failed resolving version info. Ignoring and using defaults
> 2025-10-29 17:00:01,282 [warning] Server or client version is unstable. Assuming compatible: {"client_version":"0.0.0+unstable","server_version":"0.0.0+unstable"}

True

Configure the project and test the data

The project uses an OpenAI LLM (gpt-4o-mini). The dedicated_process execution mechanism is intended for large models that are CPU/GPU-intensive and that also require significant runnable-specific initialization.

# Project configuration
project_name = "openai-project"
image = "mlrun/mlrun"
profile_name = "my_openai_profile"
basic_llm_model = "gpt-4o-mini"
execution_mechanism = "dedicated_process"
mlrun_model_name = "sync_invoke_model"

# Test input data
INPUT_DATA = {
    "question": "What is the capital of France? Answer with one word first, then provide a historical overview.",
    "depth_level": "detailed",
    "persona": "teacher",
    "tone": "casual",
}

EXPECTED_RESULT = "paris"

PROMPT_TEMPLATE = [
    {
        "role": "user",
        "content": "{question}. Explain {depth_level} as a {persona} in {tone} style.",
    }
]
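
The placeholders in PROMPT_TEMPLATE are filled from the keys of the input event. The following is a minimal sketch of that substitution using Python's built-in str.format. The helper name render_prompt is hypothetical, and MLRun's own rendering may differ in detail.

```python
# Hypothetical helper (not part of MLRun): fills the "content" of each
# message with str.format, which uses the same {placeholder} syntax as
# PROMPT_TEMPLATE.
def render_prompt(template, data):
    return [
        {**message, "content": message["content"].format(**data)}
        for message in template
    ]


prompt_template = [
    {
        "role": "user",
        "content": "{question}. Explain {depth_level} as a {persona} in {tone} style.",
    }
]
input_data = {
    "question": "What is the capital of France?",
    "depth_level": "detailed",
    "persona": "teacher",
    "tone": "casual",
}

messages = render_prompt(prompt_template, input_data)
print(messages[0]["content"])
```

A key that appears in the template but not in the input raises a KeyError in this sketch, which makes missing fields easy to spot before deployment.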

Set up the project and the OpenAI profile

The MLRun project is a container for all your work on this gen AI application. Read more about Projects and automation.

The OpenAIProfile is a datastore profile for credentials management. Read more about Data store profiles.

# Initialize MLRun project
project = mlrun.get_or_create_project(project_name)

# Create an OpenAI profile with environment variables
profile = OpenAIProfile(
    name=profile_name,
    api_key=os.environ.get("OPENAI_API_KEY"),
    organization=os.environ.get("OPENAI_ORG_ID"),
    project=os.environ.get("OPENAI_PROJECT_ID"),
    base_url=os.environ.get("OPENAI_BASE_URL"),
    timeout=os.environ.get("OPENAI_TIMEOUT"),
    max_retries=os.environ.get("OPENAI_MAX_RETRIES"),
)

# Register the profile with the project
project.register_datastore_profile(profile)

# Set up the LLM URL
url_prefix = f"ds://{profile_name}/"
model_url = url_prefix + basic_llm_model

print(f"Project: {project_name}")
print(f"Profile: {profile_name}")
print(f"Model URL: {model_url}")
print(f"Execution Mechanism: {execution_mechanism}")

> 2025-10-29 17:00:16,954 [info] Created and saved project: {"context":"./","from_template":null,"name":"openai-project","overwrite":false,"save":true}
> 2025-10-29 17:00:16,957 [info] Project created successfully: {"project_name":"openai-project","stored_in_db":true}
Project: openai-project
Profile: my_openai_profile
Model URL: ds://my_openai_profile/gpt-4o-mini
Execution Mechanism: dedicated_process
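
Note that os.environ.get returns a string, or None when the variable is unset, so numeric settings such as the timeout and retry count arrive as text. Whether OpenAIProfile converts them itself is not shown in this notebook. The sketch below converts them explicitly; the helper env_number and the example values are assumptions for illustration only.

```python
import os


# Hypothetical helper (not part of MLRun): reads a variable from a
# mapping (os.environ by default) and converts it with `cast`.
# Unset or empty values fall back to `default`.
def env_number(name, cast, default=None, env=os.environ):
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return cast(raw)


# Example values only; in the notebook these would come from secrets.env.
example_env = {"OPENAI_TIMEOUT": "30", "OPENAI_MAX_RETRIES": ""}

timeout = env_number("OPENAI_TIMEOUT", float, env=example_env)
max_retries = env_number("OPENAI_MAX_RETRIES", int, default=2, env=example_env)
print(timeout, max_retries)
```

The converted values could then be passed as the timeout and max_retries arguments when creating the profile.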

Create/log the model artifact

This step logs the model artifact. See full details in log_model.

# Log the model artifact
model_artifact = project.log_model(
    mlrun_model_name,
    model_url=model_url,
    default_config={"max_tokens": 100},
)

print(f"Model artifact created: {model_artifact}")

Model artifact created: {'spec': {'parameters': {'default_config': {'max_tokens': 100}}, 'model_url': 'ds://my_openai_profile/gpt-4o-mini', 'has_children': False, 'framework': '', 'db_key': 'sync_invoke_model', 'license': '', 'model_file': '', 'producer': {'kind': 'project', 'name': 'openai-project', 'tag': '3afd90e8-46c8-47f2-90b5-cf372c3bca1b', 'owner': 'admin'}}, 'status': {'state': 'created'}, 'kind': 'model', 'metadata': {'key': 'sync_invoke_model', 'tree': '3afd90e8-46c8-47f2-90b5-cf372c3bca1b', 'project': 'openai-project', 'iter': 0, 'uid': 'aa608f068257e4967dc62a78c58aef661f349031'}}

Create/log the LLM prompt artifact

log_llm_prompt creates and logs an LLMPromptArtifact that captures a prompt definition for large language model (LLM) interactions.

# Log the LLM prompt artifact
llm_prompt_artifact = project.log_llm_prompt(
    "my_llm_prompt",
    prompt_template=PROMPT_TEMPLATE,
    model_artifact=model_artifact,
    prompt_legend={
        "question": {"field": None, "description": None},
        "depth_level": {"field": None, "description": None},
        "persona": {"field": None, "description": None},
        "tone": {"field": None, "description": None},
    },
)

print(f"LLM prompt artifact created: {llm_prompt_artifact}")

LLM prompt artifact created: {'spec': {'target_path': 'v3io:///projects/openai-project/artifacts/my_llm_prompt.json', 'prompt_template': [{'role': 'user', 'content': '{question}. Explain {depth_level} as a {persona} in {tone} style.'}], 'size': 98, 'has_children': False, 'parent_uri': 'store://models/openai-project/sync_invoke_model#0@3afd90e8-46c8-47f2-90b5-cf372c3bca1b^aa608f068257e4967dc62a78c58aef661f349031', 'format': 'json', 'db_key': 'my_llm_prompt', 'license': '', 'producer': {'kind': 'project', 'name': 'openai-project', 'tag': 'c3cd9d9c-9dd9-4ca9-bc29-b9836db078cb', 'owner': 'admin'}, 'prompt_legend': {'question': {'field': 'question', 'description': None}, 'depth_level': {'field': 'depth_level', 'description': None}, 'persona': {'field': 'persona', 'description': None}, 'tone': {'field': 'tone', 'description': None}}}, 'status': {'state': 'created'}, 'kind': llm-prompt, 'metadata': {'key': 'my_llm_prompt', 'hash': '24312969d4fde40522a147a1728bfe0fb5fb7755', 'tree': 'c3cd9d9c-9dd9-4ca9-bc29-b9836db078cb', 'project': 'openai-project', 'iter': 0, 'uid': 'b560cd9f8ef1e209ac2b8407eeb0dc25c2628b9f'}}
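
In the logged output above, each legend entry whose field was None appears with its field set to the placeholder name. Before logging a prompt, it can help to confirm that every placeholder in the template has a legend entry. The following is a minimal sketch using the standard-library string.Formatter; the helper template_fields is hypothetical and not an MLRun API.

```python
from string import Formatter


# Hypothetical helper (not part of MLRun): collects the placeholder
# names used in the "content" of every message in a prompt template.
def template_fields(template):
    fields = set()
    for message in template:
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
        for _, field_name, _, _ in Formatter().parse(message["content"]):
            if field_name:
                fields.add(field_name)
    return fields


prompt_template = [
    {
        "role": "user",
        "content": "{question}. Explain {depth_level} as a {persona} in {tone} style.",
    }
]
prompt_legend = {
    "question": {"field": None, "description": None},
    "depth_level": {"field": None, "description": None},
    "persona": {"field": None, "description": None},
    "tone": {"field": None, "description": None},
}

missing = template_fields(prompt_template) - prompt_legend.keys()
unused = prompt_legend.keys() - template_fields(prompt_template)
print(f"Missing legend entries: {sorted(missing)}")
print(f"Unused legend entries: {sorted(unused)}")
```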

Create the serving function

A function of kind serving is used for deploying models and higher-level real-time graphs (DAGs) over one or more Nuclio functions. See more details in serving graphs and set_function.

%%writefile models.py
from mlrun.serving.states import LLModel  # noqa

Overwriting models.py

function = project.set_function(
    func="models.py",
    name="openai-model-test",
    kind="serving",
    image=image,
    requirements=["openai==1.77.0"],
)
print("Serving function created")

Serving function created

Set up the serving graph

The flow topology is a full graph/DAG. This example uses the async engine, which is based on storey.transformations and an asynchronous event loop. The notebook uses a ModelRunnerStep to run the model as a step in the graph.

graph = function.set_topology("flow", engine="async")
model_runner_step = ModelRunnerStep(name="my-model-runner")
model_runner_step.add_model(
    endpoint_name="my_endpoint",
    model_class="LLModel",
    execution_mechanism=execution_mechanism,
    model_artifact=llm_prompt_artifact,
    result_path="output",
)
graph.to(model_runner_step).respond()

print(f"Serving graph configured with {execution_mechanism} execution mechanism")

Serving graph configured with dedicated_process execution mechanism

Deploy the function

# Deploy the function
print("Deploying function...")
function.deploy()
print("Function deployed successfully!")

Deploying function...
> 2025-10-29 17:00:17,230 [info] Starting remote function deploy
2025-10-29 17:00:17  (info) Deploying function
2025-10-29 17:00:17  (info) Building
2025-10-29 17:00:17  (info) Staging files and preparing base images
2025-10-29 17:00:17  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2025-10-29 17:00:17  (info) Building processor image
2025-10-29 17:02:22  (info) Build complete
2025-10-29 17:02:52  (info) Function deploy complete
> 2025-10-29 17:03:02,508 [info] Model endpoint creation task completed with state succeeded
> 2025-10-29 17:03:02,509 [info] Successfully deployed function: {"external_invocation_urls":["openai-project-openai-model-test.default-tenant.app.vmdev68.lab.iguazeng.com/"],"internal_invocation_urls":["nuclio-openai-project-openai-model-test.default-tenant.svc.cluster.local:8080"]}
Function deployed successfully!

Test the model inference

# Test the model with the input data
response = function.invoke(
    f"v2/models/{mlrun_model_name}/infer",
    json.dumps(INPUT_DATA),
)["output"]

print("Response received:")
print(f"Response length: {len(response)}")
print("\nResponse structure:")
for key in response.keys():
    print(f"  - {key}")

Response received:
Response length: 2

Response structure:
  - answer
  - usage

# Extract and display the answer
answer = response[UsageResponseKeys.ANSWER]
print("Answer:")
print(answer)
print(f"\nExpected keyword: {EXPECTED_RESULT}")
print(f"Contains expected result: {EXPECTED_RESULT in answer.lower()}")

Answer:
Paris.

Alright, let's dive into the historical journey of Paris! 

Paris, known as the "City of Light," has a history that stretches back over 2,000 years. It all began with a group of people called the Parisii, a Celtic tribe that settled on the banks of the Seine River around the 3rd century BC. They established a small fishing village that gradually developed into a bustling trade center.

By the 1st century BC, the Romans took notice of this growing

Expected keyword: paris
Contains expected result: True
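
The substring check above also passes if the keyword appears anywhere later in the answer. Since the question asks for a one-word answer first, a stricter check can compare only the first word. This is a minimal sketch; the helper first_word is hypothetical, and the sample answer is shortened from the output above.

```python
import re


# Hypothetical helper: returns the first alphabetic word of the text,
# lowercased, or an empty string when the text contains no letters.
def first_word(text):
    match = re.search(r"[A-Za-z]+", text)
    return match.group(0).lower() if match else ""


sample_answer = "Paris.\n\nAlright, let's dive into the historical journey of Paris!"
expected_result = "paris"

starts_with_expected = first_word(sample_answer) == expected_result
print(f"Starts with expected result: {starts_with_expected}")
```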

Analyze the token usage

stats = response[UsageResponseKeys.USAGE]

print("Token Analysis:")
print(f"Completion tokens (API): {stats['completion_tokens']}")
print(f"Prompt tokens: {stats['prompt_tokens']}")
print(f"Total tokens: {stats['total_tokens']}")

Token Analysis:
Completion tokens (API): 100
Prompt tokens: 35
Total tokens: 135
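
The completion token count equals the max_tokens value of 100 set in default_config, and the answer above ends mid-sentence, which is consistent with the response being cut off at the token limit. The sketch below checks both the arithmetic and the limit, using the values printed above. The helper summarize_usage is hypothetical.

```python
# Hypothetical helper: checks that the totals add up and flags a response
# whose completion reached the configured max_tokens limit.
def summarize_usage(usage, max_tokens):
    completion = usage["completion_tokens"]
    prompt = usage["prompt_tokens"]
    return {
        "total_matches": usage["total_tokens"] == prompt + completion,
        "hit_token_limit": completion >= max_tokens,
    }


# Values copied from the token analysis output above.
usage = {"completion_tokens": 100, "prompt_tokens": 35, "total_tokens": 135}

summary = summarize_usage(usage, max_tokens=100)
print(summary)
```

If hit_token_limit is True, consider raising max_tokens in default_config when a complete answer is required.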