Using LLM prompt templates and artifacts#

This tutorial illustrates how easy it is to use LLMs and prompt templates, inside a complete workflow using the llm-prompt artifact.

Whenever an LLM-Prompt artifact is used, there MUST be a definition of:

  • What is the prompt template

  • Which LLM is used

  • What the model’s generation configuration is (if not using the default)

The model we are using is gpt-4o-mini from OpenAI with the default configuration (see section 3 for available model params), this case covers using a remote model directly from the configured datasource without having to download it first. We use streamlit to create a chat front-end and deploy it as application runtime.

In this tutorial

!pip install streamlit

Set up the environment#

This section sets up the environment variables required for OpenAI API access, including the base URL and API key.

from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv("ai_gateway.env")

# Validate OpenAI credentials
missing_vars = [
    var for var in ("OPENAI_API_KEY", "OPENAI_BASE_URL") if not os.getenv(var)
]
if missing_vars:
    raise EnvironmentError(
        f"Missing required environment variables: {', '.join(missing_vars)}. "
        "Please ensure they are set in 'ai_gateway.env' or your system environment."
    )

# Set additional configuration
os.environ["OPENAI_MAX_RETRIES"] = "100"

Import mlrun library and initialize the project#

This initializes the MLRun project

%config Completer.use_jedi = False

import mlrun
from mlrun import get_or_create_project

image = "mlrun/mlrun"
project_name = "llm-openai-bot"
project = get_or_create_project(
    project_name, context="./", user_project=True, allow_cross_project=True
)

This section sets up the necessary datastore profiles for time-series database (TSDB) and stream data. which are essential for monitoring model performance and detecting drift. You can use a data store profile to manage datastore credentials. A data store profile holds all the information required to address an external data source, including credentials. The DatastoreProfileV3io is used for V3IO storages while DatastoreProfileTDEngine, DatastoreProfileKafkaSource are used in community edition. Notice that recommended base period is 10 minutes, for demo purposes we set base period to 1 minute.

from src.model_monitoring_utils import enable_model_monitoring

enable_model_monitoring(
    project=project, deploy_histogram_data_drift_app=False, base_period=1
)

Configure OpenAI profile#

This section sets up an openAI profile (credentials and environment variables), and specifies the model. This tutorial uses the model gpt-4o-mini. You can change it to any model you want to use.

from mlrun.datastore.datastore_profile import OpenAIProfile

open_ai_profile = OpenAIProfile(
    name="openai_profile",
    api_key=os.environ.get("OPENAI_API_KEY"),
    organization=os.environ.get("OPENAI_ORG_ID"),
    project=os.environ.get("OPENAI_PROJECT_ID"),
    base_url=os.environ.get("OPENAI_BASE_URL"),
    timeout=os.environ.get("OPENAI_TIMEOUT"),
    max_retries=os.environ.get("OPENAI_MAX_RETRIES"),
)
project.register_datastore_profile(open_ai_profile)
model_url = f"ds://openai_profile/gpt-4o-mini"

Define the prompt templates and the prompt artifact template#

The prompt templates are defined in the src/llm_prompts.py file and include templates for the finance and sport domains.
These templates - finance_prompt_template and sport_prompt_template - are structured to guide the LLM in generating responses based on user queries. Each template includes a system message that sets the context for the LLM and a user message that includes the user's ID, tone, depth level, and question.

Use the prompt_legend parameter to specify how to map input fields to the corresponding prompt placeholders and to provide descriptive metadata for each placeholder.

For reference, see log_llm_prompt() for how the LLM prompt artifacts are logged as part of the project.

from src.llm_prompts import finance_prompt_template, sport_prompt_template

model_artifact = project.log_model(
    "open-ai",
    model_url=model_url,
)
# Create and log the finance prompt template as an LLM prompt artifact, capturing its definition and metadata
finance_llm_prompt_artifact = project.log_llm_prompt(
    "finance_llm_prompt",
    prompt_template=finance_prompt_template,
    model_artifact=model_artifact,
    invocation_config={
        "temperature": 0.7,
        "max_tokens": 256,
    },  # Invocation config will be add to each invocation
    prompt_legend={
        "question": {
            "field": "user_query",
            "description": "The main financial question or request the user is asking.",
        },
        "depth_level": {
            "field": "response_detail_level",
            "description": "Indicates the level of detail in the answer (e.g., basic, intermediate, advanced).",
        },
        "user_id": {
            "field": "customer_id",
            "description": "Unique identifier of the user, useful for personalization and tracking.",
        },
        "tone": {
            "field": "reply_style",
            "description": "The desired style of the response (e.g., formal, friendly, concise, detailed).",
        },
    },
)
sport_llm_prompt_artifact = project.log_llm_prompt(
    "sport_llm_prompt",
    prompt_template=sport_prompt_template,
    model_artifact=model_artifact,
    prompt_legend={
        "question": {
            "field": "user_query",
            "description": "The main sports or fitness-related question from the user.",
        },
        "depth_level": {
            "field": "response_detail_level",
            "description": "Indicates how in-depth the explanation should be (e.g., beginner, intermediate, expert).",
        },
        "user_id": {
            "field": "customer_id",
            "description": "Unique identifier of the user, used for personalization or tracking.",
        },
        "tone": {
            "field": "reply_style",
            "description": "The preferred style or tone of the response (e.g., motivational, professional, casual).",
        },
    },
)

Define the function graph and add ModelRunnerStep with proxy models for the shared model#

ModelRunnerStep is used to run multiple models on each event. When a ModelRunnerStep is included in a function graph, MLRun automatically imports the default language model class (LLModel or mlrun.serving.states.LLModel) during function deployment to wrap the model for handling a LLM prompt-based inference. This class extends the base Model to provide specialized handling for LLMPromptArtifact objects, enabling both synchronous and asynchronous invocation of language models. Follow the class description and implement your own enrichment when custom class is needed.

Use the add_shared_model method to add a shared model to the graph — this model becomes accessible to all ModelRunners in the graph. Use add_shared_model_proxy to add a proxy model to a ModelRunnerStep. A proxy model acts as a lightweight reference to an existing shared model within the graph. It allows each step to reuse the same underlying shared model without duplicating it, while still being able to assign a unique endpoint name, labels, and endpoint creation strategy for tracking or monitoring purposes. This helps maintain efficiency and consistency across multiple model runners that operate on shared models.

from mlrun.serving import ModelRunnerStep
from mlrun.common.schemas.model_monitoring.constants import (
    ModelEndpointCreationStrategy,
)

function = project.set_function(
    name="open-ai-tut",
    kind="serving",
    tag="latest",
    func="./src/LLM_file.py",
    image=image,
    requirements=["openai==1.77.0"],
)
graph = function.set_topology("flow", engine="async")

model_runner_step = ModelRunnerStep(
    name="model_runner_step", model_selector="MyModelSelector"
)

graph.add_shared_model(
    name="shared_llm",
    execution_mechanism="dedicated_process",
    model_class="LLModel",
    model_artifact=model_artifact,
    result_path="outputs",
)

model_runner_step.add_shared_model_proxy(
    endpoint_name="finance_endpoint",
    model_artifact=finance_llm_prompt_artifact,
    shared_model_name="shared_llm",
    model_endpoint_creation_strategy=ModelEndpointCreationStrategy.OVERWRITE,
)
model_runner_step.add_shared_model_proxy(
    endpoint_name="sport_endpoint",
    model_artifact=sport_llm_prompt_artifact,
    shared_model_name="shared_llm",
    model_endpoint_creation_strategy=ModelEndpointCreationStrategy.OVERWRITE,
)

graph.to(model_runner_step).respond()

Enable tracking and deploy the function#

This section enables experiment tracking, deploys the function, and visualizes the workflow of the LLM model using a graph within the Streamlit app. Note: The deploy_endpoint provides the URL to interact with the Streamlit interface.

function.set_tracking(enable_tracking=True)
graph.plot()
deploy_endpoint = function.deploy()

Deploy the model monitoring application#

This section deploys the model monitoring application, which is responsible for monitoring the performance of the LLMs that were deployed in the previous step. It uses the monitoring_application script to define the monitoring logic. The application is deployed using the deploy_function method, which makes it available for monitoring the LLMs in real time.

llm_monitoring_app = project.set_model_monitoring_function(
    func="./src/monitoring_application.py",
    application_class="ModelMonitoringApplication",
    name="llm-monitoring",
    image=image,
)

project.deploy_function(llm_monitoring_app)
import json

payload = {
    "model_name": "sport_endpoint",
    "user_query": "What can you tell me about finance ?",
    "response_detail_level": "basic overview",
    "customer_id": 12345,
    "reply_style": "casual",
}

function.invoke("", body=json.dumps(payload).encode("utf-8"))

Configure the Streamlit chatbot application#

This section sets up a Streamlit app that enables you to interact with the LLMs deployed in the previous steps. The app provides a user interface for selecting different models, tones, and depth levels, and allows users to submit questions to the LLMs.

!tar -czvf frontend_ui.tar.gz ./src/streamlit_ui.py
# Log the streamlit tar file as project artifact and use it as source archive
frontend_source = project.log_artifact(
    "frontend_source", local_path="./frontend_ui.tar.gz", upload=True
)

ui_fn = project.set_function(
    name="frontend",
    kind="application",
    image="mlrun/mlrun",
    requirements=["streamlit==1.49.1"],
)


API_URL = function.get_url()

# Set application spec and envs
ui_fn.set_env("API_URL", API_URL)
ui_fn.with_source_archive(frontend_source.target_path, pull_at_runtime=False)
ui_fn.set_internal_application_port(8000)
ui_fn.spec.command = "streamlit"
ui_fn.spec.args = [
    "run",
    "--server.port",
    "8000",
    "/home/mlrun_code/src/streamlit_ui.py",
]

Launch the Streamlit Chatbot to Interact with the LLM Model#

This section launches the Streamlit chatbot, providing a user-friendly interface for interacting with the deployed LLM models. Users can select the model, tone, and depth level, submit questions, and view responses in a chat-style format.

ui_fn.deploy(with_mlrun=False, create_default_api_gateway=False)
ui_fn.create_api_gateway(
    name="llm-prompt-artifact-ui",
    path="/",
    direct_port_access=True,
    ssl_redirect=True,
    set_as_default=False,
    authentication_mode="none",
)
print(
    f"Use this address to interact with your new chatbot ! https://{ui_fn.status.address}"
)

Model Architecture