OpenAI-compatible frontend

set_openai_frontend() configures a serving function with OpenAI-compatible REST endpoints in a single call. For each supported operation group, it registers the path templates, the input body mappings (which extract fields from the request), and the output body mappings (which filter and reshape the response). This lets the official OpenAI Python SDK invoke an MLRun serving function as if it were the OpenAI API.

Built on top of the API handler: set_openai_frontend() produces an APIHandlerConfig under the hood, so you can also configure additional endpoints alongside the OpenAI ones.

Supported endpoint groups

Selected via the OpenAIEndpoint enum:

| Value | OpenAI operation group | Paths registered |
|---|---|---|
| OpenAIEndpoint.CHAT_COMPLETIONS | /chat/completions | POST /chat/completions; GET /chat/completions; GET, POST, and DELETE /chat/completions/{completion_id}; GET /chat/completions/{completion_id}/messages |
| OpenAIEndpoint.RESPONSES | /responses | POST /responses; GET and DELETE /responses/{response_id}; GET /responses/{response_id}/input_items; POST /responses/input_tokens; POST /responses/{response_id}/cancel; POST /responses/compact |

Each group ships pre-built input and output body mappings that:

  • Extract the documented OpenAI request fields from the request body as keyword arguments (e.g. model, messages, input, instructions, …).

  • Filter the handler's response down to the OpenAI response contract so the SDK's typed deserializers (ChatCompletion, Response, …) accept it.

Mandatory fields (per the OpenAI spec) are enforced via mandatory=True on the relevant mappings — missing fields fail the request with HTTP 422 (Unprocessable Entity).
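To make the input side concrete, here is a minimal pure-Python sketch of what an input body mapping does for a chat completion request: it copies known fields into keyword arguments, ignores everything else, and rejects the request when a mandatory field is missing. FieldMapping, extract_kwargs, and UnprocessableEntityError are hypothetical names used only for illustration, not MLRun's internal classes, and the field list is a small subset of the Chat Completions request body.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical stand-in for one input body mapping entry."""

    name: str
    mandatory: bool = False


class UnprocessableEntityError(Exception):
    """Hypothetical error carrying the HTTP status a missing field produces."""

    status_code = 422


# A subset of the Chat Completions request fields; the OpenAI spec requires
# model and messages.
CHAT_COMPLETIONS_INPUT = [
    FieldMapping("model", mandatory=True),
    FieldMapping("messages", mandatory=True),
    FieldMapping("temperature"),
    FieldMapping("stream"),
]


def extract_kwargs(body: dict[str, Any], mappings: list[FieldMapping]) -> dict[str, Any]:
    """Return handler kwargs for the mapped fields present in the request body."""
    missing = [m.name for m in mappings if m.mandatory and m.name not in body]
    if missing:
        raise UnprocessableEntityError(f"missing mandatory field(s): {', '.join(missing)}")
    # Unmapped fields are dropped; absent optional fields are simply omitted.
    return {m.name: body[m.name] for m in mappings if m.name in body}
```

With this sketch, a body carrying model, messages, and an unrelated key yields only model and messages as kwargs, while a body without messages raises the 422 error.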

Quick start

import mlrun
from mlrun.serving.openai_mappings import OpenAIEndpoint

fn = mlrun.code_to_function(
    name="openai-frontend",
    kind="serving",
    filename="openai_handler.py",
    image="mlrun/mlrun",
)

# Register every supported endpoint group, no prefix
fn.set_openai_frontend()

# Or: only the responses group, behind the standard /v1 prefix
fn.set_openai_frontend([OpenAIEndpoint.RESPONSES], prefix="/v1")
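The quick start points code_to_function at openai_handler.py without showing its contents. A minimal echo handler for the chat completions group could look like the sketch below. It assumes the handler receives the extracted request fields (model, messages, and so on) as keyword arguments and returns a dict shaped like an OpenAI ChatCompletion; the function name chat_completions and the echo behavior are placeholders, and a real handler would call a model instead.

```python
import time
import uuid
from typing import Any


def chat_completions(model: str, messages: list[dict[str, Any]], **kwargs: Any) -> dict[str, Any]:
    """Hypothetical handler: echo the last user message back as the assistant reply."""
    user_messages = [m for m in messages if m.get("role") == "user"]
    reply = user_messages[-1]["content"] if user_messages else ""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }
        ],
        # Placeholder token counts; this echo handler does no tokenization.
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```

How the handler is attached to the graph depends on your topology; the dispatcher pattern described later in this page is one option.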

Path prefix

The prefix= argument prepends a path segment to every registered endpoint. OpenAI's own API serves its endpoints under /v1, and many OpenAI-compatible clients are configured with a base URL ending in /v1. Use prefix="/v1" to match:

fn.set_openai_frontend(prefix="/v1")
# Registers /v1/chat/completions, /v1/responses, /v1/responses/{response_id}, …

The prefix is optional; if provided, it must start with / (e.g. /v1, /v2) — a missing leading slash raises MLRunInvalidArgumentError.
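The prefix rule can be pictured as a small helper that validates the prefix and prepends it to each path template. This is an illustrative sketch, not MLRun's code: apply_prefix is a hypothetical name, and it raises ValueError where MLRun raises MLRunInvalidArgumentError.

```python
from typing import Optional


def apply_prefix(paths: list[str], prefix: Optional[str] = None) -> list[str]:
    """Hypothetical helper: prepend an optional prefix to each path template."""
    if prefix is None:
        # No prefix: register the templates unchanged.
        return list(paths)
    if not prefix.startswith("/"):
        # MLRun raises MLRunInvalidArgumentError here; ValueError stands in for it.
        raise ValueError(f"prefix must start with '/', got {prefix!r}")
    return [prefix + path for path in paths]
```

For example, apply_prefix(["/chat/completions", "/responses/{response_id}"], "/v1") returns ["/v1/chat/completions", "/v1/responses/{response_id}"], while a prefix of "v1" is rejected.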

Dispatcher handler (include_url_info=True)

A flow graph terminates in a single handler. To serve multiple endpoint groups, or multiple methods on the same path template (for example, GET vs. DELETE on /responses/{response_id}), add a small dispatcher step that routes each request by path and HTTP method. Set include_url_info=True on the APIHandlerConfig so that mlrun_request_path and mlrun_request_method are injected into the handler:

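from mlrun.serving.endpoint_mapping import APIHandlerConfig

# Inject mlrun_request_path and mlrun_request_method into the handler kwargs
fn.set_api_handler_config(APIHandlerConfig(include_url_info=True))
fn.set_openai_frontend()  # merged into the existing config

graph = fn.set_topology("flow", engine="sync")
graph.to(...)  # your dispatcher step (step definition elided)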

The next step receives mlrun_request_path and mlrun_request_method as keyword arguments (alongside any kwargs from input body mappings, path templates, and query strings). See URL info for the full include_url_info contract.
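A dispatcher step might look like the following sketch. It assumes mlrun_request_path holds the concrete request path (for example /v1/responses/resp_123, not the template) and mlrun_request_method the method name in any case; the three backend functions and the route table are hypothetical placeholders for your own logic. The regular expressions accept an optional /vN segment so the same table works with or without prefix=.

```python
import re
from typing import Any, Callable


# Hypothetical backend functions; replace them with your own model or storage logic.
def create_chat_completion(**kwargs: Any) -> dict:
    return {"op": "chat.create", "model": kwargs.get("model")}


def retrieve_response(response_id: str, **kwargs: Any) -> dict:
    return {"op": "responses.retrieve", "id": response_id}


def delete_response(response_id: str, **kwargs: Any) -> dict:
    return {"op": "responses.delete", "id": response_id, "deleted": True}


# (HTTP method, path pattern, backend) triples. The optional (?:/v\d+) group lets
# the same table match paths registered with or without a prefix such as /v1.
ROUTES: list[tuple[str, re.Pattern, Callable[..., dict]]] = [
    ("POST", re.compile(r"^(?:/v\d+)?/chat/completions$"), create_chat_completion),
    ("GET", re.compile(r"^(?:/v\d+)?/responses/(?P<response_id>[^/]+)$"), retrieve_response),
    ("DELETE", re.compile(r"^(?:/v\d+)?/responses/(?P<response_id>[^/]+)$"), delete_response),
]


def dispatch(mlrun_request_path: str, mlrun_request_method: str, **kwargs: Any) -> dict:
    """Route one request to a backend by HTTP method and concrete request path."""
    method = mlrun_request_method.upper()
    for route_method, pattern, backend in ROUTES:
        match = pattern.match(mlrun_request_path)
        if match and route_method == method:
            # Merge path parameters captured here with the body/query/path kwargs.
            return backend(**{**kwargs, **match.groupdict()})
    # In an MLRun handler, raising mlrun.errors.MLRunNotFoundError would surface a 404.
    raise LookupError(f"no route for {method} {mlrun_request_path}")
```

A flat table of (method, pattern) pairs keeps GET and DELETE on the same template as separate, explicit routes, which is exactly the case a single handler cannot distinguish without include_url_info=True.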

Invoking from the OpenAI Python SDK

The endpoint paths, input/output body mappings, and mandatory-field expectations registered by set_openai_frontend() are kept in sync with the openai SDK version pinned in MLRun's dev-requirements.txt. Other SDK versions may introduce request/response fields that the bundled mappings don't yet cover.

Point the SDK at the deployed function's URL — no other changes needed:

import httpx
import openai

import mlrun

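client = openai.OpenAI(
    # fn is the deployed serving function. If you registered the endpoints with
    # prefix="/v1", append it: base_url=f"{fn.get_url().rstrip('/')}/v1"
    base_url=fn.get_url(),
    api_key="dummy",  # MLRun doesn't enforce a key; pass anything non-empty
    http_client=httpx.Client(verify=mlrun.mlconf.httpdb.http.verify),
)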

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)

The SDK deserializes the response into a typed ChatCompletion object — the output body mappings filter extra fields and enforce mandatory ones to match what the SDK expects.
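Conceptually, the output side works like the sketch below: keep only the top-level keys that belong to the ChatCompletion schema, and refuse a payload that lacks a required one. The key sets follow the OpenAI ChatCompletion object (id, object, created, model, and choices are required); filter_chat_completion is a hypothetical helper, not MLRun's output mapping, and it checks top-level keys only.

```python
from typing import Any

# Top-level fields of an OpenAI ChatCompletion object.
CHAT_COMPLETION_REQUIRED = {"id", "object", "created", "model", "choices"}
CHAT_COMPLETION_OPTIONAL = {"usage", "system_fingerprint", "service_tier"}


def filter_chat_completion(payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical output filter: reshape a handler result to the ChatCompletion contract."""
    missing = CHAT_COMPLETION_REQUIRED.difference(payload)
    if missing:
        raise ValueError(f"handler result lacks required field(s): {sorted(missing)}")
    allowed = CHAT_COMPLETION_REQUIRED | CHAT_COMPLETION_OPTIONAL
    # Drop handler-specific extras (debug info, timings, ...) before returning.
    return {key: value for key, value in payload.items() if key in allowed}
```

Filtering on the server side means a handler can attach internal diagnostics to its result without breaking the SDK's typed deserialization.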

Custom endpoints alongside the OpenAI ones

set_openai_frontend() is additive — it merges its endpoints into the existing APIHandlerConfig. You can add custom endpoints (e.g. health checks, admin) on top:

from http import HTTPMethod  # Python 3.11+

from mlrun.common.schemas.serving import APIHandlerAction
from mlrun.serving.endpoint_mapping import APIHandlerConfig

fn.set_openai_frontend(prefix="/v1")

# Load the merged config, add a custom endpoint, and write it back
config = APIHandlerConfig.from_dict(fn.spec.api_handler_config)
config.add_endpoint_handler("/health", HTTPMethod.GET, APIHandlerAction.ALLOW)
fn.set_api_handler_config(config)

Returning errors

Raise an mlrun.errors.MLRunHTTPStatusError subclass to surface a precise HTTP status code (for example, MLRunNotFoundError maps to 404). To return an OpenAI-shaped error envelope that the SDK deserializes as a typed error, return Response(body={"error": {...}}, status_code=4xx, content_type="application/json"). The output body mapping is skipped for non-2xx responses, so the body and status code reach the caller intact. See Returning a custom HTTP status code for details.
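The error envelope itself is plain JSON. The sketch below builds one in the shape the OpenAI API uses, {"error": {"message", "type", "param", "code"}}; openai_error_body is a hypothetical helper, and the default type value is an assumption to adjust per error. Pass its result as body= to Response(...) with the matching status code and content_type="application/json". On the client side, the SDK raises a status-specific exception such as openai.NotFoundError for a 404.

```python
from typing import Any, Optional


def openai_error_body(
    message: str,
    error_type: str = "invalid_request_error",
    param: Optional[str] = None,
    code: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build an OpenAI-style error envelope."""
    return {"error": {"message": message, "type": error_type, "param": param, "code": code}}
```

For example, openai_error_body("No response found with id 'resp_123'.", param="response_id") produces an envelope suitable for a 404 from the GET /responses/{response_id} route.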