API handler#

An API handler is a graph step that is automatically prepended to a serving graph when configured. It validates incoming HTTP requests against a set of user-defined endpoints, extracts parameters from path templates, query strings, and the request body, and passes them to the next step in the graph.

Use an API handler to:

Implement industry-defined REST API schemas on your serving graph (for example, the OpenAI chat-completion interface for LLMs — see OpenAI-compatible frontend).
Gate access to specific paths using ALLOW and FORBID rules.

API handlers are active only for HTTP-triggered invocations. When an event arrives through a non-HTTP trigger such as a stream, the API handler is bypassed (the path is always / in that case).

Supported runtimes: Serving functions with an HTTP trigger, and the mock server (local testing).

Overview#

When the GraphServer receives an HTTP event and an API handler is configured, it runs the handler step before forwarding the event to the graph. The handler:

Matches the request's HTTP method and URL path against the configured endpoint list.
If a match is found:
- Extracts path template parameters and query string parameters.
- Optionally applies JSONPath transformations on the request body (input_body_mappings).
- Optionally injects the normalized request path into the event (include_url_info).
- Passes the enriched event to the graph root.
- On the way back, optionally reshapes the graph response via output_body_mappings before returning it to the caller.
If no match is found, the handler fails the request with an HTTP error: 404 for an unknown path, or 405 when the path is known but the method is not allowed. A request that matches a FORBID rule fails with 403.
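
The status-code decision described above can be sketched in a few lines. This is a minimal illustration, not MLRun's implementation: the `ENDPOINTS` table and `precheck` function are hypothetical names, and only exact paths are modeled.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical endpoint table: (path, method) -> action. Exact paths only.
ENDPOINTS = {
    ("/v1/models", "GET"): "ALLOW",
    ("/admin/settings", "GET"): "FORBID",
}


def precheck(method: str, path: str) -> Optional[HTTPStatus]:
    """Return the error status for a rejected request, or None if it is forwarded."""
    action = ENDPOINTS.get((path, method.upper()))
    if action == "ALLOW":
        return None  # the event continues to the graph root
    if action == "FORBID":
        return HTTPStatus.FORBIDDEN
    # The path is configured, but not for this HTTP method.
    if any(known_path == path for known_path, _ in ENDPOINTS):
        return HTTPStatus.METHOD_NOT_ALLOWED
    return HTTPStatus.NOT_FOUND
```

With this table, `precheck("POST", "/v1/models")` yields 405 because the path is configured only for GET, while `precheck("GET", "/unknown")` yields 404.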

Configuration#

APIHandlerConfig#

APIHandlerConfig holds the full configuration for the API handler. Create one, add endpoint rules and optional body mappings, then attach it to the serving function.

from http import HTTPMethod

from mlrun.common.schemas.serving import APIHandlerAction
from mlrun.runtimes.nuclio.serving import APIHandlerConfig

config = APIHandlerConfig()

# Allow GET /v1/models
config.add_endpoint_handler("/v1/models", HTTPMethod.GET, APIHandlerAction.ALLOW)

# Allow POST /v1/models/{model_name}/predict  (path template)
config.add_endpoint_handler(
    "/v1/models/{model_name}/predict",
    HTTPMethod.POST,
    APIHandlerAction.ALLOW,
    description="Run inference on a named model",
)

# Forbid access to admin endpoints
config.add_endpoint_handler("/admin/*", HTTPMethod.GET, APIHandlerAction.FORBID)

# Attach configuration to the serving function
# (serving_fn is an existing serving function, created as in the complete example)
serving_fn.set_api_handler_config(config)

The configuration is serialized into serving_fn.spec.api_handler_config and is picked up at deployment time.

`add_endpoint_handler` signature#

config.add_endpoint_handler(
    path,  # URL path, e.g. "/v1/chat" or "/v1/chat/*"
    http_method=HTTPMethod.POST,  # HTTPMethod enum or string ("GET", "POST", ...)
    action=APIHandlerAction.ALLOW,  # ALLOW or FORBID
    description=None,  # Optional human-readable description
    input_body_mappings=None,  # BodyMappings instance (see Body mapping section)
    output_body_mappings=None,  # BodyMappings instance (see Output body mapping section)
)

To remove an endpoint:

config.remove_endpoint_handler("/v1/models", HTTPMethod.GET)

Path matching rules#

Endpoints are matched in the following priority order:

Exact paths — no wildcards or template parameters, e.g. /v1/models.
Path templates — contain {param} placeholders, e.g. /v1/models/{model_name}/predict. Matched with a pre-compiled regex; insertion order wins when multiple templates match the same path.
Wildcard paths — end with *, e.g. /admin/*. The * must be at the end and appear only once. Matched by prefix; the request path must contain at least one segment after the prefix. Insertion order wins.

Examples:

Configured path	Matches	Does not match
`/v1/models`	`/v1/models`	`/v1/models/gpt`
`/v1/models/{model_name}`	`/v1/models/gpt`, `/v1/models/bert`	`/v1/models`
`/v1/*`	`/v1/chat`, `/v1/models/gpt`	`/v1`
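
The priority order can be illustrated with a small standalone matcher. This is a sketch under simplifying assumptions, not the handler's actual code: `compile_template` and `match_path` are hypothetical helpers, and templates are compiled on every call here rather than pre-compiled.

```python
import re


def compile_template(template):
    """Turn "/v1/models/{model_name}" into a regex with one named group per placeholder."""
    parts = re.split(r"\{(\w+)\}", template)  # literals at even indexes, names at odd
    regex = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>[^/]+)"
        for index, part in enumerate(parts)
    )
    return re.compile(regex)


def match_path(configured, request_path):
    """Return (matched_pattern, path_params), or None when nothing matches."""
    # 1. Exact paths win over everything else.
    for pattern in configured:
        if "{" not in pattern and not pattern.endswith("*") and pattern == request_path:
            return pattern, {}
    # 2. Path templates, in insertion order.
    for pattern in configured:
        if "{" in pattern:
            found = compile_template(pattern).fullmatch(request_path)
            if found:
                return pattern, found.groupdict()
    # 3. Wildcards, in insertion order: the prefix plus at least one more segment.
    for pattern in configured:
        if pattern.endswith("*"):
            prefix = pattern[:-1]  # "/v1/*" -> "/v1/"
            if request_path.startswith(prefix) and len(request_path) > len(prefix):
                return pattern, {}
    return None
```

Even when `/v1/*` is configured first, a request for `/v1/models/gpt` resolves to the template `/v1/models/{model_name}`, because templates are checked before wildcards.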

Extracting request parameters#

When the handler allows a request, it can extract parameters from three sources:

Source	How to configure	Available as
Path template parameters	`{param}` in the path pattern	keyword argument
Query string parameters	automatic	keyword argument
Request body fields	`BodyMappings` (JSONPath, see below)	keyword argument

The extracted parameters are passed to the next step as keyword arguments. If the same parameter name appears in more than one source, the request fails with HTTP 400 (Bad Request). Conflicts between body mapping destinations and path template parameters are detected earlier, at setup time.

What the next step receives depends on whether there is anything to forward. When any parameters are extracted or include_url_info is enabled, they are collected into a dict and passed as keyword arguments. Otherwise, the original event body is forwarded unchanged.
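
The conflict rule can be sketched as a merge that rejects duplicate names. The `BadRequest` class and `merge_parameters` function below are hypothetical and only illustrate the behavior described above; the real handler's error type and message may differ.

```python
class BadRequest(Exception):
    """Hypothetical error type standing in for an HTTP 400 response."""

    status_code = 400


def merge_parameters(path_params, query_params, body_params):
    """Combine the three parameter sources, rejecting names defined more than once."""
    merged = {}
    sources = (("path", path_params), ("query", query_params), ("body", body_params))
    for source_name, params in sources:
        for name, value in params.items():
            if name in merged:
                raise BadRequest(
                    f"parameter '{name}' from {source_name} is already defined by another source"
                )
            merged[name] = value
    return merged
```

For example, a request whose path template yields `model_name` and whose query string also carries `model_name` would be rejected, while distinct names from all three sources are merged into one dict of keyword arguments.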

Body mapping (`BodyMappings`)#

BodyMappings maps destination parameter names to JSONPath source expressions. Each endpoint has its own BodyMappings instance passed via input_body_mappings. The request body must be a JSON dict when body mappings are configured.

from mlrun.runtimes.nuclio.serving import BodyMappings

# Build a body mapping for a specific endpoint
bm = BodyMappings()
bm.add_mapping(destination_path="model_name", source_json_path="$.model")
bm.add_mapping(destination_path="inputs", source_json_path="$.data.inputs")

# Multiple matches (e.g. all book titles) return a list
bm.add_mapping(
    destination_path="titles", source_json_path="$['store']['book'][*]['title']"
)

# Attach the mapping to the endpoint
config.add_endpoint_handler(
    "/v1/predict",
    HTTPMethod.POST,
    APIHandlerAction.ALLOW,
    input_body_mappings=bm,
)

add_mapping parameters:

Parameter	Description
`destination_path`	Name of the keyword argument passed to the next step.
`source_json_path`	JSONPath expression evaluated against the request body dict.
`mandatory`	If `True` (default `False`), a missing field fails the request with HTTP 422 Unprocessable Entity.

Rules:

A single JSONPath match → the value is returned as-is.
Multiple matches → a list of values is returned.
No match on a mandatory field → the request fails with HTTP 422 (Unprocessable Entity).
No match on an optional field → the parameter is silently omitted.
Non-dict body with only optional mappings → mappings are silently skipped. With any mandatory mapping → HTTP 422 (Unprocessable Entity).
Calling add_mapping with a duplicate destination_path or source_json_path overwrites the existing entry and logs a warning.

To remove a mapping, pass its destination_path: bm.remove_mapping("model_name").
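
The rules above can be made concrete with a standalone sketch. It does not use BodyMappings or a real JSONPath library: `find_all` is a hypothetical evaluator that understands only `$`, `.key`, and `[*]`, and `apply_mappings` takes a plain dict of `destination: (source_json_path, mandatory)` as an assumed stand-in for the configured mappings.

```python
import re

_TOKEN = re.compile(r"\.(\w+)|\[\*\]")


def find_all(body, json_path):
    """Evaluate a tiny JSONPath subset ("$", ".key", "[*]") and return all matches."""
    if not json_path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [body]
    for token in _TOKEN.finditer(json_path, 1):
        key = token.group(1)
        next_nodes = []
        for node in nodes:
            if key is not None:
                if isinstance(node, dict) and key in node:
                    next_nodes.append(node[key])
            elif isinstance(node, list):  # "[*]" fans out over list items
                next_nodes.extend(node)
        nodes = next_nodes
    return nodes


class UnprocessableEntity(Exception):
    """Hypothetical error type standing in for an HTTP 422 response."""

    status_code = 422


def apply_mappings(body, mappings):
    """mappings: {destination: (source_json_path, mandatory)}"""
    params = {}
    for destination, (source, mandatory) in mappings.items():
        # A non-dict body never matches anything.
        matches = find_all(body, source) if isinstance(body, dict) else []
        if not matches:
            if mandatory:
                raise UnprocessableEntity(f"missing mandatory field {source}")
            continue  # optional field: silently omitted
        params[destination] = matches[0] if len(matches) == 1 else matches
    return params
```

Given a body with a `model` field and two books, a mapping set with `$.model`, `$.store.book[*].title`, and an optional `$.data.inputs` produces `model_name` as a single value, `titles` as a list, and no `inputs` key at all.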

Output body mapping (`output_body_mappings`)#

The same BodyMappings class also reshapes the response body sent back to the caller, attached via output_body_mappings on add_endpoint_handler. JSONPath syntax is identical to input mapping, applied in reverse: source_json_path extracts from the graph response, destination_path names the field in the response returned to the caller.

from mlrun.runtimes.nuclio.serving import BodyMappings

out_bm = BodyMappings()
out_bm.add_mapping(
    destination_path="prediction", source_json_path="$.label", mandatory=True
)
out_bm.add_mapping(destination_path="score", source_json_path="$.confidence")

config.add_endpoint_handler(
    "/v1/predict",
    HTTPMethod.POST,
    APIHandlerAction.ALLOW,
    output_body_mappings=out_bm,
)

Differences from input body mapping:

Source = graph response; destination = field in the response sent to the caller.
No match on an optional field → emitted as None (input: silently omitted).
Non-2xx responses skip output mapping entirely — the original body and status code pass through. See Custom HTTP status code.

Mandatory-field handling (HTTP 422 on missing) and hierarchical merging behave the same as input — see Hierarchical body map merging.
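
The output-side differences can be sketched as follows. This is an illustration only: `map_response` is a hypothetical function, it looks up top-level keys instead of evaluating JSONPath, and the `UnprocessableEntity` class is an assumed stand-in for the HTTP 422 failure.

```python
class UnprocessableEntity(Exception):
    """Hypothetical error type standing in for an HTTP 422 response."""

    status_code = 422


def map_response(body, status_code, output_mappings):
    """output_mappings: {destination: (top_level_key, mandatory)}"""
    if status_code >= 300:
        # Non-2xx: skip the mapping, keep the original body and status code.
        return body, status_code
    shaped = {}
    for destination, (key, mandatory) in output_mappings.items():
        if key in body:
            shaped[destination] = body[key]
        elif mandatory:
            raise UnprocessableEntity(f"missing mandatory field {key}")
        else:
            shaped[destination] = None  # optional output field: emitted as None
    return shaped, status_code
```

A graph response of `{"label": "cat", "extra": 1}` mapped with `prediction` (mandatory, from `label`) and `score` (optional, from `confidence`) becomes `{"prediction": "cat", "score": None}`; undeclared fields such as `extra` are not carried over.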

Hierarchical body map merging#

Applies to both input_body_mappings and output_body_mappings. When a request matches multiple endpoints (for example, a wildcard /* and a specific /v1/predict), their body mappings are merged from least specific to most specific. The most specific endpoint wins on conflict:

Same destination — the more specific endpoint's source overwrites the less specific one.
Same source, different destination — the stale destination from the less specific endpoint is removed; only the more specific endpoint's destination is kept.

This allows a wildcard endpoint to define shared defaults while specific endpoints override individual mappings:

# Wildcard: shared defaults for all POST endpoints under /v1/
star_bm = BodyMappings()
star_bm.add_mapping(
    destination_path="model", source_json_path="$.model", mandatory=True
)
star_bm.add_mapping(destination_path="stream", source_json_path="$.stream")
config.add_endpoint_handler(
    "/v1/*", HTTPMethod.POST, APIHandlerAction.ALLOW, input_body_mappings=star_bm
)

# Specific endpoint: inherits "stream" from wildcard, overrides "model" → "model_name"
predict_bm = BodyMappings()
predict_bm.add_mapping(destination_path="model_name", source_json_path="$.model")
predict_bm.add_mapping(destination_path="messages", source_json_path="$.messages")
config.add_endpoint_handler(
    "/v1/chat/completions",
    HTTPMethod.POST,
    APIHandlerAction.ALLOW,
    input_body_mappings=predict_bm,
)
# POST /v1/chat/completions effective mapping:
#   model_name ← $.model   (specific wins; "model" destination from wildcard is dropped)
#   stream     ← $.stream  (inherited from wildcard)
#   messages   ← $.messages (specific only)
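
The two merge rules can be checked with a small standalone sketch. `merge_body_mappings` is a hypothetical function operating on plain `destination: source` dicts; it ignores the `mandatory` flag and is not the library's merge code.

```python
def merge_body_mappings(layers):
    """layers: list of {destination: source_json_path}, least specific first."""
    effective = {}
    for layer in layers:
        for destination, source in layer.items():
            # Same source, different destination: drop the stale destination.
            stale = [
                existing
                for existing, existing_source in effective.items()
                if existing_source == source and existing != destination
            ]
            for name in stale:
                del effective[name]
            # Same destination: the more specific source overwrites.
            effective[destination] = source
    return effective


wildcard = {"model": "$.model", "stream": "$.stream"}
specific = {"model_name": "$.model", "messages": "$.messages"}
effective = merge_body_mappings([wildcard, specific])
```

Here `effective` contains `model_name`, `stream`, and `messages`, which agrees with the effective mapping listed in the comments of the example above.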

URL info (`include_url_info`)#

When include_url_info=True, the handler injects two additional fields into the event:

mlrun_request_path — the normalized, URL-decoded path (without the query string). Decoding matches Flask/FastAPI semantics: an encoded slash (%2F) in a segment becomes indistinguishable from a path separator.
mlrun_request_method — the HTTP method as an uppercase string (e.g. "GET", "DELETE").

Both are passed together so a dispatcher handler can distinguish endpoints that share a path template but differ by method. Query string parameters are always extracted as keyword arguments regardless of this setting.

config = APIHandlerConfig(include_url_info=True)
config.add_endpoint_handler(
    "/v1/chat/completions/{completion_id}", HTTPMethod.GET, APIHandlerAction.ALLOW
)

A GET /v1/chat/completions/abc123?limit=10 request passes the following keyword arguments to the next step:

{
    "completion_id": "abc123",  # from path template
    "limit": "10",  # from query string
    "mlrun_request_path": "/v1/chat/completions/abc123",  # from include_url_info
    "mlrun_request_method": "GET",  # from include_url_info
}
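
The URL-derived fields in this dict can be reproduced with the standard library. This is a rough sketch: `url_info` is a hypothetical helper that models only URL decoding and query-string parsing, not any further normalization the handler may perform, and it does not extract path template parameters such as `completion_id`.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def url_info(method, url):
    """Build the query-string and include_url_info keyword arguments for a request."""
    parts = urlsplit(url)
    info = dict(parse_qsl(parts.query))  # query string values stay strings
    info["mlrun_request_path"] = unquote(parts.path)  # "%2F" decodes to "/"
    info["mlrun_request_method"] = method.upper()
    return info
```

Because the path is decoded, `url_info("GET", "/files/a%2Fb")` reports the path `/files/a/b`, which is why an encoded slash cannot be told apart from a real path separator.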

Dispatch by method on a shared path template:

# get_response and delete_response are application-defined helpers (not shown)
def responses_router(
    body, response_id, mlrun_request_path, mlrun_request_method, **kwargs
):
    if mlrun_request_method == "GET":
        return get_response(response_id)
    if mlrun_request_method == "DELETE":
        return delete_response(response_id)
    raise ValueError(f"unsupported method {mlrun_request_method}")

The handler signature must accept these names (explicitly or via **kwargs); otherwise Python raises TypeError: unexpected keyword argument.
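
The difference is plain Python behavior and can be reproduced without a serving graph. In this sketch the two handlers are hypothetical, and passing the body positionally is an assumption made for illustration.

```python
def strict_handler(body, completion_id):
    # No **kwargs: any extra keyword argument is rejected by Python.
    return completion_id


def tolerant_handler(body, completion_id, **kwargs):
    # Extra keyword arguments (query parameters, URL info) land in kwargs.
    return completion_id


extracted = {
    "completion_id": "abc123",
    "limit": "10",
    "mlrun_request_path": "/v1/chat/completions/abc123",
    "mlrun_request_method": "GET",
}

try:
    strict_handler({}, **extracted)
    error = None
except TypeError as exc:
    error = str(exc)

result = tolerant_handler({}, **extracted)
```

After this runs, `error` holds the TypeError message from the strict handler, while `result` is `"abc123"`. Adding `**kwargs` is the simplest way to stay compatible when callers start sending new query parameters.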

Custom HTTP responses#

A graph handler can control the HTTP response by returning a Response wrapper instead of a plain dict. The runtime preserves status_code, content_type, and headers end-to-end.

There are two ways to construct it:

# A) Use context.Response — picks the right class for the runtime
#    (mlrun.serving.server.Response in MockServer; nuclio_sdk.Response in deployed Nuclio).
class MyStep:
    def do(self, body):
        return self.context.Response(
            body={"error": {"message": "not found"}},
            status_code=404,
            content_type="application/json",
        )


# B) Import the class directly — works the same way; the SDK normalizes it on the way out.
from mlrun.serving.server import Response


def my_handler(body, **kwargs):
    return Response(
        body={"id": "resp_1", "object": "response"},
        status_code=200,
        content_type="application/json",
    )

If you simply return a dict, the runtime treats the response as 200 OK.

When using the API handler with output_body_mappings, the mapping runs only when status_code < 300.

Custom HTTP status code#

A handler that returns a dict produces an HTTP 200 response by default. To return a different status code, the handler has two options:

Raise an mlrun.errors.MLRunHTTPStatusError subclass — for example MLRunNotFoundError returns HTTP 404, MLRunUnprocessableEntityError returns HTTP 422. The response body is plain text with the exception class name and message.
Return Response(body, status_code, ...) — full control over body, status code, content type, and headers.

Interaction with `output_body_mappings`#

output_body_mappings describes the success-shape contract, so the mapping runs only when status_code < 300. Non-2xx responses pass through with their body and status code intact, so the caller sees the original error envelope instead of a mandatory-field-check failure.

If the handler returns a plain dict, the runtime treats the response as 200 OK and runs the output mapping as usual.

How downstream steps receive parameters#

Extracted parameters are passed as keyword arguments to the handler function or do() method. For example, given the endpoint /v1/chat/completions/{completion_id} with include_url_info=True, a GET /v1/chat/completions/abc123?limit=10 request calls the next step as:

def step_handler(
    body,
    completion_id,
    limit,
    mlrun_request_path,
    mlrun_request_method,
    **kwargs,
):
    # body: original request body
    # completion_id="abc123"                            — from path template
    # limit="10"                                        — from query string
    # mlrun_request_path="/v1/chat/completions/abc123"  — from include_url_info
    # mlrun_request_method="GET"                        — from include_url_info
    ...


class MyStep:
    def do(
        self,
        body,
        completion_id,
        limit,
        mlrun_request_path,
        mlrun_request_method,
        **kwargs,
    ): ...

Complete example#

The following example configures a serving function with an API handler that supports an OpenAI-compatible POST /v1/chat/completions endpoint. It extracts the model field and messages array from the request body, makes the request path available to downstream steps, and reshapes the handler's response (filtering and renaming fields) before returning it to the caller.

import mlrun
from http import HTTPMethod

from mlrun.common.schemas.serving import APIHandlerAction
from mlrun.runtimes.nuclio.serving import APIHandlerConfig, BodyMappings

project = mlrun.get_or_create_project("chat-serving", context="./")

serving_fn = project.set_function(
    name="chat-completion",
    kind="serving",
    image="mlrun/mlrun",
)


# --- Define the graph step that processes chat completions ---
# model_name and messages are passed as keyword arguments by the API handler
def chat_handler(model_name, messages, mlrun_request_path, **kwargs):
    reply = f"Received {len(messages)} message(s) for model '{model_name}'"
    return {
        "reply": reply,
        "path": mlrun_request_path,
        "internal_debug": "trace-id-123",  # will be filtered by output_body_mappings
    }


graph = serving_fn.set_topology("flow", engine="sync")
graph.to(name="chat", handler=chat_handler).respond()

# --- Configure the API handler ---
config = APIHandlerConfig(include_url_info=True)

# Extract model and messages from the request body using JSONPath
bm = BodyMappings()
bm.add_mapping(
    destination_path="model_name", source_json_path="$.model", mandatory=True
)
bm.add_mapping(
    destination_path="messages", source_json_path="$.messages", mandatory=True
)

# Reshape the handler's response on the way out — keep only `reply` and rename
# `path` to `request_path`. `internal_debug` is dropped (not declared).
output_bm = BodyMappings()
output_bm.add_mapping(
    destination_path="reply", source_json_path="$.reply", mandatory=True
)
output_bm.add_mapping(destination_path="request_path", source_json_path="$.path")

# Allow the OpenAI-compatible chat completion endpoint
config.add_endpoint_handler(
    "/v1/chat/completions",
    HTTPMethod.POST,
    APIHandlerAction.ALLOW,
    description="OpenAI-compatible chat completion",
    input_body_mappings=bm,
    output_body_mappings=output_bm,
)

# Block all admin paths
config.add_endpoint_handler("/admin/*", HTTPMethod.GET, APIHandlerAction.FORBID)
config.add_endpoint_handler("/admin/*", HTTPMethod.POST, APIHandlerAction.FORBID)

# Attach to the serving function
serving_fn.set_api_handler_config(config)

# --- Test locally with the mock server ---
server = serving_fn.to_mock_server()

# Allowed endpoint:
#   input bm extracts model_name + messages → chat_handler receives them as kwargs.
#   output bm filters the handler's return: keeps `reply`, renames `path` →
#   `request_path`, and drops `internal_debug` (not declared).
result = server.test(
    "/v1/chat/completions",
    method="POST",
    body={"model": "my-llm", "messages": [{"role": "user", "content": "Hello"}]},
)
# result: {"reply": "Received 1 message(s) for model 'my-llm'", "request_path": "/v1/chat/completions"}

# Forbidden endpoint: raises 403
try:
    server.test("/admin/settings", method="GET", body={})
except Exception as e:
    print(e)  # Access forbidden

# Unknown endpoint: raises 404
try:
    server.test("/unknown", method="GET", body={})
except Exception as e:
    print(e)  # Endpoint not found