mlrun.serving#
- class mlrun.serving.states.BaseStep(name: str | None = None, after: list | None = None, shape: str | None = None)[source]#
- error_handler(name: str | None = None, class_name=None, handler=None, before=None, function=None, full_event: bool | None = None, input_path: str | None = None, result_path: str | None = None, **class_args)[source]#
set error handler on a step or the entire graph (to be executed on failure/raise)
When setting the error_handler on the graph object, the graph completes after the error handler execution.
Example
In the below example, an 'error_catcher' step is set as the error_handler of the 'raise' step: in case of an error/raise in the 'raise' step, handle_error will run, and after that the 'echo' step will run.

    graph = function.set_topology('flow', engine='async')
    graph.to(name='raise', handler='raising_step')\
         .error_handler(name='error_catcher', handler='handle_error', full_event=True, before='echo')
    graph.add_step(name="echo", handler='echo', after="raise").respond()
- Parameters:
name -- unique name (and path) for the error handler step, default is class name
class_name -- class name or step object to build the step from. The error handler step is derived from a task step (i.e. no router/queue functionality)
handler -- class/function handler to invoke on run/event
before -- string or list of the next step name(s) that will run after this step. The before param must not specify upstream steps, as that would cause a loop. If before is not specified, the graph will complete after the error handler execution.
function -- function this step should run in
full_event -- this step accepts the full event (not just the body)
input_path -- selects the key/path in the event to use as input to the step. This requires that the event body behaves like a dict. For example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the step will receive 7 as input
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. For example: event: {"x": 5}, result_path="y" means the output of the step will be written to event["y"], resulting in {"x": 5, "y": <result>}
class_args -- class init arguments
- to(class_name: str | StepToDict | None = None, name: str | None = None, handler: str | None = None, graph_shape: str | None = None, function: str | None = None, full_event: bool | None = None, input_path: str | None = None, result_path: str | None = None, **class_args)[source]#
add a step right after this step and return the new step
Example
a 4-step pipeline ending with a stream:

    graph.to('URLDownloader')\
         .to('ToParagraphs')\
         .to(name='to_json', handler='json.dumps')\
         .to('>>', 'to_v3io', path=stream_path)
- Parameters:
class_name -- class name or step object to build the step from. For router steps the class name should start with '*'; for a queue/stream step the class should be '>>' or '$queue'
name -- unique name (and path) for the child step, default is class name
handler -- class/function handler to invoke on run/event
graph_shape -- graphviz shape name
function -- function this step should run in
full_event -- this step accepts the full event (not just body)
input_path -- selects the key/path in the event to use as input to the step. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the step will receive 7 as input
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="y" means the output of the step will be written to event["y"], resulting in {"x": 5, "y": <result>}
class_args -- class init arguments
- class mlrun.serving.ErrorStep(class_name: str | type | None = None, class_args: dict | None = None, handler: str | None = None, name: str | None = None, after: list | None = None, full_event: bool | None = None, function: str | None = None, responder: bool | None = None, input_path: str | None = None, result_path: str | None = None)[source]#
Bases:
TaskStep
error execution step, runs a class or handler
- kind = 'error_step'#
- class mlrun.serving.GraphContext(level='info', logger=None, server=None, nuclio_context: Context | None = None)[source]#
Bases:
object
Graph context object
- get_remote_endpoint(name, external=True)[source]#
return the remote nuclio/serving function http(s) endpoint given its name
- Parameters:
name -- the function name/uri in the form [project/]function-name[:tag]
external -- whether to return the external URL (default True)
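For example, a custom graph step can use the context to call another serving function in the project. A minimal sketch, assuming the step class receives the graph context in its constructor (the function uri "my-project/echo-func" and the /predict path are placeholders):

```python
import requests


class CallPeerFunction:
    """Illustrative step that forwards each event to another serving function."""

    def __init__(self, context, name=None, **kwargs):
        self.context = context  # GraphContext provided by the graph server
        self.name = name

    def do(self, event):
        # resolve the remote function's HTTP(S) endpoint by its [project/]name uri
        url = self.context.get_remote_endpoint("my-project/echo-func", external=True)
        resp = requests.post(f"{url}/predict", json=event)
        return resp.json()
```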
- property project: str#
current project name (for the current function)
- property server#
- class mlrun.serving.GraphServer(graph=None, parameters=None, load_mode=None, function_uri=None, verbose=False, version=None, functions=None, graph_initializer=None, error_stream=None, track_models=None, tracking_policy=None, secret_sources=None, default_content_type=None, function_name=None, function_tag=None, project=None)[source]#
Bases:
ModelObj
- property graph: RootFlowStep | RouterStep#
- init_states(context, namespace, resource_cache: ResourceCache | None = None, logger=None, is_mock=False, monitoring_mock=False)[source]#
for internal use, initialize all steps (recursively)
- kind = 'server'#
- set_current_function(function)[source]#
set which child function this server is currently running on
- test(path: str = '/', body: str | bytes | dict | None = None, method: str = '', headers: str | None = None, content_type: str | None = None, silent: bool = False, get_body: bool = True, event_id: str | None = None, trigger: MockTrigger | None = None, offset=None, time=None)[source]#
invoke a test event into the server to simulate/test server behavior
example:
    server = create_graph_server()
    server.add_model("my", class_name=MyModelClass, model_path="{path}", z=100)
    print(server.test("my/infer", testdata))
- Parameters:
path -- API path, e.g. /{router.url_prefix}/{model-name}/..
body -- message body (dict or json str/bytes)
method -- optional, GET, POST, ..
headers -- optional, request headers, ..
content_type -- optional, http mime type
silent -- don't raise on error responses (when not 20X)
get_body -- return the body as py object (vs serialize response into json)
event_id -- specify the unique event ID (by default a random value will be generated)
trigger -- nuclio trigger info or mlrun.serving.server.MockTrigger class (holds kind and name)
offset -- trigger offset (for streams)
time -- event time as datetime or str, defaults to now()
- class mlrun.serving.Model(name: str, **kwargs)[source]#
Bases:
ParallelExecutionRunnable
- predict(body: Any) Any [source]#
Override to implement prediction logic. If the logic requires asyncio, override predict_async() instead.
- async predict_async(body: Any) Any [source]#
Override to implement prediction logic if the logic requires asyncio.
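A minimal sketch of custom model runnables (the class names and the {"inputs": [...]} body shape below are illustrative assumptions, not part of the API):

```python
from mlrun.serving import Model


class DoublingModel(Model):
    """Illustrative Model subclass that doubles every input value."""

    def predict(self, body):
        # body is the event body routed to this model; the {"inputs": [...]}
        # shape is assumed here just for the sketch
        return {"outputs": [x * 2 for x in body["inputs"]]}


class AsyncDoublingModel(Model):
    """Same logic, implemented with asyncio via predict_async()."""

    async def predict_async(self, body):
        return {"outputs": [x * 2 for x in body["inputs"]]}
```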
- class mlrun.serving.ModelRunner(*args, model_selector: ModelSelector | None = None, **kwargs)[source]#
Bases:
ParallelExecution
Runs multiple Models on each event. See ModelRunnerStep.
- Parameters:
model_selector -- ModelSelector instance whose select() method will be used to select models to run on each event. Optional. If not passed, all models will be run.
- class mlrun.serving.ModelRunnerStep(*args, model_selector: str | ModelSelector | None = None, **kwargs)[source]#
Bases:
TaskStep
Runs multiple Models on each event.
example:
    model_runner_step = ModelRunnerStep(name="my_model_runner")
    model_runner_step.add_model(MyModel(name="my_model"))
    graph.to(model_runner_step)
- Parameters:
model_selector -- ModelSelector instance whose select() method will be used to select models to run on each event. Optional. If not passed, all models will be run.
- init_object(context, namespace, mode='sync', reset=False, **extra_kwargs)[source]#
init the step class
- kind = 'model_runner'#
- class mlrun.serving.ModelSelector[source]#
Bases:
object
Used to select which models to run on each event.
- select(event, available_models: list[mlrun.serving.states.Model]) list[str] | list[mlrun.serving.states.Model] [source]#
Given an event, returns a list of model names or a list of model objects to run on the event. If None is returned, all models will be run.
- Parameters:
event -- The full event
available_models -- List of available models
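A sketch of a custom selector, assuming the event body is a dict with a "segment" field (the field name and the model-naming convention are illustrative):

```python
from mlrun.serving import ModelRunnerStep, ModelSelector


class SegmentSelector(ModelSelector):
    """Run only the models whose name contains the event's segment value."""

    def select(self, event, available_models):
        segment = event.body.get("segment")  # assumes a dict event body
        if segment is None:
            return None  # None means: run all models
        # a list of model names (or of Model objects) may be returned
        return [model.name for model in available_models if segment in model.name]


# attach the selector to a ModelRunnerStep
runner = ModelRunnerStep(name="my_model_runner", model_selector=SegmentSelector())
```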
- class mlrun.serving.MonitoringApplicationStep(class_name: str | type | None = None, class_args: dict | None = None, handler: str | None = None, name: str | None = None, after: list | None = None, full_event: bool | None = None, function: str | None = None, responder: bool | None = None, input_path: str | None = None, result_path: str | None = None)[source]#
Bases:
TaskStep
monitoring application execution step, runs the user's class code
- kind = 'monitoring_application'#
- class mlrun.serving.QueueStep(name: str | None = None, path: str | None = None, after: list | None = None, shards: int | None = None, retention_in_hours: int | None = None, trigger_args: dict | None = None, **options)[source]#
Bases:
BaseStep
queue step, implements an async queue or represents a stream
- property async_object#
- default_shape = 'cds'#
- init_object(context, namespace, mode='sync', reset=False, **extra_kwargs)[source]#
init the step class
- kind = 'queue'#
- to(class_name: str | StepToDict | None = None, name: str | None = None, handler: str | None = None, graph_shape: str | None = None, function: str | None = None, full_event: bool | None = None, input_path: str | None = None, result_path: str | None = None, **class_args)[source]#
add a step right after this step and return the new step
Example
a 4-step pipeline ending with a stream:

    graph.to('URLDownloader')\
         .to('ToParagraphs')\
         .to(name='to_json', handler='json.dumps')\
         .to('>>', 'to_v3io', path=stream_path)
- Parameters:
class_name -- class name or step object to build the step from. For router steps the class name should start with '*'; for a queue/stream step the class should be '>>' or '$queue'
name -- unique name (and path) for the child step, default is class name
handler -- class/function handler to invoke on run/event
graph_shape -- graphviz shape name
function -- function this step should run in
full_event -- this step accepts the full event (not just body)
input_path -- selects the key/path in the event to use as input to the step. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the step will receive 7 as input
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="y" means the output of the step will be written to event["y"], resulting in {"x": 5, "y": <result>}
class_args -- class init arguments
- class mlrun.serving.RouterStep(class_name: str | type | None = None, class_args: dict | None = None, handler: str | None = None, routes: list | None = None, name: str | None = None, function: str | None = None, input_path: str | None = None, result_path: str | None = None)[source]#
Bases:
TaskStep
router step, implements routing logic for running child routes
- add_route(key, route=None, class_name=None, handler=None, function=None, creation_strategy: ModelEndpointCreationStrategy = ModelEndpointCreationStrategy.INPLACE, **class_args)[source]#
add child route step or class to the router, if key exists it will be updated
- Parameters:
key -- unique name (and route path) for the child step
route -- child step object (Task, ..)
class_name -- class name to build the route step from (when route is not provided)
class_args -- class init arguments
handler -- class handler to invoke on run/event
function -- function this step should run in
creation_strategy -- Strategy for creating or updating the model endpoint:
- overwrite: if model endpoints with the same name exist, delete the latest one, then create a new model endpoint entry and set it as latest.
- inplace (default): if model endpoints with the same name exist, update the latest entry; otherwise, create a new entry.
- archive: if model endpoints with the same name exist, preserve them, then create a new model endpoint with the same name and set it to latest.
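For example, adding two routes to a router topology (the class names and model URIs are placeholders; passing model_path as a class argument follows the V2ModelServer convention documented below):

```python
# build a router topology on a serving function
graph = fn.set_topology("router")

# each route becomes a child task step built from the given class
graph.add_route("modelA", class_name="MyModelClassA", model_path="<model-uri-a>")
graph.add_route("modelB", class_name="MyModelClassB", model_path="<model-uri-b>")
```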
- default_shape = 'doubleoctagon'#
- init_object(context, namespace, mode='sync', reset=False, **extra_kwargs)[source]#
init the step class
- kind = 'router'#
- plot(filename=None, format=None, source=None, **kw)[source]#
plot/save graph using graphviz
- Parameters:
filename -- target filepath for the image (None for the notebook)
format -- the output format used for rendering ('pdf', 'png', etc.)
source -- source step to add to the graph
kw -- kwargs passed to graphviz, e.g. rankdir="LR" (see: https://graphviz.org/doc/info/attrs.html)
- Returns:
graphviz graph object
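For example (a left-to-right layout; the target filename is arbitrary, and exact file handling follows graphviz):

```python
# render inline in a notebook
graph.plot(rankdir="LR")

# or save the rendered graph to a file
graph.plot("graph.png", format="png", rankdir="LR")
```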
- property routes#
child routes/steps, traffic is routed to routes based on router logic
- class mlrun.serving.TaskStep(class_name: str | type | None = None, class_args: dict | None = None, handler: str | None = None, name: str | None = None, after: list | None = None, full_event: bool | None = None, function: str | None = None, responder: bool | None = None, input_path: str | None = None, result_path: str | None = None, model_endpoint_creation_strategy: ModelEndpointCreationStrategy | None = ModelEndpointCreationStrategy.SKIP, endpoint_type: EndpointType | None = EndpointType.NODE_EP)[source]#
Bases:
BaseStep
task execution step, runs a class or handler
- property async_object#
return the sync or async (storey) class instance
- init_object(context, namespace, mode='sync', reset=False, **extra_kwargs)[source]#
init the step class
- kind = 'task'#
- class mlrun.serving.V2ModelServer(context=None, name: str | None = None, model_path: str | None = None, model=None, protocol=None, input_path: str | None = None, result_path: str | None = None, shard_by_endpoint: bool | None = None, **kwargs)[source]#
Bases:
StepToDict
base model serving class (v2), using similar API to KFServing v2 and Triton
The class is initialized automatically by the model server and can run locally as part of a nuclio serverless function, or as part of a real-time pipeline. The default model URL is: /v2/models/<model>[/versions/<ver>]/operation
- You need to implement two mandatory methods:
load() - download the model file(s) and load the model into memory
predict() - accept request payload and return prediction/inference results
You can override additional methods: preprocess, validate, postprocess, explain. You can add a custom API endpoint by adding a method op_xx(event); it will be invoked by calling <model-url>/xx (operation = xx).
Model server classes are subclassed (the subclass implements the load() and predict() methods); the subclass can be added to a serving graph or to a model router.
defining a sub class:
    class MyClass(V2ModelServer):
        def load(self):
            # load and initialize the model and/or other elements
            model_file, extra_data = self.get_model(suffix=".pkl")
            self.model = load(open(model_file, "rb"))

        def predict(self, request):
            events = np.array(request["inputs"])
            dmatrix = xgb.DMatrix(events)
            result: xgb.DMatrix = self.model.predict(dmatrix)
            return {"outputs": result.tolist()}
usage example:
    # adding a model to a serving graph using the subclass MyClass
    # MyClass will be initialized with the name "my", the model_path, and an arg called my_param
    graph = fn.set_topology("router")
    fn.add_model("my", class_name="MyClass", model_path="<model-uri>", my_param=5)
- Parameters:
context -- for internal use (passed in init)
name -- step name
model_path -- model file/dir or artifact path
model -- model object (for local testing)
protocol -- serving API protocol (default "v2")
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
shard_by_endpoint -- whether to use the endpoint as the partition/sharding key when writing to model monitoring stream. Defaults to True.
kwargs -- extra arguments (can be accessed using self.get_param(key))
- get_model(suffix='')[source]#
get the model file(s) and metadata from model store
the method returns a path to the model file and the extra data (dict of dataitem objects). It also loads the model metadata into the self.model_spec attribute, allowing direct access to all the model metadata attributes.
get_model is usually used in the model's load() method to init the model.
Examples

    def load(self):
        model_file, extra_data = self.get_model(suffix=".pkl")
        self.model = load(open(model_file, "rb"))
        categories = extra_data["categories"].as_df()
- Parameters:
suffix (str) -- optional, model file suffix (when the model_path is a directory)
- Returns:
str -- (local) model file
dict -- extra dataitems dictionary
- get_param(key: str, default=None)[source]#
get param by key (specified in the model or the function)
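For example, inside a V2ModelServer subclass (my_param is the illustrative argument passed via add_model(..., my_param=5) in the usage example above):

```python
def load(self):
    # read an init/function argument, with a fallback when it was not provided
    self.my_param = self.get_param("my_param", default=5)
```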
- logged_results(request: dict, response: dict, op: str)[source]#
hook for controlling which results are tracked by the model monitoring
this hook allows controlling which input/output data is logged by the model monitoring: filtering out columns, adding custom values, or monitoring derived metrics. For example, in image classification you could calculate and track the RGB values instead of the image bitmap.
request["inputs"] holds a list of input values/arrays and response["outputs"] holds a list of corresponding output values/arrays (the schema of the input/output fields is stored in the model object). This method should return lists of alternative inputs and outputs, which will be monitored.
- Parameters:
request -- predict/explain request, see model serving docs for details
response -- result from the model predict/explain (after postprocess())
op -- operation (predict/infer or explain)
- Returns:
the input and output lists to track
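A minimal sketch of overriding the hook for an image model, tracking per-channel means instead of raw bitmaps (load()/predict() are omitted, and the numpy-based feature extraction is illustrative):

```python
import numpy as np


class MyImageModel(V2ModelServer):
    def logged_results(self, request: dict, response: dict, op: str):
        # monitor mean R/G/B values per image rather than the full bitmaps
        inputs = [
            np.asarray(img).reshape(-1, 3).mean(axis=0).tolist()
            for img in request["inputs"]
        ]
        outputs = response["outputs"]  # keep the model outputs as-is
        return inputs, outputs
```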
- predict(request: dict) list [source]#
model prediction operation
- Returns:
list with the model prediction results (can be multi-port) or a list of lists for multiple predictions
- class mlrun.serving.VotingEnsemble(context=None, name: str | None = None, routes=None, protocol: str | None = None, url_prefix: str | None = None, health_prefix: str | None = None, vote_type: str | None = None, weights: dict[str, float] | None = None, executor_type: ParallelRunnerModes | str = ParallelRunnerModes.thread, format_response_with_col_name_flag: bool = False, prediction_col_name: str = 'prediction', shard_by_endpoint: bool | None = None, **kwargs)[source]#
Bases:
ParallelRun
Voting Ensemble
The VotingEnsemble class enables you to apply prediction logic on top of the different added models.
You can use it by calling:
- <prefix>/<model>[/versions/<ver>]/operation
Sends the event to the specific <model>[/versions/<ver>]
- <prefix>/operation
Sends the event to all models and applies vote(self, event)
The VotingEnsemble applies the following logic: Incoming Event -> Router Preprocessing -> Send to model/s -> Apply all model/s logic (Preprocessing -> Prediction -> Postprocessing) -> Router Voting logic -> Router Postprocessing -> Response
This enables you to do the general preprocessing and postprocessing steps once on the router level, with only model-specific adjustments at the model level.
When enabling model tracking via set_tracking(), the ensemble's predictions appear under the model name given to the VotingEnsemble, or "VotingEnsemble" by default.
Example:
    # Define a serving function
    # Note: You can point the function to a file containing your own Router or Classifier Model class
    # this basic class supports sklearn based models (with `<model>.predict()` api)
    fn = mlrun.code_to_function(name='ensemble', kind='serving',
                                filename='model-server.py', image='mlrun/mlrun')

    # Set the router class
    # You can set your own classes by simply changing the `class_name`
    fn.set_topology(class_name='mlrun.serving.routers.VotingEnsemble')

    # Add models
    fn.add_model(<model_name>, <model_path>, <model_class_name>)
    fn.add_model(<model_name>, <model_path>, <model_class_name>)
How to extend the VotingEnsemble:
The VotingEnsemble applies its logic using the logic(predictions) function. The logic() function receives an array of (# samples, # predictors) which you can then use to apply whatever logic you may need.
If we use this VotingEnsemble as an example, the logic() function tries to figure out whether you are trying to do a classification or a regression prediction by the prediction type or by the given vote_type parameter. Then we apply the appropriate max_vote() or mean_vote() which calculates the actual prediction result and returns it as the VotingEnsemble's prediction.
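A sketch of a custom ensemble that replaces the voting logic with a weighted average (the class name is illustrative; it ignores the deduced vote type):

```python
import numpy as np

from mlrun.serving import VotingEnsemble


class WeightedMeanEnsemble(VotingEnsemble):
    def logic(self, predictions, weights):
        # predictions arrive as (# samples, # predictors); weights follow the
        # prediction order (see the parameters below)
        preds = np.array(predictions, dtype=float)
        w = np.array(weights, dtype=float)
        return np.average(preds, axis=1, weights=w).tolist()
```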
- Parameters:
context -- for internal use (passed in init)
name -- step name
routes -- for internal use (routes passed in init)
protocol -- serving API protocol (default "v2")
url_prefix -- url prefix for the router (default /v2/models)
health_prefix -- health api url prefix (default /v2/health)
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
vote_type -- voting type to be used (from VotingTypes). By default, the type is deduced from the first event: a float prediction type means regression, an int prediction type means classification
({"<model_name>" (weights A dictionary) -- <model_weight>}) that specified each model weight, if there is a model that didn't appear in the dictionary his weight will be count as a zero. None means that all the models have the same weight.
executor_type -- Parallelism mechanism, out of ParallelRunnerModes, by default threads
format_response_with_col_name_flag --
- If this flag is True the model's responses output format is
{id: <id>, model_name: <name>, outputs: {..., prediction: [<predictions>], ...}}
- Else
{id: <id>, model_name: <name>, outputs: [<predictions>]}
prediction_col_name -- The dict key for the predictions column in the model's responses output. Example: If the model returns {id: <id>, model_name: <name>, outputs: {..., prediction: [<predictions>], ...}} the prediction_col_name should be prediction. by default, prediction
shard_by_endpoint -- whether to use the endpoint as the partition/sharding key when writing to model monitoring stream. Defaults to True.
kwargs -- extra arguments
- do_event(event, *args, **kwargs)[source]#
Handles incoming requests.
- Parameters:
event (nuclio.Event) -- Incoming request as a nuclio.Event.
- Returns:
Event response after running the requested logic
- Return type:
Response
- extract_results_from_response(response)[source]#
Extracts the prediction from the model response. This function is used to allow multiple model return types and easy extension to the user's ensemble and model best practices.
- Parameters:
response (Union[List, Dict]) -- The model response's output field.
- Returns:
The model's predictions
- Return type:
List
- logic(predictions: list[list[Union[int, float]]], weights: list[float])[source]#
Returns the final prediction of all the models after applying the desired logic
- Parameters:
predictions -- The predictions from all models, per event
weights -- models weights in the prediction order
- Returns:
List of the resulting voted predictions
- mlrun.serving.create_graph_server(parameters=None, load_mode=None, graph=None, verbose=False, current_function=None, **kwargs) GraphServer [source]#
create graph server host/emulator for local or test runs
Usage example:
    server = create_graph_server(graph=RouterStep(), parameters={})
    server.init(None, globals())
    server.graph.add_route("my", class_name=MyModelClass, model_path="{path}", z=100)
    print(server.test("/v2/models/my/infer", testdata))
- class mlrun.serving.remote.BatchHttpRequests(url: str | None = None, subpath: str | None = None, method: str | None = None, headers: dict | None = None, url_expression: str | None = None, body_expression: str | None = None, return_json: bool = True, input_path: str | None = None, result_path: str | None = None, retries=None, backoff_factor=None, timeout=None, **kwargs)[source]#
class for calling remote endpoints in parallel
sync and async graph step implementation for request/response to a remote service (class shortcut = "$remote"). The url can be an http(s) url (e.g. https://myservice/path) or an mlrun function uri ([project/]name). Alternatively, url_expression can be specified to build the url from the event (e.g. "event['url']").
example pipeline:
    function = mlrun.new_function("myfunc", kind="serving")
    flow = function.set_topology("flow", engine="async")
    flow.to(
        BatchHttpRequests(
            url_expression="event['url']",
            body_expression="event['data']",
            method="POST",
            input_path="req",
            result_path="resp",
        )
    ).respond()
    server = function.to_mock_server()

    # request contains a list of elements, each with url and data
    request = [{"url": f"{base_url}/{i}", "data": i} for i in range(2)]
    resp = server.test(body={"req": request})
- Parameters:
url -- http(s) url or function [project/]name to call
subpath -- path (which follows the url)
method -- HTTP method (GET, POST, ..), default to POST
headers -- dictionary with http header values
url_expression -- an expression for getting the url from the event, e.g. "event['url']"
body_expression -- an expression for getting the request body from the event, e.g. "event['data']"
return_json -- indicate the returned value is json, and convert it to a py object
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
retries -- number of retries (in exponential backoff)
backoff_factor -- A backoff factor in seconds to apply between attempts after the second try
timeout -- How long to wait for the server to send data before giving up, float in seconds
- class mlrun.serving.remote.RemoteStep(url: str, subpath: str | None = None, method: str | None = None, headers: dict | None = None, url_expression: str | None = None, body_expression: str | None = None, return_json: bool = True, input_path: str | None = None, result_path: str | None = None, max_in_flight=None, retries=None, backoff_factor=None, timeout=None, headers_expression: str | None = None, **kwargs)[source]#
class for calling remote endpoints
sync and async graph step implementation for request/response to a remote service (class shortcut = "$remote"). The url can be an http(s) url (e.g. https://myservice/path) or an mlrun function uri ([project/]name). Alternatively, url_expression can be specified to build the url from the event (e.g. "event['url']").
example pipeline:
    flow = function.set_topology("flow", engine="async")
    flow.to(name="step1", handler="func1")\
        .to(RemoteStep(name="remote_echo", url="https://myservice/path", method="POST"))\
        .to(name="laststep", handler="func2").respond()
- Parameters:
url -- http(s) url or function [project/]name to call
subpath -- path (which follows the url), use $path to use the event.path
method -- HTTP method (GET, POST, ..), default to POST
headers -- dictionary with http header values
url_expression -- an expression for getting the url from the event, e.g. "event['url']"
body_expression -- an expression for getting the request body from the event, e.g. "event['data']"
return_json -- indicate the returned value is json, and convert it to a py object
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
retries -- number of retries (in exponential backoff)
backoff_factor -- A backoff factor in seconds to apply between attempts after the second try
timeout -- How long to wait for the server to send data before giving up, float in seconds
headers_expression -- an expression for getting the request headers from the event, e.g. "event['headers']"
- class mlrun.serving.routers.BaseModelRouter(context=None, name: str | None = None, routes=None, protocol: str | None = None, url_prefix: str | None = None, health_prefix: str | None = None, input_path: str | None = None, result_path: str | None = None, **kwargs)[source]#
base model router class
Model Serving Router, route between child models
- Parameters:
context -- for internal use (passed in init)
name -- step name
routes -- for internal use (routes passed in init)
protocol -- serving API protocol (default "v2")
url_prefix -- url prefix for the router (default /v2/models)
health_prefix -- health api url prefix (default /v2/health)
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
kwargs -- extra arguments
- class mlrun.serving.routers.EnrichmentModelRouter(context=None, name: str | None = None, routes=None, protocol: str | None = None, url_prefix: str | None = None, health_prefix: str | None = None, feature_vector_uri: str = '', impute_policy: dict | None = None, **kwargs)[source]#
Model router with feature enrichment and imputing
Model router with feature enrichment (from the feature store)
The EnrichmentModelRouter class enriches the incoming event with real-time features read from a feature vector (in the MLRun feature store) and forwards the enriched event to the child models
The feature vector is specified using feature_vector_uri. In addition, an imputing policy can be specified to substitute None/NaN values with predefined constants or stats.
- Parameters:
feature_vector_uri -- feature vector uri in the form: [project/]name[:tag]
impute_policy -- value imputing (substitute NaN/Inf values with a statistical or constant value). You can set the impute_policy parameter with the imputing policy and specify which constant or statistical value will be used instead of NaN/Inf values. This can be defined per column or for all columns ("*"). The replaced value can be a fixed number for constants, or $mean, $max, $min, $std, $count for statistical values. "*" is used to specify the default for all features, example: impute_policy={"*": "$mean", "age": 33}
context -- for internal use (passed in init)
name -- step name
routes -- for internal use (routes passed in init)
protocol -- serving API protocol (default "v2")
url_prefix -- url prefix for the router (default /v2/models)
health_prefix -- health api url prefix (default /v2/health)
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
kwargs -- extra arguments
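A sketch of wiring the enrichment router into a serving function (the feature vector name, impute policy, and model details are placeholders):

```python
import mlrun

fn = mlrun.code_to_function(name="enriched", kind="serving",
                            filename="model-server.py", image="mlrun/mlrun")

# events are enriched from the feature vector before reaching the child models
fn.set_topology(
    class_name="mlrun.serving.routers.EnrichmentModelRouter",
    feature_vector_uri="my-feature-vector",
    impute_policy={"*": "$mean"},
)
fn.add_model("my-model", class_name="MyModelClass", model_path="<model-uri>")
```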
- class mlrun.serving.routers.EnrichmentVotingEnsemble(context=None, name: str | None = None, routes=None, protocol=None, url_prefix: str | None = None, health_prefix: str | None = None, vote_type: str | None = None, executor_type: ParallelRunnerModes | str = ParallelRunnerModes.thread, prediction_col_name: str | None = None, feature_vector_uri: str = '', impute_policy: dict | None = None, **kwargs)[source]#
Voting Ensemble with feature enrichment (from the feature store)
The EnrichmentVotingEnsemble class enables you to enrich the incoming event with real-time features read from a feature vector (in the MLRun feature store) and apply prediction logic on top of the different added models.
You can use it by calling:
- <prefix>/<model>[/versions/<ver>]/operation
Sends the event to the specific <model>[/versions/<ver>]
- <prefix>/operation
Sends the event to all models and applies vote(self, event)
The VotingEnsemble applies the following logic: Incoming Event -> Feature enrichment -> Send to model/s -> Apply all model/s logic (Preprocessing -> Prediction -> Postprocessing) -> Router Voting logic -> Router Postprocessing -> Response
The feature vector is specified using feature_vector_uri. In addition, an imputing policy can be specified to substitute None/NaN values with predefined constants or stats.
When enabling model tracking via set_tracking(), the ensemble's predictions appear under the model name given to the VotingEnsemble, or "VotingEnsemble" by default.
Example:
    # Define a serving function
    # Note: You can point the function to a file containing your own Router or Classifier Model class
    # this basic class supports sklearn based models (with `<model>.predict()` api)
    fn = mlrun.code_to_function(
        name='ensemble', kind='serving',
        filename='model-server.py', image='mlrun/mlrun')

    # Set the router class
    # You can set your own classes by simply changing the `class_name`
    fn.set_topology(
        class_name='mlrun.serving.routers.EnrichmentVotingEnsemble',
        feature_vector_uri="transactions-fraud",
        impute_policy={"*": "$mean"})

    # Add models
    fn.add_model(<model_name>, <model_path>, <model_class_name>)
    fn.add_model(<model_name>, <model_path>, <model_class_name>)
How to extend the VotingEnsemble#
The VotingEnsemble applies its logic using the logic(predictions) function. The logic() function receives an array of (# samples, # predictors) which you can then use to apply whatever logic you may need.
If we use this VotingEnsemble as an example, the logic() function tries to figure out whether you are trying to do a classification or a regression prediction by the prediction type or by the given vote_type parameter. Then we apply the appropriate max_vote() or mean_vote() which calculates the actual prediction result and returns it as the VotingEnsemble's prediction.
- Parameters:
context -- for internal use (passed in init)
name -- step name
routes -- for internal use (routes passed in init)
protocol -- serving API protocol (default "v2")
url_prefix -- url prefix for the router (default /v2/models)
health_prefix -- health api url prefix (default /v2/health)
feature_vector_uri -- feature vector uri in the form: [project/]name[:tag]
impute_policy -- value imputing (substitute NaN/Inf values with a statistical or constant value). You can set the impute_policy parameter with the imputing policy and specify which constant or statistical value will be used instead of NaN/Inf values. This can be defined per column or for all columns ("*"). The replaced value can be a fixed number for constants, or $mean, $max, $min, $std, $count for statistical values. "*" is used to specify the default for all features, example: impute_policy={"*": "$mean", "age": 33}
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
vote_type -- voting type to be used (from VotingTypes). By default, the type is deduced from the first event: a float prediction type means regression, an int prediction type means classification
executor_type -- Parallelism mechanism, out of ParallelRunnerModes, by default thread
prediction_col_name -- The dict key for the predictions column in the model's responses output. Example: if the model returns {id: <id>, model_name: <name>, outputs: {..., prediction: [<predictions>], ...}}, the prediction_col_name should be prediction. By default, prediction.
kwargs -- extra arguments
- class mlrun.serving.routers.ModelRouter(context=None, name: str | None = None, routes=None, protocol: str | None = None, url_prefix: str | None = None, health_prefix: str | None = None, input_path: str | None = None, result_path: str | None = None, **kwargs)[source]#
Model Serving Router, route between child models
- Parameters:
context -- for internal use (passed in init)
name -- step name
routes -- for internal use (routes passed in init)
protocol -- serving API protocol (default "v2")
url_prefix -- url prefix for the router (default /v2/models)
health_prefix -- health api url prefix (default /v2/health)
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
kwargs -- extra arguments
- class mlrun.serving.routers.OperationTypes(value)[source]#
Supported operations for VotingEnsemble
- explain = 'explain'#
- infer = 'infer'#
- predict = 'predict'#
- class mlrun.serving.routers.ParallelRun(context=None, name: str | None = None, routes=None, protocol: str | None = None, url_prefix: str | None = None, health_prefix: str | None = None, extend_event=None, executor_type: ParallelRunnerModes | str = ParallelRunnerModes.thread, **kwargs)[source]#
Process multiple steps (child routes) in parallel and merge the results
By default, the results dict from each step are merged (by key); when extend_event is set, the results start from the event body dict (values can be overwritten)
Users can override the merger() method to implement custom merging logic.
Example:
    # create a function with a parallel router and 3 children
    fn = mlrun.new_function("parallel", kind="serving")
    graph = fn.set_topology(
        "router",
        mlrun.serving.routers.ParallelRun(
            extend_event=True, executor_type=executor
        ),
    )
    graph.add_route("child1", class_name="Cls1")
    graph.add_route("child2", class_name="Cls2", my_arg={"c": 7})
    graph.add_route("child3", handler="my_handler")
    server = fn.to_mock_server()
    resp = server.test("", {"x": 8})
- Parameters:
context -- for internal use (passed in init)
name -- step name
routes -- for internal use (routes passed in init)
protocol -- serving API protocol (default "v2")
url_prefix -- url prefix for the router (default /v2/models)
health_prefix -- health api url prefix (default /v2/health)
executor_type -- parallelism mechanism, one of: array (run one by one), process (run in separate processes), thread (run in separate threads, the default)
extend_event -- True will add the event body to the result
kwargs -- extra arguments
- class mlrun.serving.routers.ParallelRunnerModes(value)[source]#
Supported parallel running modes for VotingEnsemble
- array = 'array'#
- process = 'process'#
- thread = 'thread'#
- class mlrun.serving.routers.VotingEnsemble(context=None, name: str | None = None, routes=None, protocol: str | None = None, url_prefix: str | None = None, health_prefix: str | None = None, vote_type: str | None = None, weights: dict[str, float] | None = None, executor_type: ParallelRunnerModes | str = ParallelRunnerModes.thread, format_response_with_col_name_flag: bool = False, prediction_col_name: str = 'prediction', shard_by_endpoint: bool | None = None, **kwargs)[source]#
Voting Ensemble
The VotingEnsemble class enables you to apply prediction logic on top of the different added models.
You can use it by calling:
- <prefix>/<model>[/versions/<ver>]/operation
Sends the event to the specific <model>[/versions/<ver>]
- <prefix>/operation
Sends the event to all models and applies vote(self, event)
The VotingEnsemble applies the following logic: Incoming Event -> Router Preprocessing -> Send to model/s -> Apply all model/s logic (Preprocessing -> Prediction -> Postprocessing) -> Router Voting logic -> Router Postprocessing -> Response
This enables you to do the general preprocessing and postprocessing steps once on the router level, with only model-specific adjustments at the model level.
When enabling model tracking via set_tracking(), the ensemble's predictions appear under the model name given to the VotingEnsemble, or "VotingEnsemble" by default.
Example:
    # Define a serving function
    # Note: You can point the function to a file containing your own Router or Classifier Model class
    # this basic class supports sklearn based models (with `<model>.predict()` api)
    fn = mlrun.code_to_function(name='ensemble', kind='serving',
                                filename='model-server.py', image='mlrun/mlrun')

    # Set the router class
    # You can set your own classes by simply changing the `class_name`
    fn.set_topology(class_name='mlrun.serving.routers.VotingEnsemble')

    # Add models
    fn.add_model(<model_name>, <model_path>, <model_class_name>)
    fn.add_model(<model_name>, <model_path>, <model_class_name>)
How to extend the VotingEnsemble:
The VotingEnsemble applies its logic using the logic(predictions) function. The logic() function receives an array of (# samples, # predictors) which you can then use to apply whatever logic you may need.
If we use this VotingEnsemble as an example, the logic() function tries to figure out whether you are trying to do a classification or a regression prediction by the prediction type or by the given vote_type parameter. Then we apply the appropriate max_vote() or mean_vote() which calculates the actual prediction result and returns it as the VotingEnsemble's prediction.
- Parameters:
context -- for internal use (passed in init)
name -- step name
routes -- for internal use (routes passed in init)
protocol -- serving API protocol (default "v2")
url_prefix -- url prefix for the router (default /v2/models)
health_prefix -- health api url prefix (default /v2/health)
input_path -- when specified, selects the key/path in the event to use as the body. This requires that the event body behaves like a dict. Example: event: {"data": {"a": 5, "b": 7}}, input_path="data.b" means the request body will be 7
result_path -- selects the key/path in the event to write the results to. This requires that the event body behaves like a dict. Example: event: {"x": 5}, result_path="resp" means the returned response will be written to event["resp"], resulting in {"x": 5, "resp": <result>}
vote_type -- voting type to be used (from VotingTypes). By default, the type is deduced from the first event: a float prediction type means regression, an int prediction type means classification
({"<model_name>" (weights A dictionary) -- <model_weight>}) that specified each model weight, if there is a model that didn't appear in the dictionary his weight will be count as a zero. None means that all the models have the same weight.
executor_type -- Parallelism mechanism, out of ParallelRunnerModes, by default threads
format_response_with_col_name_flag --
- If this flag is True the model's responses output format is
{id: <id>, model_name: <name>, outputs: {..., prediction: [<predictions>], ...}}
- Else
{id: <id>, model_name: <name>, outputs: [<predictions>]}
prediction_col_name -- The dict key for the predictions column in the model's responses output. Example: If the model returns {id: <id>, model_name: <name>, outputs: {..., prediction: [<predictions>], ...}} the prediction_col_name should be prediction. by default, prediction
shard_by_endpoint -- whether to use the endpoint as the partition/sharding key when writing to model monitoring stream. Defaults to True.
kwargs -- extra arguments
- do_event(event, *args, **kwargs)[source]#
Handles incoming requests.
- Parameters:
event (nuclio.Event) -- Incoming request as a nuclio.Event.
- Returns:
Event response after running the requested logic
- Return type:
Response
- extract_results_from_response(response)[source]#
Extracts the prediction from the model response. This function is used to allow multiple model return types and easy extension to the user's ensemble and model best practices.
- Parameters:
response (Union[List, Dict]) -- The model response's output field.
- Returns:
The model's predictions
- Return type:
List
- logic(predictions: list[list[Union[int, float]]], weights: list[float])[source]#
Returns the final prediction of all the models after applying the desired logic
- Parameters:
predictions -- The predictions from all models, per event
weights -- models weights in the prediction order
- Returns:
List of the resulting voted predictions