Model serving API#
MLRun Serving follows the same REST API defined by Triton and KFServing v2.
Nuclio also supports streaming protocols (Kafka, kinesis, MQTT, etc.). When streaming, the
model
name and operation
can be encoded inside the message body.
The APIs are:
explain#
POST /v2/models/
Request body:
{
"id" : $string #optional,
"model" : $string #optional
"parameters" : $parameters #optional,
"inputs" : [ $request_input, ... ],
"outputs" : [ $request_output, ... ] #optional
}
Response structure:
{
"model_name" : $string,
"model_version" : $string #optional,
"id" : $string,
"outputs" : [ $response_output, ... ]
}
get model health / readiness#
GET v2/models/${MODEL_NAME}[/versions/${VERSION}]/ready
Returns 200 for Ok, 40X for not ready.
get model metadata#
GET v2/models/${MODEL_NAME}[/versions/${VERSION}]
Response example: {"name": "m3", "version": "v2", "inputs": [..], "outputs": [..]}
get server info#
GET /
GET /v2/health
Response example: {'name': 'my-server', 'version': 'v2', 'extensions': []}
infer / predict#
POST /v2/models/<model>[/versions/{VERSION}]/infer
Request body:
{
"id" : $string #optional,
"model" : $string #optional
"data_url" : $string #optional
"parameters" : $parameters #optional,
"inputs" : [ $request_input, ... ],
"outputs" : [ $request_output, ... ] #optional
}
id: Unique Id of the request, if not provided a random value is provided.
model: Model to select (for streaming protocols without URLs).
data_url: Option to load the
inputs
from an external file/s3/v3io/… object.parameters: Optional request parameters.
inputs: List of input elements (numeric values, arrays, or dicts).
outputs: Optional, requested output values.
Note
You can also send binary data to the function, for example, a JPEG image. The serving engine pre-processor
detects it based on the HTTP content-type and converts it to the above request structure, placing the
image bytes array in the inputs
field.
Response structure:
{
"model_name" : $string,
"model_version" : $string #optional,
"id" : $string,
"outputs" : [ $response_output, ... ]
}
list models#
GET /v2/models/
Response example: {"models": ["m1", "m2", "m3:v1", "m3:v2"]}