Built-in steps
MLRun provides many built-in steps that you can use when building your graph.
Click on the step names in the following sections to see the full usage.
Base Operators
| Class name | Description |
|---|---|
| | Batches events. This step emits a batch every |
| | Redirects each input element into one of the multiple downstreams. |
| | Adds fields to each incoming event. |
| | Filters events based on a user-provided function. |
| | Maps, or transforms, each incoming event into any number of events. |
| | Flatten is equivalent to FlatMap(lambda x: x). |
| | Applies the given function on each event in the stream, and passes the original event downstream. |
| | Similar to Map, but instead of a function argument, this class should be extended and its do() method overridden. |
| | Maps, or transforms, incoming events using a stateful user-provided function, and an initial state, which can be a database table. |
| | Partitions events by calling a predicate function on each event. Each processed event results in a Partitioned namedtuple of (left=Optional[Event], right=Optional[Event]). |
| | Reduces incoming events into a single value that is returned upon the successful termination of the flow. |
| | Emits a single event in a window of |
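
The semantics of several of these operators can be illustrated without any library. The following is a minimal plain-Python sketch, not MLRun's implementation: the helper names (`filter_events`, `flat_map`, `flatten`, `partition`, `batch`) and the `size` parameter are hypothetical, batching here is by count only, and the sketch assumes that an event satisfying the predicate is placed in `left`.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical stand-in for the Partitioned namedtuple described in the table.
Partitioned = namedtuple("Partitioned", ["left", "right"])


def filter_events(events, predicate):
    """Keep only the events for which predicate(event) is true."""
    return [e for e in events if predicate(e)]


def flat_map(events, fn):
    """Map each event to zero or more events and flatten the result."""
    return [out for e in events for out in fn(e)]


def flatten(events):
    """Equivalent to flat_map(events, lambda x: x)."""
    return flat_map(events, lambda x: x)


def partition(events, predicate):
    """Wrap each event in a Partitioned tuple (assumption: True goes to left)."""
    return [
        Partitioned(e, None) if predicate(e) else Partitioned(None, e)
        for e in events
    ]


def batch(events, size):
    """Group events into lists of at most `size` events (count-based only)."""
    return [events[i:i + size] for i in range(0, len(events), size)]


events = [1, 2, 3, 4, 5]
evens = filter_events(events, lambda x: x % 2 == 0)
doubled = flat_map(events, lambda x: [x, x])
nested = flatten([[1, 2], [3]])
parts = partition(events, lambda x: x > 3)
total = reduce(lambda acc, x: acc + x, events, 0)  # single value at the end
batches = batch(events, 2)
```

In a real graph these operations run per event as the stream flows, rather than over a complete list as in this sketch.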
Data Transformations
| Class name | Description |
|---|---|
| | Aggregates the data into the table object provided for later persistence, and outputs an event enriched with the requested aggregation features. |
| | Extract a date-time component. |
| | Replace None values with default values. |
| | Map column values to new values. |
| | Create new binary fields, one per category (one hot encoded). |
| | Set the event metadata (id, key, timestamp) from the event body. |
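
To make the imputing, value-mapping, and one-hot rows above concrete, here is a small plain-Python sketch that applies the three ideas to a dictionary event. It is only an illustration under stated assumptions: the function names, the argument shapes, and the `field_value` naming of the encoded columns are hypothetical and are not taken from the MLRun API.

```python
def impute(event, defaults):
    """Replace None values with per-field defaults (fields without a default stay None)."""
    return {k: (defaults.get(k, v) if v is None else v) for k, v in event.items()}


def map_values(event, mapping):
    """Map field values to new values; unmapped values pass through unchanged."""
    return {k: mapping.get(k, {}).get(v, v) for k, v in event.items()}


def one_hot(event, categories):
    """Replace each listed field with one binary field per category."""
    out = {k: v for k, v in event.items() if k not in categories}
    for field, values in categories.items():
        for value in values:
            # Assumed naming scheme for the new binary fields: <field>_<value>.
            out[f"{field}_{value}"] = 1 if event.get(field) == value else 0
    return out


event = {"age": None, "gender": "f", "city": "NYC"}
imputed = impute(event, {"age": 0})
mapped = map_values(imputed, {"gender": {"f": "female", "m": "male"}})
encoded = one_hot(mapped, {"gender": ["female", "male"]})
```

Each helper returns a new dictionary, so the steps can be chained in any order without mutating the incoming event.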
External IO and data enrichment
| Class name | Description |
|---|---|
| | A class for calling remote endpoints in parallel. |
| | Data input/output class abstracting access to various local/remote data sources. |
| | Joins each event with data from the given table. |
| | Joins each event with a V3IO table. Used for event augmentation. |
| | Similar to AggregateByKey, but this step is for serving only and does not aggregate the event. |
| | Class for calling remote endpoints. |
| | Joins each event with data from any HTTP source. Used for event augmentation. |
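
The table-join steps above all follow the same enrichment pattern: look up a row by a key taken from the event, then merge that row into the event. A minimal sketch of the pattern, assuming an in-memory dictionary as the table (the `join_with_table` helper and its parameters are hypothetical, not the MLRun signature):

```python
def join_with_table(event, table, key_field):
    """Return a copy of the event enriched with the table row for event[key_field]."""
    row = table.get(event[key_field], {})  # no matching row: nothing is added
    return {**event, **row}


# Hypothetical lookup table keyed by user id.
users = {"u1": {"country": "US", "tier": "gold"}}

enriched = join_with_table({"user": "u1", "amount": 10}, users, "user")
unmatched = join_with_table({"user": "u2", "amount": 5}, users, "user")
```

Here table fields overwrite event fields of the same name; a real step may expose options for how the two are merged.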
Sources
| Class name | Description |
|---|---|
| | Reads Google BigQuery query results as the input source for a flow. |
| | Reads a CSV file as the input source for a flow. |
| | Reads a data frame as the input source for a flow. |
| | Sets the Kafka source for the flow. |
| | Reads the Parquet file/dir as the input source for a flow. |
| | Sets the stream source for the flow. If the stream doesn’t exist, it is created. |
Targets
| Class name | Description |
|---|---|
| | Writes events to a CSV file. |
| | Persists the data in the table to its associated storage, by key. |
| | The Parquet target storage driver, used to materialize feature set/vector data into Parquet files. |
| | Writes all incoming events into a V3IO stream. |
| | Creates a pandas data frame from events. Can appear in the middle of the flow, as opposed to ReduceToDataFrame. |
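
As a rough picture of how a source and a target bracket a flow, the sketch below reads CSV text into dictionary events and writes events back out as CSV, using only the Python standard library. The helper names are hypothetical, and in-memory strings stand in for files; this is not how the MLRun CSV source and target classes are implemented.

```python
import csv
import io


def read_csv_events(text):
    """Source side: turn CSV text into a list of dictionary events."""
    return list(csv.DictReader(io.StringIO(text)))


def write_csv_events(events, columns):
    """Target side: serialize events to CSV text with the given column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(events)
    return buf.getvalue()


source_text = "id,price\n1,9.5\n2,3.0\n"
events = read_csv_events(source_text)          # values are read as strings
output_text = write_csv_events(events, ["id", "price"])
```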
Models
| Class name | Description |
|---|---|
| | A model serving class for serving ONNX Models. A sub-class of the V2ModelServer class. |
| | A model serving class for serving PyTorch Models. A sub-class of the V2ModelServer class. |
| | A model serving class for serving Sklearn Models. A sub-class of the V2ModelServer class. |
| | A model serving class for serving TFKeras Models. A sub-class of the V2ModelServer class. |
| | A model serving class for serving XGB Models. A sub-class of the V2ModelServer class. |
Routers
| Class name | Description |
|---|---|
| | Auto enrich the request with data from the feature store. The router input accepts a list of inference requests (each request can be a dict or a list of incoming features/keys). It enriches the request with data from the specified feature vector ( |
| | Auto enrich the request with data from the feature store. The router input accepts a list of inference requests (each request can be a dict or a list of incoming features/keys). It enriches the request with data from the specified feature vector ( |
| | Basic model router, for calling different models per each model path. |
| | An ensemble machine learning model that combines the prediction of several models. |
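
The difference between routing and ensembling can be sketched in a few lines of plain Python. In this illustration the models are hypothetical callables stored in a dictionary keyed by path, `route` sends the request to one model, and `vote` combines all models by majority vote. Majority voting is only one possible combination rule and is an assumption here, as are all the names.

```python
from collections import Counter


def route(models, path, inputs):
    """Router: call only the model registered under the given path."""
    return [models[path](row) for row in inputs]


def vote(models, inputs):
    """Ensemble: call every model and keep the most common prediction per row."""
    results = []
    for row in inputs:
        predictions = [model(row) for model in models.values()]
        results.append(Counter(predictions).most_common(1)[0][0])
    return results


# Hypothetical threshold "models" that classify a single feature.
models = {
    "low": lambda row: int(row[0] > 0.2),
    "mid": lambda row: int(row[0] > 0.5),
    "high": lambda row: int(row[0] > 0.8),
}
inputs = [[0.1], [0.6], [0.9]]

single = route(models, "high", inputs)
combined = vote(models, inputs)
```

With an odd number of binary classifiers, as here, the majority vote never ties; other setups would need an explicit tie-breaking rule.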
Other
| Class name | Description |
|---|---|
| | Validate feature values according to the feature set validation policy. |
| | Builds a pandas DataFrame from events and returns that DataFrame on flow termination. |