mlrun.frameworks.sklearn

mlrun.frameworks.sklearn.apply_mlrun(
    model: sklearn.base.BaseEstimator | sklearn.base.BiclusterMixin | sklearn.base.ClassifierMixin | sklearn.base.ClusterMixin | sklearn.base.DensityMixin | sklearn.base.RegressorMixin | sklearn.base.TransformerMixin = None,
    model_name: str = 'model',
    tag: str = '',
    model_path: str = None,
    modules_map: dict[str, Union[NoneType, str, list[str]]] | str = None,
    custom_objects_map: dict[str, Union[str, list[str]]] | str = None,
    custom_objects_directory: str = None,
    context: MLClientCtx = None,
    artifacts: list[mlrun.frameworks._ml_common.plan.MLPlan] | list[str] | dict[str, dict] = None,
    metrics: list[mlrun.frameworks.sklearn.metric.Metric] | list[Union[tuple[Union[Callable, str], dict], Callable, str]] | dict[str, Union[tuple[Union[Callable, str], dict], Callable, str]] = None,
    x_test: list | tuple | dict | ndarray | DataFrame | Series | scipy.sparse.base.spmatrix = None,
    y_test: list | tuple | dict | ndarray | DataFrame | Series | scipy.sparse.base.spmatrix = None,
    sample_set: list | tuple | dict | ndarray | DataFrame | Series | scipy.sparse.base.spmatrix | DataItem | str = None,
    y_columns: list[str] | list[int] = None,
    feature_vector: str = None,
    feature_weights: list[float] = None,
    labels: dict[str, Union[str, int, float]] = None,
    parameters: dict[str, Union[str, int, float]] = None,
    extra_data: dict[str, Union[str, bytes, mlrun.artifacts.base.Artifact, mlrun.datastore.base.DataItem]] = None,
    auto_log: bool = True,
    **kwargs
) → SKLearnModelHandler

Wrap the given model with MLRun's interface, providing it with MLRun's additional features.

Parameters:
  • model -- The model to wrap. If not provided, it will be loaded from the given model path.

  • model_name -- The model name to use for storing the model artifact. Default: "model".

  • tag -- The model's tag to log with.

  • model_path -- The model's store object path. Mandatory for evaluation (to know which model to update). If model is not provided, it will be loaded from this path.

  • modules_map --

    A dictionary of all the modules required for loading the model. Each key is a path to a module and its value is the object name to import from it. All the modules will be imported globally. If multiple objects need to be imported from the same module, a list can be given. The map can also be passed as a path to a JSON file. For example:

    {
        "module1": None,  # import module1
        "module2": ["func1", "func2"],  # from module2 import func1, func2
        "module3.sub_module": "func3",  # from module3.sub_module import func3
    }
    

    If the model path given is of a store object, the modules map will be read from the logged modules map artifact of the model.

  • custom_objects_map --

    A dictionary of all the custom objects required for loading the model. Each key is a path to a Python file and its value is the custom object name to import from it. If multiple objects need to be imported from the same py file, a list can be given. The map can also be passed as a path to a JSON file. For example:

    {
        "/.../custom_model.py": "MyModel",
        "/.../custom_objects.py": ["object1", "object2"],
    }
    

    All the paths will be accessed from the given 'custom_objects_directory', meaning each py file will be read from 'custom_objects_directory/<MAP KEY>'. If the model path given is of a store object, the custom objects map will be read from the logged custom object map artifact of the model. Notice: the custom objects will be imported in the order they appear in this dictionary (or JSON file). If a custom object depends on another, make sure to place it after the one it relies on.

  • custom_objects_directory -- Path to the directory with all the Python files required for the custom objects. Can be passed as a zip file as well (it will be extracted during the run before loading the model). If the model path given is of a store object, the custom objects files will be read from the logged custom object artifact of the model.

  • context -- MLRun context to work with. If no context is given, it will be retrieved via 'mlrun.get_or_create_ctx(None)'.

  • artifacts -- A list of artifact plans to produce during the run.

  • metrics -- A list of metrics to calculate during the run (the accepted formats are illustrated in the sketch after this parameter list).

  • x_test -- The validation data for producing and calculating artifacts and metrics post training. Without this, validation will not be performed.

  • y_test -- The test data ground truth for producing and calculating artifacts and metrics post training or post predict / predict_proba.

  • sample_set -- A sample set of inputs for the model, used for logging its stats along with the model for model monitoring. If not given, 'x_train' will be used by default.

  • y_columns -- List of the column names of the ground truth labels in case it is a pd.DataFrame, or a list of integers in case the dataset is a np.ndarray. If not given but 'y_train' is given, the labels / indices in it will be used by default.

  • feature_vector -- Feature store feature vector URI (store://feature-vectors/<project>/<name>[:tag]).

  • feature_weights -- List of feature weights, one per input column.

  • labels -- Labels to log with the model.

  • parameters -- Parameters to log with the model.

  • extra_data -- Extra data to log with the model.

  • auto_log -- Whether to apply MLRun's auto logging on the model. Auto logging will add the default artifacts and metrics to the lists of artifacts and metrics. Default: True.
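
For illustration, a minimal sketch of the accepted 'metrics' formats. As the type annotation suggests, entries may be plain callables, (callable, kwargs) tuples, or strings; the string form is assumed here to name a known scoring function such as one from sklearn.metrics:

    from sklearn.metrics import f1_score

    # Hypothetical metrics list mixing the accepted forms:
    metrics = [
        f1_score,                          # plain callable
        (f1_score, {"average": "macro"}),  # callable with keyword arguments
        "accuracy_score",                  # by name (assumed to resolve to a known scoring function)
    ]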

Returns:

The model handler initialized with the provided model.
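
A minimal end-to-end training sketch, assuming an MLRun run context is available (otherwise one is created as described for 'context' above). The dataset and model choice are illustrative only:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    from mlrun.frameworks.sklearn import apply_mlrun

    X, y = load_iris(return_X_y=True)
    x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = RandomForestClassifier()

    # Wrap the model with MLRun's interface; with auto_log=True the default
    # artifacts and metrics are added and logged once training ends.
    apply_mlrun(model=model, model_name="my_model", x_test=x_test, y_test=y_test)

    # Train as usual; post-fit validation uses x_test / y_test.
    model.fit(x_train, y_train)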