mlrun.frameworks.xgboost

mlrun.frameworks.xgboost.apply_mlrun(model: xgboost.XGBModel = None, model_name: str = 'model', tag: str = '', model_path: str = None, modules_map: Dict[str, None | str | List[str]] | str = None, custom_objects_map: Dict[str, str | List[str]] | str = None, custom_objects_directory: str = None, context: MLClientCtx = None, artifacts: List[MLPlan] | List[str] | Dict[str, dict] = None, metrics: List[Metric] | List[Tuple[Callable | str, dict] | Callable | str] | Dict[str, Tuple[Callable | str, dict] | Callable | str] = None, x_test: list | tuple | dict | ndarray | DataFrame | Series | scipy.sparse.base.spmatrix | xgboost.DMatrix = None, y_test: list | tuple | dict | ndarray | DataFrame | Series | scipy.sparse.base.spmatrix | xgboost.DMatrix = None, sample_set: list | tuple | dict | ndarray | DataFrame | Series | scipy.sparse.base.spmatrix | xgboost.DMatrix | DataItem | str = None, y_columns: List[str] | List[int] = None, feature_vector: str = None, feature_weights: List[float] = None, labels: Dict[str, str | int | float] = None, parameters: Dict[str, str | int | float] = None, extra_data: Dict[str, str | bytes | Artifact | DataItem] = None, auto_log: bool = True, **kwargs) → XGBoostModelHandler

Wrap the given model with MLRun's interface, providing it with MLRun's additional features.

Parameters:
  • model -- The model to wrap. Can also be loaded from the given model path.

  • model_name -- The model name to use for storing the model artifact. Default: "model".

  • tag -- The model's tag to log with.

  • model_path -- The model's store object path. Mandatory for evaluation (to know which model to update). If model is not provided, it will be loaded from this path.

  • modules_map --

    A dictionary of all the modules required for loading the model. Each key is a path to a module and its value is the object name to import from it. All the modules will be imported globally. If multiple objects need to be imported from the same module, a list can be given. The map can also be passed as a path to a json file. For example:

    {
        "module1": None,  # import module1
        "module2": ["func1", "func2"],  # from module2 import func1, func2
        "module3.sub_module": "func3",  # from module3.sub_module import func3
    }
    

    If the model path given is of a store object, the modules map will be read from the logged modules map artifact of the model.
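
    The same map, passed as a path to a json file, would contain the following (a hedged illustration of the json form; note that json uses null in place of Python's None):

    {
        "module1": null,
        "module2": ["func1", "func2"],
        "module3.sub_module": "func3"
    }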

  • custom_objects_map --

    A dictionary of all the custom objects required for loading the model. Each key is a path to a python file and its value is the custom object name to import from it. If multiple objects need to be imported from the same py file, a list can be given. The map can also be passed as a path to a json file. For example:

    {
        "/.../custom_model.py": "MyModel",
        "/.../custom_objects.py": ["object1", "object2"]
    }
    

    All the paths will be accessed from the given 'custom_objects_directory', meaning each py file will be read from inside 'custom_objects_directory'. If the model path given is of a store object, the custom objects map will be read from the logged custom object map artifact of the model. Notice: The custom objects will be imported in the order they appear in this dictionary (or json file). If a custom object depends on another, make sure to put it below the one it relies on.

  • custom_objects_directory -- Path to the directory with all the python files required for the custom objects. Can be passed as a zip file as well (will be extracted during the run before loading the model). If the model path given is of a store object, the custom objects files will be read from the logged custom object artifact of the model.

  • context -- MLRun context to work with. If no context is given, it will be retrieved via 'mlrun.get_or_create_ctx(None)'.

  • artifacts -- A list of artifact plans to produce during the run.

  • metrics -- A list of metrics to calculate during the run.
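
    Per the signature above, each entry may be a metric callable, a metric name, or a tuple of a callable / name with a keyword-arguments dictionary. A hedged sketch (the sklearn metrics here are illustrative, not required by the API):

    from sklearn.metrics import accuracy_score, fbeta_score

    metrics = [
        accuracy_score,               # a plain callable
        "f1_score",                   # a metric referenced by name
        (fbeta_score, {"beta": 2}),   # a callable with keyword arguments
    ]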

  • x_test -- The validation data for producing and calculating artifacts and metrics post-training. Without this, validation will not be performed.

  • y_test -- The ground truth of the test data for producing and calculating artifacts and metrics post-training or after predict / predict_proba.

  • sample_set -- A sample set of inputs for the model for logging its stats along with the model, in support of model monitoring.

  • y_columns -- List of the names of all the columns in the ground truth labels in case it is a pd.DataFrame, or a list of integers in case the dataset is a np.ndarray. If not given but 'y_train' / 'y_test' is given, the labels / indices in it will be used by default.

  • feature_vector -- Feature store feature vector uri (store://feature-vectors/<project>/<name>[:tag])

  • feature_weights -- List of feature weights, one per input column.

  • labels -- Labels to log with the model.

  • parameters -- Parameters to log with the model.

  • extra_data -- Extra data to log with the model.

  • auto_log -- Whether to apply MLRun's auto logging on the model. Auto logging will add the default artifacts and metrics to the lists of artifacts and metrics. Default: True.

Returns:

The model handler initialized with the provided model.
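
A minimal training sketch using this function (the dataset, model name, and the train handler itself are illustrative assumptions, not part of the API; when run outside an MLRun job, a context is created via 'mlrun.get_or_create_ctx' as noted above):

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    import mlrun
    from mlrun.frameworks.xgboost import apply_mlrun


    def train(context: mlrun.MLClientCtx):
        # Illustrative toy dataset standing in for real training data.
        x, y = make_classification(n_samples=200, n_features=5, random_state=42)
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

        model = xgb.XGBClassifier()

        # Wrap the model. With auto_log=True (the default), the fit call below
        # is logged to the context, and x_test / y_test enable the
        # post-training validation artifacts and metrics.
        handler = apply_mlrun(
            model=model,
            model_name="my_model",
            context=context,
            x_test=x_test,
            y_test=y_test,
        )

        model.fit(x_train, y_train)

The returned XGBoostModelHandler ('handler' above) can then be used for further model management beyond the auto logging.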