mlrun.package.packagers.pandas_packagers.PandasDataFramePackager#

class mlrun.package.packagers.pandas_packagers.PandasDataFramePackager[source]#

Bases: DefaultPackager

pd.DataFrame packager.

Packager Summary

Packing Type: pandas.core.frame.DataFrame

Packing Sub-Classes: False

Priority: Default priority (5)

Default Artifact Types:

  • Packing: dataset

  • Unpacking: object

Artifact Types:

  • dataset - Pack a pandas dataframe as a dataset.

    • file_format - The file format to save as. Default is parquet.

  • file - Pack a dataframe as a file by the given format.

    • file_format - The file format to save as. Default is parquet or csv (depends on the column names, as parquet cannot be used for non-string column names).

    • flatten - Whether to flatten the dataframe before saving. For some formats, flattening is mandatory; otherwise saving and loading the dataframe will cause unexpected behavior, especially if it is multi-level or multi-index. Defaults to True.

    • to_kwargs - Additional keyword arguments to pass to the pandas to_x functions.

  • object - Pack a python object, pickling it into a pkl file and store it in an artifact.

    • pickle_module_name - The pickle module name to use for serializing the object.

  • result - Pack a dataframe as a result.
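The file_format fallback and the flatten behavior noted above can be sketched with plain pandas (illustrative only; the format-selection rule below is a stand-in for, not a copy of, MLRun's actual code):

```python
import pandas as pd

# Parquet requires string column names, so a frame with integer columns
# has to fall back to CSV (a sketch of the selection rule described above).
df = pd.DataFrame([[1, 2], [3, 4]], columns=[0, 1])
file_format = "parquet" if all(isinstance(c, str) for c in df.columns) else "csv"
# file_format -> "csv"

# A multi-level frame should be flattened before saving to formats that
# cannot round-trip a MultiIndex:
multi = pd.DataFrame(
    [[1, 2]], columns=pd.MultiIndex.from_tuples([("a", "x"), ("a", "y")])
)
flat = multi.copy()
flat.columns = ["_".join(col) for col in flat.columns]
# flat.columns -> ["a_x", "a_y"]
```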

Attributes

DEFAULT_PACKING_ARTIFACT_TYPE

The default artifact type to pack as.

DEFAULT_UNPACKING_ARTIFACT_TYPE

The default artifact type to unpack from.

PACK_SUBCLASSES

A flag for indicating whether to pack all subclasses of the PACKABLE_OBJECT_TYPE as well.

PRIORITY

The priority of this packager in the packagers collection of the manager (lower is better).

DEFAULT_PACKING_ARTIFACT_TYPE = 'dataset'#

The default artifact type to pack as.

DEFAULT_UNPACKING_ARTIFACT_TYPE = 'object'#

The default artifact type to unpack from.

PACK_SUBCLASSES = False#

A flag for indicating whether to pack all subclasses of the PACKABLE_OBJECT_TYPE as well.

PRIORITY: int = Ellipsis#

The priority of this packager in the packagers collection of the manager (lower is better).

Methods

__init__()

add_future_clearing_path(path[, ...])

Mark a path to be cleared by this packager's manager post logging the packaged artifacts.

get_default_packing_artifact_type(obj)

Get the default artifact type for packing an object of this packager.

get_default_unpacking_artifact_type(data_item)

Get the default artifact type used for unpacking.

get_future_clearing_path_list()

Get the packager's future clearing path list.

get_supported_artifact_types()

Get all the supported artifact types on this packager.

is_packable(obj[, artifact_type, configurations])

Check if this packager can pack an object of the provided type as the provided artifact type.

is_unpackable(data_item, type_hint[, ...])

Check if this packager can unpack an input according to the user-given type hint and the provided artifact type.

pack(obj[, key, artifact_type, configurations])

Pack an object as the given artifact type using the provided configurations.

pack_dataset(obj, key[, file_format])

Pack a pandas dataframe as a dataset.

pack_file(obj, key[, file_format, flatten])

Pack a dataframe as a file by the given format.

pack_object(obj, key[, pickle_module_name])

Pack a python object, pickling it into a pkl file and store it in an artifact.

pack_result(obj, key)

Pack a dataframe as a result.

unpack(data_item[, artifact_type, instructions])

Unpack the data item's artifact by the provided type using the given instructions.

unpack_dataset(data_item)

Unpack a pandas dataframe from a dataset artifact.

unpack_file(data_item[, file_format, ...])

Unpack a pandas dataframe from file.

unpack_object(data_item[, ...])

Unpack the data item's object, unpickle it using the instructions and return.

__init__()#
classmethod add_future_clearing_path(path: Union[str, Path], add_temp_paths_only: bool = True)#

Mark a path to be cleared by this packager’s manager post logging the packaged artifacts.

Parameters:
  • path – The path to clear.

  • add_temp_paths_only – Whether to add only temporary files. When running locally on local files, DataItem.local() returns the given local path, which should not be deleted. This flag helps avoid deleting files in that scenario.

classmethod get_default_packing_artifact_type(obj: Any) → str#

Get the default artifact type for packing an object of this packager.

Parameters:

obj – The object about to be packed.

Returns:

The default artifact type.

classmethod get_default_unpacking_artifact_type(data_item: DataItem) → str[source]#

Get the default artifact type used for unpacking. Returns dataset if the data item represents a DatasetArtifact, and file otherwise.

Parameters:

data_item – The data item about to be unpacked.

Returns:

The default artifact type.

classmethod get_future_clearing_path_list() → List[str]#

Get the packager’s future clearing path list.

Returns:

The clearing path list.

classmethod get_supported_artifact_types() → List[str]#

Get all the supported artifact types on this packager.

Returns:

A list of all the supported artifact types.

classmethod is_packable(obj: Any, artifact_type: Optional[str] = None, configurations: Optional[dict] = None) → bool#

Check if this packager can pack an object of the provided type as the provided artifact type.

The method is implemented to validate the object’s type and artifact type: it checks whether the given object type matches the PACKABLE_OBJECT_TYPE variable, with respect to the PACK_SUBCLASSES class variable. If it does, it checks whether the given artifact type is in the list returned from get_supported_artifact_types.

Parameters:
  • obj – The object to pack.

  • artifact_type – The artifact type to log the object as.

  • configurations – The log hint configurations passed by the user.

Returns:

True if packable and False otherwise.

classmethod is_unpackable(data_item: DataItem, type_hint: Type, artifact_type: Optional[str] = None) → bool#

Check if this packager can unpack an input according to the user-given type hint and the provided artifact type.

The default implementation tries to match this packager’s packable object type to the given type hint; if it matches, it looks for the artifact type in the list returned from get_supported_artifact_types.

Parameters:
  • data_item – The input data item to check if unpackable.

  • type_hint – The type hint of the input to unpack (the object type to be unpacked).

  • artifact_type – The artifact type to unpack the object as.

Returns:

True if unpackable and False otherwise.

classmethod pack(obj: Any, key: Optional[str] = None, artifact_type: Optional[str] = None, configurations: Optional[dict] = None) → Union[Tuple[Artifact, dict], dict]#

Pack an object as the given artifact type using the provided configurations.

Parameters:
  • obj – The object to pack.

  • key – The key of the artifact.

  • artifact_type – Artifact type to log to MLRun. If passing None, the default artifact type will be used.

  • configurations – Log hints configurations to pass to the packing method.

Returns:

If the packed object is an artifact, a tuple of the packed artifact and unpacking instructions dictionary. If the packed object is a result, a dictionary containing the result key and value.

classmethod pack_dataset(obj: DataFrame, key: str, file_format: str = 'parquet')[source]#

Pack a pandas dataframe as a dataset.

Parameters:
  • obj – The dataframe to pack.

  • key – The key to use for the artifact.

  • file_format – The file format to save as. Default is parquet.

Returns:

The packed artifact and instructions.

classmethod pack_file(obj: DataFrame, key: str, file_format: Optional[str] = None, flatten: bool = True, **to_kwargs) → Tuple[Artifact, dict][source]#

Pack a dataframe as a file by the given format.

Parameters:
  • obj – The dataframe to pack.

  • key – The key to use for the artifact.

  • file_format – The file format to save as. Default is parquet or csv (depends on the column names, as parquet cannot be used for non-string column names).

  • flatten – Whether to flatten the dataframe before saving. For some formats, flattening is mandatory; otherwise saving and loading the dataframe will cause unexpected behavior, especially if it is multi-level or multi-index. Defaults to True.

  • to_kwargs – Additional keyword arguments to pass to the pandas to_x functions.

Returns:

The packed artifact and instructions.
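The core of packing a frame to a file by format can be sketched with plain pandas and its dynamic to_&lt;format&gt; dispatch (a sketch only; MLRun's pack_file additionally builds the Artifact and the unpacking instructions around this step):

```python
import os
import tempfile

import pandas as pd

# Pack a dataframe to a file by a given format via pandas' to_<format>
# methods (illustrative stand-in for the packaging step described above).
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
file_format = "csv"
path = os.path.join(tempfile.mkdtemp(), f"frame.{file_format}")
getattr(df, f"to_{file_format}")(path, index=False)

restored = pd.read_csv(path)
# restored.equals(df) -> True
```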

classmethod pack_object(obj: Any, key: str, pickle_module_name: str = 'cloudpickle') → Tuple[Artifact, dict]#

Pack a python object, pickling it into a pkl file and store it in an artifact.

Parameters:
  • obj – The object to pack and log.

  • key – The artifact’s key.

  • pickle_module_name – The pickle module name to use for serializing the object.

Returns:

The artifact and its pickling instructions.
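The pickling step behind pack_object can be sketched with the standard library (MLRun defaults to cloudpickle; plain pickle is used here so the sketch stays self-contained):

```python
import importlib
import os
import tempfile

# Resolve the pickle module by name and dump the object into a .pkl file,
# mirroring the pickle_module_name parameter described above.
pickle_module_name = "pickle"
pickle_module = importlib.import_module(pickle_module_name)

obj = {"rows": 3, "columns": ["a", "b"]}
path = os.path.join(tempfile.mkdtemp(), "object.pkl")
with open(path, "wb") as f:
    pickle_module.dump(obj, f)
with open(path, "rb") as f:
    restored = pickle_module.load(f)
# restored == obj -> True
```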

classmethod pack_result(obj: DataFrame, key: str) → dict[source]#

Pack a dataframe as a result.

Parameters:
  • obj – The dataframe to pack and log.

  • key – The result’s key.

Returns:

The result dictionary.

classmethod unpack(data_item: DataItem, artifact_type: Optional[str] = None, instructions: Optional[dict] = None) → Any#

Unpack the data item’s artifact by the provided type using the given instructions.

Parameters:
  • data_item – The data input to unpack.

  • artifact_type – The artifact type to unpack the data item as. If passing None, the default artifact type will be used.

  • instructions – Additional instructions noted in the package to pass to the unpacking method.

Returns:

The unpacked data item’s object.

Raises:

MLRunPackageUnpackingError – In case the packager could not unpack the data item.

classmethod unpack_dataset(data_item: DataItem)[source]#

Unpack a pandas dataframe from a dataset artifact.

Parameters:

data_item – The data item to unpack.

Returns:

The unpacked dataframe.

classmethod unpack_file(data_item: DataItem, file_format: Optional[str] = None, read_kwargs: Optional[dict] = None) → DataFrame[source]#

Unpack a pandas dataframe from file.

Parameters:
  • data_item – The data item to unpack.

  • file_format – The file format to use for reading the dataframe. Default is None, in which case the format is inferred from the file extension.

  • read_kwargs – Keyword arguments to pass to the formatter’s read function.

Returns:

The unpacked dataframe.
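Inferring the read format from the file extension when file_format is None can be sketched as follows (the helper name below is hypothetical, not MLRun's actual code):

```python
import os
from typing import Optional

def infer_file_format(path: str, file_format: Optional[str] = None) -> str:
    # Hypothetical helper: an explicit format wins; otherwise infer from
    # the file extension, as described for unpack_file above.
    return file_format or os.path.splitext(path)[1].lstrip(".")

fmt_a = infer_file_format("/data/frame.parquet")         # -> "parquet"
fmt_b = infer_file_format("/data/frame.parquet", "csv")  # -> "csv"
```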

classmethod unpack_object(data_item: DataItem, pickle_module_name: str = 'cloudpickle', object_module_name: Optional[str] = None, python_version: Optional[str] = None, pickle_module_version: Optional[str] = None, object_module_version: Optional[str] = None) → Any#

Unpack the data item’s object, unpickle it using the instructions and return.

Warnings of mismatching python and module versions between the original pickling interpreter and this one may be raised.

Parameters:
  • data_item – The data item holding the pkl file.

  • pickle_module_name – Module to use for unpickling the object.

  • object_module_name – The original object’s module. Used to verify that the current interpreter’s object module version matches the pickled object’s version before unpickling the object.

  • python_version – The python version in which the original object was pickled. Used to verify that the current interpreter’s python version matches the pickled object’s version before unpickling the object.

  • pickle_module_version – The pickle module version. Used to verify that the current interpreter’s module version matches the one that pickled the object before unpickling it.

  • object_module_version – The original object’s module version to match to the interpreter’s module version.

Returns:

The unpickled python object.
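The version verification described above can be sketched as a simple compare-and-warn (the helper and its handling of the recorded version are assumptions; MLRun's actual checks also cover the pickle module and object module versions):

```python
import sys
import warnings

def warn_on_python_version_mismatch(recorded_version):
    # Warn when the recorded python version differs from the current
    # interpreter, mirroring the mismatch warnings described above.
    current = ".".join(str(part) for part in sys.version_info[:3])
    if recorded_version and recorded_version != current:
        warnings.warn(
            f"Object was pickled with python {recorded_version}, "
            f"but the current interpreter is {current}."
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_python_version_mismatch("2.7.18")
# caught holds one UserWarning (unless the interpreter is exactly 2.7.18)
```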