mlrun.package.packagers.numpy_packagers.NumPyNDArrayPackager#

class mlrun.package.packagers.numpy_packagers.NumPyNDArrayPackager[source]#

Bases: DefaultPackager

numpy.ndarray packager.

Packager Summary

Packing Type: numpy.ndarray

Packing Sub-Classes: False

Priority: Default priority (5)

Default Artifact Types:

  • Packing: object

  • Unpacking: object

Artifact Types:

  • dataset - Pack an array as a dataset.

    • file_format - The file format to save as. Default is parquet.

  • file - Pack an array as a file by the given format.

    • file_format - The file format to save as. Default is npy.

    • save_kwargs - Additional keyword arguments to pass to the numpy save functions.

  • object - Pack a python object, pickling it into a pkl file and storing it in an artifact.

    • pickle_module_name - The pickle module name to use for serializing the object.

  • result - Pack an array as a result.

Attributes

DEFAULT_PACKING_ARTIFACT_TYPE

The default artifact type to pack as.

DEFAULT_UNPACKING_ARTIFACT_TYPE

The default artifact type to unpack from.

PACK_SUBCLASSES

A flag for indicating whether to also pack all subclasses of the PACKABLE_OBJECT_TYPE.

PRIORITY

The priority of this packager in the packagers collection of the manager (lower is better).

future_clearing_path_list

Get the packager's future clearing path list.

priority

Get the packager's priority.

DEFAULT_PACKING_ARTIFACT_TYPE = 'object'#

The default artifact type to pack as.

DEFAULT_UNPACKING_ARTIFACT_TYPE = 'object'#

The default artifact type to unpack from.

PACK_SUBCLASSES = False#

A flag for indicating whether to also pack all subclasses of the PACKABLE_OBJECT_TYPE.

PRIORITY: int = Ellipsis#

The priority of this packager in the packagers collection of the manager (lower is better).

future_clearing_path_list#

Get the packager's future clearing path list.

Returns:

The clearing path list.

priority#

Get the packager's priority.

Returns:

The packager's priority.

Methods

__init__()

add_future_clearing_path(path)

Mark a path to be cleared by this packager's manager after logging the packaged artifacts.

get_data_item_local_path(data_item[, ...])

Get the local path to the item handled by the data item provided.

get_default_packing_artifact_type(obj)

Get the default artifact type.

get_default_unpacking_artifact_type(data_item)

Get the default artifact type used for unpacking.

get_supported_artifact_types()

Get all the supported artifact types on this packager.

is_packable(obj[, artifact_type, configurations])

Check if this packager can pack an object of the provided type as the provided artifact type.

is_unpackable(data_item, type_hint[, ...])

Check if this packager can unpack an input according to the user-given type hint and the provided artifact type.

pack(obj[, key, artifact_type, configurations])

Pack an object as the given artifact type using the provided configurations.

pack_dataset(obj, key[, file_format])

Pack an array as a dataset.

pack_file(obj, key[, file_format])

Pack an array as a file by the given format.

pack_object(obj, key[, pickle_module_name])

Pack a python object, pickling it into a pkl file and storing it in an artifact.

pack_result(obj, key)

Pack an array as a result.

unpack(data_item[, artifact_type, instructions])

Unpack the data item's artifact by the provided type using the given instructions.

unpack_dataset(data_item)

Unpack a numpy array from a dataset artifact.

unpack_file(data_item[, file_format, ...])

Unpack a numpy array from a file.

unpack_object(data_item[, ...])

Unpack the data item's object, unpickle it using the instructions, and return it.

__init__()#
add_future_clearing_path(path: str | Path)#

Mark a path to be cleared by this packager's manager after logging the packaged artifacts.

Parameters:

path -- The path to clear post logging the artifacts.

get_data_item_local_path(data_item: DataItem, add_to_future_clearing_path: bool | None = None) str#

Get the local path to the item handled by the provided data item. The local path may be the same as the data item's path when it already points to a local file; otherwise, the item is downloaded to a temporary directory and that newly created temporary local path is returned.

Parameters:
  • data_item -- The data item to get its item local path.

  • add_to_future_clearing_path -- Whether to add the local path to the future clearing paths list. If None, the path is added only if the data item is not of kind 'file'. A data item of kind 'file' represents an existing local file that should not be deleted automatically after the run; only temporary local paths (created when a non-'file' data item is downloaded to a temporary directory) should be cleared.

Returns:

The data item local path.

get_default_packing_artifact_type(obj: ndarray) str[source]#

Get the default artifact type: result if the array has fewer than 10 elements, file otherwise.

Parameters:

obj -- The array about to be packed.

Returns:

The default artifact type.
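The size-based rule above can be sketched in a few lines. This is an illustrative stand-in for the documented behavior, not mlrun's actual implementation:

```python
import numpy as np


def default_packing_type(arr: np.ndarray) -> str:
    # Sketch of the documented rule: arrays with fewer than 10 elements
    # default to an inline "result"; larger arrays default to "file".
    return "result" if arr.size < 10 else "file"


print(default_packing_type(np.arange(5)))    # small array -> "result"
print(default_packing_type(np.zeros((4, 4))))  # 16 elements -> "file"
```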

get_default_unpacking_artifact_type(data_item: DataItem) str[source]#

Get the default artifact type used for unpacking: dataset if the data item represents a DatasetArtifact, file otherwise.

Parameters:

data_item -- The data item about to be unpacked.

Returns:

The default artifact type.

get_supported_artifact_types() list[str]#

Get all the supported artifact types on this packager.

Returns:

A list of all the supported artifact types.

is_packable(obj: Any, artifact_type: str | None = None, configurations: dict | None = None) bool#

Check if this packager can pack an object of the provided type as the provided artifact type.

The method is implemented to validate the object's type and artifact type by checking if the given object type matches the variable PACKABLE_OBJECT_TYPE with respect to the PACK_SUBCLASSES class variable. If it does, it checks if the given artifact type is in the list returned from get_supported_artifact_types.

Parameters:
  • obj -- The object to pack.

  • artifact_type -- The artifact type to log the object as.

  • configurations -- The log hint configurations passed by the user.

Returns:

True if packable and False otherwise.
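Since PACK_SUBCLASSES is False for this packager, the type check matches exact numpy.ndarray instances only. A minimal sketch of that logic (hypothetical helper names; the supported-types list is taken from the summary above, not from mlrun's source):

```python
import numpy as np

# Assumed from the Packager Summary above; not queried from mlrun.
SUPPORTED_ARTIFACT_TYPES = ["object", "dataset", "file", "result"]


def is_packable_sketch(obj, artifact_type=None):
    # PACK_SUBCLASSES is False, so only exact numpy.ndarray instances
    # match; ndarray subclasses would be rejected.
    if type(obj) is not np.ndarray:
        return False
    # If an artifact type was requested, it must be a supported one.
    if artifact_type is not None and artifact_type not in SUPPORTED_ARTIFACT_TYPES:
        return False
    return True
```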

is_unpackable(data_item: DataItem, type_hint: type, artifact_type: str | None = None) bool#

Check if this packager can unpack an input according to the user-given type hint and the provided artifact type.

The default implementation tries to match the packable object type of this packager to the given type hint. If it matches, it looks for the artifact type in the list returned from get_supported_artifact_types.

Parameters:
  • data_item -- The input data item to check if unpackable.

  • type_hint -- The type hint of the input to unpack (the object type to be unpacked).

  • artifact_type -- The artifact type to unpack the object as.

Returns:

True if unpackable and False otherwise.

pack(obj: Any, key: str | None = None, artifact_type: str | None = None, configurations: dict | None = None) tuple[mlrun.artifacts.base.Artifact, dict] | dict#

Pack an object as the given artifact type using the provided configurations.

Parameters:
  • obj -- The object to pack.

  • key -- The key of the artifact.

  • artifact_type -- Artifact type to log to MLRun. If passing None, the default artifact type is used.

  • configurations -- Log hints configurations to pass to the packing method.

Returns:

If the packed object is an artifact, a tuple of the packed artifact and unpacking instructions dictionary. If the packed object is a result, a dictionary containing the result key and value.

pack_dataset(obj: ndarray, key: str, file_format: str = '') tuple[mlrun.artifacts.base.Artifact, dict][source]#

Pack an array as a dataset.

Parameters:
  • obj -- The array to pack.

  • key -- The key to use for the artifact.

  • file_format -- The file format to save as. Default is parquet.

Returns:

The packed artifact and instructions.

Raises:

MLRunInvalidArgumentError -- If the shape of the array is not 1D / 2D.
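The 1D/2D restriction follows from the tabular nature of a dataset artifact: only one- and two-dimensional arrays map naturally onto rows and columns. A rough sketch of the conversion (a ValueError stands in for mlrun's MLRunInvalidArgumentError):

```python
import numpy as np
import pandas as pd


def array_to_dataframe(arr: np.ndarray) -> pd.DataFrame:
    # Only 1D / 2D arrays can be represented as a table; higher
    # dimensions are rejected, mirroring the documented error.
    if arr.ndim > 2:
        raise ValueError("only 1D / 2D arrays can be packed as a dataset")
    return pd.DataFrame(arr)
```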

pack_file(obj: ndarray, key: str, file_format: str = 'npy', **save_kwargs) tuple[mlrun.artifacts.base.Artifact, dict][source]#

Pack an array as a file by the given format.

Parameters:
  • obj -- The array to pack.

  • key -- The key to use for the artifact.

  • file_format -- The file format to save as. Default is npy.

  • save_kwargs -- Additional keyword arguments to pass to the numpy save functions.

Returns:

The packed artifact and instructions.
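For the default npy format, the underlying writer is numpy's own save function, and any save_kwargs are presumably forwarded to it. A dependency-free round trip showing what ends up on disk:

```python
import os
import tempfile

import numpy as np

arr = np.linspace(0.0, 1.0, num=100)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_array.npy")
    # np.save writes the binary .npy format used by the default
    # file_format; other formats (e.g. "npz") use other numpy writers.
    np.save(path, arr)
    loaded = np.load(path)
    assert np.array_equal(arr, loaded)
```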

pack_object(obj: Any, key: str, pickle_module_name: str = 'cloudpickle') tuple[mlrun.artifacts.base.Artifact, dict]#

Pack a python object, pickling it into a pkl file and storing it in an artifact.

Parameters:
  • obj -- The object to pack and log.

  • key -- The artifact's key.

  • pickle_module_name -- The pickle module name to use for serializing the object.

Returns:

The artifact and its pickling instructions.

pack_result(obj: ndarray, key: str) dict[source]#

Pack an array as a result.

Parameters:
  • obj -- The array to pack and log.

  • key -- The result's key.

Returns:

The result dictionary.
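A result is stored inline in the run output rather than as a file, so the array must be in a serializable form. A minimal sketch, assuming conversion via tolist() (an assumption for illustration, not necessarily mlrun's exact implementation):

```python
import numpy as np


def pack_result_sketch(arr: np.ndarray, key: str) -> dict:
    # Hypothetical: convert the array to a JSON-friendly nested list
    # and return it under the given result key.
    return {key: arr.tolist()}
```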

unpack(data_item: DataItem, artifact_type: str | None = None, instructions: dict | None = None) Any#

Unpack the data item's artifact by the provided type using the given instructions.

Parameters:
  • data_item -- The data input to unpack.

  • artifact_type -- The artifact type to unpack the data item as. If passing None, the default artifact type is used.

  • instructions -- Additional instructions noted in the package to pass to the unpacking method.

Returns:

The unpacked data item's object.

Raises:

MLRunPackageUnpackingError -- In case the packager could not unpack the data item.

unpack_dataset(data_item: DataItem) ndarray[source]#

Unpack a numpy array from a dataset artifact.

Parameters:

data_item -- The data item to unpack.

Returns:

The unpacked array.

unpack_file(data_item: DataItem, file_format: str | None = None, allow_pickle: bool = False) ndarray[source]#

Unpack a numpy array from a file.

Parameters:
  • data_item -- The data item to unpack.

  • file_format -- The file format to use for reading the array. Default is None, in which case the format is inferred from the file extension.

  • allow_pickle -- Whether to allow loading pickled arrays when the array has an object dtype. Relevant only to the 'npy' format. Default is False for security reasons.

Returns:

The unpacked array.
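The allow_pickle flag mirrors numpy's own loading behavior: object-dtype arrays are stored via pickle inside the .npy file, and numpy refuses to unpickle them unless explicitly allowed. A small demonstration:

```python
import os
import tempfile

import numpy as np

# Object-dtype arrays are serialized with pickle inside the .npy file.
obj_arr = np.array([{"a": 1}, {"b": 2}], dtype=object)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obj_array.npy")
    np.save(path, obj_arr)

    refused = False
    try:
        np.load(path)  # default allow_pickle=False refuses pickled data
    except ValueError:
        refused = True
    assert refused

    # Opting in allows the load to succeed.
    loaded = np.load(path, allow_pickle=True)
    assert loaded[0] == {"a": 1}
```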

unpack_object(data_item: DataItem, pickle_module_name: str = 'cloudpickle', object_module_name: str | None = None, python_version: str | None = None, pickle_module_version: str | None = None, object_module_version: str | None = None) Any#

Unpack the data item's object, unpickle it using the instructions, and return it.

Warnings may be raised if the Python or module versions used when the object was pickled differ from those of the current interpreter.

Parameters:
  • data_item -- The data item holding the pkl file.

  • pickle_module_name -- Module to use for unpickling the object.

  • object_module_name -- The original object's module. Used to verify that the current interpreter object module version matches the pickled object version before unpickling the object.

  • python_version -- The python version in which the original object was pickled. Used to verify that the current interpreter python version matches the pickled object version before unpickling the object.

  • pickle_module_version -- The pickle module version. Used to verify that the current interpreter module version matches the one that pickled the object before unpickling it.

  • object_module_version -- The original object's module version to match to the interpreter's module version.

Returns:

The un-pickled python object.
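The parameters above correspond to metadata recorded at packing time and checked at unpacking time. A dependency-free sketch of the round trip, using the stdlib pickle module in place of the default cloudpickle (the metadata keys here are illustrative, not mlrun's exact instruction names):

```python
import pickle
import platform

import numpy as np

arr = np.arange(4)
payload = pickle.dumps(arr)  # mlrun defaults to cloudpickle; stdlib
                             # pickle keeps this sketch dependency-free

# Metadata of the kind carried in the unpacking instructions, later
# compared against the current interpreter to warn on mismatches:
instructions = {
    "object_module_name": type(arr).__module__.split(".")[0],  # "numpy"
    "object_module_version": np.__version__,
    "python_version": platform.python_version(),
}

restored = pickle.loads(payload)
assert np.array_equal(arr, restored)
```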