- class mlrun.artifacts.dataset.DatasetArtifact(key: str | None = None, df=None, preview: int | None = None, format: str = '', stats: bool | None = None, target_path: str | None = None, extra_data: dict | None = None, column_metadata: dict | None = None, ignore_preview_limits: bool = False, label_column: str | None = None, **kwargs)
Bases: Artifact
- SUPPORTED_FORMATS = ['csv', 'parquet', 'pq', 'tsdb', 'kv']
- property column_metadata
- property df: DataFrame
Get the dataset in this artifact.
- Returns:
The dataset as a DataFrame.
- property header
- static is_format_supported(fmt: str) -> bool
Check whether the given dataset format is supported by the DatasetArtifact.
- Parameters:
fmt -- The format string to check.
- Returns:
True if the format is supported and False if not.
- kind = 'dataset'
- property preview
- property schema
- property spec: DatasetArtifactSpec
- property stats
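The format check documented for `is_format_supported` can be pictured as a membership test against `SUPPORTED_FORMATS`. The following is a minimal standalone sketch and not the library's implementation: it does not import mlrun, the list is copied from the class attribute above, and the exact, case-sensitive comparison is an assumption.

```python
# Standalone sketch: mirrors the documented behavior without importing mlrun.
# The list is copied from the SUPPORTED_FORMATS class attribute shown above.
SUPPORTED_FORMATS = ["csv", "parquet", "pq", "tsdb", "kv"]


def is_format_supported(fmt: str) -> bool:
    """Return True if fmt is one of the supported dataset formats.

    Assumption: an exact, case-sensitive match against SUPPORTED_FORMATS.
    """
    return fmt in SUPPORTED_FORMATS


if __name__ == "__main__":
    print(is_format_supported("parquet"))  # True
    print(is_format_supported("xlsx"))     # False
```

With mlrun installed, the equivalent call would be the static method itself, `DatasetArtifact.is_format_supported(fmt)`, as listed above.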
- class mlrun.artifacts.dataset.TableArtifact(key=None, body=None, df=None, viewer=None, visible=False, inline=False, format=None, header=None, schema=None)
Bases: Artifact
- kind = 'table'
- property spec: TableArtifactSpec
- mlrun.artifacts.dataset.update_dataset_meta(artifact, from_df=None, schema: dict | None = None, header: list | None = None, preview: list | None = None, stats: dict | None = None, extra_data: dict | None = None, column_metadata: dict | None = None, labels: dict | None = None, ignore_preview_limits: bool = False)
Update the attributes and metadata of a dataset object.
This function edits existing metadata on the dataset object or adds new metadata to it.
Example:
    update_dataset_meta(dataset, from_df=df,
                        extra_data={'histogram': 's3://mybucket/..'})
- Parameters:
artifact -- dataset artifact object, a store path (store://..), or a DataItem
from_df -- DataFrame from which to read the metadata (schema, preview, ..)
schema -- dataset schema; see pandas build_table_schema
header -- list of column headers
preview -- list of rows, each a list of row values (from df.values.tolist())
stats -- dict mapping column names to their stats (cleaned df.describe(include='all'))
extra_data -- extra data items (key: path string | artifact)
column_metadata -- dict of metadata per column
labels -- metadata labels
ignore_preview_limits -- whether to ignore the preview size limits
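The parameter descriptions above point to pandas for the shape of `schema`, `header`, `preview`, and `stats`. The sketch below shows one way those values might be derived from a DataFrame using pandas alone. `build_dataset_meta` is a hypothetical helper, not part of mlrun, and treating "cleaned" as "NaN entries dropped" is an assumption about what the docstring means.

```python
import pandas as pd
from pandas.io.json import build_table_schema


def build_dataset_meta(df: pd.DataFrame) -> dict:
    """Hypothetical helper: derive metadata values in the shapes described above."""
    # describe(include="all") leaves NaN wherever a statistic does not apply
    # to a column (for example "mean" on a string column). Assumption: a
    # "cleaned" result is one with those NaN entries removed.
    described = df.describe(include="all").to_dict()
    stats = {
        column: {name: value for name, value in values.items() if pd.notna(value)}
        for column, values in described.items()
    }
    return {
        "schema": build_table_schema(df, index=False),  # pandas table schema
        "header": df.columns.tolist(),                  # column headers
        "preview": df.values.tolist(),                  # list of rows
        "stats": stats,                                 # per-column stats
    }


df = pd.DataFrame({"age": [30, 41, 25], "city": ["NYC", "LA", "NYC"]})
meta = build_dataset_meta(df)

# With mlrun installed and an existing dataset artifact, these values could
# then be passed by keyword, following the signature documented above:
# update_dataset_meta(artifact, **meta)
```

When the DataFrame itself is at hand, passing it as `from_df` is the simpler route, since the function then reads the metadata from it directly. Building the values by hand is mainly useful when only part of the metadata should be updated.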