mlrun.datastore

class mlrun.datastore.DataItem(key: str, store: mlrun.datastore.base.DataStore, subpath: str, url: str = '', meta=None, artifact_url=None)[source]

Bases: object

Data input/output class abstracting access to various local/remote data sources

DataItem objects are passed into functions and can be used inside the function. When a function run completes, users can access the run data via run.artifact(key), which returns a DataItem object. Users can also convert a data URL (e.g. s3://bucket/key.csv) to a DataItem using mlrun.get_dataitem(url).

Example:

# using data item inside a function
def my_func(context, data: DataItem):
    df = data.as_df()


# reading run results using DataItem (run.artifact())
train_run = train_iris_func.run(inputs={'dataset': dataset},
                                params={'label_column': 'label'})

train_run.artifact('confusion-matrix').show()
test_set = train_run.artifact('test_set').as_df()

# create and use DataItem from uri
data = mlrun.get_dataitem('http://xyz/data.json').get()

property artifact_url

DataItem artifact url (when it's an artifact) or url for simple dataitems

as_df(columns=None, df_module=None, format='', **kwargs)[source]

return a dataframe object (generated from the dataitem).

Parameters
  • columns – optional, list of columns to select

  • df_module – optional, dataframe class (e.g. pd, dd, cudf, ..)

  • format – file format, if not specified it will be deduced from the suffix

download(target_path)[source]

download to the target dir/path

Parameters

target_path – local target path for the downloaded item

get(size=None, offset=0, encoding=None)[source]

read all or a byte range and return the content

Parameters
  • size – number of bytes to get

  • offset – fetch from offset (in bytes)

  • encoding – encoding (e.g. “utf-8”) for converting bytes to str
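The size and offset parameters allow an object to be read in byte ranges rather than all at once. The following is a minimal sketch of that pattern, not mlrun's implementation: InMemoryItem is a hypothetical stand-in that only mimics the documented get() signature so the example runs without mlrun, and read_in_chunks is an illustrative helper name. It also assumes that reading past the end of the object returns an empty result, which may not hold for every datastore.

```python
class InMemoryItem:
    """Hypothetical stand-in mimicking the documented get() signature; not part of mlrun."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def get(self, size=None, offset=0, encoding=None):
        # size=None reads from offset to the end of the payload.
        end = None if size is None else offset + size
        body = self._payload[offset:end]
        # Decode to str only when an encoding is requested.
        return body.decode(encoding) if encoding else body


def read_in_chunks(item, chunk_size):
    """Yield successive byte ranges until an empty chunk comes back (assumed end of data)."""
    offset = 0
    while True:
        chunk = item.get(size=chunk_size, offset=offset)
        if not chunk:
            break
        yield chunk
        offset += len(chunk)


item = InMemoryItem(b"id,label\n1,a\n2,b\n")

# Read only the first 8 bytes and decode them to str.
header = item.get(size=8, offset=0, encoding="utf-8")

# Read the whole object in 5-byte ranges.
chunks = list(read_in_chunks(item, chunk_size=5))
```

With mlrun installed, an object returned by mlrun.get_dataitem(url) would take the place of the stand-in, since the helper relies only on the get() parameters listed above.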

property key

DataItem key

property kind

DataItem store kind (file, s3, v3io, ..)

listdir()[source]

return a list of child file names

local()[source]

get the local path of the file, download to tmp first if it's a remote object

ls()[source]

return a list of child file names

property meta

Artifact Metadata, when the DataItem is read from the artifacts store

open(mode)[source]

return fsspec file handler, if supported

put(data, append=False)[source]

write/upload the data, append is only supported by some datastores

Parameters
  • data – data (bytes/str) to write

  • append – append data to the end of the object, NOT SUPPORTED BY SOME OBJECT STORES!
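The difference between overwriting and appending can be sketched as follows. This is an illustration under stated assumptions rather than mlrun's own code: InMemoryItem is a hypothetical stand-in that mimics the documented put() and get() signatures so the example runs without mlrun, and it assumes that put() without append replaces the existing content. Real datastores may reject append=True, as noted above.

```python
class InMemoryItem:
    """Hypothetical stand-in with the documented put()/get() signatures; not part of mlrun."""

    def __init__(self):
        self._payload = b""

    def put(self, data, append=False):
        # Accept bytes or str, as the parameter description allows.
        raw = data.encode("utf-8") if isinstance(data, str) else data
        # Assumption: without append, the existing content is replaced.
        self._payload = (self._payload + raw) if append else raw

    def get(self, size=None, offset=0, encoding=None):
        end = None if size is None else offset + size
        body = self._payload[offset:end]
        return body.decode(encoding) if encoding else body


log = InMemoryItem()
log.put("epoch=1 loss=0.9\n")               # initial write
log.put("epoch=2 loss=0.7\n", append=True)  # add to the end of the object
content = log.get(encoding="utf-8")

log.put(b"reset\n")                          # default append=False overwrites
after_overwrite = log.get()
```

When targeting a store that may not support append, a portable alternative is to read the current content with get(), extend it locally, and write the result back with a single put().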

show(format=None)[source]

show the data object content in Jupyter

Parameters

format – format to use (when there is no/wrong suffix), e.g. ‘png’

stat()[source]

return FileStats class (size, modified, content_type)

property store

DataItem store object

property suffix

DataItem suffix (file extension) e.g. ‘.png’

upload(src_path)[source]

upload the source file (src_path)

Parameters

src_path – source file path to read from and upload

property url

DataItem url e.g. /dir/path, s3://bucket/path

mlrun.datastore.get_store_resource(uri, db=None, secrets=None, project=None)[source]

get store resource object by uri