databricks.koalas.read_delta

databricks.koalas.read_delta(path: str, version: Optional[str] = None, timestamp: Optional[str] = None, index_col: Union[str, List[str], None] = None, **options) → databricks.koalas.frame.DataFrame

Read a Delta Lake table on some file system and return a DataFrame.
If the Delta Lake table is already stored in the catalog (aka the metastore), use read_table instead.
Parameters
----------
path : string
    Path to the Delta Lake table.
version : string, optional
    Specifies the table version (based on Delta's internal transaction version) to read from, using Delta's time travel feature. This sets Delta's 'versionAsOf' option.
timestamp : string, optional
    Specifies the table version (based on timestamp) to read from, using Delta's time travel feature. This must be a valid date or timestamp string in Spark, and sets Delta's 'timestampAsOf' option.
index_col : str or list of str, optional, default: None
    Index column of table in Spark.
options
    Additional options that can be passed onto Delta.

Returns
-------
DataFrame
Examples
--------
>>> ks.range(1).to_delta('%s/read_delta/foo' % path)
>>> ks.read_delta('%s/read_delta/foo' % path)
   id
0   0

>>> ks.range(10, 15, num_partitions=1).to_delta('%s/read_delta/foo' % path,
...                                             mode='overwrite')
>>> ks.read_delta('%s/read_delta/foo' % path)
   id
0  10
1  11
2  12
3  13
4  14

>>> ks.read_delta('%s/read_delta/foo' % path, version=0)
   id
0   0
You can preserve the index in the roundtrip as below.
>>> ks.range(10, 15, num_partitions=1).to_delta(
...     '%s/read_delta/bar' % path, index_col="index")
>>> ks.read_delta('%s/read_delta/bar' % path, index_col="index")
       id
index
0      10
1      11
2      12
3      13
4      14
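Time travel by timestamp works the same way as by version: pass a date or timestamp string that Spark can parse, and the snapshot current as of that time is read. A minimal sketch (the timestamp value below is hypothetical — it must fall at or after the commit time of the version you want, which depends on when the table was actually written):

>>> ks.read_delta('%s/read_delta/foo' % path,
...               timestamp='2021-01-01 00:00:00')  # doctest: +SKIP

This sets Delta's 'timestampAsOf' reader option under the hood, just as version= sets 'versionAsOf'; specify one or the other, not both.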