DataFrame.to_delta
Write the DataFrame out as a Delta Lake table.
Parameters
path: Path to write to.
mode: Default ‘overwrite’. Specifies the behavior of the save operation when the destination already exists.
‘append’: Append the new data to existing data.
‘overwrite’: Overwrite existing data.
‘ignore’: Silently ignore this operation if data already exists.
‘error’ or ‘errorifexists’: Throw an exception if data already exists.
partition_cols: Names of partitioning columns.
index_col: Column names to be used in Spark to represent the Koalas index. The index name in Koalas is ignored. By default, the index is not written and is therefore lost.
**options: All other options, passed directly to Delta Lake.
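The save modes listed above can be summarized as a small decision table. The following is only an illustrative model in plain Python, not Delta Lake's implementation: the function name `resolve_save_mode` and the returned action strings are hypothetical, and the sketch assumes that every mode simply writes the data when the destination does not exist yet.

```python
VALID_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def resolve_save_mode(mode, destination_exists):
    """Return the action a save mode implies (hypothetical helper, illustration only)."""
    if mode not in VALID_MODES:
        raise ValueError("unknown mode: %r" % mode)
    if not destination_exists:
        # Assumption: with no existing data there is no conflict, so all modes write.
        return "write"
    if mode == "append":
        return "append"      # add the new rows to the existing data
    if mode == "overwrite":
        return "overwrite"   # replace the existing data
    if mode == "ignore":
        return "skip"        # silently do nothing
    # 'error' and 'errorifexists' behave the same way: refuse to write.
    raise FileExistsError("destination already exists")
```

Under this model, only 'error' and 'errorifexists' can fail, and only when data is already present at the path.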
See also
read_delta, DataFrame.to_parquet, DataFrame.to_table, DataFrame.to_spark_io
Examples
>>> df = ks.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])
>>> df
                 date country  code
0 2012-01-31 12:00:00      KR     1
1 2012-02-29 12:00:00      US     2
2 2012-03-31 12:00:00      JP     3
Create a new Delta Lake table, partitioned by one column:
>>> df.to_delta('%s/to_delta/foo' % path, partition_cols='date')
Partitioned by two columns:
>>> df.to_delta('%s/to_delta/bar' % path, partition_cols=['date', 'country'])
Overwrite an existing table’s partitions, using the ‘replaceWhere’ capability in Delta:
>>> df.to_delta('%s/to_delta/bar' % path,
...             mode='overwrite', replaceWhere='date >= "2012-01-01"')
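Because the index is dropped unless a column name is given for it, a table written with the defaults cannot restore the original index on read. A minimal sketch of keeping it follows, assuming a Koalas DataFrame and a Spark session with Delta Lake configured. The helper name `write_keeping_index`, the column name 'idx', and the 'baz' table path in the usage comments are hypothetical and chosen only for illustration.

```python
def write_keeping_index(kdf, table_path, index_col="idx"):
    """Write a Koalas DataFrame to Delta, storing its index as a column.

    `kdf` is expected to be a databricks.koalas.DataFrame. The column
    name "idx" is a hypothetical choice, not a Koalas default.
    """
    # Without index_col, to_delta would not write the index at all.
    kdf.to_delta(table_path, mode="overwrite", index_col=index_col)
    return index_col


# Usage (requires Koalas and a Spark session with Delta Lake available):
#   import databricks.koalas as ks
#   write_keeping_index(df, '%s/to_delta/baz' % path)
#   ks.read_delta('%s/to_delta/baz' % path, index_col='idx')
```

Passing the same column name to read_delta lets the stored column be used as the index again when the table is loaded.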