DataFrame.to_parquet(path: str, mode: str = 'overwrite', partition_cols: Union[str, List[str], None] = None, compression: Optional[str] = None, index_col: Union[str, List[str], None] = None, **options) → None

Write the DataFrame out as a Parquet file or directory.

path : str, required

Path to write to.

mode : str {‘append’, ‘overwrite’, ‘ignore’, ‘error’, ‘errorifexists’}, default ‘overwrite’

Specifies the behavior of the save operation when the destination already exists.

  • ‘append’: Append the new data to existing data.

  • ‘overwrite’: Overwrite existing data.

  • ‘ignore’: Silently ignore this operation if data already exists.

  • ‘error’ or ‘errorifexists’: Throw an exception if data already exists.
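
The four behaviors above can be read as a small decision table. The following is a minimal sketch in plain Python, not Koalas or Spark code; the function name `resolve_save_action` and the action labels it returns are hypothetical and exist only for illustration.

```python
def resolve_save_action(mode: str, destination_exists: bool) -> str:
    """Model which action each save mode implies (illustrative only).

    Hypothetical helper: the returned labels are not part of Koalas or Spark.
    """
    valid_modes = {"append", "overwrite", "ignore", "error", "errorifexists"}
    if mode not in valid_modes:
        raise ValueError(f"Unknown mode: {mode!r}")
    if not destination_exists:
        # Nothing to conflict with, so every mode simply writes the data.
        return "write"
    if mode == "append":
        return "append"
    if mode == "overwrite":
        return "replace"
    if mode == "ignore":
        return "skip"
    # 'error' and 'errorifexists' are synonyms.
    raise FileExistsError("destination already exists")


for mode in ("append", "overwrite", "ignore"):
    print(mode, "->", resolve_save_action(mode, destination_exists=True))
```

The modes only differ when the destination already exists; on a fresh path all of them write the data.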

partition_cols : str or list of str, optional, default None

Names of the partitioning columns.
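
Spark lays out partitioned output as nested `column=value` directories, one level per partitioning column. The sketch below models that layout in plain Python; `partition_subdir` is a hypothetical helper, and it leaves out any escaping that Spark applies to special characters in values (for example in timestamps).

```python
from typing import List, Mapping, Union


def partition_subdir(row: Mapping[str, object],
                     partition_cols: Union[str, List[str]]) -> str:
    """Return the relative directory a row would fall into (illustrative only)."""
    # A single column name is treated like a one-element list.
    if isinstance(partition_cols, str):
        partition_cols = [partition_cols]
    # One 'column=value' path segment per partitioning column, in order.
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


row = {"country": "KR", "code": 1}
print(partition_subdir(row, "country"))            # country=KR
print(partition_subdir(row, ["country", "code"]))  # country=KR/code=1
```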

compression : str {‘none’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘lz4’, ‘zstd’}, optional, default None

Compression codec to use when saving to file. If None, the value of spark.sql.parquet.compression.codec is used.
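
The fallback rule can be sketched as follows. This is plain Python for illustration only; `effective_codec` is a hypothetical helper, and `spark_conf` stands in for the Spark configuration as a simple mapping.

```python
from typing import Mapping, Optional

CONF_KEY = "spark.sql.parquet.compression.codec"


def effective_codec(compression: Optional[str],
                    spark_conf: Mapping[str, str]) -> Optional[str]:
    """Pick the codec: an explicit argument wins over the configuration."""
    if compression is not None:
        return compression
    # No explicit codec: defer to the configured value, if any.
    return spark_conf.get(CONF_KEY)


conf = {CONF_KEY: "gzip"}
print(effective_codec(None, conf))    # gzip
print(effective_codec("zstd", conf))  # zstd
```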

index_col : str or list of str, optional, default None

Column names to be used in Spark to represent Koalas’ index. The index name in Koalas is ignored. By default (None), the index is not written and is therefore lost.
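
To keep the index, it can be written under an explicit column name and restored when reading. A minimal sketch, assuming a working Koalas and Spark installation; `roundtrip_with_index` and the column name `'idx'` are hypothetical, and the read side relies on the `index_col` parameter of `ks.read_parquet`.

```python
def roundtrip_with_index(kdf, path: str, index_col: str = "idx"):
    """Write `kdf` to Parquet keeping its index, then read it back.

    Hypothetical helper; `kdf` is assumed to be a Koalas DataFrame
    with a single-level index.
    """
    # Imported lazily so that defining the helper does not require Spark.
    import databricks.koalas as ks

    # Store the index as a regular column named `index_col`.
    kdf.to_parquet(path, mode="overwrite", index_col=index_col)
    # Promote that column back to the index when reading.
    return ks.read_parquet(path, index_col=index_col)
```

Without `index_col` on the write side, the returned DataFrame would get a fresh default index instead of the original one.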


options : dict

All other options are passed directly to Spark’s data source.


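>>> import tempfile
>>> import pandas as pd
>>> import databricks.koalas as ks
>>> path = tempfile.mkdtemp()  # assumption: any writable directory works here
>>> df = ks.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])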
>>> df
                 date country  code
0 2012-01-31 12:00:00      KR     1
1 2012-02-29 12:00:00      US     2
2 2012-03-31 12:00:00      JP     3
>>> df.to_parquet('%s/to_parquet/foo.parquet' % path, partition_cols='date')
>>> df.to_parquet(
...     '%s/to_parquet/foo.parquet' % path,
...     mode='overwrite',
...     partition_cols=['date', 'country'])