databricks.koalas.DataFrame.to_csv

DataFrame.to_csv(path=None, sep=',', na_rep='', columns=None, header=True, quotechar='"', date_format=None, escapechar=None, num_files=None, mode: str = 'overwrite', partition_cols: Union[str, List[str], None] = None, index_col: Union[str, List[str], None] = None, **options)

Write object to a comma-separated values (csv) file.

Note

Koalas to_csv writes files to a path or URI. Unlike pandas, Koalas respects HDFS properties such as ‘fs.default.name’.

Note

When path is specified, Koalas treats it as a directory and writes multiple part-… files into that directory. This behaviour is inherited from Apache Spark. The number of files can be controlled by num_files.
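Because the output is a directory of part files rather than a single file, a consumer outside Spark has to combine the parts itself. The following is a minimal sketch using only the Python standard library. The helper name read_part_files and the part file names are hypothetical, and the sketch assumes every part file begins with the same header row. The "part-*" pattern also skips any other files a writer may leave in the directory, such as a _SUCCESS marker.

```python
import csv
import glob
import os
import tempfile


def read_part_files(directory):
    """Concatenate the rows of every part-* file found in `directory`.

    Hypothetical helper; assumes each part file starts with the same header row.
    """
    header, rows = None, []
    for part in sorted(glob.glob(os.path.join(directory, "part-*"))):
        with open(part, newline="") as f:
            reader = csv.reader(f)
            part_header = next(reader, None)  # None when the part file is empty
            if part_header is None:
                continue
            header = header or part_header
            rows.extend(reader)
    return header, rows


# Simulate an output directory holding two part files.
# The file names below are made up for this demonstration.
out_dir = os.path.join(tempfile.mkdtemp(), "foo.csv")
os.makedirs(out_dir)
with open(os.path.join(out_dir, "part-00000.csv"), "w", newline="") as f:
    csv.writer(f).writerows([["country", "code"], ["KR", "1"], ["US", "2"]])
with open(os.path.join(out_dir, "part-00001.csv"), "w", newline="") as f:
    csv.writer(f).writerows([["country", "code"], ["JP", "3"]])

header, rows = read_part_files(out_dir)
print(header)
print(rows)
```

Passing num_files=1 avoids the need for this step when a single part file is acceptable.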

Parameters

path: str, default None

File path. If None is provided the result is returned as a string.

sep: str, default ‘,’

String of length 1. Field delimiter for the output file.

na_rep: str, default ‘’

Missing data representation.

columns: sequence, optional

Columns to write.

header: bool or list of str, default True

Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.

quotechar: str, default ‘"’

String of length 1. Character used to quote fields.

date_format: str, default None

Format string for datetime objects.

escapechar: str, default None

String of length 1. Character used to escape sep and quotechar when appropriate.
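To illustrate how a separator, a quote character, and an escape character interact, here is a small sketch using Python's standard csv module, whose delimiter, quotechar, and escapechar settings play roles analogous to sep, quotechar, and escapechar. This is only an analogy: the CSV writer used by Spark may quote or escape fields differently in detail.

```python
import csv
import io

rows = [
    ["id", "note"],
    ["1", "contains; the separator"],
    ["2", "contains 'a quote'"],
]

buf = io.StringIO()
writer = csv.writer(
    buf,
    delimiter=";",      # analogous to sep
    quotechar="'",      # analogous to quotechar
    escapechar="\\",    # analogous to escapechar
    doublequote=False,  # escape embedded quote characters instead of doubling them
    quoting=csv.QUOTE_MINIMAL,
)
writer.writerows(rows)
text = buf.getvalue()
print(text)

# Reading with the same settings recovers the original fields.
reader = csv.reader(
    io.StringIO(text),
    delimiter=";",
    quotechar="'",
    escapechar="\\",
    doublequote=False,
)
restored = list(reader)
print(restored == rows)
```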

num_files: the number of files to be written in path directory when this is a path.

mode: str {‘append’, ‘overwrite’, ‘ignore’, ‘error’, ‘errorifexists’}, default ‘overwrite’

Specifies the behavior of the save operation when the destination already exists.

‘append’: Append the new data to existing data.
‘overwrite’: Overwrite existing data.
‘ignore’: Silently ignore this operation if data already exists.
‘error’ or ‘errorifexists’: Throw an exception if data already exists.
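The four behaviours listed for mode can be summarized as a small decision function. This is a sketch of the documented semantics only; resolve_save_mode and its return values are hypothetical names and are not part of Koalas or Spark.

```python
VALID_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def resolve_save_mode(mode, destination_exists):
    """Return the action implied by `mode` (hypothetical helper for illustration)."""
    if mode not in VALID_MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    if not destination_exists:
        return "write"    # nothing to conflict with, every mode simply writes
    if mode == "append":
        return "append"   # add the new data to the existing data
    if mode == "overwrite":
        return "replace"  # discard the existing data, then write
    if mode == "ignore":
        return "skip"     # silently do nothing
    # 'error' and 'errorifexists' are synonyms
    raise FileExistsError("destination already exists")


print(resolve_save_mode("overwrite", True))
print(resolve_save_mode("ignore", True))
```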

partition_cols: str or list of str, optional, default None

Names of partitioning columns.

index_col: str or list of str, optional, default: None

Column names to be used in Spark to represent Koalas’ index. The index name in Koalas is ignored. By default, the index is always lost.

options: keyword arguments for additional options specific to PySpark.

These keyword arguments are passed through as PySpark CSV options. Check the options in PySpark’s API documentation for spark.write.csv(…). They have higher priority and overwrite all other options. This parameter only works when path is specified.
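The precedence rule for options can be pictured as a dictionary merge in which the extra keyword arguments are applied last. The sketch below is illustrative only: effective_csv_options is a hypothetical helper, and the mapping from the named parameters to Spark-style option names (sep, nullValue, header, quote) is an assumption made for the example, not a description of the Koalas implementation.

```python
def effective_csv_options(sep=",", na_rep="", header=True, quotechar='"', **options):
    """Merge named arguments with extra options (hypothetical helper).

    The option names used as dictionary keys are assumed for illustration.
    """
    merged = {"sep": sep, "nullValue": na_rep, "header": header, "quote": quotechar}
    merged.update(options)  # extra options are applied last, so they win
    return merged


# na_rep is left at its default, but the extra option nullValue overrides it.
opts = effective_csv_options(sep=";", nullValue="NULL", compression="gzip")
print(opts)
```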

Examples

>>> import pandas as pd
>>> import databricks.koalas as ks
>>> df = ks.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])
>>> df.sort_values(by="date")  
                   date country  code
... 2012-01-31 12:00:00      KR     1
... 2012-02-29 12:00:00      US     2
... 2012-03-31 12:00:00      JP     3

>>> print(df.to_csv())  
date,country,code
2012-01-31 12:00:00,KR,1
2012-02-29 12:00:00,US,2
2012-03-31 12:00:00,JP,3
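When no path is given, the returned string can be consumed directly in Python. The sketch below parses such a string with the standard csv module. The text is copied from the printed output above and stands in for the return value of df.to_csv(); the variable names are illustrative.

```python
import csv
import io

# Stand-in for the string returned by df.to_csv() when path is None.
csv_text = (
    "date,country,code\n"
    "2012-01-31 12:00:00,KR,1\n"
    "2012-02-29 12:00:00,US,2\n"
    "2012-03-31 12:00:00,JP,3\n"
)

# Each record is a dict keyed by the header row; all values are strings.
records = list(csv.DictReader(io.StringIO(csv_text)))
codes = [int(r["code"]) for r in records]
print(records[0]["country"], codes)
```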

>>> # `path` is assumed to hold the location of a writable directory.
>>> df.cummax().to_csv(path=r'%s/to_csv/foo.csv' % path, num_files=1)
>>> ks.read_csv(
...    path=r'%s/to_csv/foo.csv' % path
... ).sort_values(by="date")  
                   date country  code
... 2012-01-31 12:00:00      KR     1
... 2012-02-29 12:00:00      US     2
... 2012-03-31 12:00:00      US     3

For a Series:

>>> print(df.date.to_csv())  
date
2012-01-31 12:00:00
2012-02-29 12:00:00
2012-03-31 12:00:00

>>> df.date.to_csv(path=r'%s/to_csv/foo.csv' % path, num_files=1)
>>> ks.read_csv(
...     path=r'%s/to_csv/foo.csv' % path
... ).sort_values(by="date")  
                   date
... 2012-01-31 12:00:00
... 2012-02-29 12:00:00
... 2012-03-31 12:00:00

You can preserve the index in the roundtrip as below.

>>> df.set_index("country", append=True, inplace=True)
>>> df.date.to_csv(
...     path=r'%s/to_csv/bar.csv' % path,
...     num_files=1,
...     index_col=["index1", "index2"])
>>> ks.read_csv(
...     path=r'%s/to_csv/bar.csv' % path, index_col=["index1", "index2"]
... ).sort_values(by="date")  
                             date
index1 index2
...    ...    2012-01-31 12:00:00
...    ...    2012-02-29 12:00:00
...    ...    2012-03-31 12:00:00
