databricks.koalas.DataFrame.to_spark_io

DataFrame.to_spark_io(path: Optional[str] = None, format: Optional[str] = None, mode: str = 'overwrite', partition_cols: Union[str, List[str], None] = None, index_col: Union[str, List[str], None] = None, **options) → None

Write the DataFrame out to a Spark data source. DataFrame.spark.to_spark_io() is an alias of DataFrame.to_spark_io().

Parameters

path : string, optional

Path to the data source.

format : string, optional

Specifies the output data source format. Some common ones are:

‘delta’
‘parquet’
‘orc’
‘json’
‘csv’

mode : str {‘append’, ‘overwrite’, ‘ignore’, ‘error’, ‘errorifexists’}, default ‘overwrite’

Specifies the behavior of the save operation when data already exists.

‘append’: Append the new data to existing data.
‘overwrite’: Overwrite existing data.
‘ignore’: Silently ignore this operation if data already exists.
‘error’ or ‘errorifexists’: Throw an exception if data already exists.
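
The four behaviors can be illustrated with a toy in-memory model. This is only a sketch of the semantics listed above, not Spark's implementation: the function name `apply_save_mode` is hypothetical, a Python list stands in for the stored data, and `None` stands in for a destination that does not exist yet.

```python
def apply_save_mode(existing, new, mode="overwrite"):
    """Toy model of the save modes (hypothetical helper, not part of Koalas).

    existing: list of rows already at the destination, or None if nothing is there.
    new:      list of rows being written.
    Returns the rows at the destination after the write.
    """
    if mode == "append":
        # Add the new rows after whatever is already stored.
        return (existing or []) + list(new)
    if mode == "overwrite":
        # Replace the stored rows entirely.
        return list(new)
    if mode == "ignore":
        # Leave existing data untouched; write only if nothing exists.
        return existing if existing is not None else list(new)
    if mode in ("error", "errorifexists"):
        # Refuse to write when data is already present.
        if existing is not None:
            raise FileExistsError("data already exists at the destination")
        return list(new)
    raise ValueError("unknown mode: %r" % (mode,))
```

With a real Koalas DataFrame, the mode is simply passed as a keyword, for example `df.to_spark_io(path=some_path, format='parquet', mode='append')`, where `some_path` is a placeholder for a writable location.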

partition_cols : str or list of str, optional

Names of partitioning columns.

index_col : str or list of str, optional, default: None

Column names to use in Spark to represent the Koalas index. The index name in Koalas is ignored. By default (None), the index is dropped and not written.

options : dict

All other options are passed directly to Spark’s data source.
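
As a minimal sketch of how `partition_cols`, `index_col`, and the extra `**options` combine in one call: the helper name `write_partitioned_csv`, the index column name `row_id`, and the assumption that the DataFrame has a `country` column are all illustrative. `header` is an option of Spark's CSV data source and is forwarded through `**options`.

```python
def write_partitioned_csv(kdf, path):
    """Hypothetical helper: write `kdf` as CSV, partitioned by 'country'.

    `kdf` is assumed to be a Koalas DataFrame that has a 'country' column;
    `path` is assumed to be a writable destination.
    """
    kdf.to_spark_io(
        path=path,
        format="csv",
        mode="overwrite",
        partition_cols="country",  # partition the output by this column
        index_col="row_id",        # keep the index as a column named 'row_id'
        header=True,               # extra keyword, forwarded to the CSV source
    )
```

Without `index_col`, the index would not appear in the written files, so naming it explicitly is one way to keep it recoverable when the data is read back.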

Returns

None

See also

read_spark_io
DataFrame.to_delta
DataFrame.to_parquet
DataFrame.to_table
DataFrame.to_spark_io
DataFrame.spark.to_spark_io

Examples

>>> import pandas as pd
>>> import databricks.koalas as ks
>>> df = ks.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])
>>> df
                 date country  code
0 2012-01-31 12:00:00      KR     1
1 2012-02-29 12:00:00      US     2
2 2012-03-31 12:00:00      JP     3

>>> # `path` is assumed to be an existing, writable directory
>>> df.to_spark_io(path='%s/to_spark_io/foo.json' % path, format='json')
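
To check what was written, the data can be read back with `read_spark_io`, which is listed under See also. The following is a sketch only: the helper name `write_then_read_json` and its `reader` parameter are hypothetical, and the Koalas import is deferred so that it is only needed when the default reader is used (which requires Koalas and a Spark session).

```python
def write_then_read_json(kdf, path, reader=None):
    """Hypothetical helper: write `kdf` as JSON, then load it again.

    `reader` defaults to `databricks.koalas.read_spark_io`; it is a parameter
    only so the function can be exercised without a Spark session.
    """
    if reader is None:
        import databricks.koalas as ks  # needs Koalas and Spark installed
        reader = ks.read_spark_io
    # Write with the same format that will be used for reading.
    kdf.to_spark_io(path=path, format="json", mode="overwrite")
    return reader(path, format="json")
```

Because no `index_col` is given here, the index of the original DataFrame is not stored, and the DataFrame read back gets a fresh default index.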
