DataFrame.to_json(path=None, compression='uncompressed', num_files=None, mode: str = 'overwrite', partition_cols: Union[str, List[str], None] = None, index_col: Union[str, List[str], None] = None, **options) → Optional[str]

Convert the object to a JSON string.


Koalas to_json writes files to a path or URI. Unlike pandas, Koalas respects HDFS properties such as ‘’.


When path is specified, Koalas writes JSON files into the directory path, producing multiple part-… files in that directory. This behaviour is inherited from Apache Spark. The number of files can be controlled by num_files.


The output JSON format differs from pandas’. It always uses orient=’records’ for its output. This behaviour might change in the near future.
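As a rough illustration of the records orientation, the sketch below builds the same shape with only the Python standard library. It is not Koalas code; the `rows`, `columns`, and `records` names are made up for the example, and the compact separators are chosen to match the string shown in the examples on this page.

```python
import json

# Hypothetical sample data mirroring the DataFrame used in the examples on this page.
columns = ["col 1", "col 2"]
rows = [["a", "b"], ["c", "d"]]

# "records" orientation: one JSON object per row, keyed by column name.
records = [dict(zip(columns, row)) for row in rows]

# Compact separators give the same spacing-free layout as the example output.
encoded = json.dumps(records, separators=(",", ":"))
print(encoded)  # [{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]
```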

Note: NaN and None values are converted to null, and datetime objects are converted to UNIX timestamps.
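The conversion described in this note can be sketched in plain Python. This is only an approximation of the behaviour, not how Koalas implements it: `to_jsonable` is a hypothetical helper, and the millisecond unit is an assumption, since the text only says "UNIX timestamps".

```python
import json
import math
from datetime import datetime, timezone


def to_jsonable(value):
    """Hypothetical helper: map NaN/None to null and datetimes to epoch values."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None  # NaN has no JSON representation, so it becomes null
    if isinstance(value, datetime):
        # Assumption: milliseconds since the UNIX epoch.
        return int(value.timestamp() * 1000)
    return value


record = {
    "x": float("nan"),
    "y": None,
    "t": datetime(2020, 1, 1, tzinfo=timezone.utc),
}
encoded = json.dumps(
    {key: to_jsonable(val) for key, val in record.items()},
    separators=(",", ":"),
)
print(encoded)  # {"x":null,"y":null,"t":1577836800000}
```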

path: string, optional

File path. If not specified, the result is returned as a string.

compression: {‘gzip’, ‘bz2’, ‘xz’, None}

A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.

num_files: optional

The number of files to be written in the path directory when path is specified.

mode: str {‘append’, ‘overwrite’, ‘ignore’, ‘error’, ‘errorifexists’}, default ‘overwrite’

Specifies the behavior of the save operation when the destination already exists.

  • ‘append’: Append the new data to existing data.

  • ‘overwrite’: Overwrite existing data.

  • ‘ignore’: Silently ignore this operation if data already exists.

  • ‘error’ or ‘errorifexists’: Throw an exception if data already exists.
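The four behaviours listed above can be summarised as a small decision function. This is a plain-Python sketch for illustration only: `resolve_save_mode` and the action names it returns are hypothetical and are not part of Koalas or Spark.

```python
def resolve_save_mode(mode: str, destination_exists: bool) -> str:
    """Hypothetical helper: name the action a writer takes for a given mode."""
    valid_modes = {"append", "overwrite", "ignore", "error", "errorifexists"}
    if mode not in valid_modes:
        raise ValueError(f"unknown mode: {mode!r}")
    if not destination_exists:
        return "write"  # nothing to conflict with, so every mode just writes
    if mode == "append":
        return "append"  # add the new data next to the existing data
    if mode == "overwrite":
        return "replace"  # discard the existing data, then write
    if mode == "ignore":
        return "skip"  # silently do nothing
    # 'error' and 'errorifexists' are synonyms
    raise FileExistsError("destination already exists")


print(resolve_save_mode("overwrite", destination_exists=True))  # replace
print(resolve_save_mode("ignore", destination_exists=True))  # skip
```

Note that the default, ‘overwrite’, is the only mode that discards existing data, so passing ‘error’ explicitly is the safer choice when the destination must not be clobbered.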

partition_cols: str or list of str, optional, default None

Names of partitioning columns.

index_col: str or list of str, optional, default: None

Column names to be used in Spark to represent Koalas’ index. The index name in Koalas is ignored. By default, the index is always lost.

options: keyword arguments for additional options specific to PySpark.

These are PySpark’s JSON options. Check the options in PySpark’s API documentation for spark.write.json(…). They have a higher priority than, and overwrite, all other options. This parameter only works when path is specified.
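The precedence rule for options can be pictured as a dictionary merge in which caller-supplied keywords win. The sketch below is an assumption about the effect, not the Koalas source: `merge_writer_options` and `derived` are hypothetical names, while `compression` and `lineSep` are used as sample option keys.

```python
def merge_writer_options(derived: dict, **options) -> dict:
    """Hypothetical helper: caller-supplied options overwrite derived ones."""
    merged = dict(derived)  # copy so the input dict is left untouched
    merged.update(options)  # later keys win, so **options take priority
    return merged


# Settings derived from named parameters (illustrative values only).
derived = {"compression": "gzip"}

final = merge_writer_options(derived, compression="bz2", lineSep="\n")
print(final)  # {'compression': 'bz2', 'lineSep': '\n'}
```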

str or None


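>>> import tempfile
>>> import databricks.koalas as ks
>>> path = tempfile.mkdtemp()  # assumed here: any writable directory works
>>> df = ks.DataFrame([['a', 'b'], ['c', 'd']],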
...                   columns=['col 1', 'col 2'])
>>> df.to_json()
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
>>> df['col 1'].to_json()
'[{"col 1":"a"},{"col 1":"c"}]'
>>> df.to_json(path=r'%s/to_json/foo.json' % path, num_files=1)
>>> ks.read_json(
...     path=r'%s/to_json/foo.json' % path
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d
>>> df['col 1'].to_json(path=r'%s/to_json/foo.json' % path, num_files=1, index_col="index")
>>> ks.read_json(
...     path=r'%s/to_json/foo.json' % path, index_col="index"
... ).sort_values(by="col 1")  
      col 1
0         a
1         c