Version 0.18.0

Multi-index columns support

We continued improving support for multi-index columns (#793, #776). The following APIs now support multi-index columns:

Also, the name of a Series or Index can now be set to a tuple or None. (#776)

>>> import databricks.koalas as ks
>>> kser = ks.Series([1, 2, 3])
>>> kser.name = ('a', 'b')
>>> kser
0    1
1    2
2    3
Name: (a, b), dtype: int64

Plots

We also continued adding plot APIs, as follows:

For Series:

For DataFrame:

  • plot.hist() (#780)

Options

In addition, we added support for namespace-style (attribute) access to options (#785).

>>> import databricks.koalas as ks
>>> ks.options.display.max_rows
1000
>>> ks.options.display.max_rows = 10
>>> ks.options.display.max_rows
10

See also the User Guide in our project docs.

Other new features and improvements

We added the following new features:

koalas.DataFrame:

koalas.indexes.Index/MultiIndex:

Along with the following improvements:

  • Add index_col for read_json (#797)

  • Add index_col for spark IO reads (#769, #775)

  • Add sep parameter for read_csv (#777)

  • Add axis parameter to DataFrame.diff (#774)

  • Add read_json and let to_json use spark.write.json (#753)

  • Use spark.write.csv in to_csv of Series and DataFrame (#749)

  • Handle TimestampType separately when converting to pandas' dtype (#798)

  • Fix spark_df when set_index(.., drop=False) is used (#792)

Backward compatibility

  • We removed some parameters from DataFrame.to_csv and DataFrame.to_json to allow distributed writing (#749, #753)