Version 0.18.0

Multi-index columns support

We continued to improve support for multi-index columns (#793, #776). The following APIs now support multi-index columns:

applymap (#793)
shift (#793)
diff (#793)
fillna (#793)
rank (#793)
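
As a rough illustration of what multi-index column support means for these methods, the sketch below uses plain pandas rather than Koalas, so that it runs without a Spark session. That substitution is an assumption of this example: Koalas aims to follow the pandas API, so replacing `pd.DataFrame` with `ks.DataFrame` is expected, but not guaranteed here, to give matching results. The column labels and values are made up for illustration.

```python
import pandas as pd

# Two-level (multi-index) columns: outer level "x"/"y", inner level "a"/"b".
# The labels and data are hypothetical, chosen only for this sketch.
columns = pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "a")])
pdf = pd.DataFrame(
    [[1.0, 4.0, 7.0], [2.0, None, 8.0], [4.0, 6.0, 11.0]],
    columns=columns,
)

shifted = pdf.shift(1)    # every column moves down by one row
diffed = pdf.diff()       # row-to-row difference within each column
filled = pdf.fillna(0.0)  # missing values replaced in each column
ranked = pdf.rank()       # rank computed within each column

# Columns are still addressed by their full tuple label afterwards.
print(filled[("x", "b")].tolist())
print(ranked[("x", "a")].tolist())
```

Each result keeps the same two-level column index as the input, which is the behavior the APIs listed above are meant to preserve.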

Also, the name of a Series or Index can now be set to a tuple or to None (#776).

>>> import databricks.koalas as ks
>>> kser = ks.Series([1, 2, 3])
>>> kser.name = ('a', 'b')
>>> kser
0    1
1    2
2    3
Name: (a, b), dtype: int64

Plots

We also continued to add plot APIs:

For Series:

plot.kde() (#767)

For DataFrame:

plot.hist() (#780)

Options

In addition, we added support for namespace-style attribute access to options (#785).

>>> import databricks.koalas as ks
>>> ks.options.display.max_rows
1000
>>> ks.options.display.max_rows = 10
>>> ks.options.display.max_rows
10

See also the User Guide in our project docs.

Other new features and improvements

We added the following new features:

koalas.DataFrame:

aggregate (#796)
agg (#796)
items (#787)

koalas.indexes.Index/MultiIndex:

is_boolean (#795)
is_categorical (#795)
is_floating (#795)
is_integer (#795)
is_interval (#795)
is_numeric (#795)
is_object (#795)
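
For readers unfamiliar with the DataFrame methods listed above, here is a minimal sketch of how `agg` and `items` behave. It uses plain pandas so that it runs without a Spark session; treating the Koalas methods as behaving the same way is an assumption based on Koalas following the pandas API, and the column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# agg (an alias of aggregate) with a list of function names returns one row
# per aggregation and one column per input column.
summary = pdf.agg(["sum", "min"])
print(summary.loc["sum", "a"], summary.loc["min", "b"])

# items iterates over (column label, Series) pairs.
labels = [label for label, _ in pdf.items()]
print(labels)
```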

We also made the following improvements:

Add index_col for read_json (#797)
Add index_col for spark IO reads (#769, #775)
Add “sep” parameter for read_csv (#777)
Add axis parameter to DataFrame.diff (#774)
Add read_json and let to_json use spark.write.json (#753)
Use spark.write.csv in to_csv of Series and DataFrame (#749)
Handle TimestampType separately when converting to a pandas dtype (#798)
Fix spark_df when set_index(.., drop=False) is used (#792)

Backward compatibility

We removed some parameters from DataFrame.to_csv and DataFrame.to_json to allow distributed writing (#749, #753).
