Version 0.22.0

Enable Arrow 0.15.1+

Apache Arrow 0.15.0 did not work well with PySpark 2.4, so it was disabled in the previous version. Arrow 0.15.1 and later now work with Koalas (#902).

Expanding and Rolling

We also added expanding() and rolling() APIs to groupby(), Series and DataFrame (#985, #991, #990, #1015, #996, #1034, #1037). They support the following functions:

  • min

  • max

  • sum

  • mean

  • std

  • var
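As a rough illustration of what the two window types compute, here is a plain-Python sketch. It is not the Koalas implementation. The helper names `rolling` and `expanding` below are hypothetical stand-ins, and the `min_periods` defaults are assumed to follow the pandas convention (window size for rolling, 1 for expanding).

```python
def rolling(values, window, func, min_periods=None):
    """Apply func over a fixed-size trailing window ending at each position."""
    if min_periods is None:
        min_periods = window  # assumption: same default as pandas rolling()
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        # Positions with too few observations yield a missing value (None here).
        out.append(func(chunk) if len(chunk) >= min_periods else None)
    return out


def expanding(values, func, min_periods=1):
    """Apply func over all values from the start up to each position."""
    out = []
    for i in range(len(values)):
        chunk = values[: i + 1]
        out.append(func(chunk) if len(chunk) >= min_periods else None)
    return out


data = [1, 2, 3, 4]
rolling_sum = rolling(data, 2, sum)    # window of the last 2 values
expanding_sum = expanding(data, sum)   # running total
rolling_min = rolling(data, 2, min)
expanding_max = expanding(data, max)

print(rolling_sum)    # [None, 3, 5, 7]
print(expanding_sum)  # [1, 3, 6, 10]
```

In Koalas the corresponding calls would presumably look like kser.rolling(2).sum() and kser.expanding().sum() on a Series kser, with missing values in place of None.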

Multi-index columns support

We continue improving multi-index columns support. We made the following APIs support multi-index columns:

Documentation

We added a “Best Practices” section to the documentation (#1041) for Koalas users to read and follow. Please see https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html

Other new features and improvements

We added the following new features:

koalas.DataFrame:

koalas.Series:

koalas.MultiIndex:

Along with the following improvements:

  • Introduce column_scols in InternalFrame as a substitute for data_columns. (#956)

  • Fix different index level assignment when ‘compute.ops_on_diff_frames’ is enabled (#1045)

  • Fix DataFrame.melt and add a doctest case for it (#987)

  • Enable creating Index from list like ‘Index([1, 2, 3])’ (#986)

  • Fix combine_frames to handle the case where the right-hand side arguments are modified Series (#1020)

  • Make setup.py run under Python 2 so that it can show a proper error message. (#1027)

  • Remove Series.schema. (#993)