Version 1.6.0

Improved Plotly backend support

We improved plotting support by implementing pie, histogram and box plots with the Plotly plot backend. Koalas can now plot data with Plotly via:

  • DataFrame.plot.pie and Series.plot.pie (#1971)

  • DataFrame.plot.hist and Series.plot.hist (#1999)

  • Series.plot.box (#2007)

In addition, we optimized the histogram calculation for DataFrame to run in a single pass (#1997), instead of launching a separate job for each Series in the DataFrame.
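
The single-pass idea can be sketched in plain Python. This is only an illustration of the general technique, not the Koalas implementation: the function names, the row layout and the shared bin edges are all assumptions made for the example.

```python
from bisect import bisect_right


def bucket_index(value, edges):
    """Return the bin index for value; the last bin is closed on the right."""
    if value == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, value) - 1


def histograms_single_pass(rows, columns, edges):
    """Count the bins of every column while scanning the rows only once.

    Hypothetical helper: assumes all columns share the same sorted bin edges.
    """
    counts = {col: [0] * (len(edges) - 1) for col in columns}
    for row in rows:  # one scan over the data, regardless of column count
        for col in columns:
            value = row[col]
            if value is None:  # skip missing values
                continue
            if edges[0] <= value <= edges[-1]:
                counts[col][bucket_index(value, edges)] += 1
    return counts


rows = [
    {"a": 1, "b": 4},
    {"a": 2, "b": 4},
    {"a": 3, "b": 1},
    {"a": 4, "b": None},
]
edges = [1, 2, 3, 4]  # three bins: [1, 2), [2, 3), [3, 4]
result = histograms_single_pass(rows, ["a", "b"], edges)
print(result)
```

A per-column approach would call a one-column histogram function once per column and therefore scan the data once per column; here the scan count stays at one.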

Operations between Series and Index

Operations between Series and Index are now supported, as shown below (#1996):

>>> kser = ks.Series([1, 2, 3, 4, 5, 6, 7])
>>> kidx = ks.Index([0, 1, 2, 3, 4, 5, 6])

>>> (kser + 1 + 10 * kidx).sort_index()
0     2
1    13
2    24
3    35
4    46
5    57
6    68
dtype: int64
>>> (kidx + 1 + 10 * kser).sort_index()
0    11
1    22
2    33
3    44
4    55
5    66
6    77
dtype: int64

Support for setting a Series via attribute access

We added support for setting a column via attribute assignment in DataFrame (#1989):

>>> kdf = ks.DataFrame({'A': [1, 2, 3, None]})
>>> kdf.A = kdf.A.fillna(kdf.A.median())
>>> kdf
     A
0  1.0
1  2.0
2  3.0
3  2.0
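
For readers curious how attribute assignment can be routed to a column, here is a toy sketch in plain Python. ToyFrame is a hypothetical class invented for this illustration and is not Koalas code; it assumes the pandas convention that only existing column names are treated as columns, while other names remain ordinary attributes.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not the Koalas implementation."""

    def __init__(self, data):
        # Bypass our own __setattr__ while creating the internal storage.
        object.__setattr__(self, "_columns", dict(data))

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall back to column lookup.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Existing column names are routed to column assignment;
        # anything else stays a regular Python attribute.
        if name in self._columns:
            self._columns[name] = list(value)
        else:
            object.__setattr__(self, name, value)


frame = ToyFrame({"A": [1.0, 2.0, 3.0, None]})
# Fill the missing value with 2.0, the median of the non-missing values.
frame.A = [2.0 if v is None else v for v in frame.A]
frame.note = "not a column"
print(frame.A)
```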

Other new features, improvements and bug fixes

We added the following new features:

Series:

DataFrame:

We also implemented new parameters:

  • Add min_count parameter for Frame.sum. (#1978)

  • Add ddof parameter for GroupBy.std() and GroupBy.var() (#1994)

  • Support ddof parameter for std and var. (#1986)
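
As a reminder of what these parameters mean, the following plain-Python sketch shows the usual pandas-style semantics, which Koalas is assumed to follow here. The helper names variance and sum_with_min_count are hypothetical and exist only for this example; None stands in for a missing value.

```python
import math
import statistics


def variance(values, ddof=1):
    """Variance with 'delta degrees of freedom': the divisor is n - ddof."""
    n = len(values)
    if n - ddof <= 0:
        return math.nan
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def sum_with_min_count(values, min_count=0):
    """Sum the non-missing values, or NaN if fewer than min_count are present."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return math.nan
    return sum(valid)


data = [1.0, 2.0, 3.0, 4.0]
sample_var = variance(data, ddof=1)      # divides by n - 1
population_var = variance(data, ddof=0)  # divides by n

all_missing_default = sum_with_min_count([None, None])               # min_count=0
all_missing_strict = sum_with_min_count([None, None], min_count=1)   # NaN
partly_missing = sum_with_min_count([1.0, None], min_count=1)
print(sample_var, population_var, all_missing_default, all_missing_strict, partly_missing)
```

With ddof=1 the result matches statistics.variance, and with ddof=0 it matches statistics.pvariance; std is the square root of the corresponding variance.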

Along with the following fixes:

  • Fix stat functions with no numeric columns. (#1967)

  • Fix DataFrame.replace with NaN/None values (#1962)

  • Fix cumsum and cumprod. (#1982)

  • Use Python type name instead of Spark’s in error messages. (#1985)

  • Use object.__setattr__ in Series. (#1991)

  • Adjust Series.mode to match pandas Series.mode (#1995)

  • Adjust data when all the values in a column are nulls. (#2004)

  • Fix as_spark_type to not support “bigint”. (#2011)