Version 0.27.0

head ordering

Koalas does not guarantee row ordering, so head may return rows from any partition of the distributed data. The result is therefore not deterministic, which can confuse users.

We added a configuration option, compute.ordered_head (#1231). When it is set to True, Koalas applies natural ordering before taking the head, so the result matches pandas’. The default value is False because the ordering adds performance overhead.

>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame({'a': range(10)})
>>> pdf = kdf.to_pandas()
>>> pdf.head(3)
   a
0  0
1  1
2  2

>>> kdf.head(3)  # default: the rows returned can differ between calls
   a
5  5
6  6
7  7
>>> kdf.head(3)
   a
0  0
1  1
2  2

>>> ks.options.compute.ordered_head = True  # apply natural ordering before head
>>> kdf.head(3)
   a
0  0
1  1
2  2
>>> kdf.head(3)
   a
0  0
1  1
2  2
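
The difference between the two modes can be illustrated with a toy model in plain Python. This is only a sketch and not how Koalas is implemented: the helper names (make_partitions, head_unordered, head_ordered) are hypothetical, a random shuffle stands in for the order in which partitions happen to be read, and sorting by value stands in for natural ordering.

```python
import random


def make_partitions(values, num_partitions):
    """Split values into contiguous chunks that stand in for distributed partitions."""
    size = -(-len(values) // num_partitions)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def head_unordered(partitions, n, rng):
    """Take the first n rows in whatever order the partitions happen to arrive."""
    arrival = list(partitions)
    rng.shuffle(arrival)  # assumption: models nondeterministic partition order
    rows = [row for part in arrival for row in part]
    return rows[:n]


def head_ordered(partitions, n, rng):
    """Restore the original row order first, then take the first n rows."""
    arrival = list(partitions)
    rng.shuffle(arrival)  # same nondeterministic arrival as above
    rows = sorted(row for part in arrival for row in part)  # the extra, costly step
    return rows[:n]


partitions = make_partitions(list(range(10)), num_partitions=2)
rng = random.Random(0)

# Collect the distinct results seen over repeated calls.
unordered_results = {tuple(head_unordered(partitions, 3, rng)) for _ in range(50)}
ordered_results = {tuple(head_ordered(partitions, 3, rng)) for _ in range(50)}

print("unordered:", unordered_results)
print("ordered:  ", ordered_results)
```

In this model the ordered variant always yields (0, 1, 2), while the unordered variant can yield the start of either partition. The sort over all rows is what makes the ordered variant slower, which mirrors the reason compute.ordered_head defaults to False.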

GitHub Actions

We started using GitHub Actions for CI on a trial basis. (#1254, #1265, #1264, #1267, #1269)

Other new features and improvements

We added the following new feature:

DataFrame:

  • apply (#1259)

Other improvements

  • Fix identical and equals for the comparison between the same object. (#1220)

  • Select the series correctly in SeriesGroupBy APIs (#1224)

  • Fix DataFrame/Series.clip to preserve its index. (#1232)

  • Throw a better exception in DataFrame.sort_values when multi-index columns are used. (#1238)

  • Fix fillna not to change index values. (#1241)

  • Fix DataFrame.__setitem__ with tuple-named Series. (#1245)

  • Fix corr to support multi-index columns. (#1246)

  • Fix the output of print() for Series to match pandas. (#1250)

  • Fix fillna to support partial column index for multi-index columns. (#1244)

  • Add as_index check logic to groupby parameter (#1253)

  • Raise NotImplementedError for elements that are not actually implemented. (#1256)

  • Fix where to support multi-index columns. (#1249)