head
Since Koalas doesn’t guarantee row ordering, head can return rows from arbitrary distributed partitions. The result is therefore not deterministic, which might confuse users.
We added a configuration option, compute.ordered_head (#1231). If it is set to True, Koalas performs natural ordering beforehand, so the result is the same as pandas’. The default value is False because the ordering causes performance overhead.
>>> kdf = ks.DataFrame({'a': range(10)})
>>> pdf = kdf.to_pandas()
>>> pdf.head(3)
   a
0  0
1  1
2  2

>>> kdf.head(3)
   a
5  5
6  6
7  7
>>> kdf.head(3)
   a
0  0
1  1
2  2

>>> ks.options.compute.ordered_head = True
>>> kdf.head(3)
   a
0  0
1  1
2  2
>>> kdf.head(3)
   a
0  0
1  1
2  2
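To make the trade-off concrete, here is a minimal pure-Python model of the two behaviors. It is not Koalas’ implementation (Koalas runs on Spark); the partition layout and the names unordered_head and ordered_head are hypothetical. The unordered version takes rows from partitions visited in an arbitrary order, like a distributed head, while the ordered version sorts every row by its index first, which is deterministic but does extra work on each call.

```python
import random
from typing import List, Tuple

# Hypothetical model: a row is (index, value); a partition is a list of rows.
Row = Tuple[int, int]


def unordered_head(partitions: List[List[Row]], n: int, rng: random.Random) -> List[Row]:
    """Take the first n rows from partitions visited in an arbitrary order,
    mimicking a distributed head that returns whichever rows it reaches first."""
    order = list(range(len(partitions)))
    rng.shuffle(order)  # partition order is not guaranteed
    result: List[Row] = []
    for i in order:
        for row in partitions[i]:
            if len(result) == n:
                return result
            result.append(row)
    return result


def ordered_head(partitions: List[List[Row]], n: int) -> List[Row]:
    """Sort all rows by their natural (index) order, then take n rows.
    Always gives the same answer, but the sort adds overhead."""
    all_rows = [row for part in partitions for row in part]
    return sorted(all_rows, key=lambda row: row[0])[:n]


rows = [(i, i) for i in range(10)]
partitions = [rows[0:5], rows[5:10]]

# Either the rows with index 0-2 or the rows with index 5-7, depending on partition order.
print(unordered_head(partitions, 3, random.Random()))
print(ordered_head(partitions, 3))  # → [(0, 0), (1, 1), (2, 2)]
```

This mirrors why the option defaults to False: the sort touches every row, even when only a few are requested.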
We started experimenting with GitHub Actions for CI. (#1254, #1265, #1264, #1267, #1269)
We added the following new feature:
DataFrame:
- apply (#1259)
Fix identical and equals when comparing an object with itself. (#1220)
Select the series correctly in SeriesGroupBy APIs (#1224)
Fix DataFrame/Series.clip to preserve the index. (#1232)
Throw a better exception in DataFrame.sort_values when multi-index columns are used. (#1238)
Fix fillna not to change index values. (#1241)
Fix DataFrame.__setitem__ with tuple-named Series. (#1245)
Fix corr to support multi-index columns. (#1246)
Fix the print() output of Series to match pandas. (#1250)
Fix fillna to support partial column index for multi-index columns. (#1244)
Add a check for the as_index parameter of groupby. (#1253)
Raise NotImplementedError for APIs that are not actually implemented. (#1256)
Fix where to support multi-index columns. (#1249)