Version 0.25.0

loc and iloc indexer improvements

We improved the loc and iloc indexers. loc now supports scalar values as indexers (#1172).

>>> import databricks.koalas as ks
>>>
>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['sidewinder']
max_speed    7
shield       8
Name: sidewinder, dtype: int64
>>> df.loc['sidewinder', 'max_speed']
7

In addition, a Series derived from a different DataFrame can be used as an indexer when the compute.ops_on_diff_frames option is enabled (#1155).

>>> import databricks.koalas as ks
>>>
>>> ks.options.compute.ops_on_diff_frames = True
>>>
>>> df1 = ks.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [100, 200, 300, 400, 500]},
...                    index=[20, 10, 30, 0, 50])
>>> df2 = ks.DataFrame({'A': [0, -1, -2, -3, -4], 'B': [-100, -200, -300, -400, -500]},
...                    index=[20, 10, 30, 0, 50])
>>> df1.A.loc[df2.A > -3].sort_index()
10    1
20    0
30    2
Name: A, dtype: int64
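
As a cross-check, the same selection can be sketched in plain pandas, where a boolean Series used as an indexer is matched on index labels. The names pdf1, pdf2, mask and result are illustrative only and do not come from the release itself; the assumption is that Koalas aims to return the same rows once ops_on_diff_frames is enabled.

```python
import pandas as pd

# pandas counterparts of df1 and df2 above (hypothetical names).
pdf1 = pd.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [100, 200, 300, 400, 500]},
                    index=[20, 10, 30, 0, 50])
pdf2 = pd.DataFrame({'A': [0, -1, -2, -3, -4], 'B': [-100, -200, -300, -400, -500]},
                    index=[20, 10, 30, 0, 50])

# Boolean mask computed from a *different* frame; it shares the same index labels.
mask = pdf2.A > -3

# Rows of pdf1.A whose label is True in the mask, sorted by index label.
result = pdf1.A.loc[mask].sort_index()
print(result)
```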

Lastly, when given a slice, loc now follows the natural order of the index, matching the behavior of pandas (#1159, #1174, #1179). See the example below.

>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64
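
For reference, the pandas behavior that the slicing change is meant to match can be sketched as follows. This uses plain pandas rather than Koalas, and the names pdf, head and tail are illustrative only. The point is that a label slice follows the order in which labels appear in the index, not their sorted order.

```python
import pandas as pd

# Same data as the Koalas example above, built with plain pandas.
pdf = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                   index=['cobra', 'viper', 'sidewinder'],
                   columns=['max_speed', 'shield'])

# Label slices are inclusive on both ends and follow index position.
head = pdf.loc['cobra':'viper', 'max_speed']

# 'sidewinder' sorts before 'viper' alphabetically, but it comes after
# 'viper' in the index, so an open-ended slice from 'viper' includes it.
tail = pdf.loc['viper':, 'max_speed']

print(head.index.tolist())
print(tail.index.tolist())
```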

Other new features and improvements

We added the following new features:

koalas.Series:

koalas.Index:

koalas.MultiIndex:

Other improvements

  • Add from_pandas support for Index/MultiIndex. (#1170)

  • Add a hidden column __natural_order__. (#1146)

  • Introduce _LocIndexerLike and consolidate some logic. (#1149)

  • Refactor LocIndexerLike.__getitem__. (#1152)

  • Remove sort in GroupBy._reduce_for_stat_function. (#1147)

  • Randomize index in tests and fix some window-like functions. (#1151)

  • Explicitly don’t support Index.duplicated. (#1131)

  • Fix DataFrame._repr_html_(). (#1177)