Version 0.14.0

We added basic support for multi-index columns (#590), as shown below. A pandas multi-index can also be mapped.

>>> import databricks.koalas as ks
>>> import numpy as np
>>>
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
...           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
>>> kdf = ks.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=arrays)
>>> kdf
        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
A -1.574777  0.805108  0.139748  1.287946 -1.782297 -0.152292  0.680594  1.419407
B  0.076886 -1.560807  0.403807 -0.715029  1.236899 -0.364483 -1.548554  0.076003
C -0.575168  0.061539 -2.083615 -0.816090 -1.267440  0.745949 -1.194421  0.468818
>>> kdf['bar']
        one       two
A -1.574777  0.805108
B  0.076886 -1.560807
C -0.575168  0.061539
>>> kdf['bar']['two']
A    0.805108
B   -1.560807
C    0.061539
Name: two, dtype: float64
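The note that a pandas multi-index can also be mapped can be sketched as follows. This is a minimal illustration, not the project's own example: the helper name `to_koalas` is hypothetical, and the conversion itself assumes `databricks.koalas` and a working Spark session are available, so it is imported lazily and not called here. The selection semantics shown are those of pandas, which the Koalas output above mirrors.

```python
import numpy as np
import pandas as pd

# Two-level column labels, built the pandas way.
columns = pd.MultiIndex.from_arrays(
    [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
)
pdf = pd.DataFrame(
    np.arange(12.0).reshape(3, 4), index=["A", "B", "C"], columns=columns
)


def to_koalas(pandas_df):
    """Hypothetical helper: map a pandas DataFrame to a Koalas DataFrame.

    Requires databricks.koalas and Spark, hence the lazy import.
    """
    import databricks.koalas as ks

    return ks.from_pandas(pandas_df)


# Selecting a top-level label yields the sub-frame of its second-level columns.
bar = pdf["bar"]
print(list(bar.columns))    # ['one', 'two']
print(bar["two"].tolist())  # [1.0, 5.0, 9.0]
```

With Koalas installed, `to_koalas(pdf)["bar"]["two"]` would be expected to select the same column as in the session above.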

In addition, we are triaging which APIs to support and which to mark explicitly as unsupported (#574)(#580). Some pandas APIs will be explicitly unsupported, in line with the Guardrails that keep users from shooting themselves in the foot, and for other reasons such as the cost of their operations.

We also added the following features:

koalas.DataFrame:

koalas.Series:

koalas.indexes.Index:

  • Index.rename() (#581)

koalas.groupby.GroupBy:

Along with the following improvements:

  • pandas 0.25 support (#579)

  • method and limit parameter support in DataFrame.fillna() (#565)

  • Dots (.) in column names are allowed (#490)

  • Support for the level argument in DataFrame/Series.sort_index() (#583)
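Two of the improvements above, the limit parameter of fillna() and the level argument of sort_index(), are easiest to see on small data. The sketch below uses plain pandas to show the semantics; it assumes, without confirmation from these notes, that the Koalas methods behave the same way. The forward fill is written as `ffill(limit=1)`, the pandas synonym for `fillna(method="ffill", limit=1)`, because recent pandas versions deprecate the `method` keyword.

```python
import numpy as np
import pandas as pd

# limit caps how many consecutive missing values are filled.
s = pd.Series([1.0, np.nan, np.nan, 4.0])
filled = s.ffill(limit=1)  # same meaning as fillna(method="ffill", limit=1)
print(filled.isna().tolist())  # [False, False, True, False]

# level chooses which level of a multi-index drives the sort.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=["outer", "inner"]
)
values = pd.Series([10, 20, 30, 40], index=index)
by_inner = values.sort_index(level="inner")
print(by_inner.tolist())  # [20, 30, 40, 10]
```

Sorting on "inner" orders rows by the second level first and, by default, breaks ties using the remaining level.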