We added basic multi-index support for columns (#590), as shown below. pandas multi-index columns can also be mapped.
>>> import databricks.koalas as ks
>>> import numpy as np
>>>
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
...           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
>>> kdf = ks.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=arrays)

>>> kdf
        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
A -1.574777  0.805108  0.139748  1.287946 -1.782297 -0.152292  0.680594  1.419407
B  0.076886 -1.560807  0.403807 -0.715029  1.236899 -0.364483 -1.548554  0.076003
C -0.575168  0.061539 -2.083615 -0.816090 -1.267440  0.745949 -1.194421  0.468818

>>> kdf['bar']
        one       two
A -1.574777  0.805108
B  0.076886 -1.560807
C -0.575168  0.061539

>>> kdf['bar']['two']
A    0.805108
B   -1.560807
C    0.061539
Name: two, dtype: float64
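To illustrate the mapping from pandas, here is a minimal sketch that builds a pandas DataFrame with multi-index columns and selects from it level by level. The selection part runs on plain pandas only. The `to_koalas` helper is a hypothetical wrapper around `ks.from_pandas`; it needs Koalas and a running Spark session, and we assume (without having run it here) that the converted frame supports the same `['bar']['two']` selection shown above.

```python
import numpy as np
import pandas as pd

# Two column levels: ('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two').
columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
pdf = pd.DataFrame(
    np.arange(12).reshape(3, 4), index=["A", "B", "C"], columns=columns
)

# Selecting the first level returns a frame keyed by the second level.
bar = pdf["bar"]
# Selecting again returns a single column as a Series.
bar_two = pdf["bar"]["two"]


def to_koalas(pandas_df):
    """Hypothetical helper: convert to Koalas (requires Koalas and Spark)."""
    import databricks.koalas as ks  # imported lazily; not exercised in this sketch

    return ks.from_pandas(pandas_df)
```

With Koalas installed, `to_koalas(pdf)['bar']['two']` would be the counterpart of `bar_two`.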
In addition, we are triaging which APIs to support explicitly and which to leave unsupported (#574)(#580). Some pandas APIs will be explicitly unsupported, following the Guardrails that prevent users from shooting themselves in the foot, and for other reasons such as the cost of their operations.
We also added the following features:
koalas.DataFrame:
ffill() (#571)
bfill() (#570)
filter() (#589)
koalas.Series:
idxmax() (#587)
idxmin() (#587)
koalas.indexes.Index:
Index.rename() (#581)
koalas.groupby.GroupBy:
apply() (#584)
transform() (#585)
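As a rough guide to what the methods listed above do, the sketch below shows their pandas semantics, which the Koalas versions are meant to mirror. It uses plain pandas so that it runs without Spark. Swapping in Koalas objects is assumed to give equivalent results, but that is not verified here, and the Koalas signatures (for example, for `GroupBy.transform()`) may differ in details.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "x": [1.0, np.nan, np.nan, 4.0],
        "y": [3.0, 1.0, 2.0, 5.0],
    }
)

# DataFrame.ffill() / bfill(): fill gaps from the previous / next valid value.
filled_forward = df[["x"]].ffill()
filled_backward = df[["x"]].bfill()

# DataFrame.filter(): keep only the named columns.
only_xy = df.filter(items=["x", "y"])

# Series.idxmax() / idxmin(): index label of the largest / smallest value.
largest_y_label = df["y"].idxmax()
smallest_y_label = df["y"].idxmin()

# Index.rename(): return a copy of the index with a new name.
renamed_index = df.index.rename("row")

# GroupBy.transform(): apply a function per group, keeping the original shape.
centered_y = df.groupby("key")["y"].transform(lambda s: s - s.mean())
```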
This release also includes the following improvements:
pandas 0.25 support (#579)
method and limit parameter support in DataFrame.fillna() (#565)
Dots (.) in column names are allowed (#490)
level argument support in DataFrame/Series.sort_index() (#583)
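Two of the improvements above can be sketched with plain pandas, whose behavior Koalas is meant to follow: sorting by one level of a multi-index, and selecting a column whose name contains a dot. The data and names are made up for illustration, and we assume, without running it here, that the same calls behave equivalently on Koalas objects. Dotted names matter on the Koalas side because Spark otherwise reads a dot as nested-field access.

```python
import pandas as pd

# A Series with a two-level index, deliberately out of order.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=["letter", "number"]
)
s = pd.Series([10, 20, 30, 40], index=index)

# sort_index(level=...): order rows by the chosen index level.
by_number = s.sort_index(level="number")
number_order = by_number.index.get_level_values("number").tolist()

# A column name containing a dot is selected like any other column.
dotted = pd.DataFrame({"a.b": [1, 2], "c": [3, 4]})
dotted_column = dotted["a.b"]
```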