We continue improving multi-index columns support (#793, #776). We made the following APIs support multi-index columns:
- `applymap` (#793)
- `shift` (#793)
- `diff` (#793)
- `fillna` (#793)
- `rank` (#793)
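Since koalas follows the pandas API, a small pandas sketch (pandas used here only for illustration, as koalas itself needs a Spark session) shows what multi-index column support means for these functions:

```python
import pandas as pd

# A DataFrame with two-level (multi-index) columns.
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    columns=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b")]),
)

# Element-wise application preserves the multi-index columns.
doubled = df.applymap(lambda v: v * 2)

# shift/diff/fillna/rank likewise operate per column under the multi-index.
diffed = df.diff()
```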
We also now support setting a tuple or None as the name of a Series or Index (#776):
```python
>>> import databricks.koalas as ks
>>> kser = ks.Series([1, 2, 3])
>>> kser.name = ('a', 'b')
>>> kser
0    1
1    2
2    3
Name: (a, b), dtype: int64
```
We also continue adding plot APIs as follows:
For Series:

- `plot.kde()` (#767)

For DataFrame:

- `plot.hist()` (#780)
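These mirror the pandas plotting API; a minimal headless sketch of the equivalent pandas call (pandas and matplotlib used for illustration, since koalas itself needs Spark):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless use
import pandas as pd

# DataFrame.plot.hist() draws a histogram over the numeric columns;
# the koalas API mirrors this pandas call.
df = pd.DataFrame({"x": [1, 1, 2, 3, 3, 3]})
ax = df.plot.hist(bins=3)
```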
In addition, we added support for namespace access in options (#785):
```python
>>> import databricks.koalas as ks
>>> ks.options.display.max_rows
1000
>>> ks.options.display.max_rows = 10
>>> ks.options.display.max_rows
10
```
See also the User Guide in our project docs.
We added the following new features:
koalas.DataFrame:

- `aggregate` (#796)
- `agg` (#796)
- `items` (#787)
koalas.indexes.Index/MultiIndex:

- `is_boolean` (#795)
- `is_categorical` (#795)
- `is_floating` (#795)
- `is_integer` (#795)
- `is_interval` (#795)
- `is_numeric` (#795)
- `is_object` (#795)
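A pandas analogue of the newly supported APIs (koalas follows the pandas API; pandas is used here for illustration since koalas needs Spark):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# DataFrame.agg / aggregate: apply named reductions per column.
summary = df.agg(["min", "max"])

# DataFrame.items: iterate over (column label, column Series) pairs.
labels = [label for label, _ in df.items()]

# The Index.is_integer()-style predicates check the index's inferred
# type; they boil down to dtype checks like this one (the methods
# themselves are deprecated in recent pandas, so the dtype helper is
# used here instead).
is_int_index = pd.api.types.is_integer_dtype(pd.Index([1, 2, 3]))
```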
Along with the following improvements:
- Add `index_col` for `read_json` (#797)
- Add `index_col` for Spark IO reads (#769, #775)
- Add `sep` parameter for `read_csv` (#777)
- Add `axis` parameter to `DataFrame.diff` (#774)
- Add `read_json` and let `to_json` use `spark.write.json` (#753)
- Use `spark.write.csv` in `to_csv` of Series and DataFrame (#749)
- Handle `TimestampType` separately when converting to pandas' dtype (#798)
- Fix `spark_df` when `set_index(.., drop=False)` (#792)
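To see why `index_col` on the JSON readers is useful: a JSON round trip normally drops the index, and `index_col` restores a column as the index on read. The effect is roughly like this pandas emulation (pandas used for illustration; the `set_index` step is what koalas' `index_col` parameter does for you):

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [10, 20], "val": ["a", "b"]})

# Write to JSON in records orientation; the index is not preserved.
buf = io.StringIO()
df.to_json(buf, orient="records")

# On read, promote a column back to the index -- roughly what
# read_json(..., index_col="id") achieves in koalas.
buf.seek(0)
restored = pd.read_json(buf, orient="records").set_index("id")
```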
We removed some parameters in `DataFrame.to_csv` and `DataFrame.to_json` to allow distributed writing (#749, #753).