Version 0.32.0

Koalas documentation redesign

The Koalas documentation has been redesigned with a new theme, pydata-sphinx-theme. Please check out the new Koalas documentation site.

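`transform_batch` and `apply_batch`

We added APIs that let you directly transform a Koalas Series or DataFrame, or apply a function to it. transform_batch expects the function to return output of the same length as its input, whereas apply_batch allows output of arbitrary length. map_in_pandas is deprecated and has been renamed to apply_batch.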

import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_plus(pdf):
    # Must always return output of the same length as the input.
    return pdf + 1

kdf.transform_batch(pandas_plus)

import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_plus(pdf):
    # The output may have an arbitrary length.
    return pdf[pdf.a > 1]

kdf.apply_batch(pandas_plus)

See also “Transform and apply a function” in the Koalas documentation.
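The length rule in the two examples above can be illustrated without a Spark cluster. The following is a minimal pandas-only sketch, not how Koalas is implemented: run_in_batches is a hypothetical helper, and the fixed batch_size is an assumption made for illustration (in Koalas the batch boundaries are not under the caller's control).

```python
import pandas as pd


def run_in_batches(pdf, func, batch_size=2):
    """Hypothetical helper: call func on consecutive row slices and concatenate the results."""
    batches = [pdf.iloc[i:i + batch_size] for i in range(0, len(pdf), batch_size)]
    return pd.concat([func(batch) for batch in batches])


pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})


def plus_one(batch):
    # Same length out as in: the kind of function transform_batch expects.
    return batch + 1


def keep_large(batch):
    # May drop rows: the kind of function apply_batch allows.
    return batch[batch.a > 1]


transformed = run_in_batches(pdf, plus_one)   # 3 rows in, 3 rows out
filtered = run_in_batches(pdf, keep_large)    # 3 rows in, 2 rows out
```

Because the function only ever sees one slice at a time in this sketch, anything that depends on the whole frame, such as a global sum, would be computed per slice.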

Other new features and improvements

We added the following new features:

DataFrame:

truncate (#1408)
hint (#1415)

SeriesGroupBy:

unique (#1426)

Index:

spark_column (#1438)

Series:

spark_column (#1438)

MultiIndex:

spark_column (#1438)

Other improvements

Fix from_pandas to handle the same index name as a column name. (#1419)
Add documentation about non-Koalas APIs (#1420)
Hot-fix the missing keyword argument ‘deep’ for DataFrame.copy() (#1423)
Fix Series.div when dividing by zero (#1412)
Support expand parameter if n is a positive integer in Series.str.split/rsplit. (#1432)
Make Series.astype(bool) follow the concept of “truthy” and “falsey”. (#1431)
Fix incompatible behaviour with pandas for floordiv with np.nan (#1429)
Use mapInPandas for apply_batch API in Spark 3.0 (#1440)
Use F.datediff() for subtraction of dates as a workaround. (#1439)
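Several of the fixes above concern matching pandas semantics. As a point of reference only, the sketch below shows what plain pandas does in three of those cases. It does not exercise Koalas, and it is an assumption that these are the exact cases the linked changes target.

```python
import numpy as np
import pandas as pd

# Division by zero on floats: pandas yields inf, -inf and NaN rather than raising.
div_result = pd.Series([1.0, -1.0, 0.0]).div(0)

# astype(bool) on integers: zero is falsey, every other value is truthy.
bool_result = pd.Series([0, 1, 2]).astype(bool)

# Floor division by NaN propagates NaN.
floordiv_result = pd.Series([1.0, 2.0]).floordiv(np.nan)
```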
