Version 0.32.0

Koalas documentation redesign

The Koalas documentation has been redesigned with a better theme, pydata-sphinx-theme. Please check out the new Koalas documentation site.

transform_batch and apply_batch

We added APIs that let you transform and apply a function directly against a Koalas Series or DataFrame. map_in_pandas is deprecated and has been renamed to apply_batch.

```python
import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_plus(pdf):
    return pdf + 1  # should always return the same length as input.

kdf.transform_batch(pandas_plus)
```

```python
import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_filter(pdf):
    return pdf[pdf.a > 1]  # allow arbitrary length

kdf.apply_batch(pandas_filter)
```
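The difference between the two APIs comes down to the length contract stated in the comments above. The sketch below imitates it in plain pandas by feeding a function consecutive row chunks, as a rough stand-in for the pandas DataFrames Koalas passes in; `run_in_batches`, `batch_size` and `same_length` are hypothetical names for illustration only, not Koalas APIs, and in Koalas the real batch boundaries are decided by Spark.

```python
import pandas as pd


def run_in_batches(pdf, func, batch_size, same_length):
    """Hypothetical helper (not part of Koalas): apply `func` to consecutive
    row chunks of `pdf` and concatenate the results.

    same_length=True mimics the transform_batch contract (output length must
    equal input length); same_length=False mimics apply_batch (any length).
    """
    outputs = []
    for start in range(0, len(pdf), batch_size):
        batch = pdf.iloc[start:start + batch_size]
        result = func(batch)
        if same_length and len(result) != len(batch):
            raise ValueError("transform_batch-style functions must preserve length")
        outputs.append(result)
    # Return an empty frame with the same columns when there is no input.
    return pd.concat(outputs) if outputs else pdf.iloc[0:0]


pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

plus_one = run_in_batches(pdf, lambda p: p + 1, batch_size=2, same_length=True)
filtered = run_in_batches(pdf, lambda p: p[p.a > 1], batch_size=2, same_length=False)

print(plus_one.a.tolist())  # [2, 3, 4]
print(filtered.a.tolist())  # [2, 3]
```

In this sketch, passing the filtering function with `same_length=True` raises `ValueError`, which is exactly the kind of function the comment above rules out for transform_batch.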

Please also see “Transform and apply a function” in the Koalas documentation.

Other new features and improvements

We added the following new features:

DataFrame:

SeriesGroupBy:

Index:

Series:

MultiIndex:

Other improvements

  • Fix from_pandas to handle the same index name as a column name. (#1419)

  • Add documentation about non-Koalas APIs (#1420)

  • Hot-fix the missing keyword argument ‘deep’ in DataFrame.copy() (#1423)

  • Fix Series.div when dividing by zero (#1412)

  • Support the expand parameter in Series.str.split/rsplit when n is a positive integer. (#1432)

  • Make Series.astype(bool) follow the concept of “truthy” and “falsey”. (#1431)

  • Fix floordiv with np.nan to behave consistently with pandas (#1429)

  • Use mapInPandas for apply_batch API in Spark 3.0 (#1440)

  • Use F.datediff() for subtraction of dates as a workaround. (#1439)
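Two of the fixes above (#1412 and #1429) concern matching pandas' results for division by zero and floor division by NaN. For reference, this is what plain pandas returns; it only illustrates the target behavior and makes no claim about how Koalas implements it.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, -1.0, 0.0])

# True division by zero: positive / 0 -> inf, negative / 0 -> -inf, 0 / 0 -> NaN.
divided = s.div(0)

# Floor division by NaN yields NaN for every element.
floored = pd.Series([7.0, 8.0]) // np.nan

print(divided.tolist())      # [inf, -inf, nan]
print(floored.isna().all())  # True
```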
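With #1432, Series.str.split and Series.str.rsplit accept expand when n is a positive integer. In pandas, n caps the number of splits, so expand=True yields at most n + 1 columns. A fixed upper bound like this is presumably why Koalas supports expand only in that case, although the release note does not state the reason. The pandas reference behavior:

```python
import pandas as pd

s = pd.Series(["a b c", "d e"])

# n=1: split at most once, from the left (split) or from the right (rsplit).
left = s.str.split(" ", n=1, expand=True)
right = s.str.rsplit(" ", n=1, expand=True)

print(left.values.tolist())   # [['a', 'b c'], ['d', 'e']]
print(right.values.tolist())  # [['a b', 'c'], ['d', 'e']]
```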
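The “truthy”/“falsey” rule for Series.astype(bool) (#1431) refers to Python's own truthiness applied element by element, as pandas does. A small pandas reference for the behavior Koalas now follows; the explicit dtype=object is an assumption made here to keep the strings as plain Python objects across pandas versions.

```python
import pandas as pd

# Zero is falsey; any non-zero number is truthy.
numbers = pd.Series([0, 1, 2]).astype(bool)

# The empty string is falsey; any non-empty string is truthy.
strings = pd.Series(["", "koalas"], dtype=object).astype(bool)

print(numbers.tolist())  # [False, True, True]
print(strings.tolist())  # [False, True]
```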