The Koalas documentation was redesigned with a better theme, pydata-sphinx-theme. Please check out the new Koalas documentation site.
transform_batch and apply_batch
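We added APIs that let you directly transform, or apply a function against, a Koalas Series or DataFrame. transform_batch requires the function to return output of the same length as its input, whereas apply_batch allows output of arbitrary length. map_in_pandas is deprecated and has been renamed to apply_batch.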
```python
import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_plus(pdf):
    # Should always return the same length as the input.
    return pdf + 1

kdf.transform_batch(pandas_plus)
```
```python
import databricks.koalas as ks

kdf = ks.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def pandas_plus(pdf):
    # Output of arbitrary length is allowed.
    return pdf[pdf.a > 1]

kdf.apply_batch(pandas_plus)
```
Please also check the “Transform and apply a function” page in the Koalas documentation.
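To make the difference between the two APIs concrete, here is a toy, pure-Python model of batch-wise application. It is a sketch under stated assumptions, not Koalas' implementation: plain lists stand in for the pandas DataFrames passed to your function, and the batch size of 2 (the hypothetical BATCH_SIZE) is arbitrary, since Spark decides the real batches. The function names mirror the Koalas APIs only for readability.

```python
from typing import Callable, List, Sequence

# Toy model only: lists stand in for pandas DataFrames, and BATCH_SIZE is a
# hypothetical stand-in for however Spark actually splits the data.
BATCH_SIZE = 2


def _batches(data: Sequence[int], size: int = BATCH_SIZE) -> List[List[int]]:
    """Split data into consecutive chunks of at most `size` items."""
    return [list(data[i:i + size]) for i in range(0, len(data), size)]


def transform_batch(data: Sequence[int],
                    func: Callable[[List[int]], List[int]]) -> List[int]:
    """Apply func to each batch; every output batch must keep its input length."""
    out: List[int] = []
    for batch in _batches(data):
        result = func(batch)
        if len(result) != len(batch):
            raise ValueError("transform_batch: output length must equal input length")
        out.extend(result)
    return out


def apply_batch(data: Sequence[int],
                func: Callable[[List[int]], List[int]]) -> List[int]:
    """Apply func to each batch; output batches may have any length."""
    out: List[int] = []
    for batch in _batches(data):
        out.extend(func(batch))
    return out


values = [1, 2, 3, 4, 5]
print(transform_batch(values, lambda b: [x + 1 for x in b]))   # [2, 3, 4, 5, 6]
print(apply_batch(values, lambda b: [x for x in b if x > 1]))  # [2, 3, 4, 5]
# The function only sees one batch at a time, so the "max" is per batch:
print(transform_batch(values, lambda b: [max(b)] * len(b)))    # [2, 2, 4, 4, 5]
```

The last call shows why the batch boundary matters: anything the function computes over "all" the data (here, the maximum) is really computed over a single batch.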
We added the following new features:
DataFrame:
truncate (#1408)
hint (#1415)
SeriesGroupBy:
unique (#1426)
Index:
spark_column (#1438)
Series:
MultiIndex:
We also made the following improvements and bug fixes:
Fix from_pandas to handle an index name that is the same as a column name. (#1419)
Add documentation about non-Koalas APIs (#1420)
Hot-fix the missing keyword argument ‘deep’ in DataFrame.copy() (#1423)
Fix Series.div when dividing by zero (#1412)
Support the expand parameter in Series.str.split/rsplit when n is a positive integer. (#1432)
Make Series.astype(bool) follow the concept of “truthy” and “falsey”. (#1431)
Fix floordiv with np.nan to behave consistently with pandas (#1429)
Use mapInPandas for apply_batch API in Spark 3.0 (#1440)
Use F.datediff() for subtraction of dates as a workaround. (#1439)
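#1412 and #1429 concern pandas-compatible arithmetic. In pandas (via NumPy), dividing a nonzero float by zero gives ±inf and 0/0 gives NaN, whereas Spark SQL in its default (non-ANSI) mode returns null for division by zero; floor division by NaN yields NaN. Because plain Python raises ZeroDivisionError for 1.0 / 0.0, the hypothetical helper pandas_style_truediv below sketches the pandas scalar semantics in stdlib Python. It is an illustration of the target behavior, not Koalas code.

```python
import math


def pandas_style_truediv(a: float, b: float) -> float:
    """Hypothetical helper mimicking NumPy/pandas float division semantics."""
    if b == 0:
        # 0/0 (or NaN/0) is undefined: NaN.
        if a == 0 or math.isnan(a):
            return math.nan
        # Nonzero / zero: infinity whose sign combines both operands' signs.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


print(pandas_style_truediv(1, 0))   # inf
print(pandas_style_truediv(-1, 0))  # -inf
print(pandas_style_truediv(0, 0))   # nan
# Floor division by NaN already yields NaN in plain Python floats.
print(7.0 // math.nan)              # nan
```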
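With expand=True, pandas spreads the pieces of each split string across separate columns. When n is a positive integer there are at most n + 1 pieces, so the number of columns is known up front, which is presumably why Koalas supports expand only in that case (a Spark schema is fixed before the data is seen). The hypothetical split_expand helper below sketches that output shape in plain Python, splitting on whitespace as pandas does with its default pat=None and padding short rows with None.

```python
from typing import List, Optional


def split_expand(values: List[str], n: int,
                 reverse: bool = False) -> List[List[Optional[str]]]:
    """Hypothetical helper: split each string at most n times into n + 1 columns.

    reverse=True mimics rsplit; rows with fewer pieces are padded with None.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    rows: List[List[Optional[str]]] = []
    for s in values:
        parts = s.rsplit(None, n) if reverse else s.split(None, n)
        row: List[Optional[str]] = list(parts)
        row.extend([None] * (n + 1 - len(parts)))  # fixed width: n + 1 columns
        rows.append(row)
    return rows


print(split_expand(["a b c", "d e", "f"], n=1))
# [['a', 'b c'], ['d', 'e'], ['f', None]]
print(split_expand(["a b c"], n=1, reverse=True))
# [['a b', 'c']]
```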
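The “truthy”/“falsey” wording in #1431 refers to Python's truthiness rules. The snippet below shows those rules in plain Python, with no Koalas involved: zero and the empty string are falsey, while nonzero numbers, non-empty strings and even NaN are truthy. It illustrates the concept only; how Koalas maps Spark nulls under astype(bool) is not specified in these notes.

```python
# Plain-Python truthiness: the concept Series.astype(bool) now follows.
samples = [0, 1, -3, 0.0, 2.5, float("nan"), "", "koalas"]
as_bool = [bool(v) for v in samples]
print(as_bool)
# [False, True, True, False, True, True, False, True]
```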
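Spark's F.datediff(end, start) returns the number of whole days from start to end as an integer. The stdlib sketch below computes the same quantity with datetime.date; the datediff function here is a hypothetical Python stand-in, and the sketch assumes, without confirmation from these notes, that after #1439 subtracting one Koalas date Series from another yields such integer day counts rather than pandas-style timedeltas.

```python
from datetime import date


def datediff(end: date, start: date) -> int:
    """Hypothetical stand-in: whole days from start to end, like Spark's datediff."""
    return (end - start).days


print(datediff(date(2020, 4, 10), date(2020, 4, 1)))  # 9
print(datediff(date(2020, 3, 1), date(2020, 2, 1)))   # 29 (2020 is a leap year)
print(datediff(date(2020, 4, 1), date(2020, 4, 10)))  # -9
```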