Version 1.3.0

pandas 1.1 support

We verified the behaviors of pandas 1.1 in Koalas. Koalas now supports pandas 1.1 officially (#1688, #1822, #1829).

Support for non-string names

Non-string names are now supported (#1784). Previously, names in Koalas, e.g., df.columns, df.columns.names, and df.index.names, had to be a string or a tuple of strings; they can now be any data type that Spark supports.

Before:

>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Index(['0', '1'], dtype='object')

After:

>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Int64Index([0, 1], dtype='int64')

Improve distributed-sequence default index

Creating a distributed-sequence default index is now faster because it avoids interaction between Python and the JVM (#1699).
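For readers unfamiliar with the term, a distributed-sequence index is a consecutive 0, 1, 2, ... index computed over rows that live in separate partitions. The sketch below illustrates that idea in plain Python only. The helper name and the partition data are hypothetical, it is not the Koalas implementation, and it does not show how #1699 achieves its speedup.

```python
from itertools import accumulate


def attach_sequence_index(partitions):
    """Assign a global, consecutive index to rows spread over partitions.

    `partitions` is a hypothetical list of lists standing in for Spark
    partitions; this helper is illustrative and not part of Koalas.
    """
    sizes = [len(part) for part in partitions]
    # The offset of a partition is the number of rows in all earlier ones.
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


indexed = attach_sequence_index([["a", "b"], [], ["c"], ["d", "e"]])
flat = [pair for part in indexed for pair in part]
```

Each partition needs only its own offset to number its rows, so the numbering can proceed independently per partition once the partition sizes are known.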

Standardize binary operations between int and str columns

Binary operations (+, -, *, /, //, %) between int and str columns now behave consistently with their pandas counterparts (#1828).

The operations are standardized as follows:

  • +: raise TypeError between int column and str column (or string literal)

  • *: act as Spark SQL repeat between an int column (or int literal) and a str column; raise TypeError if a string literal is involved

  • -, /, //, % (modulo): raise TypeError if a str column (or string literal) is involved
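As a point of reference, the pandas behavior for the column-to-column cases can be checked with a small sketch like the one below. It uses plain pandas rather than Koalas (which needs a Spark session), the helper raises_type_error is hypothetical, and the str column is created with dtype=object on the assumption that this is the case the rules above target. The string literal cases are not covered here.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
strs = pd.Series(["a", "b", "c"], dtype=object)


def raises_type_error(op):
    """Return True if calling op() raises TypeError (hypothetical helper)."""
    try:
        op()
    except TypeError:
        return True
    return False


# +, - and % between an int column and a str column are rejected.
add_fails = raises_type_error(lambda: ints + strs)
sub_fails = raises_type_error(lambda: ints - strs)
mod_fails = raises_type_error(lambda: ints % strs)

# * repeats each string by the matching integer, element by element.
repeated = (ints * strs).tolist()
```

The repetition in the last line is the pandas counterpart of what the list above describes as Spark SQL repeat.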

Other new features and improvements

We added the following new features:

DataFrame:

Series:

Index:

MultiIndex:

GroupBy:

Other improvements

  • Fix DataFrame.mad to work properly (#1749)

  • Fix Series name after binary operations (#1753)

  • Fix GroupBy.cum~ to match pandas’ behavior (#1708)

  • Fix cumprod to work properly with integer columns (#1750)

  • Fix DataFrame.join for MultiIndex (#1771)

  • Handle exceptions in from_frame properly (#1791)

  • Fix iloc for slice(None, 0) (#1767)

  • Fix Series.__repr__ when Series.name is None (#1796)

  • DataFrame.reindex supports a Koalas Index parameter (#1741)

  • Fix Series.fillna with inplace=True on non-nullable column (#1809)

  • Input check in various APIs (#1808, #1810, #1811, #1812, #1813, #1814, #1816, #1824)

  • Fix to_list to work properly with pandas==0.23 (#1823)

  • Fix Series.astype to work properly (#1818)

  • Frame.groupby supports dropna (#1815)
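To illustrate the last item, the sketch below shows what the dropna keyword does in pandas (1.1 or later). It uses plain pandas with made-up data, on the assumption that the Koalas keyword mirrors the pandas one; it does not exercise Koalas itself.

```python
import pandas as pd

# Made-up data with one missing group key.
pdf = pd.DataFrame({"key": ["a", None, "a", "b"], "value": [1, 2, 3, 4]})

# Default (dropna=True): rows whose key is missing are left out.
dropped = pdf.groupby("key")["value"].sum()

# dropna=False: the missing key forms its own group.
kept = pdf.groupby("key", dropna=False)["value"].sum()
```

Here dropped has two groups (a and b), while kept has a third group for the missing key.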