We verified the behaviors of pandas 1.1 in Koalas. Koalas now supports pandas 1.1 officially (#1688, #1822, #1829).
We now support non-string names (#1784). Previously, names in Koalas, e.g., df.columns, df.columns.names, and df.index.names, had to be a string or a tuple of strings; they can now be any data type supported by Spark.
Before:
>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Index(['0', '1'], dtype='object')
After:
>>> kdf = ks.DataFrame([[1, 'x'], [2, 'y'], [3, 'z']])
>>> kdf.columns
Int64Index([0, 1], dtype='int64')
Creating a distributed-sequence default index is now faster because it avoids the interaction between Python and the JVM (#1699).
The behavior of binary operations (+, -, *, /, //, %) between int and str columns is now consistent with the corresponding pandas behavior (#1828).
It standardizes binary operations as follows:
+: raise TypeError between int column and str column (or string literal)
*: act as Spark SQL repeat between an int column (or int literal) and a str column; raise TypeError if a string literal is involved
-, /, //, % (modulo): raise TypeError if a str column (or string literal) is involved
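The rules for binary operations can be summarized as a small lookup, sketched here in plain Python. This is only an illustrative model of the rules as stated in this note, not Koalas code: the function name `binary_op_outcome` and the operand labels (`int_col`, `str_col`, `int_lit`, `str_lit`) are hypothetical, and any combination the note does not mention is reported as "not covered".

```python
# Hypothetical model of the rules listed in this note; not part of Koalas.
OPS = {"+", "-", "*", "/", "//", "%"}
KINDS = {"int_col", "str_col", "int_lit", "str_lit"}


def binary_op_outcome(op, left, right):
    """Return 'TypeError', 'repeat', or 'not covered' for an operand pair."""
    if op not in OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    if left not in KINDS or right not in KINDS:
        raise ValueError("operands must be one of: " + ", ".join(sorted(KINDS)))

    kinds = {left, right}
    has_str = bool(kinds & {"str_col", "str_lit"})

    if op in {"-", "/", "//", "%"}:
        # Any str column or string literal makes these operations fail.
        return "TypeError" if has_str else "not covered"

    if op == "*":
        if "str_lit" in kinds:
            return "TypeError"
        if kinds in ({"int_col", "str_col"}, {"int_lit", "str_col"}):
            # Behaves like Spark SQL repeat.
            return "repeat"
        return "not covered"

    # op == "+": the note only describes an int column meeting a str operand.
    if "int_col" in kinds and has_str:
        return "TypeError"
    return "not covered"


print(binary_op_outcome("*", "int_lit", "str_col"))
print(binary_op_outcome("+", "int_col", "str_lit"))
```

Cases reported as "not covered" are simply outside what this note describes; the sketch makes no claim about how Koalas handles them.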
We added the following new features:
DataFrame:
product (#1739)
from_dict (#1778)
pad (#1786)
backfill (#1798)
Series:
reindex (#1737)
explode (#1777)
argmin (#1790)
argmax (#1790)
argsort (#1793)
Index:
inferred_type (#1745)
item (#1744)
is_unique (#1766)
asi8 (#1764)
is_type_compatible (#1765)
view (#1788)
insert (#1804)
MultiIndex:
from_frame (#1762)
GroupBy:
get_group (#1783)
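Since the new Koalas methods are meant to mirror their pandas namesakes, the short sketch below uses plain pandas to show what several of them do. It is a pandas-only illustration and assumes pandas is installed; the sample data and variable names are made up, and running the same calls through Koalas additionally requires a Spark session.

```python
import pandas as pd

# DataFrame.from_dict and DataFrame.product
pdf = pd.DataFrame.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]})
col_products = pdf.product()  # product of each column

# Series.explode turns list-like elements into separate rows
exploded = pd.Series([[1, 2], [3]], name="v").explode()

# Series.argmin / Series.argmax return integer positions
nums = pd.Series([3, 1, 2])
pos_min, pos_max = nums.argmin(), nums.argmax()

# Index.is_unique, Index.inferred_type, Index.insert
idx = pd.Index([10, 20, 30])
inserted = idx.insert(1, 15)

# MultiIndex.from_frame builds a MultiIndex from DataFrame columns
mi = pd.MultiIndex.from_frame(pd.DataFrame({"x": [1, 2], "y": ["a", "b"]}))

# GroupBy.get_group selects the rows belonging to one group
group = pdf.groupby("a").get_group(1)

print(col_products.to_dict(), exploded.tolist(), pos_min, pos_max)
print(idx.is_unique, idx.inferred_type, inserted.tolist(), list(mi.names))
```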
Fix DataFrame.mad to work properly (#1749)
Fix Series name after binary operations. (#1753)
Fix GroupBy.cum~ to match pandas’ behavior (#1708)
Fix cumprod to work properly with Integer columns. (#1750)
Fix DataFrame.join for MultiIndex (#1771)
Handle exceptions properly in from_frame (#1791)
Fix iloc for slice(None, 0) (#1767)
Fix Series.__repr__ when Series.name is None. (#1796)
DataFrame.reindex supports koalas Index parameter (#1741)
Fix Series.fillna with inplace=True on non-nullable column. (#1809)
Add input checks to various APIs (#1808, #1810, #1811, #1812, #1813, #1814, #1816, #1824)
Fix to_list to work properly with pandas==0.23 (#1823)
Fix Series.astype to work properly (#1818)
Frame.groupby supports dropna (#1815)