# Version 0.16.0

Firstly, we introduced a new mode that enables operations on different DataFrames (#633). This mode can be enabled by setting the OPS_ON_DIFF_FRAMES environment variable to true, as below:

>>> import databricks.koalas as ks
>>>
>>> kdf1 = ks.range(5)
>>> kdf2 = ks.DataFrame({'id': [5, 4, 3]})
>>> (kdf1 - kdf2).sort_index()
    id
0 -5.0
1 -3.0
2 -1.0
3  NaN
4  NaN
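The result above follows from aligning the two frames on their index before subtracting. The sketch below reproduces that alignment in plain Python; it is only an illustration of the semantics, not how Koalas implements it, and `subtract_aligned` is a hypothetical helper that models each column as a dict from index label to value.

```python
import math


def subtract_aligned(left, right):
    """Subtract two columns given as {index_label: value} dicts.

    Labels are combined like an outer join; a label missing on either
    side yields NaN, as in the output above.
    """
    result = {}
    for label in sorted(set(left) | set(right)):
        left_value = float(left.get(label, math.nan))
        right_value = float(right.get(label, math.nan))
        result[label] = left_value - right_value
    return result


# Stand-ins for the `id` columns of kdf1 and kdf2 in the example above.
kdf1_id = {i: i for i in range(5)}        # index 0..4, values 0..4
kdf2_id = dict(enumerate([5, 4, 3]))      # index 0..2, values 5, 4, 3

diff = subtract_aligned(kdf1_id, kdf2_id)
for label, value in diff.items():
    print(label, value)
```

Index labels 3 and 4 exist only in the first frame, so their results are NaN.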

>>> import databricks.koalas as ks
>>>
>>> kdf = ks.range(5)
>>> kdf['new_col'] = ks.Series([1, 2, 3, 4])
>>> kdf
   id  new_col
0   0      1.0
1   1      2.0
3   3      4.0
2   2      3.0
4   4      NaN
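Neither snippet shows the variable itself being set. One way to do so from Python is sketched below, assuming Koalas reads OPS_ON_DIFF_FRAMES from the process environment; exporting it in the shell before starting Python should work equally well under the same assumption. The `ops_on_diff_frames_enabled` check is a hypothetical illustration, not a Koalas API.

```python
import os

# Assumption: the variable is read from the process environment, so it is
# set here before any Koalas operation on different DataFrames runs.
os.environ["OPS_ON_DIFF_FRAMES"] = "true"

# Hypothetical sanity check that the mode is switched on for this process.
ops_on_diff_frames_enabled = (
    os.environ.get("OPS_ON_DIFF_FRAMES", "false").lower() == "true"
)
print("OPS_ON_DIFF_FRAMES enabled:", ops_on_diff_frames_enabled)
```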


Secondly, we introduced a default index and disallowed Koalas DataFrames with no index internally (#639)(#655). For example, if you create a Koalas DataFrame from a Spark DataFrame, the default index is used. The default index implementation can be configured by setting DEFAULT_INDEX to one of three types:

• (default) one-by-one: It implements a one-by-one sequence using a Window function without specifying a partition. This index type should be avoided when the data is large, because a window without a partition moves all the data into a single partition.

>>> ks.range(3)
   id
0   0
1   1
2   2

• distributed-one-by-one: It implements a one-by-one sequence using a group-by and group-map approach. It still generates a globally sequential one-by-one index. If the default index must be a one-by-one sequence on a large dataset, use this index type.

>>> ks.range(3)
   id
0   0
1   1
2   2

• distributed: It implements a monotonically increasing sequence simply by using Spark’s monotonically_increasing_id function. If the index does not have to be a one-by-one sequence, use this index type. Performance-wise, it has almost no penalty compared to the other index types.

>>> ks.range(3)
             id
25769803776   0
60129542144   1
94489280512   2
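The large index values in the distributed example are not arbitrary. Spark documents that the current implementation of monotonically_increasing_id puts the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. The sketch below decodes the three values shown above on that basis; `monotonic_id` and `split_id` are hypothetical helpers written for this illustration, not Spark or Koalas functions.

```python
# Bit layout documented for Spark's monotonically_increasing_id:
# upper 31 bits = partition ID, lower 33 bits = record number in the partition.
RECORD_BITS = 33


def monotonic_id(partition_id: int, row_in_partition: int) -> int:
    """Hypothetical helper that composes an ID from the documented layout."""
    return (partition_id << RECORD_BITS) | row_in_partition


def split_id(value: int) -> tuple:
    """Hypothetical helper that recovers (partition_id, row_in_partition)."""
    return value >> RECORD_BITS, value & ((1 << RECORD_BITS) - 1)


# The three index values from the `distributed` example above.
for index_value in (25769803776, 60129542144, 94489280512):
    partition_id, row = split_id(index_value)
    print(index_value, "-> partition", partition_id, "row", row)
```

Each value decodes to the first row (row 0) of a different partition, which is why the index is increasing but not consecutive.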


Thirdly, we implemented many plot APIs in Series. See the example below:

import databricks.koalas as ks

ks.range(10).to_pandas().id.plot.pie()



Fourthly, we continued to rapidly improve support for multi-index columns. Multi-index columns are now supported in the following APIs:

• DataFrame.sort_index()(#637)

• GroupBy.diff()(#653)

• GroupBy.rank()(#653)

• Series.any()(#652)

• Series.all()(#652)

• DataFrame.any()(#652)

• DataFrame.all()(#652)

• DataFrame.assign()(#657)

• DataFrame.drop()(#658)

• DataFrame.reindex()(#659)

• Series.quantile()(#663)

• Series.transform()(#663)

• DataFrame.select_dtypes()(#662)

• DataFrame.transpose()(#664)

Lastly, we added new functionality in the past weeks, especially for groupby. We added the following features:

koalas.DataFrame

koalas.groupby.GroupBy:

Along with the following improvements:

• Add a basic infrastructure for configurations. (#645)

• Always use column_index. (#648)

• Allow omitting the type hint in GroupBy.transform, filter, and apply (#646)