Version 1.5.0

Index operations support

We improved Index operations support (#1944, #1955).

Here are some examples:

  • Before

    >>> kidx = ks.Index([1, 2, 3, 4, 5])
    >>> kidx + kidx
    Int64Index([2, 4, 6, 8, 10], dtype='int64')
    >>> kidx + kidx + kidx
    Traceback (most recent call last):
    ...
    AssertionError: args should be single DataFrame or single/multiple Series
    
    >>> ks.Index([1, 2, 3, 4, 5]) + ks.Index([6, 7, 8, 9, 10])
    Traceback (most recent call last):
    ...
    AssertionError: args should be single DataFrame or single/multiple Series
    
  • After

    >>> kidx = ks.Index([1, 2, 3, 4, 5])
    >>> kidx + kidx + kidx
    Int64Index([3, 6, 9, 12, 15], dtype='int64')
    
    >>> ks.options.compute.ops_on_diff_frames = True
    >>> ks.Index([1, 2, 3, 4, 5]) + ks.Index([6, 7, 8, 9, 10])
    Int64Index([7, 9, 13, 11, 15], dtype='int64')
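
The result above is not in positional order (13 appears before 11). This is presumably because operations on different frames are evaluated through a join, which does not guarantee row order. For comparison, here is a minimal sketch of the same addition in plain pandas, used only as a stand-in for the reference semantics. The variable names are illustrative, and with Koalas you may need to sort the result before comparing it.

```python
import pandas as pd

left = pd.Index([1, 2, 3, 4, 5])
right = pd.Index([6, 7, 8, 9, 10])

# pandas adds two Index objects element by element, in positional order.
total = left + right

# The Koalas output shown above holds the same values in a different order,
# so compare after sorting rather than position by position.
koalas_values = [7, 9, 13, 11, 15]  # copied from the example above
assert sorted(koalas_values) == total.tolist()
```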
    

Other new features and improvements

We added the following new features:

DataFrame:

Series:

Index:

MultiIndex:

GroupBy:

  • tail (#1949)

  • median (#1957)
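
As a rough illustration of what the two new GroupBy methods compute, the sketch below uses plain pandas, whose API Koalas mirrors. The frame and column names are made up for the example. Koalas may compute the median approximately on large data, so exact equality with pandas is an assumption that should only be relied on for small inputs like this one.

```python
import pandas as pd

pdf = pd.DataFrame(
    {"key": ["a", "a", "a", "b", "b"], "value": [1, 2, 3, 10, 20]}
)

# tail(n) keeps the last n rows of each group, in their original row order.
last_two = pdf.groupby("key").tail(2)

# median() returns one value per group.
medians = pdf.groupby("key")["value"].median()

print(last_two)
print(medians)
```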

Other improvements and bug fixes

  • Support DataFrame parameter in Series.dot (#1931)

  • Add a best practice for checkpointing. (#1930)

  • Remove implicit switch-ons of “compute.ops_on_diff_frames” (#1953)

  • Fix Series._to_internal_pandas and introduce Index._to_internal_pandas. (#1952)

  • Fix first/last_valid_index to support empty column DataFrame. (#1923)

  • Use pandas’ transpose when the data is expected to be small. (#1932)

  • Fix tail to use the resolved copy (#1942)

  • Avoid unneeded reset_index in DataFrameGroupBy.describe. (#1951)

  • Raise TypeError when Index.name / Series.name is not a hashable type (#1883)

  • Adjust data column names before attaching default index. (#1947)

  • Add plotly into the optional dependency in Koalas (#1939)

  • Add plotly backend test cases (#1938)

  • Don’t pass stacked in plotly area chart (#1934)

  • Set upperbound of matplotlib to avoid failure on Ubuntu (#1959)

  • Fix GroupBy.describe for multi-index columns. (#1922)

  • Upgrade pandas version in CI (#1961)

  • Compare Series from the same anchor (#1956)

  • Add videos from Data+AI Summit 2020 EUROPE. (#1963)

  • Set PYARROW_IGNORE_TIMEZONE for binder. (#1965)
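
To make the Series.dot item (#1931) concrete, here is a minimal sketch of a Series-by-DataFrame dot product in plain pandas, which is the behavior Koalas aims to follow. The data and names are illustrative only. In Koalas, the Series and the DataFrame would typically come from different frames, so the "compute.ops_on_diff_frames" option may need to be enabled explicitly.

```python
import pandas as pd

weights = pd.Series([1, 2, 3])
features = pd.DataFrame({"x": [1, 0, 1], "y": [2, 2, 2]})

# The Series index is aligned with the DataFrame index, and the result has
# one entry per DataFrame column: the sum of element-wise products.
result = weights.dot(features)

print(result)
```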