Version 0.17.0

Options

We started using options to configure Koalas' behavior. The following options are now available:

  • display.max_rows (#714, #742)

  • compute.max_rows (#721, #736)

  • compute.shortcut_limit (#717)

  • compute.ops_on_diff_frames (#725)

  • compute.default_index_type (#723)

  • plotting.max_rows (#728)

  • plotting.sample_ratio (#737)

The full list of options and their descriptions is also available in the User Guide in our project docs.
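To make the idea of named, validated options concrete, here is a minimal, self-contained sketch of an option registry with get, set, and reset operations. It is not Koalas code: the `Option` class, the `get_option`, `set_option`, and `reset_option` helpers, and the default values are all hypothetical and chosen only for illustration. Only the option keys come from the list above.

```python
from typing import Any, Callable, Dict, Tuple


class Option:
    """One setting: a key, a default, the allowed types, and a value check (hypothetical)."""

    def __init__(self, key: str, default: Any, types: Tuple[type, ...],
                 check: Callable[[Any], bool] = lambda value: True) -> None:
        self.key = key
        self.default = default
        self.types = types
        self.check = check

    def validate(self, value: Any) -> None:
        # bool is a subclass of int, so reject it unless bool is explicitly allowed.
        if isinstance(value, bool) and bool not in self.types:
            raise TypeError(f"{self.key}: bool is not an accepted type")
        if not isinstance(value, self.types):
            raise TypeError(f"{self.key}: got {type(value).__name__}")
        if not self.check(value):
            raise ValueError(f"{self.key}: value {value!r} is out of range")


# Keys are taken from the release notes; the defaults are placeholders (assumptions).
_REGISTRY: Dict[str, Option] = {
    opt.key: opt
    for opt in [
        Option("display.max_rows", 1000, (int,), lambda v: v >= 0),
        Option("compute.ops_on_diff_frames", False, (bool,)),
        Option("plotting.sample_ratio", None, (type(None), float),
               lambda v: v is None or 0.0 <= v <= 1.0),
    ]
}
_overrides: Dict[str, Any] = {}


def get_option(key: str) -> Any:
    """Return the current value, falling back to the registered default."""
    return _overrides.get(key, _REGISTRY[key].default)


def set_option(key: str, value: Any) -> None:
    """Validate the value against the registered option, then store it."""
    _REGISTRY[key].validate(value)
    _overrides[key] = value


def reset_option(key: str) -> None:
    """Drop any override so the default applies again."""
    _REGISTRY[key]  # raises KeyError for unknown options
    _overrides.pop(key, None)
```

In this sketch, `set_option("display.max_rows", "10")` raises `TypeError`, and `set_option("plotting.sample_ratio", 1.5)` raises `ValueError`, so an invalid setting is rejected when it is set rather than when it is later used.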

Plots

We continue adding plot APIs as follows:

For Series:

  • plot.area() (#704)

For DataFrame:

Multi-index columns support

We also continued to improve support for multi-index columns. The following APIs now support them:

  • koalas.concat() (#680)

  • koalas.get_dummies() (#695)

  • DataFrame.pivot_table() (#635)

Other new features and improvements

We added the following new features:

koalas:

  • read_sql_table() (#741)

  • read_sql_query() (#741)

  • read_sql() (#741)

koalas.DataFrame:

Along with the following improvements:

  • GroupBy.apply now returns a Koalas DataFrame instead of a pandas DataFrame (#731)

  • Fix rpow and rfloordiv to use proper operators in Series (#735)

  • Fix rpow and rfloordiv to use proper operators in DataFrame (#740)

  • Add schema inference support to DataFrame.transform (#732)

  • Add Option class to support type check and value check in options (#739)

  • Add missing tests (#687, #692, #694, #709, #711, #730, #729, #733, #734)
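The rpow and rfloordiv fixes listed above concern reflected operators, which Python calls when the object is the right-hand operand. The following pure-Python sketch shows the expected semantics. The `Boxed` class is hypothetical and not part of Koalas; it exists only to show that the operand order must be swapped in the reflected methods.

```python
class Boxed:
    """A hypothetical numeric wrapper used to illustrate reflected operators."""

    def __init__(self, value):
        self.value = value

    def __pow__(self, other):
        # Boxed ** other
        return Boxed(self.value ** other)

    def __rpow__(self, other):
        # other ** Boxed: the plain operand is the base, not the exponent.
        return Boxed(other ** self.value)

    def __floordiv__(self, other):
        # Boxed // other
        return Boxed(self.value // other)

    def __rfloordiv__(self, other):
        # other // Boxed: the plain operand is the dividend, not the divisor.
        return Boxed(other // self.value)


forward_pow = (Boxed(3) ** 2).value        # 3 ** 2
reflected_pow = (2 ** Boxed(3)).value      # 2 ** 3
forward_div = (Boxed(2) // 7).value        # 2 // 7
reflected_div = (7 // Boxed(2)).value      # 7 // 2
```

Neither operation is commutative, so a reflected method that reuses the forward operator without swapping the operands gives a different result.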

Backward compatibility

  • We renamed two of the default index names: one-by-one is now sequence, and distributed-one-by-one is now distributed-sequence. (#679)

  • We moved the configuration for enabling operations on different DataFrames from an environment variable to the compute.ops_on_diff_frames option. (#725)

  • We moved the configuration for the default index from an environment variable to the compute.default_index_type option. (#723)
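When migrating existing settings, the renamed index names can be translated with a small lookup. This is only a sketch: `RENAMED_INDEX_TYPES` and `normalize_index_type` are hypothetical helper names, not part of Koalas. The old and new names are the ones given in the rename note above.

```python
# Old default index names mapped to their 0.17.0 replacements.
RENAMED_INDEX_TYPES = {
    "one-by-one": "sequence",
    "distributed-one-by-one": "distributed-sequence",
}


def normalize_index_type(name: str) -> str:
    """Return the 0.17.0 name for an index type, leaving other names unchanged."""
    return RENAMED_INDEX_TYPES.get(name, name)
```

A value read from an older configuration can be passed through `normalize_index_type` before it is used as the value of compute.default_index_type.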