Koalas: pandas APIs on Apache Spark

The Koalas project makes data scientists more productive when working with big data by augmenting Apache Spark’s Python DataFrame API so that it is compatible with the pandas API.

pandas is the de facto standard (single-node) DataFrame implementation in Python, while Spark is the de facto standard for big data processing. With this package, data scientists can:

  • Be immediately productive with Spark, with no learning curve, if they are already familiar with pandas.
  • Maintain a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets).
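
The single-codebase idea can be illustrated with a minimal sketch. The function name `top_spenders`, the column names, and the sample data are hypothetical and chosen only for illustration. The sketch assumes the function restricts itself to DataFrame methods that pandas provides and that Koalas mirrors (`groupby`, `sum`, `reset_index`, boolean indexing, `sort_values`). The Koalas part is guarded, since Koalas and a Spark runtime may not be installed.

```python
import pandas as pd


def top_spenders(df, min_total):
    """Return customers whose summed amount is at least min_total.

    Only pandas-style DataFrame methods are used, so the same function
    is intended to accept a pandas or a Koalas DataFrame.
    """
    totals = df.groupby("customer")["amount"].sum().reset_index()
    return totals[totals["amount"] >= min_total].sort_values("customer")


# Small, local data (tests, smaller datasets): plain pandas.
pdf = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat"],
    "amount": [10.0, 5.0, 7.5, 20.0],
})
local_result = top_spenders(pdf, min_total=15.0)

# Distributed data: the same function applied to a Koalas DataFrame.
# Guarded because Koalas and Spark are optional in this sketch.
try:
    import databricks.koalas as ks

    kdf = ks.from_pandas(pdf)
    spark_result = top_spenders(kdf, min_total=15.0).to_pandas()
except ImportError:
    spark_result = None  # Koalas is not available in this environment.
```

Here `local_result` is computed on a single node, while `spark_result`, when Koalas is available, is computed by Spark and converted back to pandas only for inspection. In a real pipeline the conversion with `to_pandas()` would be reserved for results small enough to fit in memory.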