Koalas: pandas API on Apache Spark

The Koalas project makes data scientists more productive when working with big data by implementing the pandas DataFrame API on top of Apache Spark.

pandas is the de facto standard (single-node) DataFrame implementation in Python, while Spark is the de facto standard for big data processing. With this package, you can:

  • Be immediately productive with Spark, with no learning curve, if you are already familiar with pandas.

  • Have a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets).
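
As a rough illustration of the single-codebase idea, the sketch below defines a helper that uses only pandas-style calls and runs it on a plain pandas DataFrame. The function name `mean_by_group` and the sample data are made up for this example. The commented lines at the end assume Koalas is installed as `databricks.koalas` and that a Spark session is available; they are not executed here.

```python
import pandas as pd


def mean_by_group(df, key, value):
    """Return the mean of `value` per `key` using pandas-style calls only.

    Hypothetical helper for illustration: it avoids anything specific to
    pandas internals so that a DataFrame exposing the same API can be
    passed in instead.
    """
    return df.groupby(key)[value].mean()


# Small in-memory dataset, handled by pandas on a single node.
pdf = pd.DataFrame({"team": ["a", "a", "b"], "score": [1.0, 3.0, 5.0]})
result = mean_by_group(pdf, "team", "score")
print(result)

# Assumption: with Koalas installed and Spark configured, the same helper
# should accept a Koalas DataFrame created from the pandas one:
#   import databricks.koalas as ks
#   kdf = ks.from_pandas(pdf)
#   mean_by_group(kdf, "team", "score")
```

Keeping the helper free of pandas-only features is what lets tests run against small pandas DataFrames while the same logic is applied to distributed data.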
