Koalas: pandas API on Apache Spark

Koalas makes data scientists more productive when interacting with big data, by augmenting the Apache Spark Python DataFrame API to be compatible with the pandas DataFrame API.

pandas is the de facto standard (single-node) DataFrame implementation in Python, while Spark is the de facto standard for big data processing. With the Koalas package, you can:

  • Be immediately productive with Spark, with no learning curve, if you are already familiar with pandas.
  • Have a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets).
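
As a rough illustration of the single-codebase idea above, the sketch below writes one function using only pandas-style calls and runs it on a small pandas DataFrame. The function name, column names, and values are hypothetical, chosen only for this example. The commented lines show how the same function might be handed a Koalas DataFrame, assuming the `databricks.koalas` package and a working Spark environment are available; they are not executed here.

```python
import pandas as pd


def mean_price_by_category(df):
    """Average `price` per `category`, using only pandas-style calls.

    `df` is a pandas DataFrame here; by assumption, a Koalas DataFrame
    exposing the same groupby/mean/sort_index methods would also work.
    """
    return df.groupby("category")["price"].mean().sort_index()


# Small in-memory dataset (hypothetical column names and values).
pdf = pd.DataFrame(
    {"category": ["a", "b", "a", "b"], "price": [1.0, 2.0, 3.0, 6.0]}
)
result = mean_price_by_category(pdf)
print(result)

# With Koalas installed and Spark available, the same function is
# expected to accept a distributed DataFrame instead (not run here):
#   import databricks.koalas as ks
#   kdf = ks.from_pandas(pdf)
#   mean_price_by_category(kdf)
```

Keeping the logic in a function that only touches the shared DataFrame API is one way to test against pandas locally and switch to Spark for larger data; operations that Koalas does not cover would still need separate handling.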