User Guide
Options and settings
Getting and setting options
Operations on different DataFrames
Default Index type
Available options
Working with pandas and PySpark
pandas
PySpark
Transform and apply a function
transform and apply
koalas.transform_batch and koalas.apply_batch
Type Support In Koalas
Type casting between PySpark and Koalas
Type casting between pandas and Koalas
Internal type mapping
Type Hints In Koalas
Koalas DataFrame and Pandas DataFrame
Type Hinting with Names
Best Practices
Leverage PySpark APIs
Check execution plans
Use checkpoint
Avoid shuffling
Avoid computation on single partition
Avoid reserved column names
Do not use duplicated column names
Specify the index column in conversion from Spark DataFrame to Koalas DataFrame
Use distributed or distributed-sequence default index
Reduce the operations on different DataFrame/Series
Use Koalas APIs directly whenever possible
FAQ
What’s the project’s status?
Is it Koalas or koalas?
Should I use PySpark’s DataFrame API or Koalas?
Does Koalas support Structured Streaming?
How can I request support for a method?
How is Koalas different from Dask?
How can I contribute to Koalas?
Why a new project (instead of putting this in Apache Spark itself)?
Koalas Talks and Blogs