User Guide
- Options and settings
- Working with pandas and PySpark
- Transform and apply a function
- Type Support In Koalas
- Type Hints In Koalas
- From/to other DBMSes
- Best Practices
- Leverage PySpark APIs
- Check execution plans
- Use checkpoint
- Avoid shuffling
- Avoid computation on single partition
- Avoid reserved column names
- Do not use duplicated column names
- Specify the index column in conversion from Spark DataFrame to Koalas DataFrame
- Use distributed or distributed-sequence default index
- Reduce the operations on different DataFrame/Series
- Use Koalas APIs directly whenever possible
- FAQ
- What’s the project’s status?
- Is it Koalas or koalas?
- Should I use PySpark’s DataFrame API or Koalas?
- Does Koalas support Structured Streaming?
- How can I request support for a method?
- How is Koalas different from Dask?
- How can I contribute to Koalas?
- Why a new project (instead of putting this in Apache Spark itself)?