Installation¶
Koalas requires PySpark, so make sure PySpark is available before you begin.
To install Koalas, you can use:
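Koalas is published on PyPI, so one common route (assuming pip is available in your environment) is:

```shell
# Install the latest Koalas release from PyPI
pip install koalas
```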
To install PySpark, you can use:
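PySpark is likewise available on PyPI; assuming pip, for example:

```shell
# Install PySpark from PyPI (Py4J is pulled in as a dependency)
pip install pyspark
```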
Python version support¶
Officially Python 3.5 to 3.8.
Note
Koalas support for Python 3.5 is deprecated and will be dropped in a future release. At that point, existing Python 3.5 workflows that use Koalas will continue to work without modification, but Python 3.5 users will no longer get access to the latest Koalas features and bugfixes. We recommend that you upgrade to Python 3.6 or newer.
Installing Koalas¶
Installing with Conda¶
First, you will need to install Conda. After that, we should create a new conda environment. A conda environment is similar to a virtualenv in that it allows you to specify a specific version of Python and a set of libraries. Run the following commands from a terminal window:
conda create --name koalas-dev-env
This will create a minimal environment with only Python installed in it. To put yourself inside this environment, run:
conda activate koalas-dev-env
The final step required is to install Koalas. This can be done with the following command:
conda install -c conda-forge koalas
To install a specific Koalas version:
conda install -c conda-forge koalas=1.3.0
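After installation, a quick sanity check (a sketch; it assumes the conda environment is still activated) is to import Koalas and print its version:

```shell
# Confirm Koalas is importable and show which version was installed
python -c "import databricks.koalas as ks; print(ks.__version__)"
```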
Installing from source¶
See the Contribution Guide for complete instructions.
Installing PySpark¶
Installing with the official release channel¶
You can install PySpark by downloading a release from the official release channel. Once you download the release, un-tar it as below:
tar xzvf spark-2.4.4-bin-hadoop2.7.tgz
After that, make sure to set the SPARK_HOME environment variable to point to the directory you untarred:
cd spark-2.4.4-bin-hadoop2.7
export SPARK_HOME=`pwd`
Also, make sure your PYTHONPATH can find the PySpark and Py4J libraries under $SPARK_HOME/python/lib:
export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
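The one-liner above globs every .zip under $SPARK_HOME/python/lib and joins the paths with : before prepending them to PYTHONPATH. A self-contained illustration (using a throwaway directory and hypothetical zip names, so it can run anywhere) is:

```shell
# Demonstrate the zip-joining trick against a throwaway SPARK_HOME
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/python/lib"
# Hypothetical file names standing in for the real Py4J/PySpark archives
touch "$SPARK_HOME/python/lib/py4j-src.zip" "$SPARK_HOME/python/lib/pyspark.zip"
# Collect the zips into an array, then join them with ':' via IFS
PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
echo "$PYTHONPATH"
```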
Installing from source¶
To install PySpark from source, refer to Building Spark.
Likewise, make sure you set the SPARK_HOME environment variable to the git-cloned directory, and that your PYTHONPATH can find the PySpark and Py4J libraries under $SPARK_HOME/python/lib:
export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
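As a final check (a sketch that assumes both exports above succeeded), PySpark should now be importable:

```shell
# Verify that PySpark resolves from the source checkout
python -c "import pyspark; print(pyspark.__version__)"
```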