Apply function func group-wise and combine the results together.

The function passed to apply must take a DataFrame as its first argument and return a DataFrame. apply then combines the per-group results into a single DataFrame, which makes it a highly flexible grouping method.

While apply is a very flexible method, its downside is that it can be considerably slower than more specific methods like agg or transform. Koalas offers a wide range of methods that are much faster than apply for their specific purposes, so try them before reaching for apply.
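As a rough illustration of this trade-off, the sketch below uses plain pandas, whose groupby API Koalas mirrors. The frame `pdf` and the variable names are made up for this example, and the assumption is that the same preference for agg and transform carries over to Koalas.

```python
import pandas as pd

# Small example frame (same values as the examples in this page).
pdf = pd.DataFrame({'A': ['a', 'a', 'b'],
                    'B': [1, 2, 3],
                    'C': [4, 6, 5]})
g = pdf.groupby('A')[['B', 'C']]

# General-purpose route: apply calls the Python function once per group.
via_apply = g.apply(lambda x: x.sum())

# Specialised route: agg computes the same per-group sums directly.
via_agg = g.agg('sum')

# transform returns one row per input row, so each value's share of its
# group total can be computed without apply.
shares = pdf[['B', 'C']] / g.transform('sum')
```

Both routes give the same per-group sums; the specialised methods simply avoid the per-group Python call that apply requires.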


This API executes the function once to infer the return type, which can be expensive, for instance when the dataset is created after aggregations or sorting.

To avoid this, specify the return type in func, for instance, as below:

>>> def pandas_div_sum(x) -> ks.DataFrame[float, float]:
...    return x[['B', 'C']] / x[['B', 'C']].sum()

If the return type is specified, the output column names become c0, c1, c2 … cn. These names are positionally mapped to the returned DataFrame in func. See examples below.
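The positional naming can be pictured with plain pandas. This is only an illustration of the naming scheme, not how Koalas implements it; `positional_names` is a hypothetical helper written for this sketch.

```python
import pandas as pd

# A frame such as func might return, with its original column names.
returned = pd.DataFrame({'B': [0.25, 0.75], 'C': [0.4, 0.6]})

def positional_names(frame):
    """Hypothetical helper: rename columns to c0, c1, ... by position."""
    renamed = frame.copy()
    renamed.columns = ['c%d' % i for i in range(len(frame.columns))]
    return renamed

named = positional_names(returned)

# The original names can be put back by position afterwards.
restored = named.rename(columns=dict(zip(named.columns, ['B', 'C'])))
```

Here c0 corresponds to the first returned column (B) and c1 to the second (C).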


The DataFrame within func is actually a pandas DataFrame. Therefore, any pandas API can be used within this function.
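Because func receives a pandas DataFrame, one way to check it before handing it to apply is to call it on a single group in plain pandas. The following is a minimal sketch under that assumption; `pdf` and `group_a` are names invented for this example, and the type hint is left off so that the snippet needs only pandas.

```python
import pandas as pd

def pandas_div_sum(x):
    # x is a plain pandas DataFrame holding the rows of one group.
    return x[['B', 'C']] / x[['B', 'C']].sum()

pdf = pd.DataFrame({'A': ['a', 'a', 'b'],
                    'B': [1, 2, 3],
                    'C': [4, 6, 5]})

# Pick out the rows of group 'a' by hand and run the function on them.
group_a = pdf[pdf['A'] == 'a']
out = pandas_div_sum(group_a)
```

Each column of `out` sums to 1 because every value is divided by its group total.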


A callable that takes a DataFrame as its first argument, and returns a dataframe.


See also


Apply aggregate function to the GroupBy object.


Apply a function to a Series.


>>> df = ks.DataFrame({'A': 'a a b'.split(),
...                    'B': [1, 2, 3],
...                    'C': [4, 6, 5]}, columns=['A', 'B', 'C'])
>>> g = df.groupby('A')

Notice that g has two groups, a and b. Calling apply in various ways, we can get different grouping results:

Below, the function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group into a new DataFrame:

>>> def pandas_div_sum(x) -> ks.DataFrame[float, float]:
...    return x[['B', 'C']] / x[['B', 'C']].sum()
>>> g.apply(pandas_div_sum)  
         c0   c1
0  1.000000  1.0
1  0.333333  0.4
2  0.666667  0.6
>>> def plus_max(x) -> ks.DataFrame[str, int, int]:
...    return x + x.max()
>>> g.apply(plus_max)  
   c0  c1  c2
0  bb   6  10
1  aa   3  10
2  aa   4  12

You can omit the type hint and let Koalas infer its type.

>>> def plus_min(x):
...    return x + x.min()
>>> g.apply(plus_min).sort_index()  
    A  B   C
0  aa  2   8
1  aa  3  10
2  bb  6  10

In case of Series, it works as below.

>>> def plus_max(x) -> ks.Series[np.int32]:
...    return x + x.max()
>>> df.B.groupby(df.A).apply(plus_max)
0    6
1    3
2    4
Name: B, dtype: int32
>>> def plus_min(x):
...    return x + x.min()
>>> df.B.groupby(df.A).apply(plus_min)
0    2
1    3
2    6
Name: B, dtype: int64