GroupBy.transform(func, *args, **kwargs)
Apply function column-by-column to the GroupBy object.
The function passed to transform must take a Series as its first argument and return a Series. The given function is executed for each series (column) in each group.
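The column-by-column, group-by-group semantics can be sketched in plain Python. This is a minimal illustration, assuming a toy table stored as a dict of column lists and a hypothetical helper `group_transform`. It is not how Koalas implements transform (Koalas runs the function on pandas Series inside Spark). It only shows what the function is called on and how the results are put back in row order.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def group_transform(
    table: Dict[str, List[Any]],
    key: str,
    func: Callable[..., List[Any]],
    *args: Any,
    **kwargs: Any,
) -> Dict[str, List[Any]]:
    """Hypothetical helper: apply func to each non-key column of each group.

    A plain list stands in for a Series. func gets one group's values for one
    column plus any extra arguments and must return a list of the same length;
    the results are written back to the original row positions.
    """
    # Map each group key to the row positions that belong to that group.
    positions: Dict[Any, List[int]] = defaultdict(list)
    for row, k in enumerate(table[key]):
        positions[k].append(row)

    n_rows = len(table[key])
    result: Dict[str, List[Any]] = {}
    for column, values in table.items():
        if column == key:
            continue  # the grouping column itself is not passed to func
        out: List[Any] = [None] * n_rows
        for rows in positions.values():
            group_values = [values[r] for r in rows]
            transformed = func(group_values, *args, **kwargs)
            if len(transformed) != len(group_values):
                raise ValueError("func must return one value per input row")
            for r, v in zip(rows, transformed):
                out[r] = v
        result[column] = out
    return result


def plus_max(x: List[int]) -> List[int]:
    # List version of the plus_max example: add the group's maximum to each value.
    m = max(x)
    return [v + m for v in x]


table = {"A": [0, 0, 1], "B": [1, 2, 3], "C": [4, 6, 5]}
print(group_transform(table, "A", plus_max))
# {'B': [3, 4, 6], 'C': [10, 12, 10]}
```

Extra positional and keyword arguments are forwarded unchanged to func, for example `group_transform(table, "A", f, 5, z=20)`, which mirrors the role of `*args` and `**kwargs` in transform.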
While transform is a very flexible method, its downside is that it can be quite a bit slower than more specific methods such as agg. Koalas offers a wide range of methods that will be much faster than transform for their specific purposes, so try to use them before reaching for transform.
Note
This API executes the function once to infer the return type, which is potentially expensive, for instance, when the dataset is created after aggregations or sorting.
To avoid this, specify the return type as a type hint on func, for instance:
>>> def convert_to_string(x) -> ks.Series[str]:
...     return x.apply("a string {}".format)
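The trade-off in this note can be sketched in plain Python: a declared return annotation can be read without running the function, whereas without one the function has to run at least once. The helper `infer_return_type`, the `calls` log, and the use of plain lists instead of Series are hypothetical and only mimic the idea; Koalas's actual inference logic is not shown here.

```python
import inspect
from typing import Any, Callable, List


def infer_return_type(func: Callable[..., Any], sample: List[Any]) -> Any:
    """Hypothetical helper: prefer the declared return annotation; otherwise
    run func once on a sample and look at what it produced."""
    annotation = inspect.signature(func).return_annotation
    if annotation is not inspect.Signature.empty:
        return annotation  # type hint present: no execution needed
    # No type hint: execute func once, which is the potentially expensive step.
    produced = func(sample)
    return type(produced[0])


calls: List[str] = []  # records which functions were actually executed


def with_hint(x) -> List[str]:
    calls.append("with_hint")
    return ["a string {}".format(v) for v in x]


def without_hint(x):
    calls.append("without_hint")
    return ["a string {}".format(v) for v in x]


print(infer_return_type(with_hint, [1, 2]))     # typing.List[str]
print(infer_return_type(without_hint, [1, 2]))  # <class 'str'>
print(calls)                                    # ['without_hint']
```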
Note
The series within func is actually a pandas Series. Therefore, any pandas API can be used within this function.
Parameters
func : callable
    A callable that takes a Series as its first argument and returns a Series.
*args
    Positional arguments to pass to func.
**kwargs
    Keyword arguments to pass to func.
See also
aggregate
Apply aggregate function to the GroupBy object.
Series.apply
Apply a function to a Series.
Examples
>>> import numpy as np
>>> import databricks.koalas as ks
>>> df = ks.DataFrame({'A': [0, 0, 1],
...                    'B': [1, 2, 3],
...                    'C': [4, 6, 5]}, columns=['A', 'B', 'C'])
>>> g = df.groupby('A')
Notice that g has two groups, 0 and 1. Calling transform in various ways, we can get different grouping results. Below, the functions passed to transform take a Series as their argument and return a Series. transform applies the function to each series in each group and combines the results into a new DataFrame:
>>> def convert_to_string(x) -> ks.Series[str]:
...     return x.apply("a string {}".format)
>>> g.transform(convert_to_string)
            B           C
0  a string 1  a string 4
1  a string 2  a string 6
2  a string 3  a string 5
>>> def plus_max(x) -> ks.Series[np.int64]:
...     return x + x.max()
>>> g.transform(plus_max)
   B   C
0  3  10
1  4  12
2  6  10
You can omit the type hint and let Koalas infer its type.
>>> def plus_min(x):
...     return x + x.min()
>>> g.transform(plus_min)
   B   C
0  2   8
1  3  10
2  6  10
For a Series, it works as follows:
>>> df.B.groupby(df.A).transform(plus_max)
0    3
1    4
2    6
Name: B, dtype: int64
>>> (df * -1).B.groupby(df.A).transform(abs)
0    1
1    2
2    3
Name: B, dtype: int64
You can also specify extra arguments to pass to the function.
>>> def calculation(x, y, z) -> ks.Series[np.int64]:
...     return x + x.min() + y + z
>>> g.transform(calculation, 5, z=20)
    B   C
0  27  33
1  28  35
2  31  35
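As a check on these numbers for column B: the positional argument 5 is bound to y and the keyword argument z=20 to z, so every row receives y + z = 25 on top of its value plus its group minimum. Group A = 0 holds B = [1, 2] with minimum 1, and group A = 1 holds B = [3] with minimum 3:

```latex
\begin{aligned}
\text{row } 0\ (A = 0):&\quad 1 + 1 + 5 + 20 = 27 \\
\text{row } 1\ (A = 0):&\quad 2 + 1 + 5 + 20 = 28 \\
\text{row } 2\ (A = 1):&\quad 3 + 3 + 5 + 20 = 31
\end{aligned}
```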