DataFrameGroupBy.aggregate
Aggregate using one or more operations over the specified axis.
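Parameters
A dict mapping column names (strings) to aggregate functions (a string or a list of strings). As the examples below show, a single function name, a list of function names, or named-aggregation keyword arguments are also accepted.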
Returns
Series or DataFrame
The return can be:
Series: when DataFrame.agg is called with a single function.
DataFrame: when DataFrame.agg is called with several functions.
See also
databricks.koalas.Series.groupby
databricks.koalas.DataFrame.groupby
Notes
agg is an alias for aggregate. Use the alias.
Examples
>>> df = ks.DataFrame({'A': [1, 1, 2, 2],
...                    'B': [1, 2, 3, 4],
...                    'C': [0.362, 0.227, 1.267, -0.562]},
...                   columns=['A', 'B', 'C'])
>>> df
   A  B      C
0  1  1  0.362
1  1  2  0.227
2  2  3  1.267
3  2  4 -0.562
Different aggregations per column
>>> aggregated = df.groupby('A').agg({'B': 'min', 'C': 'sum'})
>>> aggregated[['B', 'C']].sort_index()
   B      C
A
1  1  0.589
2  3  0.705
>>> aggregated = df.groupby('A').agg({'B': ['min', 'max']})
>>> aggregated.sort_index()
     B
   min max
A
1    1   2
2    3   4
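To make the dict form concrete, the following plain-Python sketch computes the same per-group values as the two dict examples above, without Spark or Koalas. The helper group_agg and the AGG_FUNCS table are hypothetical illustrations, not Koalas APIs, and only the function names min, max, and sum are assumed. A single function name keeps the plain column name, while a list of names yields one (column, function) entry per name, mirroring the two-level column header shown above.

```python
from collections import defaultdict

# Hypothetical table mapping aggregate-function names to Python callables.
AGG_FUNCS = {"min": min, "max": max, "sum": sum}


def group_agg(data, by, spec):
    """Hypothetical helper: group column-oriented `data` by column `by`
    and apply `spec`, a dict of column name -> function name(s)."""
    # Collect the values of every non-key column for each group key.
    groups = defaultdict(lambda: defaultdict(list))
    for i, key in enumerate(data[by]):
        for col, values in data.items():
            if col != by:
                groups[key][col].append(values[i])

    result = {}
    for key in sorted(groups):  # sorted keys, like .sort_index()
        row = {}
        for col, funcs in spec.items():
            if isinstance(funcs, str):
                # A single function name keeps the plain column name.
                row[col] = AGG_FUNCS[funcs](groups[key][col])
            else:
                # A list of names produces one (column, function) entry each.
                for name in funcs:
                    row[(col, name)] = AGG_FUNCS[name](groups[key][col])
        result[key] = row
    return result


data = {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.362, 0.227, 1.267, -0.562]}
print(group_agg(data, "A", {"B": "min", "C": "sum"}))
print(group_agg(data, "A", {"B": ["min", "max"]}))
# {1: {('B', 'min'): 1, ('B', 'max'): 2}, 2: {('B', 'min'): 3, ('B', 'max'): 4}}
```

The float sums are left without an expected-output comment because their exact binary representation may differ slightly from 0.589 and 0.705.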
>>> aggregated = df.groupby('A').agg('min')
>>> aggregated.sort_index()
   B      C
A
1  1  0.227
2  3 -0.562
>>> aggregated = df.groupby('A').agg(['min', 'max'])
>>> aggregated.sort_index()
     B             C
   min max    min    max
A
1    1   2  0.227  0.362
2    3   4 -0.562  1.267
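The outputs of the string and list examples above match what the dict form would produce if the same function name, or list of names, were given for every non-grouping column (here B and C). A minimal sketch of that normalization step, assuming this equivalence holds in general; expand_spec is a hypothetical helper, not a Koalas function:

```python
def expand_spec(columns, by, funcs):
    """Hypothetical helper: turn an .agg argument into a per-column dict.

    A dict is returned as a copy; a single function name or a list of names
    is applied to every column except the grouping key `by`.
    """
    if isinstance(funcs, dict):
        return dict(funcs)
    # Copy lists so the entries for different columns do not share one object.
    return {
        col: list(funcs) if isinstance(funcs, list) else funcs
        for col in columns
        if col != by
    }


print(expand_spec(["A", "B", "C"], "A", "min"))
# {'B': 'min', 'C': 'min'}
print(expand_spec(["A", "B", "C"], "A", ["min", "max"]))
# {'B': ['min', 'max'], 'C': ['min', 'max']}
```

Under that assumption, df.groupby('A').agg('min') computes the same values as df.groupby('A').agg({'B': 'min', 'C': 'min'}).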
To control the output column names when applying different aggregations per column, Koalas also supports ‘named aggregation’, or nested renaming, in .agg. It can also be used to apply multiple aggregation functions to the same column.
>>> aggregated = df.groupby('A').agg(b_max=ks.NamedAgg(column='B', aggfunc='max'))
>>> aggregated.sort_index()
   b_max
A
1      2
2      4
>>> aggregated = df.groupby('A').agg(b_max=('B', 'max'), b_min=('B', 'min'))
>>> aggregated.sort_index()
   b_max  b_min
A
1      2      1
2      4      3
>>> aggregated = df.groupby('A').agg(b_max=('B', 'max'), c_min=('C', 'min'))
>>> aggregated.sort_index()
   b_max  c_min
A
1      2  0.227
2      4 -0.562
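Named aggregation can be read as a mapping from each output name to a (column, aggfunc) pair. The plain-Python sketch below reproduces the values of the three named-aggregation examples above under that reading. NamedAgg here is a local namedtuple stand-in modeled on the ks.NamedAgg(column=..., aggfunc=...) call, not the Koalas class itself, and named_agg is a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Local stand-in modeled on ks.NamedAgg(column=..., aggfunc=...);
# in this sketch a plain (column, aggfunc) tuple behaves the same way.
NamedAgg = namedtuple("NamedAgg", ["column", "aggfunc"])

# Hypothetical table mapping aggregate-function names to Python callables.
AGG_FUNCS = {"min": min, "max": max, "sum": sum}


def named_agg(data, by, **named):
    """Hypothetical helper: group column-oriented `data` by `by` and compute
    one output column per keyword, each value being a (column, aggfunc) pair."""
    groups = defaultdict(list)
    for i, key in enumerate(data[by]):
        groups[key].append(i)  # row indices belonging to each group

    result = {}
    for key in sorted(groups):  # sorted keys, like .sort_index()
        result[key] = {}
        for out_name, spec in named.items():
            column, aggfunc = spec  # unpacks NamedAgg and plain tuples alike
            values = [data[column][i] for i in groups[key]]
            result[key][out_name] = AGG_FUNCS[aggfunc](values)
    return result


data = {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.362, 0.227, 1.267, -0.562]}
print(named_agg(data, "A", b_max=NamedAgg(column="B", aggfunc="max")))
# {1: {'b_max': 2}, 2: {'b_max': 4}}
print(named_agg(data, "A", b_max=("B", "max"), c_min=("C", "min")))
# {1: {'b_max': 2, 'c_min': 0.227}, 2: {'b_max': 4, 'c_min': -0.562}}
```

Because the output names come from the keywords, the same source column (B above) can appear under several names with different functions.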