databricks.koalas.groupby.DataFrameGroupBy.agg
DataFrameGroupBy.agg(func_or_funcs=None, *args, **kwargs) → databricks.koalas.frame.DataFrame
Aggregate using one or more operations over the specified axis.
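Parameters
func_or_funcs : dict, str or list
A dict mapping each column name (string) to one or more aggregate functions (string or list of strings). A single function name (string) or a list of function names is applied to every non-grouping column.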
Returns
DataFrame
One row per group, indexed by the group keys. Passing a list of functions, or a dict whose values are lists, produces two-level column labels of the form (column, function).
Notes
agg is an alias for aggregate. Use the alias.
Examples
>>> df = ks.DataFrame({'A': [1, 1, 2, 2],
...                    'B': [1, 2, 3, 4],
...                    'C': [0.362, 0.227, 1.267, -0.562]},
...                   columns=['A', 'B', 'C'])

>>> df
   A  B      C
0  1  1  0.362
1  1  2  0.227
2  2  3  1.267
3  2  4 -0.562
Different aggregations per column
>>> aggregated = df.groupby('A').agg({'B': 'min', 'C': 'sum'})
>>> aggregated[['B', 'C']].sort_index()
   B      C
A
1  1  0.589
2  3  0.705

>>> aggregated = df.groupby('A').agg({'B': ['min', 'max']})
>>> aggregated.sort_index()
    B
  min max
A
1   1   2
2   3   4

>>> aggregated = df.groupby('A').agg('min')
>>> aggregated.sort_index()
   B      C
A
1  1  0.227
2  3 -0.562

>>> aggregated = df.groupby('A').agg(['min', 'max'])
>>> aggregated.sort_index()
    B           C
  min max     min    max
A
1   1   2   0.227  0.362
2   3   4  -0.562  1.267
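The three accepted shapes of func_or_funcs (dict, str, list) can be read as one mapping from column to function names. The pure-Python sketch below illustrates that reading on the same data. It is not how Koalas implements agg: `REDUCERS`, `normalize_spec` and `group_agg` are hypothetical names introduced only for illustration, and the result keys are always (column, function) pairs, whereas the real output uses flat column labels when each column has a single function.

```python
from collections import defaultdict

# Hypothetical lookup table for the function names used in the examples.
REDUCERS = {"min": min, "max": max, "sum": sum}


def normalize_spec(func_or_funcs, value_columns):
    """Expand a str, list, or dict spec into {column: [function names]}."""
    if isinstance(func_or_funcs, str):
        return {col: [func_or_funcs] for col in value_columns}
    if isinstance(func_or_funcs, list):
        return {col: list(func_or_funcs) for col in value_columns}
    # dict: each value is either one name or a list of names
    return {col: [f] if isinstance(f, str) else list(f)
            for col, f in func_or_funcs.items()}


def group_agg(rows, by, func_or_funcs):
    """Group `rows` (a list of dicts) on `by` and reduce the other columns."""
    value_columns = [c for c in rows[0] if c != by]
    spec = normalize_spec(func_or_funcs, value_columns)
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    result = {}
    for key in sorted(groups):
        members = groups[key]
        result[key] = {
            (col, name): REDUCERS[name]([m[col] for m in members])
            for col, names in spec.items()
            for name in names
        }
    return result


rows = [
    {"A": 1, "B": 1, "C": 0.362},
    {"A": 1, "B": 2, "C": 0.227},
    {"A": 2, "B": 3, "C": 1.267},
    {"A": 2, "B": 4, "C": -0.562},
]

per_column = group_agg(rows, "A", {"B": "min", "C": "sum"})  # dict spec
single = group_agg(rows, "A", "min")                         # str spec
several = group_agg(rows, "A", ["min", "max"])               # list spec
```

A str or list spec covers every non-grouping column, while a dict spec covers only the columns it names, which matches the column sets shown in the outputs above.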
To control the output names with different aggregations per column, Koalas also supports ‘named aggregation’ or nested renaming in .agg. It can also be used when applying multiple aggregation functions to specific columns.
>>> aggregated = df.groupby('A').agg(b_max=ks.NamedAgg(column='B', aggfunc='max'))
>>> aggregated.sort_index()
   b_max
A
1      2
2      4

>>> aggregated = df.groupby('A').agg(b_max=('B', 'max'), b_min=('B', 'min'))
>>> aggregated.sort_index()
   b_max  b_min
A
1      2      1
2      4      3

>>> aggregated = df.groupby('A').agg(b_max=('B', 'max'), c_min=('C', 'min'))
>>> aggregated.sort_index()
   b_max  c_min
A
1      2  0.227
2      4 -0.562
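Named aggregation can be read as: each keyword is an output column name, and its value is a (column, aggfunc) pair. The sketch below mimics that mapping in plain Python. The `NamedAgg` defined here is a stand-in namedtuple assumed to carry the same two fields as ks.NamedAgg, and `named_group_agg` and `REDUCERS` are hypothetical helpers, not part of Koalas.

```python
from collections import defaultdict, namedtuple

# Stand-in for ks.NamedAgg, assumed to be a simple (column, aggfunc) pair.
NamedAgg = namedtuple("NamedAgg", ["column", "aggfunc"])

# Hypothetical lookup table for the function names used in the examples.
REDUCERS = {"min": min, "max": max, "sum": sum}


def named_group_agg(rows, by, **named_aggs):
    """Group `rows` on `by`; each keyword names one output column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    result = {}
    for key in sorted(groups):
        out = {}
        for out_name, pair in named_aggs.items():
            # Unpacking works for both NamedAgg and a bare 2-tuple.
            column, aggfunc = pair
            out[out_name] = REDUCERS[aggfunc]([m[column] for m in groups[key]])
        result[key] = out
    return result


rows = [
    {"A": 1, "B": 1, "C": 0.362},
    {"A": 1, "B": 2, "C": 0.227},
    {"A": 2, "B": 3, "C": 1.267},
    {"A": 2, "B": 4, "C": -0.562},
]

with_namedagg = named_group_agg(rows, "A", b_max=NamedAgg(column="B", aggfunc="max"))
with_tuples = named_group_agg(rows, "A", b_max=("B", "max"), c_min=("C", "min"))
```

Because the output name comes from the keyword rather than from the source column, the same column can appear under several names, as in the b_max / b_min example above.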