databricks.koalas.DataFrame.groupby

DataFrame.groupby(by, axis=0, as_index: bool = True)

Group a DataFrame or Series by a Series or by one or more column labels.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
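The split, apply, and combine steps can be sketched in plain Python. This is only a conceptual illustration, not how Koalas implements groupby; the helper name `group_mean` and the row-dictionary layout are assumptions made for the example.

```python
from collections import defaultdict

# Toy data mirroring the Animal / Max Speed example used on this page.
rows = [
    {"Animal": "Falcon", "Max Speed": 380.0},
    {"Animal": "Falcon", "Max Speed": 370.0},
    {"Animal": "Parrot", "Max Speed": 24.0},
    {"Animal": "Parrot", "Max Speed": 26.0},
]


def group_mean(rows, key, value):
    """Hypothetical helper: mean of `value` for each distinct `key`."""
    # Split: bucket the values by their group key.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Apply and combine: reduce each bucket to its mean and
    # collect the per-group results into a single dict.
    return {k: sum(v) / len(v) for k, v in groups.items()}


result = group_mean(rows, "Animal", "Max Speed")
print(result)
```

A groupby object defers the "apply" step until an aggregation such as `mean()` is called, whereas this sketch performs all three steps at once.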

Parameters

by : Series, label, or list of labels
    Used to determine the groups for the groupby. If a Series is passed, its values will be used to determine the groups. A label or list of labels may be passed to group by the columns in self.
axis : int, default 0 or ‘index’
    Only 0 is currently supported.
as_index : bool, default True
    For aggregated output, return an object with the group labels as the index. Only relevant for DataFrame input. as_index=False gives “SQL-style” grouped output, with the group labels kept as ordinary columns.
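The following sketch illustrates the `by` and `as_index` parameters using pandas, on the assumption that Koalas mirrors pandas semantics for these arguments. It is written against pandas so that it runs without a Spark session; the variable names are illustrative only.

```python
import pandas as pd

pdf = pd.DataFrame({"Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
                    "Max Speed": [380.0, 370.0, 24.0, 26.0]})

# `by` as a label: the group labels become the index
# (as_index=True is the default).
by_label = pdf.groupby("Animal").mean()

# `by` as a Series: the values of the Series define the groups.
by_series = pdf["Max Speed"].groupby(pdf["Animal"]).mean()

# as_index=False: the group labels stay as an ordinary column,
# similar to the result of a SQL GROUP BY.
flat = pdf.groupby("Animal", as_index=False).mean()

print(by_label)
print(flat)
```

With as_index=False the result can be passed straight to operations that expect the grouping key as a column, such as a join or a sort on `Animal`.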

Returns

DataFrameGroupBy or SeriesGroupBy
    A groupby object that contains information about the groups. The type depends on the calling object.

See also

koalas.groupby.GroupBy

Examples

>>> df = ks.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]},
...                   columns=['Animal', 'Max Speed'])
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0

>>> df.groupby(['Animal']).mean().sort_index()  
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

>>> df.groupby(['Animal'], as_index=False).mean().sort_values('Animal')
   Animal  Max Speed
...Falcon      375.0
...Parrot       25.0
