Knowledge Base

Aggregating Data

Built-In Aggregation Functions

# Important operations for columns with numeric values
data['column'].sum()
data['column'].min()
data['column'].max()
data['column'].mean()
data['column'].median()
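As a minimal sketch of these methods on concrete data, the example below builds a small DataFrame and stores each aggregate in a variable. The column name 'price' and its values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the 'price' column is an assumption for this sketch
data = pd.DataFrame({'price': [10, 20, 30, 40]})

total = data['price'].sum()       # sum of all values
lowest = data['price'].min()      # smallest value
highest = data['price'].max()     # largest value
average = data['price'].mean()    # arithmetic mean
middle = data['price'].median()   # median (midpoint of the sorted values)
```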

Applying Functions to Columns with Dictionaries and agg()

The agg() method applies one or more functions to particular columns. The column names and the functions are passed in a data structure called a dictionary. A dictionary consists of keys and values. Each key is the name of a column the functions are applied to, and each value is the list of function names for that column.

{'column': ['function1', 'function2']}
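The dictionary form can be sketched without grouping by calling agg() directly on a DataFrame. The column names 'sales' and 'price' and their values are assumptions for illustration. In this ungrouped case the function names become the row labels of the result and the column names stay as columns.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions for this sketch
data = pd.DataFrame({
    'sales': [1, 2, 3],
    'price': [5, 7, 9],
})

# Keys are column names, values are lists of function names
result = data.agg({'sales': ['min', 'max'], 'price': ['min', 'max']})

# Rows are labeled by function name, columns by original column name
smallest_price = result.loc['min', 'price']
largest_sale = result.loc['max', 'sales']
```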

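When you use the agg() method with lists of functions, the column names in the result become two-level: the original column name comes first and the function name second. To get the result of applying 'function1' to 'column', write the two names one after the other. Here data stands for the DataFrame returned by agg():

data['column']['function1']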

# Performing several operations on a column while grouping
data.groupby('column1').agg({'column2': ['count', 'sum'], 'column3': ['min', 'max']})
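The sketch below runs a grouped agg() on a small DataFrame and then selects one of the resulting two-level columns. The column names 'city', 'sales', and 'price' and their values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions for this sketch
data = pd.DataFrame({
    'city': ['A', 'A', 'B'],
    'sales': [1, 2, 3],
    'price': [5, 7, 9],
})

# Group by 'city', then apply a different list of functions to each column
grouped = data.groupby('city').agg({
    'sales': ['count', 'sum'],
    'price': ['min', 'max'],
})

# The result has two-level column names such as ('sales', 'sum')
sales_sum = grouped['sales']['sum']      # chained selection
price_max = grouped[('price', 'max')]    # equivalent tuple selection
```

Both selection styles return a Series indexed by the grouping column. The tuple form does the lookup in a single step.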

Pivot Tables

Pivot tables are useful for rearranging and condensing data from large tables so that the result focuses on particular aspects.

To build a pivot table in pandas, use the pivot_table() method.

Method arguments:

  • index: the column or columns whose values become the rows of the pivot table (the data is grouped by them)
  • columns: the column whose unique values become the columns of the pivot table
  • values: the column with the values we want to aggregate and see in the pivot table
  • aggfunc: the function applied to those values

data_pivot = data.pivot_table(
    index=['column1', 'column2'],
    columns='source',
    values='column_pivot',
    aggfunc='function'
)
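
As a minimal sketch of how the four arguments fit together, the example below pivots a small table. The column names 'region', 'source', and 'revenue', the sample values, and the choice of 'sum' as the aggregation function are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions
data = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south'],
    'source': ['web', 'store', 'web', 'web'],
    'revenue': [100, 50, 70, 30],
})

# Rows: one per region; columns: one per unique 'source' value;
# cells: the sum of 'revenue' for that region/source combination
data_pivot = data.pivot_table(
    index='region',
    columns='source',
    values='revenue',
    aggfunc='sum',
)

north_web = data_pivot.loc['north', 'web']
south_web = data_pivot.loc['south', 'web']
```

Combinations that do not occur in the data (here, 'south' with 'store') appear as NaN in the pivot table.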