Aggregating Data
Built-In Aggregation Functions
    # Important operations for columns with numeric values
    data['column'].sum()
    data['column'].min()
    data['column'].max()
    data['column'].mean()
    data['column'].median()
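As a minimal sketch of these methods in action, the example below runs them on a tiny table. The DataFrame and its 'price' column are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data; the column name 'price' is an assumption.
data = pd.DataFrame({'price': [10.0, 20.0, 30.0, 100.0]})

total = data['price'].sum()      # 160.0
lowest = data['price'].min()     # 10.0
highest = data['price'].max()    # 100.0
average = data['price'].mean()   # 40.0
middle = data['price'].median()  # 25.0 (halfway between 20 and 30)
```

Note how the single large value pulls the mean (40.0) well above the median (25.0).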
Applying Functions to Columns with Dictionaries and agg()
The agg() method applies functions to particular columns. The column names and the functions are passed in a data structure called a dictionary. A dictionary consists of keys and values: here each key is the name of a column the functions must be applied to, and its value is the list of function names.
    {'column': ['function1', 'function2']}
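The following sketch shows what such a dictionary might look like with real function names, passed to agg() on a whole table without grouping. The DataFrame and the column names 'price' and 'quantity' are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions.
data = pd.DataFrame({'price': [10.0, 20.0, 30.0], 'quantity': [1, 2, 3]})

# Keys are column names, values are lists of function names.
summary = data.agg({'price': ['min', 'max'], 'quantity': ['sum']})

# Without grouping, the result has one row per function name and one
# column per original column; combinations that were not requested are NaN.
price_min = summary.loc['min', 'price']        # 10.0
quantity_sum = summary.loc['sum', 'quantity']  # 6.0 (float, because the column also holds NaN)
```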
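When you use agg() with lists of functions after groupby(), the column names of the result become two-level: the original column name, then the function name. To get the result of applying 'function1' to 'column', simply write the two names one after the other (here data is the aggregated result):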
    data['column']['function1']
    # Performing several operations on a column while grouping
    data.groupby('column1').agg({'column2': ['count', 'sum'], 'column3': ['min', 'max']})
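The grouping pattern above and the two-level column names it produces can be sketched as follows. The data, the column names 'city', 'sales' and 'rating', and the optional flattening step at the end are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions.
data = pd.DataFrame({
    'city': ['Paris', 'Paris', 'Rome', 'Rome'],
    'sales': [100, 150, 80, 120],
    'rating': [4.0, 5.0, 3.0, 4.5],
})

grouped = data.groupby('city').agg({'sales': ['count', 'sum'], 'rating': ['min', 'max']})

# Columns are now pairs: ('sales', 'count'), ('sales', 'sum'),
# ('rating', 'min'), ('rating', 'max').
paris_total = grouped['sales']['sum']['Paris']      # 250
rome_best = grouped.loc['Rome', ('rating', 'max')]  # 4.5

# Optional: join the two levels into single-level names such as 'sales_sum'.
flat = grouped.copy()
flat.columns = ['_'.join(pair) for pair in flat.columns]
```

Flattening is a matter of taste: the two-level names keep the structure explicit, while single-level names are easier to type in later steps.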
Pivot Tables
Pivot tables are useful for summarizing huge tables: they rearrange and condense the data so that it focuses on particular aspects.
To build a pivot table in pandas, we use the pivot_table() method.
Method arguments:
index: the column or columns the data is grouped by
columns: the column with the values used to group the data
values: the values we want to see in the pivot table
aggfunc: the function applied to those values
    data_pivot = data.pivot_table(
        index=['column1', 'column2'],
        columns='source',
        values='column_pivot',
        aggfunc='function'
    )
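To make the four arguments concrete, here is a small sketch with made-up data. The column names 'city' and 'sales', the sample values, and the choice of 'sum' as the aggregation function are assumptions.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
data = pd.DataFrame({
    'city': ['Paris', 'Paris', 'Rome', 'Rome', 'Rome'],
    'source': ['web', 'store', 'web', 'web', 'store'],
    'sales': [100, 150, 80, 20, 120],
})

data_pivot = data.pivot_table(
    index='city',      # one row per city
    columns='source',  # one column per distinct source value
    values='sales',    # the numbers being aggregated
    aggfunc='sum'      # how values sharing a cell are combined
)

# Rome has two 'web' rows (80 and 20), which are summed into one cell.
rome_web = data_pivot.loc['Rome', 'web']        # 100
paris_store = data_pivot.loc['Paris', 'store']  # 150
```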