Chapter Summary: Cohort Analysis
Cohort Analysis
A cohort is a group of people for whom a certain event took place around the same time.
To form a cohort, you need to determine:
- the event that's common to them
- the time period during which this event must occur
An event is a recorded instance of a user performing a certain action.
The period is the span of time during which the event took place.
Once you've defined an event and a time period, you can form a cohort.
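As a minimal sketch of forming cohorts, assume the event is a user's first purchase and the period is a calendar month. The orders table and its column names (user_id, order_date) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical order log: one row per purchase event.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-01-20", "2023-03-02", "2023-02-14"]
    ),
})

# Event: the user's first purchase. Period: the calendar month it fell in.
first_order = orders.groupby("user_id")["order_date"].transform("min")
orders["cohort_month"] = first_order.dt.to_period("M")

# Number of distinct users in each cohort.
cohort_sizes = orders.groupby("cohort_month")["user_id"].nunique()
print(cohort_sizes)
```

Every row of a given user carries the same cohort_month, so later purchases can be attributed to the cohort the user joined.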
Analysts divide the customer base into cohorts to identify trends in user behavior. This is retrospective analysis since it shows how users behaved in the past, so it's rarely used to forecast user behavior.
Assessing Changes in Absolute Values by Month
The majority of quantitative observations are recorded as absolute values.
An absolute value is the volume, size, or magnitude of an observed event or phenomenon.
Using cohort analysis, we can track changes in the number of active buyers within a single cohort. This tells us how many people from the cohort continued making purchases in the following months.
Tracking absolute values by month can help you identify changes that take place over time. Moreover, it reveals features common to all cohorts, such as seasonal drops in the number of users. We can also draw a conclusion about the share of each cohort in the total number of purchasers per month.
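The monthly view described here can be sketched as a table with one row per cohort and one column per calendar month. The data and column names below are hypothetical, and cohorts are again assumed to be defined by the month of the first purchase.

```python
import pandas as pd

# Hypothetical order log.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3],
    "order_date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-01-20",
         "2023-03-02", "2023-02-14", "2023-03-20"]
    ),
})
orders["order_month"] = orders["order_date"].dt.to_period("M")
orders["cohort_month"] = (
    orders.groupby("user_id")["order_date"].transform("min").dt.to_period("M")
)

# Absolute values: distinct buyers per cohort (rows) and month (columns).
buyers = (
    orders.groupby(["cohort_month", "order_month"])["user_id"]
    .nunique()
    .unstack("order_month")
)

# Share of each cohort in the total number of buyers per month.
cohort_share = buyers.div(buyers.sum(axis=0), axis=1)
print(buyers)
print(cohort_share)
```

Cells are empty (NaN) for months before a cohort existed, which is why the table fills in as a staircase.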
Assessing Changes in Relative Values by Lifetime
Cohort analysis can be used to find how the average purchase size (total sales divided by the number of customers) changes over time. To do this, we need to calculate the lifetime: the number of months between the cohort month and the month in which an event occurred.
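A minimal sketch of the lifetime calculation follows, assuming monthly cohorts based on the first purchase. The orders table and the revenue column are hypothetical.

```python
import pandas as pd

# Hypothetical order log with revenue per order.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-01-20", "2023-03-02", "2023-02-14"]
    ),
    "revenue": [100, 50, 200, 80, 60],
})

first_order = orders.groupby("user_id")["order_date"].transform("min")
orders["cohort_month"] = first_order.dt.to_period("M")

# Lifetime: whole months between the cohort month and the order month.
orders["lifetime"] = (
    (orders["order_date"].dt.year - first_order.dt.year) * 12
    + (orders["order_date"].dt.month - first_order.dt.month)
)

# Average purchase size: total sales divided by the number of customers.
grouped = orders.groupby(["cohort_month", "lifetime"]).agg(
    revenue=("revenue", "sum"),
    buyers=("user_id", "nunique"),
)
grouped["revenue_per_buyer"] = grouped["revenue"] / grouped["buyers"]
avg_purchase = grouped["revenue_per_buyer"].unstack("lifetime")
print(avg_purchase)
```

Indexing columns by lifetime rather than by calendar month lines the cohorts up at month 0, so they can be compared at the same age.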
Visualizing Cohort Analysis
A heatmap is a table visualization where cells vary in color depending on their proximity to the maximum and minimum values.
To make a heatmap in Python, you'll need the heatmap() function from the seaborn library. Its parameters include:
- annot=True (annotate), which means the value will be displayed in each cell
- fmt='.1f' (format), which sets the printing format (here, one decimal place)
- linewidths=1, which sets the width of the lines separating heatmap cells (here, 1 point)
- linecolor='gray', which sets the line color (here, gray)
import seaborn as sns
from matplotlib import pyplot as plt

plt.figure(figsize=(13, 9))
sns.heatmap(dataframe, annot=True, fmt='.1f', linewidths=1, linecolor='gray')
Retention Rate and Churn Rate
Cohort analysis of user behavior in digital products often relies on two metrics: retention rate and churn rate.
The retention rate tells you what share of a cohort's users have remained active compared to their initial number.
The churn rate tells you the share of users that quit using the product over time.
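As an illustration, the retention rate can be computed by dividing the number of active users in each lifetime month by the cohort's initial size. The cohorts table below, including its column names and numbers, is hypothetical.

```python
import pandas as pd

# Hypothetical table: active users per cohort and lifetime month.
cohorts = pd.DataFrame({
    "cohort_month": ["2023-01", "2023-01", "2023-01", "2023-02", "2023-02"],
    "lifetime": [0, 1, 2, 0, 1],
    "n_users": [100, 40, 25, 80, 20],
})

# Initial size of each cohort: the number of users at lifetime 0.
initial = cohorts.loc[cohorts["lifetime"] == 0, ["cohort_month", "n_users"]]
initial = initial.rename(columns={"n_users": "cohort_users"})

# Retention: users still active divided by the cohort's initial size.
cohorts = cohorts.merge(initial, on="cohort_month")
cohorts["retention"] = cohorts["n_users"] / cohorts["cohort_users"]

retention_pivot = cohorts.pivot(
    index="cohort_month", columns="lifetime", values="retention"
)
print(retention_pivot)
```

A pivot of this shape is a natural input for a heatmap, with cohorts as rows and lifetime months as columns.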
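To calculate the churn rate in Python, use the pct_change() method. Applied after groupby(), it returns the relative change from the previous row within each group, so a shrinking cohort produces negative values. Here column1 and column2 are placeholders, presumably for the cohort column and the column with the number of users.
cohorts['churn_rate'] = cohorts.groupby(['column1'])['column2'].pct_change()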
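A small worked example shows what pct_change() returns here. The table, its column names, and its numbers are hypothetical. The rows are assumed to be sorted by lifetime within each cohort, since pct_change() compares each row with the one before it.

```python
import pandas as pd

# Hypothetical table: active users per cohort and lifetime month.
cohorts = pd.DataFrame({
    "cohort_month": ["2023-01", "2023-01", "2023-01", "2023-02", "2023-02"],
    "lifetime": [0, 1, 2, 0, 1],
    "n_users": [100, 40, 25, 80, 20],
})

# Make sure rows are ordered by lifetime inside each cohort.
cohorts = cohorts.sort_values(["cohort_month", "lifetime"]).reset_index(drop=True)

# Relative change in active users compared with the previous month.
cohorts["churn_rate"] = cohorts.groupby("cohort_month")["n_users"].pct_change()
print(cohorts)
```

The first month of each cohort has no previous value, so its churn rate is NaN. In the January cohort the drop from 100 to 40 users gives -0.6, meaning 60% of the users active in the previous month were lost.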
Behavioral Cohorts
Behavioral cohort analysis involves singling out a cohort of users who performed an action or a sequence of actions during a certain period of time. The goal is to study how a target metric (e.g. retention rate) changed over time for the behavioral cohort.
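A behavioral cohort can be sketched as follows. The event log, the "wishlist" action, and the choice of January and February 2023 are all hypothetical. The cohort is the set of users who performed the action in January, and the target metric is the share of them who were active in February.

```python
import pandas as pd

# Hypothetical event log.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "event": ["wishlist", "purchase", "purchase",
              "wishlist", "purchase", "purchase"],
    "date": pd.to_datetime(
        ["2023-01-03", "2023-02-11", "2023-01-15",
         "2023-01-20", "2023-01-25", "2023-02-05"]
    ),
})

jan_start, feb_start, mar_start = (
    pd.Timestamp("2023-01-01"),
    pd.Timestamp("2023-02-01"),
    pd.Timestamp("2023-03-01"),
)
in_jan = (events["date"] >= jan_start) & (events["date"] < feb_start)
in_feb = (events["date"] >= feb_start) & (events["date"] < mar_start)

# Behavioral cohort: users who added an item to a wishlist in January.
cohort_users = set(events.loc[in_jan & (events["event"] == "wishlist"), "user_id"])

# Target metric: share of the cohort with any event in February.
feb_active = set(events.loc[in_feb, "user_id"])
retention = len(cohort_users & feb_active) / len(cohort_users)
print(sorted(cohort_users), retention)
```

The same metric computed for users outside the cohort gives a baseline for judging whether the action is associated with different behavior.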