Implementing New Functionality

Glossary

A/B testing (split testing) — a hypothesis-testing technique for measuring how a change to a service or product affects its users. The users are randomly split into two groups: the control group (A), which uses the current version of the service with no changes, and the treatment group (B), which uses the new version that is being tested.
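A minimal sketch of the random split. It assumes users are identified by integer IDs and that each user is assigned to A or B independently with equal probability. The helper `split_into_groups` and the seed are hypothetical illustrations, not part of any library.

```python
import numpy as np
import pandas as pd


def split_into_groups(user_ids, seed=12345):
    """Hypothetical helper: randomly assign each user to 'A' (control) or 'B' (treatment)."""
    rng = np.random.default_rng(seed)
    # each user independently gets 'A' or 'B' with probability 0.5
    groups = rng.choice(["A", "B"], size=len(user_ids))
    return pd.DataFrame({"user_id": user_ids, "group": groups})


users = split_into_groups(list(range(10_000)))
print(users["group"].value_counts())
```

Random assignment matters because it makes the two groups comparable on average, so a difference in the metric can be attributed to the change rather than to who ended up in each group.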

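Confidence interval — an interval on the number line, computed from the sample, that contains the population parameter of interest with a predetermined probability (the confidence level, e.g. 95%). Strictly speaking, the probability describes the procedure: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true value of the parameter.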
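The "predetermined probability" can be checked by simulation. This sketch assumes normally distributed data with a known true mean (the values 10.0 and 2.0 are made up for illustration), builds many 95% t-based intervals, and counts how often they cover the true mean. The resulting share should be close to 0.95.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(0)
true_mean = 10.0          # hypothetical population mean, known only because we simulate
n, trials, confidence = 30, 2000, 0.95

covered = 0
for _ in range(trials):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    # t-interval: confidence level and degrees of freedom passed positionally
    low, high = st.t.interval(confidence, n - 1, loc=sample.mean(), scale=st.sem(sample))
    if low <= true_mean <= high:
        covered += 1

coverage = covered / trials
print(f"share of intervals covering the true mean: {coverage:.3f}")
```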

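Peeking problem — the overall result is distorted when the results are checked repeatedly while the experiment is still running. At the beginning of the experiment the groups are small, so each newly added observation shifts the estimate noticeably; if the test is stopped as soon as the difference looks significant, the probability of a false positive becomes much higher than the chosen significance level.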
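The effect of peeking can be seen in a simulated A/A test, where both groups come from the same distribution and the null hypothesis is therefore true. This sketch assumes normally distributed metrics, a t-test (`scipy.stats.ttest_ind`) as the significance test, and a peek after every 100 users per group; all numbers are illustrative. Stopping at the first "significant" checkpoint produces far more false positives than a single check at the end.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(42)
alpha, experiments = 0.05, 500
checkpoints = range(100, 2001, 100)  # peek after every 100 users per group

false_positives_peeking = 0
false_positives_final = 0
for _ in range(experiments):
    # A/A test: both groups share one distribution, so H0 is true
    a = rng.normal(100, 20, size=2000)
    b = rng.normal(100, 20, size=2000)
    # peeking: stop at the first checkpoint where the result looks significant
    if any(st.ttest_ind(a[:k], b[:k]).pvalue < alpha for k in checkpoints):
        false_positives_peeking += 1
    # no peeking: a single test once all the data has been collected
    if st.ttest_ind(a, b).pvalue < alpha:
        false_positives_final += 1

print("false positive rate with peeking:", false_positives_peeking / experiments)
print("false positive rate without peeking:", false_positives_final / experiments)
```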

Type I error — the null hypothesis is true but is rejected (false positive result: the new functionality is approved even though it has no real effect).

Type II error — the null hypothesis is false but is not rejected (false negative result: a real effect of the new functionality goes undetected).
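Both error rates can be estimated by simulating many A/B tests. This is a sketch under stated assumptions: normally distributed purchase amounts, 50 users per group, a t-test as the significance test, and a hypothetical true effect of +5 when the null hypothesis is false. The Type I rate should land near alpha; the Type II rate depends on the effect size and the sample size.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(7)
alpha, experiments, n = 0.05, 2000, 50


def rejects_h0(mean_b):
    """Run one simulated A/B test and report whether H0 (equal means) is rejected."""
    a = rng.normal(100, 20, size=n)
    b = rng.normal(mean_b, 20, size=n)
    return st.ttest_ind(a, b).pvalue < alpha


# H0 is true (no real effect): every rejection is a Type I error
type_1_rate = np.mean([rejects_h0(100) for _ in range(experiments)])
# H0 is false (hypothetical real effect of +5): every non-rejection is a Type II error
type_2_rate = np.mean([not rejects_h0(105) for _ in range(experiments)])

print("Type I error rate:", type_1_rate)
print("Type II error rate:", type_2_rate)
```

With such small groups and a modest effect, most real improvements are missed; increasing `n` lowers the Type II rate while the Type I rate stays near alpha.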

Practice


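# Finding the confidence interval for the mean
# confidence — confidence level, e.g. 0.95 (equal to 1 - significance level);
# df — number of degrees of freedom, = n - 1;
# loc — center of the distribution, equal to the mean estimate: sample.mean();
# scale — standard error of the distribution, equal to the standard error estimate: sample.sem()

from scipy import stats as st

# sample is assumed to be a pandas Series of observations
confidence = 0.95
df = len(sample) - 1

# the confidence level is passed positionally: the keyword is named
# alpha in older SciPy versions and confidence in newer ones
confidence_interval = st.t.interval(
    confidence,
    df=df,
    loc=sample.mean(),
    scale=sample.sem()
)
print(confidence_interval)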
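What `st.t.interval` computes can be reproduced by hand: the interval is mean ± t critical value × standard error, where the critical value leaves (1 - confidence) / 2 of the t distribution in each tail. The sample below is hypothetical, chosen only to make the sketch runnable.

```python
import pandas as pd
from scipy import stats as st

# hypothetical sample, for illustration only
sample = pd.Series([12.1, 9.8, 11.4, 10.6, 13.0, 9.5, 10.9, 11.7])
confidence = 0.95
df = len(sample) - 1

# critical t value leaving (1 - confidence) / 2 in each tail
t_crit = st.t.ppf((1 + confidence) / 2, df)
margin = t_crit * sample.sem()
manual = (sample.mean() - margin, sample.mean() + margin)

# the same interval via SciPy (confidence level passed positionally)
via_scipy = st.t.interval(confidence, df, loc=sample.mean(), scale=sample.sem())

print(manual)
print(via_scipy)
```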


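# Extracting a subsample for bootstrap
# example_data is assumed to be a pandas Series or DataFrame

from numpy.random import RandomState
state = RandomState(12345)

# without replacement (with frac=1 this simply shuffles the data)
print(example_data.sample(frac=1, replace=False, random_state=state))
# with replacement (some rows repeat, others are left out): this is the bootstrap subsample
print(example_data.sample(frac=1, replace=True, random_state=state))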
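Subsamples drawn with replacement can be used to estimate a confidence interval for a statistic. This sketch uses the percentile bootstrap (take the 2.5% and 97.5% quantiles of the statistic across subsamples); that method choice and the data values are illustrative assumptions, and the mean could be replaced by any other statistic.

```python
import pandas as pd
from numpy.random import RandomState

state = RandomState(12345)
# hypothetical data, for illustration only
example_data = pd.Series([5.2, 7.1, 6.3, 8.8, 5.9, 7.7, 6.6, 9.4, 6.1, 7.3])

# statistic (here: the mean) of each bootstrap subsample
values = []
for _ in range(1000):
    subsample = example_data.sample(frac=1, replace=True, random_state=state)
    values.append(subsample.mean())

values = pd.Series(values)
lower = values.quantile(0.025)
upper = values.quantile(0.975)
print(f"95% bootstrap confidence interval for the mean: ({lower:.2f}, {upper:.2f})")
```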


# A/B test analysis using bootstrap
# samples_A, samples_B are assumed to be pandas Series with purchase amounts
# for the control (A) and treatment (B) groups

import pandas as pd
import numpy as np

# actual difference between the group means
AB_difference = samples_B.mean() - samples_A.mean()

alpha = 0.05

state = np.random.RandomState(54321)

# pool both groups: under the null hypothesis they come from the same distribution
united_samples = pd.concat([samples_A, samples_B])

bootstrap_samples = 1000
count = 0
for i in range(bootstrap_samples):
    # count how many times the difference between the means
    # reaches or exceeds the actual value, provided that
    # the null hypothesis is true
    subsample = united_samples.sample(
        frac=1,
        replace=True,
        random_state=state
    )

    # split by position, so the groups keep their original sizes
    subsample_A = subsample.iloc[:len(samples_A)]
    subsample_B = subsample.iloc[len(samples_A):]
    bootstrap_difference = subsample_B.mean() - subsample_A.mean()

    if bootstrap_difference >= AB_difference:
        count += 1

pvalue = 1. * count / bootstrap_samples
print('p-value =', pvalue)

if pvalue < alpha:
    print("Reject the null hypothesis: the average purchase amount is likely to increase")
else:
    print("Failed to reject the null hypothesis: the average purchase amount is unlikely to increase")
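The same procedure can be wrapped in a reusable function and checked on synthetic data. In this sketch the function name `bootstrap_pvalue` and the generated purchase amounts are hypothetical. With no real difference the p-value is typically large; with a clear +10 shift it should be close to zero.

```python
import numpy as np
import pandas as pd


def bootstrap_pvalue(samples_A, samples_B, bootstrap_samples=1000, seed=54321):
    """Hypothetical helper: one-sided bootstrap p-value for 'mean of B exceeds mean of A'."""
    state = np.random.RandomState(seed)
    AB_difference = samples_B.mean() - samples_A.mean()
    # pooling both groups simulates the null hypothesis (no difference)
    united_samples = pd.concat([samples_A, samples_B])
    count = 0
    for _ in range(bootstrap_samples):
        subsample = united_samples.sample(frac=1, replace=True, random_state=state)
        subsample_A = subsample.iloc[:len(samples_A)]
        subsample_B = subsample.iloc[len(samples_A):]
        if subsample_B.mean() - subsample_A.mean() >= AB_difference:
            count += 1
    return count / bootstrap_samples


rng = np.random.default_rng(1)
# hypothetical purchase amounts, for illustration only
samples_A = pd.Series(rng.normal(100, 20, size=500))
samples_B_same = pd.Series(rng.normal(100, 20, size=500))
samples_B_higher = pd.Series(rng.normal(110, 20, size=500))

p_same = bootstrap_pvalue(samples_A, samples_B_same)
p_higher = bootstrap_pvalue(samples_A, samples_B_higher)
print("no real effect:", p_same)
print("+10 real effect:", p_higher)
```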