Implementing New Functionality
Glossary
A/B testing/Split testing — a hypothesis-testing technique for measuring how changes to a service or product affect its users. The population is split into two groups: a control group, which uses the current service without changes, and a treatment group, which uses the new version being tested.
Confidence interval — an interval on the number line, constructed from the sample so that it covers the true value of the population parameter of interest with a predetermined probability (the confidence level, e.g. 95%).
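Peeking problem — the overall result is distorted when the results are checked repeatedly while the experiment is still running. Early on, when the samples are small, each new portion of data shifts the result sharply, so stopping the test as soon as the difference looks significant inflates the probability of a false positive.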
Type I error — the null hypothesis is true but is rejected (a false positive result: the new functionality is approved, hence "positive", even though it has no real effect).
Type II error — the null hypothesis is false but is not rejected (a false negative result: a real effect of the new functionality goes undetected).
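The error types and the peeking problem can be checked with a small simulation. The sketch below is a minimal illustration under assumptions that are not part of this lesson: normally distributed data, a two-sample z-test, and hypothetical helper names such as `rejection_rate`. With no real effect (`effect=0`), the share of rejected null hypotheses estimates the Type I error rate; with a real effect, one minus that share estimates the Type II error rate. Testing after every batch of data (`peeking=True`) rejects a true null hypothesis noticeably more often than a single test at the end.

```python
import random
from statistics import NormalDist


def mean_var(xs):
    """Sample mean and unbiased sample variance (ddof=1)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def z_test_pvalue(a, b):
    """Two-sided p-value of a two-sample z-test for the difference of means."""
    mean_a, var_a = mean_var(a)
    mean_b, var_b = mean_var(b)
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    z = (mean_b - mean_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(effect, peeking=False, n_per_group=500, n_peeks=10,
                   alpha=0.05, n_experiments=400, seed=0):
    """Share of simulated experiments in which the null hypothesis is rejected.

    effect: true difference between the group means (0 means H0 is true)
    peeking: if True, test after every batch and stop at the first p < alpha;
             if False, test only once, after all the data is collected
    """
    rng = random.Random(seed)
    step = n_per_group // n_peeks
    if peeking:
        checkpoints = range(step, n_per_group + 1, step)
    else:
        checkpoints = [n_per_group]
    rejections = 0
    for _ in range(n_experiments):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treatment = [rng.gauss(effect, 1) for _ in range(n_per_group)]
        # reject H0 if the test is significant at any checkpoint
        if any(z_test_pvalue(control[:n], treatment[:n]) < alpha
               for n in checkpoints):
            rejections += 1
    return rejections / n_experiments


type_1_single = rejection_rate(effect=0)
type_1_peeking = rejection_rate(effect=0, peeking=True)
type_2_single = 1 - rejection_rate(effect=0.2)
print(f"Type I error, single check: {type_1_single:.3f}")
print(f"Type I error, with peeking: {type_1_peeking:.3f}")
print(f"Type II error, single check: {type_2_single:.3f}")
```

Because both Type I runs use the same seed, they see identical data, and the peeking run also tests at the final checkpoint. It can therefore only reject as often or more often than the single check, which isolates the effect of peeking.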
Practice
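```python
# Finding the confidence interval for the mean
# sample — pandas Series with the observed values
# confidence — confidence level, e.g. 0.95 (= 1 - level of significance);
#   older SciPy versions named this argument alpha, but it has always
#   meant the confidence level, not the level of significance
# df — number of degrees of freedom, n - 1
# loc — center of the distribution, equal to the mean estimate: sample.mean()
# scale — standard error of the distribution, equal to the standard
#   error estimate: sample.sem()

from scipy import stats as st

confidence = 0.95
df = len(sample) - 1

confidence_interval = st.t.interval(
    confidence,
    df=df,
    loc=sample.mean(),
    scale=sample.sem()
)
```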
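For reference, the interval computed from these arguments is the standard t-based interval for the mean. Here $\bar{x}$ is `sample.mean()`, $s/\sqrt{n}$ is `sample.sem()`, $n - 1$ is `df`, and $\alpha$ is the level of significance, so the confidence level is $1 - \alpha$:

$$
\left[\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\;\; \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right]
$$

$t_{1-\alpha/2,\,n-1}$ is the $(1-\alpha/2)$ quantile of Student's t-distribution with $n - 1$ degrees of freedom; for a 95% interval, $\alpha = 0.05$.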
```python
# Extracting a subsample for bootstrap

from numpy.random import RandomState
state = RandomState(12345)

# without replacement
print(example_data.sample(frac=1, replace=False, random_state=state))
# with replacement
print(example_data.sample(frac=1, replace=True, random_state=state))
```
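Resampling with replacement, as in the second call above, is the core of the bootstrap. As one possible use, the sketch below estimates a percentile bootstrap confidence interval for the mean. It is a stdlib-only illustration rather than the lesson's code: `random.Random.choices` plays the role of `sample(frac=1, replace=True)`, and the helper name `bootstrap_mean_ci` and its defaults are hypothetical.

```python
import random
from statistics import fmean


def bootstrap_mean_ci(data, confidence=0.95, n_resamples=1000, seed=12345):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    # each bootstrap subsample has the size of the data and is drawn with replacement
    means = sorted(fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples))
    # approximate (1 - confidence) / 2 and (1 + confidence) / 2 quantiles of the means
    lower = means[int((1 - confidence) / 2 * n_resamples)]
    upper = means[int((1 + confidence) / 2 * n_resamples) - 1]
    return lower, upper


data = [float(x) for x in range(1, 101)]
low, high = bootstrap_mean_ci(data)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The percentile method simply reads the interval bounds off the sorted bootstrap means, so it needs no assumption about the shape of the sampling distribution.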
```python
# A/B test analysis using bootstrap

import pandas as pd
import numpy as np

# samples_A and samples_B are pandas Series with purchase amounts in groups A and B

# actual difference between the means in the groups
AB_difference = samples_B.mean() - samples_A.mean()

alpha = 0.05

state = np.random.RandomState(54321)

bootstrap_samples = 1000
count = 0

# under the null hypothesis the groups do not differ,
# so both bootstrap groups are drawn from the pooled data
united_samples = pd.concat([samples_A, samples_B])

for _ in range(bootstrap_samples):
    # count how many times the difference between the means
    # reaches or exceeds the actual value,
    # provided that the null hypothesis is true
    subsample = united_samples.sample(
        frac=1,
        replace=True,
        random_state=state
    )

    subsample_A = subsample.iloc[:len(samples_A)]
    subsample_B = subsample.iloc[len(samples_A):]
    bootstrap_difference = subsample_B.mean() - subsample_A.mean()

    if bootstrap_difference >= AB_difference:
        count += 1

pvalue = count / bootstrap_samples
print('p-value =', pvalue)

if pvalue < alpha:
    print("Reject null hypothesis: average purchase amount is likely to increase")
else:
    print("Failed to reject null hypothesis: no significant increase in average purchase amount")
```
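The p-value computed by this loop is a Monte Carlo estimate of a one-sided probability. Writing $N$ for `bootstrap_samples`, $\Delta = \bar{x}_B - \bar{x}_A$ for `AB_difference`, and $\Delta^{*}_i$ for `bootstrap_difference` on iteration $i$:

$$
p \approx \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\Delta^{*}_i \ge \Delta\right]
$$

where $\mathbf{1}[\cdot]$ equals 1 when the condition holds and 0 otherwise. Because each resample is drawn from the pooled data, every $\Delta^{*}_i$ is a draw from the distribution of the difference under the null hypothesis, and the null hypothesis is rejected when $p < \alpha$.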