Knowledge Base

Glossary

AUC-ROC: an evaluation metric for classification tasks that equals the area under the ROC curve. The metric's values are in the range from 0 to 1. The AUC-ROC value for a random model is 0.5.
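A minimal sketch of computing the metric, assuming scikit-learn and made-up labels and probabilities:

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted positive-class probabilities, for illustration only
target = [0, 0, 1, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.7]

# 1.0 is a perfect ranking, 0.5 is a random model
auc_roc = roc_auc_score(target, probabilities)
print(auc_roc)
```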

Downsampling: a class balancing technique that decreases the number of observations by randomly dropping part of the majority class observations.
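One possible implementation, assuming pandas objects `features` (DataFrame) and `target` (Series) where class 0 is the majority; the `downsample` helper name is illustrative:

```python
import pandas as pd
from sklearn.utils import shuffle

def downsample(features, target, fraction):
    # split the sample by class, assuming 0 is the majority class
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    # keep only a random fraction of the majority class
    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345), features_ones]
    )
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345), target_ones]
    )

    # shuffle so the classes are not ordered in blocks
    return shuffle(features_downsampled, target_downsampled, random_state=12345)
```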

False Positive Rate (FPR): the number of FP answers divided by the total number of negative answers:

\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}

PR curve: a plot that shows precision against recall.
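A sketch of building the plot with scikit-learn's precision_recall_curve and matplotlib, using toy data:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

target = [0, 0, 1, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.7]

# precision and recall are computed at every candidate threshold
precision, recall, thresholds = precision_recall_curve(target, probabilities)

plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()
```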

ROC curve: a plot that shows the true positive rate against the false positive rate.
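The same kind of sketch for the ROC curve, via scikit-learn's roc_curve:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

target = [0, 0, 1, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.7]

# FPR and TPR are computed at every candidate threshold
fpr, tpr, thresholds = roc_curve(target, probabilities)

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle='--')  # diagonal of a random model
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()
```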

Threshold: the probability boundary that separates the positive and negative classes.
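A sketch of moving the boundary away from the default 0.5; the data and model here are toy stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data and model, purely for illustration
features, target = make_classification(random_state=12345)
model = LogisticRegression().fit(features, target)

probabilities = model.predict_proba(features)[:, 1]

# model.predict() uses a threshold of 0.5; lowering it to 0.3
# classifies more observations as positive
predictions = probabilities > 0.3
```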

True Positive Rate (TPR): the number of TP answers divided by the total number of positive answers:

\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}
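A sketch computing both TPR and FPR (defined above) from a confusion matrix with scikit-learn, on toy labels and predictions:

```python
from sklearn.metrics import confusion_matrix

target      = [0, 0, 0, 1, 1, 1]
predictions = [0, 1, 0, 1, 1, 0]

# for binary labels the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(target, predictions).ravel()

tpr = tp / (tp + fn)  # share of positive answers found by the model
fpr = fp / (fp + tn)  # share of negative answers falsely flagged
```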

Upsampling: a class balancing technique that increases the number of observations by duplicating the rarer class observations several times.
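A possible implementation mirroring the downsampling sketch above; `upsample` is an illustrative helper name, and class 1 is assumed to be the rarer class:

```python
import pandas as pd
from sklearn.utils import shuffle

def upsample(features, target, repeat):
    # split the sample by class, assuming 1 is the rarer class
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    # duplicate the rarer class `repeat` times
    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    # shuffle so the duplicates are not grouped in a block
    return shuffle(features_upsampled, target_upsampled, random_state=12345)
```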

Coefficient of determination (R2 metric): divides the Model MSE by the Mean MSE and subtracts the obtained value from one. The formula is:

\text{R2} = 1 - \frac{\text{Model MSE}}{\text{Mean MSE}}
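A sketch that follows the formula directly, with made-up values; sklearn.metrics.r2_score returns the same result:

```python
from sklearn.metrics import mean_squared_error

target      = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5,  0.0, 2.0, 8.0]

model_mse = mean_squared_error(target, predictions)

# the "mean model" always predicts the average of the target
mean_prediction = sum(target) / len(target)
mean_mse = mean_squared_error(target, [mean_prediction] * len(target))

r2 = 1 - model_mse / mean_mse
```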

Mean Absolute Error (MAE): a regression evaluation metric equal to the mean of the absolute observation errors:

\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|
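A minimal sketch with scikit-learn on made-up values:

```python
from sklearn.metrics import mean_absolute_error

target      = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5,  0.0, 2.0, 8.0]

# mean of the absolute observation errors |y_i - y_hat_i|
mae = mean_absolute_error(target, predictions)
print(mae)  # 0.5 for these values
```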

Conventional Notations for Data Science

y_i is the target value for the observation with serial number i in the sample

\hat{y}_i is the prediction value for the observation with serial number i

y_i - \hat{y}_i is the observation error

|y_i - \hat{y}_i| is the observation absolute error

N is the number of observations in the sample

\displaystyle\sum_{i=1}^{N} is summation over all observations of the sample (i varies in the range from 1 to N).
