Glossary
AUC-ROC: an evaluation metric for classification tasks equal to the area under the ROC curve. Its values range from 0 to 1; for a random model, AUC-ROC is 0.5.
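The definition above can be sketched in code. This is a minimal illustration, not a library implementation: it uses the equivalent rank interpretation of AUC-ROC (the probability that a randomly chosen positive observation gets a higher score than a randomly chosen negative one, with ties counted as 0.5) on toy data.

```python
def auc_roc(y_true, y_score):
    """AUC-ROC as the probability that a random positive
    observation outranks a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

For a model that scores every observation identically, every pair is a tie, so the function returns 0.5, matching the random-model value in the definition.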
Downsampling: a class balancing technique that decreases the number of observations by randomly dropping the majority class observations.
False Positive Rate (FPR): the number of false positive (FP) answers divided by the total number of negative answers: FPR = FP / (FP + TN).
PR curve: a plot that shows precision against recall.
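A PR curve is traced out by recomputing precision and recall as the threshold changes. The sketch below (toy labels and scores, assumed for illustration) computes the points behind such a curve at three thresholds.

```python
# Toy data: true labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.6, 0.55, 0.4, 0.2]

for threshold in [0.3, 0.5, 0.7]:
    y_pred = [int(s >= threshold) for s in scores]
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)  # share of positive answers that are correct
    recall = tp / (tp + fn)     # share of positive observations found
    print(threshold, precision, recall)
```

Raising the threshold typically trades recall for precision, which is exactly the shape the PR curve visualizes.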
ROC curve: a plot that shows the true positive rate against the false positive rate.
Threshold: the probability boundary that separates the positive and negative classes.
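As a small sketch of the definition above, applying a threshold turns predicted probabilities into class labels (the probabilities here are toy values assumed for illustration):

```python
probabilities = [0.2, 0.55, 0.7, 0.45]  # predicted P(class = 1)
threshold = 0.5                          # the probability boundary
predicted_classes = [int(p >= threshold) for p in probabilities]
print(predicted_classes)  # [0, 1, 1, 0]
```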
True Positive Rate (TPR): the number of true positive (TP) answers divided by the total number of positive answers: TPR = TP / (TP + FN).
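The TPR and FPR formulas can be sketched together by counting the four confusion-matrix cells directly (toy labels assumed for illustration):

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR): TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

tpr, fpr = rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(tpr, fpr)
```

Plotting TPR against FPR at many thresholds yields the ROC curve defined above.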
Upsampling: a class balancing technique that increases the number of observations by duplicating the rarer class observations several times.
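The two class balancing techniques can be sketched on a toy label list (the dataset and repeat factor here are assumptions for illustration; in practice the features are resampled together with the labels):

```python
import random

random.seed(42)

# Toy imbalanced labels: 0 = majority class, 1 = rare class.
labels = [0] * 8 + [1] * 2
majority = [y for y in labels if y == 0]
rare = [y for y in labels if y == 1]

# Downsampling: randomly drop majority observations
# until the classes are the same size.
down = random.sample(majority, len(rare)) + rare

# Upsampling: duplicate the rare class several times
# (repeat factor chosen here to match the majority size).
repeat = len(majority) // len(rare)
up = majority + rare * repeat

print(len(down), len(up))  # 4 16
```

After either technique the resampled data is usually shuffled before training.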
Coefficient of determination (R² metric): divides the model MSE by the mean MSE (the MSE of a model that always predicts the average target value) and then subtracts the obtained value from one: R² = 1 − MSE_model / MSE_mean.
Mean Absolute Error (MAE): a regression evaluation metric equal to the average of the absolute errors: MAE = (1/n) · Σᵢ |yᵢ − ŷᵢ|.
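The two regression metrics above can be sketched in a few lines (the target and prediction values are toy data assumed for illustration):

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R² = 1 - model MSE / mean MSE, where the mean model
    always predicts the average of the target values."""
    mean = sum(y_true) / len(y_true)
    return 1 - mse(y_true, y_pred) / mse(y_true, [mean] * len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(r2(y_true, y_pred), mae(y_true, y_pred))
```

A model no better than predicting the mean gets R² = 0, and a perfect model gets R² = 1.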
Conventional Notations for Data Science
yᵢ is the target value for the observation with serial number i in the sample
ŷᵢ is the prediction value for the observation with serial number i
yᵢ − ŷᵢ is the observation error
|yᵢ − ŷᵢ| is the observation absolute error
n is the number of observations in the sample
Σᵢ is summation over all observations of the sample (i varies in the range from 1 to n)