Knowledge Base

Glossary

AUC-ROC: an evaluation metric for classification tasks that equals the area under the ROC curve. The metric's values are in the range from 0 to 1. The AUC-ROC value for a random model is 0.5.
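A minimal sketch of computing the metric, assuming scikit-learn and made-up labels and probabilities:

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted positive-class probabilities, for illustration only
target = [0, 0, 1, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.7]

# 1.0 is a perfect ranking, 0.5 is a random model
auc_roc = roc_auc_score(target, probabilities)
print(auc_roc)
```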

Downsampling: a class balancing technique that decreases the number of observations by randomly dropping part of the majority class observations.
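One possible implementation, assuming pandas objects `features` (DataFrame) and `target` (Series) where class 0 is the majority; the `downsample` helper name is illustrative:

```python
import pandas as pd
from sklearn.utils import shuffle

def downsample(features, target, fraction):
    # split the sample by class, assuming 0 is the majority class
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    # keep only a random fraction of the majority class
    features_downsampled = pd.concat(
        [features_zeros.sample(frac=fraction, random_state=12345), features_ones]
    )
    target_downsampled = pd.concat(
        [target_zeros.sample(frac=fraction, random_state=12345), target_ones]
    )

    # shuffle so the classes are not ordered in blocks
    return shuffle(features_downsampled, target_downsampled, random_state=12345)
```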

False Positive Rate (FPR): the number of FP answers divided by the total number of negative answers:

\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}

PR curve: a plot that shows precision against recall.
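A sketch of building the plot with scikit-learn's precision_recall_curve and matplotlib, using toy data:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

target = [0, 0, 1, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.7]

# precision and recall are computed at every candidate threshold
precision, recall, thresholds = precision_recall_curve(target, probabilities)

plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.show()
```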

ROC curve: a plot that shows the true positive rate against the false positive rate.
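The same kind of sketch for the ROC curve, via scikit-learn's roc_curve:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

target = [0, 0, 1, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8, 0.7]

# FPR and TPR are computed at every candidate threshold
fpr, tpr, thresholds = roc_curve(target, probabilities)

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle='--')  # diagonal of a random model
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()
```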

Threshold: the probability boundary that separates the positive and negative classes.
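A sketch of moving the boundary away from the default 0.5; the data and model here are toy stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data and model, purely for illustration
features, target = make_classification(random_state=12345)
model = LogisticRegression().fit(features, target)

probabilities = model.predict_proba(features)[:, 1]

# model.predict() uses a threshold of 0.5; lowering it to 0.3
# classifies more observations as positive
predictions = probabilities > 0.3
```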

True Positive Rate (TPR): the number of TP answers divided by the total number of positive answers:

\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}
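A sketch computing both TPR and FPR (defined above) from a confusion matrix with scikit-learn, on toy labels and predictions:

```python
from sklearn.metrics import confusion_matrix

target      = [0, 0, 0, 1, 1, 1]
predictions = [0, 1, 0, 1, 1, 0]

# for binary labels the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(target, predictions).ravel()

tpr = tp / (tp + fn)  # share of positive answers found by the model
fpr = fp / (fp + tn)  # share of negative answers falsely flagged
```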

Upsampling: a class balancing technique that increases the number of observations by duplicating the rarer class observations several times.
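A possible implementation mirroring the downsampling sketch above; `upsample` is an illustrative helper name, and class 1 is assumed to be the rarer class:

```python
import pandas as pd
from sklearn.utils import shuffle

def upsample(features, target, repeat):
    # split the sample by class, assuming 1 is the rarer class
    features_zeros = features[target == 0]
    features_ones = features[target == 1]
    target_zeros = target[target == 0]
    target_ones = target[target == 1]

    # duplicate the rarer class `repeat` times
    features_upsampled = pd.concat([features_zeros] + [features_ones] * repeat)
    target_upsampled = pd.concat([target_zeros] + [target_ones] * repeat)

    # shuffle so the duplicates are not grouped in a block
    return shuffle(features_upsampled, target_upsampled, random_state=12345)
```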

Coefficient of determination (R2 metric): divides the Model MSE by the Mean MSE and subtracts the obtained value from one. The formula is:

\text{R2} = 1 - \frac{\text{Model MSE}}{\text{Mean MSE}}
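A sketch that follows the formula directly, with made-up values; sklearn.metrics.r2_score returns the same result:

```python
from sklearn.metrics import mean_squared_error

target      = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5,  0.0, 2.0, 8.0]

model_mse = mean_squared_error(target, predictions)

# the "mean model" always predicts the average of the target
mean_prediction = sum(target) / len(target)
mean_mse = mean_squared_error(target, [mean_prediction] * len(target))

r2 = 1 - model_mse / mean_mse
```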

Mean Absolute Error (MAE): a regression evaluation metric equal to the mean of the absolute observation errors:

\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|
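A minimal sketch with scikit-learn on made-up values:

```python
from sklearn.metrics import mean_absolute_error

target      = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5,  0.0, 2.0, 8.0]

# mean of the absolute observation errors |y_i - y_hat_i|
mae = mean_absolute_error(target, predictions)
print(mae)  # 0.5 for these values
```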

Conventional Notations for Data Science

y_i is the target value for the observation with serial number i in the sample

\hat{y}_i is the prediction value for the observation with serial number i

y_i - \hat{y}_i is the observation error

|y_i - \hat{y}_i| is the observation absolute error

N is the number of observations in the sample

\displaystyle\sum_{i=1}^{N} is summation over all observations of the sample (i varies in the range from 1 to N).
