Regression Metrics
Coefficient of Determination
The coefficient of determination, or R2 (R-squared), divides the model's MSE by the MSE of a model that always predicts the mean, and then subtracts the result from one. The higher the R2, the better the model.
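R2 is calculated as follows:

$$R^2 = 1 - \frac{\text{MSE}_{\text{model}}}{\text{MSE}_{\text{mean}}}$$

Its values are interpreted as follows: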
- R2 equals one only if the model's MSE is zero (every prediction is exact).
- R2 equals zero when the model performs exactly as well as always predicting the mean.
- R2 is negative when the model performs worse than always predicting the mean.
- R2 cannot be greater than one.
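As a quick check of these properties, here is a minimal pure-Python sketch of the definition (one minus the ratio of the model's MSE to the MSE of the mean). The helper names `mse` and `r2` and the sample values are illustrative assumptions, not part of the original material; for a non-constant target the result should agree with `r2_score()` from sklearn.metrics.

```python
def mse(target, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)


def r2(target, predicted):
    """R2 = 1 - MSE(model) / MSE(mean model)."""
    mean = sum(target) / len(target)
    mse_model = mse(target, predicted)
    # The "mean model" predicts the target mean for every observation
    mse_mean = mse(target, [mean] * len(target))
    return 1 - mse_model / mse_mean


target = [3.0, 5.0, 7.0, 9.0]  # illustrative values; the mean is 6.0

print(r2(target, [3.0, 5.0, 7.0, 9.0]))  # 1.0: perfect predictions
print(r2(target, [6.0, 6.0, 6.0, 6.0]))  # 0.0: always predicts the mean
print(r2(target, [9.0, 7.0, 5.0, 3.0]))  # -3.0: worse than the mean
```

Because the sketch divides by the mean model's MSE, it raises ZeroDivisionError when all target values are equal.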
To calculate this metric, you can use the r2_score() function from the sklearn.metrics module:
```python
from sklearn.metrics import r2_score

# target: true values; predicted: the model's predictions
print("R2 =", r2_score(target, predicted))
```
Mean Absolute Error
Let's introduce the notation conventionally used in data science:
- $y_i$: the target value of the observation with index i in the sample used to measure quality. The subscript i is the observation's index.
- $\hat{y}_i$: the predicted value for the observation with index i (for example, in the test sample).
MAE (mean absolute error) is another evaluation metric. It is similar to MSE, but it averages the absolute values of the errors instead of their squares.
The error for a single observation is written as follows:

$$e_i = y_i - \hat{y}_i$$
To treat underestimates and overestimates the same way (so that positive and negative errors do not cancel each other out), the absolute error is calculated:

$$|e_i| = |y_i - \hat{y}_i|$$
To aggregate the errors over the whole sample, we add the following notation:
- $n$: the number of observations in the sample.
- $\sum_{i=1}^{n}$: summation over all observations of the sample (i ranges from 1 to n).
The formula for the mean absolute error (MAE) is:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$
To calculate this metric, you can use the mean_absolute_error() function from the sklearn.metrics module:
```python
from sklearn.metrics import mean_absolute_error

# target: true values; predicted: the model's predictions
mae = mean_absolute_error(target, predicted)
```
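The MAE formula can also be computed by hand. The sketch below is a minimal pure-Python version; the `mean_abs_error` helper and the values are illustrative assumptions. It also shows why absolute values are taken: the signed errors average to zero even though two of the predictions are off by 2. For the same inputs, `mean_absolute_error(target, predicted)` should return the same value.

```python
def mean_abs_error(target, predicted):
    """MAE = (1/n) * sum over i of |y_i - y_hat_i|."""
    n = len(target)
    return sum(abs(t - p) for t, p in zip(target, predicted)) / n


target = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 30.0, 40.0]

# Signed errors y_i - y_hat_i: one overestimate, one underestimate, two exact hits
errors = [t - p for t, p in zip(target, predicted)]
print(errors)                           # [-2.0, 2.0, 0.0, 0.0]
print(sum(errors) / len(errors))        # 0.0: signed errors cancel out
print(mean_abs_error(target, predicted))  # 1.0
```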
When calculating MSE for a constant model, we used the mean as the constant.

For MAE, the constant should be chosen so that it gives the lowest possible MAE value. We need to find the value a at which the minimum is reached:

$$\frac{1}{n}\sum_{i=1}^{n}|y_i - a| \to \min_{a}$$
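The minimum is reached when a equals the median of the target values. (With an even number of observations, any value between the two middle values gives the same minimum.)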
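The claim can be checked numerically with a brute-force search over candidate constants. This is an illustrative sketch, not a method from the original; the helper names `mae_const` and `mse_const` and the data are assumptions. Intuitively, raising a by a small step increases the error for every target below a and decreases it for every target above a, so the sum stops decreasing once half of the targets lie on each side of a, that is, at the median. For comparison, the same search over MSE lands on the mean.

```python
import statistics

target = [1.0, 2.0, 3.0, 4.0, 100.0]  # illustrative data with one outlier


def mae_const(target, a):
    """MAE of a constant model that always predicts a."""
    return sum(abs(t - a) for t in target) / len(target)


def mse_const(target, a):
    """MSE of a constant model that always predicts a."""
    return sum((t - a) ** 2 for t in target) / len(target)


# Candidate constants from 0.0 to 100.0 in steps of 0.1
candidates = [i / 10 for i in range(0, 1001)]

best_mae = min(candidates, key=lambda a: mae_const(target, a))
best_mse = min(candidates, key=lambda a: mse_const(target, a))

print(best_mae, statistics.median(target))  # 3.0 3.0
print(best_mse, statistics.mean(target))    # 22.0 22.0
```

The outlier 100.0 pulls the MSE-optimal constant up to 22.0, while the MAE-optimal constant stays at the median, 3.0.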
Unlike MAE, RMSE is more sensitive to large errors: because the errors are squared before averaging, a few significant errors strongly affect the final RMSE value.
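A small illustration of this sensitivity, written in pure Python; the helper names and the values are assumptions made for the example. Both prediction sets have the same total absolute error, so MAE is identical, but concentrating that error in a single observation doubles RMSE.

```python
import math


def mean_abs_error(target, predicted):
    """Mean of |y_i - y_hat_i|."""
    return sum(abs(t - p) for t, p in zip(target, predicted)) / len(target)


def rmse(target, predicted):
    """Square root of the mean of (y_i - y_hat_i)^2."""
    mse = sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)
    return math.sqrt(mse)


target = [10.0, 10.0, 10.0, 10.0]
small_errors = [11.0, 9.0, 11.0, 9.0]  # every error has size 1
one_large = [10.0, 10.0, 10.0, 14.0]   # same total absolute error, in one point

print(mean_abs_error(target, small_errors), rmse(target, small_errors))  # 1.0 1.0
print(mean_abs_error(target, one_large), rmse(target, one_large))        # 1.0 2.0
```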