Linear Regression From the Inside
Linear regression model
In linear regression, the features are a vector of numbers in n-dimensional space (let's call it $x$). The model's prediction ($a$) is calculated as follows: take the dot product of the feature vector and the weight vector ($w$), then add the prediction bias ($w_0$) to this product:

$$a = x \cdot w + w_0$$
The vector $w$ and the scalar $w_0$ are the parameters of the model: there are $n$ parameters in the vector $w$, and one more in $w_0$.

If the length of the feature vector is equal to one, there is only one feature in the sample.
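As an illustration, here's a minimal NumPy sketch of this prediction; the feature values and weights below are made-up numbers:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # feature vector, n = 3
w = np.array([0.5, -0.2, 0.1])  # weight vector: n parameters
w0 = 4.0                        # prediction bias: one more parameter

a = np.dot(x, w) + w0           # dot product plus bias
print(a)                        # 4.4
```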
With one feature, the prediction plot of linear regression is a straight line given by the equation:

$$a = w x + w_0$$

By changing the parameters $w$ and $w_0$, you can obtain any straight line.
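A quick sketch of this idea, with made-up parameter pairs: each pair of $w$ and $w_0$ defines its own line.

```python
import numpy as np

x = np.linspace(0, 10, 5)                # single feature values
for w, w0 in [(1.0, 0.0), (2.0, -3.0)]:  # made-up (slope, bias) pairs
    a = w * x + w0                       # each pair gives a different line
    print(w, w0, a)
```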
Training objective
Let's analyze the learning algorithm. Our quality metric will be MSE: the model should achieve its lowest value on the test data. The training objective is formulated as follows: find the model parameters for which the value of the loss function on the training set is minimal.
Let's write the training objective in vector form. The training set is represented as a matrix $X$, in which the rows correspond to observations and the columns to features. Denote the linear regression parameters as $w$ and $w_0$. To get the prediction vector $a$, multiply the matrix $X$ by the vector $w$ and add the prediction bias:

$$a = Xw + w_0$$
To shorten it, let's change the notation: add a column consisting only of ones to the matrix $X$ (it becomes column 0), and prepend the parameter $w_0$ to the vector $w$:

$$X = \begin{pmatrix} 1 & x_{11} & \dots & x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & \dots & x_{mn} \end{pmatrix}, \qquad w = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{pmatrix}$$

Then multiply the matrix by the vector: the prediction bias $w_0$ is multiplied by the column of ones (column zero), and we get the prediction vector $a$:

$$a = Xw$$
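In NumPy this reshaping might look like the sketch below; the data is a toy example, and np.hstack is used to prepend the ones column:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])           # 3 observations, 2 features
w = np.array([0.5, -0.5])
w0 = 1.0

a_long = X @ w + w0                  # a = Xw + w0

# prepend a column of ones and fold w0 into the weight vector
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
w1 = np.concatenate([[w0], w])

a_short = X1 @ w1                    # a = Xw in the new notation
print(np.allclose(a_long, a_short))  # True
```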
Now we can introduce one more notation: $y$ is the vector of target values for the training set.
Write the training objective for linear regression with the MSE loss function:

$$w = \arg\min_w \text{MSE}(Xw, y)$$

The argmin() function finds the minimum of a function and returns the value of the argument (here, the weight vector $w$) at which that minimum is reached.
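To make the objective concrete, here's a small sketch that evaluates MSE for a candidate weight vector on toy data; the mse helper is our own, not a library function:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # column 0 is the ones column
y = np.array([2.0, 3.0, 4.0])

w = np.array([1.0, 1.0])     # candidate parameters
print(mse(y, X @ w))         # 0.0 -- this w minimizes the loss here
```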
Inverse matrix
An identity matrix is a square matrix with ones on the main diagonal and zeros elsewhere. If any matrix $A$ is multiplied by the identity matrix $E$, we get the same matrix $A$:

$$AE = EA = A$$

The inverse matrix of a square matrix $A$ is the matrix $A^{-1}$ (denoted with a superscript $-1$) whose product with $A$ is equal to the identity matrix. The multiplication can be performed in either order:

$$A A^{-1} = A^{-1} A = E$$
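The identity property is easy to check with NumPy's np.eye(); the matrix here is a made-up example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
E = np.eye(2)                 # 2x2 identity matrix

print(np.allclose(A @ E, A))  # True: AE = A
print(np.allclose(E @ A, A))  # True: EA = A
```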
Matrices for which an inverse exists are called invertible. But not every matrix has an inverse; such a matrix is called non-invertible (singular).
Non-invertible matrices are rare. If you generate a random matrix with the numpy.random.normal() function, the probability of getting a non-invertible matrix is close to zero.
To find the inverse matrix, call the numpy.linalg.inv() function. It will also help you check a matrix for invertibility: if the matrix is non-invertible, the function raises an error.
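For example, a sketch with one invertible and one singular matrix (np.linalg.inv raises LinAlgError for the latter):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True: A A^{-1} = E

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])                # rows are linearly dependent
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as err:
    print('non-invertible:', err)         # Singular matrix
```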
Training linear regression
The formula for training linear regression is:

$$w = \arg\min_w \text{MSE}(Xw, y)$$

The minimum MSE value is obtained when the weights are equal to this value:

$$w = (X^T X)^{-1} X^T y$$
Here's how this formula is computed, step by step (see the sketch after the list):
- The transposed feature matrix $X^T$ is multiplied by itself: $X^T X$;
- The matrix inverse to the result is calculated: $(X^T X)^{-1}$;
- The inverse matrix is multiplied by the transposed feature matrix: $(X^T X)^{-1} X^T$;
- The result is multiplied by the vector of target values: $w = (X^T X)^{-1} X^T y$.
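A minimal sketch of this training step in NumPy, assuming $X$ already contains the ones column (the function name fit_linear_regression is ours):

```python
import numpy as np

def fit_linear_regression(X, y):
    """Normal equation: w = (X^T X)^{-1} X^T y."""
    return np.linalg.inv(X.T @ X) @ X.T @ y

# toy data generated by the line y = 1 + 2x
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # column 0 is the ones column
y = np.array([3.0, 5.0, 7.0])

print(fit_linear_regression(X, y))  # [1. 2.] -- bias and slope recovered
```

In practice, np.linalg.solve(X.T @ X, X.T @ y) is numerically preferable to computing the inverse explicitly, but the version above mirrors the formula step by step.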