Algorithm Analysis
Computational complexity
Algorithm running time isn't measured in seconds; it's determined by the number of elementary operations the algorithm performs. The time measured on a particular computer is usually called the real running time. An algorithm's running time also depends on its arguments.
Denote the length of the list as $n$. The working time is a function of $n$, written as $T(n)$. The asymptotic running time of an algorithm shows how $T(n)$ grows as $n$ increases.
When $T(n)$ is a polynomial, the asymptotic running time is equal to the term with the highest power, without its coefficient (for example, $n^2$ instead of $an^2 + bn + c$). As $n$ reaches greater values, the other terms become unimportant.
- If $T(n) = an + b$, the asymptotic running time is $T(n) \sim n$. The algorithm has linear complexity (see the sketch after this list). The tilde symbol ($\sim$) means that the asymptotic running time is $n$.
- If $T(n) = an^2 + bn + c$, the asymptotic running time is $T(n) \sim n^2$. The algorithm has quadratic complexity.
- If $T(n) = an^3 + bn^2 + cn + d$, then $T(n) \sim n^3$. The algorithm has cubic complexity.
- If $T(n) = c$, a constant, then $T(n) \sim 1$. The algorithm has constant complexity, that is, it doesn't depend on $n$.
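To make these growth rates concrete, here's a minimal sketch (our own illustration, not from the lesson) that counts elementary operations for a linear and a quadratic algorithm instead of measuring seconds:

```python
def count_linear(values):
    """Sum a list: one addition per element, so T(n) ~ n."""
    ops = 0
    total = 0
    for v in values:
        total += v
        ops += 1
    return total, ops


def count_quadratic(values):
    """Compare every pair of elements, so T(n) ~ n^2."""
    ops = 0
    pairs = 0
    for a in values:
        for b in values:
            ops += 1  # one comparison per inner step
            if a < b:
                pairs += 1
    return pairs, ops


for n in (10, 100, 1000):
    data = list(range(n))
    _, linear_ops = count_linear(data)
    _, quadratic_ops = count_quadratic(data)
    # linear_ops grows as n; quadratic_ops grows as n^2
    print(n, linear_ops, quadratic_ops)
```

Increasing $n$ tenfold multiplies the linear count by 10 but the quadratic count by 100, which is why only the highest-power term matters for large $n$.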
Linear regression model training time
The linear regression training objective is represented as follows:

$$w = \arg\min_w \mathrm{MSE}(Xw, y)$$
Weights are calculated by this formula:

$$w = (X^\top X)^{-1} X^\top y$$
- Define the number of observations in the training set as $n$, and the number of features as $p$
- The size of matrix $X$ will be $n \times p$, and the size of vector $y$ will be $n$
- The computational complexity will be written as $T(n, p)$ because it depends on two parameters: $n$ and $p$
To calculate the complexity of algorithm training, add up the cost of each matrix operation: computing $X^\top X$ takes about $np^2$ operations, inverting the resulting $p \times p$ matrix takes about $p^3$, computing $X^\top y$ takes about $np$, and the final multiplication takes about $p^2$:

$$T(n, p) \sim np^2 + p^3 + np + p^2$$
There are usually fewer features than observations, meaning $p < n$. Multiplying both sides by $p^2$ results in $p^3 < np^2$. Taking only the term with the highest power, we get:

$$T(n, p) \sim np^2$$
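As a sketch of the direct computation (using NumPy on synthetic data; the variable names follow the notation above, and the data itself is made up for illustration):

```python
import numpy as np

n, p = 1000, 10                       # observations and features
rng = np.random.default_rng(42)
X = rng.normal(size=(n, p))           # n x p feature matrix
y = X @ rng.normal(size=p) + rng.normal(scale=0.1, size=n)

# Computing X.T @ X costs ~ n*p^2 operations, and inverting the
# resulting p x p matrix costs ~ p^3, so the total is T(n, p) ~ n*p^2.
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(w[:3])
```

In practice, `np.linalg.solve(X.T @ X, X.T @ y)` is numerically safer than forming the inverse explicitly, but the operation count is of the same order.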
Iterative methods
The following formula is employed as a direct method in linear regression model training:

$$w = (X^\top X)^{-1} X^\top y$$
Direct methods find an exact solution using a given formula or algorithm. Their computational complexity is fixed in advance: it depends only on the size of the data, not on the required accuracy.
Iterative methods, or iterative algorithms, repeat the same kind of step (iteration) over and over, and the solution becomes more accurate with each step. If there's no need for high accuracy, just a few iterations will do.
The computational complexity of iterative methods depends on the number of steps they take, which may be affected by the amount of data.
Bisection method
The bisection method takes a continuous function $f(x)$ and a segment $[a, b]$ as input. The values $f(a)$ and $f(b)$ have different signs.
When these two conditions are fulfilled:
- The function is continuous
- The values at the ends of the segment have different signs
then a root of the equation $f(x) = 0$ is located somewhere on the given segment.
At each iteration, the bisection method:
- Checks whether $f(a)$ or $f(b)$ equals zero. If either does, the solution has been found
- Finds the midpoint of the segment: $c = \frac{a + b}{2}$
- Compares the sign of $f(c)$ with the signs of $f(a)$ and $f(b)$
- If $f(a)$ and $f(c)$ have different signs, the root is located on the segment $[a, c]$. The algorithm analyzes this segment on its next iteration
- If $f(c)$ and $f(b)$ have different signs, the root is located on the segment $[c, b]$. The algorithm analyzes this segment on its next iteration
- The signs of $f(a)$ and $f(b)$ are different, so there are no other options
The solution's accuracy $\epsilon$ (margin of error) is usually chosen beforehand. At each iteration, the segment containing the root is halved. Once the length of the segment is less than $\epsilon$, the algorithm can be stopped. This condition is called the stopping criterion.
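A minimal implementation sketch under the assumptions above ($f$ is continuous on $[a, b]$, and $f(a)$ and $f(b)$ have different signs; `eps` is the margin of error):

```python
def bisect(f, a, b, eps=1e-6):
    # If either endpoint is already a root, the solution has been found
    if f(a) == 0:
        return a
    if f(b) == 0:
        return b
    while b - a >= eps:        # stopping criterion: segment shorter than eps
        c = (a + b) / 2        # midpoint of the segment
        if f(c) == 0:
            return c
        if f(a) * f(c) < 0:    # different signs: the root is in [a, c]
            b = c
        else:                  # otherwise it is in [c, b]
            a = c
    return (a + b) / 2


# Example: the root of x^2 - 2 on [0, 2] is sqrt(2) ≈ 1.4142
print(bisect(lambda x: x * x - 2, 0, 2))
```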
Comparing methods
Gradient descent:
- Works faster with large datasets for linear regression with the MSE loss function (see the sketch after this list)
- Is also suitable for linear regression with other loss functions (not all of them have direct formulas)
- Can be used for training neural networks, which also lack direct formulas
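As an illustration, here's a hedged sketch of gradient descent for linear regression with the MSE loss (the learning rate and iteration count are illustrative choices, not values prescribed by the lesson):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iters):
        grad = 2 / n * X.T @ (X @ w - y)  # gradient of MSE with respect to w
        w -= lr * grad                    # step against the gradient
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w

print(gradient_descent(X, y))  # approaches true_w as iterations increase
print(true_w)
```

Each iteration costs about $np$ operations, so with $k$ iterations the total is roughly $knp$; when $k$ is smaller than $p$, this beats the $np^2$ cost of the direct formula, which is why gradient descent tends to win on large datasets with many features.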