
Algorithm Analysis

Computational complexity

An algorithm's running time isn't measured in seconds. Instead, it is measured by the number of elementary operations the algorithm performs. The time an algorithm takes on a particular computer is usually called the real running time. An algorithm's running time is also influenced by its arguments.

Denote the length of the list as $n$. The running time is a function of $n$, written $T(n)$. The asymptotic running time of an algorithm shows how $T(n)$ grows as $n$ increases.

When $T(n)$ is a polynomial, the asymptotic running time equals the term with the highest power, without its coefficient (for example, $n^2$ instead of $5n^2$). As $n$ reaches greater values, the other terms become unimportant.

  • If $T(n) = 4n + 3$, the asymptotic running time is $T(n) \sim n$. The algorithm has linear complexity. The tilde symbol ($\sim$) means that the asymptotic running time is $n$.
  • If $T(n) = 5n^2 + 3n - 1$, the asymptotic running time is $T(n) \sim n^2$. The algorithm has quadratic complexity.
  • If $T(n) = 10n^3 - 2n^2 + 5$, then $T(n) \sim n^3$. The algorithm has cubic complexity.
  • If $T(n) = 10$, then $T(n) \sim 1$. The algorithm has constant complexity, that is, it doesn't depend on $n$.
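To make this concrete, here is a small illustrative sketch (not part of the original text; the function names are made up) that counts elementary operations for a linear and a quadratic algorithm and shows how the counts grow as $n$ increases:

```python
def count_linear(n):
    """Simulates an algorithm that touches each of n elements once: T(n) ~ n."""
    operations = 0
    for _ in range(n):
        operations += 1          # one elementary operation per element
    return operations


def count_quadratic(n):
    """Simulates an algorithm with a nested loop over n elements: T(n) ~ n^2."""
    operations = 0
    for _ in range(n):
        for _ in range(n):
            operations += 1      # one elementary operation per pair of elements
    return operations


for n in (10, 100, 1000):
    print(n, count_linear(n), count_quadratic(n))
# When n grows by a factor of 10, the linear count grows 10x,
# while the quadratic count grows 100x; only the highest power matters.
```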

Linear regression model training time

The linear regression training objective is represented as follows:

$w = \arg\min_w \text{MSE}(Xw, y)$

Weights are calculated by this formula:

$w = (X^\top X)^{-1} X^\top y$
  • Define the number of observations in the training set as $n$, and the number of features as $p$
  • The size of matrix $X$ will be $n \times p$, and the size of vector $y$ will be $n$
  • The computational complexity will be defined as $T(n, p)$ because it depends on two parameters: $n$ and $p$

To calculate the training complexity, add up the costs of each operation in the formula (computing $X^\top X$, inverting it, multiplying the inverse by $X^\top$, and then multiplying by $y$):

$T(n, p) \sim np^2 + p^3 + np^2 + np$

There are usually fewer features than observations, meaning $p < n$. Multiplying both sides by $p^2$ results in $p^3 < np^2$. Taking only the term with the highest power, we get: $T(n, p) \sim np^2$.
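As a minimal sketch of this direct solution (assuming NumPy and synthetic data; the sizes here are illustrative, not from the original), the intermediate products below correspond to the terms summed above:

```python
import numpy as np

# Minimal sketch of the direct (normal equation) solution on synthetic data.
n, p = 1000, 5                     # observations and features (illustrative)
X = np.random.rand(n, p)           # n x p feature matrix
y = np.random.rand(n)              # target vector of length n

XtX = X.T @ X                      # ~ n * p^2 operations
XtX_inv = np.linalg.inv(XtX)       # ~ p^3 operations
w = XtX_inv @ X.T @ y              # ~ n * p^2 + n * p operations

print(w)                           # weight vector of length p
```

In practice, `np.linalg.solve(XtX, X.T @ y)` is usually preferred over computing an explicit inverse for numerical stability; the asymptotic cost stays the same.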

Iterative methods

The following formula is employed as a direct method in linear regression model training:

$w = (X^\top X)^{-1} X^\top y$

Direct methods find an exact solution by following a given formula or algorithm. Their computational complexity is known in advance and doesn't depend on the values in the data.

Iterative methods, or iterative algorithms, repeat similar steps over and over, with the solution becoming more accurate at each iteration. If there's no need for high accuracy, just a few iterations will do.

The computational complexity of iterative methods depends on the number of steps they take, which may be affected by the amount of data.

Bisection method

The bisection method takes a continuous function and a segment $[a, b]$ as input. The values $f(a)$ and $f(b)$ must have different signs.

When these two conditions are fulfilled:

  1. The function is continuous
  2. The values at the ends of the segment have different signs

then the root of the equation is located somewhere on the given segment.

At each iteration, the bisection method:

  • Checks whether either value $f(a)$ or $f(b)$ equals zero. If it does, the solution has been found
  • Finds the middle of the segment: $c = \frac{a + b}{2}$
  • Compares the sign of $f(c)$ with the signs of $f(a)$ and $f(b)$
    • If $f(c)$ and $f(a)$ have different signs, the root is located on the segment $[a, c]$. The algorithm analyzes this segment on its next iteration
    • If $f(c)$ and $f(b)$ have different signs, the root is located on the segment $[c, b]$. The algorithm analyzes this segment on its next iteration
    • The signs of $f(a)$ and $f(b)$ are different, so there are no other options

The solution's accuracy is usually chosen beforehand, for example, $e$ (margin of error) $= 0.000001$. At each iteration, the segment containing the root is halved. Once the segment's length is less than $e$, the algorithm can be stopped. This condition is called the stopping criterion.
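Here is a minimal sketch of the bisection method as described above (the test function and segment bounds are illustrative assumptions, not from the original):

```python
def bisect(f, a, b, e=0.000001):
    """Finds a root of f on [a, b], assuming f(a) and f(b) have different signs."""
    if f(a) == 0:
        return a
    if f(b) == 0:
        return b
    while b - a > e:                 # stopping criterion: segment shorter than e
        c = (a + b) / 2              # middle of the segment
        if f(c) == 0:
            return c
        if f(a) * f(c) < 0:          # f(a) and f(c) have different signs
            b = c                    # the root is on [a, c]
        else:                        # otherwise f(c) and f(b) have different signs
            a = c                    # the root is on [c, b]
    return (a + b) / 2


# Example: the root of x^2 - 2 on [0, 2] is sqrt(2) ≈ 1.414214
print(bisect(lambda x: x * x - 2, 0, 2))
```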

Comparing methods

Gradient descent:

  • Works faster on large datasets for linear regression with the MSE loss function
  • Is also suitable for linear regression with other loss functions (not all of them have a direct formula for the solution)
  • Can be used for training neural networks, which also lack direct formulas
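For illustration, here is a hedged sketch of gradient descent for linear regression with the MSE loss (assuming NumPy; the synthetic data, learning rate, and iteration count are illustrative assumptions, not from the original text):

```python
import numpy as np

# Sketch of gradient descent for linear regression with the MSE loss.
n, p = 1000, 5
X = np.random.rand(n, p)                   # synthetic features (illustrative)
true_w = np.arange(1, p + 1)               # hypothetical "true" weights
y = X @ true_w                             # synthetic targets

w = np.zeros(p)                            # start from zero weights
learning_rate = 0.1
for _ in range(1000):
    gradient = 2 / n * X.T @ (X @ w - y)   # gradient of MSE(Xw, y) with respect to w
    w -= learning_rate * gradient          # one iterative step toward the minimum

print(w)    # approaches true_w; each iteration costs roughly n * p operations
```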