Hello, Big-O
Algorithm analysis: introduction
- When estimating an algorithm's performance, the primary consideration is the number of basic operations required by the algorithm to process an input of a certain size. [FloodIt game]
- Running time: the running time of an algorithm typically depends on the size of the input, $n$, and is expressed as $T(n)$ (with $T(n) \geq 0$).
For example, $T(n) = cn$, where $c$ is a constant giving the fixed amount of time needed for one basic operation and $n$ is the input size.
- Largest-value sequential search algorithm: solves the problem of finding the largest value in an array of $n$ integers by looking at each integer in turn (Example 3.1).
static int largest(int[] A) {
    int currlarge = 0;   // Holds largest element position
    for (int i = 1; i < A.length; i++)
        if (A[currlarge] < A[i])
            currlarge = i;
    return currlarge;
}
What is the growth rate for the running time of the largest-value sequential search algorithm? $T(n) = cn$, where $c$ is a constant (each comparison costs $c$ time). [See a detailed explanation]
- Another example (Example 3.3):
sum = 0;
for (i = 1; i <= n; i++)
    for (j = 1; j <= n; j++)
        sum++;
What is the growth rate for the running time of this algorithm?
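A worked tally (ours, not from the source, assuming each `sum++` costs a constant $c$): the inner loop executes $n$ times for each of the $n$ iterations of the outer loop, so $T(n) = cn^2$, which is in $\Theta(n^2)$.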
- More examples
Best, Worst, and Average Cases
The sequential search algorithm: searching an array containing $n$ integers to find the one with a particular value $K$.
Best case: one integer is examined (the first element is $K$).
Worst case: all $n$ integers need to be examined.
Average case: $n/2$ integers are examined (assuming $K$ is equally likely to appear in each position).
Real-time applications call for worst-case analysis. Otherwise, use average-case analysis if the data distribution is known, or worst-case analysis if it is not.
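For concreteness, here is a minimal Java sketch of the sequential search under discussion (the method name `sequentialSearch` and the return convention are our own assumptions, not from the source):

static int sequentialSearch(int[] A, int K) {
    // Returns the position of the first occurrence of K in A, or -1 if K is absent.
    for (int i = 0; i < A.length; i++)
        if (A[i] == K)
            return i;   // best case: K is at position 0 (one comparison)
    return -1;          // worst case: all n integers were examined
}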
Asymptotic Analysis
Asymptotic analysis is used for most algorithm comparisons. It refers to the study of an algorithm as the input size "gets big" or reaches a limit (in the calculus sense), so all constant factors can be ignored.
Big-O, Big-Omega, and Big-Theta
- Big-O: Upper bound for the growth of an algorithm's resource (often running time); the upper or highest growth rate that the algorithm can have.
Definition: Let $T(n)$ represent the true running time of the algorithm, and let $f(n)$ be some expression for the upper bound.
For $T(n)$ a non-negatively valued function, $T(n)$ is said to be in the set $O(f(n))$ if there exist two positive constants $c$ and $n_0$ such that $T(n) \leq cf(n)$ for all $n > n_0$.
The constant $n_0$ is the smallest value of $n$ for which the claim of an upper bound holds true.
An algorithm whose running time has a constant upper bound is said to be in $O(1)$.
Example 1: $T(n) = c_1 n $, $T(n)$ is in $O(n)$
Example 2: $T(n) = c_1 n^2 + c_2 n$, $T(n)$ is in $O(n^2)$, since $c_1 n^2 + c_2 n \leq (c_1 + c_2) n^2$ for all $n > 1$ (take $c = c_1 + c_2$ and $n_0 = 1$)
Example 3: $T(n) = c$, $T(n)$ is in $O(1)$
We always seek to define the running time of an algorithm with the tightest (lowest) possible upper bound.
So we say that the sequential search is in $O(n)$ rather than $O(n^2)$, even though everything in $O(n)$ is also in $O(n^2)$ (but not vice versa: $O(n^2)$ is not contained in $O(n)$).
- Big-Omega: Lower bound for the growth of an algorithm's running time; the least amount of a resource that an algorithm needs for some class of input.
Definition: Let $T(n)$ represent the true running time of the algorithm, and let $g(n)$ be some expression for the lower bound.
For $T(n)$ a non-negatively valued function, $T(n)$ is in the set $\Omega(g(n))$ if there exist two positive constants $c$ and $n_0$ such that $T(n) \geq cg(n)$ for all $n > n_0$.
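For example (our own illustration, mirroring Big-O Example 2 above): $T(n) = c_1 n^2 + c_2 n$ is in $\Omega(n^2)$, since $T(n) \geq c_1 n^2$ for all $n > 1$ (take $c = c_1$ and $n_0 = 1$).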
- Big-Theta: when the upper and lower bounds are the same within a constant factor.
An algorithm is said to be $\Theta(h(n))$ if it is in $O(h(n))$ and in $\Omega(h(n))$. Note that there is no 'in' for $\Theta$ notation.
(A function is in Big-Theta of $f$ if it is not much worse but also not much better than $f$ -- a definition found in Wolfram MathWorld.)
$\Theta$ behaves like an equality between growth rates, i.e., if $f(n)$ is $\Theta(g(n))$, then $g(n)$ is $\Theta(f(n))$.
For example: the sequential search algorithm is both in $O(n)$ and in $\Omega(n)$ in the average case, so it is $\Theta(n)$ in the average case.
Given an algebraic equation describing the time requirement for an algorithm, the upper and lower bounds always meet. This is because in some sense we have a perfect analysis for the algorithm, embodied by the running time equation.
The method of limits
Given two functions $f(n)$ and $g(n)$, which one grows faster (that is, how do we know whether $f(n)$ is in $O(g(n))$, $\Omega(g(n))$, or $\Theta(g(n))$)? Examine $\lim_{n\to\infty} \frac{f(n)}{g(n)}$.
If the limit goes to $\infty$, then $f(n)$ is in $\Omega(g(n))$; if the limit goes to zero, then $f(n)$ is in $O(g(n))$; if the limit goes to a constant other than zero, then $f(n) = \Theta(g(n))$ because both grow at the same rate.
Example: $f(n) = 2n\log n$ and $g(n) = n^2$. Since $\lim_{n\to\infty} \frac{2n\log n}{n^2} = \lim_{n\to\infty} \frac{2\log n}{n} = 0$, $f(n)$ is in $O(g(n))$ (or equivalently, $g(n)$ is in $\Omega(f(n))$).
Simplifying rules
- If $f(n)$ is in $O(g(n))$ and $g(n)$ is in $O(h(n))$, then $f(n)$ is in $O(h(n))$.
- If $f(n)$ is in $O(kg(n))$ for any constant $k > 0$, then $f(n)$ is in $O(g(n))$.
- If $f_1(n)$ is in $O(g_1(n))$ and $f_2(n)$ is in $O(g_2(n))$, then $f_1(n) + f_2(n)$ is in $O(\max(g_1(n), g_2(n)))$.
- If $f_1(n)$ is in $O(g_1(n))$ and $f_2(n)$ is in $O(g_2(n))$, then $f_1(n)f_2(n)$ is in $O(g_1(n)g_2(n))$.
The first three rules say that one can ignore all the constants and lower-order terms to determine the asymptotic growth rate for any cost function.
The fourth rule can be used to analyze simple loops in programs.
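For instance (an illustration of ours, not from the source): $T(n) = 3n^2 + 5n\log n + 10$ is in $O(n^2)$, since rule 2 disposes of the constant factors and rule 3 lets the dominant term $n^2$ absorb the lower-order terms $5n\log n$ and $10$. Rule 4 matches the nested-loop pattern: an inner loop in $O(n)$ executed by an outer loop with $O(n)$ iterations gives $O(n \cdot n) = O(n^2)$.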
Calculating the running time for a program
- Example 1:
sum1 = 0;
for (i = 1; i <= n; i++)
    sum1++;
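A worked count (ours, assuming a constant cost $c$ per increment): the loop executes $n$ times, so $T(n) = cn$ and Example 1 is in $\Theta(n)$.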
- Example 2:
sum2 = 0;
for (i = 1; i <= n; i++)
    for (j = 1; j <= n; j++)
        sum2++;
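This is the same double loop as Example 3.3 above: $T(n) = cn^2$, so Example 2 is in $\Theta(n^2)$.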
- Example 3:
sum3 = 0;
for (i = 1; i <= n; i++)
    for (j = 1; j <= i; j++)
        sum3++;
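A worked count (ours): the inner loop executes $i$ times for each outer iteration, so the total number of increments is $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, and Example 3 is in $\Theta(n^2)$.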
- Example 4:
sum4 = 0;
for (i = 1; i <= n; i *= 2)
    sum4++;
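A worked count (ours): $i$ doubles on each iteration, so the loop executes $\lfloor \log_2 n \rfloor + 1$ times and Example 4 is in $\Theta(\log n)$.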
- Example 5:
sum5 = 0;
for (k = 1; k <= n; k *= 2)
    for (j = 1; j <= n; j++)
        sum5++;
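A worked count (ours): the outer loop executes $\lfloor \log_2 n \rfloor + 1$ times and the inner loop executes $n$ times per outer iteration, so Example 5 is in $\Theta(n \log n)$.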
- Example 6:
sum6 = 0;
for (k = 1; k <= n; k *= 2)
    for (j = 1; j <= k; j++)
        sum6++;
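A worked count (ours): the inner loop executes $k$ times for $k = 1, 2, 4, \ldots$, so the total is $\sum_{i=0}^{\lfloor \log_2 n \rfloor} 2^i = 2^{\lfloor \log_2 n \rfloor + 1} - 1 < 2n$, and Example 6 is in $\Theta(n)$.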
- Example 7:
Given two sequences, $a=a_1...a_m$ and $b=b_1...b_n$
d[0][0] = 0;
for (i = 1; i <= m; i++) d[i][0] = i;   // initialize boundary column
for (j = 1; j <= n; j++) d[0][j] = j;   // and boundary row, read by the main loops below
for (i = 1; i <= m; i++) {
    for (j = 1; j <= n; j++) {
        if (a[i] == b[j]) c = 0; else c = 1;
        d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + c);
    }
}
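A worked count (ours): the boundary initialization costs $\Theta(m + n)$, and the body of the inner loop executes once for each of the $mn$ pairs $(i, j)$ at constant cost per pair, so this dynamic-programming computation (the edit-distance recurrence) runs in $\Theta(mn)$ time.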