Heaps and Applications
Heaps
- When a collection of objects is organized by importance or priority, we call this a priority queue. Priority queues are often implemented as heaps.
- A normal queue data structure will not implement a priority queue efficiently because search for the element with highest priority takes $\Theta(n)$ time.
- A list (sorted or not) will require $\Theta(n)$ time for either insertion or removal.
- A BST, if unbalanced, can have bad performance.
- A heap is a complete binary tree with values partially ordered. There is a relationship between the value stored at any node and the values of its children: in a max-heap every node stores a value that is greater than its children; and a min-heap has the property that every node stores a value that is less than or equal to that of its children.
- Heap is only partially ordered (e.g., a heap {100 40 20 10} becomes {200 100 20 10 40} after adding 200). The heap for any given set of numbers is not unique, but some heaps require fewer operations to build than others.
- A max-heap is good for finding the maximum (but not efficient when searching for an arbitary value).
- Heaps can be implemented using arrays (since heaps are complete binary trees).
- "Heapify" algorithm (see the code)
To insert a new value to a heap, first add it to the end of the array (heaps are stored in arrays). If the new value is greater than its parent, swap the parent and the new value. Do this iteratively until the new value reaches its correct position. - The cost of insert (of one element) into the correct position in the tree: $\Theta(logn)$ in the worst case, because the value being inserted can move (siftUp) at most the distance from the bottom of the tree to the top of the tree (the height of the tree).
- The cost of remove (the maximum value from the max-heap, or the minimum value from the min-heap): ? in the worst case.
Here is the algorithm: Swap the first element with the last element; remove the last element; and fix the max-heap (or min-heap) property using siftDown(). - The cost of building a heap?
Given $n$ values, if we insert them one at a time, it takes $\Theta(nlogn)$ time in the worst case.
But if all $n$ values are available at the beginning of the building process, heap can be built faster. The amortized running time is $\Theta(n)$ (by contrast, building a BST takes $\Theta(nlogn)$ average-case time and $\Theta(n^2)$ worst-case time).
Total number of basic operations is: $\frac{n}{4}\times 1 + \frac{n}{8} \times 2 + \frac{n}{16} \times 3 + ... = \frac{n}{2}\sum\limits_{i=1}^{log n}\frac{i}{2^{i}}$
The algorithm takes $\Theta(n)$ time in the worst case (as $\sum\limits_{i=1}^n\frac{i}{2^i}$ has a closed-form solution, $2-\frac{n+2}{2^n}$, which is close to 2 for large $n$). - A little more on amortized analysis
Amortized analysis is the analysis for a series of operations taken as a whole. In particular, amortized analysis allows us to deal with the situation where the worst-case cost for $n$ operations is less than $n$ times the worst-case cost of any one operation. (See textbook 14.3)
What's available in Java
Java doc on Priority QueueSee see a demo code (for using priority queue & comparator)
Heapsort
- Heapsort is based on the heap data structure. Advantages of using heap: the tree is balanced (heaps are complete binary trees); array representation is space efficient; and building heap is relatively cheap.
- The algorithm works by repeatedly removing the maximum value (the element at position 0), and then restoring the heap property, until the heap is empty (see the code).
- Time complexity of heapsort: building heap takes $\Theta(n)$, and heapsort operations take $\Theta(nlogn)$; so the cost is $\Theta(nlogn)$.
- If we wish to find the $k$ largest (or least) elements in an array using heapsort algorithm, what's the time complexity?
Applications in graph algorithms
- Kruskal's Minimum-cost spanning tree (MST) algorithm
The algorithm requires that edges be visited in ascending order; so min-heap can be used for this process. As the process stops when the MST is complete, only a relatively small fraction of the edges need be sorted (one of the advantages of using Heapsort).