Tries and Applications
Useful Data Structures in String-related Problems
Problem Example
-
String Matching
-
starting from a text T, i.e. a
string
composed of letters in the alphabet A
-
find occurrences of a pattern P, i.e. another string composed of letters in the alphabet A
-
inside T, the pattern string P could occur many times
(some occurrences of P could even overlap over other occurrences of P)
-
variations of the problem could include "wildcard characters", GREP patterns, etc.
-
String Matching starting from a static (possibly very large) text string P
-
goal: to search for the string P in the string T, we would like to:
-
spend no more than O(P) time to search the data structure
-
using no more than O(T) space to maintain the data structure in memory
Definition
- Recall that BST can be extremely unbalanced, and the shape of a BST
is determined by the order in which its data records are inserted, a
result of object space decomposition.
- Alternative to object space decomposition is to predefine the
splitting position within the key range for each node in the tree. So
the root could be predefined to split the key range into two equal
halves, regardless of the particular values or order of insertion for
the data records.
- A tree data structure based on key space decomposition is called a trie (pronounced as "try"). A trie stores data only in leaf nodes.
- A trie is called a binary trie when the trie structure is based on
the value of the key interpreted as a binary number (Huffman coding tree
is an example of a binary trie). The binary value of the key determines
whether to select the left or right branch at any given point during
the search. The most significant bit determines the branch direction at
the root.
See an example of binary trie:
- Alphabet trie (compressed trie)
A compressed trie is space optimized: if a node is the only child of its parent, merge the node with the parent.
Applications
Suffix trees for string matching