Title
Profiles of Tries
Abstract
Tries (from retrieval) are one of the most popular data structures on words. They are pertinent to the (internal) structure of stored words and several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) with the same distance from the root. It is a function of the number of strings stored in a trie and the distance from the root. Several, if not all, trie parameters such as height, size, depth, shortest path, and fill-up level can be uniformly analyzed through the (external and internal) profiles. Although profiles represent one of the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous, but once it is carried out it reveals unusually intriguing and interesting behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate, and then the rate gradually rises to being logarithmic; the external profiles then abruptly tend to infinity, first logarithmically and then polynomially; they then tend polynomially to zero again. Furthermore, the expected profiles of asymmetric tries are oscillating in a range where profiles grow polynomially, while symmetric tries are nonoscillating, in contrast to most shape parameters of random tries studied previously. Such a periodic behavior for asymmetric tries implies that the depth satisfies a central limit theorem but not a local limit theorem of the usual form. Also the widest levels in symmetric tries contain a linear number of nodes, differing from the order $n/\sqrt{\log n}$ for asymmetric tries, $n$ being the size of the trees. Finally, it is observed that profiles satisfy central limit theorems when the variance goes unbounded, while near the height they are distributed according to Poisson laws. As a consequence of these results we find typical behaviors of the height, shortest path, fill-up level, and depth. These results are derived here by methods of analytic algorithmics such as generating functions, Mellin transform, Poissonization and de-Poissonization, the saddle-point method, singularity analysis, and uniform asymptotic analysis.
Year
DOI
Venue
2008
10.1137/070685531
LATIN
Keywords
DocType
Volume
markov source,external profile,analytic algorithmics,detailed study,height,singularity analysis.,tries,saddle-point method,popular data structure,linear number,fill-up level,asymmetric try,profile,interesting behavior,analytic poissonization,digital trees,shortest path,central limit theorem,trie parameter,memoryless source,expected profile,depth,symmetric try,local limit theorem,uniform asymptotic analysis,singularity analysis,mellin transform
Conference
38
Issue
ISSN
ISBN
5
0097-5397
3-540-78772-0
Citations 
PageRank 
References 
8
0.55
51
Authors
4
Name
Order
Citations
PageRank
Gahyun Park1194.58
Hsien-Kuei Hwang236538.02
Pierre Nicodème310414.51
Wojciech Szpankowski41557192.33