Title
Non-parametric density estimation of streaming data using orthogonal series
Abstract
Computer technology in the 21st century has allowed us to collect data at rates that would have seemed impossible less than a decade ago. As a result, typical database management systems (DBMS) have great difficulty storing and analyzing data in the traditional way. Systems that receive large amounts of data in transient data streams generally need to analyze the data immediately, without storing it on disk. These systems are referred to as data stream management systems (DSMS). This emerging field has been pushed to the forefront by technology that demands analysis of data in real time. Babcock et al. [2002] analyzed the issues involved in mining rapid, time-varying data streams. To date, most work on DSMS has been concerned with querying the data streams. These queries provide estimates of parameters, such as the mean, and continuously update them as more data arrives. Recently, Heinz and Seeger [2004] used data streams to estimate the underlying probability density function by dividing the data into bins or windows containing the most recent data. An estimate of the density is then created by applying the standard wavelet cascading algorithm to the binned data. This dissertation provides an alternative approach to estimating the probability density function of streaming data: the density is estimated by an orthogonal series. Obtaining a density estimate by orthogonal series has several advantages, which are discussed throughout this dissertation. Although the approach is applicable to a myriad of basis functions, the density estimation problem is studied here using wavelets as the basis functions. The history of wavelets as a mathematical tool dates back to the early 1900s. In the 1990s, Donoho and Johnstone [1992, 1994] established wavelets as a scientific discipline by applying them in the areas of image compression, denoising and density estimation.
Devroye [1985], Silverman [1986] and Scott [1992] provide excellent background material on density estimation in general. The first paper to use wavelets in density estimation is attributed to Doukhan and Leon [1990]. This work was followed by Walter [1990] and Kerkyacharian and Picard [1992]. As a mathematical tool for representing functions, and specifically probability densities, wavelets work especially well. This is due in part to the fact that they form an orthonormal basis for $L^2(\mathbb{R})$. Another pioneer in the field of wavelet density estimation was Vidakovic [1994], who constructed density estimates based on the square root of the density. This dissertation first provides a history of wavelets and the density estimation problem in Chapter 2. Next, Chapter 3 establishes the framework for obtaining a density estimate of streaming data using orthogonal series. Chapter 4 addresses the problem of discounting old data that is no longer relevant to the density estimate. Chapter 5 provides a simulation study, first using simulated data and then actual data from a case study of Internet header traffic. Chapter 6 summarizes my findings and addresses possible areas of future study.
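The idea of updating an orthogonal series estimate as each observation arrives, and of discounting old data, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes the data have been rescaled to [0, 1], it substitutes a cosine basis for the wavelet basis studied in the dissertation, and it uses simple exponential weighting as one possible discounting rule. The class name `StreamingSeriesDensity` and the parameters `n_terms` and `discount` are hypothetical.

```python
import math


class StreamingSeriesDensity:
    """Orthogonal series density estimate on [0, 1], updated one point at a time.

    Assumption: a cosine basis stands in for the wavelet basis, and
    exponential weighting stands in for the discounting scheme.
    """

    def __init__(self, n_terms=8, discount=None):
        self.n_terms = n_terms
        # discount=None weights all points equally; otherwise it is the
        # weight in (0, 1) given to each new observation.
        self.discount = discount
        self.coeffs = [0.0] * n_terms
        self.count = 0

    @staticmethod
    def basis(j, x):
        # Orthonormal on [0, 1]: phi_0(x) = 1, phi_j(x) = sqrt(2) cos(j pi x).
        return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(j * math.pi * x)

    def update(self, x):
        """Fold one observation into the coefficients without storing it."""
        self.count += 1
        if self.discount is None:
            w = 1.0 / self.count                       # running mean
        else:
            w = max(self.discount, 1.0 / self.count)   # running mean at start, then exponential forgetting
        for j in range(self.n_terms):
            self.coeffs[j] += w * (self.basis(j, x) - self.coeffs[j])

    def density(self, x):
        """Evaluate the current series estimate at x (may dip below zero)."""
        return sum(b * self.basis(j, x) for j, b in enumerate(self.coeffs))
```

Each coefficient is a running (or exponentially weighted) average of the basis function evaluated at the observations, so memory use depends only on `n_terms` and not on the length of the stream. Calling `update(x)` for each arriving value and `density(x)` on demand is the intended usage; a truncated series of this kind is not guaranteed to be non-negative.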
Year
2009
DOI
10.1016/j.csda.2009.06.014
Venue
Non-parametric density estimation of streaming data using orthogonal series
Keywords
actual data,data stream management system,density estimate,old data,density estimation,Internet header traffic data,binned data,Non-parametric density estimation,data stream,orthogonal series,density estimation problem
Field
Density estimation,Econometrics,On the fly,Orthogonal series,Probability distribution,Streaming data,Statistics,Probability density function,Mathematics,Nonparametric density estimation,Recursion
DocType
Journal
Volume
53
Issue
12
ISSN
Computational Statistics and Data Analysis
ISBN
0-542-31740-0
Citations
0
PageRank
0.34
References
0
Authors
2
Name, Order, Citations, PageRank
Kyle A. Caudle, 1, 0, 1.01
Edward J. Wegman, 2, 36, 7.84