Title
Stream prediction using a generative model based on frequent episodes in event sequences
Abstract
This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wish to predict. The algorithm examines windows of events that precede occurrences of the target event types in historical data. The set of significant frequent episodes associated with each target event type is obtained based on formal connections between frequent episodes and Hidden Markov Models (HMMs). Each significant episode is associated with a specialized HMM, and a mixture of such HMMs is estimated for every target event type. The likelihoods of the current window of events, under these mixture models, are used to predict future occurrences of target events in the data. The only user-defined model parameter in the algorithm is the length of the windows of events used during model estimation. We first evaluate the algorithm on synthetic data generated by embedding, in varying levels of noise, patterns that are preselected to characterize occurrences of target events. We then present an application of the algorithm for predicting targeted user behaviors from large volumes of anonymous search session interaction logs from a commercially deployed web browser toolbar.
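The window-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `preceding_windows`, the toy event stream, and the choice to skip target occurrences that have fewer than `window_len` preceding events are all assumptions made for illustration. Episode mining and HMM mixture estimation are not shown.

```python
from typing import Dict, Hashable, List, Sequence, Set


def preceding_windows(
    stream: Sequence[Hashable],
    targets: Set[Hashable],
    window_len: int,
) -> Dict[Hashable, List[List[Hashable]]]:
    """Collect, for each target event type, the fixed-length windows of
    events that immediately precede its occurrences in a historical stream.

    Assumption: occurrences with fewer than `window_len` preceding events
    are skipped, and windows may themselves contain other target events.
    """
    windows: Dict[Hashable, List[List[Hashable]]] = {t: [] for t in targets}
    for i, event in enumerate(stream):
        if event in targets and i >= window_len:
            # Slice the window_len events just before position i.
            windows[event].append(list(stream[i - window_len:i]))
    return windows


# Hypothetical toy stream: "X" and "Y" are the target event types.
history = list("ABCXABDXCABY")
result = preceding_windows(history, {"X", "Y"}, window_len=3)
```

Here `window_len` plays the role of the single user-defined parameter mentioned in the abstract. The per-target window sets would then serve as training data for whatever model is estimated for each target event type.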
Year: 2008
DOI: 10.1145/1401890.1401947
Venue: KDD
Keywords: target event, new algorithm, mixture model, significant episode, target event type, historical data, model estimation, long categorical event stream, generative model, frequent episode, synthetic data, stream prediction, event sequence, hidden markov models, hidden markov model
Field: Data mining, Event type, Computer science, Categorical variable, Synthetic data, Artificial intelligence, Embedding, Pattern recognition, Hidden Markov model, Mixture model, Machine learning, Model parameter, Generative model
DocType: Conference
Citations: 34
PageRank: 1.38
References: 12
Authors: 3
Name, Order, Citations, PageRank
Srivatsan Laxman, 1, 421, 21.65
Vikram Tankasali, 2, 36, 1.76
Ryen W. White, 3, 34, 1.38