Title
On Analysis of Nonstationary Categorical Data Time Series: Dynamical Dimension Reduction, Model Selection, and Applications to Computational Sociology
Abstract
Many real-life processes can be described by simplified categorical (or discrete) data models that switch between a finite number of states or regimes. Their analysis poses two main computational problems: (a) dimension reduction (i.e., identification of the essential discrete degrees of freedom or categories), and (b) parameterization of reduced dynamical models (e.g., Markov chains or Bernoulli processes) that can be used for analysis and prediction. Existing methods addressing (a) and (b) are limited in their ability to account for the influence of exogenous factors and carry a built-in implicit assumption of stationarity. A general framework for analysis and online prediction of categorical data dynamics is presented. In contrast to standard approaches to categorical data analysis (such as generalized linear discrete models, e.g., logit and probit regression) that are based on transforming the discrete categorical data into a continuous representation, the presented methods allow us to build (auto)regressive models of the data and to find the reduced data representation directly in the discrete setting. This general framework is based on an extension of principal component analysis to structure-preserving dimension reduction of (nonstationary) discrete jump processes, combined with a data-based estimation of nonhomogeneous Markov chain models under the influence of external factors in the reduced representation. Efficiently parallelizable numerical methods for the solution of the resulting mixed discrete-continuous optimization problems are described; they are applicable to the analysis of systems with a large number of discrete categories. The described methods scale favorably with the dimension (expressed as the number of discrete states).
General applicability is illustrated on a generic toy model and on two problems from computational sociology: (i) estimation of the time-dependent transition graphs describing the dynamics of political preferences of the mean German voter, and (ii) parameterization of the discrete jump process describing the transition between the employed and unemployed states of an average German individual. The resulting online predictions are compared to those obtained by standard methods of time series analysis, and the influence of implicit statistical assumptions is discussed.
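For orientation, the stationary baseline that the abstract contrasts against can be sketched in a few lines. The snippet below is not the paper's method: it is a minimal, assumed illustration of the standard maximum-likelihood estimate of a homogeneous (time-independent) Markov chain from a categorical series, with no external factors and no dimension reduction. The function name `estimate_transition_matrix` and the toy series are hypothetical.

```python
import numpy as np


def estimate_transition_matrix(states, n_states):
    """Maximum-likelihood transition matrix of a homogeneous Markov chain.

    Assumes `states` is a sequence of integer category labels in
    range(n_states). Stationarity is assumed here, which is exactly
    the assumption the abstract describes as limiting.
    """
    counts = np.zeros((n_states, n_states))
    # Count observed one-step transitions a -> b.
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row; states that are never left get a uniform row.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / n_states)


# Hypothetical toy series with three categories (0, 1, 2).
series = [0, 1, 1, 0, 1, 2, 2, 0]
P = estimate_transition_matrix(series, 3)
print(P)
```

Each row of `P` is the estimated distribution of the next state given the current one. A nonhomogeneous model of the kind the abstract describes would instead let these probabilities vary with time and with external factors.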
Year: 2011
DOI: 10.1137/100790549
Venue: MULTISCALE MODELING & SIMULATION
Keywords: jump processes, Markov chains, nonstationary processes, computational sociology
Field: Applied mathematics, Data modeling, Dimensionality reduction, Categorical variable, Artificial intelligence, Logit, Computational problem, Probit model, Mathematical optimization, Markov chain, Model selection, Machine learning, Mathematics
DocType: Journal
Volume: 9
Issue: 4
ISSN: 1540-3459
Citations: 0
PageRank: 0.34
References: 0
Authors
1
Name
Order
Citations
PageRank
Illia Horenko14410.89