Title
A framework for streamlined statistical prediction using topic models.
Abstract
In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. Because approaches in these fields are grounded in traditional statistical techniques, frameworks are needed through which advanced NLP techniques such as topic modelling can be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. We apply this framework in a Social Sciences context (applied animal behaviour) and in a Humanities context (narrative analysis). The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.
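The idea of using topics as predictors can be illustrated with a minimal sketch. It assumes LDA fitted by collapsed Gibbs sampling as the topic model and logistic regression as the predictive model; the abstract does not confirm either choice, and the toy documents, labels, function names, and hyperparameters below are all made up for illustration.

```python
import math
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # total tokens per topic
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Resample each token's topic conditional on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Logistic regression by batch gradient descent; returns (intercept, coefs)."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


# Hypothetical toy corpus: tokenised documents with a binary response.
docs = [
    "cat scratch bite hiss cat growl".split(),
    "dog bark bite growl dog chase".split(),
    "cat hiss scratch chase dog bark".split(),
    "dog growl bite cat hiss bark".split(),
    "hero journey quest villain hero ending".split(),
    "villain plot twist ending hero quest".split(),
    "journey plot hero twist quest villain".split(),
    "ending twist plot journey villain hero".split(),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

theta = lda_gibbs(docs, n_topics=2)
# Topic proportions sum to 1, so drop the last topic to avoid
# collinearity with the intercept.
X = [row[:-1] for row in theta]
intercept, coefs = fit_logistic(X, labels)
probs = [sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) for row in X]
```

Here each document is reduced from a bag of words to a short vector of topic proportions, so the regression has one predictor per retained topic rather than one per vocabulary word.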
Year: 2019
DOI: 10.18653/v1/w19-2508
Venue: LaTeCH@NAACL-HLT
Field: Econometrics, Dimensionality reduction, Narrative inquiry, Regression analysis, Text corpus, Information extraction, Statistical learning, Artificial intelligence, Topic model, Predictive modelling, Machine learning, Mathematics
DocType: Journal
Volume: abs/1904.06941
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name, Order, Citations, PageRank
Vanessa G. Glenny, 1, 0, 0.68
Jonathan Tuke, 2, 4, 2.49
Nigel G. Bean, 3, 47, 10.77
Lewis Mitchell, 4, 155, 17.70