Title
Statistical Relational Learning for Document Mining
Abstract
A major obstacle to fully integrated deployment of many data mining algorithms is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. We propose an integrated approach to statistical modeling from relational databases. We structure the search space based on "refinement graphs", which are widely used in inductive logic programming for learning logic descriptions. The use of statistics allows us to extend the search space to include a richer set of features, including many which are not boolean. Search and model selection are integrated into a single process, allowing information criteria native to the statistical model, for example logistic regression, to make feature selection decisions in a step-wise manner. We present experimental results for the task of predicting where scientific papers will be published based on relational data taken from CiteSeer. Our approach results in classification accuracies superior to those achieved when using classical "flat" features. The resulting classifier can be used to recommend where to publish articles.
Year
2003
Venue
ICDM
Keywords
approach result, relational data, integrated deployment, statistical relational learning, document mining, real-world databases have complex relational, logic description, search space, classification accuracies superior, single table, single process, inductive logic programming for, learning artificial intelligence, statistical model, relational databases, decision theory, logistic regression, statistical modelling, model selection, regression analysis, data mining, feature selection, relational database
Field
Inductive logic programming, Data mining, Relational database, Feature selection, Computer science, Statistical relational learning, Model selection, Relational Model/Tasmania, Artificial intelligence, Statistical model, Classifier (linguistics), Machine learning
DocType
Conference
ISBN
0-7695-1978-4
Citations
32
PageRank
2.73
References
27
Authors
4
Name                 Order  Citations  PageRank
Alexandrin Popescul  1      1067       104.49
Lyle H. Ungar        2      2850       279.67
Steve Lawrence       3      6194       872.30
David M. Pennock     4      3823       451.85