Title
Constrained Semi-supervised Learning in the Presence of Unanticipated Classes.
Abstract
Traditional semi-supervised learning (SSL) techniques treat the missing labels of unlabeled datapoints as latent (unobserved) variables and estimate these variables together with the model parameters using techniques such as Expectation Maximization (EM). Such semi-supervised learning techniques are widely used for Automatic Knowledge Base Construction (AKBC) tasks. We consider two extensions to traditional SSL methods that make them more suitable for a variety of AKBC tasks. First, we consider jointly assigning multiple labels to each instance, with a flexible scheme for encoding constraints between assigned labels: this makes it possible, for instance, to assign labels at multiple levels of a hierarchy. Second, we account for another type of latent variable, in the form of unobserved classes. In open-domain, web-scale information extraction problems, it is unrealistic to assume that the class ontology or topic hierarchy in use is complete. Our proposed framework combines structural search for the best class hierarchy with SSL, reducing the semantic drift associated with erroneously grouping unanticipated classes with expected classes. Together, these extensions allow a single framework to handle a large number of knowledge extraction tasks, including macro-reading, noun-phrase classification, word sense disambiguation, alignment of KBs to Wikipedia or online glossaries, and ontology extension. To summarize, this thesis argues that many AKBC tasks that have previously been addressed separately can be viewed as instances of a single abstract problem: multi-view semi-supervised learning with an incomplete class hierarchy. In this thesis we present a generic EM framework for solving this abstract task.
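As a rough illustration of the traditional setting the abstract starts from (not the thesis's constrained framework, and without any handling of unanticipated classes), the sketch below runs EM over a few labeled seeds and unlabeled 1-D points. The function name `semi_supervised_em`, the unit-variance Gaussian class model, and the toy data are all assumptions made for this example.

```python
import math

def semi_supervised_em(labeled, unlabeled, n_classes, n_iters=20):
    """Toy semi-supervised EM (illustrative assumption, not the thesis's method).

    labeled:   list of (x, class_index) seed points; every class needs >= 1 seed.
    unlabeled: list of x values whose labels are treated as latent variables.
    Returns (means, posteriors), with posteriors[i][k] = P(class k | unlabeled[i]).
    """
    # Initialise each class mean from its labeled seeds only.
    means = []
    for k in range(n_classes):
        xs = [x for x, y in labeled if y == k]
        means.append(sum(xs) / len(xs))
    priors = [1.0 / n_classes] * n_classes
    n_total = len(labeled) + len(unlabeled)
    posteriors = []

    for _ in range(n_iters):
        # E-step: soft labels for unlabeled points under unit-variance Gaussians.
        posteriors = []
        for x in unlabeled:
            log_scores = [math.log(priors[k]) - 0.5 * (x - means[k]) ** 2
                          for k in range(n_classes)]
            top = max(log_scores)  # subtract the max to avoid underflow
            scores = [math.exp(s - top) for s in log_scores]
            total = sum(scores)
            posteriors.append([s / total for s in scores])

        # M-step: labeled seeds count with weight 1, unlabeled points with
        # their posterior weight.
        for k in range(n_classes):
            seed_xs = [x for x, y in labeled if y == k]
            weight = len(seed_xs) + sum(p[k] for p in posteriors)
            weighted_sum = sum(seed_xs) + sum(
                p[k] * x for p, x in zip(posteriors, unlabeled))
            means[k] = weighted_sum / weight
            priors[k] = weight / n_total

    return means, posteriors


seeds = [(0.0, 0), (10.0, 1)]
points = [0.5, -0.5, 1.0, 9.0, 10.5, 11.0]
means, posteriors = semi_supervised_em(seeds, points, n_classes=2)
```

In this toy run the unlabeled points pull each class mean toward its cluster. A point from a class with no seed would still be forced into one of the two known classes, which is the kind of erroneous grouping the abstract associates with semantic drift.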
Year
2015
DOI
10.1145/2888422.2888447
Venue
SIGIR Forum
Field
Ontology, Data mining, Semi-supervised learning, Computer science, Latent variable, Artificial intelligence, Knowledge base, Hierarchy, Information retrieval, Class hierarchy, Information extraction, Knowledge extraction, Machine learning
DocType
Journal
Volume
49
Issue
2
ISSN
0163-5840
Citations
0
PageRank
0.34
References
0
Authors
2
Name | Order | Citations | PageRank
Bhavana Bharat Dalvi | 1 | 201 | 17.31
DalviBhavana Bharat | 2 | 0 | 0.34