Title
Learning to join everything
Abstract
Text, speech, images, video, and DNA sequences provide information about entities that people can recognize when looking at a particular instance. But those entities and their attributes and relationships are not directly accessible to queries that join across types of sources. Information extraction methods based on supervised machine learning recognize mentions of entities and relationships of predefined types in different kinds of sources, which can then be used to answer some useful types of queries. However, supervised learning relies on hand-annotated training sets that are difficult to create and that limit which types of entities and relationships can be joined for new applications. These limitations have prompted research into unsupervised extraction methods that rely on correlations among sources rather than hand-annotated training sets. While these methods are not yet as accurate as those based on supervised learning, they have the potential to support a new query-by-example approach to information integration in which seed sets of query answers are expanded into ranked lists of potential answers by learning occurrence patterns from the seed answers. I will give examples of both types of methods from our research on biomedical information extraction, leading to some ideas on a possible convergence of search and databases through machine learning.
Year
2007
DOI
10.1145/1321440.1321443
Venue
CIKM
Keywords
new application, supervised machine learning, information extraction method, machine learning, information integration, biomedical information extraction, unsupervised extraction method, supervised learning, new query-by-example approach, hand-annotated training set, information extraction, dna sequence, query by example
Field
Online machine learning, Information integration, Data mining, Stability (learning theory), Semi-supervised learning, Active learning (machine learning), Information retrieval, Computer science, Supervised learning, Information extraction, Unsupervised learning
DocType
Conference
Citations
0
PageRank
0.34
References
1
Authors
1
Name
Order
Citations
PageRank
Fernando Pereira1177172124.79