Combining multiple sources of evidence in web information extraction - Citegraph

Paper Info

Title
Combining multiple sources of evidence in web information extraction

Abstract
Extraction of meaningful content from collections of web pages with unknown structure is a challenging task, which can only be successfully accomplished by exploiting multiple heterogeneous resources. In the Ex information extraction tool, so-called extraction ontologies are used by human designers to specify the domain semantics, to manually provide extraction evidence, as well as to define extraction subtasks to be carried out via trainable classifiers. Elements of an extraction ontology can be endowed with probability estimates, which are used for selection and ranking of attribute and instance candidates to be extracted. At the same time, HTML formatting regularities are locally exploited.

Year	2008
DOI	10.1007/978-3-540-68123-6_51
Venue	ISMIS
Keywords	challenging task, instance candidate, extraction evidence, extraction ontology, ex information extraction tool, domain semantics, extraction subtasks, human designer, web information extraction, html formatting regularity, multiple source, so-called extraction ontology, web pages, information extraction
Field	Ontology (information science), Data mining, Ontology, Web page, Information retrieval, Ranking, Computer science, Information extraction, Disk formatting, Semantics, Relationship extraction
DocType	Conference
Volume	4994
ISSN	0302-9743
ISBN	3-540-68122-1
Citations	0
PageRank	0.34
References	9
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Martin Labský	1	23	6.77
Vojtěch Svátek	2	53	7.90
