Automated template-based metadata extraction architecture - Citegraph

Paper Info

Title
Automated template-based metadata extraction architecture

Abstract
This paper describes our efforts to develop a toolset and process for automated metadata extraction from large, diverse, and evolving document collections. A number of federal agencies, universities, laboratories, and companies are placing their collections online and making them searchable via metadata fields such as author, title, and publishing organization. Manually creating metadata for a large collection is extremely time-consuming, yet the task is difficult to automate, particularly for collections consisting of documents with diverse layout and structure. Our automated process enables many more documents to be made available online than would otherwise have been possible under time and cost constraints. We describe our architecture and implementation and illustrate the effectiveness of the toolset by providing experimental results on two major collections: DTIC (Defense Technical Information Center) and NASA (National Aeronautics and Space Administration).

Year: 2007
DOI: 10.1007/978-3-540-77094-7_42
Venue: ICADL
Keywords: diverse layout, metadata field, available online, automated metadata extraction, collections online, space administration, defense technical information center, national aeronautics, large collection, automated process, metadata extraction architecture
Field: Metadata repository, Metadata, Architecture, World Wide Web, Information retrieval, Computer science, Meta Data Services, Automation, Technical information, Publishing
DocType: Conference
Volume: 4822
ISSN: 0302-9743
ISBN: 3-540-77093-3
Citations: 9
PageRank: 0.61
References: 6
Authors: 5

Authors (5 rows)

Name	Order	Citations	PageRank
Paul Flynn	1	9	0.61
Li Zhou	2	9	0.61
Kurt Maly	3	567	139.93
Steven Zeil	4	13	1.05
Mohammad Zubair	5	587	89.90
