Title
AHA: Asset Harvester Assistant
Abstract
Information assets in service enterprises are typically available as unstructured documents. There is a growing need to extract the information in these documents into a structured, semantic format, because structured data can be queried more effectively and therefore increases the reuse of information from asset repositories. This paper addresses the problem of extracting XML models that conform to a given target schema from enterprise documents. We discuss why existing information-extraction approaches are insufficient for the enterprise documents created during service delivery. To address this limitation, we present the Asset Harvester Assistant (AHA), a tool that automatically extracts structured models from MS-Word documents and supports manual refinement of the extracted models in an interactive environment. We present the results of empirical studies conducted on business-process documents from real service-delivery engagements. Our results indicate that the AHA approach can be effective in extracting accurate models from unstructured documents and in improving user productivity.
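To make the extraction problem concrete, here is a minimal Python sketch under stated assumptions. It is not AHA's algorithm, which this record does not describe. It assumes a Word document has already been reduced to a list of (paragraph style, text) pairs. It also assumes a hypothetical target schema: `Heading 1` paragraphs become `<process>` elements, `Heading 2` paragraphs become `<activity>` elements, and body text becomes `<description>` children. The names `STYLE_TO_ELEMENT`, `extract_model`, and all element names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph styles to elements of an
# illustrative target schema (an assumption, not AHA's actual schema).
STYLE_TO_ELEMENT = {
    "Heading 1": "process",
    "Heading 2": "activity",
}


def extract_model(paragraphs):
    """Build an XML model from (style, text) paragraph pairs.

    Heading paragraphs open new elements; non-empty body paragraphs are
    attached as <description> children of the most recently opened element.
    """
    root = ET.Element("processModel")
    current_process = None  # last <process> opened, parent for activities
    current = root          # element that receives body text
    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT.get(style)
        if tag == "process":
            current_process = ET.SubElement(root, "process", name=text)
            current = current_process
        elif tag == "activity":
            # Activities nest under the enclosing process, if any.
            parent = current_process if current_process is not None else root
            current = ET.SubElement(parent, "activity", name=text)
        elif text.strip():
            ET.SubElement(current, "description").text = text.strip()
    return root


if __name__ == "__main__":
    doc = [
        ("Heading 1", "Order to Cash"),
        ("Normal", "Covers order intake through payment."),
        ("Heading 2", "Create Sales Order"),
        ("Normal", "The clerk enters the order in the system."),
    ]
    print(ET.tostring(extract_model(doc), encoding="unicode"))
```

A fixed style-to-element rule like this breaks down when authors format documents inconsistently, which is one reason the abstract pairs automatic extraction with interactive manual refinement of the extracted model.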
Year
2010
DOI
10.1109/SCC.2010.55
Venue
Services Computing
Keywords
unstructured document, service delivery, asset harvester assistant, unraveling information, enterprise document, structured data, service enterprise, information reuse, information asset, aha approach, information extraction, semantics, data mining, pediatrics, information assets, enterprise, business process, services, empirical study, web pages, xml, business, ontologies, data structures
Field
Ontology (information science), Data structure, World Wide Web, XML, Web page, Asset (computer security), Computer science, Information extraction, Data model, Empirical research
DocType
Conference
ISBN
978-0-7695-4126-6
Citations
4
PageRank
0.45
References
17
Authors
7
Name                     Order  Citations  PageRank
Debdoot Mukherjee        1      62         7.61
Senthil Mani             2      247        24.05
Vibha Singhal Sinha      3      189        15.92
Rema Ananthanarayanan    4      24         4.32
Biplav Srivastava        5      660        67.14
Pankaj Dhoolia           6      118        11.80
Prahlad Chowdhury        7      4          0.45