Abstract |
---|
Information assets in service enterprises are typically available as unstructured documents. There is an increasing need for unraveling information from these documents into a structured and semantic format. Structured data can be more effectively queried, which increases information reuse from asset repositories. This paper addresses the problem of extracting XML models, which follow a given target schema, from enterprise documents. We discuss why existing approaches for information extraction do not suffice for the enterprise documents created during service delivery. To address this limitation, we present the Asset Harvester Assistant (AHA), a tool that automatically extracts structured models from MS-Word documents, and supports manual refinement of the extracted models within an interactive environment. We present the results of empirical studies conducted using business-process documents from real service-delivery engagements. Our results indicate that the AHA approach can be effective in extracting accurate models from unstructured documents and improving user productivity. |
Year | DOI | Venue |
---|---|---|
2010 | 10.1109/SCC.2010.55 | Services Computing |
Keywords | Field | DocType |
---|---|---|
unstructured document,service delivery,asset harvester assistant,unraveling information,enterprise document,structured data,service enterprise,information reuse,information asset,aha approach,information extraction,semantics,data mining,pediatrics,information assets,enterprise,business process,services,empirical study,web pages,xml,business,ontologies,data structures | Ontology (information science),Data structure,World Wide Web,XML,Web page,Asset (computer security),Computer science,Information extraction,Data model,Empirical research | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-0-7695-4126-6 | 4 | 0.45 |
References | Authors |
---|---|
17 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Debdoot Mukherjee | 1 | 62 | 7.61 |
Senthil Mani | 2 | 247 | 24.05 |
Vibha Singhal Sinha | 3 | 189 | 15.92 |
Rema Ananthanarayanan | 4 | 24 | 4.32 |
Biplav Srivastava | 5 | 660 | 67.14 |
Pankaj Dhoolia | 6 | 118 | 11.80 |
Prahlad Chowdhury | 7 | 4 | 0.45 |