Abstract | ||
---|---|---|
More and more companies are migrating their legacy document management systems to XML, the industry standard for data exchange. To reduce the migration cost, we propose an approach that automates the conversion of layout-oriented documents into semantic-oriented annotations. The conversion module uses supervised machine learning techniques to learn a conversion model for a collection of documents. The conversion is achieved by semantically annotating the document content and structuring the annotations according to an XML schema that specifies the class of target documents. |
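
The two steps named in the abstract (semantic annotation of content, then structuring the annotations according to a schema) can be illustrated with a minimal sketch. It is not the authors' method: the feature names, the labels, the vote-counting classifier, and the `SCHEMA_ORDER` list standing in for an XML schema are all hypothetical, chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET

# Hypothetical training set: layout features of a text fragment -> semantic label.
TRAINING = [
    ({"font": "bold", "size": "large"}, "title"),
    ({"font": "bold", "size": "large"}, "title"),
    ({"font": "italic", "size": "small"}, "author"),
    ({"font": "italic", "size": "small"}, "author"),
    ({"font": "regular", "size": "small"}, "paragraph"),
    ({"font": "regular", "size": "small"}, "paragraph"),
]

# Stand-in for an XML schema (assumption): the allowed child elements, in order.
SCHEMA_ORDER = ["title", "author", "paragraph"]


def train(examples):
    """Count how often each (feature, value) pair co-occurs with each label."""
    counts = defaultdict(Counter)
    for features, label in examples:
        for item in features.items():
            counts[item][label] += 1
    return counts


def annotate(model, features):
    """Return the label with the most votes over the fragment's features."""
    votes = Counter()
    for item in features.items():
        votes.update(model.get(item, {}))
    # Fall back to a generic label when no feature was seen in training.
    return votes.most_common(1)[0][0] if votes else "paragraph"


def to_xml(fragments, model):
    """Annotate each fragment, then order the elements as the toy schema requires."""
    labeled = [(annotate(model, features), text) for text, features in fragments]
    labeled.sort(key=lambda pair: SCHEMA_ORDER.index(pair[0]))  # stable sort
    root = ET.Element("document")
    for label, text in labeled:
        ET.SubElement(root, label).text = text
    return ET.tostring(root, encoding="unicode")


model = train(TRAINING)
fragments = [
    ("Body text.", {"font": "regular", "size": "small"}),
    ("A Title", {"font": "bold", "size": "large"}),
    ("J. Doe", {"font": "italic", "size": "small"}),
]
print(to_xml(fragments, model))
```

A real system would replace the vote counter with a trained classifier and validate the output against an actual XML schema. The sketch only shows how learned annotations and a target structure fit together.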
Year | DOI | Venue |
---|---|---|
2006 | 10.3166/dn.9.1.9-24 | Document Numérique |
Keywords | Field | DocType |
XML,information extraction,supervised learning,machine learning,document management,XML schema,data exchange | XML,Computer science,Document Structure Description,Electronic document,Humanities,XML schema,Automatic processing,Linguistics,Markup language | Journal |
Volume | Issue | Citations |
9 | 1 | 0 |
PageRank | References | Authors |
0.34 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jérôme Fuselier | 1 | 28 | 3.63 |
Boris Chidlovskii | 2 | 411 | 52.58 |
Domaine Universitaire | 3 | 19 | 3.45 |