Abstract | ||
---|---|---|
Manually annotated metadata often contain typing errors, but correcting them by hand is costly and time-consuming. This paper proposes a framework that eases metadata correction through a system that uses OCR and NLP techniques to extract metadata automatically from document images. The system first converts images into text using OCR and then extracts metadata from the OCR results. The extracted metadata are then compared with the data in an existing repository to locate erroneous entries. These entries are displayed to users, who correct them using supporting information. Human judgment is still required to make each correction, but only for the erroneous entries. Experimental results on 3,712 thesis abstracts show that the proposed solution can automatically extract the relevant information with 91.41% accuracy. |
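
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the OCR stage is assumed to have already produced plain text, the labelled-field layout (`Title:`, `Author:`, `Year:`), the regular expressions, and the function names `extract_metadata` and `find_error_entries` are all hypothetical, and simple string comparison stands in for the NLP techniques the paper refers to.

```python
import re
from difflib import SequenceMatcher

# Hypothetical field labels; the paper does not specify its extraction rules.
FIELD_PATTERNS = {
    "title": re.compile(r"^Title:\s*(.+)$", re.MULTILINE),
    "author": re.compile(r"^Author:\s*(.+)$", re.MULTILINE),
    "year": re.compile(r"^Year:\s*(\d{4})$", re.MULTILINE),
}


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def extract_metadata(ocr_text):
    """Pull labelled fields out of text that an OCR step already produced."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[field] = match.group(1).strip()
    return found


def find_error_entries(extracted, repository_record):
    """Return (field, stored, extracted, similarity) for every mismatching field."""
    errors = []
    for field, extracted_value in extracted.items():
        stored_value = repository_record.get(field, "")
        if normalize(stored_value) != normalize(extracted_value):
            # The similarity score is supporting information for the reviewer.
            score = SequenceMatcher(
                None, normalize(stored_value), normalize(extracted_value)
            ).ratio()
            errors.append((field, stored_value, extracted_value, score))
    return errors


# Made-up example data: the repository record contains a mistyped title.
ocr_text = "Title: Metadata Correction with OCR\nAuthor: A. Example\nYear: 2005\n"
record = {
    "title": "Metadta Correction with OCR",
    "author": "A. Example",
    "year": "2005",
}
flagged = find_error_entries(extract_metadata(ocr_text), record)
for field, stored, extracted, score in flagged:
    print(f"{field}: stored={stored!r} extracted={extracted!r} similarity={score:.2f}")
```

Only the mismatching fields are returned, which mirrors the abstract's point that human review is needed only for the erroneous entries. The similarity score is included so a reviewer can tell a likely typo from an unrelated value.
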
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/11599517_69 | ICADL |
Keywords | Field | DocType |
electronic library repository,system firstly,proposed solution,ocr result,error entry,metadata correction,supporting information,extracts metadata,utilizes ocr,relevance information,annotated metadata,data cleansing | Data warehouse,Metadata,Metadata repository,Data mining,Data cleansing,Information retrieval,Character recognition,Computer science,Data element,Optical character recognition,Error detection and correction | Conference |
Volume | ISSN | ISBN |
3815 | 0302-9743 | 3-540-30850-4 |
Citations | PageRank | References |
0 | 0.34 | 1 |
Authors | ||
1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Asanee Kawtrakul | 1 | 161 | 25.90 |