Data cleansing and preparation for moving toward electronic library repository - Citegraph

Paper Info

Title
Data cleansing and preparation for moving toward electronic library repository

Abstract
Manually annotated metadata usually contain typing errors, and correcting them by hand is costly and time-consuming. This paper proposes a framework to ease the metadata correction process: a system that uses OCR and NLP techniques to automatically extract metadata from document images. The system first converts the images into text using OCR and then extracts metadata from the OCR results. The extracted metadata are then compared with the data in the existing repository to locate error entries. The error entries are displayed to users, who correct them using supporting information. Although a human decision is still required to correct each error, this manual step is needed only for the error entries. Experimental results on 3,712 thesis abstracts show that the proposed solution automatically extracts the relevant information with 91.41% accuracy.
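
The comparison step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the OCR stage is assumed to have already produced plain text, the `Title:` and `Author:` labels, the function names, and the similarity threshold are all hypothetical, and a simple regular-expression match stands in for the NLP extraction. The sample record is invented for illustration only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical field labels; the abstract does not describe the page layout.
FIELD_PATTERNS = {
    "title": re.compile(r"^Title:\s*(.+)$", re.MULTILINE),
    "author": re.compile(r"^Author:\s*(.+)$", re.MULTILINE),
}


def extract_metadata(ocr_text):
    """Pull labelled fields out of OCR output (a stand-in for the NLP step)."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[field] = match.group(1).strip()
    return found


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def find_error_entries(extracted, record, threshold=1.0):
    """Return the fields whose repository value differs from the extracted one.

    threshold is an assumed knob: 1.0 flags any difference after normalization.
    """
    errors = []
    for field, ocr_value in extracted.items():
        repo_value = record.get(field, "")
        similarity = SequenceMatcher(
            None, normalize(ocr_value), normalize(repo_value)
        ).ratio()
        if similarity < threshold:
            errors.append({
                "field": field,
                "repository": repo_value,
                "extracted": ocr_value,
                "similarity": round(similarity, 2),
            })
    return errors


# Invented sample data: the repository title contains a typing error.
ocr_text = "Title: Data Cleansing for Digital Libraries\nAuthor: A. Example\n"
record = {"title": "Data Clensing for Digital Libraries", "author": "A. Example"}

errors = find_error_entries(extract_metadata(ocr_text), record)
for entry in errors:
    print(entry)
```

Only the mismatching field is reported, which mirrors the idea that a human reviewer sees just the error entries, with the extracted value shown alongside as supporting information. In practice OCR output is itself noisy, so a threshold below 1.0 might be chosen to avoid flagging entries where the difference is more likely an OCR artifact than a repository error.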

Year	DOI	Venue
2005	10.1007/11599517_69	ICADL
Keywords	Field	DocType
electronic library repository,system firstly,proposed solution,ocr result,error entry,metadata correction,supporting information,extracts metadata,utilizes ocr,relevance information,annotated metadata,data cleansing	Data warehouse,Metadata,Metadata repository,Data mining,Data cleansing,Information retrieval,Character recognition,Computer science,Data element,Optical character recognition,Error detection and correction	Conference
Volume	ISSN	ISBN
3815	0302-9743	3-540-30850-4
Citations	PageRank	References
0	0.34	1
Authors
1

Authors (1 rows)

Name	Order	Citations	PageRank
Asanee Kawtrakul	1	161	25.90
