Title
Automatic Feature Selection with Applications to Script Identification of Degraded Documents
Abstract
Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We present an approach that applies a large pool of image features to a small training sample and uses subset feature selection techniques to automatically select a subset with the most discriminating power. At run time we use a classifier coupled with an evidence accumulation engine to report a script label once a preset likelihood threshold has been reached. We apply the system to a diverse corpus of printed Russian and English documents that suffer from common degradation problems. Our validation study shows promising results both in terms of the script identification accuracy and the ability to identify script on the scale of individual words and text lines.
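The run-time step described in the abstract (a classifier feeding an evidence accumulation engine that reports a label once a preset likelihood threshold is reached) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `accumulate_evidence`, the log-odds fusion rule, the independence assumption between words, and the default threshold of 0.95 are all assumptions made for illustration.

```python
import math


def accumulate_evidence(word_probs, threshold=0.95):
    """Fuse per-word classifier outputs until one script is confident enough.

    word_probs: iterable of P(script == "Russian" | word image), each strictly
    between 0 and 1, from a hypothetical per-word classifier that is assumed
    to be calibrated under equal class priors.
    Returns (label, words_used); label is None if the threshold is never hit.
    """
    # Work in log-odds: under the (assumed) independence of word observations,
    # the evidence from successive words simply adds up.
    log_odds = 0.0  # equal priors for the two scripts
    bound = math.log(threshold / (1.0 - threshold))
    used = 0
    for p in word_probs:
        used += 1
        log_odds += math.log(p / (1.0 - p))
        if log_odds >= bound:
            return "Russian", used
        if log_odds <= -bound:
            return "English", used
    return None, used


if __name__ == "__main__":
    # Three moderately confident words are enough to cross the threshold.
    print(accumulate_evidence([0.8, 0.8, 0.8]))
```

With this kind of stopping rule, a decision can be reported after only a few words when the per-word evidence is consistent, which is in line with the abstract's goal of identifying script at the scale of words and text lines rather than whole documents.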
Year: 2003
DOI: 10.1109/ICDAR.2003.1227762
Venue: ICDAR-1
Keywords: from common degradation problem, degraded documents, english document, script identification, reliable identification, current approach, script identification accuracy, automatic feature selection, evidence accumulation, diverse corpus of, script label, individual word, engines, feature selection, image features, degradation, image quality, pattern recognition, testing
Field: Computer vision, Pattern recognition, Feature selection, Computer science, Image quality, Speech recognition, Artificial intelligence, Classifier (linguistics), Text recognition
DocType: Conference
ISSN: 1520-5363
ISBN: 0-7695-1960-1
Citations: 11
PageRank: 0.70
References: 10
Authors: 2
Name: Vitaly Ablavsky; Order: 1; Citations: 84; PageRank: 7.16
Name: Mark R. Stevens; Order: 2; Citations: 75; PageRank: 8.93