Title
Rule-based middle-level character detection for simplifying Thai document layout analysis
Abstract
Although machine-printed Thai character recognition has been intensively researched over the past decade, only a few results are available for Thai document layout analysis. Moreover, methods proposed for other languages cannot be applied directly to Thai documents, because Thai text has a unique characteristic: its characters can be placed at four different levels. This paper proposes an approach that eliminates this characteristic by removing nonmiddle-level characters from the image, using heuristic rules derived from two properties of the Thai language: nonmiddle-level characters are usually smaller than middle-level characters, and the gap between levels is smaller than the gap between two consecutive lines. Once these characters are removed, any existing layout analysis method can be applied to Thai documents without modification. The experimental results show that the proposed method removes nonmiddle-level characters from 200 test images with 99.46% accuracy, even when an image contains various font sizes.
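The two heuristics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes connected components have already been extracted as bounding boxes (x, y, width, height) with y growing downward, and the function names and the thresholds max_level_gap and min_height_ratio are hypothetical values chosen for illustration only.

```python
from statistics import median

# A box is (x, y, width, height); y grows downward.
# All names and thresholds below are illustrative assumptions.

def group_into_lines(boxes, max_level_gap):
    """Group boxes into text lines.

    A vertical gap of at most max_level_gap is treated as a gap between
    levels of the same line; a larger gap starts a new line.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        top, bottom = box[1], box[1] + box[3]
        if lines and top - lines[-1]["bottom"] <= max_level_gap:
            lines[-1]["boxes"].append(box)
            lines[-1]["bottom"] = max(lines[-1]["bottom"], bottom)
        else:
            lines.append({"bottom": bottom, "boxes": [box]})
    return [line["boxes"] for line in lines]

def keep_middle_level(line_boxes, min_height_ratio=0.6):
    """Keep boxes whose height is close to the line's median height.

    Smaller boxes are assumed to be nonmiddle-level characters.
    """
    reference = median(b[3] for b in line_boxes)
    return [b for b in line_boxes if b[3] >= min_height_ratio * reference]

def remove_nonmiddle(boxes, max_level_gap=5, min_height_ratio=0.6):
    """Apply both heuristics and return the surviving boxes per line."""
    return [
        keep_middle_level(line, min_height_ratio)
        for line in group_into_lines(boxes, max_level_gap)
    ]
```

Computing the reference height per line, rather than once per page, is one way such a sketch could tolerate mixed font sizes; whether the paper does the same is not stated in the abstract.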
Year: 2005
DOI: 10.1109/ICDAR.2005.204
Venue: ICDAR-1
Keywords: character recognition, document image processing, natural languages, Thai document layout analysis, Thai language, heuristic rules, rule-based middle-level character detection
Field: Point (typography), Rule-based system, Heuristic, Character recognition, Pattern recognition, Computer science, Document processing, Document layout analysis, Image based, Natural language, Artificial intelligence, Natural language processing
DocType: Conference
ISSN: 1520-5263
ISBN: 0-7695-2420-6
Citations: 0
PageRank: 0.34
References: 4
Authors: 2
Name | Order | Citations | PageRank
Chaiyakorn Yingsaeree | 1 | 14 | 2.44
Asanee Kawtrakul | 2 | 161 | 25.90