Title
Text Extraction for Historical Tibetan Document Images Based on Connected Component Analysis and Corner Point Detection
Abstract
In this paper, we present a text extraction method for historical Tibetan document images. We treat text extraction as a text area detection and localization problem. First, the historical Tibetan document image is preprocessed to correct uneven illumination, skew, and noise, and is then binarized. Second, the regions of interest in the document are divided into three categories using connected components (CCs). The image is divided evenly into grids, and the grids are filtered using the CC category information and the corner point density. The remaining grids are used to compute vertical and horizontal grid projections. Third, the approximate location of the text area is detected by analyzing these projections. Finally, the text area is extracted accurately by correcting the bounding box of the approximate text area. Experiments on a dataset of historical Tibetan document images demonstrate the effectiveness of the proposed method.
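The grid projection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the grid filtering has already produced a 0/1 mask of kept grid cells, and the function names (`projections`, `longest_run`, `approximate_text_area`) and the `min_count` threshold are hypothetical choices made only for illustration. The abstract does not specify how the projections are analyzed; taking the longest run of rows and columns above a threshold is one plausible reading.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 1 = grid cell kept after filtering, 0 = discarded


def projections(kept: Grid) -> Tuple[List[int], List[int]]:
    """Count kept cells per grid row and per grid column."""
    row_sums = [sum(row) for row in kept]
    col_sums = [sum(col) for col in zip(*kept)]
    return row_sums, col_sums


def longest_run(profile: List[int], min_count: int) -> Optional[Tuple[int, int]]:
    """Longest run of consecutive indices with value >= min_count.

    Returns inclusive (start, end) indices, or None if no index qualifies.
    """
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # Iterate one step past the end so a run touching the last index is closed.
    for i in range(len(profile) + 1):
        inside = i < len(profile) and profile[i] >= min_count
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


def approximate_text_area(
    kept: Grid, min_count: int = 2
) -> Optional[Tuple[int, int, int, int]]:
    """Approximate text area as (top, left, bottom, right) in grid units.

    Assumption: the text area is the longest dense band in each projection.
    """
    row_sums, col_sums = projections(kept)
    rows = longest_run(row_sums, min_count)
    cols = longest_run(col_sums, min_count)
    if rows is None or cols is None:
        return None
    return rows[0], cols[0], rows[1], cols[1]
```

In this sketch an isolated kept cell, such as a stray mark in the margin, contributes too little to either projection to pass the threshold, so it does not widen the estimated area. The resulting box is only approximate; the abstract's final step, correcting the bounding box, is not covered here.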
Year: 2017
DOI: 10.1007/978-981-10-7302-1_45
Venue: Communications in Computer and Information Science
Keywords: Historical Tibetan document, Text extraction, Connected components, Corner point
DocType: Conference
Volume: 772
ISSN: 1865-0929
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name            Order   Citations   PageRank
Xiqun Zhang     1       0           0.34
Lijuan Duan     2       215         26.13
Long-Long Ma    3       0           0.68
Jian Wu         4       9           4.12