Paper Info

Title
Text Extraction for Historical Tibetan Document Images Based on Connected Component Analysis and Corner Point Detection.

Abstract
In this paper, we present a text extraction method for historical Tibetan document images. Text extraction is treated as a text area detection and localization problem. First, the historical Tibetan document image is preprocessed to correct uneven illumination, skew, and noise, and is then binarized. Second, the regions of interest in the document are divided into three categories using connected components (CCs). The image is divided evenly into grids, and the grids are filtered using the CC categories and the corner point density. The remaining grids are used to compute vertical and horizontal grid projections. Third, the approximate location of the text area is detected by analyzing these projections. Finally, the text area is extracted accurately by correcting the bounding box of the approximate text area. Experiments on a dataset of historical Tibetan document images demonstrate the effectiveness of the proposed method.
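
The abstract gives no thresholds or implementation details, so the following is only a rough sketch of the grid-filtering and projection stage, not the authors' method. It assumes an already binarized image given as a list of lists (1 = ink), uses a hypothetical size rule (`small`, `large`) to split connected components into three categories, and omits the corner point density filter entirely. The function names and the `cell` grid size are illustrative assumptions.

```python
from collections import deque

def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def categorize(box, small=2, large=6):
    """Hypothetical three-way split by bounding-box size (thresholds are assumptions)."""
    top, left, bottom, right = box
    size = max(bottom - top + 1, right - left + 1)
    if size < small:
        return "noise"
    if size > large:
        return "large"  # e.g. frame lines or stains
    return "text"

def approximate_text_area(img, cell=4):
    """Keep grid cells holding the centre of a text-sized component, then use
    horizontal and vertical grid projections to bound the text area."""
    h, w = len(img), len(img[0])
    rows, cols = (h + cell - 1) // cell, (w + cell - 1) // cell
    kept = [[0] * cols for _ in range(rows)]
    for box in connected_components(img):
        if categorize(box) == "text":
            top, left, bottom, right = box
            kept[(top + bottom) // 2 // cell][(left + right) // 2 // cell] = 1
    row_proj = [sum(r) for r in kept]                                   # kept cells per grid row
    col_proj = [sum(kept[i][j] for i in range(rows)) for j in range(cols)]  # per grid column
    ys = [i for i, v in enumerate(row_proj) if v > 0]
    xs = [j for j, v in enumerate(col_proj) if v > 0]
    if not ys:
        return None  # no text-sized component survived the filter
    # Convert the grid range back to an inclusive pixel bounding box.
    return (ys[0] * cell, xs[0] * cell,
            min(h, (ys[-1] + 1) * cell) - 1, min(w, (xs[-1] + 1) * cell) - 1)
```

The returned box is only the approximate text area at grid resolution; the final bounding-box correction step mentioned in the abstract is not sketched here.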

Year	2017
DOI	10.1007/978-981-10-7302-1_45
Venue	Communications in Computer and Information Science
Keywords	Historical Tibetan document, Text extraction, Connected components, Corner point
DocType	Conference
Volume	772
ISSN	1865-0929
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Xiqun Zhang	1	0	0.34
Lijuan Duan	2	215	26.13
Long-Long Ma	3	0	0.68
Jian Wu	4	9	4.12