Title
A Novel Method of Text Line Segmentation for Historical Document Image of the Uchen Tibetan
Abstract
Text line segmentation is a key step in recognizing Tibetan historical documents. This paper proposes a baseline-based method for text line segmentation of uchen Tibetan historical document images and releases a new dataset for evaluating segmentation results on such documents. The method has two steps: baseline detection, followed by text line segmentation using the detected baselines. In baseline detection, the upper edges of all characters in the document are obtained with a horizontal gradient operator, and an edge connectivity definition is introduced to divide the upper-edge set into disjoint subsets. Eligible subsets are selected, and their edges are joined in turn to form the baselines. In text line segmentation, the document image is cut at the baseline positions, and the remaining adhesion regions are segmented again. Each connected region is then assigned to its nearest baseline, and all connected regions assigned to the same baseline form a text line. Experiments on the proposed dataset show that the method is robust to document distortion, segments text lines with high accuracy, and handles adhesion between text lines.
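The two steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binarized page stored as a list of rows (1 = ink, 0 = background), treats a pixel as an upper edge when it is ink and the pixel directly above it is background (a simple stand-in for the gradient operator), and assumes the baselines are already known as straight rows. The edge connectivity definition, the selection of eligible subsets, and the re-segmentation of adhesion regions are not reproduced. All function names and the toy image are hypothetical.

```python
from collections import deque


def upper_edges(img):
    """Return (row, col) of ink pixels whose upper neighbour is background."""
    edges = []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if value == 1 and (r == 0 or img[r - 1][c] == 0):
                edges.append((r, c))
    return edges


def connected_regions(img):
    """Collect 4-connected ink regions with a breadth-first search."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def assign_to_baselines(regions, baseline_rows):
    """Group regions by the baseline row nearest to each region's top row.

    Assumption: distance is measured from the top of the region, because
    uchen characters hang below their baseline.
    """
    lines = {b: [] for b in baseline_rows}
    for pixels in regions:
        top = min(y for y, _ in pixels)
        nearest = min(baseline_rows, key=lambda b: abs(b - top))
        lines[nearest].append(pixels)
    return lines


# Toy page with two text lines; the baseline rows are assumed, not detected.
img = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
]
edges = upper_edges(img)
regions = connected_regions(img)
lines = assign_to_baselines(regions, baseline_rows=[1, 4])
```

On the toy page, every upper-edge pixel falls on row 1 or row 4, which is why those rows are used as the assumed baselines. On a real scan the baselines would be curves built from the selected edge subsets, so the nearest-baseline distance would have to be measured against the curve at the region's horizontal position.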
Year: 2019
DOI: 10.1016/j.jvcir.2019.01.021
Venue: Journal of Visual Communication and Image Representation
Keywords: Tibetan historical document, Text line segmentation, Baseline, Upper edge, Connected region analysis, Dataset, Image processing
Field: Computer vision, Disjoint sets, Pattern recognition, Segmentation, Operator (computer programming), Artificial intelligence, Distortion, Mathematics, Historical document
DocType: Journal
Volume: 61
ISSN: 1047-3203
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Zhenjiang Li | 1 | 0 | 1.35
Weilan Wang | 2 | 911.75
Yang Chen | 3 | 0 | 0.34
Yusheng Hao | 4 | 0 | 0.34