Title
Research on Text Line Segmentation of Historical Tibetan Documents Based on the Connected Component Analysis.
Abstract
Text line segmentation is a critical step in handwritten document recognition, especially in the analysis and recognition of historical documents. Because of the low quality and complexity of these documents (background noise, scattered characters, touching components between consecutive lines), automatic text line segmentation remains an active research topic. In this paper we propose a new method for segmenting text lines from the Beijing edition of the historical Tibetan scripture “kangjur”, which was printed on paper by woodcut. The method first performs skew detection and correction on the document image and uses projection profiles to obtain the baseline of each text line; each connected component is then allocated to a text line according to its positional relationship to the baselines. For some connected components, location and shape are analyzed so that they are assigned correctly. Because the method operates on connected components instead of pixels, it avoids the noise generated by splitting characters. Experiments show that the method copes effectively with touching text lines and is promising for text line segmentation of historical Tibetan documents.
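The pipeline outlined in the abstract (projection profile, baselines, connected components, assignment by position) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the page is already binarized and deskewed, takes the row with the most ink in each band as the "baseline", and assigns each component to the baseline nearest its centroid row. All function names and the toy `page` image are hypothetical, and the handling of touching components by location and shape is not attempted here.

```python
from collections import deque


def horizontal_profile(img):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in img]


def find_baselines(profile, min_ink=1):
    """Return one row index per text band: the row with the most ink.

    A band is a maximal run of consecutive rows with ink count >= min_ink.
    (Assumption: the peak row stands in for the baseline.)
    """
    baselines, band = [], []
    for y, count in enumerate(profile + [0]):  # sentinel closes the last band
        if count >= min_ink:
            band.append(y)
        elif band:
            baselines.append(max(band, key=lambda r: profile[r]))
            band = []
    return baselines


def connected_components(img):
    """Collect 8-connected ink regions as lists of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y0 in range(h):
        for x0 in range(w):
            if img[y0][x0] != 1 or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def assign_to_lines(comps, baselines):
    """For each component, return the index of the baseline nearest its centroid row."""
    assignment = []
    for pixels in comps:
        cy = sum(y for y, _ in pixels) / len(pixels)
        nearest = min(range(len(baselines)), key=lambda i: abs(baselines[i] - cy))
        assignment.append(nearest)
    return assignment


# Toy 7x8 page with two text lines (hypothetical data, 1 = ink).
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

baselines = find_baselines(horizontal_profile(page))
components = connected_components(page)
lines = assign_to_lines(components, baselines)
```

On the toy page, the two components in the upper band map to the first baseline and the single component in the lower band maps to the second. Working at the component level means a character is never cut in half by a line boundary, which is the benefit the abstract attributes to using connected components instead of pixels.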
Year
2018
Venue
PRCV
Field
Background noise, Handwriting, Pattern recognition, Segmentation, Computer science, Pixel, Connected component, Skew, Artificial intelligence, Connected-component labeling, Component analysis
DocType
Conference
Citations
0
PageRank
0.34
References
5
Authors
5
Name, Order, Citations, PageRank
Wang Yiqun, 1, 22617.63
Weilan Wang, 2, 911.75
Zhenjiang Li, 3, 194.81
Yuehui Han, 4, 01.01
Xiao-Juan Wang, 5, 228.34