Title
Content-based indexing and retrieval method of Chinese document images
Abstract
In Chinese information retrieval, a Chinese text document is easy to index for retrieval: the text only needs to be segmented into phrases. When the document is a Chinese document image (a non-ASCII file), it can first be converted into a text file with Chinese optical character recognition (OCR) and then indexed with an information retrieval algorithm. However, OCR is time-consuming, which reduces retrieval efficiency. This paper proposes an indexing method based on stroke density codes. The document image is first segmented to obtain all of its Chinese character images. The stroke density of each character image is then calculated, and finally the stroke density code of the character image is derived. The indexing method is fast and robust to noise. In addition, this paper presents a retrieval method for Chinese document images based on this indexing technique, and discusses how the indexing and retrieval methods apply to duplicate detection. We have demonstrated the validity of the indexing method through its application to keyword spotting and duplicate detection.
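The abstract outlines the pipeline (segment the page into character images, compute each character's stroke density, turn it into a code, and index by that code) but does not define stroke density or the coding scheme. The Python sketch below illustrates one plausible reading under stated assumptions: stroke density is taken to be the average number of stroke crossings per scan line within a few horizontal and vertical strips, quantized into a short integer tuple; that tuple keys an inverted index; keyword spotting looks for consecutive character positions whose codes match the query; and duplicate detection uses Jaccard similarity of code bigrams. Character segmentation is assumed to be already done. All function names (`crossings`, `stroke_density_code`, `build_index`, `spot_keyword`, `duplicate_score`) and parameters (`bands`, `levels`, `n`) are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of stroke-density indexing; NOT the authors' exact method.
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Binary character image: rows of pixels, 1 = ink (black), 0 = background.
Image = Sequence[Sequence[int]]
Code = Tuple[int, ...]
Posting = Tuple[str, int]  # (document id, character position)


def crossings(line: Sequence[int]) -> int:
    """Count white-to-black transitions (stroke crossings) along one scan line."""
    count, prev = 0, 0
    for px in line:
        if px == 1 and prev == 0:
            count += 1
        prev = px
    return count


def stroke_density_code(img: Image, bands: int = 2, levels: int = 4) -> Code:
    """Hypothetical stroke density code for a non-empty character image.

    Assumption: split the image into `bands` horizontal strips and `bands`
    vertical strips, average the stroke crossings over the scan lines of each
    strip, round half up, and clip the result to 0..levels-1.
    """
    h, w = len(img), len(img[0])
    cols = [[img[r][c] for r in range(h)] for c in range(w)]
    code = []
    for lines in (list(img), cols):  # row scans first, then column scans
        n = len(lines)
        for b in range(bands):
            strip = lines[b * n // bands:(b + 1) * n // bands]
            avg = sum(crossings(line) for line in strip) / max(len(strip), 1)
            code.append(min(int(avg + 0.5), levels - 1))
    return tuple(code)


def build_index(docs: Dict[str, List[Image]]) -> Dict[Code, List[Posting]]:
    """Inverted index: stroke density code -> positions of characters with that code."""
    index: Dict[Code, List[Posting]] = defaultdict(list)
    for doc_id, chars in docs.items():
        for pos, img in enumerate(chars):
            index[stroke_density_code(img)].append((doc_id, pos))
    return dict(index)


def spot_keyword(index: Dict[Code, List[Posting]], query: List[Image]) -> List[Posting]:
    """Return (doc_id, start) pairs where consecutive character codes match the query."""
    codes = [stroke_density_code(img) for img in query]
    if not codes:
        return []
    hits: Set[Posting] = set(index.get(codes[0], []))  # candidate start positions
    for offset, code in enumerate(codes[1:], start=1):
        postings = set(index.get(code, []))
        hits = {(d, p) for (d, p) in hits if (d, p + offset) in postings}
    return sorted(hits)


def duplicate_score(doc_a: List[Image], doc_b: List[Image], n: int = 2) -> float:
    """Jaccard similarity of the two documents' code n-grams (a hypothetical measure)."""
    def ngrams(chars: List[Image]) -> Set[Tuple[Code, ...]]:
        codes = [stroke_density_code(img) for img in chars]
        return {tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)}

    a, b = ngrams(doc_a), ngrams(doc_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Because such coarse codes collide (many different characters can share one code), matches returned by `spot_keyword` are best treated as candidates to be verified, for example by comparing the character images directly; that verification step is likewise an assumption of this sketch.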
Year
1999
DOI
10.1109/ICDAR.1999.791880
Venue
ICDAR-1
Keywords
content-based indexing, Chinese document image, Chinese information retrieval, stroke density, visual databases, stroke density code, image segmentation, information retrieval, document image, OCR, database indexing, Chinese character image, Chinese text document, text document, keyword spotting, index method, optical character recognition, Chinese document images, text file, retrieval method, text document segmentation, document image processing, content-based retrieval, duplicate detection, robustness, helium, image retrieval, image recognition, indexation, indexing
Field
Inverted index, Information retrieval, Pattern recognition, Document clustering, Computer science, Document layout analysis, Search engine indexing, Optical character recognition, Image retrieval, Artificial intelligence, Document retrieval, Visual Word
DocType
Conference
ISBN
0-7695-0318-7
Citations
19
PageRank
1.49
References
5
Authors
4
Name          Order   Citations   PageRank
Yaodong He    1       19          1.49
Zao Jiang     2       20          1.85
Bing Liu      3       19          3.52
Hong Zhao     4       50          12.51