Content characterization using word shape tokens - Citegraph

Paper Info

Title
Content characterization using word shape tokens

Abstract
By quickly classifying character images into character shape categories, it is possible to automatically extract syntactic information from the text of document images without optical character recognition. Using word shape tokens composed of these character shape codes, a properly trained text tagger can extract part-of-speech information from scanned document images. Later components of a document processing system can then use this information to locate topics, characterize document style, and assist in information retrieval.
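
To make the idea of a word shape token concrete, here is a minimal sketch that derives tokens from plain text instead of from character images. The class labels (A, x, g, i, j) and the character-to-class mapping are illustrative assumptions, not the paper's actual code set, and the function names are hypothetical. The system described in the abstract classifies scanned character images; this sketch only simulates the resulting codes.

```python
# Illustrative only: the classes and mapping below are assumptions,
# and real input would be character images, not text.
ASCENDERS = set("bdfhklt")   # assumed: lowercase letters rising above x-height
DESCENDERS = set("gpqy")     # assumed: lowercase letters dropping below baseline


def shape_code(ch: str) -> str:
    """Map one character to a coarse shape class (hypothetical scheme)."""
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"           # tall characters
    if ch == "i":
        return "i"           # x-height with a dot
    if ch == "j":
        return "j"           # descender with a dot
    if ch in DESCENDERS:
        return "g"           # descenders
    if ch.isalpha():
        return "x"           # remaining x-height letters
    return ch                # punctuation is passed through unchanged


def word_shape_token(word: str) -> str:
    """Concatenate the shape codes of a word's characters."""
    return "".join(shape_code(c) for c in word)


def tokenize(text: str) -> list[str]:
    """Split on whitespace and convert each word to its shape token."""
    return [word_shape_token(w) for w in text.split()]
```

Under this assumed mapping, `tokenize("The dog")` yields `["AAx", "Axg"]`. Distinct words can share a token ("The" and "the" both map to `AAx`), so a tagger working on such tokens has to cope with more ambiguity than one working on recognized text.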

Year	1994
DOI	10.3115/991250.991255
Venue	COLING
Volume	C94-2
DocType	Conference
Keywords	information retrieval, content characterization, optical character recognition, document processing system, character shape code, document style, document image, classifying character image, word shape token, scanned document image, part-of-speech information, character shape category, part of speech, document processing
Field	Pattern recognition, Computer science, Document processing, Optical character recognition, Speech recognition, Natural language processing, Artificial intelligence, Syntax
Citations	3
PageRank	3.76
References	2
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Penelope Sibun	1	284	187.65
David S. Farrar	2	3	3.76
