Title
Detecting In-line Mathematical Expressions in Scientific Documents.
Abstract
One of the issues in extracting natural language sentences from PDF documents is identifying the non-textual elements within a sentence. In this paper, we report preliminary results on the identification of in-line mathematical expressions. We first construct a manually annotated corpus and then apply a conditional random field (CRF) to math-zone identification, using both layout features, such as font types, and linguistic features, such as context n-grams, obtained from PDF documents. Although our method is naive and uses only a small amount of annotated training data, it achieved an F-measure of 88.95%, compared with 22.81% for existing math OCR software.
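The abstract names two feature families (layout features such as font types, and linguistic features such as context n-grams) and reports results as an F-measure. The sketch below illustrates what such per-token features and a token-level F-measure might look like. It is not the authors' implementation: the `(text, font_name)` token representation, the feature names, the `"Italic"` font-name check, the `MATH`/`O` label set, and token-level scoring are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical token representation: (text, font_name). The abstract does not
# describe how tokens and fonts are actually extracted from the PDF.
Token = Tuple[str, str]


def token_features(tokens: List[Token], i: int, window: int = 1) -> Dict[str, object]:
    """Build a feature dict for token i from its font and its neighbours."""

    def word_at(j: int) -> str:
        # Lower-cased token text, or a boundary marker outside the sentence.
        if j < 0:
            return "<BOS>"
        if j >= len(tokens):
            return "<EOS>"
        return tokens[j][0].lower()

    text, font = tokens[i]
    feats: Dict[str, object] = {
        "font": font,                   # layout feature: font type
        "is_italic": "Italic" in font,  # assumed font-naming convention
        "word": text.lower(),           # linguistic feature: the token itself
    }
    # Context unigrams within the window on both sides.
    for off in range(-window, window + 1):
        if off != 0:
            feats[f"word[{off:+d}]"] = word_at(i + off)
    # Context bigram: previous token joined with the current one.
    feats["bigram[-1,0]"] = f"{word_at(i - 1)}|{text.lower()}"
    return feats


def f_measure(gold: List[str], pred: List[str], positive: str = "MATH") -> float:
    """Token-level F1 for the positive label (assumed evaluation unit)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Feature dicts of this shape, one per token, are the kind of input that CRF toolkits commonly accept for sequence labelling. The labels predicted for a sentence could then be compared against the manual annotation with `f_measure`.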
Year
2017
DOI
10.1145/3103010.3121041
Venue
DocEng
Keywords
PDF structure analysis, mathematical formula recognition, in-line mathematical expression detection, math IR, scientific paper mining
Field
Training set, Conditional random field, Information retrieval, Expression (mathematics), Computer science, Software, Natural language, Natural language processing, Artificial intelligence, Sentence, Database, Typeface
DocType
Conference
ISBN
978-1-4503-4689-4
Citations
3
PageRank
0.41
References
4
Authors
4
Name               Order  Citations  PageRank
Kenichi Iwatsuki   1      3          2.10
Takeshi Sagara     2      3          0.41
Tadayoshi Hara     3      118        9.54
Akiko N. Aizawa    4      678        120.63