Detecting In-line Mathematical Expressions in Scientific Documents. - Citegraph

Paper Info

Title
Detecting In-line Mathematical Expressions in Scientific Documents.

Abstract
One of the issues in extracting natural language sentences from PDF documents is the identification of non-textual elements in a sentence. In this paper, we report our preliminary results on the identification of in-line mathematical expressions. We first construct a manually annotated corpus and then apply a conditional random field (CRF) to math-zone identification, using both layout features, such as font types, and linguistic features, such as context n-grams, obtained from PDF documents. Although our method is naive and uses only a small amount of annotated training data, it achieved an F-measure of 88.95%, compared with 22.81% for existing math OCR software.
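
The abstract names two feature families (layout features such as font types, and linguistic features such as context n-grams) and reports an F-measure. The following is a minimal sketch of what per-token features of that kind and a token-level F-measure might look like. It is not the authors' implementation: the token record format, the feature names, the `MATH`/`O` label scheme, and the token-level scoring are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical token record: {"text": ..., "font": ...}; not taken from the paper.
Token = Dict[str, str]


def token_features(tokens: List[Token], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for the token at position i."""
    tok = tokens[i]
    text = tok["text"]
    # Context words, with sentinel values at the sentence boundaries.
    prev_text = tokens[i - 1]["text"].lower() if i > 0 else "<BOS>"
    next_text = tokens[i + 1]["text"].lower() if i < len(tokens) - 1 else "<EOS>"
    return {
        # Layout features (assumed): the font name and whether it changed.
        "font": tok["font"],
        "font_changed": i > 0 and tokens[i - 1]["font"] != tok["font"],
        # Surface features (assumed).
        "is_single_char": len(text) == 1,
        "has_non_alpha": any(not c.isalpha() for c in text),
        # Linguistic context features: neighbouring unigrams and bigrams.
        "prev_word": prev_text,
        "next_word": next_text,
        "bigram_left": prev_text + " " + text.lower(),
        "bigram_right": text.lower() + " " + next_text,
    }


def sentence_features(tokens: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def f_measure(gold: List[str], pred: List[str], positive: str = "MATH") -> float:
    """Token-level F1 for the positive label (an assumed scoring granularity)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Per-sentence lists of feature dictionaries like these are a common input shape for sequence-labeling toolkits; whether the paper scored at the token level or the math-zone level is not stated in this record.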

Year	DOI	Venue
2017	10.1145/3103010.3121041	DocEng
Keywords	Field	DocType
PDF structure analysis, mathematical formula recognition, in-line mathematical expression detection, math IR, scientific paper mining	Training set, Conditional random field, Information retrieval, Expression (mathematics), Computer science, Software, Natural language, Natural language processing, Artificial intelligence, Sentence, Database, Typeface	Conference
ISBN	Citations	PageRank
978-1-4503-4689-4	3	0.41
References	Authors
4	4

Authors (4 rows)

Name	Order	Citations	PageRank
Kenichi Iwatsuki	1	3	2.10
Takeshi Sagara	2	3	0.41
Tadayoshi Hara	3	118	9.54
Akiko N. Aizawa	4	678	120.63
