Title
What You Get Is What You See: A Visual Markup Decompiler
Abstract
Building on recent advances in image caption generation and optical character recognition (OCR), we present a general-purpose, deep learning-based system to decompile an image into presentational markup. While this task is a well-studied problem in OCR, our method takes an inherently different, data-driven approach. Our model does not require any knowledge of the underlying markup language, and is simply trained end-to-end on real-world example data. The model employs a convolutional network for text and layout recognition in tandem with an attention-based neural machine translation system. To train and evaluate the model, we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup, as well as a synthetic dataset of web pages paired with HTML snippets. Experimental results show that the system is surprisingly effective at generating accurate markup for both datasets. While a standard domain-specific LaTeX OCR system achieves around 25% accuracy, our model reproduces the exact rendered image on 75% of examples.
Year
2016
Venue
arXiv: Computer Vision and Pattern Recognition
Field
Pattern recognition, Web page, Expression (mathematics), Computer science, Machine translation system, Optical character recognition, Artificial intelligence, Presentational and representational acting, Decompiler, Machine learning, Markup language
DocType
Journal
Volume
abs/1609.04938
Citations
5
PageRank
0.67
References
12
Authors
3
Name                 Order   Citations   PageRank
Yuntian Deng         1       241         14.12
Anssi Kanervisto     2       24          6.77
Alexander M. Rush    3       1499        67.53