Abstract |
---|
Building on recent advances in image caption generation and optical character recognition (OCR), we present a general-purpose, deep learning-based system to decompile an image into presentational markup. While this task is a well-studied problem in OCR, our method takes an inherently different, data-driven approach. Our model does not require any knowledge of the underlying markup language, and is simply trained end-to-end on real-world example data. The model employs a convolutional network for text and layout recognition in tandem with an attention-based neural machine translation system. To train and evaluate the model, we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup, as well as a synthetic dataset of web pages paired with HTML snippets. Experimental results show that the system is surprisingly effective at generating accurate markup for both datasets. While a standard domain-specific LaTeX OCR system achieves around 25% accuracy, our model reproduces the exact rendered image on 75% of examples. |
Year | Venue | Field |
---|---|---|
2016 | arXiv: Computer Vision and Pattern Recognition | Pattern recognition, Web page, Expression (mathematics), Computer science, Machine translation system, Optical character recognition, Artificial intelligence, Presentational and representational acting, Decompiler, Machine learning, Markup language |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1609.04938 | 5 |
PageRank | References | Authors |
---|---|---|
0.67 | 12 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuntian Deng | 1 | 241 | 14.12 |
Anssi Kanervisto | 2 | 24 | 6.77 |
Alexander M. Rush | 3 | 1499 | 67.53 |