Abstract | ||
---|---|---|
• A clustering framework is proposed for clustering noisy form images. • Novel algorithms for matching text lines and rule lines are introduced. • We show a 44% improvement over the state of the art on 5 datasets of historical forms. • Sampling and bootstrapping are employed for scalability to large datasets. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1016/j.patcog.2018.10.004 | Pattern Recognition |
Keywords | Field | DocType |
Form processing, Document analysis, Document image clustering, Historical document processing, Clustering | Fuzzy clustering, Canopy clustering algorithm, CURE data clustering algorithm, Clustering high-dimensional data, Correlation clustering, Pattern recognition, Artificial intelligence, Brown clustering, Cluster analysis, Mathematics, Visual Word | Journal |
Volume | Issue | ISSN |
87 | 1 | 0031-3203 |
Citations | PageRank | References |
0 | 0.34 | 36 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chris Tensmeyer | 1 | 20 | 4.83 |
Tony R. Martinez | 2 | 1364 | 100.44 |