Title
Robustly detect different types of text in videos
Abstract
Text in videos can be categorized into three types: overlaid text, layered text, and scene text. Existing detection methods typically target a single type of text and perform poorly on the others; to our knowledge, few works have explored building a system that detects all types simultaneously. In this paper, we present a unified video text detector that accurately localizes all types of text in videos. Our system consists of a spatial text detector and a temporal fusion filter. First, we investigate three different strategies for training the spatial text detector with deep convolutional neural networks, so that it can detect the various kinds of text without knowing their types in advance. Second, a new area-first non-maximum suppression scheme, combined with multiple constraints, removes redundant bounding boxes. Finally, the temporal fusion filter exploits spatial-location and text-component features to integrate the detection results of consecutive frames and further remove false positives. To validate the proposed approach, comprehensive experiments are carried out on three publicly available datasets covering overlaid text, layered text, and scene text. The experimental results demonstrate that our method consistently achieves the best performance compared with state-of-the-art methods.
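The area-first suppression step mentioned in the abstract can be illustrated with a short sketch. The Python code below is a minimal sketch, assuming axis-aligned boxes in [x1, y1, x2, y2] form and a single IoU threshold; the paper's additional "multiple constraints" are not specified in the abstract and are omitted here, and the function name and parameters are hypothetical. The only change from standard NMS is that candidates are ranked by box area instead of by confidence score.

import numpy as np

def area_first_nms(boxes, iou_thresh=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2] candidate text boxes.
    # Returns the indices of the boxes that survive suppression.
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = areas.argsort()[::-1]  # area-first ranking: largest boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current box with all remaining candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop candidates that overlap the kept (larger) box too much.
        order = order[1:][iou <= iou_thresh]
    return keep

In the pipeline described by the abstract, such a filter would run per frame on the spatial detector's output, before the temporal fusion filter integrates detections across consecutive frames.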
Year
2020
DOI
10.1007/s00521-020-04729-6
Venue
Neural Computing & Applications
Keywords
Video text detector, Temporal consistency, Spatial location, Component representation
DocType
Journal
Volume
32
Issue
16
ISSN
0941-0643
Citations
0
PageRank
0.34
References
0
Authors
2
Name            Order  Citations  PageRank
Yuanqiang Cai   1      2          2.05
Weiqiang Wang   2      13         8.65