Title
Hierarchical Recurrent Deep Fusion Using Adaptive Clip Summarization For Sign Language Translation
Abstract
Vision-based sign language translation (SLT) is a challenging task due to the complicated variations of facial expressions, gestures, and articulated poses involved in sign linguistics. As a weakly supervised sequence-to-sequence learning problem, SLT usually lacks exact temporal boundaries of actions. To adequately explore temporal hints in videos, we propose a novel framework named Hierarchical deep Recurrent Fusion (HRF). Aiming at modeling discriminative action patterns, in HRF we design an adaptive temporal encoder to capture crucial RGB visemes and skeleton signees. Specifically, both RGB visemes and skeleton signees are learned with the same scheme, named Adaptive Clip Summarization (ACS). ACS consists of three key modules: variable-length clip mining, adaptive temporal pooling, and attention-aware weighting. In addition, based on the unaligned action patterns (RGB visemes and skeleton signees), a query-adaptive decoding fusion is proposed to translate the target sentence. Extensive experiments demonstrate the effectiveness of the proposed HRF framework.
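The abstract's ACS scheme summarizes each mined variable-length clip into one feature via attention-aware weighting. The following is a minimal NumPy sketch of that idea only, not the authors' implementation; the function names (`attention_pool`, `summarize`), the attention parameter `w`, and the clip boundaries are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(clip_feats, w):
    """Attention-aware weighting over one variable-length clip.

    clip_feats: (num_frames, dim) frame features of the clip
    w: (dim,) hypothetical attention parameter vector
    Returns a single (dim,) clip summary.
    """
    scores = clip_feats @ w      # one scalar relevance score per frame
    alpha = softmax(scores)      # normalized attention weights
    return alpha @ clip_feats    # weighted sum = clip summary

def summarize(video_feats, boundaries, w):
    """ACS-style pooling: one summary vector per mined clip."""
    return np.stack([attention_pool(video_feats[s:e], w)
                     for s, e in boundaries])

rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 4))    # 12 frames, 4-d features (toy sizes)
clips = [(0, 5), (5, 8), (8, 12)]   # variable-length clip boundaries
summary = summarize(feats, clips, rng.normal(size=4))
print(summary.shape)                # (3, 4)
```

Because the attention weights are a softmax, each clip summary is a convex combination of its frame features, so long clips and short clips yield summaries on the same scale.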
Year: 2020
DOI: 10.1109/TIP.2019.2941267
Venue: IEEE Transactions on Image Processing
Keywords: Sign language translation, hierarchical adaptive temporal network, adaptive clip summarization, temporal pooling, score fusion
DocType: Journal
Volume: 29
Issue: 1
ISSN: 1057-7149
Citations: 1
PageRank: 0.35
References: 23
Authors: 5
Name            Order   Citations   PageRank
Dan Guo         1       70          11.32
Wengang Zhou    2       22          12.93
Anyang Li       3       1           0.35
Houqiang Li     4       2090        172.30
Meng Wang       5       3094        167.38