Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training | 0 | 0.34 | 2022 |
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | 0 | 0.34 | 2022 |
Unpaired Image Captioning With Semantic-Constrained Self-Learning | 0 | 0.34 | 2022 |
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection | 0 | 0.34 | 2022 |
3D Cascade RCNN: High Quality Object Detection in Point Clouds | 0 | 0.34 | 2022 |
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning | 0 | 0.34 | 2022 |
Dynamic Temporal Filtering in Video Models | 0 | 0.34 | 2022 |
SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement | 0 | 0.34 | 2022 |
Stand-Alone Inter-Frame Attention in Video Models | 0 | 0.34 | 2022 |
Comprehending and Ordering Semantics for Image Captioning | 0 | 0.34 | 2022 |
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics | 0 | 0.34 | 2021 |
A Style and Semantic Memory Mechanism for Domain Generalization | 0 | 0.34 | 2021 |
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising | 1 | 0.35 | 2021 |
Core-Text: Improving Scene Text Detection with Contrastive Relational Reasoning | 0 | 0.34 | 2021 |
Transferrable Contrastive Learning for Visual Domain Adaptation | 0 | 0.34 | 2021 |
Scheduled Sampling In Vision-Language Pretraining With Decoupled Encoder-Decoder Network | 0 | 0.34 | 2021 |
Smart Director: An Event-Driven Directing System for Live Broadcasting | 1 | 0.39 | 2021 |
Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration | 0 | 0.34 | 2021 |
Single Shot Video Object Detector | 3 | 0.37 | 2021 |
Representing Videos as Discriminative Sub-graphs for Action Recognition | 0 | 0.34 | 2021 |
Seco: Exploring Sequence Supervision For Unsupervised Representation Learning | 0 | 0.34 | 2021 |
Joint Contrastive Learning with Infinite Possibilities | 0 | 0.34 | 2020 |
Learning a Unified Sample Weighting Network for Object Detection | 0 | 0.34 | 2020 |
Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation | 0 | 0.34 | 2020 |
iDirector: An Intelligent Directing System for Live Broadcast | 1 | 0.35 | 2020 |
Exploring Depth Information for Spatial Relation Recognition | 0 | 0.34 | 2020 |
Deep Metric Learning With Density Adaptivity | 1 | 0.35 | 2020 |
X-Linear Attention Networks for Image Captioning | 12 | 0.54 | 2020 |
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation | 2 | 0.40 | 2019 |
Hierarchy Parsing For Image Captioning | 15 | 0.57 | 2019 |
Pointing Novel Objects In Image Captioning | 4 | 0.41 | 2019 |
vireoJD-MM at Activity Detection in Extended Videos | 0 | 0.34 | 2019 |
daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices | 7 | 0.44 | 2019 |
Transferrable Prototypical Networks For Unsupervised Domain Adaptation | 20 | 0.53 | 2019 |
Exploring Object Relation In Mean Teacher For Cross-Domain Detection | 13 | 0.49 | 2019 |
Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019 | 0 | 0.34 | 2019 |
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning | 2 | 0.36 | 2019 |
VireoJD-MM @ TRECVid 2019 - Activities in Extended Video (ActEV) | 0 | 0.34 | 2019 |
Mocycle-GAN: Unpaired Video-to-Video Translation | 6 | 0.45 | 2019 |
Animating Your Life: Real-Time Video-to-Animation Translation | 0 | 0.34 | 2019 |
Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention | 1 | 0.35 | 2019 |
Deep Semantic Hashing with Generative Adversarial Networks | 26 | 0.79 | 2018 |
To Create What You Tell: Generating Videos from Captions | 7 | 0.57 | 2017 |
Seeing Bot | 1 | 0.36 | 2017 |
Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure | 12 | 0.59 | 2016 |
Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search | 22 | 0.73 | 2015 |
Jointly Modeling Embedding And Translation To Bridge Video And Language | 135 | 3.07 | 2015 |
Click-through-based Subspace Learning for Image Search | 12 | 0.54 | 2014 |
Click-through-based cross-view learning for image search | 43 | 1.31 | 2014 |
Image search by graph-based label propagation with image representation from DNN | 10 | 0.56 | 2013 |