Title | Count | Score | Year
Finding the Dominant Winning Ticket in Pre-Trained Language Models | 0 | 0.34 | 2022 |
ALLSH: Active Learning Guided by Local Sensitivity and Hardness | 0 | 0.34 | 2022 |
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | 0 | 0.34 | 2022 |
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | 0 | 0.34 | 2022 |
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance | 0 | 0.34 | 2022 |
DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation | 0 | 0.34 | 2022 |
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing | 0 | 0.34 | 2022 |
Controllable Natural Language Generation with Contrastive Prefixes | 0 | 0.34 | 2022 |
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models | 0 | 0.34 | 2022 |
LoRA: Low-Rank Adaptation of Large Language Models | 0 | 0.34 | 2022 |
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering | 0 | 0.34 | 2022 |
Adversarial Retriever-Ranker for Dense Text Retrieval | 0 | 0.34 | 2022 |
What Makes Good In-Context Examples for GPT-3? | 0 | 0.34 | 2022 |
TAPEX: Table Pre-training via Learning a Neural SQL Executor | 0 | 0.34 | 2022 |
XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge | 0 | 0.34 | 2022 |
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation | 0 | 0.34 | 2022 |
Token-wise Curriculum Learning for Neural Machine Translation | 0 | 0.34 | 2021 |
MixKD: Towards Efficient Distillation of Large-scale Language Models | 0 | 0.34 | 2021 |
Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach | 0 | 0.34 | 2021 |
GLGE: A New General Language Generation Evaluation Benchmark | 0 | 0.34 | 2021 |
Poolingformer: Long Document Modeling with Pooling Attention | 0 | 0.34 | 2021 |
Few-Shot Named Entity Recognition: An Empirical Baseline Study | 0 | 0.34 | 2021 |
BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining | 0 | 0.34 | 2021 |
Finetuning Pretrained Transformers into RNNs | 0 | 0.34 | 2021 |
ARCH: Efficient Adversarial Regularized Training with Caching | 0 | 0.34 | 2021 |
NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned | 0 | 0.34 | 2021 |
Memory-Efficient Differentiable Transformer Architecture Search | 0 | 0.34 | 2021 |
Reader-Guided Passage Reranking for Open-Domain Question Answering | 0 | 0.34 | 2021 |
Contextual Bandit Applications in a Customer Support Bot | 1 | 0.43 | 2021 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 0 | 0.34 | 2021 |
On the Variance of the Adaptive Learning Rate and Beyond | 4 | 0.43 | 2020 |
Understanding the Difficulty of Training Transformers | 0 | 0.34 | 2020 |
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning | 0 | 0.34 | 2020 |
The Microsoft Toolkit Of Multi-Task Deep Neural Networks For Natural Language Understanding | 0 | 0.34 | 2020 |
Parameter-free Sentence Embedding via Orthogonal Basis | 1 | 0.35 | 2019 |
Lessons from Real-World Reinforcement Learning in a Customer Support Bot | 0 | 0.34 | 2019 |
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | 1 | 0.35 | 2019 |
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | 4 | 0.39 | 2019 |
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Scientific Question Answering | 2 | 0.36 | 2018 |
Zero-training Sentence Embedding via Orthogonal Basis | 0 | 0.34 | 2018 |
IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles | 2 | 0.36 | 2018 |
Limited-memory Common-directions Method for Distributed Optimization and its Application on Empirical Risk Minimization | 1 | 0.36 | 2017 |
FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension | 18 | 0.66 | 2017 |
ReasoNet: Learning to Stop Reading in Machine Comprehension | 64 | 2.10 | 2016 |
Large-scale L-BFGS using MapReduce | 12 | 0.75 | 2014 |
Transfer Understanding from Head Queries to Tail Queries | 6 | 0.40 | 2014 |
Beyond ten blue links: enabling user click modeling in federated web search | 36 | 0.99 | 2012 |
A noise-aware click model for web search | 10 | 0.56 | 2012 |
Personalized click model through collaborative filtering | 27 | 0.88 | 2012 |
Short text conceptualization using a probabilistic knowledgebase | 96 | 3.22 | 2011 |