Microsoft Research Asia, Beijing, China
Finding the Dominant Winning Ticket in Pre-Trained Language Models | 0 | 0.34 | 2022
ALLSH: Active Learning Guided by Local Sensitivity and Hardness | 0 | 0.34 | 2022
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | 0 | 0.34 | 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | 0 | 0.34 | 2022
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance | 0 | 0.34 | 2022
DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation | 0 | 0.34 | 2022
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing | 0 | 0.34 | 2022
Controllable Natural Language Generation with Contrastive Prefixes | 0 | 0.34 | 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models | 0 | 0.34 | 2022
LoRA: Low-Rank Adaptation of Large Language Models | 0 | 0.34 | 2022
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering | 0 | 0.34 | 2022
Adversarial Retriever-Ranker for Dense Text Retrieval | 0 | 0.34 | 2022
What Makes Good In-Context Examples for GPT-3? | 0 | 0.34 | 2022
TAPEX: Table Pre-training via Learning a Neural SQL Executor | 0 | 0.34 | 2022
XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge | 0 | 0.34 | 2022
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation | 0 | 0.34 | 2022
Token-wise Curriculum Learning for Neural Machine Translation | 0 | 0.34 | 2021
MixKD: Towards Efficient Distillation of Large-scale Language Models | 0 | 0.34 | 2021
Adversarial Regularization as Stackelberg Game - An Unrolled Optimization Approach | 0 | 0.34 | 2021
GLGE - A New General Language Generation Evaluation Benchmark | 0 | 0.34 | 2021
Poolingformer: Long Document Modeling with Pooling Attention | 0 | 0.34 | 2021
Few-Shot Named Entity Recognition - An Empirical Baseline Study | 0 | 0.34 | 2021
BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining | 0 | 0.34 | 2021
Finetuning Pretrained Transformers into RNNs | 0 | 0.34 | 2021
ARCH - Efficient Adversarial Regularized Training with Caching | 0 | 0.34 | 2021
NeurIPS 2020 EfficientQA Competition - Systems, Analyses and Lessons Learned | 0 | 0.34 | 2021
Memory-Efficient Differentiable Transformer Architecture Search | 0 | 0.34 | 2021
Reader-Guided Passage Reranking for Open-Domain Question Answering | 0 | 0.34 | 2021
Contextual Bandit Applications in a Customer Support Bot | 1 | 0.43 | 2021
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 0 | 0.34 | 2021
On the Variance of the Adaptive Learning Rate and Beyond | 4 | 0.43 | 2020
Understanding the Difficulty of Training Transformers | 0 | 0.34 | 2020
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning | 0 | 0.34 | 2020
The Microsoft Toolkit Of Multi-Task Deep Neural Networks For Natural Language Understanding | 0 | 0.34 | 2020
Parameter-free Sentence Embedding via Orthogonal Basis | 1 | 0.35 | 2019
Lessons from Real-World Reinforcement Learning in a Customer Support Bot | 0 | 0.34 | 2019
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | 1 | 0.35 | 2019
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | 4 | 0.39 | 2019
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Scientific Question Answering | 2 | 0.36 | 2018
Zero-training Sentence Embedding via Orthogonal Basis | 0 | 0.34 | 2018
IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles | 2 | 0.36 | 2018
Limited-memory Common-directions Method for Distributed Optimization and its Application on Empirical Risk Minimization | 1 | 0.36 | 2017
FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension | 18 | 0.66 | 2017
ReasoNet: Learning to Stop Reading in Machine Comprehension | 64 | 2.10 | 2016
Large-scale L-BFGS using MapReduce | 12 | 0.75 | 2014
Transfer Understanding from Head Queries to Tail Queries | 6 | 0.40 | 2014
Beyond ten blue links: enabling user click modeling in federated web search | 36 | 0.99 | 2012
A noise-aware click model for web search | 10 | 0.56 | 2012
Personalized click model through collaborative filtering | 27 | 0.88 | 2012
Short text conceptualization using a probabilistic knowledgebase | 96 | 3.22 | 2011