Name: NOAM SHAZEER
Papers: 37
Collaborators: 129
Citations: 1089
PageRank: 43.70
Referers: 3655
Referees: 522
References: 236
Title | Citations | PageRank | Year
Do Transformer Modifications Transfer Across Implementations and Applications? | 0 | 0.34 | 2021
Searching for Efficient Transformers for Language Modeling. | 0 | 0.34 | 2021
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | 0 | 0.34 | 2021
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. | 0 | 0.34 | 2021
How Much Knowledge Can You Pack Into the Parameters of a Language Model? | 0 | 0.34 | 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2 | 0.46 | 2020
Corpora Generation for Grammatical Error Correction. | 0 | 0.34 | 2019
Music Transformer: Generating Music with Long-Term Structure. | 0 | 0.34 | 2019
Music Transformer - Generating Music with Long-Term Structure. | 4 | 0.42 | 2019
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. | 7 | 0.46 | 2018
An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation. | 2 | 0.41 | 2018
Generating Wikipedia by Summarizing Long Sequences. | 18 | 0.62 | 2018
Blockwise Parallel Decoding for Deep Autoregressive Models. | 3 | 0.37 | 2018
Weakly Supervised Grammatical Error Correction using Iterative Decoding. | 0 | 0.34 | 2018
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. | 2 | 0.46 | 2018
Generating Wikipedia by Summarizing Long Sequences. | 0 | 0.34 | 2018
Tensor2Tensor for Neural Machine Translation. | 19 | 0.70 | 2018
Image Transformer. | 0 | 0.34 | 2018
Fast Decoding in Sequence Models using Discrete Latent Variables. | 11 | 0.57 | 2018
Mesh-TensorFlow: Deep Learning for Supercomputers. | 4 | 0.40 | 2018
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. | 95 | 3.02 | 2017
Attention Is All You Need. | 432 | 6.52 | 2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. | 0 | 0.34 | 2017
One Model To Learn Them All. | 29 | 0.76 | 2017
Exploring the Limits of Language Modeling. | 153 | 5.35 | 2016
Sparse Non-negative Matrix Language Modeling. | 1 | 0.36 | 2016
Swivel: Improving Embeddings by Noticing What's Missing. | 11 | 0.55 | 2016
Sparse non-negative matrix language modeling for geo-annotated query session data | 1 | 0.37 | 2015
Pruning Sparse Non-Negative Matrix N-Gram Language Models | 1 | 0.37 | 2015
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks | 208 | 6.26 | 2015
Sparse Non-Negative Matrix Language Modeling For Skip-Grams | 7 | 0.51 | 2015
Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation. | 4 | 0.47 | 2014
Variational Program Inference | 1 | 0.42 | 2010
A probabilistic approach to solving crossword puzzles | 29 | 2.12 | 2002
Solving Crosswords with PROVERB | 4 | 1.60 | 1999
Solving crossword puzzles as probabilistic constraint satisfaction | 16 | 2.48 | 1999
PROVERB: The Probabilistic Cruciverbalist | 25 | 3.97 | 1999