Noam Shazeer - Citegraph

Author Info

Name	Papers	Collaborators	Citations	PageRank	Referers	Referees	References
NOAM SHAZEER	37	129	1089	43.70	3655	522	236

Publications (37 rows)

Title	Citations	PageRank	Year
Do Transformer Modifications Transfer Across Implementations and Applications?	0	0.34	2021
Searching for Efficient Transformers for Language Modeling.	0	0.34	2021
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding	0	0.34	2021
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.	0	0.34	2021
How Much Knowledge Can You Pack Into the Parameters of a Language Model?	0	0.34	2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	2	0.46	2020
Corpora Generation for Grammatical Error Correction.	0	0.34	2019
Music Transformer: Generating Music with Long-Term Structure.	0	0.34	2019
Music Transformer - Generating Music with Long-Term Structure.	4	0.42	2019
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost.	7	0.46	2018
An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation.	2	0.41	2018
Generating Wikipedia by Summarizing Long Sequences.	18	0.62	2018
Blockwise Parallel Decoding for Deep Autoregressive Models.	3	0.37	2018
Weakly Supervised Grammatical Error Correction using Iterative Decoding.	0	0.34	2018
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation.	2	0.46	2018
Generating Wikipedia by Summarizing Long Sequences.	0	0.34	2018
Tensor2Tensor for Neural Machine Translation.	19	0.70	2018
Image Transformer.	0	0.34	2018
Fast Decoding in Sequence Models using Discrete Latent Variables.	11	0.57	2018
Mesh-TensorFlow: Deep Learning for Supercomputers.	4	0.40	2018
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.	95	3.02	2017
Attention Is All You Need.	432	6.52	2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.	0	0.34	2017
One Model To Learn Them All.	29	0.76	2017
Exploring the Limits of Language Modeling.	153	5.35	2016
Sparse Non-negative Matrix Language Modeling.	1	0.36	2016
Swivel: Improving Embeddings by Noticing What's Missing.	11	0.55	2016
Sparse non-negative matrix language modeling for geo-annotated query session data	1	0.37	2015
Pruning Sparse Non-Negative Matrix N-Gram Language Models	1	0.37	2015
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks	208	6.26	2015
Sparse Non-Negative Matrix Language Modeling For Skip-Grams	7	0.51	2015
Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation.	4	0.47	2014
Variational Program Inference	1	0.42	2010
A probabilistic approach to solving crossword puzzles	29	2.12	2002
Solving Crosswords with PROVERB	4	1.60	1999
Solving crossword puzzles as probabilistic constraint satisfaction	16	2.48	1999
PROVERB: The Probabilistic Cruciverbalist	25	3.97	1999