Name: LEE, JASON D.
Papers: 85
Collaborators: 126
Citations: 711
PageRank: 48.29
Referers: 1198
Referees: 1010
References: 795
Title | Citations | PageRank | Year
Neural Networks can Learn Representations with Gradient Descent | 0 | 0.34 | 2022
Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games | 0 | 0.34 | 2022
Optimization-Based Separations for Neural Networks | 0 | 0.34 | 2022
Offline Reinforcement Learning with Realizability and Single-policy Concentrability | 0 | 0.34 | 2022
Impact of Representation Learning in Linear Bandits | 0 | 0.34 | 2021
A Theory of Label Propagation for Subpopulation Shift | 0 | 0.34 | 2021
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks | 0 | 0.34 | 2021
Near-Optimal Linear Regression Under Distribution Shift | 0 | 0.34 | 2021
Shape Matters: Understanding the Implicit Bias of the Noise Covariance | 0 | 0.34 | 2021
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift | 0 | 0.34 | 2021
Few-Shot Learning via Learning the Representation, Provably | 0 | 0.34 | 2021
Kernel and Rich Regimes in Overparametrized Models | 0 | 0.34 | 2020
Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation | 0 | 0.34 | 2020
How to Characterize the Landscape of Overparameterized Convolutional Neural Networks | 0 | 0.34 | 2020
Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks | 0 | 0.34 | 2020
Beyond Lazy Training for Over-parameterized Tensor Decomposition | 0 | 0.34 | 2020
Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters | 0 | 0.34 | 2020
Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot | 0 | 0.34 | 2020
SGD Learns One-Layer Networks in WGANs | 0 | 0.34 | 2020
Optimal Transport Mapping via Input Convex Neural Networks | 0 | 0.34 | 2020
Generalized Leverage Score Sampling for Neural Networks | 0 | 0.34 | 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy | 0 | 0.34 | 2020
Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes | 0 | 0.34 | 2020
Towards Understanding Hierarchical Learning: Benefits of Neural Representations | 0 | 0.34 | 2020
Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity | 0 | 0.34 | 2020
Convergence of Adversarial Training in Overparametrized Neural Networks | 1 | 0.35 | 2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models | 3 | 0.38 | 2019
Neural Temporal-Difference Learning Converges to Global Optima | 4 | 0.39 | 2019
Convergence of Adversarial Training in Overparametrized Networks | 0 | 0.34 | 2019
Multi-Turn Beam Search for Neural Dialogue Modeling | 0 | 0.34 | 2019
First-order Methods Almost Always Avoid Strict Saddle Points | 7 | 0.55 | 2019
Kernel and Deep Regimes in Overparametrized Models | 0 | 0.34 | 2019
Stochastic Subgradient Method Converges on Tame Functions | 10 | 0.59 | 2018
Characterizing Implicit Bias in Terms of Optimization Geometry | 9 | 0.49 | 2018
Adding One Neuron Can Eliminate All Bad Local Minima | 4 | 0.38 | 2018
On the Margin Theory of Feedforward Neural Networks | 9 | 0.49 | 2018
Solving Non-Convex Non-Concave Min-Max Games Under Polyak-Łojasiewicz Condition | 2 | 0.36 | 2018
Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solution for Nonconvex Distributed Optimization Over Networks | 1 | 0.35 | 2018
On the Convergence and Robustness of Training GANs with Regularized Optimal Transport | 4 | 0.41 | 2018
Gradient Descent Finds Global Minima of Deep Neural Networks | 30 | 0.69 | 2018
Implicit Bias of Gradient Descent on Linear Convolutional Networks | 13 | 0.51 | 2018
Convergence of Gradient Descent on Separable Data | 4 | 0.45 | 2018
On the Power of Over-parametrization in Neural Networks with Quadratic Activation | 20 | 0.63 | 2018
Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization | 2 | 0.37 | 2018
Provably Correct Automatic Subdifferentiation for Qualified Programs | 0 | 0.34 | 2018
No Spurious Local Minima in a Two Hidden Unit ReLU Network | 1 | 0.35 | 2018
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced | 10 | 0.65 | 2018
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | 11 | 0.50 | 2018
Solving Approximate Wasserstein GANs to Stationarity | 1 | 0.35 | 2018
Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement | 1 | 0.38 | 2017