Name: MOHAMMAD GHAVAMZADEH
Papers: 105
Collaborators: 172
Citations: 814
PageRank: 67.73
Referers: 1301
Referees: 1699
References: 1180
Title | Citations | PageRank | Year
Deep Hierarchy in Bandits. | 0 | 0.34 | 2022
Hierarchical Bayesian Bandits | 0 | 0.34 | 2022
Fixed-Budget Best-Arm Identification in Structured Bandits | 0 | 0.34 | 2022
Feature and Parameter Selection in Stochastic Linear Bandits. | 0 | 0.34 | 2022
Multi-Environment Meta-Learning in Stochastic Linear Bandits. | 0 | 0.34 | 2022
Thompson Sampling with a Mixture Prior | 0 | 0.34 | 2022
Mirror Descent Policy Optimization | 0 | 0.34 | 2022
Control-Aware Representations for Model-based Reinforcement Learning | 0 | 0.34 | 2021
A review of uncertainty quantification in deep learning: Techniques, applications and challenges | 11 | 0.80 | 2021
Variational Model-based Policy Optimization. | 0 | 0.34 | 2021
Pid Accelerated Value Iteration Algorithm | 0 | 0.34 | 2021
Deep Bayesian Quadrature Policy Optimization | 0 | 0.34 | 2021
Control-Aware Representations for Model-based Reinforcement Learning. | 0 | 0.34 | 2021
Neural Lyapunov Redesign. | 0 | 0.34 | 2021
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control. | 0 | 0.34 | 2020
Adaptive Sampling for Estimating Probability Distributions | 0 | 0.34 | 2020
Predictive Coding for Locally-Linear Control | 0 | 0.34 | 2020
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control | 0 | 0.34 | 2020
Multi-Step Greedy Reinforcement Learning Algorithms | 0 | 0.34 | 2020
Lyapunov-based Safe Policy Optimization for Continuous Control. | 0 | 0.34 | 2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits. | 0 | 0.34 | 2019
Perturbed-History Exploration in Stochastic Linear Bandits. | 0 | 0.34 | 2019
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies. | 0 | 0.34 | 2019
Active Learning for Binary Classification with Abstention. | 0 | 0.34 | 2019
Binary Classification with Bounded Abstention Rate. | 0 | 0.34 | 2019
Randomized Exploration in Generalized Linear Bandits. | 0 | 0.34 | 2019
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization. | 0 | 0.34 | 2018
Optimizing over a Restricted Policy Class in Markov Decision Processes. | 2 | 0.41 | 2018
Proximal gradient temporal difference learning: stable reinforcement learning with polynomial sample complexity | 0 | 0.34 | 2018
PAC Bandits with Risk Constraints. | 0 | 0.34 | 2018
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. | 0 | 0.34 | 2018
Disentangling Dynamics and Content for Control and Planning. | 0 | 0.34 | 2017
Model-Independent Online Learning for Influence Maximization. | 2 | 0.40 | 2017
Conservative Contextual Linear Bandits. | 0 | 0.34 | 2017
Active Learning for Accurate Estimation of Linear Models. | 2 | 0.37 | 2017
Bottleneck Conditional Density Estimation. | 0 | 0.34 | 2017
Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing. | 2 | 0.36 | 2017
Online Learning to Rank in Stochastic Click Models. | 7 | 0.47 | 2017
Importance of Recommendation Policy Space in Addressing Click Sparsity in Personalized Advertisement Display. | 0 | 0.34 | 2017
Automated Data Cleansing through Meta-Learning. | 0 | 0.34 | 2017
Sequential Decision Making With Coherent Risk. | 4 | 0.41 | 2017
Online Learning to Rank in Stochastic Click Models. | 0 | 0.34 | 2017
Diffusion Independent Semi-Bandit Influence Maximization. | 0 | 0.34 | 2017
Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs | 4 | 0.45 | 2016
Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem. | 0 | 0.34 | 2016
Graphical Model Sketch. | 2 | 0.40 | 2016
Proximal Gradient Temporal Difference Learning Algorithms. | 1 | 0.37 | 2016
Bayesian Policy Gradient and Actor-Critic Algorithms | 3 | 0.40 | 2016
Analysis of Classification-based Policy Iteration Algorithms. | 29 | 1.27 | 2016
Regularized Policy Iteration with Nonparametric Function Spaces. | 1 | 0.34 | 2016