Abstract | ||
---|---|---|
We provide sampling-based algorithms for optimization under a coherent-risk objective. The class of coherent risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk measures such as conditional value at risk (CVaR) and mean-semideviation. Our approach is suitable for problems in which tunable parameters control the distribution of the cost, such as in reinforcement learning or approximate dynamic programming with a parameterized policy. Such problems cannot be solved using previous approaches. We consider both static risk measures and time-consistent dynamic risk measures. For static risk measures, our approach is in the spirit of policy gradient methods, while for dynamic risk measures, we use actor-critic-type algorithms. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/TAC.2016.2644871 | IEEE Trans. Automat. Contr. |
Keywords | Field | DocType |
Markov processes, Heuristic algorithms, Optimization, Dynamic programming, Standards, Electronic mail, Random variables | Dynamic programming, Mathematical optimization, Random variable, Parameterized complexity, Markov process, Computer science, Markov decision process, Sampling (statistics), Reinforcement learning, Expected shortfall | Journal |
Volume | Issue | ISSN |
62 | 7 | 0018-9286 |
Citations | PageRank | References |
4 | 0.41 | 31 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Aviv Tamar | 1 | 275 | 24.04 |
Yinlam Chow | 2 | 98 | 14.03 |
Mohammad Ghavamzadeh | 3 | 814 | 67.73 |
Shie Mannor | 4 | 3340 | 285.45 |
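
To make the abstract's static-risk setting concrete, the sketch below estimates the gradient of CVaR from samples when a parameter controls the cost distribution. It is an illustration only and is not taken from the paper: it assumes a toy cost C ~ N(theta, 1) and uses a likelihood-ratio (score-function) form of the CVaR gradient, averaging score × (cost − VaR) over the worst alpha-fraction of samples. The function names `empirical_var_cvar` and `cvar_gradient_estimate` are hypothetical. For this toy model CVaR shifts one-for-one with theta, so the estimate should be close to 1.

```python
import random


def empirical_var_cvar(costs, alpha):
    """Empirical VaR and CVaR over the worst alpha-fraction (highest costs)."""
    ordered = sorted(costs, reverse=True)
    k = max(1, int(alpha * len(ordered)))  # number of tail samples
    tail = ordered[:k]
    # VaR estimate: smallest cost inside the tail; CVaR estimate: tail mean.
    return tail[-1], sum(tail) / k


def cvar_gradient_estimate(theta, alpha, n_samples, rng):
    """Score-function estimate of d CVaR_alpha / d theta.

    Assumption (toy model, not from the paper): C ~ N(theta, 1),
    so the score d/dtheta log p(c; theta) equals (c - theta).
    """
    costs = [rng.gauss(theta, 1.0) for _ in range(n_samples)]
    var, _ = empirical_var_cvar(costs, alpha)
    tail = [c for c in costs if c >= var]
    # Average of score * (cost - VaR) over the tail samples.
    return sum((c - theta) * (c - var) for c in tail) / len(tail)


if __name__ == "__main__":
    rng = random.Random(0)
    grad = cvar_gradient_estimate(theta=0.5, alpha=0.1, n_samples=200_000, rng=rng)
    print(f"estimated CVaR gradient: {grad:.3f} (analytic value for this toy model: 1)")
```

In a reinforcement-learning setting the score term would instead be the sum of log-policy gradients along a trajectory; the Gaussian score used here is only a stand-in chosen so the result can be checked analytically.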