Abstract |
---|
• aPOMDP controls an agent’s actions to maintain the user in maximum value states. • Three reward functions based on state value and entropy are proposed and compared. • Online learning of the transition matrix T is done through a knowledge update step. • User stays in most valuable states up to 71% of the time, lowering T entropy to 0.7. • User tests show that the technique is transferable to real scenarios with robots. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1016/j.patrec.2018.03.011 | Pattern Recognition Letters |
Keywords | DocType | Volume |
---|---|---|
Social robots, POMDPs, Automated planning, Decision making, Machine learning | Journal | 118 |
ISSN | Citations | PageRank |
---|---|---|
0167-8655 | 1 | 0.35 |
References | Authors |
---|---|
23 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gonçalo S. Martins | 1 | 9 | 2.86 |
Hend Al Tair | 2 | 1 | 1.03 |
Luís Santos | 3 | 110 | 14.58 |
Jorge Dias | 4 | 175 | 33.83 |