Title
Online reinforcement learning for dynamic multimedia systems.
Abstract
In our previous work, we proposed a systematic cross-layer framework for dynamic multimedia systems, which allows each layer to make autonomous and foresighted decisions that maximize the system's long-term performance, while meeting the application's real-time delay constraints. The proposed solution solved the cross-layer optimization offline, under the assumption that the multimedia system's probabilistic dynamics were known a priori, by modeling the system as a layered Markov decision process. In practice, however, these dynamics are unknown a priori and, therefore, must be learned online. In this paper, we address this problem by allowing the multimedia system layers to learn, through repeated interactions with each other, to autonomously optimize the system's long-term performance at run-time. The two key challenges in this layered learning setting are: (i) each layer's learning performance is directly impacted not only by its own dynamics, but also by the learning processes of the other layers with which it interacts; and (ii) each layer must select a learning model that appropriately balances time-complexity (i.e., learning speed) with the multimedia system's limited memory and the multimedia application's real-time delay constraints. We propose two reinforcement learning algorithms for optimizing the system under different design constraints: the first algorithm solves the cross-layer optimization in a centralized manner and the second solves it in a decentralized manner. We analyze both algorithms in terms of their required computation, memory, and interlayer communication overheads. After noting that the proposed reinforcement learning algorithms learn too slowly, we introduce a complementary accelerated learning algorithm that exploits partial knowledge about the system's dynamics in order to dramatically improve the system's performance.
In our experiments, we demonstrate that decentralized learning can perform as well as centralized learning, while enabling the layers to act autonomously. Additionally, we show that existing application-independent reinforcement learning algorithms, and existing myopic learning algorithms deployed in multimedia systems, perform significantly worse than our proposed application-aware and foresighted learning methods.
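The abstract's core setting is learning to act in a Markov decision process whose transition and reward dynamics are unknown a priori and must be discovered through repeated interaction. As a point of reference only, the sketch below shows generic tabular Q-learning on a toy MDP; it is not the paper's layered or accelerated algorithm, and all names here (`q_learning`, `chain_step`) are illustrative assumptions.

```python
import random

def q_learning(num_states, num_actions, step, episodes=3000,
               alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning: learn action values for an MDP whose
    transition and reward dynamics are only observable through the
    step(state, action) -> (next_state, reward, done) callback."""
    rng = random.Random(seed)
    Q = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy exploration of the unknown dynamics.
            if rng.random() < epsilon:
                a = rng.randrange(num_actions)
            else:
                a = max(range(num_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Bootstrapped temporal-difference update toward the
            # discounted long-term value, not the myopic reward.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    return Q

# Toy 3-state chain: action 1 moves right and earns reward 1 at the
# final state; action 0 stays put with zero reward.
def chain_step(s, a):
    if a == 1:
        if s == 2:
            return s, 1.0, True
        return s + 1, 0.0, False
    return s, 0.0, False
```

Because the update discounts future rewards with `gamma`, the learned policy is foresighted (it prefers moving right even though the immediate reward is zero), which is the property the paper contrasts against myopic learning.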
Year
2010
DOI
10.1109/TIP.2009.2035228
Venue
IEEE Transactions on Image Processing
Keywords
multimedia system,multimedia system layer,online reinforcement,long-term performance,dynamic multimedia system,decentralized learning,complementary accelerated learning algorithm,real-time delay constraint,layered learning setting,multimedia application,centralized learning,real time,computational complexity,constraint optimization,real time systems,time complexity,markov processes,reinforcement learning,system performance,acceleration,design optimization,learning artificial intelligence,markov decision process,algorithm design and analysis
DocType
Journal
Volume
19
Issue
2
ISSN
1941-0042
Citations
7
PageRank
0.56
References
23
Authors
2
Name, Order, Citations, PageRank
Nicholas Mastronarde, 1, 240, 26.93
Mihaela Van Der Schaar, 2, 3968, 352.59