Title
Online reinforcement learning for dynamic multimedia systems.
Abstract
In our previous work, we proposed a systematic cross-layer framework for dynamic multimedia systems, which allows each layer to make autonomous and foresighted decisions that maximize the system's long-term performance, while meeting the application's real-time delay constraints. The proposed solution solved the cross-layer optimization offline, under the assumption that the multimedia system's probabilistic dynamics were known a priori, by modeling the system as a layered Markov decision process. In practice, however, these dynamics are unknown a priori and, therefore, must be learned online. In this paper, we address this problem by allowing the multimedia system layers to learn, through repeated interactions with each other, to autonomously optimize the system's long-term performance at run-time. The two key challenges in this layered learning setting are: (i) each layer's learning performance is directly impacted not only by its own dynamics, but also by the learning processes of the other layers with which it interacts; and (ii) each layer must select a learning model that appropriately balances time-complexity (i.e., learning speed) with the multimedia system's limited memory and the multimedia application's real-time delay constraints. We propose two reinforcement learning algorithms for optimizing the system under different design constraints: the first algorithm solves the cross-layer optimization in a centralized manner and the second solves it in a decentralized manner. We analyze both algorithms in terms of their required computation, memory, and interlayer communication overheads. After noting that the proposed reinforcement learning algorithms learn too slowly, we introduce a complementary accelerated learning algorithm that exploits partial knowledge about the system's dynamics in order to dramatically improve the system's performance.
In our experiments, we demonstrate that decentralized learning can perform as well as centralized learning, while enabling the layers to act autonomously. Additionally, we show that existing application-independent reinforcement learning algorithms, and existing myopic learning algorithms deployed in multimedia systems, perform significantly worse than our proposed application-aware and foresighted learning methods.
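The abstract's core setting is learning to act in a Markov decision process whose transition and reward dynamics are unknown a priori and must be discovered through repeated interaction. As a point of reference only, the sketch below shows generic tabular Q-learning on a toy MDP; it is not the paper's layered or accelerated algorithm, and all names here (`q_learning`, `chain_step`) are illustrative assumptions.

```python
import random

def q_learning(num_states, num_actions, step, episodes=3000,
               alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning: learn action values for an MDP whose
    transition and reward dynamics are only observable through the
    step(state, action) -> (next_state, reward, done) callback."""
    rng = random.Random(seed)
    Q = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy exploration of the unknown dynamics.
            if rng.random() < epsilon:
                a = rng.randrange(num_actions)
            else:
                a = max(range(num_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Bootstrapped temporal-difference update toward the
            # discounted long-term value, not the myopic reward.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    return Q

# Toy 3-state chain: action 1 moves right and earns reward 1 at the
# final state; action 0 stays put with zero reward.
def chain_step(s, a):
    if a == 1:
        if s == 2:
            return s, 1.0, True
        return s + 1, 0.0, False
    return s, 0.0, False
```

Because the update discounts future rewards with `gamma`, the learned policy is foresighted (it prefers moving right even though the immediate reward is zero), which is the property the paper contrasts against myopic learning.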
Year
2010
DOI
10.1109/TIP.2009.2035228
Venue
IEEE Transactions on Image Processing
Keywords
multimedia system,multimedia system layer,online reinforcement,long-term performance,dynamic multimedia system,decentralized learning,complementary accelerated learning algorithm,real-time delay constraint,layered learning setting,multimedia application,centralized learning,real time,computational complexity,constraint optimization,real time systems,time complexity,markov processes,reinforcement learning,system performance,acceleration,design optimization,learning artificial intelligence,markov decision process,algorithm design and analysis
DocType
Journal
Volume
19
Issue
2
ISSN
1941-0042
Citations
7
PageRank
0.56
References
23
Authors
2
Name, Order, Citations, PageRank
Nicholas Mastronarde, 1, 240, 26.93
Mihaela Van Der Schaar, 2, 3968, 352.59