Dynamic Computation Offloading With Energy Harvesting Devices: A Hybrid-Decision-Based Deep Reinforcement Learning Approach - Citegraph

Paper Info

Title
Dynamic Computation Offloading With Energy Harvesting Devices: A Hybrid-Decision-Based Deep Reinforcement Learning Approach

Abstract
Mobile-edge computing (MEC) with energy harvesting (EH) is an emerging paradigm for improving the computation experience of Internet-of-Things (IoT) devices. In a multidevice multiserver MEC system, frequently varying harvested energy, together with changing computation task loads and time-varying server computation capacities, increases the system's dynamics. Each device should therefore learn to make coordinated actions, such as the offloading ratio, local computation capacity, and server selection, to achieve a satisfactory computation quality. The MEC system with EH devices is thus highly dynamic and faces two challenges: 1) continuous–discrete hybrid action spaces and 2) coordination among devices. To address these challenges, we propose two deep reinforcement learning (DRL)-based algorithms for dynamic computation offloading: 1) hybrid-decision-based actor–critic learning (Hybrid-AC) and 2) multidevice hybrid-AC (MD-Hybrid-AC). Hybrid-AC handles the hybrid action space with an improved actor–critic architecture, in which the actor outputs continuous actions (offloading ratio and local computation capacity) for every server, and the critic evaluates the continuous actions and outputs the discrete action of server selection. MD-Hybrid-AC adopts the framework of centralized training with decentralized execution. It learns coordinated decisions by constructing a centralized critic, which considers the continuous action policies of all devices, to output server selections. Simulation results show that the proposed algorithms achieve a good balance between consumed time and energy, and significantly improve performance compared with baseline offloading policies.
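The hybrid action selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the critic produces one score per server and that the server with the highest score is selected, which is one plausible reading of "outputs the discrete action of server selection". The class name `HybridActorCritic`, the single linear layers, the random untrained weights, and the normalization of both continuous actions to (0, 1) are all hypothetical choices made for illustration; no training step is shown.

```python
import math
import random


def sigmoid(x):
    """Squash a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(rng, n_out, n_in):
    """Random linear layer (illustrative stand-in for a trained network)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(weights, bias, x):
    """Compute W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class HybridActorCritic:
    """Hypothetical sketch: actor gives per-server continuous actions,
    critic scores them and the best-scoring server is chosen (assumption)."""

    def __init__(self, state_dim, num_servers, seed=0):
        rng = random.Random(seed)
        self.num_servers = num_servers
        # Actor: state -> (offloading ratio, local capacity) for every server.
        self.actor = init_layer(rng, 2 * num_servers, state_dim)
        # Critic: state + all continuous actions -> one score per server.
        self.critic = init_layer(rng, num_servers, state_dim + 2 * num_servers)

    def act(self, state):
        # Continuous part: two normalized values in (0, 1) per server.
        continuous = [sigmoid(v) for v in linear(*self.actor, state)]
        # Discrete part: score every server given the continuous actions,
        # then pick the server with the highest score.
        scores = linear(*self.critic, list(state) + continuous)
        server = max(range(self.num_servers), key=lambda k: scores[k])
        ratio = continuous[2 * server]
        capacity = continuous[2 * server + 1]
        return server, ratio, capacity, scores


agent = HybridActorCritic(state_dim=4, num_servers=3)
server, ratio, capacity, scores = agent.act([0.2, 0.5, 0.1, 0.9])
print(server, round(ratio, 3), round(capacity, 3))
```

The point of the structure is that the discrete choice never needs its own policy head: the continuous actions are produced for all servers at once, and the server decision falls out of comparing the critic's scores.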

Year	DOI	Venue
2020	10.1109/JIOT.2020.3000527	IEEE Internet of Things Journal
Keywords	DocType	Volume
Servers,Task analysis,Internet of Things,Heuristic algorithms,Computational modeling,Performance evaluation,Optimization	Journal	7
Issue	ISSN	Citations
10	2327-4662	1
PageRank	References	Authors
0.35	0	4

Authors (4 rows)

Name	Order	Citations	PageRank
Jing Zhang	1	1	0.35
Jun Du	2	21	5.67
Yuan Shen	3	1151	111.52
Jian Wang	4	175	21.03
