Hardware Acceleration for Postdecision State Reinforcement Learning in IoT Systems - Citegraph

Paper Info

Title
Hardware Acceleration for Postdecision State Reinforcement Learning in IoT Systems

Abstract
Reinforcement learning (RL) is increasingly being used to optimize resource-constrained wireless Internet of Things (IoT) devices. However, existing RL algorithms that are lightweight enough to be implemented on these devices, such as $Q$-learning, converge too slowly to effectively adapt to the experienced information source and channel dynamics, while deep RL algorithms are too complex to be implemented on these devices. By integrating basic models of the IoT system into the learning process, the so-called postdecision state (PDS)-based RL can achieve faster convergence speeds than these alternative approaches at lower complexity than deep RL; however, its complexity may still hinder the real-time and energy-efficient operations on IoT devices. In this article, we develop efficient hardware accelerators for PDS-based RL. We first develop an arithmetic hardware acceleration architecture and then propose a stochastic computing (SC)-based reconfigurable hardware architecture. By using simple bitwise computations enabled by SC, we eliminate costly multiplications involved in PDS learning, which simultaneously reduces the hardware area and power consumption. We show that the computational efficiency can be further improved by using extremely short stochastic representations without sacrificing learning performance. We demonstrate our proposed approach on a simulated wireless IoT sensor that must transmit delay-sensitive data over a fading channel while minimizing its energy consumption. Our experimental results show that our arithmetic accelerator is $5.3\times$ faster than $Q$-learning and $2.6\times$ faster than a baseline hardware architecture, while the proposed SC-based architecture further reduces the critical path of the arithmetic accelerator by 87.9%.
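
The abstract's remark that SC replaces multiplications with simple bitwise computations can be illustrated with a small software sketch. The abstract does not say which SC encoding the authors use, so the following assumes the common unipolar encoding, in which a value p in [0, 1] is represented by a bitstream whose bits are 1 with probability p, and the product of two independent streams is obtained with a bitwise AND. The function names (`encode`, `decode`, `sc_multiply`) are hypothetical and are not taken from the paper; this is not the authors' hardware architecture.

```python
import random


def encode(p, length, rng):
    """Encode p in [0, 1] as a unipolar stochastic bitstream (assumed encoding)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(bits):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    a, b = 0.6, 0.5
    # Longer streams give a less noisy estimate of a * b.
    for length in (16, 256, 4096):
        estimate = decode(sc_multiply(encode(a, length, rng),
                                      encode(b, length, rng)))
        print(f"length={length:5d}  estimate={estimate:.3f}  exact={a * b:.3f}")
```

In this sketch the estimate is noisy for short streams and tightens as the stream grows, which is the accuracy and cost trade-off behind the abstract's claim that very short stochastic representations can suffice for learning.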

Year	2022
DOI	10.1109/JIOT.2022.3163364
Venue	IEEE Internet of Things Journal
Keywords	Action evaluation, hardware acceleration, Internet of Things (IoT) systems, latency sensitive resource-constrained online operation, postdecision state learning, reinforcement learning, stochastic computing (SC), wireless communication
DocType	Journal
Volume	9
Issue	12
ISSN	2327-4662
Citations	0
PageRank	0.34
References	35
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Jianchi Sun	1	0	0.68
Nikhilesh Sharma	2	5	2.42
Jacob Chakareski	3	532	58.87
Nicholas Mastronarde	4	240	26.93
Yingjie Lao	5	0	0.34
