Title
A clustered manycore processor architecture for embedded and accelerated applications
Abstract
The Kalray MPPA-256 processor integrates 256 user cores and 32 system cores on a chip in 28 nm CMOS technology. Each core implements a 32-bit 5-issue VLIW architecture. These cores are distributed across 16 compute clusters of 16+1 cores and 4 quad-core I/O subsystems. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them are ensured by data and control Networks-on-Chip (NoC). The MPPA-256 processor is also fitted with a variety of I/O controllers, in particular DDR, PCI, Ethernet, Interlaken, and GPIO. We demonstrate that the clustered manycore architecture of the MPPA-256 processor is effective on two different classes of applications: embedded computing, with the implementation of a professional H.264 video encoder that runs in real time at low power; and high-performance computing, with the acceleration of a financial option pricing application. In the first case, a cyclostatic dataflow programming environment is used that automates application distribution over the execution resources. In the second case, an explicit parallel programming model based on POSIX processes, threads, and NoC-specific IPC is used.
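The core counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes that the one extra core per compute cluster and all four cores of each I/O subsystem are the "system cores", which is the reading under which the abstract's total of 32 works out.

```python
# Topology figures taken from the abstract; all names here are hypothetical.
COMPUTE_CLUSTERS = 16
USER_CORES_PER_CLUSTER = 16
SYSTEM_CORES_PER_CLUSTER = 1   # the "+1" in "16+1 cores"
IO_SUBSYSTEMS = 4
CORES_PER_IO_SUBSYSTEM = 4     # "quad-core"; assumed to count as system cores


def core_counts():
    """Return user, system, and total core counts implied by the topology."""
    user = COMPUTE_CLUSTERS * USER_CORES_PER_CLUSTER
    system = (COMPUTE_CLUSTERS * SYSTEM_CORES_PER_CLUSTER
              + IO_SUBSYSTEMS * CORES_PER_IO_SUBSYSTEM)
    return {"user": user, "system": system, "total": user + system}


def private_address_spaces():
    """One private address space per compute cluster and per I/O subsystem."""
    return COMPUTE_CLUSTERS + IO_SUBSYSTEMS


if __name__ == "__main__":
    print(core_counts())             # {'user': 256, 'system': 32, 'total': 288}
    print(private_address_spaces())  # 20
```

Under these assumptions the figures match the abstract: 256 user cores, 32 system cores, and 20 private address spaces connected by the NoC.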
Year
2013
DOI
10.1109/HPEC.2013.6670342
Venue
High Performance Extreme Computing Conference
Keywords
CMOS integrated circuits, data flow computing, multiprocessing systems, network-on-chip, parallel architectures, CMOS technology, I/O subsystems, Kalray MPPA-256 processor, NoC, POSIX processes, VLIW architecture, clustered manycore processor architecture, compute cluster, cyclostatic dataflow programming environment, embedded computing, explicit parallel programming model, financial option pricing application, high-performance computing, networks-on-chip, professional H.264 video encoder
DocType
Conference
ISSN
2377-6943
ISBN
978-1-4799-1364-0
Citations
60
PageRank
1.89
References
7
Authors
12