Title
7th Workshop on Challenges for Parallel Computing
Abstract
Over the last decade, interest in parallel programming has grown tremendously. Hardware systems that contain many different levels of parallelism have become mainstream. At one end of the spectrum, computer systems that contain many processing cores, each capable of running multiple hardware threads, are becoming commonplace. It is common to find laptop and desktop systems that contain a small number (2-8) of these Shared-Memory Processor (SMP) chips. Furthermore, high-end computing systems can now contain hundreds of these SMP chips, resulting in machines capable of running more than 1000 hardware threads simultaneously. As processor speeds begin to stagnate, software developers are being forced to exploit the parallelism available in these systems in order to improve the performance of their applications.

At the other end of the spectrum, as commodity hardware prices fall, it is becoming increasingly affordable to build large-scale multi-node distributed machines. A survey of the top 10 supercomputers in the world (www.top500.org) shows that these systems contain an average of about 400,000 cores running at an average frequency of 2.5 GHz. Since the average clock frequency of these machines is fairly low, their full potential must be exploited through efficient use of the parallelism provided by the thousands of processors they contain.

New types of heterogeneous parallel computing systems have begun to emerge. These systems contain multiple types of processors, typically a powerful CPU core and some type of Graphics Processing Unit (GPU) or hardware accelerator unit. This type of heterogeneous system presents a new set of challenges for software developers in terms of how to distribute work among the different units, based on their capabilities, in order to maximize their utilization. Many advances in heterogeneous system design are still being realized as people continue to explore how to combine existing hardware in new and novel ways. Still others are exploring new advances in hardware design that can further increase the breadth of combinations that can be used to create heterogeneous systems. One striking example is the use of Field Programmable Gate Arrays (FPGAs) as re-configurable specialized processing units that can be included in a heterogeneous system to perform specialized work on demand.

Of course, all of these advances in the construction of large parallel machines, whether single-node SMPs or large distributed clusters, are made with the intention of providing more performance for the applications designed to run on these systems. Thus, it is imperative that we give software developers the means to exploit these systems. Programming models and languages are instrumental in allowing software developers to efficiently develop parallel applications with suitable performance. Unfortunately, the perfect programming language for dealing with the different types of parallel systems has yet to be found. Existing languages and models, such as OpenMP and the Message Passing Interface (MPI), are well established within their communities. However, it remains unclear whether they alone can provide solutions for all the different types of parallel systems that are available. Similarly, the Partitioned Global Address Space (PGAS) programming model is gaining traction within the community, as it provides a paradigm for developing parallel software whose performance is becoming increasingly competitive with existing paradigms.
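To make the combination of node-level and thread-level parallelism mentioned above concrete, the following is a minimal sketch, assuming a standard MPI implementation and an OpenMP-capable C compiler, of how MPI is commonly paired with OpenMP: processes span the nodes of a cluster while threads use the cores of each SMP node. It is illustrative only and not drawn from any particular system discussed at the workshop.

    /* Illustrative only: each MPI process runs on one SMP node, and OpenMP
     * threads inside the process use that node's cores and hardware threads. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* Request thread support so that OpenMP threads can coexist with MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Distributed-memory parallelism across nodes (MPI ranks) combined
         * with shared-memory parallelism within a node (OpenMP threads). */
        #pragma omp parallel
        {
            printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
                   rank, nranks, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Such a program is typically built with an MPI wrapper compiler and the OpenMP flag enabled (for example, mpicc -fopenmp with GCC-based toolchains), and launched with one process per node so that threads fill the cores within each node.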
Other emerging languages, such as OpenCL, provide a means to effectively develop applications for heterogeneous systems.

The performance of parallel applications relies heavily on the underlying synchronization primitives used for concurrency control. Thus, it is necessary to study the performance implications of synchronization primitives. Programming scalable, massively parallel applications using fine-grained locking is a very challenging problem that requires significant expertise. Transactional Memory (TM) is emerging as a promising alternative to traditional lock-based synchronization. Transactional programming is easier for programmers because much of the burden of concurrency control is handled by the underlying system. This will become increasingly important as the productivity of software developers continues to be stressed.

Compilers play a significant role in transforming and optimizing code for parallel execution. Most mainstream compilers offer some level of automatic parallelization, but there is still a long way to go. A good understanding of the hardware, especially in heterogeneous systems, is essential for the compiler and runtime systems to leverage new hardware features.

Tools that assist in the development, debugging, and analysis of parallel software are also of utmost importance. Debugging large parallel applications is a formidable task. Most debuggers offer minimal support for debugging parallel applications and lag behind new parallel programming models and paradigms. A recent challenge for parallel debuggers lies in scalability: it is essential for a debugger to have a small memory footprint so that both the debugger and the application can scale well. Organizing the large amount of debug information and presenting it to the programmer is a daunting task. Occupying a large system for several hours of interactive debugging is a rare luxury and is often not feasible. Therefore, it is crucial for the debugger to gather meaningful information and process it so as to best assist the programmer in diagnosing the problem. Testing parallel applications to ensure correct behaviour is an equally hard problem. The non-determinism inherent in parallel applications makes reliable and reproducible testing extremely difficult, if not impossible. New techniques and tools must be developed to assist with this as well. Furthermore, there is an urgent need for more tools and infrastructure for performance tuning and profiling of parallel applications.

A study of parallel applications is crucial in order to understand the inherent parallelism available in a program. Such a study reveals important performance characteristics, possible performance gains, and the scalability of the application. Information on the nature of the parallelism (whether structured or unstructured) can be very useful in extending existing programming models. A classic example is the introduction of the task construct in OpenMP 3.0 to exploit unstructured task parallelism (a brief illustrative sketch is included below).

This workshop brought together different groups from the parallel community (application developers, language developers, compiler and tools developers, and academic researchers) to further explore the current challenges in parallel computing and to present ideas on how to deal with these challenges.
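As a brief illustration of the OpenMP 3.0 task construct mentioned above, the following sketch computes Fibonacci numbers recursively, the customary textbook example of unstructured task parallelism. It assumes a compiler with OpenMP 3.0 support and is illustrative only, not drawn from workshop material.

    /* Illustrative only: the task construct lets one thread create work of
     * irregular, unstructured shape that the whole thread team executes. */
    #include <omp.h>
    #include <stdio.h>

    static long fib(int n)
    {
        long x, y;
        if (n < 2)
            return n;

        /* Each recursive call becomes an explicit task; shared() is needed
         * because task-local copies would otherwise hide the results. */
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);

        /* Wait for the two child tasks before combining their results. */
        #pragma omp taskwait
        return x + y;
    }

    int main(void)
    {
        long result = 0;
        #pragma omp parallel
        {
            /* One thread creates the root task; the team executes the rest. */
            #pragma omp single
            result = fib(20);
        }
        printf("fib(20) = %ld\n", result);
        return 0;
    }

This kind of irregular recursive parallelism is awkward to express with the worksharing constructs that predate OpenMP 3.0, which is precisely the gap the task construct was introduced to fill.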
Topics discussed in the workshop included:
• Parallel architectures
• Parallel programming models & languages
• Concurrency control mechanisms
• Compiler, runtime, debugger, tools and infrastructure for parallel computing
• Parallel applications (scientific and non-scientific)
• Parallel performance evaluation
• New trends in parallel computing
Year: 2012
Venue: CASCON
Keywords: parallel software, heterogeneous parallel computing system, parallel application, large parallel application, parallel system, heterogeneous system, new parallel programming model, parallel computing, Parallel Computing, large parallel machine, software developer
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                 Order  Citations  PageRank
Priya Unnikrishnan   1      200        14.67
Kit Barton           2      4          3.80
Tom Chen             3      0          0.34