SyclParallelSTL: A Parallel STL library for Heterogeneous Systems (SYCL 2016 - 1st SYCL Programming Workshop)

Sat 12 - Wed 16 March 2016 Barcelona, Spain

Who

Antonio Vilches, Ruyman Reyes

Track

SYCL 2016

When

Sun 13 Mar 2016 11:30 - 11:50 at Menorca - Session 2

Abstract

From high-performance computing (HPC) systems, where combinations of CPUs, GPUs and FPGAs are increasingly common, to the mobile market, with highly integrated SoCs that include multiple accelerators on a single die, heterogeneous architectures are becoming the de facto standard. These architectures present new opportunities to improve overall application performance and reduce energy consumption, which can be achieved by assigning some tasks to the CPU while offloading others to the accelerators. However, while these highly parallel architectures offer higher raw performance, they require more complex libraries and programming models. Software developers therefore need extra knowledge of the low-level details of these architectures to develop and maintain code.

In this paper, we present a high-level parallel extension of the current C++ STL. It implements the C++ Extensions for Parallelism Technical Specification (TS), which is proposed for inclusion in ISO C++17. We aim to provide an easy-to-use interface that lets developers focus on their own applications rather than on the underlying processor architecture. To allow C++ developers to use our implementation on different accelerators, we rely on SYCL [2], which allows us to build an abstraction layer over accelerator-specific details. As a result, applications that use our Parallel STL implementation can be transparently accelerated on devices such as Intel Xeon Phi, GPUs or FPGAs. Our implementation captures standard STL iterators, automatically allocates and frees the required device memory, and performs transparent data movement between the host and the devices, so developers do not have to worry about the memory layout of each device. We evaluate the performance of our Parallel STL implementation and show performance gains over the sequential implementation when targeting CPUs and GPUs.
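The iterator-capture and data-movement steps described in the abstract can be illustrated with a small host-only sketch. It is not the SyclParallelSTL API and does not use SYCL. The names `sketch::mock_device_policy`, `sketch::transform` and `double_all` are hypothetical, and a `std::vector` stands in for device memory so that the example compiles with any C++ compiler. A real implementation would presumably wrap a SYCL queue in the policy and run the operation as a device kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a device execution policy. It only counts
// transfers so that the staging steps are observable on the host.
struct mock_device_policy {
    std::size_t elements_copied_in = 0;   // host -> mock device
    std::size_t elements_copied_out = 0;  // mock device -> host
};

// Hypothetical policy-based transform: same shape as std::transform with a
// leading policy argument. Assumes op maps value_type to value_type.
template <class ForwardIt, class OutputIt, class UnaryOp>
OutputIt transform(mock_device_policy& policy, ForwardIt first, ForwardIt last,
                   OutputIt d_first, UnaryOp op) {
    using value_type = typename std::iterator_traits<ForwardIt>::value_type;
    assert(std::distance(first, last) >= 0);

    // 1. Capture the iterator range: allocate mock device storage and copy
    //    the host data into it.
    std::vector<value_type> device_buffer(first, last);
    policy.elements_copied_in += device_buffer.size();

    // 2. Apply the operation to every element. Here this is a sequential
    //    host loop; on a real device each element would be one work-item.
    std::transform(device_buffer.begin(), device_buffer.end(),
                   device_buffer.begin(), op);

    // 3. Copy the results back to the caller's output range.
    OutputIt out = std::copy(device_buffer.begin(), device_buffer.end(), d_first);
    policy.elements_copied_out += device_buffer.size();

    // 4. device_buffer is destroyed on return, releasing the mock device
    //    memory without any action from the caller.
    return out;
}

}  // namespace sketch

// Caller-side view: the call reads like std::transform plus a policy, with
// no explicit allocation or copy code.
inline std::vector<int> double_all(std::vector<int> data,
                                   sketch::mock_device_policy& policy) {
    sketch::transform(policy, data.begin(), data.end(), data.begin(),
                      [](int x) { return 2 * x; });
    return data;
}
```

The point of the sketch is the call site in `double_all`: the caller passes ordinary iterators and a policy object, and allocation, transfer and release all happen inside the algorithm.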