PPoPP 2016
Sat 12 - Wed 16 March 2016 Barcelona, Spain

Alongside multi-core technology, SIMD processing is currently a main driver of performance in general-purpose processor architectures. Both technologies can increase potential performance by large factors, but must be explicitly exploited by software. Exposing these different levels of parallelism in a productive and manageable way is still an active area of research. NVIDIA stirred the programming-interface scene with the development of a simple yet efficient performance-oriented application programmer interface. OpenACC, OpenMP 4.0, OpenCL, Cilk+ and ispc are just a few of the many choices available. In addition, established optimizing compilers continue to improve significantly at unleashing SIMD potential. Notable developments on the hardware side include relaxed alignment requirements and more powerful scatter/gather and shuffle instructions.
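As a concrete illustration (not part of the workshop announcement itself), the directive-based models mentioned above let the programmer assert that a loop is safe to vectorize. A minimal sketch using the OpenMP 4.0 `simd` directive in C++ might look like this:

```cpp
#include <vector>
#include <cstddef>

// SAXPY (y = a*x + y): a textbook loop that vectorizes well.
// The OpenMP 4.0 `simd` directive asserts that iterations are
// independent, so the compiler may emit SIMD instructions for the
// loop body without further dependence analysis.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    #pragma omp simd
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang, compiling with `-fopenmp-simd` honors the directive without pulling in the full OpenMP runtime; without the flag the pragma is simply ignored and the loop remains scalar-correct.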

Accepted Papers

  • An Evaluation of Current SIMD Programming Models for C++
  • A new SIMD iterative connected component labeling algorithm
  • Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application
  • Code Vectorization using Intel Array Notation
  • Compilers, Hands-Off My Hands-On Optimizations
  • Support for Data Parallelism in the CAL Actor Language

Call for Papers

Download slides

You may download the slides of all talks here:

https://sites.google.com/site/wpmvp2016/talk

Scope

The purpose of this workshop is to bring together practitioners and researchers from academia and industry to discuss issues, solutions, and opportunities in enabling application developers to effectively exploit SIMD/vector processing in modern processors. We seek submissions that cover all aspects of SIMD/vector processing. Topics of interests include, but are not restricted to:

  • Programming models for SIMD/vector processing
  • C/C++/Fortran extensions for SIMD (e.g., OpenMP, OpenACC, OpenCL, SIMD intrinsics)
  • New data parallel or streaming programming models for SIMD
  • Exploitation of SIMD/vector in Java, scripting languages, and domain-specific languages
  • Compilers & tools to discover and optimize SIMD parallelism
  • Case study, experience report, and performance analysis of SIMD/vector applications

Submission

Submitted papers must be no more than 8 pages in length. Authors are encouraged to use the ACM two-column format here. Papers should be submitted in PDF format and should be legible when printed on a black-and-white printer. Each submission will receive at least three reviews from the technical program committee. Selected submissions will be invited to present at the workshop and be published in the workshop proceedings. Accepted papers will be published in the ACM digital library after the workshop. We also maintain a Google+ group to act as a community hub for the workshop.

We especially encourage students to submit papers. There will be a special PPoPP 2016 student travel grant award for which you can apply.

Authors must register and submit their paper through the online submission system. If you have problems accessing the system, e-mail your submission to jan.eitzinger@fau.de .

Sun 13 Mar

Displayed time zone: Belfast

09:00 - 10:30: Session 1 (WPMVP) at Mallorca
Chair(s): Jan Eitzinger, University of Erlangen-Nuremberg, Germany

09:00 (15m) Talk: Opening Words
Jan Eitzinger, University of Erlangen-Nuremberg, Germany

09:15 (45m) Talk: Keynote - AnyDSL: Building Domain-Specific Languages for Productivity and Performance

10:00 (30m) Talk: A new SIMD iterative connected component labeling algorithm
Lionel Lacassagne, University Paris 6
11:00 - 12:30: Session 2 - Programming Models (WPMVP) at Mallorca
Chair(s): Joel Falcou, LRI, Université Paris-Sud

11:00 (30m) Talk: Support for Data Parallelism in the CAL Actor Language
Essayas Gebrewahid, Halmstad University

11:30 (30m) Talk: An Evaluation of Current SIMD Programming Models for C++
Angela Pohl, TU Berlin

12:00 (30m) Talk: Compilers, Hands-Off My Hands-On Optimizations
Richard Veras, Carnegie Mellon University

AnyDSL: Building Domain-Specific Languages for Productivity and Performance

Sebastian Hack (Compiler Design Lab, Saarland University)

Abstract To achieve good performance, programmers have to carefully tune their applications for the target architecture. Optimizing compilers fail to produce the "optimal" code because their hardware models are too coarse-grained. Moreover, many important compiler optimizations are computationally hard even for simple cost models. It is unlikely that compilers will ever be able to produce high-performance code automatically for today's and future machines.

Therefore, programmers often optimize their code manually. While manual optimization often succeeds in achieving good performance, it is cumbersome, error-prone, and unportable. Creating and debugging dozens of variants of the same original code for different target platforms is simply an engineering nightmare.

An appealing solution to this problem is domain-specific languages (DSLs). A DSL offers language constructs that can express the abstractions used in a particular application domain. This way, programmers can write their code productively, at a high level of abstraction. Very often, DSL programs look similar to textbook algorithms. Domain and machine experts then provide efficient implementations of these abstractions. Thus, DSLs enable the programmer to productively write portable and maintainable code that can be compiled to efficient implementations. However, writing a compiler for a DSL is a huge effort that people are often not willing to make. Therefore, DSLs are often embedded into existing languages to save some of the effort of writing a compiler.

In this talk, I will present the AnyDSL framework we have developed over the last three years. AnyDSL provides the core language Impala that can serve as a starting point for almost "any" DSL. New DSL constructs can be embedded into Impala in a shallow way, that is, just by implementing the functionality as a (potentially higher-order) function. AnyDSL uses online partial evaluation to remove the overhead of the embedding entirely.
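No Impala code is reproduced here, but the idea of shallow embedding can be sketched in plain C++ as a loose analogy: a "DSL construct" is just a higher-order function, and compile-time specialization (templates and inlining here, standing in for Impala's online partial evaluation) removes the abstraction overhead:

```cpp
#include <array>
#include <cstddef>

// A shallowly embedded "DSL construct": `apply_pointwise` is just a
// higher-order function that takes the per-element kernel as a
// parameter. Because the kernel's type is a template parameter, the
// compiler specializes the loop for each kernel and inlines the call
// completely -- a rough, compile-time stand-in for the online partial
// evaluation that AnyDSL performs.
template <typename Kernel, std::size_t N>
std::array<float, N> apply_pointwise(const std::array<float, N>& in, Kernel k) {
    std::array<float, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = k(in[i]);  // inlined away after specialization
    return out;
}
```

A call such as `apply_pointwise(img, [](float v) { return v * 0.5f; })` compiles down to a plain loop with no function-call overhead, even though the source reads like a generic, textbook-style algorithm.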

To demonstrate the effectiveness of our approach, we generated code from generic, high-level textbook image-processing algorithms that, on every hardware platform tested (Nvidia/AMD/Intel GPUs, SIMD CPUs), beats the industry-standard library (OpenCV) by 10-35%, even though that library has been carefully hand-optimized for each architecture over many years. Furthermore, the Impala implementation has an order of magnitude fewer lines of code than corresponding hand-tuned expert code. We have obtained similar first results in other domains.

SIMD Vectorization Essentials: Learnings, Successes and Advances

Xinmin Tian (Intel Corporation)

Abstract SIMD vectorization has received significant attention in the past decade as one of the most important methods to accelerate scientific, media, and embedded applications on SIMD architectures such as Intel SSE, AVX, IBM AltiVec and ARM Neon. However, the recent proliferation of modern SIMD architectures poses new constraints such as control-flow divergence, memory-access divergence, data alignment, mixed data types, and the wider fixed-length nature of SIMD vectors, which demand advanced SIMD vectorization compiler technologies and SIMD-vectorization-friendly language extensions. In this talk, we look back at what we have learned in the past decades, and at what we have achieved at Intel in the past few years on the path to successful SIMD vectorization for exploiting effective SIMD parallelism in large real-world applications. We share Intel's vision of the explicit SIMD programming model and of compiler-technology evolution for SIMD vectorization.
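To illustrate one of the constraints the abstract names (control-flow divergence), the classic compiler remedy is if-conversion: replace the branch with a per-element select, which maps directly onto SIMD compare-and-blend (or masked) instructions. A minimal scalar C++ sketch of the transformation, not taken from the talk:

```cpp
#include <vector>
#include <cstddef>

// Loop with a data-dependent branch: lanes would diverge under SIMD,
// since some elements take the branch and some do not.
void clamp_branchy(std::vector<float>& v, float hi) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] > hi) v[i] = hi;
}

// If-converted form: both outcomes are expressed as a single select,
// which a vectorizer can lower to a SIMD compare + blend/mask --
// every lane executes the same instructions.
void clamp_selected(std::vector<float>& v, float hi) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = (v[i] > hi) ? hi : v[i];
}
```

Both forms compute the same result; the second is essentially what a vectorizing compiler generates internally when it if-converts the first.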