WPMVP 2016 - 3rd Workshop on Programming Models for SIMD/Vector Processing

Besides multi-core technology, SIMD processing is currently a main driver of performance in general-purpose processor architectures. Both technologies increase the potential performance severalfold, but software has to utilise them explicitly. Exposing these different levels of parallelism in a productive and manageable way is still an active area of research. NVIDIA stirred the programming interface scene with the development of a simple yet efficient performance-oriented application programming interface. OpenACC, OpenMP 4.0, OpenCL, Cilk+ and ispc are just a few of the many choices available. Additionally, established optimising compilers are still improving significantly at unleashing the SIMD potential. Notable developments on the hardware side include the relaxation of alignment requirements and more powerful scatter/gather and shuffle instructions.

Accepted Papers

	Title	Track	Author
	An Evaluation of Current SIMD Programming Models for C++	WPMVP	Angela Pohl
	A new SIMD iterative connected component labeling algorithm	WPMVP	Lionel Lacassagne
	Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application	WPMVP	Istvan Reguly
	Code Vectorization using Intel Array Notation	WPMVP	Olaf Krzikalla
	Compilers, Hands-Off My Hands-On Optimizations	WPMVP	Richard Veras
	Support for Data Parallelism in the CAL Actor Language	WPMVP	Essayas Gebrewahid

Call for Papers

Download slides

You may download the slides of all talks here:

https://sites.google.com/site/wpmvp2016/talk

Scope

The purpose of this workshop is to bring together practitioners and researchers from academia and industry to discuss issues, solutions, and opportunities in enabling application developers to effectively exploit SIMD/vector processing in modern processors. We seek submissions that cover all aspects of SIMD/vector processing. Topics of interest include, but are not restricted to:

Programming models for SIMD/vector processing
C/C++/Fortran extensions for SIMD (e.g., OpenMP, OpenACC, OpenCL, SIMD intrinsics)
New data parallel or streaming programming models for SIMD
Exploitation of SIMD/vector in Java, scripting languages, and domain-specific languages  
Compilers & tools to discover and optimize SIMD parallelism
Case study, experience report, and performance analysis of SIMD/vector applications

Submission

Submitted papers must be no more than 8 pages in length. Authors are encouraged to use the ACM two-column format. Papers should be submitted in PDF format and should be legible when printed on a black-and-white printer. Each submission will receive at least three reviews from the technical program committee. Authors of selected submissions will be invited to present at the workshop, and their papers will be published in the workshop proceedings. Accepted papers will be published in the ACM digital library after the workshop. We also maintain a Google+ group to act as a community hub for the workshop.

We especially encourage students to submit papers. There will be a special PPoPP 2016 student travel grant award which you can apply for.

Authors must register and submit their paper through the online submission system. If you have problems accessing the system, e-mail your submission to jan.eitzinger@fau.de.

Sun 13 Mar
Displayed time zone: Belfast

09:00 - 10:30	Session 1, WPMVP at Mallorca. Chair(s): Jan Eitzinger, University of Erlangen-Nuremberg, Germany

09:00 15m Talk		Opening Words WPMVP Jan Eitzinger University of Erlangen-Nuremberg, Germany
09:15 45m Talk		Keynote - AnyDSL: Building Domain-Specific Languages for Productivity and Performance WPMVP Sebastian Hack
10:00 30m Talk		A new SIMD iterative connected component labeling algorithm WPMVP Lionel Lacassagne University Paris 6

11:00 - 12:30	Session 2 - Programming Models, WPMVP at Mallorca. Chair(s): Joel Falcou, LRI, Université Paris-Sud

11:00 30m Talk		Support for Data Parallelism in the CAL Actor Language WPMVP Essayas Gebrewahid Halmstad University
11:30 30m Talk		An Evaluation of Current SIMD Programming Models for C++ WPMVP Angela Pohl TU Berlin
12:00 30m Talk		Compilers, Hands-Off My Hands-On Optimizations WPMVP Richard Veras Carnegie Mellon University

14:00 - 15:30	Session 3, WPMVP at Mallorca

14:00 45m Talk		Keynote - SIMD Vectorization Essentials: Learnings, Successes and Advances WPMVP Xinmin Tian Intel
14:45 45m Talk		Discussion Panel WPMVP

16:00 - 17:30	Session 4, WPMVP at Mallorca. Chair(s): Roland Leißa, Saarland University

16:00 30m Talk		Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application WPMVP Istvan Reguly
16:30 30m Talk		Code Vectorization using Intel Array Notation WPMVP Olaf Krzikalla TU Dresden, Germany

AnyDSL: Building Domain-Specific Languages for Productivity and Performance

Sebastian Hack (Compiler Design Lab, Saarland University)

Abstract: To achieve good performance, programmers have to carefully tune their application for the target architecture. Optimizing compilers fail to produce the “optimal” code because their hardware models are too coarse-grained. Moreover, many important compiler optimizations are computationally hard even for simple cost models. It is unlikely that compilers will ever be able to produce high-performance code automatically for today’s and future machines.

Therefore, programmers often optimize their code manually. While manual optimization is often successful in achieving good performance, it is cumbersome, error-prone, and unportable. Creating and debugging dozens of variants of the same original code for different target platforms is an engineering nightmare.

Domain-specific languages (DSLs) are an appealing solution to this problem. A DSL offers language constructs that can express the abstractions used in the particular application domain. This way, programmers can write their code productively, at a high level of abstraction. Very often, DSL programs look similar to textbook algorithms. Domain and machine experts then provide efficient implementations of these abstractions. DSLs thus enable the programmer to productively write portable and maintainable code that can be compiled to efficient implementations. However, writing a compiler for a DSL is a huge effort that people are often not willing to make. Therefore, DSLs are often embedded into existing languages to save some of that effort.

In this talk, I will present the AnyDSL framework we have developed over the last three years. AnyDSL provides the core language Impala that can serve as a starting point for almost “any” DSL. New DSL constructs can be embedded into Impala in a shallow way, that is, just by implementing the functionality as a (potentially higher-order) function. AnyDSL uses online partial evaluation to remove the overhead of the embedding entirely.

To demonstrate the effectiveness of our approach, we generated code from generic, high-level textbook image-processing algorithms that has, on each and every hardware platform tested (Nvidia/AMD/Intel GPUs, SIMD CPUs), beaten the industry standard benchmark (OpenCV) by 10-35% (!), a standard that has been carefully hand-optimized for each architecture over many years. Furthermore, the implementation in Impala has one order of magnitude fewer lines of code than corresponding hand-tuned expert code. We also obtained similar first results in other domains.
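The shallow embedding described in the abstract above, where a DSL construct is just a higher-order function, can be illustrated with a minimal sketch. This is not Impala or AnyDSL code: it is plain Python, the names `iterate` and `box_blur_3` are hypothetical, and Python performs no partial evaluation, so the call overhead that AnyDSL is said to remove remains here.

```python
from typing import Callable, List

Image = List[List[float]]


def iterate(width: int, height: int, body: Callable[[int, int], float]) -> Image:
    """Hypothetical DSL construct: evaluate `body` at every pixel coordinate.

    A machine expert could swap this loop nest for a tiled, vectorized or
    GPU variant without touching the algorithms written against it.
    """
    return [[body(x, y) for x in range(width)] for y in range(height)]


def box_blur_3(img: Image) -> Image:
    """Textbook horizontal 3-tap box blur expressed with `iterate`."""
    height, width = len(img), len(img[0])

    def clamp(x: int) -> int:
        # Clamp-to-edge boundary handling.
        return max(0, min(width - 1, x))

    def body(x: int, y: int) -> float:
        return (img[y][clamp(x - 1)] + img[y][x] + img[y][clamp(x + 1)]) / 3.0

    return iterate(width, height, body)
```

The algorithm author only writes `body`; the iteration strategy lives entirely in `iterate`, which is the separation of concerns the abstract attributes to DSL embeddings.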

SIMD Vectorization Essentials: Learnings, Successes and Advances

Xinmin Tian (Intel Corporation)

Abstract: SIMD vectorization has received significant attention in the past decade as one of the most important methods to accelerate scientific, media and embedded applications on SIMD architectures such as Intel SSE, AVX, IBM AltiVec and ARM Neon. However, the recent proliferation of modern SIMD architectures poses new constraints such as control flow divergence, memory access divergence, data alignment, mixed data types, and the wider fixed-length nature of SIMD vectors, which demand advanced SIMD vectorization compiler technologies and SIMD-vectorization-friendly language extensions. In this talk, we take a look back at what we have learned in the past decades, and what we have achieved on the path of successful SIMD vectorization for exploiting effective SIMD parallelism in real large applications in the past few years at Intel. We share Intel’s vision on the explicit SIMD programming model and compiler technology evolution for SIMD vectorization.
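Control flow divergence, one of the constraints named in the abstract above, is commonly handled by if-conversion: both branches are computed for all lanes and a per-lane mask selects the result. The sketch below only models this idea in plain Python; the function names and the vector width of 4 are assumptions for illustration, lists stand in for vector registers, and nothing here is taken from the talk or from any Intel compiler.

```python
from typing import List


def scalar_kernel(xs: List[float]) -> List[float]:
    """Reference loop with a data-dependent branch per element."""
    out = []
    for x in xs:
        if x > 0:
            out.append(x * 2.0)
        else:
            out.append(-x)
    return out


def masked_kernel(xs: List[float], width: int = 4) -> List[float]:
    """If-converted variant: process `width` lanes at a time without branching.

    Each chunk models one vector register. Both branch bodies are evaluated
    for every lane, then a mask blends the two results lane by lane.
    """
    out: List[float] = []
    for i in range(0, len(xs), width):
        lanes = xs[i:i + width]              # last chunk may be shorter (remainder)
        mask = [x > 0 for x in lanes]        # per-lane branch condition
        then_vals = [x * 2.0 for x in lanes]  # "if" side, computed for all lanes
        else_vals = [-x for x in lanes]       # "else" side, computed for all lanes
        out.extend(t if m else e for m, t, e in zip(mask, then_vals, else_vals))
    return out
```

The cost of this approach is that every lane pays for both branches, which is why heavily divergent code can vectorize poorly even when it vectorizes correctly.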

Jan Eitzinger

University of Erlangen-Nuremberg, Germany

James Brodman

Intel, USA

Joel Falcou

LRI, Université Paris-Sud

Ilie Gabriel Tanase

IBM Research

James Brodman

Intel, USA

Jan Eitzinger

University of Erlangen-Nuremberg, Germany

Daniel Etiemble

University Paris 11

Joel Falcou

LRI, Université Paris-Sud

Franz Franchetti

Carnegie Mellon University

Sebastian Hack

Paul H J Kelly

Imperial College London

Lionel Lacassagne

University Paris 6

Sebastian Pop

Samsung Research America

P. Sadayappan

Ohio State University

Ilie Gabriel Tanase

IBM Research

Xinmin Tian

Intel

Jessica Tseng

IBM Research, USA

Haichuan Wang

Huawei R&D USA

China

Pablo de Oliveira Castro

University of Versailles, France
