PPoPP 2016
Sat 12 - Wed 16 March 2016 Barcelona, Spain
Sun 13 Mar 2016 09:15 - 10:00 at Mallorca - Session 1 Chair(s): Jan Eitzinger

To achieve good performance, programmers have to carefully tune their application for the target architecture. Optimizing compilers fail to produce the “optimal” code because their hardware models are too coarse-grained. Moreover, many important compiler optimizations are computationally hard even for simple cost models. It is unlikely that compilers will ever be able to produce high-performance code automatically for today’s and future machines.

Therefore, programmers often optimize their code manually. While manual optimization is often successful in achieving good performance, it is cumbersome, error-prone, and unportable. Creating and debugging dozens of variants of the same original code for different target platforms is an engineering nightmare.

Domain-specific languages (DSLs) are an appealing solution to this problem. A DSL offers language constructs that express the abstractions used in a particular application domain, so programmers can write their code productively, at a high level of abstraction. Very often, DSL programs look similar to textbook algorithms. Domain and machine experts then provide efficient implementations of these abstractions. In this way, DSLs enable the programmer to write portable and maintainable code that can be compiled to efficient implementations. However, writing a compiler for a DSL is a huge effort that people are often not willing to make. Therefore, DSLs are often embedded into existing languages to save some of that effort.

In this talk, I will present the AnyDSL framework we have developed over the last three years. AnyDSL provides the core language Impala, which can serve as a starting point for almost “any” DSL. New DSL constructs can be embedded into Impala in a shallow way, that is, just by implementing the functionality as a (potentially higher-order) function. AnyDSL uses online partial evaluation to remove the overhead of the embedding entirely.
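The idea of a shallow embedding can be illustrated with a minimal sketch. It is written in Python rather than Impala, and every name in it (`iterate`, `apply_stencil`, `specialize_stencil`) is hypothetical and not part of AnyDSL. The DSL construct `iterate` is an ordinary higher-order function that owns the loop order, and the algorithm is written against it. `specialize_stencil` is only a hand-written stand-in for what a partial evaluator would do automatically when the stencil weights are known up front: it emits straight-line code with the weight loop unrolled and zero weights dropped.

```python
from typing import Callable, List

Image = List[List[float]]


def iterate(width: int, height: int, body: Callable[[int, int], None]) -> None:
    """Hypothetical DSL construct: call `body` for every pixel.

    The loop order lives here, so a machine expert could replace it
    per target without touching the algorithms that use it.
    """
    for y in range(height):
        for x in range(width):
            body(x, y)


def apply_stencil(img: Image, weights: List[float]) -> Image:
    """Horizontal 1D stencil written against `iterate`, clamping at borders."""
    height, width = len(img), len(img[0])
    radius = len(weights) // 2
    out = [[0.0] * width for _ in range(height)]

    def body(x: int, y: int) -> None:
        acc = 0.0
        for i, w in enumerate(weights):
            xi = min(max(x + i - radius, 0), width - 1)  # clamp to the row
            acc += w * img[y][xi]
        out[y][x] = acc

    iterate(width, height, body)
    return out


def specialize_stencil(weights: List[float]) -> Callable[[List[float], int, int], float]:
    """Manual stand-in for partial evaluation (an assumption of this sketch).

    The weights are static, so generate a kernel whose body is a single
    expression: the loop over weights is unrolled and zero weights vanish.
    """
    radius = len(weights) // 2
    terms = [
        f"{w!r} * row[min(max(x + {i - radius}, 0), n - 1)]"
        for i, w in enumerate(weights)
        if w != 0.0
    ]
    src = "def kernel(row, x, n):\n    return " + (" + ".join(terms) or "0.0")
    namespace: dict = {}
    exec(src, namespace)  # compile the generated straight-line kernel
    return namespace["kernel"]


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0]]
    blur = [0.25, 0.5, 0.25]
    print(apply_stencil(image, blur))
    kernel = specialize_stencil(blur)
    print([kernel(image[0], x, 3) for x in range(3)])
```

In this sketch the generic version still pays for the indirect call to `body` and the inner loop over `weights` on every pixel. That is the kind of embedding overhead the abstract says partial evaluation removes; here the specialization has to be written by hand.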

To demonstrate the effectiveness of our approach, we generated code from generic, high-level textbook image-processing algorithms that has, on each and every hardware platform tested (Nvidia/AMD/Intel GPUs, SIMD CPUs), beaten the industry standard benchmark (OpenCV) by 10-35% (!), a standard that has been carefully hand-optimized for each architecture over many years. Furthermore, the implementation in Impala has one order of magnitude fewer lines of code than corresponding hand-tuned expert code. We also obtained similar first results in other domains.

Sun 13 Mar


09:00 - 10:30, Session 1 (WPMVP) at Mallorca
Chair(s): Jan Eitzinger, University of Erlangen-Nuremberg, Germany

09:00 (15m) Talk: Opening Words
    Jan Eitzinger, University of Erlangen-Nuremberg, Germany
09:15 (45m) Talk: Keynote - AnyDSL: Building Domain-Specific Languages for Productivity and Performance
10:00 (30m) Talk: A new SIMD iterative connected component labeling algorithm
    Lionel Lacassagne, University Paris 6