PP4REE 2016: Workshop on Parallel Programming for Resilience and Energy Efficiency

The number of components in High Performance Computing (HPC) systems is increasing at the pace dictated by Moore’s Law, while the mean time between failures (MTBF) of the complete system is shrinking significantly. For example, when accounting for the instruction and data caches and the register files, the mean time between soft errors for the Sequoia supercomputer at Lawrence Livermore National Laboratory is estimated to be 1.5 days. As HPC systems move into the Exascale era, the number of system components will increase by up to three orders of magnitude and the MTBF will deteriorate further, making resilience a fundamental challenge. This scenario renders current system-level solutions to resilience, such as coordinated checkpointing, infeasible, and motivates the use of algorithmic, programming-model, or runtime-system approaches to improve the resilience of parallel applications at scale.

While a resilience crisis is looming in the HPC domain, the end of Dennard scaling (i.e., the ability to shrink the feature size of integrated circuits while maintaining a constant power density) has made energy consumption a primary design concern, on par with performance, one that calls for holistic solutions spanning from the hardware to the application software. The Green500 ranking, based on the LINPACK benchmark, shows remarkable improvements in the MFLOPS/W (millions of floating-point operations per second per watt, or equivalently millions of floating-point operations per joule) of recent HPC facilities. However, with the cost of 1 MW being close to $1 million, any improvement on this metric will have an enormous positive impact on the deployment of future Exascale systems. Despite a flurry of research in recent years on techniques that improve the energy efficiency of HPC systems via software intervention, energy remains transparent to the existing parallel programming models used in production settings.

The quest for higher energy-efficiency in future HPC systems is inherently connected to the quest for enhanced resilience for two reasons: First, resilience techniques have a non-trivial energy cost. Second, ongoing efforts to further improve the energy-efficiency of hardware at the device level (such as operating hardware below its nominal margins or replacing DDR technology with non-volatile memory technologies) may compromise hardware reliability.

Accepted Papers

	Title	Authors
	AcHEe: A Benchmark Suite at the Meeting Point of Heterogeneous and Approximate Computing	Panos Koutsovasilis, Christos Kalogirou, Christos Konstantas, Manolis Maroudas, Michalis Spyrou, Christos Antonopoulos
	Distributed Coordinated Checkpoints with Replication for Automatic Recovery	Jorge Villamayor, Dolores Rexachs, Emilio Luque
	Exploring the Interplay of Resilience and Energy Consumption for a Task-Based Partial Differential Equations Preconditioner	Francesco Rizzi, Karla Morris, Khachik Sargsyan, Paul Mycek, Cosmin Safta, Olivier Le Maître, Omar Knio, Bert Debusschere
	On the Energy Costs of Fault Tolerance for Matrix Multiplication on Low-Power Multicore Architectures	Sandra Catalán, José R. Herrero, Enrique S. Quintana-Orti, Rafael Rodríguez-Sánchez, Tyler M. Smith, Robert van de Geijn

Call for Papers

The purpose of this workshop is to explore the space of techniques for improving the resilience and energy efficiency (REE) of parallel programs at the algorithmic and language levels. We are particularly interested in papers that present cross-cutting techniques that trade off energy efficiency against resilience. We solicit original papers on topics that include, but are not limited to, the following:

Programming languages, interfaces, and general software techniques for REE.
Scheduling and mapping for REE.
Runtime systems for REE.
Algorithmic techniques for REE.
Programming models for computing paradigms that improve REE, such as near-threshold computing, approximate computing, or neuromorphic computing.
Applications and successful case studies.

Submissions

Papers should not exceed ten single-spaced, double-column pages (including figures, tables, and references) using a 10-point font on 8.5x11-inch pages. We suggest using the IEEE two-column template for conference proceedings. Submissions will be judged on correctness, originality, technical strength, significance, presentation, quality, and appropriateness. Submitted papers must not have appeared in, or be under consideration for, another venue. A full peer-review process will be followed, with each paper reviewed by at least three members of the program committee. Submissions will be made through EasyChair.

Special Issue

Extended versions of the best papers will appear, after an additional review process, in a special issue of the Elsevier journal Parallel Computing.

Sat 12 Mar
Displayed time zone: (GMT) Belfast

09:00 - 10:30	Session 1, PP4REE, at Ibiza. Chair(s): Christos Antonopoulos, Department of Electrical and Computer Engineering, University of Thessaly, Greece

09:00 60m Talk		Keynote: Reliability and Energy-efficiency optimizations using Significance-Based Computing	Nikolaos Bellas (University of Thessaly, Greece)
10:00 30m Talk		Distributed Coordinated Checkpoints with Replication for Automatic Recovery	Jorge Villamayor (Universidad Autónoma de Barcelona), Dolores Rexachs, Emilio Luque

11:00 - 12:30	Session 2, PP4REE, at Ibiza. Chair(s): Enrique S. Quintana-Orti, Department of Computer Engineering & Sciences, Universidad Jaume I of Castellon, Spain

11:00 30m Talk		Exploring the Interplay of Resilience and Energy Consumption for a Task-Based Partial Differential Equations Preconditioner	Francesco Rizzi, Karla Morris, Khachik Sargsyan, Paul Mycek, Cosmin Safta, Olivier Le Maître, Omar Knio, Bert Debusschere
11:30 30m Talk		AcHEe: A Benchmark Suite at the Meeting Point of Heterogeneous and Approximate Computing	Panos Koutsovasilis, Christos Kalogirou, Christos Konstantas, Manolis Maroudas, Michalis Spyrou, Christos Antonopoulos (Department of Electrical and Computer Engineering, University of Thessaly, Greece)
12:00 30m Talk		On the Energy Costs of Fault Tolerance for Matrix Multiplication on Low-Power Multicore Architectures	Sandra Catalán, José R. Herrero, Enrique S. Quintana-Orti, Rafael Rodríguez-Sánchez, Tyler M. Smith, Robert van de Geijn

Christos Antonopoulos

Department of Electrical and Computer Engineering, University of Thessaly, Greece

Dimitrios Nikolopoulos

EEECS, Queen's University Belfast, Northern Ireland, United Kingdom

Oscar Plata

Department of Computer Architecture at the University of Malaga, Spain

Enrique S. Quintana-Orti

Department of Computer Engineering & Sciences, Universidad Jaume I of Castellon, Spain

Costas Bekas

IBM Research, Zurich

Nikolaos Bellas

University of Thessaly, Greece

Bronis de Supinski

Lawrence Livermore National Laboratory

Rong Ge

Clemson University

Dong Li

University of California, Merced

Naoya Maruyama

AICS-RIKEN, Japan

Kathryn Mohror

Lawrence Livermore National Laboratory

Yves Robert

Ecole Normale Supérieure de Lyon

Michele Weiland
