PPoPP 2016
Sat 12 - Wed 16 March 2016 Barcelona, Spain
Mon 14 Mar 2016 16:45 - 17:10 at Mallorca+Menorca - GPUs and Scheduling. Chair(s): Christophe Dubach

Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous buckets or bins, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on GPUs, programmers often choose to implement multisplit with a sort. However, sort does more work than necessary to implement multisplit, and is thus inefficient. In this work, we provide a parallel model and multiple implementations for the multisplit problem. Our principal focus is multisplit for a small number of buckets. In our implementations, we exploit the computational hierarchy of the GPU to perform most of the work locally, with minimal usage of global operations. We also use warp-synchronous programming models to avoid branch divergence and reduce memory usage, as well as hierarchical reordering of input elements to achieve better coalescing of global memory accesses. On an NVIDIA K40c GPU, for key-only (key-value) multisplit, we demonstrate a 3.0-6.7x (4.4-8.0x) speedup over radix sort, and achieve a peak throughput of 10.0 G keys/s.
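The following is a minimal CPU sketch in Python of what multisplit computes, loosely organized the way the abstract describes: most of the work is local to a tile, and there is only one global operation (a scan over per-tile bucket counts). The function name `multisplit`, the `tile_size` parameter, and the tile-based layout are hypothetical choices for illustration. This is not the authors' GPU implementation, which relies on warp-synchronous programming and hierarchical reordering on the GPU.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def multisplit(keys: Sequence[T], bucket_of: Callable[[T], int],
               num_buckets: int, tile_size: int = 4) -> List[T]:
    """Stable multisplit (sketch): place all elements of bucket 0 first,
    then bucket 1, and so on, keeping the original relative order inside
    each bucket. bucket_of is the programmer-supplied categorizer."""
    n = len(keys)
    # Hypothetical tiling: on a GPU, each tile would map to a warp or block.
    tiles = [range(s, min(s + tile_size, n)) for s in range(0, n, tile_size)]

    # 1. Local step: every tile counts how many of its elements fall in each bucket.
    hist = [[0] * num_buckets for _ in tiles]
    for t, idx in enumerate(tiles):
        for i in idx:
            b = bucket_of(keys[i])
            if not 0 <= b < num_buckets:
                raise ValueError(f"bucket id {b} out of range")
            hist[t][b] += 1

    # 2. Global step: one exclusive scan over the counts in bucket-major order
    #    yields the first output slot for every (tile, bucket) pair.
    offset = [[0] * num_buckets for _ in tiles]
    running = 0
    for b in range(num_buckets):
        for t in range(len(tiles)):
            offset[t][b] = running
            running += hist[t][b]

    # 3. Local step: each tile scatters its elements to their final slots,
    #    walking its elements in order so the permutation stays stable.
    out: List[Optional[T]] = [None] * n
    for t, idx in enumerate(tiles):
        cursor = offset[t][:]
        for i in idx:
            b = bucket_of(keys[i])
            out[cursor[b]] = keys[i]
            cursor[b] += 1
    return out  # type: ignore[return-value]


if __name__ == "__main__":
    data = [7, 2, 9, 4, 1, 8, 3, 6]
    print(multisplit(data, lambda k: k % 3, num_buckets=3))
    # prints [9, 3, 6, 7, 4, 1, 2, 8]
```

Because Python's `sorted` is stable, `sorted(keys, key=bucket_of)` returns the same permutation. That is the sort-based approach the abstract describes as doing more work than necessary. A comparison sort costs O(n log n), whereas the sketch above performs a histogram, one scan over `tiles * num_buckets` counters, and a linear scatter.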

Mon 14 Mar


16:20 - 18:00
GPUs and Scheduling (Main conference) at Mallorca+Menorca
Chair(s): Christophe Dubach (University of Edinburgh)
16:20
25m
Talk
Gunrock: A High-Performance Graph Processing Library on the GPU (Distinguished Paper Award; Artifact Evaluation)
Main conference
Yangzihao Wang, Andrew Davidson (University of California, Davis), Yuechao Pan (University of California, Davis), Yuduo Wu (University of California, Davis), Andy Riffel (University of California, Davis), John D. Owens (University of California, Davis)
16:45
25m
Talk
GPU Multisplit
Main conference
Saman Ashkiani (University of California, Davis), Andrew Davidson (University of California, Davis), Ulrich Meyer (Goethe-Universität Frankfurt am Main), John D. Owens (University of California, Davis)
17:10
25m
Talk
Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Efficient Elastic Data Stream Processing (Artifact Evaluation)
Main conference
Tiziano De Matteis (University of Pisa), Gabriele Mencagli (University of Pisa)
17:35
25m
Talk
Work Stealing for Interactive Services to Meet Target Latency
Main conference
Jing Li (Washington University in St. Louis), Kunal Agrawal (Washington University in St. Louis), Sameh Elnikety (Microsoft Research), Yuxiong He (Microsoft Research), I-Ting Angelina Lee (Washington University in St. Louis), Chenyang Lu (Washington University in St. Louis), Kathryn S. McKinley (Microsoft Research)