Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous buckets or bins, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on GPUs, programmers often choose to implement multisplit with a sort. However, sort does more work than necessary to implement multisplit, and is thus inefficient. In this work, we provide a parallel model and multiple implementations for the multisplit problem. Our principal focus is multisplit for a small number of buckets. In our implementations, we exploit the computational hierarchy of the GPU to perform most of the work locally, with minimal usage of global operations. We also use warp-synchronous programming models to avoid branch divergence and reduce memory usage, as well as hierarchical reordering of input elements to achieve better coalescing of global memory accesses. On an NVIDIA K40c GPU, for key-only (key-value) multisplit, we demonstrate a 3.0-6.7x (4.4-8.0x) speedup over radix sort, and achieve a peak throughput of 10.0 G keys/s.
Mon 14 MarDisplayed time zone: Belfast change
16:20 - 18:00 | GPUs and SchedulingMain conference at Mallorca+Menorca Chair(s): Christophe Dubach University of Edinburgh | ||
16:20 25mTalk | Gunrock: A High-Performance Graph Processing Library on the GPU Main conference Yangzihao Wang , Andrew Davidson University of California, Davis, Yuechao Pan University of California, Davis, Yuduo Wu University of California, Davis, Andy Riffel University of California, Davis, John D. Owens University of California, Davis Link to publication DOI | ||
16:45 25mTalk | GPU Multisplit Main conference Saman Ashkiani University of California, Davis, Andrew Davidson University of California, Davis, Ulrich Meyer Goethe-Universitat Frankfurt am Main, John D. Owens University of California, Davis Link to publication DOI | ||
17:10 25mTalk | Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Efficient Elastic Data Stream Processing Main conference Link to publication DOI | ||
17:35 25mTalk | Work Stealing for Interactive Services to Meet Target Latency Main conference Jing Li Washington University in St. Louis, Kunal Agrawal Washington University in St. Louis, Sameh Elnikety Microsoft Research, Yuxiong He Microsoft Research, I-Ting Angelina Lee Washington University in St. Louis, Chenyang Lu Washington University in St. Louis, Kathryn S McKinley Microsoft Research Link to publication DOI |