Work Stealing for Interactive Services to Meet Target Latency
Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas the optimization of parallel programs and distributed server systems has historically focused on average latency and throughput, the primary metric for interactive applications is consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests.
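Here is a minimal Python sketch that contrasts average latency with the target-latency metric. It assumes per-request latencies have already been measured. The function names and sample numbers are hypothetical and do not come from the paper. It shows that a workload can have a mean latency below the target while some requests still miss it.

```python
from statistics import mean
from typing import Sequence


def count_target_misses(latencies_ms: Sequence[float], target_ms: float) -> int:
    """Number of requests whose latency exceeds the target (meeting it exactly counts as on time)."""
    return sum(1 for latency in latencies_ms if latency > target_ms)


def miss_ratio(latencies_ms: Sequence[float], target_ms: float) -> float:
    """Fraction of requests that miss the target; 0.0 when there are no requests."""
    if not latencies_ms:
        return 0.0
    return count_target_misses(latencies_ms, target_ms) / len(latencies_ms)


if __name__ == "__main__":
    # Hypothetical per-request latencies in milliseconds, for illustration only.
    latencies = [12.0, 35.5, 80.2, 101.0, 47.3, 250.0]
    print(mean(latencies))                                  # about 87.7 ms: below the target
    print(count_target_misses(latencies, target_ms=100.0))  # 2 requests still miss
```

An optimizer that only lowers the mean could leave the two slow requests untouched. Counting misses against the target puts the focus on exactly those requests.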
We design a new adaptive work stealing policy, called tail-control, that reduces the number of requests that miss a target latency. It uses instantaneous request progress, system load, and a target latency to decide when to parallelize requests with stealing, when to admit new requests, and when to limit the parallelism of large requests. We implement this approach in the Intel Threading Building Blocks (TBB) library and evaluate it on real-world and synthetic workloads. The tail-control policy substantially reduces the number of requests exceeding the target latency and delivers up to 58% relative improvement over various baseline policies. This generalization of work stealing to multiple requests effectively optimizes the number of requests that complete within a target latency, a key metric for interactive services.
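The abstract names the inputs (request progress, system load, target latency) and the three decisions (steal, admit, limit parallelism), but it does not give the exact rule. The sketch below is one hypothetical way to wire these together. It is not the authors' tail-control algorithm and not TBB's API.

It makes two assumptions:
- A request whose work so far exceeds a load-dependent threshold is marked serial, and idle workers stop stealing from it.
- An idle worker steals from a non-serial request first, then admits a queued request, and otherwise stays idle.

`Request`, `large_request_threshold`, `choose_action`, and the threshold formula are all illustrative placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Request:
    """An in-flight request (hypothetical model, not TBB's data structures)."""
    request_id: int
    work_done_ms: float = 0.0  # instantaneous progress: processing time consumed so far
    serial: bool = False       # once set, idle workers stop stealing from this request


def large_request_threshold(target_ms: float, active_requests: int) -> float:
    """Hypothetical placeholder: the higher the load, the less work a request may
    consume before it is treated as large and its parallelism is limited."""
    return target_ms / max(1, active_requests)


def choose_action(active: List[Request], queue_length: int,
                  target_ms: float) -> Tuple[str, Optional[int]]:
    """Decide what an idle worker does next: ("steal", id), ("admit", None), or ("idle", None)."""
    threshold = large_request_threshold(target_ms, len(active))
    for req in active:
        if req.work_done_ms > threshold:
            req.serial = True  # sticky: a large request stays serial until it completes

    stealable = [r for r in active if not r.serial]
    if stealable:
        # Parallelize a request that is still small enough to benefit. Real work
        # stealers usually pick a random victim; this sketch picks deterministically.
        victim = max(stealable, key=lambda r: r.work_done_ms)
        return ("steal", victim.request_id)
    if queue_length > 0:
        return ("admit", None)  # start a new request from the admission queue
    return ("idle", None)


if __name__ == "__main__":
    # Hypothetical state: target latency 100 ms, two requests in flight.
    active = [Request(1, work_done_ms=5.0), Request(2, work_done_ms=60.0)]
    # The threshold is 100 / 2 = 50 ms, so request 2 becomes serial and request 1 is the victim.
    print(choose_action(active, queue_length=3, target_ms=100.0))  # ('steal', 1)
```

In this sketch, serialization is sticky. Once a request is judged large under the current load, it keeps running sequentially. Idle workers then shift toward smaller requests and newly admitted ones, which are more likely to still meet the target.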
Mon 14 Mar
16:20 - 18:00 | GPUs and Scheduling | Main conference at Mallorca+Menorca | Chair: Christophe Dubach (University of Edinburgh)
16:20 | 25m talk | Gunrock: A High-Performance Graph Processing Library on the GPU | Main conference | Yangzihao Wang, Andrew Davidson (University of California, Davis), Yuechao Pan (University of California, Davis), Yuduo Wu (University of California, Davis), Andy Riffel (University of California, Davis), John D. Owens (University of California, Davis)
16:45 | 25m talk | GPU Multisplit | Main conference | Saman Ashkiani (University of California, Davis), Andrew Davidson (University of California, Davis), Ulrich Meyer (Goethe-Universität Frankfurt am Main), John D. Owens (University of California, Davis)
17:10 | 25m talk | Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Efficient Elastic Data Stream Processing | Main conference
17:35 | 25m talk | Work Stealing for Interactive Services to Meet Target Latency | Main conference | Jing Li (Washington University in St. Louis), Kunal Agrawal (Washington University in St. Louis), Sameh Elnikety (Microsoft Research), Yuxiong He (Microsoft Research), I-Ting Angelina Lee (Washington University in St. Louis), Chenyang Lu (Washington University in St. Louis), Kathryn S McKinley (Microsoft Research)