Work Stealing for Interactive Services to Meet Target Latency (PPoPP 2016 - Main conference)

Who

Jing Li, Kunal Agrawal, Sameh Elnikety, Yuxiong He, I-Ting Angelina Lee, Chenyang Lu, Kathryn S McKinley

Track

PPoPP 2016 Main conference

When

Mon 14 Mar 2016 17:35 - 18:00 at Mallorca+Menorca - GPUs and Scheduling. Chair(s): Christophe Dubach

Abstract

Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas the optimization of parallel programs and distributed server systems has historically focused on average latency and throughput, the primary metric for interactive applications is instead consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests.
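To make the difference between the two metrics concrete, here is a minimal sketch that compares average latency with the count of requests that miss a target. The latency traces, the 120 ms target, and the helper names `count_misses` and `mean` are made up for illustration and do not come from the paper.

```python
def count_misses(latencies_ms, target_ms):
    """Return how many requests took longer than the target latency."""
    return sum(1 for latency in latencies_ms if latency > target_ms)


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Two made-up latency traces in milliseconds (illustrative only).
# Both have the same mean, but only one keeps every request under the target.
steady = [90, 95, 100, 105, 110]
bursty = [40, 40, 40, 40, 340]
target_ms = 120  # hypothetical target latency

for name, trace in (("steady", steady), ("bursty", bursty)):
    print(name, "mean:", mean(trace), "misses:", count_misses(trace, target_ms))
```

Both traces average 100 ms, yet the second one misses the target once, which is why a scheduler tuned only for average latency can still deliver poor responsiveness.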

We design a new adaptive work-stealing policy, called tail-control, that reduces the number of requests that miss a target latency. It uses instantaneous request progress, system load, and a target latency to choose when to parallelize requests with stealing, when to admit new requests, and when to limit parallelism of large requests. We implement this approach in the Intel Threading Building Blocks (TBB) library and evaluate it on real-world and synthetic workloads. The tail-control policy substantially reduces the number of requests exceeding the desired target latency and delivers up to 58% relative improvement over various baseline policies. This generalization of work stealing for multiple requests effectively optimizes the number of requests that complete within a target latency, a key metric for interactive services.
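The three decisions named in the abstract (steal, admit, limit parallelism) can be pictured with a toy decision function for an idle worker. This is only a sketch under stated assumptions and is not the paper's algorithm or TBB code: the `Request` fields, the per-load threshold table, the rule that a request counts as "large" once its elapsed time exceeds the threshold for the current load, and the fallback order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    elapsed_ms: float         # time since the request started running
    has_stealable_work: bool  # whether it currently exposes parallel work


def large_threshold_ms(active_requests, thresholds_ms):
    """Pick the threshold for the highest load level <= active_requests.

    thresholds_ms maps a load level (number of active requests) to an
    elapsed-time cutoff; the table itself is a hypothetical input.
    """
    levels = sorted(thresholds_ms)
    chosen = levels[0]
    for level in levels:
        if level <= active_requests:
            chosen = level
    return thresholds_ms[chosen]


def idle_worker_action(running, queued, thresholds_ms):
    """Decide what an idle worker does next (toy policy, assumptions above)."""
    limit = large_threshold_ms(len(running), thresholds_ms)
    # 1. Prefer adding parallelism to requests that are not yet "large".
    for req in running:
        if req.has_stealable_work and req.elapsed_ms <= limit:
            return ("steal", req.name)
    # 2. Otherwise admit a waiting request instead of helping a large one.
    if queued > 0:
        return ("admit", None)
    # 3. Nothing else to do: help a large request rather than sit idle
    #    (a choice made for this sketch, not taken from the paper).
    for req in running:
        if req.has_stealable_work:
            return ("steal", req.name)
    return ("idle", None)
```

For example, with `thresholds_ms = {1: 200.0, 4: 80.0}` and two running requests, a request that has run for 150 ms is still eligible for stealing, whereas with four running requests the tighter 80 ms cutoff applies and the worker would admit a queued request instead.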

Link to Publication

http://dl.acm.org/citation.cfm?id=2851151&CFID=752402131&CFTOKEN=70128886

DOI

https://doi.org/10.1145/2851141.2851151

Jing Li

Washington University in St. Louis

Kunal Agrawal

Washington University in St. Louis

Sameh Elnikety

Microsoft Research

Yuxiong He

Microsoft Research

I-Ting Angelina Lee

Washington University in St. Louis

Chenyang Lu

Washington University in St. Louis

Kathryn S McKinley

Microsoft Research

Session Program

Mon 14 Mar
Displayed time zone: (GMT) Belfast

16:20 - 18:00	GPUs and Scheduling, Main conference, at Mallorca+Menorca. Chair(s): Christophe Dubach, University of Edinburgh

16:20 25m Talk		Gunrock: A High-Performance Graph Processing Library on the GPU (Main conference). Yangzihao Wang; Andrew Davidson, University of California, Davis; Yuechao Pan, University of California, Davis; Yuduo Wu, University of California, Davis; Andy Riffel, University of California, Davis; John D. Owens, University of California, Davis
16:45 25m Talk		GPU Multisplit (Main conference). Saman Ashkiani, University of California, Davis; Andrew Davidson, University of California, Davis; Ulrich Meyer, Goethe-Universitat Frankfurt am Main; John D. Owens, University of California, Davis
17:10 25m Talk		Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Efficient Elastic Data Stream Processing (Main conference). Tiziano De Matteis, University of Pisa; Gabriele Mencagli, University of Pisa
17:35 25m Talk		Work Stealing for Interactive Services to Meet Target Latency (Main conference). Jing Li, Washington University in St. Louis; Kunal Agrawal, Washington University in St. Louis; Sameh Elnikety, Microsoft Research; Yuxiong He, Microsoft Research; I-Ting Angelina Lee, Washington University in St. Louis; Chenyang Lu, Washington University in St. Louis; Kathryn S McKinley, Microsoft Research