EBSIS Summer School
on Distributed Event Based Systems
and Related Topics 2018

July 9–13, 2018 – Villars-sur-Ollon, Switzerland


★ Lecture Abstract

Queueing Models for Optimizing Performance of Deep Neural Network Serving

Evgenia Smirni (College of William and Mary)

Deep neural networks (DNNs) enable a host of artificial intelligence applications. These applications are supported by large DNN models running in serving mode, often on cloud computing infrastructure. Given the compute-intensive nature of large DNN models, a key challenge for DNN serving systems is to minimize user request response latencies. We show and model two important properties of DNN workloads that allow queueing network models to be used for predicting user request latencies: homogeneous request service demands, and performance interference among concurrently running requests due to cache/memory contention. These properties motivate the design of a dynamic scheduling framework that is powered by an interference-aware, queueing-based analytic model. The framework is evaluated in the context of an image classification service using several well-known benchmarks. The results demonstrate its accurate latency prediction and its ability to adapt to changing load conditions, thanks to the fast deployment and accuracy of analytic queueing models. Special attention will be given to the development and adjustment of existing models to meet the needs of deep neural network serving.

Speaker Bio

Evgenia Smirni is the Sidney P. Chockley Professor of Computer Science at the College of William and Mary, Williamsburg, VA. She holds a diploma degree in computer science and informatics from the University of Patras, Greece (1987) and a Ph.D. in computer science from Vanderbilt University (1995). Her research interests include queueing networks, stochastic modeling, Markov chains, resource allocation policies, Internet and multi-tiered systems, storage systems, data centers and cloud computing, workload characterization, and models for performance prediction of distributed systems and applications. She has served as a program co-chair of QEST'05, ACM Sigmetrics/Performance'06, HotMetrics'10, ICPE'17, and DSN'17, and will serve as a program co-chair for HPDC'19. She also served as general co-chair of QEST'10 and NSMC'10. She is an ACM Distinguished Scientist and a senior member of IEEE.