Zhisheng Zhao

Contact information:


I am a 4th year PhD student in Operations Research in the H. Milton Stewart School of Industrial & Systems Engineering (ISyE) at Georgia Tech, advised by Dr. Debankur Mukherjee. Before joining this PhD program, I finished my master degree in Statistics at Georgia Tech and my bachelor degree in Mathematics and Applied Mathematics at Shanghai University.  Here is my complete CV.


Broadly, my research interests include Applied Probability, Queueing Theory, and Optimization. Specifically, I focus on designing efficient Load Balancing algorithms with strong theoretical guarantees for large-scale service systems. Here is the dissertation proposal slides.

Zhisheng Zhao, Debankur Mukherjee

ACM SIGMETRICS Performance Evaluation Review (MAMA workshop) 2023

We present an analysis of large-scale load balancing systems, where the processing time distribution of tasks depends on both the task and server types. Our study focuses on the asymptotic regime, where the number of servers and task types tend to infinity in proportion. In heterogeneous environments, commonly used load balancing policies such as Join Fastest Idle Queue and Join Fastest Shortest Queue exhibit poor performance and even shrink the stability region. Interestingly, prior to this work, finding a scalable policy with a provable performance guarantee in this setup remained an open question.

To address this gap, we propose and analyze two asymptotically delay-optimal dynamic load balancing policies. The first policy efficiently reserves the processing capacity of each server for ``good" tasks and routes tasks using the vanilla Join Idle Queue policy. The second policy, called the speed-priority policy, significantly increases the likelihood of assigning tasks to the respective ``good" servers capable of processing them at high speeds. By leveraging a framework inspired by the graphon literature and employing the mean-field method and stochastic coupling arguments, we demonstrate that both policies achieve asymptotic zero queuing. Specifically, as the system scales, the probability of a typical task being assigned to an idle server approaches 1.

Zhisheng Zhao, Debankur Mukherjee, Ruoyu Wu 

 Stochastic Systems 2024

We consider load balancing in large-scale heterogeneous server systems in the presence of data locality that imposes constraints on which tasks can be assigned to which servers. The constraints are naturally captured by a bipartite graph between the servers and the dispatchers handling assignments of various arrival flows. When a task arrives, the corresponding dispatcher assigns it to a server with the shortest queue among d≥2 randomly selected servers obeying the above constraints.  Server processing speeds are heterogeneous and they depend on the server-type. For a broad class of bipartite graphs, we characterize the limit of the appropriately scaled occupancy process, both on the process-level and in steady state, as the system size becomes large.  Using such a characterization, we show that data locality constraints can be used to significantly improve the performance of heterogeneous systems. This is in stark contrast to either heterogeneous servers in a full flexible system or data locality constraints in systems with homogeneous servers, both of which have been observed to degrade the system performance. Extensive numerical experiments corroborate the theoretical results.

Zhisheng Zhao, Sayan Banerjee, Debankur Mukherjee 

Mathematics of Operations Research, minor revision

The Join-the-Shortest Queue (JSQ) policy is a classical benchmark for the performance of many-server queueing systems due to its strong optimality properties. While the exact analysis of the JSQ policy is an open question to date, even under Markovian assumption on the service requirements. In this paper, we analyze the many-server limits of the JSQ policy in the super-Halfin-Whitt scaling window. We establish that the centered and scaled total queue length process converges to a certain Bessel process with negative drift and the associated centered and scaled steady-state total queue length, indexed by N, converges to a Gamma distribution. Both the transient and steady-state limit laws are universal in the sense that they do not depend on the value of the scaling parameter and exhibit fundamentally different qualitative behavior from both the Halfin-Whitt regime and the Non-degenerate Slowdown (NDS) regime. 

Industrial Experience

Fellowship and Awards