Efficient Distributed Computation of Large-Scale Graph Problems in Epidemiology and Contagion Dynamics

PI : Qin Zhang
NSF IIS-1633215 (September 2016 - August 2020)


A number of phenomena of societal importance, such as the spread of diseases and contagion processes, can be modeled by stochastic processes on networks. The analysis and control of such network phenomena involve, at their heart, fundamental graph-theoretic problems. The graphs encountered are typically of large-scale (having tens of millions of nodes); further, typical experimental analyses involve large designs with a number of parameters, leading to hundreds of thousands of graph computations. Novel methods for solving these problems are needed, since fast response times are critical to effective decision making.

The overarching goal of this project is to develop efficient distributed algorithms and associated lower bounds for graph-theoretic problems that arise in computational epidemiology and contagion dynamics. This will have a significant impact on these specific applications, through more efficient algorithmic tools for enabling complex analyses.

This is a collaborative project with Prof. Gopal Pandurangan (University of Houston) and Prof. Anil Vullikanti (Virginia Tech).


  1. Distributed Statistical Estimation of Matrix Products with Applications [conf. talk]
    with D. P. Woodruff
    Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 18), pages 383-394. Houston, TX, U.S.A., June 2018.

  2. Communication-Efficient Distributed Skyline Computation
    with H. Zhang
    Proc. ACM International Conference on Information and Knowledge Management (CIKM 17), pages 437-446. Singapore, November 2017.

  3. Distributed Partial Clustering (preliminary full version, 20 pages) [conf. talk]
    with S. Guha and Y. Li
    Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 17), pages 143-152. Washington D.C., U.S.A., July 2017.
    Best Paper Award
    Invited to special issue for SPAA 2017 papers in ACM Transactions on Parallel Computing (TOPC)

    • We give the first systematic study of communication/round-efficient distributed partial clustering, by providing almost tight bounds for partial k-center, k-median, and k-means in both deterministic and stochastic settings.
    • As a byproduct, we develop the first algorithms for the partial k-median and k-means objectives that run in subquadratic time.

  4. Communication-Optimal Distributed Clustering
    with J. Chen, H. Sun and D. Woodruff
    Proc. Annual Conference on Neural Information Processing Systems (NIPS 16), pages 3720-3728. Barcelona, Spain, December 2016.

Invited/Conference Talks

Educational and Other Development

A Trilogy of Courses: Code libraries for streaming/distributed algorithms: