BIGDATA:Collaborative Research:F:Efficient Distributed Computation of Large-Scale Graph Problems in Epidemiology and Contagion Dynamics


PI : Qin Zhang
NSF IIS-1633215 (September 2016 - August 2020)


Abstract

A number of phenomena of societal importance, such as the spread of diseases and contagion processes, can be modeled by stochastic processes on networks. The analysis and control of such network phenomena involve, at their heart, fundamental graph-theoretic problems. The graphs encountered are typically of large-scale (having tens of millions of nodes); further, typical experimental analyses involve large designs with a number of parameters, leading to hundreds of thousands of graph computations. Novel methods for solving these problems are needed, since fast response times are critical to effective decision making.

The overarching goal of this project is to develop efficient distributed algorithms and associated lower bounds for graph-theoretic problems that arise in computational epidemiology and contagion dynamics. This will have a significant impact on these specific applications, through more efficient algorithmic tools for enabling complex analyses.

This is a collaborative project with Prof. Gopal Pandurangan (University of Houston) and Prof. Anil Vullikanti (Virginia Tech).


Papers

  1. Learning to Cluster via Same-Cluster Queries
    with Y. Li and Y. Song
    Proc. ACM International Conference on Information and Knowledge Management (CIKM 21). Virtual conference, November 2021.

  2. Communication Complexity of Approximate Matching in Distributed Graphs
    with Z. Huang, B. Radunovic and M. Vojnovic
    Distributed Computing 33(6): 515-531, 2020.

  3. A Multi-criteria Approximation Algorithm for Influence Maximization with Probabilistic Guarantees.
    with M. Khan, G. Pandurangan, N. D. Pham, and A. Vullikanti
    Proc. SIAM Symposium on Algorithm Engineering and Experiments (ALENEX 20), pages 81-93. Salt Lake City, UT, U.S.A., January 2020.

  4. Distributed and Streaming Linear Programming in Low Dimensions
    with S. Assadi and N. Karpov
    Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 19), pages 236-253. Amsterdam, The Netherlands, June-July, 2019.
    Invited to the special issue for PODS 2019 in ACM Transactions on Database Systems (TODS).

  5. A Practical Algorithm for Distributed Clustering and Outlier Detection [poster]
    with J. Chen and E. Sadeqi-Azer
    Proc. Annual Conference on Neural Information Processing Systems (NeurIPS 18), pages 2253-2262. Montreal, Canada, December 2018.

  6. Distributed Statistical Estimation of Matrix Products with Applications [conf. talk]
    with D. P. Woodruff
    Proc. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 18), pages 383-394. Houston, TX, U.S.A., June 2018.

  7. Communication-Efficient Distributed Skyline Computation [conf. talk]
    with H. Zhang
    Proc. ACM International Conference on Information and Knowledge Management (CIKM 17), pages 437-446. Singapore, November 2017.

  8. Distributed Partial Clustering (preliminary full version, 20 pages) [conf. talk]
    with S. Guha and Y. Li
    Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 17), pages 143-152. Washington D.C., U.S.A., July 2017.
    Best Paper Award
    Invited to special issue for SPAA 2017 papers in ACM Transactions on Parallel Computing (TOPC)

    • We give the first systematic study of communication/round-efficient distributed partial clustering, by providing almost tight bounds for partial k-center, k-median, and k-means in both deterministic and stochastic settings.
    • As a byproduct, we develop the first algorithms for the partial k-median and k-means objectives that run in subquadratic time.

  9. Communication-Optimal Distributed Clustering [talk] [poster]
    with J. Chen, H. Sun and D. Woodruff
    Proc. Annual Conference on Neural Information Processing Systems (NIPS 16), pages 3720-3728. Barcelona, Spain, December 2016.

Educational and Other Development

A Trilogy of Courses: Code libraries for streaming/distributed algorithms:

Personnel


Acknowledgement

This project is supported by the National Science Foundation (NSF) under the project: BIGDATA: Collaborative Research: F: Efficient Distributed Computation of Large-Scale Graph Problems in Epidemiology and Contagion Dynamics. Any opinions, findings, and conclusions or recommendations expressed in this project are those of author(s) and do not necessarily reflect the views of the National Science Foundation.