Representations of real-world phenomena as graphs are ubiquitous, ranging from social and information networks, to technological, biological, chemical, and brain networks. Many graph mining tasks — including clustering, anomaly detection, nearest neighbor, similarity search, pattern recognition, and transfer learning — require a distance measure between graphs to be computed efficiently. The existing distance measures between graphs leave a lot to be desired. They are overwhelmingly based on heuristics. Many do not scale to graphs with millions of nodes; others do not satisfy the metric properties of non-negativity, positive definiteness, symmetry, and triangle inequality. This project studies a formal mathematical foundation covering a family of graph distances that overcome these limitations, focusing on real-world applications in biology and social network analysis. It also provides a universal methodology for parallelizing the computation of graph distance metrics within this family over massive graphs with millions of nodes, and scaling it over cloud computing resources.
Funding: National Science Foundation, Google Cloud Services (IIS-1741197).