Tristan Brugère
Zhengchao Wan
Yusu Wang
The 35th International Conference on Algorithmic Learning Theory (2024)
(labeled) Graphs are everywhere
Discrete or continuous features
We want to define a loss on graphs:
\[\text{Loss}(G) = \text{\large\textbf{?}}\]
\[\text{Loss}(G) = d_{\text{graphs}}(G, G_{\text{target}})\]
\[d_{\text{graphs}} = \text{\large\textbf{?}}\]
And we want this loss to be differentiable (for gradient descent).
No known polynomial-time algorithm for graph isomorphism
Graphs are combinatorial objects (how do we differentiate on the space of graphs?)
Fused Gromov-Wasserstein distance (FGW):
See graphs as metric spaces
Compute a (soft) matching of nodes minimizing a combination of the node-feature cost and the structure-disagreement cost (see the sketch below)
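A minimal sketch of an FGW computation with the POT library's `fused_gromov_wasserstein2`; the toy graphs, features, and the trade-off value `alpha=0.5` are illustrative assumptions, not the paper's experiments.

```python
# FGW between two small labeled graphs using POT (https://pythonot.github.io)
import numpy as np
import ot  # POT: Python Optimal Transport

# Toy graphs: structure matrices (here: adjacency) and 1-d node features
C1 = np.array([[0., 1., 1.], [1., 0., 0.], [1., 0., 0.]])
C2 = np.array([[0., 1.], [1., 0.]])
f1 = np.array([[0.], [1.], [1.]])
f2 = np.array([[0.], [1.]])

# Feature cost between every pair of nodes (squared Euclidean by default)
M = ot.dist(f1, f2)

# Uniform node weights
p = np.ones(3) / 3
q = np.ones(2) / 2

# FGW distance: alpha trades off structure (GW term) vs. features (W term)
fgw = ot.gromov.fused_gromov_wasserstein2(M, C1, C2, p, q, alpha=0.5)
print(fgw)
```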
Markov chain
Optimal Transport (Wasserstein distance, Earth mover’s distance, …)
\[\text{(labeled) Graph } \simeq \text{ (labeled) Markov Chain}\]
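A minimal sketch of this correspondence: the simple random walk on a graph defines a Markov chain whose transition matrix is the row-normalized adjacency matrix, with node labels carried over as state labels. The helper name below is ours, for illustration.

```python
import numpy as np

def graph_to_markov_chain(adjacency: np.ndarray) -> np.ndarray:
    """Transition matrix of the simple random walk (assumes no isolated nodes)."""
    degrees = adjacency.sum(axis=1, keepdims=True)
    return adjacency / degrees

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
labels = np.array([[0.], [1.], [1.]])   # node labels become state labels

T = graph_to_markov_chain(A)            # each row sums to 1
print(T)
```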
\[
\begin{array}{lll}
 & \text{Regular} & \text{Markov} \\
\text{Objects} & \text{pointset laws } (p_X, p_Y) & \text{Markov chains } (\mathcal{X}, \mathcal{Y}) \\
\text{Cost} & C(X, Y) \text{ between points} & C(X_i, Y_i) \text{ between states}
\end{array}
\]
Take \(p\) a distribution over \(\N\) and \(T \sim p\) independent of \(X\) and \(Y\)
\[d_{OT}(p_X, p_Y; C) = \inf_{\substack{(X, Y) \text{ coupling} \\ X\sim p_X \\ Y\sim p_Y}} \E\, C(X,Y)\]
\[d_{???}(\mathcal{X}, \mathcal{Y}; C) = \inf_{\substack{(X_t, Y_t) {\color{#D462AD}\text{ Markovian coupling } }\\ X_t\sim \mathcal{X} \\ Y_t \sim \mathcal{Y}}} \E\, C(X_{\color{#D462AD} t},Y_{\color{#D462AD} t})\]
\[d_{\color{#6E963B}OTM}(\mathcal{X}, \mathcal{Y}; C) = \inf_{\substack{(X_t, Y_t) {\color{#D462AD}\text{ Markovian coupling } }\\ X_t\sim \mathcal{X} \\ Y_t \sim \mathcal{Y}}} \E\, C(X_{\color{#6E963B} T},Y_{\color{#6E963B} T})\]
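A hedged sketch of how a discounted distance of this type can be evaluated on finite chains, assuming (as for discounted bisimulation-style metrics) that it is the fixed point of \(d = (1-\delta)\,C + \delta\,\mathrm{OT}(d)\), where \(\mathrm{OT}(d)\) couples the transition distributions of the two states. This is an illustration of the fixed-point idea, not the package's reference implementation.

```python
import numpy as np
import ot

def discounted_wl_distance(mx, my, C, delta=0.5, n_iter=100):
    """mx, my: transition matrices; C: state-to-state label cost matrix."""
    d = np.zeros_like(C)
    for _ in range(n_iter):                 # fixed-point (contraction) iteration
        d_new = np.empty_like(C)
        for i in range(mx.shape[0]):
            for j in range(my.shape[0]):
                # OT between the transition distributions of states i and j,
                # with the current distance estimate as ground cost
                d_new[i, j] = (1 - delta) * C[i, j] + delta * ot.emd2(mx[i], my[j], d)
        d = d_new
    return d
```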
Study of optimal transport methods for Markov chains
Theoretical framework: Optimal transport Markov (OTM) distances.
Particular case of interest: \(\delta\)-discounted WL distances.
Differentiable
Computationally tractable
Naturally adapted to directed graphs
Can work with non-stationary Markov chains
Similar to the Weisfeiler-Lehman isomorphism test
Or to an MPNN
“Propagate” labels from neighbor to neighbor (random walk)
Determine “how different” the label distributions are
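A hedged sketch of this propagation intuition: push node labels along the random walk (as an MPNN aggregation step would) and compare the resulting label clouds with optimal transport. The functions and the comparison step are illustrative, not the paper's exact construction.

```python
import numpy as np
import ot

def propagate(transition: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """k steps of random-walk label propagation: labels <- T @ labels."""
    for _ in range(k):
        labels = transition @ labels
    return labels

def compare(T1, X1, T2, X2, k=3):
    """OT distance between the propagated label clouds of two graphs."""
    Z1, Z2 = propagate(T1, X1, k), propagate(T2, X2, k)
    M = ot.dist(Z1, Z2)                     # pairwise label costs
    a = np.ones(len(Z1)) / len(Z1)          # uniform node weights
    b = np.ones(len(Z2)) / len(Z2)
    return ot.emd2(a, b, M)
```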
Compute by gradient descent (see the sketch below): \[\text{Barycenter}(G_1, G_2, \dots, G_n) = \operatorname*{arg\,min}_{G} \sum_i d(G, G_i)^2\]
Computationally expensive
Ongoing work on generative models
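A hedged sketch of the barycenter computation by gradient descent, assuming some differentiable graph distance `graph_distance(adj1, labels1, adj2, labels2)` on soft adjacency and label tensors (a hypothetical placeholder, not the package's actual API), and optimizing a soft barycenter graph directly.

```python
import torch

def barycenter(graphs, n_nodes, graph_distance, n_steps=500, lr=1e-2):
    """graphs: list of (adjacency, labels) torch tensors."""
    # Parametrize by logits so the learned adjacency stays in [0, 1]
    adj_logits = torch.randn(n_nodes, n_nodes, requires_grad=True)
    labels = torch.randn(n_nodes, graphs[0][1].shape[1], requires_grad=True)
    opt = torch.optim.Adam([adj_logits, labels], lr=lr)
    for _ in range(n_steps):
        adj = torch.sigmoid(adj_logits)
        # Sum of squared distances to the input graphs (the barycenter objective)
        loss = sum(graph_distance(adj, labels, a_i, x_i) ** 2
                   for a_i, x_i in graphs)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(adj_logits).detach(), labels.detach()
```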
$ pip install ot_markov_distances
This work is partially supported by NSF under grants CCF-2112665, CCF-2217058, and CCF-2310411.