Welcome to ot_markov_distances’s documentation!
ot_markov_distances
Differentiable distances on graphs based on optimal transport
This is the implementation code for
Brugere, T., Wan, Z., & Wang, Y. (2023). Distances for Markov Chains, and Their Differentiation. ArXiv, abs/2302.08621.
Setup
Installing as a library
The ot_markov_distances
package can be installed with the following command:
pip install ot-markov-distances
If for some reason you need to use cuda11.8
(ie you are installing torch+cuda118
)
then use the following command instead
pip install git+https://github.com/YusuLab/ot_markov_distances@cuda118
Dependencies
Python version
This project requires python 3.10
a minima.
If your python version is prior to 3.10
,
you need to update (or to create a new conda
environment)
to a version above (latest release at the time of writing is 3.12
)
Python dependencies
Note
The main branch uses the default (cuda12
) version of torch
in its dependencies. If for some
reason you need to use cuda11.8
, clone the cuda118
branch
instead
This package manages its dependencies via
poetry. I recommend you install it
(otherwise if you prefer to manage them manually, a list of the
dependencies is available in the file pyproject.toml
)
When you have poetry
, you can add dependencies using our makefile
Note
If you want to create a virtual environment for this project
(as opposed to using the one you are currently in)
you can use the command poetry env use python3.12
(or other python version)
$ make .make/deps
or directly with poetry
$ poetry install
TUDataset
If you are planning to reproduce the classification experiment.
The TUDataset
package is also needed to run the classification experiment,
but it is not available via pip
/ poetry
.
To install it, follow the instruction in the tudataset repo,
including the “Compilation of kernel baselines” section, and add the directory where you downloaded it to your $PYTHONPATH
.
eg:
$ export PYTHONPATH="/path/to/tudataset:$PYTHONPATH"
Project structure
.
├── docs #contains the generated docs (after typing make)
│ ├── build
│ │ └── html #Contains the html docs in readthedocs format
│ └── source
├── experiments #contains jupyter notebooks with the experiments
│ └── utils #contains helper code for the experiments
├── ot_markov_distances #contains reusable library code for computing and differentiating the discounted WL distance
│ ├── discounted_wl.py # implementation of our discounted WL distance
│ ├── __init__.py
│ ├── sinkhorn.py # implementation of the sinkhorn distance
│ ├── utils.py # utility functions
│ └── wl.py #implementation of the wl distance by Chen et al.
├── staticdocs #contains the static source for the docs
│ ├── build
│ └── source
└── tests #contains sanity checks
Documentation
The documentation is available online: read the documentation
Warning
Do not edit the documentation directly in the docs/
folder,
that folder is wiped every time the documentation is built. The
static parts of the documentation can be edited in staticdocs/
.
You can build documentation and run tests using
$ make
Alternatively, you can build only the documentation using
$ make .make/build-docs
The documentation will be available in docs/build/html
in the
readthedocs format
Running Experiments
Running experiments requires installing development dependencies. This can be done by running
$ make .make/dev-deps
or alternatively
$ poetry install --with dev
Experiments can be found in the experiments/
directory (see Project structure ).
The Barycenter and Coarsening experiments can be found in
experiments/Barycenter.ipynb
and experiments/Coarsening.ipynb
.
The performance graphs are computed in experiments/Performance.ipynb
Classification experiment
The Classification experiment (see the first paragraph of section 6 in the paper) is not in a jupyter notebook, but accessible via a command line.
As an additional dependency it needs tudataset
, which is not installable via pip
. To install it follow the instructions in the tudataset repo.
, including the “Compilation of kernel baselines” section, and add the directory where you downloaded it to your $PYTHONPATH
.
Now you can run the classification experiment using the command
$ poetry run python -m experiments.classification
usage: python -m experiments.classification [-h] {datasets_info,distances,eval} ...
Run classification experiments on graph datasets
positional arguments:
{datasets_info,distances,eval}
datasets_info Print information about given datasets
distances Compute distance matrices for given datasets
eval Evaluate a kernel based on distance matrix
options:
-h, --help show this help message and exit
The yaml file containing dataset information that should be passed to the command line is in experiments/grakel_datasets.yaml
.
Modifying this file should allow running the experiment on different datasets.
FAQ
I have a question about the paper
In this case just send me an email through the email address mentioned in the paper.
I have noticed a bug in the code
Please use the Github “Issues” feature to open a ticket, and post a description of the bug, the error message and a minimal reproducible example . I’ll try to fix it.
Or if you have fixed it, you can submit a Pull Request directly
I cannot install the library
If you followed all the instructions correctly, please create a ticket using Github Issues.
Why do you need python3.10
?
Because I am using structural pattern matching, and some typing features such as this one .
Indices and tables
Bibliography
Tristan Brugère, Zhengchao Wan, and Yusu Wang. Distances for markov chains, and their differentiation. 2023. arXiv:2302.08621.
Samantha Chen, Sunhyuk Lim, Facundo Mémoli, Zhengchao Wan, and Yusu Wang. Weisfeiler-Lehman meets Gromov-Wasserstein. In International Conference on Machine Learning (ICML), 3371–3416. PMLR, 2022.
Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, and Gabriel Peyré. Interpolating between optimal transport and mmd using sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, 2681–2690. PMLR, 2019.