ot_markov_distances
===================

|github workflow badge| |codecov| |doc badge| |pytorch badge|

Differentiable distances on graphs based on optimal transport.

This is the implementation code for

::

   Brugere, T., Wan, Z., & Wang, Y. (2023). Distances for Markov Chains, and Their Differentiation. ArXiv, abs/2302.08621.

Setup
-----
Installing as a library
~~~~~~~~~~~~~~~~~~~~~~~

The ``ot_markov_distances`` package can be installed with the following command:

.. code:: console

   pip install ot-markov-distances

If for some reason you need to use ``cuda11.8`` (i.e. you are installing ``torch+cuda118``),
use the following command instead:

.. code:: console

   pip install git+https://github.com/YusuLab/ot_markov_distances@cuda118

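
To confirm that the installation succeeded, one option is to query the metadata of the installed distribution. The sketch below uses only the standard library. The helper name ``installed_version`` is hypothetical (it is not part of this package), and the distribution name is assumed to match the one used in the ``pip`` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "ot-markov-distances" is the name used in the pip command above.
    found = installed_version("ot-markov-distances")
    if found is None:
        print("ot-markov-distances is not installed in this environment")
    else:
        print(f"ot-markov-distances {found} is installed")
```

Run it with the same interpreter (or virtual environment) that you used for ``pip install``, otherwise the package may be reported as missing.
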
Dependencies
~~~~~~~~~~~~
Python version
^^^^^^^^^^^^^^

This project requires Python ``3.10`` or later.
If your Python version is older than ``3.10``,
you need to upgrade it (or create a new ``conda`` environment with a newer version).
The latest release at the time of writing is ``3.12``.

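
A quick way to check whether the current interpreter satisfies this requirement is sketched below. The function name ``meets_minimum`` is illustrative only, and the ``(3, 10)`` threshold is the one stated in this README.

```python
import sys

MINIMUM = (3, 10)  # minimum version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if meets_minimum():
        print(f"Python {current} is recent enough")
    else:
        print(f"Python {current} is too old, 3.10 or later is required")
```
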
Python dependencies
^^^^^^^^^^^^^^^^^^^

.. note::

   The main branch uses the default (``cuda12``) version of torch
   in its dependencies. If for some
   reason you need to use ``cuda11.8``, clone the ``cuda118`` branch
   instead.

This package manages its dependencies via
`poetry `__. I recommend installing it;
if you prefer to manage the dependencies manually, they are
listed in the file ``pyproject.toml``.

Once you have ``poetry``, you can install the dependencies using our makefile

.. note::

   If you want to create a virtual environment for this project
   (as opposed to using the one you are currently in),
   you can use the command ``poetry env use python3.12``
   (or another Python version).

.. code:: console

   $ make .make/deps

or directly with poetry

.. code:: console

   $ poetry install

TUDataset
^^^^^^^^^
*If you are planning to reproduce the classification experiment.*

The ``TUDataset`` package is also needed to run the classification experiment,
but it is not available via ``pip`` / ``poetry``.
To install it, follow the instructions in `the tudataset repo`_,
including the "Compilation of kernel baselines" section, and add the directory where you downloaded it to your ``$PYTHONPATH``,
e.g.:

.. code:: console

   $ export PYTHONPATH="/path/to/tudataset:$PYTHONPATH"

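
To verify that the export took effect, you can check whether the directory appears in Python's import search path. This is a minimal sketch: the helper ``is_on_python_path`` is hypothetical, and ``/path/to/tudataset`` is the same placeholder as in the command above, to be replaced with your actual download location.

```python
import os
import sys


def is_on_python_path(directory: str, search_path=None) -> bool:
    """Return True if `directory` is one of the import search path entries."""
    entries = sys.path if search_path is None else search_path
    target = os.path.realpath(directory)
    # Empty entries stand for the current directory and are skipped here.
    return any(os.path.realpath(entry) == target for entry in entries if entry)


if __name__ == "__main__":
    # Placeholder path, same as in the export command above.
    print(is_on_python_path("/path/to/tudataset"))
```

Entries from ``$PYTHONPATH`` are added to ``sys.path`` when the interpreter starts, so the check has to run in a shell where the variable is already exported.
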
Project structure
~~~~~~~~~~~~~~~~~

.. code:: console

   .
   ├── docs    #contains the generated docs (after typing make)
   │   ├── build
   │   │   └── html    #Contains the html docs in readthedocs format
   │   └── source
   ├── experiments    #contains jupyter notebooks with the experiments
   │   └── utils    #contains helper code for the experiments
   ├── ot_markov_distances    #contains reusable library code for computing and differentiating the discounted WL distance
   │   ├── discounted_wl.py    # implementation of our discounted WL distance
   │   ├── __init__.py
   │   ├── sinkhorn.py    # implementation of the sinkhorn distance
   │   ├── utils.py    # utility functions
   │   └── wl.py    #implementation of the wl distance by Chen et al.
   ├── staticdocs    #contains the static source for the docs
   │   ├── build
   │   └── source
   └── tests    #contains sanity checks

Documentation
-------------
The documentation is available online: `read the documentation `_

.. warning::

   Do not edit the documentation directly in the ``docs/`` folder:
   that folder is wiped every time the documentation is built. The
   static parts of the documentation can be edited in ``staticdocs/``.

You can build the documentation and run the tests using

.. code:: console

   $ make

Alternatively, you can build only the documentation using

.. code:: console

   $ make .make/build-docs

The documentation will be available in ``docs/build/html`` in the
readthedocs format.

Running Experiments
-------------------

Running experiments requires installing the development dependencies. This can be done by running

.. code:: console

   $ make .make/dev-deps

or alternatively

.. code:: console

   $ poetry install --with dev

`Experiments `__ can be found in the ``experiments/``
directory (see `Project structure <#project-structure>`__).

The Barycenter and Coarsening experiments can be found in
``experiments/Barycenter.ipynb`` and ``experiments/Coarsening.ipynb``.
The performance graphs are computed in ``experiments/Performance.ipynb``.

Classification experiment
~~~~~~~~~~~~~~~~~~~~~~~~~

The classification experiment (see the first paragraph of section 6 in the paper) is not in a Jupyter notebook; it is run from the command line.

As an additional dependency it needs ``tudataset``, which is not installable via ``pip``. To install it, follow the instructions in `the tudataset repo`_, including the "Compilation of kernel baselines" section, and add the directory where you downloaded it to your ``$PYTHONPATH``.

Once this is done, you can run the classification experiment using the command

.. code:: console

   $ poetry run python -m experiments.classification
   usage: python -m experiments.classification [-h] {datasets_info,distances,eval} ...

   Run classification experiments on graph datasets

   positional arguments:
     {datasets_info,distances,eval}
       datasets_info       Print information about given datasets
       distances           Compute distance matrices for given datasets
       eval                Evaluate a kernel based on distance matrix

   options:
     -h, --help            show this help message and exit

The yaml file containing dataset information that should be passed to the command line is in ``experiments/grakel_datasets.yaml``.
Modifying this file should allow running the experiment on different datasets.
FAQ
---
I have a question about the paper
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In that case, please email me at the address given in the paper.

I have noticed a bug in the code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please use the GitHub "Issues" feature to open a ticket, and include a description of the bug, the error message and a
`minimal reproducible example `_ . I'll try to fix it.
If you have already fixed it, you can submit a Pull Request directly.

I cannot install the library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you followed all the instructions and the installation still fails, please create a ticket using GitHub Issues.

Why do you need ``python3.10`` ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because I am using `structural pattern matching `_, and some typing features such as `this one `_ .
.. _`the tudataset repo`: https://github.com/chrsmrrs/tudataset
.. |github workflow badge| image:: https://github.com/YusuLab/ot_markov_distances/actions/workflows/testing-publish.yml/badge.svg
.. |codecov| image:: https://codecov.io/gh/YusuLab/ot_markov_distances/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/YusuLab/ot_markov_distances
.. |pytorch badge| image:: https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge&logo=PyTorch&logoColor=white
.. |doc badge| image:: https://img.shields.io/badge/documentation-green?style=for-the-badge&logo=readme&logoColor=black
   :target: https://tristan.bruge.re/documentation/ot_markov_distances