Explore possible extensions to the serialized commit-graph format in Git, including adding reachability indexes for the graph of revisions
Git uses various clever methods for making operations on very large
repositories faster, from bitmap indices for git fetch
[1], to generation
numbers (also known as topological levels) in the commit-graph file for
commit graph traversal operations like git log --graph
[2].
The goal of this project is to examine various possible improvements that can make Git even faster, other than just using generation numbers. For example there are many methods to make reachability queries in very large graphs faster; it remains to be seen if they would work on large commit graphs (the graph of project history) as well as they work on other real-life graphs.
Ultimately, this project is about examining extension to Git's commit-graph feature, including mainly adding reachability indexes / labels for the DAG (Directed Acyclic Graph) of revisions.
This project, while mainly exploratory in nature, is using nbdev
library for literate programming in Python using Jupyter Notebooks -- not to create a Python module (to publish in PyPi and/or Conda), but to allow for splitting it up.
The original notebook at Google Colaboratory: "Reachability labels for version control graphs.ipynb" got quite unwieldy; it takes too much time to run it. By splitting it into smaller notebooks, and turning the code into helper modules, the hope is that it should be possible to quickly go to the interesting parts of exploration.
This project focuses on the Git operations that involve examining and walking the commit graph, i.e. the project history. Such operations include:
git merge-base --is-ancestor
git branch --contains
git tag --contains
git branch --merged
git tag --merged
git merge-base --all
git log --topo-sort
Only the first command performs straight reachability query.
Warning: this is not generated automatically
- Graphs in Python
- Related works: various reachability labellings
- Example directed graphs
- Drawing graphs
- Reachability index
- Topological levels
- DFS intervals labelling
- Reachability queries
- Extracting commit graphs from Git repositories
- Checkpointing
- Graph datasets
- Large Git repositories
- Graph stats
- Reachability evaluation
This topic is covered in much more details in slides for the presentation "Graph operations in Git version control system" by Jakub Narebski (2019).
Those slides are also available to read on SlideShare and on Speaker Deck: