Enable new diff option linematch #14537

jwhite510 · 2021-05-12T01:27:47Z

This fork was created to improve the diff mode of neovim to show more useful information when comparing lines between files in diff view. Line comparisons are made in a more useful way to show which lines are actually being added, changed, and deleted.
2 files before:

2 files after:

3 files before:

3 files after:

Fugitive merge conflict before:

Fugitive merge conflict after:

How to use:
Enable this enhanced diff mode with ":set diffopt+=linematch:{n}", where {n} is the maximum total number of lines in the diff hunk. The linematch diff option is disabled automatically when diffing more than three files at once. A reasonable setting is ":set diffopt+=linematch:50"; this aligns the most similar lines for a diff hunk between two buffers that is 25 lines long in each, or a diff hunk between 3 files of 20 lines, 20 lines, and 10 lines. The limit is there to prevent lag when a very large diff hunk is present. If the specified number of lines is exceeded, the default diff behaviour is resumed.
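To make the limit concrete, here is a small sketch of the rule described above. It is only an illustration in Python, not code from this PR: the function name linematch_applies is hypothetical, and treating the limit as inclusive is my assumption based on the 25 + 25 = 50 example.

```python
def linematch_applies(hunk_lengths, n):
    """Return True if the line-matching stage would run for one diff hunk.

    hunk_lengths: number of lines the hunk spans in each diffed buffer.
    n: the value given in linematch:{n}.
    Hypothetical helper; the inclusive bound is an assumption.
    """
    if len(hunk_lengths) > 3:
        # More than three files: linematch is disabled.
        return False
    # The limit applies to the total number of lines across all buffers.
    return sum(hunk_lengths) <= n


if __name__ == "__main__":
    print(linematch_applies([25, 25], 50))      # two buffers, 25 lines each
    print(linematch_applies([20, 20, 10], 50))  # three buffers
    print(linematch_applies([30, 25], 50))      # too large, default diff is used
```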

Why is this not a plugin?
This could perhaps be converted to a plugin, but doing so would take much more work, because the original diff mode would first need to be completely hidden. All the locations with diffs would need to be overwritten with the text from the linematch diff output. This would include writing text over locations which are marked as filler lines, which I don't believe is possible. Changed lines would need to be displayed on different "fake lines", because part of the functionality here moves lines around to align them between the diff buffers. Additionally, the default diff mode in Vim is very bad compared to other editors like Emacs and VS Code, so Vim should have a comparably high-quality diff view by default.

How it works:

Before:

After

This describes the 3d case (for 3 buffers) of the algorithm used when the diffopt 'linematch' is enabled. The algorithm constructs a 3d tensor to compare a diff between 3 buffers. The dimensions of the tensor are the length of the diff in each buffer plus 1. A path is constructed by moving from one corner of the cube/3d tensor to the opposite corner. Motions from one cell of the cube to the next represent decisions. In a 3d cube there are a total of 7 decisions that can be made (one for each non-empty subset of the 3 axes, 2^3 - 1 = 7), represented by the enum path3_choice, which is defined in buffer_defs.h. A comparison of buffer 0 and 1 represents a motion toward the opposite corner of the cube with components along the 0 and 1 axes. A comparison of buffer 0, 1, and 2 represents a motion toward the opposite corner of the cube with components along the 0, 1, and 2 axes. A skip of buffer 0 represents a motion along only the 0 axis.

For each action, a point value is awarded, and the path is saved so that it can be recovered later if it turns out to be the optimal path. The optimal path has the highest score. The score is the sum of the matching characters over all of the lines which were compared.

The structure of the algorithm is that of a dynamic programming problem. A point i,j,k in the cube can be calculated as a function of the cells at i-1, j-1, and k-1. To find the score and path at point i,j,k, we must determine which path to use; this is done by looking at the possibilities and choosing the one which results in the highest local score. The overall highest-scoring path is then the one stored in the cell in the corner opposite the start location. The entire algorithm consists of populating each cell of the 3d cube with the optimal path leading to it. However, the general 3d case cannot be applied before the edges and the surfaces of the cube have been populated. Therefore, there are several sets of if / else statements inside the main loops which determine which case to evaluate.
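To illustrate the idea, here is a minimal Python sketch of the simpler 2d case (two buffers), where there are 3 decisions instead of 7: skip a line of buffer 0, skip a line of buffer 1, or compare the two lines. This is not the C code from diff.c. The names matching_chars and linematch_2d are made up for the example, and scoring a comparison by the longest common subsequence of characters is my assumption about what "matching characters" means.

```python
SKIP_A, SKIP_B, COMPARE = "skip_a", "skip_b", "compare"


def matching_chars(a, b):
    """Assumed scoring: length of the longest common subsequence of characters."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def linematch_2d(a_lines, b_lines):
    """Return (best score, list of decisions) for aligning two hunks."""
    n, m = len(a_lines), len(b_lines)
    # The matrix has one more cell than lines along each axis.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    choice = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue  # start corner
            candidates = []
            if i > 0:  # move along axis 0 only
                candidates.append((score[i - 1][j], SKIP_A))
            if j > 0:  # move along axis 1 only
                candidates.append((score[i][j - 1], SKIP_B))
            if i > 0 and j > 0:  # diagonal move: compare the two lines
                gain = matching_chars(a_lines[i - 1], b_lines[j - 1])
                candidates.append((score[i - 1][j - 1] + gain, COMPARE))
            # Edges only offer one candidate; interior cells offer all three.
            score[i][j], choice[i][j] = max(candidates, key=lambda c: c[0])
    # Walk back from the opposite corner to recover the decisions.
    path, i, j = [], n, m
    while i or j:
        step = choice[i][j]
        path.append(step)
        if step != SKIP_B:
            i -= 1
        if step != SKIP_A:
            j -= 1
    path.reverse()
    return score[n][m], path


if __name__ == "__main__":
    print(linematch_2d(["alpha", "beta"], ["beta"]))
```

In this example the best path skips "alpha" and compares "beta" with "beta". The 3d case follows the same pattern with 7 candidate moves per interior cell.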

Optimizations
Because the cell of the tensor at point i,j,k is a function only of the cells at i-1, j-1, k-1, the whole tensor doesn't need to be stored in memory at once. In the case of the 3d cube, only two slices (along the k and j axes) are stored in memory. For the 2d matrix (for 2 files), only two rows are stored at a time. The next/previous slice (or row) is always calculated from the other, and they alternate at each iteration.
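A rough Python sketch of the two-row idea for the 2d case is below. It is an illustration rather than the actual implementation: rolling_linematch is a hypothetical name, the scoring function is passed in by the caller, and each cell carries its own path as a tuple because the earlier rows are thrown away.

```python
def rolling_linematch(a_lines, b_lines, score):
    """Best (score, path) for two hunks while keeping only two rows in memory.

    score(x, y) is a caller-supplied line similarity; it is a placeholder
    for whatever character-matching score is actually used.
    """
    m = len(b_lines)
    # Row for i == 0: the only way to get anywhere is to skip lines of b.
    prev = [(0, ())]
    for j in range(1, m + 1):
        prev.append((0, prev[j - 1][1] + ("skip_b",)))
    for a in a_lines:
        # First cell of the new row: only a skip of a is possible.
        cur = [(prev[0][0], prev[0][1] + ("skip_a",))]
        for j in range(1, m + 1):
            options = [
                (prev[j][0], prev[j][1] + ("skip_a",)),
                (cur[j - 1][0], cur[j - 1][1] + ("skip_b",)),
                (prev[j - 1][0] + score(a, b_lines[j - 1]),
                 prev[j - 1][1] + ("compare",)),
            ]
            cur.append(max(options, key=lambda o: o[0]))
        # The finished row becomes the "previous" row of the next iteration.
        prev = cur
    return prev[m]


if __name__ == "__main__":
    exact = lambda x, y: len(x) if x == y else 0  # toy scoring for the demo
    print(rolling_linematch(["alpha", "beta"], ["beta"], exact))
```

Memory for the scores is then proportional to one row instead of the whole matrix. The stored paths still grow with the hunk size, which is one more reason to cap it with linematch:{n}.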

In the 3d case, 3 arrays are populated to cache the scores (matched characters) between the 3 buffers, so the same score is never calculated twice.
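As a sketch of that caching, assuming the three arrays hold the pairwise line scores for buffers (0,1), (0,2) and (1,2), the tables could be filled once up front like this. The name build_pair_score_tables and the dictionary layout are only for illustration and are not taken from the C code.

```python
from itertools import combinations


def build_pair_score_tables(hunks, score):
    """Precompute score(x, y) for every pair of lines from two different buffers.

    hunks: list of hunks, one list of lines per buffer.
    score: caller-supplied line similarity (placeholder for matched characters).
    Returns {(p, q): table} where table[i][j] scores hunks[p][i] vs hunks[q][j].
    """
    tables = {}
    for p, q in combinations(range(len(hunks)), 2):
        tables[(p, q)] = [[score(x, y) for y in hunks[q]] for x in hunks[p]]
    return tables


if __name__ == "__main__":
    calls = []

    def counting_score(x, y):
        calls.append((x, y))
        return int(x == y)  # toy scoring for the demo

    hunks = [["a", "b"], ["a"], ["b", "c", "a"]]
    tables = build_pair_score_tables(hunks, counting_score)
    # Each pair of lines is scored exactly once; the main loops only do lookups.
    print(sorted(tables), len(calls))
```

With three buffers this gives three tables, and the loops over i,j,k can look up tables[(0, 1)][i][j] and so on instead of comparing the same two lines again for every value of the third index.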

src/nvim/diff.c

fredizzimo · 2021-07-09T13:46:29Z

This is great!

But I think it needs two additional changes in order to be fully useful for solving merge conflicts with conflict markers.

I am assuming that you run a three window, three way diff, as opposed to the default four window configuration, which doesn't really work at all with this pull request since it only adds support for three windows. But four windows are not really needed, since the base is included in the conflict markers. Additionally, you can always open a separate view showing the two sides and the base if you want to.

The first problem is related to the alignment. With this pull request each side gets its own block, so that there's no direct way to see the actual differences between each side. You have to manually scan the blocks and spot the differences. One workaround would be to always start the conflict resolving by selecting one version, probably the base version. Then you can compare both sides to that in order to see what exactly is changed on each side. After that you can resolve it for real. But that's an extra step and you also lose the ability to use copy paste within the same buffer to shape the conflict resolution.

But I think this could be improved with some additional options, to be able to select which part of a conflict marker is used for the line alignment. So if you, for example, select the base version, then both the local and remote blocks will have empty lines on each side, and the line matching only matches against the base version. If no conflict markers are found in the current hunk, then it would work as before, so as soon as you have actually started resolving the conflict it compares both sides against that.

I think this change needs to be part of Neovim, because just like this pull request I don't see an easy way to implement it by a plugin. Of course it can and probably should be a different pull request though.

The second problem is related to the highlight colors. As seen in the three files example for lines 15, 21, 15, it's not immediately clear that example1 and example3 are the same. This could be fixed by using three different colors, using the same colors for the parts that are the same on both sides, and unique colors for the parts that are unique. Also, unlike the default highlighting, which just marks the start and end of a difference, this should probably support multiple color blocks on a single line.

The highlighting part does not necessarily need to be a part of neovim core, it could be implemented as a plugin similar to https://github.com/rickhowe/diffchar.vim, or with a bundled lua script.

Are these something that could be included in Neovim? I am quite busy myself, but if no one else wants to tackle these I could probably find time to implement them myself.

sakhnik · 2021-08-18T10:28:58Z

It looks like there are plans to implement the addition of filler lines: #15331 was declared a duplicate of #9496. This would allow delivering the line matching algorithm as a separate plugin.

sakhnik · 2021-12-01T10:25:49Z

Virtual lines support is released in 0.6. Proper diff support with line matching could now be implemented in a separate plugin using the following changes:

feat(api): add lua C bindings for xdiff #14536 — xdiff API in Lua
392c658 — virtlines
8d7816c — multiple virtlines

jwhite510 · 2021-12-01T14:44:19Z

Since making this pull request, I have been doing all my coding in my own fork of neovim with these changes and some additional ones related to diff mode. Whenever I've run into compatibility problems with other plugins, I've rebased my changes onto the latest neovim and fixed some merge conflicts.

I'm really confident the changes in this PR are stable, as I've been using it for almost a year with no issues.
I think that the code might be able to be simplified a lot by using recursion instead of two separate long functions for the case with 3 buffers and 2 buffers in diff mode. But it is stable.

Not included in this PR are some changes to the diff mode that I use daily but that are not stable enough for a PR, as there are some bugs that I am used to and work around. But if there were interest, I would add the fixes to make them stable.

bad diff dividers:
I added an option to supply characters that should not divide diffs. A really common problem that creates a bad diff is when you have a really long diff and it's divided by a single line that contains one character like a '{' or a ']'. If that line were not splitting the diff in two, it would be a lot more readable.

char diff
I made a character-wise diff to display the exact changes on the lines. Without this change, you will never see two highlighted sections on a line showing the exact changes, only one highlighted block. So if there's a change at the beginning of a line and another at the end, normally it would just highlight from the start of the first change to the end of the last one, instead of showing two individual changed sections on the line.
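To show the difference between the two behaviours, here is a small Python sketch. It is not my implementation from the fork: it uses Python's difflib.SequenceMatcher as a stand-in for the character-wise diff, and the names single_span and changed_regions are made up for the example.

```python
from difflib import SequenceMatcher


def single_span(old, new):
    """One highlighted block in old: first differing char to last differing char."""
    start = 0
    while start < min(len(old), len(new)) and old[start] == new[start]:
        start += 1
    end_old, end_new = len(old), len(new)
    while end_old > start and end_new > start and old[end_old - 1] == new[end_new - 1]:
        end_old -= 1
        end_new -= 1
    return (start, end_old)


def changed_regions(old, new):
    """Every changed region in old, as (start, end) column ranges.

    Pure insertions show up as empty ranges (start == end).
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]


if __name__ == "__main__":
    old = "old_name = compute(x)"
    new = "new_name = compute(y)"
    print(single_span(old, new))      # (0, 20): one block covering almost the whole line
    print(changed_regions(old, new))  # [(0, 3), (19, 20)]: two separate blocks
```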

I'd definitely be willing to do some work to get this merged somehow so I don't have to keep rebasing it on my own fork.

Co-authored-by: Lewis Russell <[email protected]>

lewis6991 · 2022-11-01T17:12:59Z

Assuming CI stays green, I aim to merge this by the end of the week.

lewis6991 · 2022-11-04T09:07:56Z

Many thanks to @jwhite510 for putting this together and being (very) patient with us getting this in.

Follow on to neovim#14537

Requires neovim/neovim#14537


sakhnik reviewed May 12, 2021
