To compare two strings, call FuzzyCompare.similarity/2:
iex> FuzzyCompare.similarity("Oscar-Claude Monet", "monet, claude")
0.95
Imagine you had to match some names.
Try to match the following list of painters:
"Oscar-Claude Monet"
"Edouard Manet"
"Monet, Claude"
For a human it is easy to see that some of the names have just been flipped and that others are different but similar sounding.
A first approach could be to compare the strings with a string similarity function such as the Jaro distance.
iex> String.jaro_distance("Oscar-Claude Monet", "Monet, Claude")
0.6032763532763533
iex> String.jaro_distance("Oscar-Claude Monet", "Edouard Manet")
0.6749287749287749
This is hardly an improvement over exact equality: the different painter "Edouard Manet" scores higher than the merely reordered "Monet, Claude".
To improve the results, this library uses two different approaches, FuzzyCompare.ChunkSet and FuzzyCompare.SortedChunks.
The sorted chunks approach yields good results when words within a string have been shuffled around. The strategy splits each string into words, sorts the words, and compares the sorted strings.
iex> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Monet, Claude")
1.0
iex> FuzzyCompare.SortedChunks.substring_similarity("Oscar-Claude Monet", "Edouard Manet")
0.6944444444444443
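The sorting idea can be sketched in a few lines. This is only an illustration, not the library's implementation: the module SortedWordsSketch and its functions are hypothetical, and the choice of lowercasing, splitting on non-alphanumeric characters, and scoring with String.jaro_distance/2 is an assumption.

```elixir
defmodule SortedWordsSketch do
  # Hypothetical helper module, not part of FuzzyCompare.

  # Lowercase the string, split it into words on any run of characters
  # that are neither letters nor digits, sort the words, and join them
  # back together with single spaces.
  def normalize(string) do
    string
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> Enum.sort()
    |> Enum.join(" ")
  end

  # Compare the normalized forms with the standard library's Jaro distance.
  def similarity(left, right) do
    String.jaro_distance(normalize(left), normalize(right))
  end
end
```

With this sketch, "Claude Monet" and "Monet, Claude" both normalize to "claude monet", so word order and punctuation no longer affect the score. Unlike the library call above, it does nothing special when one string contains extra words.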
The chunk set approach works best when the strings contain additional substrings that are not relevant to what is being searched for.
iex> FuzzyCompare.ChunkSet.standard_similarity("Claude Monet", "Alice Hoschedé was the wife of Claude Monet")
1.0
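A minimal sketch of a set-based comparison follows. It is loosely modeled on the token set idea from the SeatGeek post mentioned at the end of this document. Whether FuzzyCompare.ChunkSet works exactly this way is an assumption, and the module TokenSetSketch is hypothetical.

```elixir
defmodule TokenSetSketch do
  # Hypothetical helper module, not part of FuzzyCompare.

  # Turn a string into a set of lowercase words.
  defp tokens(string) do
    string
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> MapSet.new()
  end

  # Render a set of words as one sorted, space-separated string.
  defp join(set), do: set |> Enum.sort() |> Enum.join(" ")

  def similarity(left, right) do
    left_tokens = tokens(left)
    right_tokens = tokens(right)

    # Words shared by both strings.
    common = join(MapSet.intersection(left_tokens, right_tokens))

    # Shared words followed by the words unique to each side.
    left_rest = join(MapSet.difference(left_tokens, right_tokens))
    right_rest = join(MapSet.difference(right_tokens, left_tokens))
    common_left = String.trim(common <> " " <> left_rest)
    common_right = String.trim(common <> " " <> right_rest)

    # Score every pairing and keep the best one, so that words present
    # in only one string cannot pull the result down.
    [{common, common_left}, {common, common_right}, {common_left, common_right}]
    |> Enum.map(fn {a, b} -> String.jaro_distance(a, b) end)
    |> Enum.max()
  end
end
```

When every word of the shorter string appears in the longer one, the shared words equal the shorter string, so the best pairing is an exact match and the sketch returns 1.0.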
Should one of the strings be much longer than the other, the library will attempt to compare matching substrings only.
This library is inspired by a SeatGeek blog post from 2011.