3.0.0 version. skip codacy #25

Merged
merged 41 commits into from
Jan 16, 2019
8097890
A quick cleanup to prepare random forest image matchers
KilianB Dec 6, 2018
98ce751
Cleanup number 2
KilianB Dec 6, 2018
401fdd0
Random forest wip prototype
KilianB Dec 6, 2018
968f5e4
fix javadocs and imports
KilianB Dec 25, 2018
1efe256
nightly save. some progress on the random forest image matchers
KilianB Dec 25, 2018
d956628
Start developing version 3.0.0 fix hash to String method to correclty
KilianB Dec 26, 2018
530b351
Distinguish in supervised and unsupervised matchers. Extract TestData
KilianB Dec 26, 2018
4202c32
Extract tree data in random forest image matcher
KilianB Dec 26, 2018
46a0d3b
fix test cases
KilianB Dec 26, 2018
d17d286
Add optimization criteria for potential use in random forest
KilianB Dec 26, 2018
65188a7
move from float threshold precision to double
KilianB Dec 26, 2018
81fd277
update variable naming. average color hash uses gray values instead of
KilianB Dec 27, 2018
3096753
update javadocs
KilianB Dec 27, 2018
2911c11
Add utility methods to allow each hashing algorithm to implement it's
KilianB Dec 27, 2018
36d9805
add categorical matcher which is a image matcher that clusters images
KilianB Dec 27, 2018
762ae37
3.0.0 reduce technical debt (#21)
KilianB Dec 27, 2018
7c62db3
Make h2 dependency optional. Resolves #16
KilianB Dec 30, 2018
dbca498
add unit tests for h2 image matcher
KilianB Dec 31, 2018
a9a4933
Remove duplicate code by inheriting from ahash.
KilianB Dec 31, 2018
0ddb30d
add missing scope keywords
KilianB Dec 31, 2018
4bfa269
use explicit scoping in unit tests
KilianB Dec 31, 2018
f5d6ab7
remove unused variable
KilianB Dec 31, 2018
93eddaa
more scoping fixes
KilianB Dec 31, 2018
c9079f3
final bad smell refactoring
KilianB Dec 31, 2018
da3db29
Rename H2DbImageMatcher to H2DatabaseImageMatcher
KilianB Jan 2, 2019
afc026d
Swap from big integer to stringbuilder creation
KilianB Jan 6, 2019
61a7c7a
Refactor package structure
KilianB Jan 9, 2019
d9ef1f8
Change hash creation to utilize hashbuilder instead of string builder
KilianB Jan 9, 2019
3b05d4a
Update difference hash creation to utilize hash builder. adjust to image
KilianB Jan 9, 2019
dffa61c
Adjust algorithm id to account for incompatibility
KilianB Jan 9, 2019
06ea9a8
Add fuzzy hash and refactor binary trees
KilianB Jan 9, 2019
6350af0
Add categorical matcher
KilianB Jan 9, 2019
02f2cd9
add categorical matcher part 2
KilianB Jan 9, 2019
4906f41
refactor package structure
KilianB Jan 9, 2019
0107cf0
cleanup packages and update dependencies
KilianB Jan 9, 2019
5f4981b
intermediate progress random forest matcher. Move from in set to
KilianB Jan 9, 2019
579ec86
Batch update. Version 3.0.0 Commit history in private repo
KilianB Jan 16, 2019
794fcb0
update examples and remove createDefaultMatcher
KilianB Jan 16, 2019
5a2dc5a
bump version
KilianB Jan 16, 2019
6726458
Merge branch 'master' into 3.0.0_Image_classifier_and_double_imagemat…
KilianB Jan 16, 2019
d3104fd
Merge branch 'master' into 3.0.0_Image_classifier_and_double_imagemat…
KilianB Jan 16, 2019
Merge branch "master" into 3.0.0_Image_classifier_and_double_imagemat…
…cher
KilianB authored Jan 16, 2019
commit d3104fda59a49cf2264f4d4a28a9d91216c02bd5
57 changes: 5 additions & 52 deletions README.md
@@ -79,60 +79,13 @@ and optimize individual algorithms on your own.
</tbody>
</table>

### Hello World: Check if two images are likely duplicates of each other

````java
public static void main(String[] args) throws IOException {

//Load images
BufferedImage img1 = ImageIO.read(new File("image1.jpg"));
BufferedImage img2 = ImageIO.read(new File("image2.jpg"));

SingleImageMatcher matcher = SingleImageMatcher.createDefaultMatcher();

if(matcher.checkSimilarity(img1, img2)){
//likely duplicate found
}
}
````

### Check batch of images

````java
public void matchMultipleImagesInMemory() {

InMemoryImageMatcher matcher = InMemoryImageMatcher.createDefaultMatcher();

//Add all images of interest to the matcher and precalculate hashes
matcher.addImages(ballon,copyright,highQuality,lowQuality,thumbnail);

//Find all images which are similar to highQuality
PriorityQueue<Result<BufferedImage>> similarImages = matcher.getMatchingImages(highQuality);

//Print out results
similarImages.forEach(result ->{
System.out.printf("Distance: %s Image: %s%n",result.distance,result.value);
});
}
````


Multiple types of image matchers are available for each situation
## Multiple types of image matchers are available for each situation

The `persistent` package allows hashes and matchers to be saved to disk. The images are not kept in memory and are only referenced by file path, which makes it possible to handle a large number of images
at the same time.
The `cached` version keeps the BufferedImage objects in memory, which allows hashing algorithms to be changed on the fly and the buffered images of matching results to be retrieved directly.
The `categorize` package contains image clustering matchers: KMeans and Categorical as well as weighted matchers.
The `ecotic` package


<table>
<tr> <th>Image Matcher Class</th> <th>Feature</th> </tr>
<tr> <td>SingleImageMatcher</td> <td>Compare if two images are similar with multiple chained hashing algorithms. An allowed distance is defined for each algorithm. To consider images a match every filter has to be passed independently.</td> </tr>
<tr> <td>InMemoryMatcher</td> <td>Keep precomputed hashes in memory and quickly tell apart batches of images. An allowed distance is defined for each algorithm. To consider images a match every filter has to be passed independently.</td></tr>
<tr> <td>CumulativeInMemoryMatcher</td> <td>Keep precomputed hashes in memory and quickly tell apart batches of images. An overall distance threshold is defined which is checked against the sum of the distances produced by all filters</td></tr>
<tr> <td>DatabaseImageMatcher</td> <td>Store computed hashes in a SQL database to tell apart batches of images while still keeping the hashes around even after a restart of the JVM. Conceptually this class behaves identically to the InMemoryMatcher. Performance penalties may be incurred because binary trees are not used.</td></tr>
</table>
The `exotic` package features the BloomFilter and the SingleImageMatcher, which is used to match two images without any fancy additions.
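
The matcher table above distinguishes two ways of deciding on a match: checking each algorithm against its own allowed distance, or checking the sum of all distances against one overall threshold. The following is a minimal sketch of that difference only. The class `ThresholdSketch`, its method names, and the sample distances are hypothetical and are not part of the library's API; how the library itself computes and combines distances may differ.

```java
public class ThresholdSketch {

    /**
     * Chained check: every algorithm's distance must stay within its own
     * threshold. A single failing filter rejects the pair.
     */
    public static boolean matchesIndependently(double[] distances, double[] thresholds) {
        if (distances.length != thresholds.length) {
            throw new IllegalArgumentException("one threshold per distance is required");
        }
        for (int i = 0; i < distances.length; i++) {
            if (distances[i] > thresholds[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Cumulative check: the sum of all distances is compared against a
     * single overall threshold.
     */
    public static boolean matchesCumulatively(double[] distances, double overallThreshold) {
        double sum = 0;
        for (double d : distances) {
            sum += d;
        }
        return sum <= overallThreshold;
    }

    public static void main(String[] args) {
        // Hypothetical normalized distances produced by two hashing algorithms
        double[] distances = {0.10, 0.45};
        double[] thresholds = {0.30, 0.40};

        // The second distance exceeds its own threshold, so the chained check fails
        System.out.println(matchesIndependently(distances, thresholds));
        // The summed distance stays below 0.60, so the cumulative check passes
        System.out.println(matchesCumulatively(distances, 0.60));
    }
}
```

With these sample values the same pair of images is rejected by the chained check but accepted by the cumulative one, which is why the choice of matcher affects which images are reported as duplicates.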

<table>
<tr> <th>Image</th> <th></th> <th>High</th> <th>Low</th> <th>Copyright</th> <th>Thumbnail</th> <th>Ballon</th> </tr>
@@ -198,15 +151,15 @@ Image matchers can be configured using different algorithm. Each comes with indi
<tr><td><a href="#hoghash">HogHash</a></td> <td>Angular Gradient based (detection of shapes?) </td> <td>A hashing algorithm based on hog feature detection which extracts gradients and pools them by angles. Usually used in support vector machine/NNs human outline detection. It's not entirely set how the feature vectors should be encoded. Currently average, but not great results, expensive to compute and requires a rather high bit resolution</td> </tr>
</table>


### Version 3.0.0 Image clustering

Image clustering with fuzzy hashes.
Image clustering with fuzzy hashes, which represent each hash bit as a probability instead of a simple 0 or 1

![1_fxpw79yoon8xo3slqsvmta](https://user-images.githubusercontent.com/9025925/51272388-439d9600-19ca-11e9-8220-fe3539ed6061.png)
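
One way to picture a hash made of probability bits is to average several binary hashes position by position. The sketch below assumes exactly that and nothing more: the class `FuzzyHashSketch`, its methods, and the distance measure (mean absolute difference) are hypothetical illustrations and are not taken from the library's fuzzy hash implementation.

```java
public class FuzzyHashSketch {

    /**
     * Builds a fuzzy hash from equally long binary hashes. Each entry is the
     * fraction of input hashes that have a 1 at that position.
     */
    public static double[] fuzzyHash(boolean[][] hashes) {
        int bits = hashes[0].length;
        double[] probabilities = new double[bits];
        for (boolean[] hash : hashes) {
            for (int i = 0; i < bits; i++) {
                if (hash[i]) {
                    probabilities[i] += 1.0;
                }
            }
        }
        for (int i = 0; i < bits; i++) {
            probabilities[i] /= hashes.length;
        }
        return probabilities;
    }

    /**
     * Mean absolute difference between a binary hash and a fuzzy hash.
     * 0 means the hash agrees with every certain bit, 1 means it opposes all of them.
     */
    public static double distance(boolean[] hash, double[] fuzzy) {
        double sum = 0;
        for (int i = 0; i < fuzzy.length; i++) {
            sum += Math.abs((hash[i] ? 1.0 : 0.0) - fuzzy[i]);
        }
        return sum / fuzzy.length;
    }

    public static void main(String[] args) {
        // Two hypothetical 4 bit hashes belonging to the same cluster
        boolean[][] cluster = {
            {true, false, true, true},
            {true, false, false, true}
        };
        double[] fuzzy = fuzzyHash(cluster);

        // Bits on which the cluster agrees are certain, the third bit is undecided
        System.out.println(java.util.Arrays.toString(fuzzy));
        System.out.println(distance(cluster[0], fuzzy));
    }
}
```

A bit on which all cluster members agree contributes fully to the distance when a candidate disagrees, while an undecided bit contributes only half, so uncertain positions weigh less when assigning an image to a cluster.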


### Algorithm benchmarking

<img src="https://user-images.githubusercontent.com/9025925/49185669-c14a0b80-f362-11e8-92fa-d51a20476937.jpg" />
See the wiki page on how to test different hashing algorithms with your set of images

<img src="https://user-images.githubusercontent.com/9025925/49185669-c14a0b80-f362-11e8-92fa-d51a20476937.jpg" />