This repository is the result of the Deep Learning Research Lab at the University of Koblenz-Landau, Germany. The project was carried out by a team of master's students from the Web Science and Mathematical Modelling programmes, with the goal of learning about Deep Learning methods through collaboration.
The goal of our project was to develop an image retrieval system using Deep Learning, Machine Learning, and related methods. The search engine must be able to retrieve the images that correspond to a searched keyword. We assume that:
- The user is searching for an object represented in the image.
- The search engine can learn new queries through user feedback on returned results or through a reverse image query.
- The user feedback is also taken into account for ranking the results.
- The search engine will not have any given metadata describing the content of the images.
- The search engine should support further extension of the image dataset.
- Images need to be ranked by the relevance that users have expressed through their feedback.
- We were given a set of X unlabeled images.
To generate a vector representation of each image, we use the Inception/GoogLeNet model. The output layer classifies the images into the 1000 ImageNet classes. The transfer layer is a 2048-dimensional vector describing each image.
To find images similar to the one a user selects, we compute a neighborhood query of n=50 neighbors for each image using the KNN algorithm. The neighborhood queries are precomputed.
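A minimal sketch of how this precomputation might look with scikit-learn (the implementation named below); the pickle file names are placeholders:

```python
# Precompute the n=50 nearest neighbors for every image in the collection.
import pickle
from sklearn.neighbors import NearestNeighbors

with open("transfer_vectors.pkl", "rb") as f:  # hypothetical feature file, shape (n_images, 2048)
    features = pickle.load(f)

knn = NearestNeighbors(n_neighbors=50, metric="manhattan")  # best metric in the evaluation below
knn.fit(features)
distances, neighbors = knn.kneighbors(features)

# Persist the neighborhoods so the web app only performs lookups at query time.
with open("neighbors.pkl", "wb") as f:
    pickle.dump(neighbors, f)
with open("distances.pkl", "wb") as f:
    pickle.dump(distances, f)
```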
For the reverse image query, we used ANN (approximate nearest neighbors) instead of KNN to find images similar to the uploaded one.
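A sketch of the reverse image query with annoy (the library named below); index parameters and file names are illustrative:

```python
# Build an annoy index over the 2048-dim transfer vectors and query it
# with an uploaded image's vector.
import pickle
from annoy import AnnoyIndex

with open("transfer_vectors.pkl", "rb") as f:  # hypothetical feature file
    features = pickle.load(f)

index = AnnoyIndex(2048, "euclidean")
for i, vector in enumerate(features):
    index.add_item(i, vector)
index.build(20)  # 20 trees, matching the ANN configuration in the evaluation below

# At query time, extract the uploaded image's transfer vector and look up its neighbors.
uploaded_vector = features[0]  # stand-in for a freshly extracted vector
similar_ids = index.get_nns_by_vector(uploaded_vector, 10)
```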
For finding synonyms of entered query terms, we used word2vec.
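The exact word2vec library is still a TODO below; as one possibility, a gensim-based lookup could look like this (the model file is a placeholder):

```python
# Expand a query term with its nearest word2vec neighbors (gensim shown as one option).
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("word2vec_model.bin", binary=True)
synonyms = [word for word, _ in model.most_similar("dog", topn=5)]
```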
The web application's architecture is split into a client and a server, which communicate over a REST API.
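A minimal sketch of what one such endpoint could look like on the server side (Django, per the stack below); the route and payload shape are assumptions, not the project's actual API:

```python
# views.py - hypothetical search endpoint returning ranked image results as JSON.
from django.http import JsonResponse

def search(request):
    query = request.GET.get("q", "")
    # In the real app: expand the query, look up precomputed neighbors,
    # and apply feedback-based ranking before responding.
    results = [{"image": "img_001.jpg", "score": 0.97}]
    return JsonResponse({"query": query, "results": results})
```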
For extracting layers of the neural network, we used TensorFlow, as Inception is included within the Model Zoo.
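A sketch of reading the transfer layer out of the pretrained Inception graph with TF1-style TensorFlow; the graph file and tensor names follow Google's classify_image release and may differ from the scripts in this repository:

```python
# Extract the 2048-dim transfer layer ("pool_3") for a single image.
import tensorflow as tf  # written against the TensorFlow 1.x graph-mode API

with tf.gfile.GFile("classify_image_graph_def.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

with tf.Session() as sess:
    pool_3 = sess.graph.get_tensor_by_name("pool_3:0")
    image_data = tf.gfile.GFile("example.jpg", "rb").read()
    vector = sess.run(pool_3, feed_dict={"DecodeJpeg/contents:0": image_data})
    vector = vector.reshape(2048)  # one 2048-dim descriptor per image
```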
For the KNN algorithm, we used the scikit-learn implementation, and for ANN we used annoy.
TODO: add Word2Vec library details.
For the frontend (client) we used the ReactJS framework, and for the backend (server) we used Django.
Hardware - development and testing were done on a machine with 16 GB of RAM and an 8-core CPU.
For presenting similar images, we compute a neighborhood query. This computation can be done on different layers of the neural network and with different distance measures. We compared performance between the transfer and the output layer across three distance measures: cosine, Euclidean, and Manhattan. The best performing combination is shown in bold.
NN Layer and Distance Measure | p@5 | p@10 | p@20 |
---|---|---|---|
**Transfer + Manhattan** | 60.7% | 55.8% | 46.1% |
Transfer + Euclidean | 59.9% | 57.0% | 45.6% |
Transfer + Cosine | 60.1% | 50.8% | 40.6% |
Output + Manhattan | 50.4% | 47.5% | 44.9% |
Output + Euclidean | 48.2% | 43.8% | 43.3% |
Output + Cosine | 49.4% | 48.3% | 45.6% |
**Note:** in this case, the *p* in p@n stands for a modified precision, where an evaluator rates how similar each image is on a scale from 0 to 4.
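One way to turn such graded judgments into a single value (the exact normalization is an assumption, since the text does not spell it out):

```python
# Graded p@n: average the 0-4 similarity ratings of the top-n results, scaled to [0, 1].
def graded_precision_at_n(ratings, n):
    return sum(ratings[:n]) / (4 * n)  # 4 = maximum rating

print(graded_precision_at_n([4, 4, 3, 2, 0, 1, 4, 3, 2, 2], 5))  # 0.65
```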
We also wanted to compare the performance of the two algorithms on large (2048-dimensional) vectors. Although KNN is much more precise, ANN is significantly faster.
Algorithm | p@5 | p@10 | p@20 |
---|---|---|---|
KNN | 61.4% | 52.3% | 42.2% |
ANN (20 trees) | 51.5% | 40.8% | 32.95% |
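A rough way to reproduce the speed difference on synthetic data (sizes and parameters are illustrative, not the project's actual setup):

```python
# Time exact KNN (scikit-learn) against ANN (annoy) on random 2048-dim vectors.
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors
from annoy import AnnoyIndex

dim, n_images = 2048, 10000
features = np.random.rand(n_images, dim).astype("float32")

knn = NearestNeighbors(n_neighbors=20, metric="euclidean").fit(features)
ann = AnnoyIndex(dim, "euclidean")
for i, vector in enumerate(features):
    ann.add_item(i, vector)
ann.build(20)  # 20 trees, as in the table above

start = time.perf_counter()
knn.kneighbors(features[:100])
print("KNN:", time.perf_counter() - start)

start = time.perf_counter()
for i in range(100):
    ann.get_nns_by_item(i, 20)
print("ANN:", time.perf_counter() - start)
```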
For queries corresponding to terms from the ImageNet classes, p@20 was 90% when the query was submitted for the first time (i.e., before any user feedback). For queries with unknown terms, counting from the first occurrence of a relevant image for the submitted query, the system needed on average 6 rounds of user feedback to reach a p@20 of 75%.
To run the React app:
- Navigate to the `frontend` folder and run `npm install`.
- After the node_modules are installed, run `npm start`.
- To build the production version, run `npm run build`.
How to run:
```
python get_inception_output.py [image_path] [output_path]
```
The script computes output-layer vectors for each image in `image_path` and saves them as a pickle file at `output_path`.
How to run:
```
python get_transfer_layer.py [image_path] [output_path]
```
The script computes transfer-layer vectors for each image in `image_path` and saves them as a pickle file at `output_path`.
How to run:
```
python compute_distances.py [features_path] [output_dir_path] [distance_metric] [n_neighbors] [n_cores]
```
The script uses the CNN output-layer vectors for each image in the file specified by `features_path` to find the top `n_neighbors` under the specified `distance_metric`. The neighbors and distances are stored in two separate files in `output_dir_path`. The optional `n_cores` parameter sets the number of CPU cores to use; -1 means all cores, and the default is 4.
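An example invocation (file names are placeholders):

```
python compute_distances.py features.pkl distances/ manhattan 50 -1
```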
- @sVujke - Stefan Vujovic
- @t0a0 - Igor Fedotov
- @utegulovalmat - Almat Utegulov
- @vamshi-7 - Bandla Vamshi
- @shohelahamad - Shohel Ahmad
- @aelkindy - Abdullah El Kindy