The Synthetic to Real Image Classifier is an image classification model that sorts images into three categories: humans, horses, and both together. It uses transfer learning, a popular deep learning technique, building on the architecture and pre-trained weights of the InceptionV3 convolutional neural network. Although it is trained on synthetic images, it can also classify real ones.
Transfer Learning: A machine learning technique where a model pre-trained on a large dataset is fine-tuned on a new, smaller dataset, leveraging learned features to improve performance and reduce training time. It's especially useful in scenarios with limited labeled data.
InceptionV3: A deep convolutional neural network designed for image recognition tasks, utilizing Inception modules that apply multiple convolutional filters of different sizes within the same layer, allowing the network to capture a wide range of features from fine details to broader patterns.
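To make the idea of an Inception module concrete, here is a minimal Keras sketch of one block with parallel branches whose outputs are concatenated along the channel axis. It is a simplified illustration, not the actual InceptionV3 layers, which additionally use factorized convolutions and batch normalization. The function name `inception_block` and the filter counts are assumptions chosen for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers


def inception_block(x, f1=16, f3=32, f5=8, fpool=8):
    """Simplified Inception-style block (illustrative, not the real InceptionV3 layers).

    Four branches see the same input; "same" padding keeps the spatial size
    identical so the branch outputs can be concatenated channel-wise.
    """
    # 1x1 convolution: captures per-pixel, fine-grained features
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    # 3x3 convolution: captures medium-sized patterns
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    # 5x5 convolution: captures broader patterns
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    # Pooling branch followed by a 1x1 projection
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    # Stack all branch outputs along the channel axis
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])
```

With the default filter counts, the block turns an input of any channel depth into an output with 16 + 32 + 8 + 8 = 64 channels at the same height and width.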
This project was inspired by Laurence Moroney's binary classifier, which was trained on synthetic images of humans and horses. Building on that idea, I used a different dataset of computer-generated images and extended the classifier from binary to multi-class, so that it can also classify images containing both humans and horses.
Training fine-tunes the parameters of the Inception model on three distinct CGI image datasets, curated to represent humans, horses, and both appearing in the same image. CGI images were chosen over real ones because large volumes of real data are hard to acquire, particularly for specific subjects such as horses. Real images offer authenticity, but synthetic images offer scalability and flexibility, making it possible to build diverse training sets with fewer constraints.
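The setup described above could be sketched in Keras roughly as follows. This is not the project's actual training code: the function name `build_classifier`, the 150x150 input size, the dropout rate, and the learning rate are all assumptions made for illustration. The sketch freezes the pre-trained InceptionV3 base and trains only a new three-class head; unfreezing some of the top Inception layers with a lower learning rate would be a possible second stage.

```python
import tensorflow as tf


def build_classifier(num_classes=3, input_shape=(150, 150, 3), weights="imagenet"):
    """Three-class classifier on top of a pre-trained InceptionV3 base (sketch)."""
    # Pre-trained convolutional base without its original ImageNet classifier
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained features fixed at first

    inputs = tf.keras.Input(shape=input_shape)
    # InceptionV3 expects pixel values scaled from [0, 255] to [-1, 1]
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)  # assumed regularization strength
    # One output per class: humans, horses, both
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed value
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model


# Example usage (directory names are hypothetical, one subfolder per class):
# train_ds = tf.keras.utils.image_dataset_from_directory("data/train", image_size=(150, 150))
# val_ds = tf.keras.utils.image_dataset_from_directory("data/val", image_size=(150, 150))
# model = build_classifier()
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Freezing the base first means only the small classification head is trained, which keeps training fast and reduces the risk of overwriting the pre-trained features when the new dataset is small.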
The model reaches a training accuracy of 99% and a validation accuracy of 92%, which indicates that it learns the training data well and generalizes reasonably to unseen data. There is still room for improvement, particularly by integrating real images into the training pipeline.
Figures: Base Model with Transfer Learning | Model fine-tuned with Inception V3
Future iterations of the model could benefit from hybrid approaches that combine large amounts of both synthetic and real data, which may improve its adaptability to real-world scenarios and its classification accuracy.
- Synthetic images of humans, horses, and combinations of both were generated with https://runwayml.com/.
- Additional synthetic images specifically for humans were downloaded from: facesyntheticspubwedata.blob.core.windows.net/iccv-2021/dataset_100000.zip.
The finalized model and its weights are available for free download and unrestricted use.
This project is licensed under the Raza Mehar License. See the LICENSE.md file for details.
For any questions or clarifications, please contact Raza Mehar at [[email protected]].