A DeepLab V3 Model with ResNet 50 Encoder to perform Binary Segmentation Tasks.
Report Bug
.
Request Feature
- Table Of Contents
- About The Project
- Working
- Results
- Built With
- Getting Started
- Usage
- Folder Structure
- Roadmap
- Contributing
- License
- Authors
- Acknowledgements
- To Cite this Repository
The goal of this research is to develop a DeepLabV3 model with a ResNet50 backbone to perform binary segmentation on plant image datasets. Based on the presence or absence of a certain object or characteristic, binary segmentation entails splitting an image into discrete subgroups known as image segments which helps to simplify processing or analysis of the image by reducing the complexity of the image. Labeling pixels is a step in the segmentation process. Each pixel or piece of a picture assigned to the same category has a unique label.
Plant pictures with ground truth binary mask labels make up the training and validation dataset. The project uses PyTorch, a well-known deep learning library, for model development, training, and evaluation.1 During the training process, the model is optimized using strategies like the Dice Loss, Adam optimizer, Reducing LR on Pleateau and Early Stopping. All the while, important metrics like Intersection over Union (IoU), Pixel Accuracy, and Dice Coefficient are kept track of.
Datasets used during development of this project are described below:
-
The Eschikon Wheat Segmentation (EWS) Dataset consists of 190 images that were cropped to 350 by 350 pixel patches and manually labeled as binary masks for soil and plants, respectively. Pixels that the annotator was certain belonged to vegetative active material from a wheat plant should be marked as such. Everything else, including dirt, rocks, and dead plants, is categorized as vegetative inactive material. Following that, the masks were exported as 8-bit, lossless PNG images.
Between 2017 and 2020, a Canon 5D Mark II with a 35mm lens and autofocus was used to shoot these pictures. The approximate distance to the ground was 3 m. In 2017 and 2018, ISO, aperture, and shutter speed were set using the aperture priority setting; in 2019 and 2020, these settings were set using the shutter speed priority setting. The photographic collection for each year covers the whole growing season, from emergence to harvest. The photos were taken outdoors, in a setting with a wide range of sunlight and soil moisture conditions.
-
Plant Semantic Segmentation Dataset by HIL
Humans in the Loop (HIL) Plant Semantic Segmentation Dataset was made available as an Open-Access Dataset by The Computer Vision and Biosystems Signal Processing Group at the Department of Electrical and Computer Engineering at Aarhus University.
144 images of plant seedlings from 3 containers were collected over the course of two months at various intervals and are included in the dataset. Each container holds up to 40 single plants, and to make them easier to see, each plant has been given a bounding box. The photos are 4096 by 3000 pixels in size and manually annotated.
The annotations are made as such:
- ‘Background’ class as black.
- ‘Plant’ class as green.
-
The Computer Vision Problems in Plant Phenotyping (CVPPP) Leaf Counting Challenge (LCC) 2017 Dataset provides 27 images of tobacco and 783 Arabidopsis images in separate folders. Using a camera with a single plant in its range of view, tobacco photos were gathered. Images of the Arabidopsis plant were taken with a camera that had a wider field of view and were later cropped. The photographs were shot over a period of days from mutants or wild types, and they came from two different experimental settings, where the field of vision was different.
Additionally, certain plants are slightly out of focus than others due to the wider range of view. Though, the backgrounds of most photographs are straightforward and static, occasionally, moss growth or the presence of water in the growing tray complicates the scene. For the purpose of obtaining ground truth masks for every leaf/plant in the picture, each image was manually labeled.
The ultimate objective of the project is to develop a strong model that can accurately segment plant-related regions inside photographs, which can have applications in a variety of fields, such as agriculture, botany, and environmental sciences. The included code demonstrates how to prepare the data, create the model's architecture, train it on the dataset, and assess the model's effectiveness using a variety of metrics.
The objective of binary segmentation, often referred to as semantic binary segmentation, is to categorize each pixel in an image into one of two groups: the foreground (object of interest), or the background. A powerful Encoder-Decoder based architecture for solving binary segmentation challenges, DeepLabV3 with ResNet50 or ResNet101 as the backbone offers great accuracy and spatial precision.
Architecture of this Repository's Model - DeepLabV3 |
Known for its precise pixel-by-pixel image segmentation skills, DeepLabV3 is a powerful semantic segmentation model. It combines a robust feature extractor, such as ResNet50 or ResNet101, with an effective decoder. This architecture does a great job of capturing both local and global context information, which makes it suitable for tasks where accurate object boundaries and fine details are important. A crucial part is the Atrous Spatial Pyramid Pooling (ASPP) module, which uses several dilated convolutions to collect data on multiple scales. The decoder further improves the output by fusing high-level semantic features with precise spatial data. Highly precise segmentations across a variety of applications are made possible by this fusion of context and location awareness.
Residual Networks, often known as ResNets, are a class of deep neural network architectures created to address the vanishing gradient problem that can arise in very deep networks. They were first presented in the 2015 publication Deep Residual Learning for Image Recognition by Kaiming He et al. ResNets have been extensively used for a number of tasks, including image classification, object recognition, and segmentation.
The main novelty in ResNets is the introduction of residual blocks, which allow for the training of extremely deep networks by providing shortcut connections (skip connections) that omit one or more layers. Through the use of these connections, gradients can pass directly through the network without disappearing or blowing up, enabling the training of far more complex structures.
ResNets are available in a range of depths, designated as ResNet-XX, where XX is the number of layers. The ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152 are popular variations. The performance of the deeper variations is better, but they also use up more processing resources.
-
Encoder: The ResNet backbone's early layers are typically where the encoder is implemented. With growing receptive fields, it has numerous convolutional layers. These layers take the input image and extract low-level and mid-level information. The ASPP module is then given the feature maps the encoder produced.
For pixel-wise predictions, the Encoder is essential in converting raw pixel data into abstract representations. It consists of several layers of convolutional and pooling procedures, arranged into blocks, that gradually increase the number of channels while decreasing the input's spatial dimensions. Because of its hierarchical structure, the model may capture aspects with varied levels of complexity, from simple edges and textures to complicated object semantics.
-
Atrous Spatial Pyramid Pooling (ASPP): The ASPP module executes many convolutions with various dilation rates following the encoder. This records contextual data at various scales. Concatenated and processed outputs from several atrous convolutions are then used to create context-rich features.
By gathering data from diverse scales and viewpoints, the ASPP module improves the network's comprehension of the items in a scene. It is especially useful for overcoming the challenges presented by items with varying sizes and spatial distributions.
-
Decoder: Through skip connections, the decoder module combines low-level features from the encoder with high-level features from the ASPP module. This method aids in recovering spatial data and producing fine-grained segmentation maps.
This Module enables the network to generate precise and contextually rich segmentation maps by including skip links and mixing data from various scales. This is crucial for tasks like semantic segmentation, where accurate delineation of object boundaries is necessary for producing high-quality results.
-
Squeeze & Excitation (SE): It is a mechanism made to increase the convolutional neural networks' representational strength by explicitly modeling channel-wise interactions. Jie Hu et al. first discussed it in their publication Squeeze-and-Excitation Networks published in 2018. In order to enable the model to focus greater attention on crucial features, the SE Module seeks to selectively emphasize informative channels while suppressing less critical ones within the network.
By computing the average value of each channel across all spatial dimensions, the global average pooling method is used. The end result is a channel-wise descriptor that accurately reflects the significance of each channel in relation to the overall feature map.
The channels are then adaptively recalibrated using the squeezed information. Two fully connected layers are utilized for this. A non-linear activation function, also known as ReLU, is added after the first layer, which minimizes the dimensionality of the squeezed descriptor. A set of channel-wise excitation weights is produced after the second layer returns the dimensionality to the original number of channels. Each channel's weights indicate how much it should be boosted or muted.
Squeeze & Excitation Module
Results of the developed Model on EWS, PSS and CVPPP Dataset.
On the basis of IoU, the results of this repository's best performing model are compared to Zenkl et al. (2022), Yu et al. (2017), Sadeghi-Tehran et al. (2020) and Rico-Fernández et al. (2018).
Benchmark | IoU |
---|---|
Repository (Model v1.4) | 0.744 |
Zenkl et al. (2022) | 0.775 |
Yu et al. (2017) | 0.666 |
Sadeghi-Tehran et al. (2020) | 0.638 |
Rico-Fernández et al. (2018) | 0.691 |
ResNet50 Model v1.4 Result Left to Right: Input Image, Ground Truth, Predicted Mask, Segmented Output |
Totaling 810 pictures of Tobacco and Arabidopsis plants, the CVPPP LCC 2017 Dataset is divided into 4 directories, A1 through A4. Arabidopsis plant photos are included in divides A1, A2, and A4, which have 128, 31, and 624 images, respectively. 27 photos of tobacco plants are included in A3.
A collection of 63 photos from the divides A1 through A4 were assembled to form an evaluation set, representing each split.
Model training was done on A1, A2, A3, and A4 separately for the outcomes of this repository's model. A separate split of 267 photos, consisting of 46 images from A1, 20 images from A2, and 201 images from A4, was also created and utilized for training.
Split | IoU | Dice-Loss |
---|---|---|
A1 | 0.371 | 0.498 |
A2 | 0.865 | 0.100 |
A3 | 0.614 | 0.410 |
A4 | 0.907 | 0.069 |
A1 A2 A4 (Model v1.6) | 0.942 | 0.044 |
A2 Result Left to Right: Input Image, Ground Truth, Predicted Mask, Segmented Output |
A3 Result Left to Right: Input Image, Ground Truth, Predicted Mask, Segmented Output |
A4 Result Left to Right: Input Image, Ground Truth, Predicted Mask, Segmented Output |
A1 A2 A4 Result Left to Right: Input Image, Ground Truth, Predicted Mask, Segmented Output |
There are 144 photos in the Humans in the Loop (HIL) Plant Semantic Segmentation (PSS) Dataset. No additional splits were created because of the smaller size of the dataset. The masks from the dataset, however, were thresholded to only contain black or white color. Black is the background, whereas white is the plant.
Data Augmentations were used during training of the model.
The best model found for this dataset produced the results listed below.
Model | IoU | Dice-Loss |
---|---|---|
Best Model | 0.603 | 0.315 |
Best Model Result Left to Right: Input Image, Ground Truth, Predicted Mask, Segmented Output |
- IDE Used:
- Operating System:
To get a local copy of this project up and running on your machine, follow these simple steps.
- Clone a copy of this Repository on your machine.
git clone https://github.com/mukund-ks/DeepLabV3Plus-PyTorch.git
- Python 3.9 or above.
python -V
Python 3.9.13
- CUDA 11.2 or above.
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_22:08:44_Pacific_Standard_Time_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
- Move into the cloned repo.
cd DeepLabV3Plus-PyTorch
- Setup a Virutal Environment
python -m venv env
- Activate the Virutal Environment
env/Scripts/activate
- Install Dependencies
pip install -r requirements.txt
Note You can deactivate the Virtual Environment by using
env/Scripts/deactivate
The Model can be trained on the data aforementioned in the About section or on your own data.
- To train the model, use
train.py
python train.py --help
Usage: train.py [OPTIONS]
Training Script for DeepLabV3 with ResNet50 Encoder for Binary
Segmentation.
Please make sure your data is structured according to the folder structure
specified in the Github Repository.
See: https://github.com/mukund-ks/DeepLabV3Plus-PyTorch
Refer to the Options below for usage.
Options:
-D, --data-dir TEXT Path for Data Directory [required]
-E, --num-epochs INTEGER Number of epochs to train the model for. Default
- 25
-L, --learning-rate FLOAT Learning Rate for model. Default - 1e-4
-B, --batch-size INTEGER Batch size of data for training. Default - 4
-P, --pre-split BOOLEAN Opt-in to split data into Training and Validaton
set. [required]
-A, --augment BOOLEAN Opt-in to apply augmentations to training set.
Default - True
-S, --early-stop BOOLEAN Stop training if val_loss hasn't improved for a
certain no. of epochs. Default - True
--help Show this message and exit.
- For Evaluation, use
evaluation.py
python evaluation.py --help
Usage: evaluation.py [OPTIONS]
Evaluation Script for DeepLabV3 with ResNet50 Encoder for Binary
Segmentation.
Please make sure your evaluation data is structured according to the folder
structure specified in the Github Repository.
See: https://github.com/mukund-ks/DeepLabV3Plus-PyTorch
Refer to the Option(s) below for usage.
Options:
-D, --data-dir TEXT Path for Data Directory [required]
--help Show this message and exit.
- An Example
python train.py --data-dir data --num-epochs 80 --pre-split False --early-stop False
python evaluation.py --data-dir eval_data
The folder structure will alter slightly depending on whether or not your training data has already been divided into a training and testing set.
-
If the data is not already seperated, it should be in a directory called
data
that is further subdivided intoImage
andMask
subdirectories.-
train.py
should be run with--pre-split
option asFalse
in this case.Example:
python train.py --data-dir data --pre-split False
-
Note
dataset.py
will split the data into training and testing set with a ratio of 0.2
$ tree -L 2
.
├── data
│ ├── Image
│ └── Mask
└── eval_data
├── Image
└── Mask
-
If the data has already been separated, it should be in a directory called
data
that is further subdivided into the subdirectoriesTrain
andTest
, both of which contain the subdirectoriesImage
andMask
.-
train.py
should be run with--pre-split
option asTrue
in this case.Example:
python train.py --data-dir data --pre-split True
-
$ tree -L 3
.
├── data
│ ├── Test
│ │ ├── Image
│ │ └── Mask
│ └── Train
│ ├── Image
│ └── Mask
└── eval_data
├── Image
└── Mask
- The structure of
eval_data
remains the same in both cases, holdingImage
andMask
sub-directories.
Note The directory names are case-sensitive.
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.
- If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
- Please make sure you check your spelling and grammar.
- Create individual PR for each suggestion.
- Please also read through the Code Of Conduct before posting your first idea as well.
- Fork the Project
- Create your Feature Branch (
git checkout -b MyBranch
) - Commit your Changes (
git commit -m 'Add some AmazingFeature'
) - Push to the Branch (
git push -u origin myBranch
) - Open a Pull Request
Distributed under the Apache 2.0 License. See LICENSE for more information.
- Mukund Kumar Surehli - Comp Sci Student
- Dr. Naveen Aggarwal - Comp Sci Professor - Project Guide
- Dr. Garima Joshi - Comp Sci Professor - Project Guide
- M. Minervini, A. Fischbach, H.Scharr, and S.A. Tsaftaris. Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recognition Letters, pages 1-10, 2015, doi:10.1016/j.patrec.2015.10.013
- H. Scharr, M. Minervini, A.P. French, C. Klukas, D. Kramer, Xiaoming Liu, I. Luengo, J.-M. Pape, G. Polder, D. Vukadinovic, Xi Yin, and S.A. Tsaftaris. Leaf segmentation in plant phenotyping: A collation study. Machine Vision and Applications, pages 1-18, 2015, doi:10.1007/s00138-015-0737-3.
- B. Dellen, H. Scharr, and C. Torras. Growth signatures of rosette plants from time-lapse video. IEEE/ACM Transactions on Computational Biology and Bioinformatics, PP(99):1 - 11, 2015, doi:10.1109/TCBB.2015.2404810
- E.E. Aksoy, A. Abramov, F. Wörgötter, H. Scharr, A. Fischbach, and B. Dellen. Modeling leaf growth of rosette plants using infrared stereo image sequences. Computers and Electronics in Agriculture, 110:78 - 90, 2015, doi:10.1016/j.compag.2014.10.020
- M. Minervini , M.M. Abdelsamea, S.A. Tsaftaris. Image-based plant phenotyping with incremental learning and active contours. Ecological Informatics 23, 35–48, 2014, doi:10.1016/j.ecoinf.2013.07.004
- Polat H. A modified DeepLabV3 based semantic segmentation of chest computed tomography images for COVID-19 lung infections. Int J Imaging Syst Technol. 2022;32(5):1481-1495. doi:10.1002/ima.22772
- Li, K. (2022). Study on the segmentation method of the improved deeplabv3 algorithm in the basketball scene. Scientific Programming, 2022, 1–7. https://doi.org/10.1155/2022/3311931
- Wang Y, Wang C, Wu H, Chen P (2022) An improved Deeplabv3 semantic segmentation algorithm with multiple loss constraints. PLOS ONE 17(1): e0261582. https://doi.org/10.1371/journal.pone.0261582
- Zenkl, R., Timofte, R., Kirchgessner, N., Roth, L., Hund, A., Van Gool, L., Walter, A., & Aasen, H. (2022). Outdoor plant segmentation with deep learning for high-throughput field phenotyping on a diverse wheat dataset. Frontiers in Plant Science, 12. https://doi.org/10.3389/fpls.2021.774068
- Hsu C-Y, Hu R, Xiang Y, Long X, Li Z. Improving the Deeplabv3 Model with Attention Mechanisms Applied to Eye Detection and Segmentation. Mathematics. 2022; 10(15):2597. https://doi.org/10.3390/math10152597
- Singh, V. (2023, January 17). The Ultimate Guide to deeplabv3 - with Pytorch Inference. LearnOpenCV. https://learnopencv.com/deeplabv3-ultimate-guide/
- Zualkernan, I., Abuhani, D. A., Hussain, M. H., Khan, J., & ElMohandes, M. (2023). Machine Learning for Precision Agriculture Using Imagery from Unmanned Aerial Vehicles (UAVs): A Survey. Drones, 7(6), 382. https://doi.org/10.3390/drones7060382
- S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz and D. Terzopoulos, Image Segmentation Using Deep Learning: A Survey, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 7, pp. 3523-3542, 1 July 2022, doi: 10.1109/TPAMI.2021.3059968.
- Pröve, P. L. (2017, October 18). Squeeze-and-Excitation Networks. Retrieved from https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
- Chen, LC., Zhu, Y., Papandreou, G., Schroff, F., Adam, H. (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds) Computer Vision – ECCV 2018. ECCV 2018. Lecture Notes in Computer Science(), vol 11211. Springer, Cham. https://doi.org/10.1007/978-3-030-01234-2_49
- Zhou, E., Xu, X., Xu, B. et al. An enhancement model based on dense atrous and inception convolution for image semantic segmentation. Appl Intell 53, 5519–5531 (2023). https://doi.org/10.1007/s10489-022-03448-w
- M. S. Minhas, Transfer Learning for Semantic Segmentation using PyTorch DeepLab v3, GitHub.com/msminhas93, 12-Sep-2019. Available: https://github.com/msminhas93/DeepLabv3FineTuning.
- Kou, L., Sysyn, M., Fischer, S., Liu, J., & Nabochenko, O. (2022). Optical Rail Surface Crack Detection Method Based on Semantic Segmentation Replacement for Magnetic Particle Inspection. Sensors, 22(21), 8214. https://doi.org/10.3390/s22218214
- Zhang C, Gao S, Yang X, Li F, Yue M, Han Y, Zhao H, Zhang Y, Fan K. Convolutional Neural Network-Based Remote Sensing Images Segmentation Method for Extracting Winter Wheat Spatial Distribution. Applied Sciences. 2018; 8(10):1981. https://doi.org/10.3390/app8101981
- Zhang, D., Zhang, L. & Tang, J. Augmented FCN: rethinking context modeling for semantic segmentation. Sci. China Inf. Sci. 66, 142105 (2023). https://doi.org/10.1007/s11432-021-3590-1
- Zeiler, M. D., & Fergus, R. (2013, November 12). Visualizing and Understanding Convolutional Networks. Retrieved from https://arxiv.org/abs/1311.2901v3
- Chen, L., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking Atrous Convolution for Semantic Image Segmentation. ArXiv, abs/1706.05587.
- Z. Zhang, X. Wang and C. Jung, DCSR: Dilated Convolutions for Single Image Super-Resolution, in IEEE Transactions on Image Processing, vol. 28, no. 4, pp. 1625-1635, April 2019, doi: 10.1109/TIP.2018.2877483.
- K. He, X. Zhang, S. Ren and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
- EWS Dataset
- Plant Phenotyping Dataset
- Plant Semantic Segmentation Dataset by HIL
Surehli, M. K., Aggarwal, N., & Joshi, G. (2023, August 6). GitHub - mukund-ks/DeepLabV3Plus-PyTorch: A DeepLab V3 Model with ResNet 50 Encoder to perform Binary Segmentation Tasks. Implemented with PyTorch. Retrieved from https://github.com/mukund-ks/DeepLabV3Plus-PyTorch