
Underwater Image Classification using ML Techniques

Periseti Sai Ram Mohan Rao, Mahesh Vasimalla


Department of Computer Science Engineering
Indian Institute of Information Technology, Raichur,
Raichur, 584135, Karnataka, India
{maheshvasimalla333, sairam122}@gmail.com

Neha Agarwal
Department of Computer Science Engineering
Indian Institute of Information Technology Raichur,
Raichur 584135, Karnataka, India
[email protected]

Abstract— Underwater image classification is a compelling field within computer vision and marine science, leveraging the power of Machine Learning (ML) and Deep Learning (DL) to decipher the hidden realms of our oceans. Underwater image classification is essential for measuring the health and quality of water bodies and for protecting endangered species. Further, it has applications in oceanography, the marine economy and defense, environmental protection, underwater exploration, and human-robot collaborative tasks. In this paper, we explore the significance and challenges of classifying underwater images, highlighting the unique characteristics of this environment, including low visibility, color distortion, and complex marine life diversity. This paper presents various deep learning and ML techniques for performing underwater image classification.

Index Terms— PCA, GAN, ReLU, SVM-CNN, Transfer Learning, Neural Networks

I. INTRODUCTION

In today's digital age, the ubiquity of images and visual data has revolutionized the way we interact with technology. In this age of visual data, where images are the language of machines, image classification serves as a bridge between the visual world and computational intelligence, and it plays a pivotal role in the analysis of visual data.

This paper focuses on underwater image classification, which aims to decode the secrets of the deep by enabling the automated analysis of images captured in aquatic environments. The underwater environment introduces factors such as reduced visibility, color distortion, and the presence of diverse marine species, all of which complicate image analysis.

In recent years, the fusion of Machine Learning (ML) and Deep Learning (DL) techniques has emerged as a transformative force in this field. These methodologies have the capacity not only to identify and classify underwater objects, species, and habitats, but also to enhance our understanding of marine ecosystems and contribute to conservation efforts. By leveraging ML and DL, we can efficiently process large volumes of underwater imagery, accelerating scientific discovery and environmental protection.

This paper delves into the world of underwater image classification, emphasizing the role of ML and DL in addressing the unique challenges posed by aquatic environments. We explore state-of-the-art techniques, including Convolutional Neural Networks (CNNs) and transfer learning, which have propelled the field forward. Furthermore, we underscore the significance of robust and diverse underwater image datasets, as well as the importance of preprocessing methods tailored to this context.

II. CHALLENGES IN THE AUTOMATED CLASSIFICATION OF UNDERWATER IMAGES

The automated classification of underwater images faces numerous challenges. The intensity of light is diminished by energy loss during propagation, resulting in low and variable illumination and visibility, particularly in deeper waters. Furthermore, ocean currents, in contrast to still waters such as ponds or swimming pools, contribute to changes in luminosity. These alterations, along with impurities and suspended solids, give rise to intricate noise in underwater images, particularly those captured in the open ocean. Additionally, such images exhibit low contrast and degraded edges and details. Moreover, non-uniform spectral propagation causes color distortion that depends on distance. Addressing some of these limitations requires sophisticated yet expensive cameras. Key challenges in automated underwater image classification include:

Limited Labeled Data: Collecting labeled data for underwater image classification is a labor-intensive task. The limited availability of diverse, well-labeled datasets hinders the training of robust machine learning models. Insufficient data can lead to overfitting, where the model

performs well on training data but fails to generalize to new, unseen data.

Species and Object Recognition: Identifying underwater species and objects is challenging due to the diversity of marine life and the potential for occlusions. Some species may have similar appearances, and certain objects may be partially hidden, making accurate classification difficult.

Dynamic Environmental Conditions: Underwater conditions can change rapidly, including variations in water currents, temperature, and light levels. Models must be robust to these dynamic environmental factors to ensure consistent performance across different scenarios.

Sensitivity to Illumination Conditions: Illumination underwater can vary significantly depending on the time of day, weather conditions, and water depth. Models need to be robust to changes in lighting for consistent performance.

Addressing these challenges requires a combination of advanced computer vision techniques, domain-specific knowledge, and the availability of high-quality, diverse datasets. Hence, we use preprocessing techniques to address these challenges, and data augmentation is a necessary and crucial strategy in underwater image classification for overcoming limited labeled data, enhancing model robustness, and improving generalization to diverse underwater conditions.

This paper is structured as follows: Section III presents the proposed methodology; Section IV describes image pre-processing techniques; Section V covers data augmentation; Section VI discusses transfer learning models; Section VII compares the results with other models; Section VIII presents the experimental analysis; and Section IX concludes the paper, followed by the references.

III. PROPOSED METHODOLOGY

Fig. 1. Architecture

IV. IMAGE PRE-PROCESSING TECHNIQUES


Why is there a need for image pre-processing? Underwater images often suffer from issues such as poor visibility, color distortion, and environmental noise; therefore, pre-processing them is essential. Pre-processing helps address these challenges and enhances the quality of the images, leading to improved performance in image classification models. The scarcity of underwater image datasets also requires the use of data augmentation and transfer learning; transfer learning additionally reduces computational demands during training [2]. For underwater computer vision, image preprocessing is the most important procedure for object detection [13]. Because of the effects of light scattering and absorption in water, the images obtained by an underwater vision system exhibit uneven illumination, low contrast, serious noise, and the other characteristics discussed in Section II.

Fig. 2. Original image before pre-processing

We now discuss the techniques used to address these challenges.

Standard Median Filtering: This is a commonly used image preprocessing technique that reduces noise while preserving edges. In the proposed model, we used a 5x5 kernel, moved across the image so that its center traverses every input pixel. Features smaller than half the size of the median filter kernel are completely removed by the filter. It operates by replacing each pixel's intensity with the median of the pixel intensities in its neighborhood, which suppresses salt-and-pepper noise (values of 0 or 255) in the image [1]. When median filtering is applied to an underwater image, it reduces noise but increases blurring. To counter this effect, we can apply dehazing, a technique designed to reduce or remove haze and blur and to enhance the clarity and visibility of the image. A minimal sketch of the filtering step appears below.

Fig. 3. Resultant image after applying the standard median filter to the original image
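
To make the step above concrete, here is a minimal OpenCV sketch of the median filtering described; the file names are placeholders, not from the paper.

```python
import cv2

# Load an underwater image (the file name is a hypothetical placeholder).
img = cv2.imread("underwater.jpg")

# Standard median filtering with the 5x5 kernel used in the proposed model:
# each pixel is replaced by the median of its 5x5 neighborhood, which
# suppresses salt-and-pepper noise (isolated 0 or 255 values).
denoised = cv2.medianBlur(img, 5)

cv2.imwrite("underwater_median.jpg", denoised)
```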

We also used Otsu thresholding, an image segmentation technique that finds an optimal threshold for binarization. A criterion function is computed over the intensity values, and the value that maximizes this function is selected as the threshold; equivalently, Otsu's method picks the threshold that minimizes the intra-class variance of the thresholded black and white pixels. This technique separates an image into two classes, foreground and background (object and non-object), according to the threshold value. The method is particularly effective when the image has a bimodal distribution of pixel intensities.

Algorithm [2] (implemented in the sketch below):
Step 1: Compute the histogram of the 2D image.
Step 2: For a single threshold, calculate the foreground and background variances (measures of spread):
i) calculate the weights of the background and foreground pixels;
ii) calculate the means of the background and foreground pixels;
iii) calculate the variances of the background and foreground pixels.
Step 3: Calculate the "within-class variance" and select the threshold that minimizes it.

Fig. 4. Otsu's thresholding workflow [2]
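
The following NumPy sketch (our illustration, not the authors' code) implements the steps above by sweeping every candidate threshold and keeping the one that minimizes the within-class variance; OpenCV's built-in `cv2.THRESH_OTSU` flag yields the same result.

```python
import cv2
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold by minimizing the within-class variance."""
    # Step 1: histogram of the grayscale image, normalized to probabilities.
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    best_t, best_wcv = 0, np.inf
    for t in range(1, 256):
        # Step 2i: weights of background (< t) and foreground (>= t) pixels.
        w_b, w_f = prob[:t].sum(), prob[t:].sum()
        if w_b == 0 or w_f == 0:
            continue
        # Step 2ii: means of the two classes.
        mu_b = (levels[:t] * prob[:t]).sum() / w_b
        mu_f = (levels[t:] * prob[t:]).sum() / w_f
        # Step 2iii: variances of the two classes.
        var_b = (((levels[:t] - mu_b) ** 2) * prob[:t]).sum() / w_b
        var_f = (((levels[t:] - mu_f) ** 2) * prob[t:]).sum() / w_f
        # Step 3: within-class variance; keep the threshold minimizing it.
        wcv = w_b * var_b + w_f * var_f
        if wcv < best_wcv:
            best_t, best_wcv = t, wcv
    return best_t

gray = cv2.imread("underwater_median.jpg", cv2.IMREAD_GRAYSCALE)
t = otsu_threshold(gray)
_, binary = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY)
# Equivalent built-in:
# _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```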


Erosion and dilation are the basic operations of morphological image processing. Erosion shrinks the foreground structures, while dilation enlarges them. The performance of both operations depends on the shape of the structuring element. In this paper, the erosion and dilation operations are implemented using OpenCV.

Morphological erosion is applied to binary images to reduce the size of foreground objects, or to grayscale images for various image enhancement tasks. The operation uses a structuring element (also known as a kernel) to modify the shape or size of objects in an image. Erosion is effective at removing small-scale noise and thin structures in binary images. It is often followed by dilation in morphological operations (a combination known as opening).

Fig. 5. Resultant image after applying morphological erosion to Otsu's thresholded image

Morphological dilation is a versatile technique used in many image processing applications, including object fusion, boundary expansion, noise removal, and feature extraction. It is an essential operation in morphological processing pipelines, contributing to tasks such as image enhancement, segmentation, and pattern recognition. Dilation helps join the broken parts of objects, which further helps in modeling. Dilation is often preceded by erosion; conversely, dilation followed by erosion is known as closing. Both operations are sketched below.

Fig. 6. Resultant image after applying morphological dilation to the previous result
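
A minimal OpenCV sketch of these operations, assuming `binary` is the Otsu-thresholded image from the previous step; the 3x3 kernel size is our assumption, as the paper does not state it.

```python
import cv2
import numpy as np

kernel = np.ones((3, 3), np.uint8)  # structuring element (assumed size)

eroded = cv2.erode(binary, kernel, iterations=1)    # shrink foreground, drop thin noise
dilated = cv2.dilate(eroded, kernel, iterations=1)  # re-grow foreground, join broken parts

# Named combinations: opening = erosion then dilation,
#                     closing = dilation then erosion.
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
```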

V. DATA AUGMENTATION

Data augmentation is a technique commonly used in machine learning and computer vision to artificially increase the diversity of a training dataset by applying various transformations to the existing data. Its main goal is to enhance the generalization ability of a machine learning model, making it more robust to variations and improving its performance on unseen data.

We used various data augmentation techniques, as sketched after this list:
• Rotation: rotating the image by a certain angle.
• Flip: flipping the image horizontally or vertically.
• Shift: shifting the image horizontally or vertically.
• Noise injection: adding random noise to the image.
• Random cropping: cropping random regions of the image.
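
Since the training pipeline described later uses Keras' ImageDataGenerator, one plausible configuration covering these transformations is sketched below; all parameter values and the directory name are our assumptions, and random cropping is not built into ImageDataGenerator, so it would be applied separately.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def inject_noise(img):
    # Noise injection: additive Gaussian noise (sigma is an assumed value).
    return img + np.random.normal(0.0, 10.0, img.shape)

datagen = ImageDataGenerator(
    rotation_range=30,              # rotation by up to +/-30 degrees
    horizontal_flip=True,           # horizontal flip
    vertical_flip=True,             # vertical flip
    width_shift_range=0.1,          # horizontal shift
    height_shift_range=0.1,         # vertical shift
    preprocessing_function=inject_noise,
)

# Stream augmented batches from a (hypothetical) directory of class folders.
train_gen = datagen.flow_from_directory(
    "data/train", target_size=(224, 224),
    batch_size=32, class_mode="categorical",
)
```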

Fig. 7. Horizontal and vertical flips of the preprocessed image

Fig. 8. Randomly cropped versions of the preprocessed image

Fig. 9. Noise-added and orientation-changed versions of the preprocessed image

A. GANs: Generative Adversarial Networks

While both data augmentation and GANs contribute to enhancing the diversity and generalization of machine learning models, they serve different purposes and operate in distinct ways. Data augmentation focuses on expanding the training dataset by applying various transformations to the existing data, whereas GANs are a class of generative models designed to generate entirely new, realistic data samples that resemble the training data.

A GAN consists of a generator and a discriminator. The generator learns to create synthetic images, while the discriminator becomes adept at distinguishing between real and generated images. The training loop iteratively refines both models, and the generated images are visualized throughout the process.

B. Working

We implemented a simple Generative Adversarial Network (GAN) using TensorFlow and Keras for generating images. GANs consist of a generator model and a discriminator model that are trained simultaneously in a competitive manner. The generator takes a random noise vector (latent_dim) as input and produces an image; it consists of fully connected layers with leaky ReLU activation functions and batch normalization, and its output is a generated image with the same dimensions as the input images. The discriminator takes an image as input and outputs a binary classification (real or fake); it consists of fully connected layers with leaky ReLU activation functions, and its output is a probability indicating the likelihood that the input image is real. The Adam optimizer is used for both the generator and the discriminator. A minimal sketch follows.

Fig. 10. Generation of GAN images at epochs 5, 10, and 15 [14]
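
A minimal Keras sketch matching this description is given below; `latent_dim` follows the paper's naming, while the layer widths, image size, and learning rate are our assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 100           # size of the random noise vector
img_shape = (64, 64, 3)    # assumed output image size

# Generator: fully connected layers with leaky ReLU and batch
# normalization, reshaped into an image at the end.
generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(256), layers.LeakyReLU(0.2), layers.BatchNormalization(),
    layers.Dense(512), layers.LeakyReLU(0.2), layers.BatchNormalization(),
    layers.Dense(int(np.prod(img_shape)), activation="tanh"),
    layers.Reshape(img_shape),
])

# Discriminator: fully connected layers with leaky ReLU, ending in a
# sigmoid probability that the input image is real.
discriminator = models.Sequential([
    layers.Input(shape=img_shape),
    layers.Flatten(),
    layers.Dense(512), layers.LeakyReLU(0.2),
    layers.Dense(256), layers.LeakyReLU(0.2),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer=tf.keras.optimizers.Adam(2e-4),
                      loss="binary_crossentropy")

# Combined model for the generator update: the discriminator is frozen
# and the generator is trained to make it classify generated images as real.
discriminator.trainable = False
z = layers.Input(shape=(latent_dim,))
gan = models.Model(z, discriminator(generator(z)))
gan.compile(optimizer=tf.keras.optimizers.Adam(2e-4),
            loss="binary_crossentropy")
```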


VI. TRANSFER LEARNING MODELS

In transfer learning, from each pretrained and fine-tuned network, a set of layers is selected that outputs the "best" feature vectors for classification. For selecting the layers, they employ the "sequential floating forward selection" (SFFS) method, which uses an SVM. In plankton classification, the distributions of the different classes are generally different in the training and test sets.

We use ensemble learning, a machine learning technique that combines the predictions of multiple models to improve the overall performance and robustness of a system. The idea is that, by aggregating the predictions of multiple models, the ensemble can often achieve better results than any individual model on its own.

In this paper, we implement a multi-input neural network using the VGG16 and ResNet50 architectures for image classification. It uses transfer learning by loading pre-trained models and then adding custom layers on top for the specific classification task.

We use multiple collaborative models for improved classification performance on datasets with class imbalance. This system combines pretrained CNNs followed by an additional learning phase. To mitigate class imbalance, they employ the strategies of data standardization, data augmentation, and the use of "class weights" (a sketch of the class-weights strategy follows below). Additionally, the authors integrate training using geometric (dimensions, area, etc.) and environmental data (temperature, salinity, season, time, etc.) into the classification system by concatenating it with the feature maps extracted from the CONV layers.

We take the VGG and ResNet models to construct a collaborative model. The learners are trained individually and are loaded with fixed weights. The last layers of each model are concatenated and followed by a softmax layer. The FC layer acts as a novel function through which the model learns how efficiently every learner contributes.
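
One common realization of the "class weights" strategy is sketched here (our illustration; `y_train` is an assumed array of integer class labels): each class is weighted inversely to its frequency so that rare classes contribute more to the loss.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Inverse-frequency class weights for an imbalanced label array y_train.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))
# Passed to Keras training as: model.fit(..., class_weight=class_weight)
```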

VGG16 is a convolutional neural network (CNN) architecture that is considered one of the best vision model architectures to date. Instead of a large number of hyper-parameters, VGG16 uses convolution layers with 3x3 filters, a stride of 1, and "same" padding, together with max-pooling layers with 2x2 filters and a stride of 2. It follows this arrangement of convolution and max-pooling layers consistently throughout the whole architecture. At the end, it has two fully connected layers followed by a softmax output. The "16" in VGG16 refers to its 16 weight layers. It is a fairly large network, with approximately 138 million parameters.

Fig. 11. VGG-16 architecture [3]

ResNet-50 is a convolutional neural network that is 50 layers deep. A version pretrained on more than a million images from the ImageNet database can be loaded [1]. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals, and has therefore learned rich feature representations for a wide range of images. The network has an image input size of 224x224.

Fig. 12. ResNet architecture [4]

Our implementation proceeds as follows.

Data Preparation: Training and validation data directories are specified, image dimensions and batch size are defined, and ImageDataGenerator is used for data augmentation.

Model Architecture: The VGG16 and ResNet50 base models are loaded with pre-trained weights, and custom dense layers are added on top of each base model. The outputs of both models are concatenated, and a final dense layer with softmax activation is added for classification. The model is created using the Model class from Keras.

Freezing Base Model Layers: The layers of both the VGG16 and ResNet50 base models are set as non-trainable to preserve their pre-trained weights.

Model Compilation: The model is compiled with the Adam optimizer and categorical cross-entropy loss.

Training: The model is trained using the fit method; the training data is passed as a list containing the inputs for both the VGG16 and ResNet50 branches.

Evaluation: The model is evaluated on the validation set and predictions are obtained; accuracy is calculated using scikit-learn's accuracy_score. A sketch of this pipeline is given after Fig. 13.

ResNet enables the creation of very deep neural networks, which can improve performance on image recognition tasks. ResNet50 provides a way to add more convolutional layers to a CNN without running into the vanishing gradient problem, using the concept of shortcut connections. VGG16, in turn, supports the processing of large-scale datasets with deep layers and small filters; the VGG model can afford a considerable number of weight layers because of the small size of its convolution filters, and more layers generally yield better performance. Ensembling both models creates a stronger model, with each offsetting the disadvantages of the other.

Fig. 13. Ensemble model
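
The following sketch reconstructs this pipeline in Keras as we understand it; the number of classes, the dense-layer width, and the variable names are assumptions, not the authors' exact code.

```python
from tensorflow.keras.applications import VGG16, ResNet50
from tensorflow.keras.layers import Input, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model

num_classes = 10                 # assumed number of classes
shape = (224, 224, 3)

# Two pre-trained bases, frozen to preserve their ImageNet weights
# (Freezing Base Model Layers).
vgg_in, res_in = Input(shape=shape), Input(shape=shape)
vgg_base = VGG16(weights="imagenet", include_top=False, input_tensor=vgg_in)
res_base = ResNet50(weights="imagenet", include_top=False, input_tensor=res_in)
for layer in vgg_base.layers + res_base.layers:
    layer.trainable = False

# Custom dense layers on top of each base; outputs are concatenated and a
# final softmax layer performs the classification (Model Architecture).
v = Dense(256, activation="relu")(Flatten()(vgg_base.output))
r = Dense(256, activation="relu")(Flatten()(res_base.output))
out = Dense(num_classes, activation="softmax")(Concatenate()([v, r]))
model = Model(inputs=[vgg_in, res_in], outputs=out)

# Model Compilation: Adam optimizer, categorical cross-entropy loss.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training and Evaluation: the same batch feeds both branches, and accuracy
# can be computed with scikit-learn's accuracy_score, e.g.
#   model.fit([x_train, x_train], y_train, epochs=10, batch_size=32)
#   preds = model.predict([x_val, x_val]).argmax(axis=1)
#   accuracy_score(y_val.argmax(axis=1), preds)
```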

VII. COMPARING RESULTS WITH OTHER MODELS

VIII. EXPERIMENTAL ANALYSIS

The dataset is 1136 MB in size. To train the model over the entire dataset, we used 12 GB of RAM and 1 TB of storage.
The generated images take up nearly 12 GB of storage. Running the model over the entire dataset gives a good estimate of model performance.

SI.No   Model               Time taken   Accuracy
1       VGG-16 + ResNet50   1.7 hours    95.83%
2       AlexNet             1.2 hours    87.02%
3       GoogLeNet           1.5 hours    90.8%

IX. CONCLUSION

In this paper, we presented preprocessing and data augmentation techniques that help build better training models for classifying underwater images. We compared them on critical parameters and highlighted their similarities and differences. We reviewed works related to datasets and training, as well as works related to the design and optimization of CNNs. We close this paper with a brief mention of future research challenges.

Deep learning models require a large amount of data to achieve high accuracy. While data augmentation overcomes the scarcity of training data, it can also reduce the robustness of the network. Ensembling VGG16 and ResNet50 yields a better training model for the classification of underwater images.

Additionally, in future work, we plan to integrate training using geometric (dimensions, area, etc.) and environmental data (temperature, salinity, season, time, etc.) into the classification system by concatenating these features with the feature maps extracted from the CONV layers.

REFERENCES

[1] S. Mittal, S. Srivastava, and J. P. Jayanth, "A survey of deep learning techniques for underwater image classification," IEEE Transactions on Neural Networks and Learning Systems, 2022.
[2] S. L. Bangare, A. Dubal, P. S. Bangare, and S. Patil, "Reviewing Otsu's method for image thresholding," International Journal of Applied Engineering Research, vol. 10, no. 9, pp. 21777–21783, 2015.
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[4] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning. Cambridge University Press, 2023.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[6] Y. Massoud, Sensor Fusion for 3D Object Detection for Autonomous Vehicles. PhD thesis, University of Ottawa, 2021.
[7] T. Nguyen, T. Le, H. Vu, and D. Phung, "Dual discriminator generative adversarial nets," Advances in Neural Information Processing Systems, vol. 30, 2017.
[8] L. Vincent, "Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms," IEEE Transactions on Image Processing, vol. 2, no. 2, pp. 176–201, 1993.
[9] R. C. Gonzalez, Digital Image Processing. Pearson Education India, 2009.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Advances in Neural Information Processing Systems, vol. 27, 2014.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[12] F. Han, J. Yao, H. Zhu, C. Wang, et al., "Underwater image processing and object detection based on deep CNN method," Journal of Sensors, vol. 2020, 2020.
[13] C. R. Purcell, A. J. Walsh, A. P. Colefax, and P. Butcher, "Assessing the ability of deep learning techniques to perform real-time identification of shark species in live streaming video from drones," Frontiers in Marine Science, vol. 9, p. 981897, 2022.
