Underwater Image Classification Using ML Techniques
Neha Agarwal
Department of Computer Science Engineering
Indian Institute of Information Technology Raichur,
Raichur 584135, Karnataka, India
Abstract— Underwater image classification is a compelling field within computer vision and marine science that uses Machine Learning (ML) and Deep Learning (DL) to analyze imagery of our oceans. It is essential for measuring the health and quality of water bodies and for protecting endangered species. Further, it has applications in oceanography, marine economy and defense, environment protection, underwater exploration, and human-robot collaborative tasks. In this paper, we explore the significance and challenges of classifying underwater images, highlighting the unique characteristics of this environment, including low visibility, color distortion, and complex marine life diversity. This paper presents various deep learning and ML techniques for performing underwater image classification.
Index Terms— PCA, GAN, ReLU, SVM-CNN, Transfer Learning, Neural Networks
I. INTRODUCTION
The ubiquity of images and visual data has changed the way we interact with technology. Image classification serves as a bridge between the visual world and computational intelligence and plays a pivotal role in the analysis of visual data.
This paper focuses on underwater image classification, which aims to enable the automated analysis of images captured in aquatic environments. The underwater environment introduces factors such as reduced visibility, color distortion, and the presence of diverse marine species, all of which complicate image analysis.
In recent years, the combination of Machine Learning (ML) and Deep Learning (DL) techniques has emerged as a transformative force in this field. These methodologies can not only identify and classify underwater objects, species, and habitats but also enhance our understanding of marine ecosystems and contribute to conservation efforts. By leveraging ML and DL, we can efficiently process large volumes of underwater imagery, accelerating scientific discovery and environmental protection.
This paper examines underwater image classification, emphasizing the role of ML and DL in addressing the unique challenges posed by aquatic environments. We explore state-of-the-art techniques, including Convolutional Neural Networks (CNNs) and transfer learning, which have propelled the field forward. Furthermore, we underscore the significance of robust and diverse underwater image datasets, as well as the importance of preprocessing methods tailored to this context.
II. CHALLENGES IN THE AUTOMATED CLASSIFICATION OF UNDERWATER IMAGES
The automated classification of underwater images faces numerous challenges. The intensity of light diminishes due to energy loss during its propagation, resulting in low and variable illumination and visibility, particularly in deeper waters. Furthermore, ocean currents, in contrast to still waters like ponds or swimming pools, contribute to changes in luminosity. These alterations, along with impurities and suspended solids, give rise to intricate noise in underwater images, particularly those captured in the ocean. Additionally, these images exhibit low contrast and degraded edges and details. Moreover, non-uniform spectral propagation causes color distortion that depends on the distance. Addressing some of these limitations requires sophisticated yet expensive cameras. Some key challenges in automated underwater image classification are:
Limited Labeled Data: Collecting labeled data for underwater image classification is a labor-intensive task. The limited availability of diverse and well-labeled datasets hinders the training of robust machine learning models. Insufficient data can lead to overfitting, where the model
performs well on training data but fails to generalize to new, unseen data.
Species and Object Recognition: Identifying underwater species and objects is challenging due to the diversity of marine life and the potential for occlusions. Some species may have similar appearances, and certain objects may be partially hidden, making accurate classification difficult.
Dynamic Environmental Conditions: Underwater conditions can change rapidly, including variations in water currents, temperature, and light levels. Models must be robust to these dynamic environmental factors to ensure consistent performance across different scenarios.
Sensitivity to Illumination Conditions: Illumination conditions underwater can vary significantly depending on the time of day, weather conditions, and water depth. Models need to be robust to changes in lighting for consistent performance.
Addressing these challenges requires a combination of advanced computer vision techniques, domain-specific knowledge, and the availability of high-quality, diverse datasets. We therefore use preprocessing techniques to address them, and data augmentation is a crucial strategy in underwater image classification for overcoming limited labeled data, enhancing model robustness, and improving generalization to diverse underwater conditions.
This paper is structured as follows: Section III presents the proposed methodology, Section IV the image pre-processing techniques, Section V data augmentation, and Section VI the transfer learning models. Section VII compares the results with other models, Section VIII gives the conclusion, and Section IX lists the references.
III. PROPOSED METHODOLOGY
Fig. 1. Architecture
[...] transfer learning. Transfer learning also reduces computational demands during training[2].
[13] For underwater computer vision, image preprocessing is the most important procedure for object detection. Because of the effects of light scattering and absorption in the water, the images obtained by an underwater vision system show uneven illumination, low contrast, serious noise, and the other issues described in Section II (Challenges in the Automated Classification of Underwater Images).
Fig. 2. Original Image before Pre-processing
We now discuss the techniques used to address these challenges.
Standard Median Filtering: Median filtering is a commonly used image preprocessing technique that aims to reduce noise while preserving edges. It replaces each pixel's intensity with the median of the pixel intensities in its neighborhood, which makes it well suited to removing salt-and-pepper noise (pixels with value 0 or 255)[1]. In the proposed model we used a 5x5 kernel, moved over the image so that the center of the kernel traverses all input image pixels. Features smaller than half the size of the median filter kernel are completely removed by the filter. Applying median filtering to an underwater image reduces noise but increases blurring. To counter this, we can apply dehazing, a technique designed to reduce or remove haze and blur and to enhance the clarity and visibility of the image.
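As an illustration of the median filtering step described above, the following is a minimal pure-Python sketch, not the implementation used in this work. The function name `median_filter`, the choice to clamp coordinates at the image border, and the small synthetic test image are all assumptions made for the example.

```python
from statistics import median


def median_filter(image, ksize=5):
    """Apply a ksize x ksize median filter to a 2D grayscale image.

    `image` is a list of rows of integer intensities. Border pixels are
    handled by clamping coordinates to the image (an assumption of this
    sketch; other border policies are equally valid).
    """
    if ksize % 2 == 0:
        raise ValueError("ksize must be odd")
    h, w = len(image), len(image[0])
    r = ksize // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)  # clamp row index
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    window.append(image[yy][xx])
            # The window always holds an odd number of values, so the
            # median is one of the original intensities.
            out[y][x] = median(window)
    return out


# Hypothetical 7x7 flat gray patch with one "salt" and one "pepper" pixel.
noisy = [[100] * 7 for _ in range(7)]
noisy[3][3] = 255
noisy[1][5] = 0
clean = median_filter(noisy, ksize=5)
```

Because each 5x5 window in this example contains at most two outliers among 25 values, the isolated 0 and 255 pixels are replaced by the surrounding intensity.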
We also used Otsu thresholding, an image segmentation technique that finds an optimal threshold for binarization. A criterion function (the between-class variance) is computed for each candidate intensity, and the intensity that maximizes it is selected as the threshold. Equivalently, Otsu's method picks the threshold that minimizes the intra-class variance of the thresholded black and white pixels. The technique separates an image into two classes, foreground and background (or object and non-object), according to the threshold value. It is particularly effective when the pixel intensities in the image have a bimodal distribution.
Algorithm [2]:
Step 1: Compute the histogram of the 2D image.
Step 2: Calculate the foreground and background variances (measure of spread) for a single threshold.
i) Calculate the weight of the background pixels and of the foreground pixels.
ii) Calculate the mean of the background pixels and of the foreground pixels.
iii) Calculate the variance of the background pixels and of the foreground pixels.
Step 3: Calculate the "within class variance"
Fig. 5. Resultant image after applying Morphological Erosion on Otsu's Thresholded image
[...] fusion, boundary expansion, noise removal, and feature extraction. It is an essential operation in morphological processing pipelines, contributing to tasks such as image enhancement, segmentation, and pattern recognition. This process helps join the broken parts of objects, which further helps in modelling. Dilation is often followed by erosion in morphological operations, and the combination of these operations is known as closing.
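The thresholding and closing steps described above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the code used in this work: the function names, the 3x3 square structuring element, the truncation of windows at the image border, and the synthetic test image are all chosen for the example.

```python
def otsu_threshold(image, levels=256):
    """Return the threshold t minimizing the within-class variance.

    Pixels <= t form the background class, pixels > t the foreground.
    """
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    best_t, best_var = 0, float("inf")
    for t in range(levels - 1):
        bg, fg = hist[: t + 1], hist[t + 1:]
        n_bg, n_fg = sum(bg), sum(fg)
        if n_bg == 0 or n_fg == 0:
            continue  # one class is empty, so t is not a valid split
        w_bg, w_fg = n_bg / total, n_fg / total
        mean_bg = sum(i * c for i, c in enumerate(bg)) / n_bg
        mean_fg = sum((i + t + 1) * c for i, c in enumerate(fg)) / n_fg
        var_bg = sum(c * (i - mean_bg) ** 2 for i, c in enumerate(bg)) / n_bg
        var_fg = sum(c * (i + t + 1 - mean_fg) ** 2
                     for i, c in enumerate(fg)) / n_fg
        within = w_bg * var_bg + w_fg * var_fg
        if within < best_var:
            best_t, best_var = t, within
    return best_t


def _morph(mask, size, pick):
    """Slide a size x size window over a binary mask and apply `pick`."""
    h, w = len(mask), len(mask[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = pick(window)
    return out


def dilate(mask, size=3):
    return _morph(mask, size, max)  # 1 if any neighbor is 1


def erode(mask, size=3):
    return _morph(mask, size, min)  # 1 only if all neighbors are 1


def closing(mask, size=3):
    return erode(dilate(mask, size), size)  # dilation followed by erosion


# Hypothetical image: dark background (10) and a bright bar (200) with a gap.
image = [[10] * 9 for _ in range(5)]
for x in (2, 3, 5, 6):
    image[2][x] = 200
t = otsu_threshold(image)
mask = [[1 if v > t else 0 for v in row] for row in image]
closed = closing(mask)
```

In this example the two intensity clusters are cleanly separated, so any threshold between them gives zero within-class variance, and closing fills the one-pixel gap in the bar without growing it.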
[...] refines the models, and the generated images are visualized throughout the process.
Fig. 7. Horizontal and Vertical flip of the preprocessed image
B. Working
We implemented a simple Generative Adversarial Network (GAN) using TensorFlow and Keras for generating images. A GAN consists of a generator model and a discriminator model that are trained simultaneously in a competitive manner. The generator takes a random noise vector (of size latent_dim) as input and produces an image. It consists of fully connected layers with leaky ReLU activation functions and batch normalization. Its output is a generated image with the same dimensions as the input images. The discriminator takes an image as input and outputs a binary classification (real or fake). It consists of fully connected layers with leaky ReLU activation functions. Its output is a probability indicating the likelihood that the input image is real. We use the Adam optimizer for both the generator and the discriminator.
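To make the competitive training described above concrete, the sketch below computes the two losses that such a setup typically optimizes. The text does not state which loss was used, so this assumes the standard binary cross-entropy objective of [10] with the commonly used non-saturating generator loss. The leaky ReLU slope of 0.2 and all function names are likewise assumptions for illustration.

```python
import math


def leaky_relu(x, alpha=0.2):
    """Leaky ReLU activation; the slope alpha=0.2 is an assumed value."""
    return x if x > 0 else alpha * x


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) -> 1 and D(fake) -> 0."""
    real_loss = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_loss = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_loss + fake_loss


def generator_loss(d_fake):
    """The generator wants the discriminator to output 1 on its fakes."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs early in training, when it is winning:
d_real = [0.9, 0.8]  # probabilities assigned to real images
d_fake = [0.1, 0.2]  # probabilities assigned to generated images
d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss(d_fake)
```

With these example values the discriminator's loss is low and the generator's loss is high, which is the signal that pushes the generator toward more realistic images in the next update.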
We use multiple collaborative models to improve classification performance on datasets with class imbalance. Such a system combines pretrained CNNs followed by an additional learning phase. To mitigate class imbalance, it employs data standardization, data augmentation, and "class weights". Geometric data (dimensions, area, etc.) and environmental data (temperature, salinity, season, time, etc.) can additionally be integrated into the classification system by concatenating them with the feature maps extracted from the CONV layers.
We take the VGG and ResNet models to construct a collaborative model. The learners are trained individually and are loaded with fixed weights. The last layers of the two models are concatenated and followed by a softmax layer. The FC layer acts as a learned function through which the model determines how much each learner contributes.
VGG16 is a convolutional neural network (CNN) architecture that is widely regarded as one of the strongest classical vision architectures. Instead of having a large number of hyper-parameters, VGG16 uses convolution layers with 3x3 filters, stride 1, and "same" padding, together with max-pooling layers with 2x2 filters and stride 2. It follows this arrangement of convolution and max-pooling layers consistently throughout the whole architecture. At the end it has three fully connected layers, the last of which feeds a softmax output. The 16 in VGG16 refers to its 16 layers that have weights. It is a large network, with approximately 138 million parameters.
Fig. 11. VGG-16 Architecture[3]
Fig. 12. ResNet Architecture[4]
[...] are added on top of each base model. The outputs of both models are concatenated. A final dense layer with softmax activation is added for classification. The model is created using the Model class from Keras.
Freezing Base Model Layers: The layers of both the VGG16 and ResNet50 base models are set as non-trainable to preserve their pre-trained weights.
Model Compilation: The model is compiled with the Adam optimizer and categorical crossentropy loss.
Training: The model is trained using the fit method. The training data is passed as a list containing both VGG16 and ResNet50 features.
Evaluation: The model is evaluated on the validation set, and predictions are obtained. Accuracy is calculated using scikit-learn's accuracy score.
ResNet enables the creation of very deep neural networks, which can improve performance on image recognition tasks. ResNet50 uses shortcut connections to add more convolutional layers to a CNN without running into the vanishing gradient problem. VGG16 handles large-scale datasets with a deep stack of layers and small filters. The small convolution filters allow a VGG model to have a considerable number of weight layers, and added depth often improves performance, although this trait is not unique to VGG. Ensembling the two models lets each offset the disadvantages of the other, producing a better overall model.
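The fusion step described above (concatenating the outputs of the two base models and applying a final dense softmax layer) can be illustrated without any framework. The sketch below is only meant to show the arithmetic; it is not the Keras model used in this work. The feature vectors, weights, biases, and function names are hypothetical, and the `accuracy` helper merely mirrors what an accuracy score computes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fusion_head(vgg_features, resnet_features, weights, biases):
    """Concatenate two feature vectors and apply one dense softmax layer.

    `weights` holds one row per class, each as long as the fused vector.
    """
    fused = list(vgg_features) + list(resnet_features)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, tiny feature vectors standing in for the frozen base models.
vgg_features = [0.2, 0.8]
resnet_features = [0.5, 0.1, 0.9]
weights = [[0.1] * 5, [0.2] * 5, [0.3] * 5]  # 3 classes x 5 fused features
biases = [0.0, 0.0, 0.0]

probs = fusion_head(vgg_features, resnet_features, weights, biases)
predicted_class = probs.index(max(probs))
```

Only the weights of this final layer would be trained when the base models are frozen, which is why the head can learn how much each learner's features should contribute.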
The generated images take nearly 12 GB of storage.
Running the model through the entire dataset gives a good estimation of model performance.
Sl. No.   Model                Time taken   Accuracy (%)
1         VGG-16 + ResNet50    1.7 hours    95.83
2         AlexNet              1.2 hours    87.02
3         GoogLeNet            1.5 hours    90.8
IX. CONCLUSION
In this paper, we presented preprocessing and data augmentation techniques that help train better models for classifying underwater images. We compared them on critical parameters and highlighted their similarities and differences. We reviewed the works related to datasets and training and those related to the design and optimization of CNNs. We close this paper with a brief mention of future research challenges.
Deep learning models require a large amount of data to achieve high accuracy. Data augmentation mitigates the scarcity of training data and, as noted in Section II, is intended to improve the robustness of the network.
Ensembling VGG16 and ResNet50 gives a better model for the classification of underwater images than the individual models we compared against.
Additionally, in future work, we will integrate training using geometric data (dimensions, area, etc.) and environmental data (temperature, salinity, season, time, etc.) into the classification system by concatenating them with the extracted feature maps from the CONV layers. [1]-[13]
REFERENCES
[1] S. Mittal, S. Srivastava, and J. P. Jayanth, “A survey of deep learning techniques for underwater image classification,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
[2] S. L. Bangare, A. Dubal, P. S. Bangare, and S. Patil, “Reviewing Otsu’s method for image thresholding,” International Journal of Applied Engineering Research, vol. 10, no. 9, pp. 21777–21783, 2015.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[4] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning. Cambridge University Press, 2023.
[5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[6] Y. Massoud, Sensor fusion for 3D object detection for autonomous vehicles. PhD thesis, Université d’Ottawa/University of Ottawa, 2021.
[7] T. Nguyen, T. Le, H. Vu, and D. Phung, “Dual discriminator generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[8] L. Vincent, “Morphological grayscale reconstruction in image analysis: applications and efficient algorithms,” IEEE Transactions on Image Processing, vol. 2, no. 2, pp. 176–201, 1993.
[9] R. C. Gonzalez, Digital Image Processing. Pearson Education India, 2009.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[12] F. Han, J. Yao, H. Zhu, C. Wang, et al., “Underwater image processing and object detection based on deep CNN method,” Journal of Sensors, vol. 2020, 2020.
[13] C. R. Purcell, A. J. Walsh, A. P. Colefax, and P. Butcher, “Assessing the ability of deep learning techniques to perform real-time identification of shark species in live streaming video from drones,” Frontiers in Marine Science, vol. 9, p. 981897, 2022.