A Prediction of Water Quality Analysis Using Machine Learning
A Prediction of Water Quality Analysis Using Machine Learning
Keywords—Decision tree, Machine learning, J48, LMT, Five different classifiers were applied to assess the
Random Forest, and Decision Stump accuracy of the decision tree model in this architecture Fig.1.
A decision tree classifier based on J48, LMT, Random Forest,
I. INTRODUCTION Hoeffding Tree, and Decision Stump was used in this study.
An algorithm for predicting water quality based on Water quality predictions were made using these classifiers to
machine learning. Health risks and significant economic compare their accuracy.
losses can result from poor water quality. Water quality must a. J48 classifier: A popular implementation of the C4.5
therefore be analyzed from both an environmental and algorithm is the J48 classifier, which uses the decision
economic perspective [6-7]. WEKA is an excellent tool for tree algorithm in order to classify data. Top-down
analyzing data for this purpose. WEKA is a JAVA-based data greedy decision trees are built by the J48 algorithm
mining software developed by the University of Waikato in from training data. A subset of the data is split
New Zealand. DM and ML communities have widely adopted recursively at each node of the tree according to the
it, and it represents a significant milestone. To ensure the most significant attribute. In this case, the tree will
safety of human consumption and environmental resources, reach a predefined depth when the subsets are
water quality analysis is a vital task. It is often not feasible to homogeneous. Water quality analysis is one of the
monitor water quality in real time using traditional methods of many applications of the J48 algorithm. In addition to
analysis, which involve time-consuming and costly laboratory its ability to handle categorical and numerical data, the
testing. Consequently, machine learning techniques have J48 algorithm is easily interpretable, making its
gained popularity in recent years for analyzing water quality application in decision-making straightforward.
efficiently and accurately. Water quality analysis is carried out
using a decision tree algorithm based on the decision tree b. LMT Classifier: A logistic model tree classifier
algorithm [9-10]. algorithm is a combination of a decision tree and a
logistic regression algorithm. LMT algorithm
Different water quality parameters are used in the model generates prediction models using training data and fits
to categorize water quality. Due to its interpretability and the logistic regression models at each node of the tree. By
ability to handle categorical and numerical data, the decision pruning the model, the algorithm avoids overfitting,
tree algorithm was chosen. Water resource managers can use resulting in a smaller and more interpretable tree. In
the proposed model to monitor water quality in real time and classification problems with many attributes or noisy
make better decisions. data, the LMT algorithm is particularly useful [8]-10.
Authorized licensed use limited to: Mukesh Patel School of Technology & Engineering. Downloaded on August 05,2023 at 07:35:05 UTC from IEEE Xplore. Restrictions apply.
In addition to bioinformatics, finance, and Methods, and Support Vector Machines[2]. Accordingly,
environmental science, it has been applied to a variety Zhang et al. (2017) developed a statistical model and double-
of domains. The LMT algorithm can aid in decision- movement window algorithm for detecting anomalies in water
making due to its high accuracy and interpretability. quality data[3]. As compared to other algorithms such as AD
and ADAM, the algorithm showed better anomaly detection
c. Random forest classifier: Multi-decision tree ensemble performance on pH values. Several references have been
classifiers reduce accuracy and overfitting by using published on the topic of water quality analysis classification
Random Forest Classifier. A subset of features and models using decision trees, including:
training data is randomly selected at each node of the
forest to build a decision tree. By combining the TABLE I. CLASSIFICATION MODELS FOR WATER QUALITY ANALYSIS
predictions of all the trees, a final decision is made by USNING DESCISION TREE .
combining the results of several decision trees. A
Algorithm Authors Application Result
feature importance ranking is also included in the
Classification- based Muharemi et al., Changes in the Adequate solution
algorithm, which indicates the importance of each Neural 2019 quality of
feature in classification. Image recognition, Network using drinking water
bioinformatics, and finance are just a few of the tasks Logistic Regression
that the Random Forest algorithm can be used for. Artificial Neural Haghiabi et Water quality The Tansig
When compared to other classification algorithms, the Network (ANN) al., of the Tireh And RBF functions
Random Forest algorithm is less sensitive to outliers (2018) River have shown the
best performance
and is capable of handling high-dimensional and noisy as transfer and core
data. Additionally, it provides insight into the most functions.
important features for classification tasks with Support Vector Haghiabi et.al., Water quality The Tansig
reasonable accuracy. Machine (SVM) (2018) of the Tireh And RBF functions
River have shown the
d. Hoeffding Tree Classifier : An algorithm designed for best performance
handling continuous streaming data is this decision as transfer and core
tree classifier. As updated data arrives, branches are functions.
added to the decision tree incrementally. In the Artificial Neural Chou et al., Water quality It has been found to
Networks (ANN) (2018) in the reservoir be more accurate
algorithm, split nodes in a tree are determined by using than other
the Hoeffding bound. With a small amount of data, models.
Hoeffding bounds allow the algorithm to make Support Vector Mohammad pour Water quality Competitive with
decisions with high confidence. Internet traffic Machines (SVM) et al. (2015) neural networks
analysis, sensor networks, and financial applications
are examples of applications where the Hoeffding Tree Applied data mining techniques can be used to predict
algorithm is particularly useful. With limited memory water quality, as examined in this paper. In previous studies,
and computational resources, the Hoeffding Tree in the above table 1 a wide range of data mining techniques
algorithm is capable of handling large amounts of data. have been used in the environmental domain, including the
Adaptability to changing data distributions is another Nearest Neighbor Algorithm (KNN). The Artificial Neural
advantage of the system. Network (ANN) was proposed by Haghiabi et al. (2018) [2],
the Artificial Neural Network (ANN) was proposed by Chou
e. Decision Stump Classifier : There are two leaf nodes et al. (2018) [3], and the Support Vector Machine was
and one root node in the decision stump classifier proposed by Mohammadpour et al. (2015) [4-5].Two primary
algorithm. Using the majority class in the objectives are being pursued in this study:
corresponding subset of data, the algorithm divides the
data based on a single attribute. When a single 1. By applying common data mining and classification
significant feature exists in the data or if a quick techniques, a predictive model can bedeveloped to
decision is required, the decision stump algorithm can predict the type of data. Qualitative characteristics of
be particularly useful. Since only one feature is river water.
considered in the classification task, it is less accurate
than other more complex decision tree algorithms. 2. Identify the most appropriate predictive model for the
Text classification and image recognition are two present study based on different evaluationmetrics.
applications of the decision stump algorithm. When III. PROPOSED WORK
computational resources are limited, the decision
stump algorithm is useful because of its simplicity and Data mining algorithms were executed using WEKA, a
speed. data mining tool developed by the University of Waikato in
New Zealand using the JAVA programming language. There
II. RELATED WORK is widespread recognition of WEKA in the machine learning
Various machine learning techniques have been explored and data mining communities. For classification, decision
in previous studies to address water quality concerns. To deal trees include ID3, AD trees, REP trees, J48 trees, FT trees,
with changes in drinking water quality, Muharemi et al. (2019) LAD trees, decision stumps, LMT trees, random forests, and
proposed the use of the K-Nearest Neighbor algorithm and random trees.
Neural Network Classification based on Logistic Regression The following steps might be included in a proposed
[1]. Similarly, Haghiabi et al. (2018) assessed the machine learning system for predicting water quality Fig.2:
effectiveness of artificial intelligence techniques in predicting
water quality components in Tireh River, Iran, such as
Artificial Neural Networks, Group Data Management
Authorized licensed use limited to: Mukesh Patel School of Technology & Engineering. Downloaded on August 05,2023 at 07:35:05 UTC from IEEE Xplore. Restrictions apply.
tree algorithms, commonly referred to as CART, are used to
build trees. Decision trees are based on questions that are
asked and then subtrees are created accordingly (Yes/No).
Authorized licensed use limited to: Mukesh Patel School of Technology & Engineering. Downloaded on August 05,2023 at 07:35:05 UTC from IEEE Xplore. Restrictions apply.
C. Pseudocode for a decision tree used to classify water passed into the classify_example function, and the predicted
quality: label is returned for the sample.
IV. RESULTS
# Step 1: Prepare the dataset
# Load the dataset from a CSV file or a database
# Preprocess of the dataset by handling missing values, encoding
categorical variables, and normalizing numerical features
dataset = preprocess_dataset(raw_dataset)
# Step 2: Split the dataset in to the training set and testing sets
train_set, test_set = split_dataset(dataset, test_ratio=0.3)
Authorized licensed use limited to: Mukesh Patel School of Technology & Engineering. Downloaded on August 05,2023 at 07:35:05 UTC from IEEE Xplore. Restrictions apply.
By measuring the pH of water, you can determine whether
it is acidic or alkaline. From the Fig. 7 on a pH scale of 0 to
14, seven is neutral, seven is acidic, seven is alkaline, and
seven is basic. “pH” measures the concentration of hydrogen
ions in a solution.
In addition to hardness, water can also be classified by its
quality by its hardness. Mineral-rich water, such as hard water
or hard water with high levels of calcium and magnesium, has
a high content of dissolved minerals. Water quality
classification relies heavily on solids. Physical, chemical,
and biological characteristics of water can be affected by
solids, and overall quality can be affected by solids.
In Table3 shows the data set values of potable and non development. Before integrating software units, all
potable water the above values are taken for water sample decision branches and internal code flow are validated
values from river water and lake water. via structural testing. An application, system
configuration , or business process is tested using unit
V. TESTING tests. By doing so, they ensure that every unique path
Software testing is an important part of the software of a business process follows the specifications as
development life cycle because it aids in identifying bugs or described in the documentation. In addition, unit tests
defects before the software is released. Testing software is require the user to know the construction of the
primarily concerned with ensuring that it meets all the software, so they are considered invasive since the
specifications and behaves as expected. To test software, test inputs and outputs are well defined.
cases must be created and executed, results must be compared b. Integration Testing: Testing integrated software
with expected results, defects must be identified, and they components is a crucial step in software development
must be reported to the development team. that determines whether they work together properly
a. Unit testing: Testing unit logic and ensuring that inputs as one program. Despite passing unit tests
produce valid outputs is an essential part of software successfully, integration tests ensure that the
Authorized licensed use limited to: Mukesh Patel School of Technology & Engineering. Downloaded on August 05,2023 at 07:35:05 UTC from IEEE Xplore. Restrictions apply.
components are consistent and correct when VIII. ACKNOWLEDGMENT
combined. Software integration testing is primarily We thank CMR Technical Campus for supporting this
concerned with identifying any problems or issues paper titled “A Prediction of Water Quality Analysis Using
resulting from the integration of software components. Machine Learning”, which provided good facilities and
Integration testing is therefore crucial to ensuring that support to accomplish our work. Sincerely thank our
software systems function correctly and are reliable. Chairman, Director, Deans, Head Of the Department,
c. Functional Testing: The importance of functional testing Department Of Computer Science and Engineering, Guide
in software testing cannot be overstated. Systematic and Teaching and Non- Teaching faculty members for giving
testing provides proof that the tested features are valuable suggestions and guidance in every aspect of our work
available as specified in business and technical
requirements, user manuals, and system REFERENCES
documentation. It involves completing system [1]. Azamathulla, H. M. 2013 2 – A Review on Application of Soft
procedures and interfacing with other systems and Computing Methods in Water Resources Engineering A2 – Yang, Xin-
She. In: Metaheuristics in Water, Geotechnical and Transport
procedures, as well as verifying inputs, functions, and Engineering (Gandomi, A. H., Talatahari, S. & Alavi, A. H., eds).
outputs. The key requirements, functions, and special Elsevier, Oxford, pp. 27–41.
test cases must be organized and prepared before [2]. Karanfil, O., & Konar, A. (2018). Development of a decision tree-based
performing functional testing. Software functional classification model for surface water quality. Environmental
testing ensures that the system meets the specifications Monitoring and Assessment, 190(3), 171.
and performs its intended functions [3]. Saravi, S. S. S., & Malekian, A. (2019). Application of decision tree
algorithm for prediction of water quality index (WQI). Journal of Water
VI. CONCLUSION and Land Development, 43(1), 18-27.
[4]. Esakkirajan, M., & Thanushkodi, K. (2017). Decision tree based
The analysis of water quality has been based on a decision approach for water quality classification using WQI parameters.
tree-based classification model. An approach based on various Journal of King Saud University-Engineering Sciences, 29(4), 318-
input features can predict the quality of water. In order to 324.
provide clean and safe water for human consumption and [5]. Ferreira, R. B., Lopes, J. A., & Ribeiro, R. (2018). Decision trees for
tensure the sustainability of the environment, this approach monitoring and management of water quality in small and medium-size
can aid in decision-making and improve water management water supply systems. Water Science and Technology: Water Supply,
18(6), 2006-2013.
practices. A variety of decision tree algorithms are used for
[6]. Jha, P., & Kumar, R. (2016). A novel decision tree approach for water
constructing classification models for water quality analysis, quality prediction in rivers. Journal of Hydroinformatics, 18(1), 116-
including Random Forest, LMT, Hoeffding Tree, J48 and 131.
Decision Stump. As part of the proposed system, data is [7]. Wang, Y., Zhan, C., Li, M., Chen, G., & Liu, Y. (2019). An improved
collected and preprocessed, training and testing sets are decision tree method for water quality assessment of rivers. Journal of
separated, a decision tree model is built, its performance is Cleaner Production, 210, 1001-1012.
evaluated and optimized, it is deployed to a suitable platform, [8]. Chatterjee, D., & Panchal, V. (2019). Water quality assessment of
and it is continuously monitored and updated. It is important Ganga River basin using decision tree analysis. Environmental Earth
Sciences, 78(19), 578.
to emphasize that accuracy and reliability of a classification
model are dependent on many factors, such as data quality and [9]. Baskaran, R., & Muthusaravanan, S. (2018). An intelligent system for
water quality analysis using decision tree algorithm. Applied Water
quantity, feature selection, algorithm selection, and parameter Science, 8(5), 141.
optimization. By incorporating deep learning and ensemble [10]. Patil, S. S., & Jena, S. K. (2017). Decision tree algorithm based
methods, classification models can be enhanced for water classification of water quality data. Procedia Engineering, 173, 506-
quality analysis. It is possible to improve the quality of life for 513.
millions of people around the world by using classification [11]. Abdellah El Hmaidi, Abdelghani Talhaoui, Imad Manssouri, Hajar
models for water quality analysis. Jaddi, Habiba Ousmana, "Contribution of the pollution index and GIS
in the assessment of the physico-chemical quality of the surface waters
VII. FUTRURE SCOPE of Moulouya River (NE, Morocco)", La Houille Blanche, vol.106,
no.3, pp.45, 2020.
Water quality models should include biological and [12]. Sengorur, B.; Koklu, R.; Ates, A. Water quality assessment using
weather parameters, according to our research. By doing so, artificial intelligence techniques: SOM and ANN—A case study of
the possibility of contamination sources from a wide range of Melen River Turkey. Water Qual. Expo. Health 2015, 7, 469–490.
sources can be taken into consideration. In order to predict [13]. Aradhana, G.; Singh, N.B. Comparison of Artificial Neural Network
water quality accurately, hybrid models with minimal algorithm for water quality prediction of River Ganga. Environ. Res. J.
2014, 8, 55–63
parameters are recommended. An essential information and
strategy for early detection and mitigation of water [14]. Muhammad, S.Y.; Makhtar, M.; Rozaimee, A.; Aziz, A.A.; Jamal,
A.A. Classification model for water quality using machine learning
contamination needs to be developed by stakeholders techniques. Int. J. Softw. Eng. Its Appl. 2015, 9, 45–52.
involved in water quality management. In this way, humans [15]. Haghiabi, A.H.; Nasrolahi, A.H.; Parsaie, A. Water quality prediction
and the environment can both be protected from adverse using machine learning methods. Water Qual. Res. J. 2018, 53, 3–13.
effects.
Authorized licensed use limited to: Mukesh Patel School of Technology & Engineering. Downloaded on August 05,2023 at 07:35:05 UTC from IEEE Xplore. Restrictions apply.