IoT Based Smart Water Quality Monitoring System
Samia Islam
ID : 2015-1-60-102
Monira Mukta
ID : 2015-1-60-116
Md. Emon Miea
ID : 2015-1-60-085
Dhaka-1212, Bangladesh
December, 2018
Declaration
We hereby declare that the work presented in this thesis is the outcome of the investigation performed by us
under the supervision of Surajit Das Barman, Senior Lecturer, Department of Computer Science and Engineering,
East West University. We also declare that no part of this thesis/project has been or is being submitted elsewhere
for the award of any degree or diploma.
Countersigned Signature
........................ ........................
Signature
........................
(Monira Mukta)
(ID : 2015-1-60-116)
Signature
........................
Letter of Acceptance
This thesis report entitled “Smart Water Quality Monitoring [SWQM] System”, submitted by Samia Islam (ID:
2015-1-60-102), Monira Mukta (ID: 2015-1-60-116) and Md. Emon Miea (ID: 2015-1-60-085) to the
Department of Computer Science and Engineering, East West University, is accepted by the department in partial
fulfillment of the requirements for the award of the degree of Bachelor of Science and Engineering in December,
2018.
Supervisor
............................
Surajit Das Barman
Senior Lecturer,
Department of Computer Science and Engineering, East West University.
Chairperson
..................
(Dr. Ahmed Wasif Reza)
Department of Computer Science and Engineering, East West University
Table of contents
Declaration
Table of contents
List of Figures
List of Tables
Abstract
Chapter 1
Introduction
1.1 Overview and Motivation
1.2 Thesis Objective
Chapter 2
Literature Review
2.1 Smart Water Quality Monitoring [SWQM] System
2.2 Existing Work on WQM
Chapter 3
Research Methodology
3.1 Overview of the System
3.2 Circuit Diagram and Description of Working Principle
3.3 Individual connection between different sensors and Arduino
3.4 Flowchart of Arduino Programming
3.5 List of Equipment
3.6 Picture of Hardware setup
3.7 Algorithm for Data Analysis
3.7.1 SVM [Support Vector Machine] Binary Classification
3.7.2 Logistic Regression Binary Classification
3.7.3 Fast Forest Binary Classification
3.7.4 AveragedPerceptron Binary Classification
3.8 Flow Chart
3.9 Developed Apps
3.10 Full Experimental Setup
Chapter 4
Analysis and Results
4.1 Instrumental Analysis
4.2 Analysis
4.3 Results
4.3.1 Result from Physical Analysis
4.3.2 Result Analysis with ML
Chapter 5
Conclusion
Future Work
Reference
List of Figures
List of Tables
Abstract
Monitoring water quality is important because of its significant impact on human health and the
ecosystem. The aim of this project is to develop an IoT-based smart water quality monitoring
(SWQM) system that supports continuous measurement of water conditions based on four physical
parameters: temperature, pH, turbidity and conductivity. Four sensors are connected to an Arduino
UNO to measure these parameters. The data extracted from the sensors are transmitted to a desktop
application developed for this purpose. Based on the measured results, the proposed SWQM system
analyzes the water parameters using machine learning approaches to classify whether the test water
sample is suitable for human consumption or not.
Chapter 1
Introduction
Water is the most essential element for sustaining any living being. We use water in every sphere
of life: for drinking, washing, industrial purposes, agriculture, food processing and so on. No
other substance can take the place of water. As the population grows day by day, the chances of
polluting this element also increase. Unfortunately, this valuable element is not unlimited, so it
must be used properly.
Moreover, the quality of water is a major concern in modern science. Water is being polluted in
several ways. As a consequence, various serious problems are arising, such as skin diseases, global
warming and scarcity of pure drinking water for living beings. Hence, checking the quality of water
is a major issue. In the developing world, 90% of all wastewater still goes untreated into local
rivers and streams. Some countries, with roughly a third of the world's population, also suffer from
medium or high water stress, and 17 of these extract more water annually than is recharged
through their natural water cycles. The strain not only affects surface freshwater bodies like rivers
and lakes, but also degrades groundwater resources.
On the other hand, manual checking of water quality is expensive and time consuming. To reduce
these constraints, we have worked on water quality analysis. Although several works have addressed
this problem, they were not fully satisfactory and reliable. To obtain accurate and reliable
output, we introduce a new approach based on machine learning algorithms.
There are many parameters for checking water quality, but the major factors are:
pH level
Turbidity
Carbon dioxide level
Oxygen level
Conductivity
Temperature
Arsenic
Escherichia coli
Because of our limited time and resources, we worked with only a few of these parameters:
temperature [2][3], turbidity [4], pH and conductivity. These parameters are measured with
dedicated sensors.
Chapter 2
Literature Review
This chapter discusses the water quality monitoring system and briefly describes various approaches
to water quality monitoring using IoT techniques. In addition, it reviews existing work on water
quality monitoring.
water quality. To complete the experiment, the relationship among temperature, pH and dissolved
oxygen is analyzed, and the experiment concludes that the water temperature is inversely
proportional to pH and dissolved oxygen level. The article in [7] introduces a smart water quality
monitoring system for Fiji that uses IoT and remote sensing to monitor, collect and analyze data
from remote locations. Another smart solution for water quality monitoring is described in [8].
In our thesis, we develop a smart water quality monitoring system. We collect data from
different sources using various sensors. The values of the different parameters are shown on the
computer and saved in Excel. We design a desktop application that reads the data automatically and
predicts whether the water is drinkable or not. The next chapter describes the whole procedure of
how we developed our system.
Chapter 3
Research Methodology
3.1 Overview of the System
The SWQM system reads data from the water sample with sensors through the microcontroller and
analyzes the data to predict the water quality using an ML algorithm.
3.2 Circuit Diagram and Description of Working Principle
The circuit is built on a breadboard with an Arduino UNO and four sensors: a digital temperature
sensor, an analog EC (electrical conductivity) sensor, an analog turbidity sensor and an analog
pH sensor. Each sensor needs a 5 V supply and a ground connection to operate. Therefore, we made
common nodes for the 5V pin and the GND pin on the breadboard.
The power pins of all the sensors are connected to the common voltage node, and the grounds of the
sensors are connected to the common ground. Each sensor has an output pin known as the data pin.
The data pin of the EC sensor is connected to analog pin A0, the data pin of the turbidity sensor
is connected to analog pin A3, and the data pin of the pH sensor is connected to analog pin A5.
The data pin of the temperature sensor is connected to digital pin 5.
3.3 Individual connection between different sensors and Arduino
a) Temperature sensor Connected to Arduino
c) EC sensor connected to Arduino
3.4 Flowchart of Arduino Programming
[Flowchart: Start, followed by the main LOOP]
This block diagram shows the working procedure of the microcontroller. In the Arduino program, we
used several libraries to read data: DFRobot_EC, EEPROM, OneWire and DallasTemperature. First, the
program initializes the pin configuration. It then builds a data string that combines the four
parameters and prints it to the serial port at an interval of 800 milliseconds.
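The exact layout of the data string is not given here. As a minimal sketch of how the receiving side might decode one line, assume the four readings arrive as a single comma-separated line in the order temperature, conductivity, turbidity, pH. The field order, the separator, and the names `Reading` and `parse_reading` are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    """One set of sensor values (field order is an assumption)."""
    temperature_c: float
    conductivity_ms_cm: float
    turbidity_ntu: float
    ph: float


def parse_reading(line: str) -> Optional[Reading]:
    """Parse one serial line; return None if the line is malformed."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None  # incomplete line, e.g. read started mid-transmission
    try:
        values = [float(p) for p in parts]
    except ValueError:
        return None  # non-numeric noise on the serial line
    return Reading(*values)
```

Returning `None` for malformed lines lets the caller simply skip partial lines, which are common when a serial port is opened in the middle of a transmission.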
3.5 List of Equipment
i) Arduino UNO
ii) Turbidity Sensor (SEN0189)
iii) Electric Conductivity Meter (DFR0300)
iv) Waterproof DS18B20 Digital Temperature Sensor (DFR0198)
v) pH Sensor (SEN0161)
vi) Cable connection
vii) Breadboard etc.
3.7 Algorithm for Data Analysis
An algorithm is a step-by-step method of solving a problem. It is commonly used for data
processing, calculation and other related computer and mathematical operations. A well-chosen
algorithm ensures that the system performs the given task in the best possible manner. We used
different Machine Learning (ML) algorithms to predict the quality level of water accurately.
In our research, we applied four algorithms to our collected data set:
SVM (Support Vector Machine) binary classification
Logistic Regression binary classification
Fast Forest binary classification
Averaged Perceptron binary classification
A brief explanation of these four algorithms is given below to show how they work.
3.7.2 Logistic Regression Binary Classification
Logistic regression is a classification algorithm. It predicts a binary answer based on different
features. Besides binary outcomes, it can also work with outcomes that take more than two values.
Logistic regression can be binomial, ordinal or multinomial. Binomial or binary logistic regression
deals with situations in which the observed outcome for a dependent variable can have only two
possible types, “0” and “1”.
Multinomial logistic regression deals with situations where the outcome can have three or more
possible types that are not ordered. Ordinal logistic regression deals with dependent variables
that are ordered.
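One may begin to understand logistic regression by first considering a logistic model with given
parameters, then seeing how the coefficients can be estimated from data. Consider a model with two
predictors, $x_1$ and $x_2$, which may be continuous variables or indicator functions for binary
variables (taking value 0 or 1). The general form of the log-odds $L$ is:
$L = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ (2)
where $\beta_0$, $\beta_1$ and $\beta_2$ are the coefficients of the model. The corresponding odds
$O$, with $b$ the base of the logarithm, are:
$O = b^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}$ (3)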
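To make the log-odds and odds above concrete, the following sketch evaluates them for a two-predictor model and converts the odds to a probability, taking the base $b = e$. The coefficient values used in the usage note are invented for illustration and are not fitted to our data; the function names are likewise hypothetical.

```python
import math


def log_odds(beta0: float, beta1: float, beta2: float,
             x1: float, x2: float) -> float:
    """Log-odds L = beta0 + beta1*x1 + beta2*x2."""
    return beta0 + beta1 * x1 + beta2 * x2


def odds(L: float, base: float = math.e) -> float:
    """Odds O = base ** L."""
    return base ** L


def probability(L: float, base: float = math.e) -> float:
    """Probability of the positive class, p = O / (1 + O)."""
    o = odds(L, base)
    return o / (1.0 + o)
```

For example, with the illustrative coefficients `beta0 = -1.0`, `beta1 = 0.5`, `beta2 = 0.25` and inputs `x1 = 2.0`, `x2 = 0.0`, the log-odds are 0, the odds are 1 and the probability is 0.5, which is the decision boundary of the classifier.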
Algorithm: Random Forest
Precondition: A training set S := (x1, y1), . . . , (xn, yn), features F, and number of trees in
forest B.
function RandomForest(S, F)
    H ← ∅
    for i ∈ 1, . . . , B do
        S(i) ← a bootstrap sample from S
        h_i ← RandomizedTreeLearn(S(i), F)
        H ← H ∪ {h_i}
    end for
    return H
end function
function RandomizedTreeLearn(S, F)
    At each node:
        f ← very small subset of F
        Split on best feature in f
    return the learned tree
end function
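The pseudocode above can be turned into a small runnable sketch. To keep it short, the version below replaces the full randomized tree with a depth-1 tree (a decision stump), so it illustrates bootstrap sampling, random feature subsets and majority voting rather than reproducing a production random forest or the ML.NET implementation. All function names are hypothetical.

```python
import random
from collections import Counter


def learn_stump(samples, features, rng):
    """Depth-1 stand-in for RandomizedTreeLearn: look at a random subset of
    the features and keep the (feature, threshold) split with fewest errors."""
    subset = rng.sample(features, max(1, len(features) // 2))
    best = None
    for f in subset:
        for x, _ in samples:
            t = x[f]  # candidate threshold taken from the data
            left = [y for xx, y in samples if xx[f] <= t]
            right = [y for xx, y in samples if xx[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def random_forest(samples, features, n_trees, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(samples) for _ in samples]  # sample with replacement
        forest.append(learn_stump(boot, features, rng))
    return forest


def forest_predict(forest, x):
    """Majority vote over all trees in the forest."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]
```

Here `samples` is a list of `(feature_vector, label)` pairs and `features` is a list of feature indices. Because each tree sees a different bootstrap sample and a different feature subset, individual trees may err, but the majority vote is more stable than any single tree.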
3.7.4 AveragedPerceptron Binary Classification
Perceptron is a classification algorithm that makes its predictions based on a linear function. For
an instance with feature values $f_0, f_1, \dots, f_{D-1}$, the prediction is given by the sign of
$\sum_{i=0}^{D-1} w_i f_i$, where $w_0, w_1, \dots, w_{D-1}$ are the weights computed by the
algorithm.
It is an online algorithm: it processes the instances in the training set one at a time. The
weights are initialized to 0 or to some random values. Then, for each example in the training set,
the value of $\sum_{i=0}^{D-1} w_i f_i$ is computed. If this value has the same sign as the label
of the current example, the weights remain the same. If they have opposite signs, the weight vector
is updated by either subtracting or adding (if the label is negative or positive, respectively) the
feature vector of the current example, multiplied by the learning rate. In a generalization of this
algorithm, the weights are updated by adding the feature vector multiplied by the learning rate and
by the gradient of a loss function. In the averaged perceptron, each weight vector is stored
together with a weight that counts the number of iterations it survived, and the prediction is
computed from the weighted average of these weight vectors. [13]
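The update rule and the averaging step described above can be sketched in a few lines. This is an illustrative implementation, not the ML.NET trainer: labels are assumed to be -1 or +1, a constant first feature of 1.0 plays the role of the bias, and summing the weight vector after every example is used as an equivalent way of weighting each vector by the number of iterations it survived. The function names are hypothetical.

```python
def train_averaged_perceptron(examples, epochs=10, learning_rate=1.0):
    """examples: list of (features, label) pairs with label in {-1, +1}."""
    dim = len(examples[0][0])
    w = [0.0] * dim        # current weight vector
    total = [0.0] * dim    # running sum of the weights after every example
    count = 0
    for _ in range(epochs):
        for f, y in examples:
            score = sum(wi * fi for wi, fi in zip(w, f))
            if y * score <= 0:  # wrong sign (or undecided): update weights
                w = [wi + learning_rate * y * fi for wi, fi in zip(w, f)]
            total = [ti + wi for ti, wi in zip(total, w)]
            count += 1
    return [ti / count for ti in total]  # averaged weight vector


def perceptron_predict(weights, f):
    """Sign of the weighted sum, returned as -1 or +1."""
    return 1 if sum(wi * fi for wi, fi in zip(weights, f)) >= 0 else -1
```

Averaging damps the effect of late updates caused by a few hard examples, which is why the averaged variant is usually more stable than the plain perceptron on small data sets.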
For better understanding, the procedure for implementing these algorithms with the ML.NET
framework is given below:
Create the pipeline
Add the data set to the pipeline
Assign numeric values to text
Create the feature columns (temperature, conductivity, pH, turbidity)
Add the learner to the pipeline (best algorithm)
Add the predicted label column
Train the model
Make predictions using the model
3.8 Flow Chart
The flow chart given below describes the functionality of the desktop application. First, the user
selects the port. The sensors measure the temperature, pH level, electrical conductivity and
turbidity of the water. The read data are then compared with the WHO levels, and the water quality
can be predicted with ML. Lastly, the application reports whether the water is drinkable or not.
When the “Read Data” button is clicked, the program reads the selected port continuously for the
data-reading period. This period was set to 5 seconds to extract the latest stable readings from
the microcontroller.
Fig. 9: Flowchart of proposed app
3.9 Developed Apps
An app performs a coordinated group of functions, tasks or activities for the benefit of its users.
It saves the user time and effort and makes the task easier. In our research, a desktop app named
“Sprinkle-Water Quality Checker” was developed on the .NET platform to calculate and analyze the
data. Data collected through the various sensors are fed into this app for further processing.
Fig. 10: Initial state of developed app
This app takes input and predicts an output by applying ML at the backend. ML helps to find the
best prediction. The “Sprinkle Water Quality Checker” takes all of the input data and is then
ready to apply the algorithm for prediction. There is also an option to compare the most recently
read data with the WHO standard and check whether the water is drinkable or not.
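The comparison with the WHO standard can be sketched as a simple range check. The limits below are the guide values listed in Table 2 of this thesis (conductivity 0.3-0.8 mS/cm, pH 6.5-8.5, turbidity below 5 NTU; no guide value for temperature). The function and constant names are hypothetical, and the sketch is not the code of the desktop app itself.

```python
# Guide values as listed in Table 2 (temperature has no guide value).
CONDUCTIVITY_RANGE_MS_CM = (0.3, 0.8)
PH_RANGE = (6.5, 8.5)
TURBIDITY_MAX_NTU = 5.0


def check_against_who(conductivity, ph, turbidity):
    """Return the names of the parameters that fall outside the guide values."""
    failures = []
    low, high = CONDUCTIVITY_RANGE_MS_CM
    if not low <= conductivity <= high:
        failures.append("conductivity")
    low, high = PH_RANGE
    if not low <= ph <= high:
        failures.append("ph")
    if not turbidity < TURBIDITY_MAX_NTU:
        failures.append("turbidity")
    return failures


def is_drinkable(conductivity, ph, turbidity):
    """A sample passes only if every checked parameter is within range."""
    return not check_against_who(conductivity, ph, turbidity)
```

Returning the list of failing parameters, rather than only a yes/no answer, lets the app tell the user which measurement caused a sample to be rejected.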
3.10 Full Experimental Setup
Chapter 4
Analysis and Results
4.1 Instrumental Analysis
The parameters of water are measured through sensors listed in Table 1 below.
4.2 Analysis
We have divided the analysis into two parts:
Physical Analysis
Machine Learning Algorithm Analysis
4.2.1 Physical Analysis
We collected data from different sources, such as tap water, river water and some drinkable water
sources. We used different procedures to collect the samples and measured the water parameters with
different sensors. We read the temperature, pH, electrical conductivity and turbidity of the water.
In total, we collected approximately 60 different samples.
[Figure: Temperature of the water samples versus sample number]
The graphical representation of the measured temperature of the water samples is shown in Fig. 13.
Most of the samples were at normal temperature, and some were hot or cold. All of the data are
represented in the graph, from which the collected data can be easily observed. Most of the data
are in the 24-27 °C range. Some biased samples fall outside this range.
[Figure: Conductivity of the water samples versus sample number]
We drew this graph using the data that we collected from different sources with the electrical
conductivity sensor. We find that most of the data are in the range from 0 to 1 mS/cm. Some of the
data fall outside this range.
[Figure: pH of the water samples versus sample number]
Neutral water has a pH of 7, but most of the data we collected are above 9. From this we conclude
that these samples are basic (alkaline). According to the WHO standard, the safe range of pH is
from 6.5 to 8.5. When we mixed some salt into the water, the pH value was around 7.
[Figure: Turbidity of the water samples versus sample number]
Turbidity is another important factor for water. When the tap or filtered water was clean, we got
a turbidity value of 0.0. When the water did not look clean and fresh, we got turbidity values of
up to 3000.
We have chosen the “Fast Forest” algorithm for our desktop app because it gives the best output
among the four algorithms described earlier.
This algorithm gives the best accuracy: 100% on our data set, which is the maximum achievable, so
no other algorithm can exceed it. We therefore selected this algorithm for our app, “Sprinkle Water
Quality Checker”.
The output of the “Fast Forest Binary Classifier” algorithm is given below:
The output of the “Averaged Perceptron Binary Classifier” algorithm was also not satisfactory: it
gives 80% accuracy. Its output is given below:
Another algorithm is “Logistic Regression Binary Classifier”. The output of “Logistic Regression
Binary Classifier” algorithm is given below:
4.3 Results
The results are divided into two categories: results from the physical analysis and results from
the ML analysis.
Table 2: Summary statistics of the measured parameters by water source. The water sources fall
into three main categories. Here WHO GV = World Health Organization Guide Value.

Parameter         Source          N   Mean ± St. Dev    Min    Median  Max    WHO GV         % not within GV
Temperature (°C)  All             60  25.98±3.95        16     25.09   44.63  -              -
                  Tap Water       23  25.41±0.77        24.11  25.11   27.2   -              -
                  Biased Water    16  27.4±7.51         16     24.6    44.63  -              -
                  Drinking Water  21  25.52±0.92        23     25.76   27     -              -
Conductivity      All             60  2.31±4.95         0.12   0.44    20.44  0.3-0.8 mS/cm  71.67
                  Tap Water       23  1.02±1.7          0.12   0.34    6.45   0.3-0.8 mS/cm  69.57
                  Biased Water    16  4.6±7.72          0.12   0.57    20.44  0.3-0.8 mS/cm  81.25
                  Drinking Water  21  1.97±4.26         0.12   0.48    19.89  0.3-0.8 mS/cm  66.67
pH                All             60  9.03±0.58         7.3    9.15    10.26  6.5-8.5        80
                  Tap Water       23  9.1±0.6           7.3    9.23    9.88   6.5-8.5        82.61
                  Biased Water    16  8.87±0.65         7.66   8.9     10.26  6.5-8.5        68.75
                  Drinking Water  21  9.07±0.51         7.99   9.12    9.89   6.5-8.5        85.71
Turbidity         All             60  550.73±921.21     0      0       3000   <5 NTU         31.67
                  Tap Water       23  236.22±628.55     0      0       2023   <5 NTU         13.04
                  Biased Water    16  1080.06±1142.18   0      989.58  3000   <5 NTU         56.25
Table 2 presents the minimum, maximum and median values of all the data. It also shows the source
of the data, where N is the number of samples. For temperature, there is no “WHO GV” (World Health
Organization Guide Value) [8]: since temperature varies from place to place, no guide value is
given.
For conductivity, the “WHO GV” is 0.3-0.8 mS/cm. In the physical analysis, 71.67% of all samples
fail this guideline; the failure rate is 69.57% for “Tap water”, 81.25% for “Biased water” and
66.67% for “Drinking water”.
The failure rate for the pH parameter is 80% over the 60 samples, and the turbidity failure rate is
31.67%.
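As a worked check of these percentages, the failure rate is the number of samples outside the guide value divided by N. The sample counts used below (for example 43 of 60 for conductivity) are not stated in the text; they are inferred here from the reported percentages and the values of N in Table 2, and the helper name is hypothetical.

```python
def percent_outside(n_outside: int, n_total: int) -> float:
    """Percentage of samples outside the guide value, rounded to 2 decimals."""
    return round(100.0 * n_outside / n_total, 2)


# Conductivity: inferred counts per source (tap, biased, drinking).
conductivity_counts = {"Tap Water": (16, 23),
                       "Biased Water": (13, 16),
                       "Drinking Water": (14, 21)}
for source, (outside, total) in conductivity_counts.items():
    print(source, percent_outside(outside, total))

# The per-source counts add up to the overall figure: 16 + 13 + 14 = 43 of 60.
print("All", percent_outside(43, 60))
```

The same check applied to pH (48 of 60) and turbidity (19 of 60) reproduces the overall rates of 80% and 31.67%.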
All four algorithms give predictions, but their accuracy differs. An efficient algorithm is one
that predicts the test data most accurately compared to the other models and hence can be deployed
successfully.
Therefore, in our thesis we applied these four algorithms to the same data set and found their
accuracy. The accuracy comparison is shown below:
It can be understood easily from the graph below:
[Figure: Bar chart comparing Fast Forest, Linear SVM, Logistic Regression and Averaged Perceptron;
vertical axis in percent]
Here, F1 score is a machine learning term used to measure a test's accuracy. The F1 score is the
harmonic mean of precision and recall, and its range is [0, 1]. It tells how precise the classifier
is (how many of the instances it labels positive are actually positive), as well as how robust it
is (it does not miss a significant number of instances).
High precision with low recall gives a classifier that is extremely accurate on the instances it
flags but misses a large number of instances that are difficult to classify. The greater the F1
score, the better the performance of our model.
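The relationship between precision, recall and the F1 score can be sketched as follows. The confusion-matrix counts in the usage note are invented to illustrate the "high precision, low recall" case and are not taken from our experiments; the function name is hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 from true positives, false positives
    and false negatives. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

For instance, a classifier with 1 true positive, 0 false positives and 9 false negatives has a precision of 1.0 but a recall of only 0.1, and the harmonic mean pulls the F1 score down to about 0.18. This is why the F1 score penalizes a model that is precise but misses most positive instances.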
AUC stands for “Area Under the Curve”, where the curve is the ROC (receiver operating
characteristic) curve. AUC is one of the most widely used evaluation metrics for binary
classification problems. The AUC of a classifier is equal to the probability that the classifier
will rank a randomly chosen positive example higher than a randomly chosen negative example.
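This probabilistic reading of AUC can be computed directly by comparing every positive example with every negative example. The sketch below does exactly that, counting ties as half a win; the function name and the scores in the usage note are illustrative only.

```python
def auc_by_ranking(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive example
    receives the higher score; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

With positive scores `[0.9, 0.3]` and negative scores `[0.5, 0.1]`, three of the four pairs are ranked correctly, so the AUC is 0.75. A classifier that ranks every positive above every negative reaches an AUC of 1.0.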
From the above comparison (Figure-22), we can see that the highest accuracy belongs to the Fast
Forest algorithm. The SVM, Logistic Regression and Averaged Perceptron algorithms each give an
accuracy of 80.00%, which is lower than that of the Fast Forest algorithm.
Hence, the app predicts the result using the best algorithm. The result is shown in the following
way:
Chapter 5
Conclusion
The ultimate goal of this project is to observe the quality of water samples by designing a smart
water quality monitoring (SWQM) device, implemented on an IoT platform, that can measure four
specific physical water parameters (temperature, pH, turbidity and conductivity) and analyze the
extracted data using machine learning approaches. As our experiment is limited to four water
parameters, a total of 60 samples are used to evaluate the accuracy. Compared to the Linear SVM,
Logistic Regression and Averaged Perceptron binary classifiers, the Fast Forest algorithm provides
100% accuracy.
Future Work
The proposed work provides good accuracy for a small data set covering four individual parameters
of water samples. Further work can use a larger data set and consider the levels of chemical
parameters present in the water samples to improve the system’s effectiveness.
Reference
[4] Turbidity. Available-https://www.lenntech.com/turbidity.htm#ixzz3R3yPreK7
[5] Automated sensor network for monitoring and detection of impurity in drinking water system.
Available-http://www.ijraset.com/fileserve.php?FID=1615
[6] Pranata, Alif Akbar, Jae Min Lee, and Dong Seong Kim. "Towards an IoT-based water quality monitoring
system with brokerless pub/sub architecture." Local and Metropolitan Area Networks (LANMAN), 2017 IEEE
International Symposium on. IEEE, 2017.
[7] Prasad, A. N., et al. "Smart water quality monitoring system." Computer Science and Engineering (APWC
on CSE), 2015 2nd Asia-Pacific World Congress on. IEEE, 2015.
[8] Geetha, S., and S. Gouthami. "Internet of things enabled real time water quality monitoring system." Smart
Water 2.1 (2016): 1.
Available-https://en.wikipedia.org/wiki/Support_vector_machine
[12] RandomForest.
Available-http://pages.cs.wisc.edu/~matthewb/pages/notes/pdf/ensembles/RandomForests.pdf
Available-https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.legacy.trainers.averagedperceptronbinaryclassifier?view=ml-dotnet