IoT-Based Air Quality Monitoring System with Machine Learning for Accurate and Real-Time Data Analysis
Hemanth Karnati
Abstract: Air pollution in urban areas has severe consequences for both human
health and the environment, and is caused predominantly by vehicle exhaust
emissions. To raise awareness of air pollution, air pollution monitoring
systems measure the concentration of gases such as CO2, smoke, alcohol,
benzene, and NH3 present in the air. However, current mobile applications are
unable to provide users with real-time data specific to their location. In this
paper, we propose a portable air quality detection device that can be used
anywhere. The collected data is stored and visualized using the cloud-based
web app ThingSpeak. The device uses two sensors, MQ135 and MQ3, to detect
harmful gases and measure air quality in parts per million (PPM).
Additionally, machine learning analysis is applied to the collected data.
INTRODUCTION
Air quality plays a crucial role in human health and the well-being of the
environment. Unfortunately, air pollution has been on the rise due to various
sources such as vehicle emissions, industrial activities, energy production, and
natural disasters like wildfires. Understanding and assessing the quality of the
air we breathe is of utmost importance. Air Quality Monitoring (AQM) systems,
integrated with sensors and advanced technologies, are utilized to measure
particulate matter and air pollutants like ozone, nitrogen oxides, and sulfur
dioxide. The data collected by these systems helps formulate policies, monitor
pollution reduction efforts, and empower the public to make informed decisions
regarding their health and well-being.
Currently, AQM stations are primarily used for calculating the Air Quality Index
(AQI) and monitoring pollution. However, the infrastructure requirements,
operational complexities, and ongoing expenses associated with these stations
limit the expansion of AQM networks and the availability of air pollution data.
To overcome these limitations, it is imperative to develop low-cost, efficient, and
real-time data-sensing devices. IoT technology provides a promising solution,
with recent advancements allowing the use of IoT sensors in various domains,
including smart cities, smart mobiles, smart refrigerators, and smartwatches.
Leveraging IoT, air quality can be monitored remotely using sensors (e.g.,
temperature and pressure sensors, noise sensors), Arduino for data processing,
and cloud platforms for storage. Machine learning algorithms, such as Linear
Regression, Random Forest, XGBoost, and ARIMA models, have also proven
effective in forecasting and predicting air pollutant levels. The availability of
affordable sensors and data processing tools has enabled the deployment of air
quality monitoring
systems on a large scale. However, maintaining the accuracy of these systems
is crucial, as erroneous data can lead to flawed policy decisions and ineffective
mitigation efforts. Regular calibration and validation are essential to ensure the
accuracy of air quality monitoring systems.
This paper presents several contributions:
• Development of a low-cost and user-friendly air pollution monitoring
system.
• Real-time data gathering capabilities within the AQM system.
• Utilization of Blynk for real-time data visualization.
• Adoption of ThingSpeak, an open-source software, for day-to-day pollution
visualization.
The importance of air quality and the necessity of monitoring systems are
discussed, highlighting the limitations of current AQM stations and the need for
cost-effective and efficient solutions using IoT technology. The paper outlines the
use of IoT sensors, Arduino, cloud platforms, and machine learning algorithms for
real-time air quality monitoring. The proposed system includes a low-cost, user-
friendly AQM system capable of gathering real-time data, a website displaying
the Air Quality Index, and the integration of Blynk to access IoT sensor data
and ThingSpeak for data visualization.
PROPOSED WORK
The proposed framework for the IoT device with MQ3 and MQ135 sensors,
NodeMCU processor, Arduino IDE, ThingSpeak, and Blynk platforms, consists
of a well-structured hardware and software architecture, as well as an ML analysis
and visualization component. This framework aims to provide a reliable and
efficient system for collecting, analyzing, and visualizing air quality and alcohol
level data in real-time.
Hardware Architecture: The hardware architecture of the IoT device includes the
sensors, NodeMCU processor, WiFi module, power source, and other necessary
components. The sensors, MQ135 and MQ3, are responsible for collecting air
quality and alcohol level data respectively. The NodeMCU processor processes
the collected data and controls the behavior of the device.
The WiFi module enables internet connectivity for the device to transfer the
collected data to the ThingSpeak platform. The power source supplies power to
the device.
Software Architecture: The software architecture of the IoT device is composed
of several layers, including a sensor data acquisition layer, data processing layer,
internet connectivity layer, cloud data storage layer, and user interface layer.
The firmware running on the NodeMCU controls the device behavior and sends
collected data to the ThingSpeak platform for storage and analysis.
The Blynk app provides a real-time display of the collected data to the user.
ML Analysis and Visualization Component: After collecting data from the
ThingSpeak platform, the next step is to analyze and visualize the collected
data using ML algorithms.
Overall, the proposed framework, built on the MQ3 and MQ135 sensors, NodeMCU
processor, Arduino IDE, ThingSpeak, and Blynk platforms, provides a structured
approach to processing data, selecting features, choosing appropriate ML
algorithms, building and training models, evaluating models, and visualizing
data. This framework can help to provide meaningful insights for
decision-making in various applications, such as environmental monitoring and
public health.
FLOW DIAGRAM
data visualization.
CIRCUIT DESIGN
The sensors selected for the system were the MQ135 gas sensor for volatile
organic compounds (VOCs) and the MQ3 gas sensor for alcohol. The sensors
were calibrated by exposing them to known levels of pollutants and adjusting the
readings to match the expected values. The hardware design consisted of an
ESP8266 Wi-Fi module, the MQ135 and MQ3 gas sensors, an Arduino
microcontroller, and a power source.
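The calibration step described above can be illustrated with a short sketch. The paper does not state the form of the adjustment, so the linear correction below (a gain and an offset fitted by least squares) is an assumption made only for illustration, and the function names fit_linear_calibration and apply_calibration are hypothetical. MQ-series sensors generally respond non-linearly, so a real calibration may need a different curve.

```python
def fit_linear_calibration(raw_readings, reference_ppm):
    """Fit reference_ppm ~= gain * raw + offset by ordinary least squares.

    Assumption: a linear correction is adequate over the calibrated range.
    """
    n = len(raw_readings)
    if n != len(reference_ppm) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(raw_readings) / n
    mean_y = sum(reference_ppm) / n
    # Spread of the raw readings around their mean.
    sxx = sum((x - mean_x) ** 2 for x in raw_readings)
    if sxx == 0:
        raise ValueError("raw readings must not all be identical")
    # Co-variation between raw readings and the known reference levels.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(raw_readings, reference_ppm))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def apply_calibration(raw, gain, offset):
    """Convert a raw sensor reading into a calibrated value."""
    return gain * raw + offset
```

In use, the raw readings taken at each known pollutant level would be passed together with those known levels, and the fitted gain and offset would then be applied to every later reading.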
ALGORITHM
The following algorithm is used to collect data from the sensors.
1. Define the Blynk credentials, WiFi credentials, and other variables required
by the code.
2. Set up serial communication and the Blynk connection using Blynk.begin().
3. Set up the timer to run a function that sends data to ThingSpeak every second.
4. Connect to the WiFi network using WiFi.begin() and wait until the connection
is established.
5. Define the changeMUX function and set the MUX_A pin as an output.
6. In the loop, run the Blynk and timer functions, and read the sensor data
from the analog pin A0.
7. Calculate sensor value 1, the ppm (parts per million) value, from the sensor
data using a formula.
8. Read the sensor data from A0 six times and take the average of these
readings to get sensor value 0.
9. Set the MUX_A pin to HIGH, read the sensor data from A0 another six times,
and take the average of these readings to get sensor value 1.
10. Connect to ThingSpeak using the WiFiClient object.
11. Build the request string with the ThingSpeak API key and field values
(sensorValue0 and sensorValue1) and send the GET request using the HTTPClient
object.
12. Delay for a second before running the loop again.
13. Define the function to be called by the timer to send data to ThingSpeak.
14. Set the MUX_A pin to LOW and read the sensor data from A0.
15. Calculate the ppm value for the sensor data using a formula.
16. Set the MUX_A pin to HIGH, read the sensor data from A0 six times, and
take the average of these readings to get sensor value 2.
17. Write sensor value 1 and sensor value 2 to virtual pins V1 and V2
respectively using Blynk.virtualWrite().
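Two of the steps above, averaging six readings and building the ThingSpeak request string, can be sketched in a few lines. This is an illustrative Python sketch rather than the device firmware, which runs on the NodeMCU. The names average_reading and build_update_url are hypothetical, read_fn stands in for a read of pin A0 after the multiplexer has been switched, and the mapping of sensorValue0 and sensorValue1 to field1 and field2 is an assumption.

```python
from urllib.parse import urlencode

# Public ThingSpeak endpoint for writing channel data with a GET request.
THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def average_reading(read_fn, samples=6):
    """Call read_fn `samples` times and return the mean of the readings."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    total = 0
    for _ in range(samples):
        total += read_fn()
    return total / samples


def build_update_url(api_key, sensor_value0, sensor_value1):
    """Build the GET request string carrying both sensor values.

    Assumption: sensor value 0 goes to field1 and sensor value 1 to field2.
    """
    query = urlencode({
        "api_key": api_key,
        "field1": sensor_value0,
        "field2": sensor_value1,
    })
    return f"{THINGSPEAK_UPDATE_URL}?{query}"
```

Averaging several reads smooths out noise from a single analog sample. On the device, the resulting string would be passed to the HTTP client to send the request.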
Figure 3: Data representation in Blynk Application
The dataset also has missing values, which have been dropped in this
code. Additionally, some of the features in the dataset show skewness,
which can impact the performance of machine learning models. The
code in this example uses three different regression models - Random
Forest, Linear Regression, and Decision Tree - to predict AQI based
on the concentration of pollutants.
The models have been evaluated using various metrics such as mean
absolute error, root mean squared error, root mean squared logarithmic
error, and R-squared score.
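For reference, the four metrics named above can be computed as in the following minimal sketch. It uses the standard textbook definitions and only the Python standard library. The function name regression_metrics is hypothetical and is not taken from the code used in this work.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, RMSLE and R-squared for paired true/predicted values.

    RMSLE uses log(1 + x), so both inputs are assumed to be non-negative.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2
            for t, p in zip(y_true, y_pred)) / n
    )
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R-squared is undefined when the true values have no variance.
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "rmsle": rmsle, "r2": r2}
```

Lower MAE, RMSE, and RMSLE indicate smaller prediction errors, while an R-squared value closer to 1 indicates that the model explains more of the variance in AQI.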
Figure 6: Vehicular pollution in Delhi, Mumbai
As Figure 6 shows, among vehicular pollutants the level of PM is the highest,
followed by PM2_5.
As Figure 7 shows, among industrial pollutants in the cities of Mumbai and
Delhi, the level of O3 is very high, followed by toluene. It is important
to note that ozone is a secondary pollutant, which means it is not
emitted directly into the air, but rather forms through chemical
reactions between other pollutants (VOCs and nitrogen oxides).
Figure 8: AQI trends in different Cities
As Figure 8 shows, the AQI follows no regular pattern, since many external
factors affect the AQI of a particular city. The highest AQI was recorded in
Ahmedabad in 2018, and the lowest AQI was recorded in Shillong.
From Figure 8, the top 9 cities with high industrial pollutant levels were
Patna, Delhi, Kolkata, Amritsar, Visakhapatnam, Amaravati, Hyderabad,
Gurugram, and Chandigarh.
From Figure 9, the top 9 cities with high vehicular pollutant levels were
Delhi, Patna, Amritsar, Visakhapatnam, Gurugram, Kolkata, Hyderabad,
Chandigarh, and Amaravati.
Figure 9: Industrial Pollution by city
in the regression model.
Based on the results of these metrics we can compare the performance of the
different models and choose the best one for our dataset. The model with the
lowest MAE, RMSE, and RMSLE and the highest R-squared value is the best
model for the dataset.
By adding SMOTE (Synthetic Minority Over-sampling Technique) to the regression
models, we can compare the results and draw a conclusion about using SMOTE in
regression models.
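The core idea behind SMOTE is to create synthetic minority samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step and is not the implementation used in this work. SMOTE is defined for class labels, so the sketch assumes it is applied to the under-represented class of a categorical target such as AQI_Bucket. The function name smote_samples and its parameters are hypothetical.

```python
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic
```

Because every synthetic point is an interpolation of two real minority points, the new samples stay inside the region already occupied by the minority class.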
Table 3: Prediction of AQI_Bucket : Without SMOTE
SOLUTIONS
To reduce the emissions of CO, NO2, O3, SO2, NH3, PM2_5, and PM10, both
scientific methodologies and general methods can be applied.
a) Scientific Methodologies
To lower the levels of the previously mentioned gases, a variety of scientific
methods can be used.
• Catalytic converters: Used in cars to transform dangerous pollutants like
CO, NO, and NO2 into less dangerous ones. Exhaust gases are passed over a
catalyst, which triggers a chemical reaction that converts the harmful gases
into less harmful ones.
• Scrubbers: In factories and power plants, scrubbers remove pollutants from
exhaust gases before they are discharged into the environment. This can
reduce the emissions of hazardous pollutants like SO2, PM2.5, and PM10.
• Flue gas desulfurization: This method removes sulfur dioxide from
exhaust gases coming from power plants and other industrial operations.
A chemical process converts the SO2 into a less dangerous compound that can
be safely disposed of.
• Selective catalytic reduction: This procedure helps power plants and
other industrial processes decrease their nitrogen oxide emissions. This
is accomplished by injecting a reductant into the exhaust gases, typically
ammonia or urea, which interacts with the NOx to transform it into
harmless nitrogen and water vapor.
• Biofiltration: Biofiltration is a natural method of purifying the air.
The air is passed through a biofilter containing microorganisms that
convert the contaminants into harmless compounds.
• Carbon capture and storage: This procedure collects carbon dioxide from
industrial processes and stores it in a secure location, such as an
underground storage facility. This can reduce the amount of CO2 released into
the atmosphere, which is a significant cause of climate change.
It is important to remember that the most effective method will depend on the
precise source of the emissions and the surrounding environment. It is
therefore critical to study the problem thoroughly and design a customized
solution that considers all of the relevant variables.
b) General Methods
These gases are examples of air pollutants that can harm both the ecosystem
and human health. The following are some ways to lower their levels.
• Cut back on pollution: Reducing the emissions of these gases is one of the
best ways to lower air pollution. Stricter emission regulations for factories
and vehicles, encouragement of the use of renewable energy sources, and
control of industrial processes that emit these gases can all help accomplish
this.
• Encourage the use of public transportation: Since automobile emissions
are a major source of air pollution, encouraging people to take the bus,
walk, or bike instead of driving can help reduce those emissions.
• Enhance energy efficiency: Reducing emissions from power plants, another
significant source of air pollution, can be accomplished by making structures
and appliances more energy efficient.
• Plant trees: Trees are a useful instrument for decreasing air pollution
because they absorb pollutants like carbon dioxide from the air. Urban
tree planting can aid in lowering the airborne concentrations of these
pollutants.
• Install air filters: Air filters installed in homes, workplaces, and
public areas remove pollutants from the air and improve indoor air quality.
It’s important to remember that these remedies need both individual and group
efforts to be successful. Governments, corporations, and people must therefore
cooperate to combat air pollution and safeguard both the environment and
human health.
CONCLUSION
An air quality monitoring system built from an Arduino, a few sensors, and a
multiplexer can be an effective and affordable solution for monitoring the air quality
in a particular environment. The system can be easily customized and expanded
with additional sensors depending on the specific needs of the application.
With the use of a multiplexer, multiple sensors can be connected to a single
Arduino board, reducing the overall hardware cost and complexity. The collected
data can be easily visualized and analyzed using software tools, allowing users
to monitor and track air quality over time, detect potential issues and take
necessary actions to improve the air quality.
Overall, this type of air quality monitoring system can be a useful tool for
environmental monitoring, health management, and pollution control. By adding
SMOTE (Synthetic Minority Over-sampling Technique) to the regression models,
we compared the results and drew a conclusion about using SMOTE in
regression models.
One area of improvement could be in the calibration of the sensors, which
could be optimized for specific environments or applications to improve accuracy.
Additionally, machine learning algorithms could be used to identify patterns in
the data and predict changes in air quality, allowing users to take preventive
actions before air quality deteriorates. Another area of future work could be to
improve the user interface and data visualization of the system, making it more
accessible and easier to use for non-experts.
Overall, continued development and optimization of the system can help to
increase its effectiveness and applicability in a variety of settings, from indoor
air quality monitoring in homes and offices to outdoor pollution monitoring in
urban environments.
AUTHORS PROFILE
Hemanth Karnati is a dedicated and enthusiastic 3rd-year student pursuing
a Bachelor of Technology (B.Tech) degree in Computer Science and Engineering
(CSE) at VIT Vellore. His passion for technology and innovation has driven
him to explore various domains within the field. Hemanth's research work
primarily revolves around the fields of IoT and ML. He has conducted extensive
studies on leveraging IoT devices and networks to create intelligent systems that
enhance efficiency, automation, and decision-making processes. His research
work showcases his ability to apply ML algorithms and techniques to IoT data,
enabling predictive analysis, anomaly detection, and optimization in diverse
applications.