Prediction of Air Pollution in Smart Cities Using Machine Learning Techniques
https://doi.org/10.22214/ijraset.2021.39241
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 9 Issue XII Dec 2021- Available at www.ijraset.com
Abstract: Air pollution is one of the main threats to developed societies. According to the World Health Organization (WHO),
pollution is the main cause of death among children under five years of age. Smart cities are expected to play a decisive role in
monitoring and reducing such pollution in real time. The increase in air pollution due to fossil fuel consumption, together with its
ill effects on the climate, has made air pollution forecasting an important research area. The deployment of Internet of Things
(IoT) based sensors has considerably changed the dynamics of predicting air quality. Prediction of spatio-temporal data has been
one of the major challenges in creating a good predictive model.
Many different approaches have been used to create an accurate predictive model. Primitive predictive machine
learning algorithms such as simple linear regression have failed to produce accurate results, primarily due to a lack of computing
power but also due to a lack of optimization techniques. Recent developments in deep learning, as well as improvements in
computing resources, have increased the accuracy of time series prediction. However, with large spatio-temporal data sets
spanning many years, employing regression models on the entire data can corrupt per-date predictions. In this work, we look at
pre-processing the time series. Because pre-processing involves a similarity measure, we explore the use of Dynamic Time
Warping (DTW). K-means is then used to cluster the spatio-temporal pollution data over a period of 16 years from 2000 to 2016.
Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used as evaluation criteria for the
comparison of regression models.
Keywords: Spatio-temporal data, Primitive predictive machine learning algorithms, regression models
I. INTRODUCTION
The data set provides the NO2, SO2 and PM10 levels over 19 years, 2000 - 2019. Relating the pollutants
to their geographical locations turns the problem into a classification problem. Knowledge of the similarity between time series is
widely used in speech recognition and signature recognition. In this paper, we make use of two pieces of knowledge: the factors
influencing pollution and the seasonality observed in every year between 2000 and 2019. We use these concepts to determine the
similarity between the time series of multiple cities and the similarity between the time series of the 192 months in the years 2000 - 2019.
We worked largely with NO2 data, as NO2 has been seen to be a cause of lung diseases more than the other pollutants. In future, the
dataset will be replaced by real-time data obtained from sensors. A Support Vector Machine (SVM) is particularly useful since the
data involves a time series and is non-linearly related.
This method can also provide a better generalization error. We conducted several experiments using different models and
determined a low cost-complexity. There has been extensive research on developing highly accurate spatio-temporal models using
different machine learning approaches. Finally, the model produces each gas's contribution to air pollution. This section emphasizes
some approaches that were considered before choosing the appropriate model for our work.
Due to the varied topography and extent of industrialization within the cities, predicting environmental pollutant values helps in
foreseeing the effect and extent of pollution. Countries deploy many sensors to record different pollutant levels in urban areas as
well as near industrial zones, but the main index employed by governments to depict pollution levels is the Air Quality Index (AQI).
This is a crucial measure because it helps to determine the general quality of the air, which in turn is used to determine
the adverse health and climate effects caused to the environment.
Here, we present a spatio-temporal prediction model which can be highly effective in determining the AQI as well as individual
pollutant levels over a period of time.
IV. EXECUTION
When the data is comprised of attributes with varying scales, many machine learning algorithms can benefit from rescaling the
attributes so that they all have the same scale. This is useful for optimization algorithms used at the core of machine learning
algorithms, such as gradient descent. It is also useful for algorithms that weight inputs, such as regression and neural networks, and
for algorithms that use distance measures, such as K-Nearest Neighbors. We can rescale the data with scikit-learn using the
MinMaxScaler class. Here, for month-wise clustering, rescaling is applied to the data used to predict future values of the gases. The
execution process carried out for data integration and reduction consists of the following steps:
1) Data Collection
2) Data Preprocessing
3) Data Classification
4) Data Clustering
5) Target Model
Table 2: Raw Data
S.NO   AREA CODE   NO2 AQI   SO2 AQI   PM10 AQI
1      01          46        34        14
2      02          46        34        14
3      03          46        34        8
4      04          34        37        8
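The rescaling described in this section can be sketched on the values of Table 2. The paper names scikit-learn's MinMaxScaler; the sketch below instead writes out the same per-column formula, (x - min) / (max - min), in plain Python so that it runs without dependencies. The helper name min_max_scale is an assumption made for illustration and is not taken from the paper.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # A constant column has zero range; dividing by 1 instead maps it to all zeros.
    spans = [(max(col) - min(col)) or 1 for col in columns]
    return [
        [(x - lo) / span for x, lo, span in zip(row, mins, spans)]
        for row in rows
    ]


# NO2, SO2 and PM10 AQI values from the four rows of Table 2.
raw = [
    [46, 34, 14],
    [46, 34, 14],
    [46, 34, 8],
    [34, 37, 8],
]
scaled = min_max_scale(raw)

for row in scaled:
    print(row)
```

After rescaling, every pollutant column spans the same range, so no single pollutant dominates a distance computation merely because of its scale.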
V. CONCLUSION
For predicting air pollutant levels, we considered multiple models. The most suitable method as per our evaluation is to cluster
months with similar pollutant-level behavior. K-Means clustering can be used for time series prediction. We observe that
pollutant levels follow seasonal behavior. Using K-Means clustering, months with similar pollutant-level behavior were clustered
together. For calculating the distance used in clustering, we conclude that Euclidean distance is not the correct approach. Dynamic
Time Warping is one of the possible measures for calculating the alignment between two time series. Implementing DTW along with
LB-Keogh (a lower bound on DTW) helps speed up DTW for the given large dataset. Thus, we have k clusters, each representing a
group of similar behavior patterns, with each cluster fitted to one regression line.
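To make the role of the lower bound concrete, the following is a minimal sketch of windowed DTW combined with LB-Keogh pruning. It is not the implementation used in this work: the function names, the window size and the sample series are assumptions chosen for illustration, and the series are assumed to be of equal length.

```python
import math


def dtw_distance(s, t, window):
    """DTW distance (root of summed squared differences) within a warping band."""
    n, m = len(s), len(t)
    w = max(window, abs(n - m))  # the band must be wide enough to reach the last cell
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (s[i - 1] - t[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def lb_keogh(query, candidate, window):
    """Cheap lower bound on dtw_distance(query, candidate, window)."""
    total = 0.0
    for i, q in enumerate(query):
        segment = candidate[max(0, i - window): i + window + 1]
        upper, lower = max(segment), min(segment)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return math.sqrt(total)


def nearest_series(query, candidates, window):
    """Return (index, distance) of the DTW-nearest candidate, pruning with LB-Keogh."""
    best_idx, best_dist = -1, math.inf
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, window) >= best_dist:
            continue  # the bound already rules this candidate out, skip full DTW
        dist = dtw_distance(query, cand, window)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


# Hypothetical pollutant series; the second is the first delayed by one step.
series_a = [1, 2, 3, 4, 3, 2]
series_b = [1, 1, 2, 3, 4, 3]
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(series_a, series_b)))
print(dtw_distance(series_a, series_b, 1), euclidean)
```

Because DTW may align a point with its neighbours inside the window, the delayed series ends up closer under DTW than under Euclidean distance. The lower bound costs only one pass over the series, which is why it reduces the number of full DTW computations on a large dataset.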
The approach also covers city-wise clustering of the sensor data and forecasting over seasonal clusters. An ARIMA model is
implemented to create a time-series regression over the clusters for prediction, which provides one time-series regression line for
each cluster. Decision tree and Naïve Bayes algorithms are implemented to take both spatial and temporal data into account when
predicting the output.
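The MAE and RMSE criteria named in the abstract can be written out as follows. This is a small sketch under stated assumptions: the actual values are the NO2 AQI column of Table 2, while the predicted values are hypothetical numbers invented for illustration and are not results from this work.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [46, 46, 46, 34]      # NO2 AQI column of Table 2
predicted = [44, 47, 46, 38]   # hypothetical model output, for illustration only

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
print(mae, rmse)
```

RMSE is never smaller than MAE on the same errors and grows faster when a few predictions are far off, so reporting both indicates whether a regression model's error is spread evenly or concentrated in a few dates.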