
Earthquake Prediction Using Machine Learning Algorithm

The document discusses the development of an earthquake prediction system utilizing machine learning algorithms, specifically Random Forest and Support Vector Machine, to forecast seismic events. It emphasizes the importance of accurate predictions to minimize loss of life and property, detailing the processes of data acquisition, pre-processing, and model building. The study aims to enhance reaction strategies to earthquakes by leveraging historical data and advanced predictive modeling techniques.


International Journal of Recent Technology and Engineering (IJRTE)

ISSN: 2277-3878, Volume-8 Issue-6, March 2020

Earthquake Prediction using Machine Learning Algorithm
Pratiksha Bangar, Deeksha Gupta, Sonali Gaikwad, Bhagyashree Marekar, Jyoti Patil

Abstract: Per statistics reported by the BBC, the toll varies for every earthquake recorded to date. Approximately, up to thousands are killed, about 50,000 are injured, around 1-3 million are displaced, and a significant number go missing or are left homeless. Almost 100% structural damage is experienced. Economic losses vary from 10 to 16 million dollars. Earthquakes of magnitude 5 and above are classified as the deadliest. The most life-threatening earthquake to date took place in Indonesia, where about 3 million were dead, 1-2 million were injured and the structural damage amounted to 100%. Hence, the consequences of an earthquake are devastating: they are not limited to loss of and damage to living and non-living things, but also cause significant change, from surroundings and lifestyle to the economy. Every such parameter calls for earthquake forecasting. With a couple of minutes' notice, individuals can act to shield themselves from injury and death, damage and monetary losses can be reduced, and property and natural resources can be secured.
In the current work, a forecaster is designed and developed: a system that forecasts the catastrophe. It focuses on detecting early signs of an earthquake using machine learning algorithms. The system follows the basic steps of developing learning systems along with the data science life cycle. Data-sets for the Indian subcontinent and the rest of the world are collected from government sources. Pre-processing of the data is followed by construction of a stacking model that combines the Random Forest and Support Vector Machine algorithms. The algorithms build this mathematical model from a "training data-set". The model looks for patterns that lead to a catastrophe and adapts to them as it is built, so as to make decisions and forecasts without being explicitly programmed to perform the task. After a forecast, we broadcast the message to government officials and across various platforms.
The information to be obtained is represented by three factors: Time, Locality and Magnitude.
Keywords: Earthquake, Forecast, Machine Learning, Random Forest, Support Vector Machine

I. INTRODUCTION

The association of earthquakes with structural damage and loss of life is an enduring one, and is thus a focal point for many fields, such as, but not limited to, seismological research and environmental engineering[1]. Its significance extends to human life too, to sustaining it and to survival. A prediction that is accurate and can be relied on is a requisite for all areas prone to disasters, as well as for locations with little to no risk. It prepares us for the worst possible scenarios and for the necessary measures that can be taken beforehand to address an upcoming crisis. As technology evolves and helps humans towards a better and more convenient lifestyle, the possibility of saving lives is taken up with the help of efficient ML algorithms and Data Science to give an accurate forecast.
Machine Learning is a subset of Artificial Intelligence. It permits a system to adapt its behaviour based on its own learning, and to improve from experience without explicit programming or human intervention[8]. A machine learning process starts with feeding a good-quality data-set to the algorithm(s) so as to build an ML prediction model. Algorithms perform knowledge discovery and statistical evaluation, determining patterns and trends in data. The selection of algorithms depends on the data and on the task that requires automation.
Our target is foreseeing catastrophic events and improving the manner in which we react to them. Good forecasts and warnings save lives. A notice of an approaching calamity issued well ahead of time helps reduce both deaths and structural loss.
ML algorithms construct two types of predictive models, Regression and Classification models[6]. Each approaches data in a different way. The system concerned uses a regression model, whose core idea is forecasting a numerical value.
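
To make the distinction concrete, the short sketch below builds both kinds of target from the same list of magnitudes. The magnitudes are made-up values, and the cut-off of 5 is taken from the abstract's remark about magnitude 5 and above; both are assumptions for illustration only.

```python
# Hypothetical magnitudes, standing in for values read from a catalogue.
magnitudes = [4.2, 5.0, 6.1, 3.8, 5.7]

# Regression: the target is the numerical value itself.
regression_targets = list(magnitudes)

# Classification: the target is a discrete label. Here an event is
# labelled 1 when its magnitude is at or above an assumed threshold.
THRESHOLD = 5.0
classification_targets = [int(m >= THRESHOLD) for m in magnitudes]

print(regression_targets)
print(classification_targets)
```

A regression model is trained against the first list and a classifier against the second; the features fed to either model can be the same.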

Revised Manuscript Received on February 01, 2020.
Pratiksha Bangar, Department of Information Technology, Jayawantrao Sawant College Of Engineering, Pune, India
Deeksha Gupta, Department Of Information Technology, Jayawantrao Sawant College Of Engineering, Pune, India
Sonali Gaikwad, Department Of Information Technology, Jayawantrao Sawant College Of Engineering, Pune, India
Bhagyashree Marekar, Department Of Information Technology, Jayawantrao Sawant College Of Engineering, Pune, India
Jyoti Patil, Ph. D. Research Scholar, Department of CSE, Koneru Lakshmaiah Education Foundation (KLEF), Guntur, A.P., India.

Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: F9110038620/2020©BEIESP
DOI:10.35940/ijrte.E9110.018620

A. Earthquake Forecast

Anticipating a seismic event has been viewed as impossible. It is a difficult task due to the non-linearity of the event and its unreliability[3], yet the ability of ML algorithms to build predictive models has made it a real possibility. Earthquake forecasting for the Indian subcontinent and the rest of the world requires employing their earthquake catalogues, i.e. data-sets. An earthquake catalogue is a complete list of the location, time, magnitude and depth of earthquakes that have happened in the past[3]. The methodology relies on the sequence of these past earthquakes,

recognising suitable, necessary and appropriate parameters, identifying patterns in these parameters and understanding correlations between actual past earthquakes so as to predict future occurrences.
Various Random Forest-Support Vector Machine ensemble models are studied, modelled and deployed.

II. RELATED WORK

Wenrui Li, Nakshatra, Nishita Narvekar, Nitisha Raut, Birsen Sirkeci and Jerry Gao introduce the idea that a strong earthquake is followed by aftershocks, whose locations can be detected by analysing the arrival times of P-waves and S-waves. Data collected from 16 earthquake stations in SAC file format, which contains time-series waveform data, is used by the authors to study trends in P-waves and S-waves. The data is clipped and then denoised by means of a triggering algorithm and filters, so that only the needed waveform remains. The AR picker algorithm is used to determine the P-wave and S-wave arrival times, which are treated as extracted features. The waveform is then converted into ASCII format. The data is then fed to different machine learning models (SVM, decision trees, Random Forest and linear regression) for comparison. Random Forest distinguishes best between earthquake-leading and non-earthquake-leading data, with an accuracy of 90%. A triangulation technique is used to calculate the epicentre and to predict the arrival times of the P-wave and S-wave and the difference between the two arrivals.[2]

Khawaja Muhammad Asim, Adnan Idris, Francisco Martínez-Álvarez and Talat Iqbal carried out earthquake prediction for the Hindu Kush region, where small to medium earthquakes hit regularly, using tree-based ensemble classifiers: RotBoost, random forest and rotation forest. They employ an earthquake data-set and convert magnitude into binary classes, thereby adopting binary classification. A new combination of features is based on three factors: the Gutenberg-Richter relationship, seismic rate changes and the distribution of foreshock frequency. The highlight is the calculation of 51 seismic features using suitable procedures and techniques. Since all the models performed exceptionally well, we can conclude that the strategy of calculating 51 features was very effective. Rotation forest gives an accuracy of 95.9% and is the best among the models.[3] The useful insight for us is that a prediction model needs to be deployed for every region on earth; however, there is no prediction of when, and with what magnitude, an earthquake will occur.

G. T. Prasanna Kumari develops a classification model using ensemble learning methods. The emphasis is on two notable ensemble algorithms, Bagging and Boosting, to see how the creation of diverse ensembles improves the precision of an algorithm and how their effectiveness contrasts with the traditional approach of constructing a single model, usually followed in ML to build classifiers. Bagging and Boosting are discussed in depth: how each algorithm's process flow differs from the other's, the different ways in which they can be applied, and their respective algorithms, strengths, achievements and limitations. She further discusses how the performance of each differs for batch processing (data given at once) and online processing (data generated continuously). She concludes that ensembles are usually considered impractical for systems where online processing takes place, but here their performance is better than batch processing, with the advantage of low run time, especially for larger data-sets.[4] Her insights are helpful to us in constructing our own ensemble models.

António E. Ruano, Maria G. Ruano, Pedro M. Ferreira, Ozias Barros, G. Madureira and Hamid R. Khosravani acquire seismic information from the PVAQ and PESTR stations of the seismic monitoring system. They note that the detectors already present at such stations produce an enormous number of false alarms and fail to detect events because they are based on a standard STA/LTA ratio. They therefore present a new seismic detector based on an SVM classifier, applied continuously at such stations. They compare the specificity and recall obtained for each station, and conclude that the SVM classifier could successfully differentiate between noise and seismic events. Next, they shift their focus to reducing detection time in an Early Warning System. The results obtained (88 and 110 sec) are too large to be considered for deployment, so a new approach of overlapping windows is adopted; as a result, the times obtained were 1.3 sec and 1.8 sec respectively. On the other hand, the change in recall and specificity values results in an increase in correct detections and in false alarms as well.[5]

III. PROPOSED WORK

Developing a predictive model is a gradual procedure. Tools conventionally used for developing models are Python, Hadoop and R.
The various steps involved are:

A. DATA ACQUISITION

Data acquisition is the process of bringing data into the system for production use, either from a source outside the system or from data produced by the system. This is the first step and refers to gathering the required information. We obtain the required data-sets from government-provided websites such as:
• USGS.gov (United States Geological Survey) - Scientific agency of the United States government.[13]
• IMD.gov (India Meteorological Department) - Agency of the Ministry of Earth Sciences of the Government of India.[14]
Kaggle, which was acquired by Google, contains data-sets collected from different agencies of different governments.
The columns in the data-set are -
• Date
• Time
• Latitude
• Longitude
• Magnitude
• Depth
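
As a minimal sketch of the acquisition step, the snippet below parses a catalogue with the six columns named in this section (Date, Time, Latitude, Longitude, Magnitude, Depth) using only the Python standard library. The sample rows, the `MM/DD/YYYY` date format and the `load_catalogue` helper are assumptions made for illustration; real USGS, IMD or Kaggle exports may be laid out differently.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample using the six catalogue columns; the values are made up.
SAMPLE_CSV = """Date,Time,Latitude,Longitude,Magnitude,Depth
01/02/1965,13:44:18,19.246,145.616,6.0,131.6
01/04/1965,11:29:49,1.863,127.352,5.8,80.0
01/05/1965,18:05:58,-20.579,-173.972,6.2,
"""


def load_catalogue(handle):
    """Parse catalogue rows into dicts with typed fields.

    Rows with a missing or unparsable value are skipped, which is one
    simple (assumed) way of handling incomplete records.
    """
    records = []
    for row in csv.DictReader(handle):
        try:
            when = datetime.strptime(
                f"{row['Date']} {row['Time']}", "%m/%d/%Y %H:%M:%S"
            )
            records.append(
                {
                    "timestamp": when,
                    "latitude": float(row["Latitude"]),
                    "longitude": float(row["Longitude"]),
                    "magnitude": float(row["Magnitude"]),
                    "depth": float(row["Depth"]),
                }
            )
        except (KeyError, TypeError, ValueError):
            continue  # incomplete or malformed row
    return records


catalogue = load_catalogue(io.StringIO(SAMPLE_CSV))
print(len(catalogue), "usable records")
```

The third sample row has no depth and is dropped. Whether to drop or impute such rows is a pre-processing decision, and dropping is only the simplest choice.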


B. DATA PRE-PROCESSING

Data pre-processing is a technique that converts the given primary data into a clean data-set to make it suitable for use. It consists of two steps:
• Data Engineering
• Feature Engineering

Data Engineering
Real-world data is not in a structured and compatible form; a percentage of it may be incorrect, invalid, out-of-range, impossible or missing, which influences the outcomes and makes them misleading or incorrect. Irrelevant and unreliable data can make pattern recognition and knowledge discovery in the training phase considerably more difficult. Hence, this is the most significant step in an ML framework, and one needs to clean the data to discard such values or validate/correct them. It involves data integration, computing missing values, handling categorical values, transformation and error correction.

Feature Engineering
It involves either Feature Selection or Feature Extraction, and Feature Scaling. A data-set contains numerous features, some of which are random and may not be useful in prediction. Feature Engineering deals with reducing the random features under consideration and obtaining a minimal set of features that contribute to accurate prediction. ML provides many algorithms for feature selection/extraction. Feature scaling is a strategy used to standardize or normalize the range of features in the data-set. Feature Engineering is useful as it compresses the data, reduces storage space and computation time, and removes redundant features.

C. MODEL BUILDING

The output of an ML algorithm is a 'model'. First, the target variable and feature variables are identified and fetched. Second, the data-set is partitioned into training and testing data-sets. Third, the regressor/classifier model is constructed and fitted to the training data-set.
In Python, scikit-learn is a simple, efficient open-source library that implements a range of machine learning algorithms, including various classification, regression and clustering algorithms, through a unified interface.[15] The step-by-step building is as follows:

Building A Random Forest Regression Model:
Random forests are an ensemble learning method that can be built for both regression and classification tasks. A random forest constructs multiple decision trees during training and outputs the mean prediction of the individual trees (regression) or the mode of their classes (classification). This large number of trees represents a forest. Decision trees are rule-based models: given a training data-set with targets and features, the decision tree algorithm comes up with rules to carry out classification and regression. Features will be nodes and their presence and absence will represent likeliness. This helps in constructing a path of rules to work with. The root and splitting nodes are chosen based on information gain or the Gini index[9]. In Random Forest, the root and splitting nodes are calculated in a random manner[9].

Fig. 1. Random Forest

Therefore, a random forest is a model comprising various trees capable of making rule-based decisions, in which the procedure for choosing root nodes and parent nodes is random.

Building A Support Vector Machine Regression Model:
Regression and classification tasks can be performed by Support Vector Machines, a supervised learning algorithm. An SVM separates different data classes using a decision boundary called a hyperplane. When predicting a numerical value, SVR attempts to find a function f(x) such that the obtained target values Yi lie within a deviation of at most ε from it; ε is a threshold within which all predictions should fall, and it defines boundary lines on either side of the hyperplane such that the data points lie within them. This decision boundary is the margin of tolerance, a boundary that allows errors within the given range.[10][11][12]

Fig. 2. Support Vector Regressor

Building A Stacking Regressor Model:
Stacking regression is an ensemble learning method. Several regression models collaborate: a meta-regressor is built and finds its best fit by using the outputs of the individual regression models, each trained on the complete training set, as meta-features.[7] It is widely used to improve accuracy. Fig. 3 represents our model. "R1" and "R2" are Random Forest and Support Vector Regressor respectively.
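
The model-building steps can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data is synthetic, the hyperparameters are arbitrary, and the linear-regression meta-regressor is a guess, since the paper does not name one. Note also that scikit-learn's `StackingRegressor` fits the meta-regressor on cross-validated predictions of the base models, which differs slightly from training on the outputs over the complete training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a catalogue: latitude, longitude, depth -> magnitude.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(-90, 90, 400),    # latitude
    rng.uniform(-180, 180, 400),  # longitude
    rng.uniform(0, 700, 400),     # depth in km
])
y = 5.5 + 0.002 * X[:, 2] + rng.normal(0, 0.3, 400)  # made-up relationship

# Partition the data-set into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# R1: random forest regressor. R2: support vector regressor, preceded by
# feature scaling because SVR is sensitive to the range of its inputs.
r1 = RandomForestRegressor(n_estimators=100, random_state=0)
r2 = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))

# Meta-regressor (assumed to be linear regression) fitted on the
# cross-validated outputs of R1 and R2, which act as meta-features.
stack = StackingRegressor(
    estimators=[("r1", r1), ("r2", r2)],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"Mean absolute error on the test set: {mae:.3f}")
```

Mean absolute error is used here because the target is numerical; the paper reports accuracy percentages without stating how they were computed, so the two are not directly comparable.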


Fig. 3. Stacking

D. PREDICTIONS

Algorithm:
1. Input data-set and load libraries.
2. Data Pre-processing.
3. Model Building.
4. Making Predictions.

Data Visualisation:
1. Affected Areas

Fig. 4. Data Visualization for Indian Sub-Continent

Fig. 5. Data Visualization for rest of The World

2. Prediction Using Bagging
Accuracy: 74%

Fig. 6. Bar plot for Bagging

3. Prediction Using Boosting
Accuracy: 76%

Fig. 6. Bar plot for Bagging

4. Prediction Using Stacking
Accuracy: 83%

Fig. 6. Bar plot for Stacking

IV. RESULT

The Random Forest-Support Vector Machine models in combination work well for large data-sets. The accuracy obtained for the stacking model is the highest, 83%, compared with the accuracy of bagging and boosting. Response time is the same for all the methodologies. Training


time is higher for stacking (11 min 37 sec, against 3 min 5 sec for bagging and 3 min 19 sec for boosting). Results are as follows:

Table- I: Result Table
Algorithm   Accuracy   Training time   Response time
Bagging     74%        3 min 5 sec     5 sec
Boosting    76%        3 min 19 sec    5 sec
Stacking    83%        11 min 37 sec   5 sec

V. CONCLUSION

Thus we can conclude that integrating seismic activity data with machine learning technology yields efficient and significant results and can be used to predict earthquakes widely, provided the past history of the same is well maintained. Our attempt can be termed successful. The collaboration of the two can be advanced further to guard against earthquakes more effectively. Large data-sets prove to be very significant. Prediction models can be deployed in an area-centric manner, increasing the chances of accurate prediction, but at the cost of studying the algorithms used to build the stacking model, as it will perform well only if the algorithms chosen to build the meta-regressor are accurate themselves. The methodology can also be extended to predicting various other natural disasters.

REFERENCES

1. C. Li and X. Liu, "An improved PSO-BP neural network and its application to earthquake prediction," 2016 Chinese Control and Decision Conference (CCDC), Yinchuan, 2016, pp. 3434-3438.
2. W. Li, N. Narvekar, N. Nakshatra, N. Raut, B. Sirkeci and J. Gao, "Seismic Data Classification Using Machine Learning," 2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), Bamberg, 2018, pp. 56-63.
3. K. M. Asim, A. Idris, F. Martínez-Álvarez and T. Iqbal, "Short Term Earthquake Prediction in Hindukush Region Using Tree Based Ensemble Learning," 2016 International Conference on Frontiers of Information Technology (FIT), Islamabad, 2016, pp. 365-370.
4. Kumari, G. T. Prasanna. "A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier," Engineering Science and Technology: An International Journal, Vol. 2, 2012, pp. 850-855.
5. António E. Ruano, G. Madureira, Ozias Barros, Hamid R. Khosravani, Maria G. Ruano, Pedro M. Ferreira. "A Support Vector Machine Seismic Detector for Early-Warning Applications," IFAC Proceedings Volumes, 2013, pp. 400-405.
6. Nick Minaie. "A Beginner's Guide to Selecting Machine Learning Predictive Models in Python," Towardsdatascience, Medium, 16 July 2019.
7. U. Pasupulety, A. Abdullah Anees, S. Anmol and B. R. Mohan, "Predicting Stock Prices using Ensemble Learning and Sentiment Analysis," 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Sardinia, Italy, 2019, pp. 215-222.
8. The Expert Team. "What is Machine Learning? A definition," Expertsystem, 7 March 2017.
9. Flach, Peter. Machine Learning: the Art and Science of Algorithms That Make Sense of Data. Cambridge University Press, 2017, pp. 331-333.
10. Support Vector Regression, www.saedsayad.com/supportvectormachinereg.htm.
11. Martin M. (2002) On-Line Support Vector Machine Regression. In: Elomaa T., Mannila H., Toivonen H. (eds) Machine Learning: ECML 2002. Lecture Notes in Computer Science, vol 2430. Springer, Berlin, Heidelberg.
12. Smola, A.J., Schölkopf, B. A tutorial on support vector regression. Statistics and Computing 14, 199–222 (2004).
13. "United States Geological Survey." Wikipedia, Wikimedia Foundation, 11 Mar. 2020, en.wikipedia.org/wiki/UnitedStatesGeologicalSurvey.
14. "India Meteorological Department." Wikipedia, Wikimedia Foundation, 21 Jan. 2020, en.wikipedia.org/wiki/IndiaMeteorologicalDepartment.
15. Kumar, Vivek. "Vivek Kumar." Pluralsight, 13 May 2019, www.pluralsight.com/guides/building-classification-models-scikit-learn

AUTHORS PROFILE

Pratiksha Bangar, an undergraduate, is pursuing a Bachelor of Engineering in Information Technology at the Department of Information Technology, JSPM's Jayawantrao Sawant College of Engineering, Pune, and is currently in her final year. Her research area is Machine Learning.

Deeksha Gupta, an undergraduate, is pursuing a Bachelor of Engineering in Information Technology at the Department of Information Technology, JSPM's Jayawantrao Sawant College of Engineering, Pune, and is currently in her final year. Her area of interest is Machine Learning.

Sonali Gaikwad, an undergraduate, is pursuing a Bachelor of Engineering in Information Technology at the Department of Information Technology, JSPM's Jayawantrao Sawant College of Engineering, Pune, and is currently in her final year. Her research area is Machine Learning.

Bhagyashree Marekar, an undergraduate, is pursuing a Bachelor of Engineering in Information Technology at the Department of Information Technology, JSPM's Jayawantrao Sawant College of Engineering, Pune, and is currently in her final year. Her area of interest is Machine Learning.

Ms. Jyoti Patil is currently working as Head of Department and Associate Professor, Department of Information Technology, JSPM's Jayawantrao Sawant College of Engineering, Hadapsar, Pune, India. Her major areas of research are Deep Learning, Data Analytics, Hadoop MapReduce and Image Processing. She is pursuing a Ph.D. in deep learning and big data at KLEF Deemed to be University, Guntur, A.P., India. She has published almost 11 papers in national and international journals. She has published a patent on "Detection Of Brain Tumor Levels In MapReduce 3D MRI Images Using Hadoop".
