Mini Project PPT, Sumit Malan
Deemed to be University
Mini Project
Topic: Rainfall Prediction System
2.1 Missing Values: From our EDA step, we learned that a few instances contain
null values, so handling them becomes an important step. To impute the missing
values, we group the instances by location and date and replace each null value
with the mean of its group.
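The group-wise mean imputation described above can be sketched with pandas. This is a minimal illustration on made-up data; the column names (Location, MinTemp) and values here are hypothetical stand-ins for the real dataset's columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the rainfall dataset
df = pd.DataFrame({
    "Location": ["Albury", "Albury", "Sydney", "Sydney"],
    "MinTemp":  [13.4, np.nan, 17.5, 18.1],
})

# Replace each null with the mean of its Location group
df["MinTemp"] = df["MinTemp"].fillna(
    df.groupby("Location")["MinTemp"].transform("mean")
)
```

The real pipeline would group by both location and date, but the `groupby(...).transform("mean")` pattern is the same.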
2.2 Categorical Values: A categorical feature is one that has two or more
categories with no intrinsic ordering among them. We have a few categorical
features - WindGustDir, WindDir9am, WindDir3pm - each with 16 unique values.
Since the models are based on mathematical equations and calculations, they
work with numbers rather than text. Therefore, we have to encode the
categorical data.
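One common way to encode such features is scikit-learn's LabelEncoder, which maps each category string to an integer. A minimal sketch, using a hypothetical subset of the 16 compass directions:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the WindGustDir feature
df = pd.DataFrame({"WindGustDir": ["W", "WNW", "NE", "W"]})

# LabelEncoder assigns integers to the alphabetically sorted categories
le = LabelEncoder()
df["WindGustDir"] = le.fit_transform(df["WindGustDir"])
```

Note that label encoding imposes an arbitrary order on the categories; one-hot encoding is an alternative when that order could mislead a linear model.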
3. Model Implementation: We chose different classifiers, each belonging to a
different model family (such as Linear, Tree-based, and Distance-based).
Logistic Regression is a classification algorithm used to predict a binary
outcome (1 / 0, Yes / No, True / False) given a set of independent variables.
In simple words, it predicts the probability of occurrence of an event by
fitting the data to a logit function. This makes Logistic Regression a good
fit, as ours is a binary classification problem.
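A minimal sketch of fitting a logistic regression with scikit-learn; the synthetic data here is a hypothetical stand-in for the preprocessed rainfall features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data standing in for the rainfall features
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns the probability of each class for a sample
proba = clf.predict_proba(X[:1])[0]
```

`predict_proba` exposes the event probability that the text describes; thresholding it at 0.5 gives the hard 1/0 prediction.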
Decision Tree: In this technique, we split the population or sample into two
or more homogeneous sets (sub-populations) based on the most significant
differentiator among the input variables. This characteristic of the Decision
Tree makes it a good fit for our problem, as our target variable is a binary
categorical variable.
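The recursive splitting described above is what scikit-learn's DecisionTreeClassifier performs at fit time. A brief sketch on hypothetical synthetic data (the depth limit shown is an illustrative choice, not the project's setting):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the rainfall features and binary target
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Each internal node splits on the feature that best separates the classes
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
preds = tree.predict(X)
```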
Random Forest is a supervised ensemble learning algorithm. Here we have a
collection of decision trees, known as a forest. To classify a new object
based on its attributes, each tree gives a classification - we say the tree
"votes" for that class. The forest chooses the classification with the most
votes (over all the trees in the forest).
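The voting scheme above can be sketched with scikit-learn's RandomForestClassifier, again on hypothetical synthetic data; the forest size of 100 trees is an illustrative default, not the project's tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the rainfall features and binary target
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 100 trees, each trained on a bootstrap sample; predict() returns
# the class with the most votes across the trees
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = forest.predict(X[:1])
```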
Model Evaluation: To evaluate our classifiers, we used the metrics below.
Accuracy is the ratio of the number of correct predictions to the total
number of input samples. It works well only if there is an equal number of
samples in each class. Since our data is imbalanced, we also consider other
metrics.
Area Under Curve (AUC) is used for binary classification problems. The AUC of
a classifier equals the probability that the classifier ranks a randomly
chosen positive example higher than a randomly chosen negative example.
Precision is the number of correct positive results divided by the number of
positive results predicted by the classifier.
Recall is the number of correct positive results divided by the number of all
relevant samples (all samples that should have been identified as positive).
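The four metrics above are all available in scikit-learn. A minimal sketch on hypothetical hand-picked labels, chosen so the counts are easy to verify (3 true positives, 1 false positive, 1 false negative, 3 true negatives):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical ground truth, hard predictions, and predicted probabilities
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]

acc  = accuracy_score(y_true, y_pred)    # correct / total
prec = precision_score(y_true, y_pred)   # TP / (TP + FP)
rec  = recall_score(y_true, y_pred)      # TP / (TP + FN)
auc  = roc_auc_score(y_true, y_score)    # uses scores, not hard labels
```

Note that AUC is computed from the predicted probabilities (how the classifier ranks examples), while accuracy, precision, and recall are computed from the hard predictions.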
RESULT:
Experiment 1 - Original Dataset: After all the preprocessing steps (described
above in the Methodology section), we ran every implemented classifier on the
same input data. The results report the two chosen metrics (10-fold stratified
accuracy and Area Under Curve) for all the classifiers.
Accuracy-wise, Gradient Boosting with a learning rate of 0.25 performed best;
coverage-wise, Random Forest and Decision Tree performed worst.
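The 10-fold stratified accuracy reported above can be computed with scikit-learn's StratifiedKFold, which preserves the class ratio in every fold (important given the imbalanced data). A minimal sketch on hypothetical synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in for the preprocessed rainfall dataset
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# 10 folds, each keeping the same positive/negative ratio as the full data
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=skf, scoring="accuracy")
mean_acc = scores.mean()
```

The same `cross_val_score` call works unchanged for the tree-based classifiers by swapping in a different estimator.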