Batch 04 Documentation 1
AIRLINE DATA ANALYTICS USING MACHINE LEARNING
Thesis/Dissertation submitted in partial fulfillment of the requirements for the award of the
degree of
BACHELOR OF TECHNOLOGY
in
CSE(Data Science)
by
L. PREMCHAND 20K91A6721
K. DURGA NIKHIL 20K91A6715
V. ROOPA REDDY 20K91A6750
K. AKHIL SAI 21K95A6706
We, Mr. L. Premchand bearing Hall Ticket Number: 20K91A6721, Mr. K. Durga
Nikhil bearing Hall Ticket Number: 20K91A6715, Ms. V. Roopa Reddy bearing Hall
Ticket Number: 20K91A6750, and Mr. K. Akhil Sai bearing Hall Ticket Number:
21K95A6706, hereby declare that the major project report titled “AIRLINE DATA
ANALYTICS USING MACHINE LEARNING”, carried out under the guidance of Mr. S.
Rajarajacholan, Asst. Prof. in the Department of Computer Science and Engineering (Data
Science), is submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in CSE(Data Science).
SUBMITTED BY,
L. Premchand 20K91A6721
K. Durga Nikhil 20K91A6715
V. Roopa Reddy 20K91A6750
K. Akhil Sai 21K95A6706
CERTIFICATE
This is to certify that the major project report entitled AIRLINE DATA
ANALYTICS USING MACHINE LEARNING, being submitted by Mr.
L. Premchand, bearing Roll No. 20K91A6721, Mr. K. Durga Nikhil, bearing
Roll No. 20K91A6715, Ms. V. Roopa Reddy, bearing Roll No. 20K91A6750, and
Mr. K. Akhil Sai, bearing Roll No. 21K95A6706, in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in CSE(Data
Science), to the TKR College of Engineering & Technology, is a record of
bonafide work carried out by them under my guidance and supervision.
By,
L. Premchand 20K91A6721
K. Durga Nikhil 20K91A6715
V. Roopa Reddy 20K91A6750
K. Akhil Sai 21K95A6706
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
1.1 Motivation
1.2 Problem definition
1.3 Limitations of existing system
1.4 Proposed system
2 LITERATURE REVIEW
2.1 Paper-1
2.2 Paper-2
2.3 Paper-3
2.4 Paper-4
2.5 Paper-5
2.6 Paper-6
2.7 Paper-7
2.8 Literature Survey Conclusion
3 REQUIREMENTS ANALYSIS
3.1 Functional Requirements
3.2 Non-Functional Requirements
4 DESIGN
4.1 Data flow diagrams
4.2 System architecture
4.3 Taxonomy of Flight Delay Prediction
4.4 Overview of Classification Approach
4.5 Methodology of Flight Delay Prediction
4.6 Data Collection Model
5 CODING
7.2 Validation
7.3 Conclusion
8 CONCLUSION
REFERENCES
ABSTRACT
LIST OF FIGURES
Chapter 1
INTRODUCTION
1.1 Motivation
Developing an integrated framework for flight delay prediction, fare price estimation,
and customer satisfaction enhancement in the airline industry is motivated by a multifaceted set
of considerations. Firstly, the imperative of operational efficiency drives the need for accurate
flight delay predictions. By leveraging advanced data analytics and machine learning, airlines
can proactively manage disruptions, optimize crew schedules, and enhance overall punctuality,
contributing to streamlined operations and improved resource utilization. Secondly, the dynamic
nature of the aviation market necessitates a strategic approach to pricing. The integration of
machine learning algorithms for fare price estimation enables airlines to adapt in real time to
market dynamics, maximizing revenue, attracting a broader passenger base, and maintaining a
competitive edge.
Lastly, a commitment to customer satisfaction propels the incorporation of sentiment
analysis and feedback mining. Understanding passenger preferences and promptly addressing
concerns contributes to an enhanced overall passenger experience, fostering increased loyalty,
positive brand perception, and a competitive advantage in the market.
The overarching motivation for this integrated framework extends beyond individual
components to provide a comprehensive solution. It aims to give airlines a strategic advantage,
ensuring adaptability to market fluctuations and fostering sustained success in the highly
competitive and dynamic aviation industry. Through the simultaneous optimization of
operational efficiency, revenue generation, and customer satisfaction, the framework seeks to
revolutionize airline management, creating a holistic and adaptive approach to address the
challenges of the industry.
This integrated framework acknowledges the industry's relentless pace and multifaceted
demands. It seeks to empower airlines with a holistic toolset, allowing them to navigate
uncertainties with resilience. The synergy of accurate flight delay predictions, adaptive pricing
strategies, and a customer-centric focus positions the framework as a strategic enabler for
airlines aiming not only to survive but thrive in a challenging and competitive aviation
landscape. In essence, the motivation is grounded in the pursuit of a comprehensive solution that
enhances operational excellence, revenue maximization, and passenger satisfaction, creating a
paradigm shift in airline management.
1.2 Problem definition
The aviation industry faces multifaceted challenges that impact operational efficiency,
revenue management, and customer satisfaction. This problem statement aims to address these
challenges through an integrated data analytics and machine learning approach. The primary
objectives include:
Airlines contend with the operational and financial ramifications of flight delays. The
challenge is to develop a robust machine learning model that can accurately predict potential
flight delays by analyzing historical data, weather conditions, air traffic patterns, and other
relevant parameters. The goal is to enable airlines to implement proactive measures such as
optimized crew scheduling and real-time adjustments to mitigate delays and enhance overall
punctuality.
Pricing strategies are crucial for airlines to stay competitive and maximize revenue. The
problem involves leveraging machine learning algorithms to analyze market dynamics,
competitor pricing, demand trends, and other variables for accurate fare price prediction. The
objective is to create a dynamic pricing system that adapts in real time, enabling airlines to
optimize ticket prices, attract a broader passenger base, and maintain a competitive edge in the
market.
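As a rough illustration of the first objective, flight delay prediction can be framed as a binary classification task. The sketch below assumes that a flight counts as delayed when it arrives at least 15 minutes late; this threshold, the helper name add_delay_label, and the sample rows are illustrative assumptions and are not taken from the project itself.

```python
import pandas as pd

# Assumed threshold: a flight is "delayed" if it arrives at least
# 15 minutes late. This value is an illustrative choice.
DELAY_THRESHOLD_MIN = 15


def add_delay_label(df: pd.DataFrame, threshold: int = DELAY_THRESHOLD_MIN) -> pd.DataFrame:
    """Return a copy of df with a binary IS_DELAYED column (1 = delayed)."""
    out = df.copy()
    out["IS_DELAYED"] = (out["ARRIVAL_DELAY"] >= threshold).astype(int)
    return out


# Made-up rows, for illustration only.
toy = pd.DataFrame({"ARRIVAL_DELAY": [-5.0, 0.0, 14.0, 15.0, 120.0]})
labelled = add_delay_label(toy)
print(labelled)
```

A classifier would then be trained to predict IS_DELAYED from features known before departure; a different threshold only changes how the target column is built.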
1.3 Limitations of existing system
Existing systems in airline data analytics exhibit several limitations that hinder their
effectiveness in addressing key challenges within the industry. One significant drawback lies in
the limited predictive accuracy of current flight delay prediction models. Traditional approaches
often struggle to incorporate real-time data on dynamic factors such as changing weather
conditions and air traffic patterns, leading to suboptimal performance in accurately forecasting
potential delays.
Furthermore, the presence of data silos and integration challenges adds another layer of
complexity to existing airline data analytics systems. Airline data is frequently
compartmentalized across different departments and systems, hindering seamless integration.
This fragmentation limits the ability of current systems to provide a holistic view of operations,
potentially obstructing the development of a comprehensive understanding of the intricate
interdependencies within the airline ecosystem. Overcoming these limitations is crucial for the
development of more robust and adaptive systems that can effectively meet the evolving
demands of the aviation industry.
1.4 Proposed system
The proposed advanced airline data analytics system is designed to overcome the
limitations inherent in existing frameworks by introducing a comprehensive and adaptive
solution. At the core of this system is a dynamic flight delay prediction model that harnesses
advanced machine learning algorithms to continuously analyze real-time data streams,
including weather conditions, air traffic patterns, and historical flight data. This approach
enhances the accuracy of predictions, enabling proactive measures to minimize disruptions and
improve overall punctuality.
Complementing this is an adaptive fare price estimation mechanism that responds in real
time to market dynamics, competitor pricing strategies, and changing demand patterns. By
leveraging machine learning algorithms, the system can dynamically adjust ticket prices,
maximizing revenue and ensuring a competitive edge for airlines in the dynamic aviation
market.
The proposed system also places a strong emphasis on real-time customer satisfaction
enhancement. Integrating sentiment analysis and feedback mining techniques, the system
gauges passenger satisfaction levels promptly. This customer-centric approach facilitates
tailored services, quick issue resolution, and continuous improvement, fostering enhanced
passenger loyalty and positive brand perception.
Addressing data silos and integration challenges, the system adopts a centralized data
architecture, breaking down departmental barriers and providing a holistic view of airline
operations. A user-friendly dashboard offers real-time insights into flight operations, pricing
dynamics, and customer satisfaction metrics, empowering airline staff to make informed, data-
driven decisions.
Crucially, the proposed system adopts a continuous learning and optimization approach.
Machine learning algorithms continuously learn from new data, ensuring ongoing optimization
of predictive models, pricing strategies, and customer satisfaction initiatives. This adaptive
learning mechanism positions the system to remain resilient and effective in navigating the
ever-evolving landscape of the aviation industry.
Chapter 2
LITERATURE REVIEW
2.1 PAPER-1
Title: Flight Delay Prediction With Priority Information of Weather and Non-Weather Features
Authors: Qiang Li, Ranzhe Jing, and Zhijie Sasha Dong
Description: Flight delay prediction is crucial for intelligent airport management systems,
relying on historical data and various features to estimate potential delays. This paper addresses
the complexity of flight delays by categorizing influencing factors into weather features (e.g.,
temperature, humidity, wind speed) and non-weather features (day-of-month, day-of-week,
scheduled departure, and arrival time). It identifies that weather features predominantly affect
delays during adverse weather conditions, while non-weather features become more significant
when weather conditions improve. Consequently, there is a need to prioritize weather and non-
weather features in prediction models. The authors develop a clustering algorithm-based analysis
approach to evaluate the impact of these features on flight delays and embed a probability
sampling method in the feature selection stage to choose influential features. Experiments
conducted on U.S. domestic flights in July 2018 demonstrate that the proposed model
significantly enhances flight delay prediction accuracy.
Merits:
1. Feature Prioritization: The model considers the priority of weather and non-weather
features, reflecting their varying impacts on flight delays, thereby improving prediction
accuracy.
2. Innovative Approach: The integration of clustering algorithms and probability sampling
methods adds sophistication to the prediction model, potentially enhancing its
effectiveness.
Demerits:
1. Limited Scope: The study focuses on U.S. domestic flights in a specific month (July
2018), potentially limiting the generalizability of findings to other regions or time
periods.
2. Data Dependency: The effectiveness of the proposed model heavily relies on the
availability and quality of historical flight data, which may vary across different airports
and regions.
3. Complexity: The integration of clustering algorithms and probability sampling methods
may increase the complexity of the prediction model, making it less accessible for users
without advanced technical expertise.
2.2 PAPER-2
Title: A Deep Learning Approach for Flight Delay Prediction Through Time-Evolving Graphs
Authors: Kaiquan Cai, Yue Li, Yi-Ping Fang, and Yanbo Zhu
Description: This paper addresses the challenge of flight delay prediction by investigating it
from a network perspective, specifically considering the multi-airport scenario. Unlike
previous works that often focus on single-airport scenarios, this study recognizes the
importance of time-varying spatial interactions within airport networks. To model the time-
evolving and periodic graph-structured information inherent in airport networks, the authors
propose a flight delay prediction approach based on the graph convolutional neural network
(GCN).
The paper proposes a solution to address the challenge of incorporating both delay time-series
and dynamic graph structures into GCN. It introduces a temporal convolutional block
leveraging the Markov property. This block aims to capture the evolving patterns of flight
delays across a series of graph snapshots. Additionally, to address potential issues like
unknown occasional air routes during emergencies leading to incomplete graph-structured
inputs for GCN, an adaptive graph convolutional block is integrated into the proposed method
to uncover spatial interactions hidden in airport networks. These results underscore the potential
of deep learning approaches based on graph-structured inputs in tackling the flight delay
prediction problem.
Merits:
1. Network Perspective: Recognizing flight delays as a network problem, the approach
captures time-varying spatial interactions across multiple airports, potentially providing
more comprehensive insights into delay patterns.
2. Performance Improvement: The proposed approach outperforms benchmark methods,
indicating its efficacy in enhancing prediction accuracy.
3. Scalability: By considering multi-airport scenarios, the methodology has the potential to
scale to larger airport networks, accommodating more complex spatial relationships.
Demerits:
1. Complexity: The integration of multiple deep learning techniques, including GCNs,
temporal convolutional blocks, and adaptive graph convolutional blocks, may increase
the complexity of the model and require significant computational resources.
2. Data Dependency: Like many deep learning approaches, the effectiveness of the
proposed method heavily relies on the availability and quality of historical flight data,
which may vary across different regions and time periods.
3. Interpretability: Deep learning models, especially those involving complex architectures
like GCNs, may lack interpretability, making it challenging to understand the underlying
factors driving prediction outcomes.
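To make the graph-convolution idea concrete, the sketch below applies one generic GCN propagation step (symmetric normalization of the adjacency matrix with self-loops, followed by ReLU) to a toy network of three airports. It is a minimal illustration of the general technique, not the architecture of the reviewed paper; the airport graph, the delay values, and the function name gcn_layer are assumptions made for this example.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                     # node degrees including self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D^-1/2
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(norm_adj @ features @ weights, 0.0)


# Toy network of 3 airports: routes exist between 0-1 and 1-2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
# One feature per airport: a made-up mean delay in minutes.
features = np.array([[10.0], [30.0], [5.0]])
# Identity weight so that only the neighbourhood mixing is visible.
weights = np.array([[1.0]])

smoothed = gcn_layer(adj, features, weights)
print(smoothed)
```

Each airport's output mixes its own delay with that of its direct neighbours, which is the mechanism that lets delay information spread across the network when several such layers are stacked.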
2.3 PAPER-3
Title: Spatiotemporal Propagation Learning for Network-Wide Flight Delay Prediction
Authors: Yuankai Wu, Hongyu Yang, Yi Lin, and Hong Liu
Description: This paper introduces the SpatioTemporal Propagation Network (STPN) as a
novel solution to the challenge of accurately predicting flight delays while considering
spatiotemporal dependencies and external factors that contribute to delay propagation in the
aviation industry. STPN is a space-time separable graph convolutional network designed to
model delay propagation by integrating spatial and temporal factors. It employs a multi-graph
convolution model to capture geographic proximity and airline schedules from a spatial
perspective, while utilizing a multi-head self-attention mechanism to learn various types of
temporal dependencies in delay time series. The model is trained end-to-end and is capable of
explicitly accounting for both spatial and temporal aspects of delay propagation. Experimental
evaluations conducted on real-world delay datasets demonstrate that STPN surpasses state-of-
the-art methods for multi-step ahead arrival and departure delay prediction in large-scale
airport networks. The comprehensive experiments establish STPN as a robust benchmark for
general spatiotemporal forecasting in the aviation domain. The code for STPN is publicly
available for further research and development.
Merits:
1. Incorporation of Spatiotemporal Dependencies: STPN effectively integrates spatial and
temporal factors, capturing the complex dependencies underlying delay propagation in
airport networks.
2. Performance Improvement: Experimental results demonstrate that STPN outperforms
state-of-the-art methods for both arrival and departure delay prediction, establishing it as
a superior solution in large-scale airport networks.
Demerits:
1. Complexity: The architecture of STPN, involving space-time separable graph
convolutional networks and multi-head self-attention mechanisms, may introduce
computational complexity and require significant resources for training and inference.
2. Data Dependency: The effectiveness of STPN heavily relies on the availability and
quality of historical flight delay datasets, which may vary across different regions and
time periods.
3. Model Interpretability: While STPN generates interpretable counterfactuals, the overall
interpretability of deep learning models, especially those with complex architectures,
may still pose challenges for stakeholders.
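The self-attention mechanism mentioned in the description can be illustrated with a single attention head applied to a short delay time series. The sketch below is a generic scaled dot-product self-attention, not the STPN implementation; the input values are made up and the random projection matrices stand in for parameters that would normally be learned. A multi-head variant runs several such heads in parallel and concatenates their outputs.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x of shape (T, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])                   # (T, T) pairwise similarity
    scores = scores - scores.max(axis=1, keepdims=True)      # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)   # softmax over time steps
    return weights @ v, weights


rng = np.random.default_rng(0)
# Made-up delay series: 4 time steps, 2 features per step.
x = np.array([[5.0, 3.0], [20.0, 18.0], [45.0, 40.0], [10.0, 12.0]])
# Random projections stand in for learned parameters.
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(2, 2)) for _ in range(3))

context, attn = self_attention(x, w_q, w_k, w_v)
print(attn.round(2))
```

Row i of attn shows how strongly time step i draws on every other time step when its output is computed.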
2.4 PAPER-4
Title: Exploratory Data Analysis on Aviation Dataset
Authors: Saba Firdous, Haseeba Fathiya, Lipsa Sadath
Description: This paper presents an exploratory data analysis conducted on a flight dataset,
aimed at extracting valuable insights into arrival and departure delays and uncovering
relationships between flight timings and delays. With the burgeoning utilization of big data
analytics, particularly in industries like aviation, there is growing interest in leveraging past data
to inform decision-making processes. The study delves into various aspects of flight delays,
utilizing the dataset to identify flights most prone to delays and drawing conclusions useful for
future flight selections. By analyzing the flight delay data, the authors aim to provide actionable
insights that can aid stakeholders in making informed decisions regarding flight scheduling and
management.
Merits:
1. Insight Generation: The exploratory data analysis facilitates the extraction of valuable
insights from the flight dataset, providing useful information on arrival and departure
delays.
2. Practical Applications: The findings of the analysis, such as identifying flights most
prone to delays, have practical implications for stakeholders in the aviation industry,
aiding in decision-making processes.
3. Informative for Future Planning: Conclusions drawn from the analysis can inform future
flight scheduling and management strategies, potentially improving operational
efficiency and customer satisfaction.
4. Contribution to Knowledge: The study contributes to the body of knowledge in aviation
data analytics by providing empirical evidence and insights into flight delays and related
factors.
Demerits:
1. Limited Scope: The exploratory data analysis may be limited in scope, focusing primarily
on descriptive statistics and basic relationships between variables, potentially
overlooking more complex patterns or causal relationships.
2. Data Quality: The effectiveness of the analysis heavily depends on the quality and
completeness of the flight dataset used, which may vary and affect the reliability of the
conclusions drawn.
3. Lack of Advanced Techniques: The analysis may lack the utilization of advanced
statistical or machine learning techniques, limiting the depth of insights that can be
derived from the data.
2.5 PAPER-5
Title: Flight Delay Prediction Based on Aviation Big Data and Machine Learning
Authors: Guan Gui, Fan Liu, Jinlong Sun, Jie Yang, Ziqi Zhou, and Dongxu Zhao
Description: This paper presents a comprehensive approach to flight delay prediction using
aviation big data and machine learning techniques. The authors compare several machine
learning-based models in generalized flight delay prediction tasks, leveraging diverse datasets
comprising automatic dependent surveillance-broadcast (ADS-B) messages, weather
conditions, flight schedules, and airport information. The prediction tasks encompass various
classification and regression tasks to address different aspects of flight delay prediction.
Experimental results indicate that while long short-term memory (LSTM) models exhibit
potential in handling aviation sequence data, they are prone to overfitting in the limited dataset.
In contrast, the proposed random forest-based model demonstrates higher prediction accuracy,
reaching 90.2% for binary classification, and effectively mitigates the overfitting issue.
Merits:
1. Broad Scope: The study explores a wide range of factors influencing flight delays,
incorporating diverse datasets such as ADS-B messages, weather conditions, flight
schedules, and airport information, leading to a more comprehensive understanding of
delay prediction.
2. Comparison of Machine Learning Models: The authors compare several machine
learning-based models, allowing for an evaluation of their effectiveness in predicting
flight delays, thereby informing the selection of the most suitable approach.
3. Practical Application: The research has practical implications for the airline industry,
offering insights and methodologies that can enhance the efficiency of flight scheduling
and management.
Demerits:
1. Limited Dataset: The study highlights the challenge of overfitting in the limited dataset,
indicating potential limitations in the generalizability of the findings to larger datasets or
different contexts.
2. Model Complexity: While random forest models demonstrate higher prediction accuracy,
they may lack the interpretability of simpler models, potentially hindering the
understanding of underlying factors driving flight delays.
3. Data Integration Challenges: Integrating diverse datasets from multiple sources, such as
ADS-B messages and weather data, may pose challenges in data preprocessing and
feature engineering, potentially affecting the quality of predictions.
2.6 PAPER-6
Title: A Holistic Approach on Airfare Price Prediction Using Machine Learning Techniques
Authors: Theofanis Kalampokas, Konstantinos Tziridis, Nikolaos Kalampokas, Alexandros
Nikolaou, Eleni Vrochidou, and George A. Papakostas
Description: This paper presents a comprehensive analysis of airfare price prediction using
machine learning techniques, with the aim of identifying similarities in pricing policies among
different airline companies. In response to the dynamic nature of airline ticket pricing and the
globalization of markets, where competitiveness is key, the study explores the application of
artificial intelligence (AI) models for predicting airfare prices. A dataset comprising 136,917
flights from Aegean, Turkish, Austrian, and Lufthansa Airlines to six popular international
destinations is utilized. To address the airfare price prediction problem, the study considers 16
model architectures from three domains: Machine Learning (ML) with eight state-of-the-art
models, Deep Learning (DL) with six CNN models, and Quantum Machine Learning (QML)
with two models. Experimental results demonstrate the effectiveness of these models, with
accuracies ranging from 89% to 99% in the regression problem for different international
destinations and airline companies.
Merits:
1. Comprehensive Analysis: The study provides a comprehensive analysis of airfare price
prediction, considering both destination-based and airline-based evaluations.
2. Utilization of AI Models: By leveraging machine learning, deep learning, and quantum
machine learning techniques, the study explores various approaches to addressing the
airfare price prediction problem, demonstrating the versatility of AI in this domain.
3. Practical Implications: The findings have practical implications for end users seeking
affordable airfare, as well as for airline companies looking to optimize their pricing
strategies.
Demerits:
1. Data Dependency: The effectiveness of the models heavily relies on the quality and
availability of the dataset, which may vary and affect the generalizability of the results.
2. Model Complexity: Some of the models considered, especially those in the deep learning
and quantum machine learning domains, may be complex and require significant
computational resources for training and inference.
3. Interpretability: While the models demonstrate high accuracy, their interpretability may
be limited, making it challenging to understand the factors driving airfare price
predictions.
2.7 PAPER-7
Title: Contextualized Recommendation of Aviation Ancillary Services Based on Passenger
Portraits
Authors: Yingmin Zhang, Wenquan Luo, Min Li, Tingting Chen
Description: This article addresses the critical need for personalized recommendations of
ancillary aviation services for Chinese domestic airlines to optimize profits. Traditional
collaborative filtering (CF) algorithms face challenges in accurately recommending services
due to the sparsity of massive data sets. To overcome this limitation, the study incorporates
instant contextualized travel-related factors into the modeling of air passengers' ancillary
service preferences, enabling dynamic recommendations. Additionally, the article proposes a
four-tuple approach to construct a contextual ontology, serving as a conceptual model for
ancillary aviation services. By integrating both online and offline contextualized data, the
approach aims to provide tailored recommendations that enhance the passenger experience and
contribute to airline profitability.
Merits:
1. Personalized Recommendations: By incorporating contextualized travel-related factors,
the approach enables personalized recommendations of ancillary aviation services,
catering to the unique preferences and needs of individual passengers.
2. Dynamic Recommendations: The inclusion of instant contextualized data allows for
dynamic recommendations, adapting to changes in passenger behavior and preferences in
real time.
3. Enhanced Modeling: The four-tuple approach to constructing a contextual ontology
provides a comprehensive conceptual model for understanding and representing ancillary
aviation services, enhancing the accuracy and relevance of recommendations.
Demerits:
1. Data Complexity: Integrating massive amounts of contextualized online and offline data
may pose challenges in data processing, storage, and analysis, potentially increasing
computational complexity.
2. Algorithm Scalability: The dynamic nature of recommendations may require
sophisticated algorithms capable of processing and adapting to large volumes of data in
real time, which could be resource-intensive.
3. Privacy Concerns: Collecting and utilizing personal data for recommendation purposes
may raise privacy concerns among passengers, necessitating robust data protection
measures and compliance with regulatory requirements.
Chapter 3
REQUIREMENTS ANALYSIS
From a functional perspective, the system's core capabilities should encompass dynamic
flight delay prediction. This involves implementing a sophisticated machine learning model
capable of analyzing real-time data on weather conditions, air traffic patterns, and historical
flight performance. The goal is to offer actionable insights that empower airlines to take
proactive measures, optimizing crew schedules and making real-time adjustments to mitigate
potential disruptions and enhance overall punctuality.
Additionally, the system should address the complexities of fare pricing by integrating
machine learning algorithms. This functionality requires real-time analysis of market dynamics,
competitor pricing strategies, and evolving demand trends. The outcome should be an adaptive
pricing model that can dynamically adjust ticket prices, ensuring airlines can maximize revenue
and remain competitive in the dynamic aviation market.
time feedback, leading to enhanced passenger loyalty, positive brand perception, and an overall
improvement in the travel experience.
Beyond these core functionalities, an effective system should also address data
management challenges. This involves implementing a centralized data architecture to break
down silos and integrate diverse data sources across different departments. A unified data
repository provides a holistic view of airline operations, supporting more informed decision-
making and comprehensive analytics.
Finally, continuous learning and optimization are imperative for the system's
adaptability. Machine learning algorithms should be designed to continuously learn from new
data, ensuring the ongoing optimization of predictive models, pricing strategies, and customer
satisfaction initiatives. These functionalities collectively form the foundation of a robust and
adaptive advanced airline data analytics system that can revolutionize how airlines manage their
operations and enhance the overall passenger experience.
In crafting an advanced airline data analytics system, the functional requirements delineate
the core capabilities essential for addressing the multifaceted challenges within the aviation
industry. A paramount functionality is the development and integration of a sophisticated
machine learning model dedicated to flight delay prediction. This model should dynamically
analyze real-time data streams, incorporating variables such as weather conditions, air traffic
patterns, and historical flight performance. The aim is to provide actionable insights that
empower airlines to implement proactive measures, optimizing crew schedules and making
real-time adjustments to enhance overall punctuality.
Complementing this, the system must integrate machine learning algorithms within the
fare price estimation module. This functionality requires real-time analysis of market
dynamics, competitor pricing strategies, and evolving demand trends. The outcome should be
an adaptive pricing model capable of dynamically adjusting ticket prices. This adaptability
ensures that airlines can maximize revenue and remain competitive in the ever-changing
landscape of the aviation market.
Beyond the functional aspects, non-functional requirements form the backbone of the
system, ensuring its security, scalability, responsiveness, and reliability. Security measures must
be robust, safeguarding sensitive airline data and adhering to industry standards and regulations
to ensure data privacy and protection against unauthorized access.
In terms of scalability, the system must be designed to accommodate the growing volume
of data and user interactions over time. The architecture should support increased load and data
processing demands without compromising performance. Responsiveness is critical,
necessitating the development of a system that provides real-time insights to support timely
decision-making by users. Minimizing latency in data processing and reporting enhances the
overall user experience.
Chapter 4
DESIGN
4.2 SYSTEM ARCHITECTURE
4.3 TAXONOMY OF FLIGHT DELAY PREDICTION PROBLEM
4.5 METHODOLOGY OF FLIGHT DELAY PREDICTION
4.6 DATA COLLECTING MODEL
Chapter 5
CODING
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the flight records and work on a random sample of 5,000 rows
flights = pd.read_csv('flights.csv')
flights = flights.sample(n=5000)
flights.head()
flights.shape
(5000, 31)
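Because sample() draws rows at random, every run of the notebook analyses a different subset, so the counts reported below will vary from run to run. A minimal sketch of how the draw could be made repeatable with the random_state parameter, using a made-up frame in place of flights.csv (the seed value 42 is arbitrary):

```python
import pandas as pd

# Stand-in for flights.csv: 100 made-up rows.
toy = pd.DataFrame({"FLIGHT_NUMBER": range(100)})

# random_state fixes the random draw, so both samples contain the same rows.
sample_a = toy.sample(n=10, random_state=42)
sample_b = toy.sample(n=10, random_state=42)

print(sample_a.index.equals(sample_b.index))
```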
flights.isnull().values.any()
True
flights.isnull().sum()
YEAR 0
MONTH 0
DAY 0
DAY_OF_WEEK 0
AIRLINE 0
FLIGHT_NUMBER 0
TAIL_NUMBER 13
ORIGIN_AIRPORT 0
DESTINATION_AIRPORT 0
SCHEDULED_DEPARTURE 0
DEPARTURE_TIME 77
DEPARTURE_DELAY 77
TAXI_OUT 80
WHEELS_OFF 80
SCHEDULED_TIME 0
ELAPSED_TIME 100
AIR_TIME 100
DISTANCE 0
WHEELS_ON 86
TAXI_IN 86
SCHEDULED_ARRIVAL 0
ARRIVAL_TIME 86
ARRIVAL_DELAY 100
DIVERTED 0
CANCELLED 0
CANCELLATION_REASON 4919
AIR_SYSTEM_DELAY 4043
SECURITY_DELAY 4043
AIRLINE_DELAY 4043
LATE_AIRCRAFT_DELAY 4043
WEATHER_DELAY 4043
dtype: int64
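The counts above are easier to compare as shares of the 5,000 sampled rows. The sketch below converts a few of the reported counts into fractions and flags the columns that are mostly empty. The 50% cut-off and the helper name `mostly_missing` are illustrative assumptions and are not part of the original notebook.

```python
# Missing-value counts copied from the isnull().sum() output above
missing_counts = {
    "TAIL_NUMBER": 13,
    "DEPARTURE_DELAY": 77,
    "ARRIVAL_DELAY": 100,
    "CANCELLATION_REASON": 4919,
    "AIR_SYSTEM_DELAY": 4043,
}
N_ROWS = 5000  # size of the sample drawn with flights.sample(n=5000)


def mostly_missing(counts, n_rows, threshold=0.5):
    """Return {column: missing fraction} for columns above the threshold.

    The 0.5 threshold is an illustrative assumption.
    """
    fractions = {col: n / n_rows for col, n in counts.items()}
    return {col: frac for col, frac in fractions.items() if frac > threshold}


sparse_columns = mostly_missing(missing_counts, N_ROWS)
for col, frac in sparse_columns.items():
    print(f"{col}: {frac:.2%} missing")
```

Columns flagged this way carry little information for most rows, which is consistent with their removal later in the notebook.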
[7]
sns.countplot(x='CANCELLATION_REASON',data=flights)
<AxesSubplot:xlabel='CANCELLATION_REASON', ylabel='count'>
Reason for cancellation of a flight: A - Airline/Carrier; B - Weather; C - National Air System; D - Security.
The plot shows that weather (B) accounts for most of the flight cancellations in the sample.
[8]
sns.countplot(x="MONTH",hue="CANCELLATION_REASON",data=flights)
<AxesSubplot:xlabel='MONTH', ylabel='count'>
plt.figure(figsize=(10, 10))
axis = sns.countplot(x='ORIGIN_AIRPORT', data=flights, order=flights['ORIGIN_AIRPORT'].value_counts().iloc[:20].index)
axis.set_xticklabels(axis.get_xticklabels(), rotation=90, ha="right")
plt.tight_layout()
plt.show()
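fig, axis = plt.subplots(figsize=(10,14))
# value_counts() is sorted by frequency, so the labels are taken from its index
# to keep each airline code aligned with its slice
size = flights["AIRLINE"].value_counts()
plt.pie(size,labels=size.index,autopct='%5.0f%%')
plt.show()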
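fig, axis = plt.subplots(figsize=(20,14))
# numeric_only=True restricts the correlation to numeric columns (needed in recent pandas versions)
sns.heatmap(flights.corr(numeric_only=True),annot = True)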
plt.show()
Very High Correlation Between Arrival Delay and Departure Delay
The heatmap shows that arrival delay is strongly correlated with departure delay: most flights that arrive late also departed late.
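Each cell of the heatmap is a Pearson correlation coefficient. A minimal sketch of that calculation is given here in plain Python so the arithmetic is visible; the delay values are hypothetical and are not taken from flights.csv.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical delays in minutes, NOT taken from flights.csv
departure_delay = [-5, 0, 10, 25, 60]
arrival_delay = [-8, -2, 12, 20, 55]

r = pearson(departure_delay, arrival_delay)
print(round(r, 3))
```

A coefficient close to 1 means the two delays rise and fall together. It does not by itself establish that one causes the other.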
[12]
corr=flights.corr(numeric_only=True)
corr
[13]
variables_to_remove=["YEAR","FLIGHT_NUMBER","TAIL_NUMBER","DEPARTURE_TIME",
    "TAXI_OUT","WHEELS_OFF","ELAPSED_TIME","AIR_TIME","WHEELS_ON","TAXI_IN",
    "ARRIVAL_TIME","DIVERTED","CANCELLED","CANCELLATION_REASON",
    "AIR_SYSTEM_DELAY","SECURITY_DELAY","AIRLINE_DELAY","LATE_AIRCRAFT_DELAY",
    "WEATHER_DELAY","SCHEDULED_TIME","SCHEDULED_ARRIVAL"]
flights.drop(variables_to_remove,axis=1,inplace= True)
flights.columns
Index(['MONTH', 'DAY', 'DAY_OF_WEEK', 'AIRLINE', 'ORIGIN_AIRPORT',
'DESTINATION_AIRPORT', 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY',
'DISTANCE', 'ARRIVAL_DELAY'],
dtype='object')
[14]
airport = pd.read_csv('airports.csv')
airport
[15]
flights.loc[~flights.ORIGIN_AIRPORT.isin(airport.IATA_CODE.values),'ORIGIN_AIRPORT']='OTHER'
flights.loc[~flights.DESTINATION_AIRPORT.isin(airport.IATA_CODE.values),'DESTINATION_AIRPORT']='OTHER'
flights
[16]
print(flights.ORIGIN_AIRPORT.nunique())
print(flights.DESTINATION_AIRPORT.nunique())
print(flights.AIRLINE.nunique())
245
247
14
[17]
flights=flights.dropna()
flights
[18]
flights.shape
(4900, 10)
[19]
df=pd.DataFrame(flights)
df['DAY_OF_WEEK']= df['DAY_OF_WEEK'].apply(str)
df["DAY_OF_WEEK"].replace({"1":"SUNDAY", "2": "MONDAY", "3": "TUESDAY", "4":"WEDNESDAY",
    "5":"THURSDAY", "6":"FRIDAY", "7":"SATURDAY"},inplace=True)
flights
[20]
dums = ['AIRLINE','ORIGIN_AIRPORT','DESTINATION_AIRPORT','DAY_OF_WEEK']
df_cat=pd.get_dummies(df[dums],drop_first=True)
df_cat
[21]
df_cat.columns
Index(['AIRLINE_AS', 'AIRLINE_B6', 'AIRLINE_DL', 'AIRLINE_EV', 'AIRLINE_F9',
'AIRLINE_HA', 'AIRLINE_MQ', 'AIRLINE_NK', 'AIRLINE_OO', 'AIRLINE_UA',
...
'DESTINATION_AIRPORT_VPS', 'DESTINATION_AIRPORT_WYS',
'DESTINATION_AIRPORT_XNA', 'DESTINATION_AIRPORT_YUM',
'DAY_OF_WEEK_MONDAY', 'DAY_OF_WEEK_SATURDAY',
'DAY_OF_WEEK_SUNDAY',
'DAY_OF_WEEK_THURSDAY', 'DAY_OF_WEEK_TUESDAY',
'DAY_OF_WEEK_WEDNESDAY'],
dtype='object', length=506)
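The effect of `drop_first=True` can be seen on a small frame. The sketch below uses a hypothetical four-row table, not rows from flights.csv, and shows that a variable with k levels contributes k - 1 indicator columns because the alphabetically first level is dropped.

```python
import pandas as pd

# Hypothetical toy frame, not rows from flights.csv
toy = pd.DataFrame({
    "AIRLINE": ["AA", "DL", "UA", "AA"],
    "DAY_OF_WEEK": ["MONDAY", "FRIDAY", "MONDAY", "SUNDAY"],
})

# drop_first=True drops the alphabetically first level of each variable,
# so a variable with k levels contributes k - 1 indicator columns
encoded = pd.get_dummies(toy, drop_first=True)
print(list(encoded.columns))
```

This matches the listing above, where neither `AIRLINE_AA` nor `DAY_OF_WEEK_FRIDAY` appears among the generated columns.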
[22]
df.columns
Index(['MONTH', 'DAY', 'DAY_OF_WEEK', 'AIRLINE', 'ORIGIN_AIRPORT',
'DESTINATION_AIRPORT', 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY',
'DISTANCE', 'ARRIVAL_DELAY'],
dtype='object')
[23]
flights.columns
Index(['MONTH', 'DAY', 'DAY_OF_WEEK', 'AIRLINE', 'ORIGIN_AIRPORT',
'DESTINATION_AIRPORT', 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY',
'DISTANCE', 'ARRIVAL_DELAY'],
dtype='object')
[24]
var_to_remove=["DAY_OF_WEEK","AIRLINE","ORIGIN_AIRPORT","DESTINATION_AIRPORT"]
df.drop(var_to_remove,axis=1,inplace=True)
df
[25]
data=pd.concat([df,df_cat],axis=1)
data
[26]
data.shape
(4900, 512)
[28]
final_data = data.sample(n=4000)
final_data
[29]
final_data.shape
(4000, 512)
[30]
from sklearn.model_selection import train_test_split
from sklearn import metrics
[31]
X=final_data.drop("DEPARTURE_DELAY",axis=1)
Y=final_data.DEPARTURE_DELAY
[32]
X
[33]
Y
1123360 -10.0
3734853 19.0
3632240 -11.0
1965248 -6.0
1794390 -7.0
...
2873286 -2.0
1567991 -8.0
2464375 14.0
1338384 -5.0
4391207 4.0
Name: DEPARTURE_DELAY, Length: 4000, dtype: float64
[34]
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
[35]
from sklearn.ensemble import RandomForestRegressor
reg_rf = RandomForestRegressor()
reg_rf.fit(X_train,y_train)
RandomForestRegressor()
[36]
y_pred = reg_rf.predict(X_test)
[37]
reg_rf.score(X_train,y_train)
0.9887406575063343
[38]
reg_rf.score(X_test,y_test)
0.926441861317447
[39]
metrics.r2_score(y_test,y_pred)
0.926441861317447
[40]
print('MAE:', metrics.mean_absolute_error(y_test,y_pred))
print('MSE:', metrics.mean_squared_error(y_test,y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test,y_pred)))
MAE: 6.382899999999999
MSE: 116.8092135
RMSE: 10.807831119146893
[41]
pp=pd.DataFrame({'Actual':y_test,'Predicted':y_pred})
pp
[42]
ring='neg_mean_squared_error', n_iter = 10, cv = 5, verbose=2, random_state=42, n_jobs = 1)
[45]
[46]
rf_random.best_params_
{'n_estimators': 61,
'min_samples_split': 5,
'min_samples_leaf': 5,
'max_features': 'auto',
'max_depth': 15}
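The call that produced these parameters is only partly visible in the listing. The sketch below shows how a randomized search of this kind is typically set up with scikit-learn. The search space `random_grid`, the synthetic data, and the reduced `n_iter` and `cv` values are all assumptions made to keep the example small; they are not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; the notebook itself uses X_train and y_train
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=120)

# Hypothetical search space: the grid used in the project is not shown
random_grid = {
    "n_estimators": [10, 20, 30],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

rf_random = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=random_grid,
    scoring="neg_mean_squared_error",
    n_iter=3,   # the listing uses n_iter=10
    cv=3,       # the listing uses cv=5
    random_state=42,
    n_jobs=1,
)
rf_random.fit(X_demo, y_demo)
print(rf_random.best_params_)
```

The `max_features='auto'` value reported above has been removed from recent scikit-learn releases, so it is left out of the sketch.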
[47]
p=rf_random.predict(X_test)
[48]
metrics.r2_score(y_test,p)
0.9183853390631461
[49]
print('MAE:', metrics.mean_absolute_error(y_test,p))
print('MSE:', metrics.mean_squared_error(y_test,p))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test,p)))
MAE: 6.446730396836321
MSE: 129.60284918634386
RMSE: 11.384324713672914
[50]
zz=pd.DataFrame({'Actual':y_test,'Predicted':p})
zz
[51]
[52]
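from sklearn.ensemble import GradientBoostingRegressor
gbr=GradientBoostingRegressor()  # assumed: the cell defining gbr is missing from the listing
GBR=gbr.fit(X_train,y_train)
pre=GBR.predict(X_test)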
[53]
print('MAE:', metrics.mean_absolute_error(y_test,pre))
print('MSE:', metrics.mean_squared_error(y_test,pre))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test,pre)))
MAE: 6.346691657004585
MSE: 120.8803082663399
RMSE: 10.994558120558548
[54]
metrics.r2_score(y_test,pre)
0.9238781752481778
[55]
gg=pd.DataFrame({'Actual':y_test,'Predicted':pre})
gg
[56]
1. Introduction: Understanding passenger satisfaction is crucial for airlines. It is important for
improving airline services so that passengers have a comfortable journey.
[69]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
[70]
passenger=pd.read_csv("Airline_Passenger_data.csv")
[71]
passenger.head()
[72]
passenger.tail()
[73]
passenger.sample(10)
[74]
f. Display statistical information such as the mean, median, mode, quartiles (Q1, Q2, Q3), standard
deviation, minimum, maximum, and total number of entries in the Airline Passenger
Satisfaction Survey.
[77]
passenger.describe()
[78]
print('Average age:',passenger['Age'].mean())
[79]
passenger.sort_values(['Age'])
[80]
passenger[passenger['Gender']=='Female']
[81]
passenger[passenger['Age']<=30]
3. Data Processing
Check whether null values are present in the Airline Passenger Satisfaction Survey.
Drop the rows with null values, which occur in the Arrival Delay in Minutes column.
[83]
passenger=passenger.dropna()
[84]
Check the remaining entries after removing the null values, and list any duplicated rows
passenger[passenger.duplicated()]
Encoding
Types of encoding:
1. Label or ordinal encoding: each unique category value is assigned an integer value. This
transform is available in the scikit-learn Python machine learning library via the OrdinalEncoder
class.
2. One-hot encoding (get_dummies): each categorical variable is represented by binary indicator
columns, one per category, so that it can be used as numerical input to a machine learning model.
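The difference between the two encodings can be illustrated on a single column. The sketch below is written in plain Python for clarity; the helper names `ordinal_encode` and `one_hot_encode` and the sample values are hypothetical, and the project itself relies on `OrdinalEncoder` and `pd.get_dummies`.

```python
def ordinal_encode(values):
    """Map each category to an integer, using sorted order of the categories.

    Sorting alphabetically mirrors the default category order that
    scikit-learn's OrdinalEncoder derives for string data.
    """
    categories = sorted(set(values))
    mapping = {cat: i for i, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Return one 0/1 indicator list per value, one position per category."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values]


# Hypothetical sample of the 'Class' column
travel_class = ["Eco", "Business", "Eco Plus", "Eco"]

codes, mapping = ordinal_encode(travel_class)
indicators = one_hot_encode(travel_class)
print(codes)       # integer code per row
print(indicators)  # indicator vector per row
```

Ordinal codes imply an order between categories (here Business < Eco < Eco Plus), which a model may treat as meaningful. One-hot indicators avoid that at the cost of more columns.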
[87]
from sklearn.preprocessing import OrdinalEncoder
[88]
x=passenger.iloc[:,:-1]
x
Perform Ordinal Encoding on x variable
[89]
df=OrdinalEncoder()
x[['Gender','Customer Type','Type of Travel','Class']]=df.fit_transform(x[['Gender','Customer Type','Type of Travel','Class']])
x.tail()
Perform Label Encoding on target y variable (satisfaction) column
[90]
passenger['satisfaction']=passenger['satisfaction'].map({'satisfied':1,'neutral or dissatisfied':0})
[91]
y=passenger.iloc[:,-1]
y
#Target variable y is satisfaction, either 0 or 1
0 0
1 0
2 1
3 0
4 1
...
994 1
995 1
996 0
997 0
998 1
Name: satisfaction, Length: 998, dtype: int64
Data Visualization to understand the distribution of various Airline features
Perform EDA: Exploratory Data Analysis (EDA) is an approach used to analyze the data and
discover trends and patterns, or to check assumptions, with the help of statistical
summaries and graphical representations.
Boxplot: a chart that depicts a group of numerical data through its quartiles. It is a simple
way to visualize the spread of the data, and it makes comparing distributions between
categories very easy.
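The quantities a boxplot draws can be computed directly. The sketch below uses hypothetical flight distances, not values from the survey data, and applies the common 1.5 x IQR rule for the whiskers, which is also the default in seaborn.

```python
import statistics

# Hypothetical flight distances in miles, not taken from the survey data
distances = [200, 300, 400, 500, 600, 700, 800, 900, 1000, 4000]

# Quartiles Q1, Q2 (median) and Q3; "inclusive" interpolates between the
# observed minimum and maximum
q1, q2, q3 = statistics.quantiles(distances, n=4, method="inclusive")
iqr = q3 - q1

# 1.5 * IQR rule: points beyond the fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [d for d in distances if d < lower_fence or d > upper_fence]

print(q1, q2, q3, outliers)
```

Here the single very long flight falls above the upper fence and would appear as an isolated point to the right of the box.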
[92]
[93]
sns.boxplot(x=passenger['Flight Distance'])
<Axes: xlabel='Flight Distance'>
[94]
sns.boxplot(x=passenger['Checkin service'])
<Axes: xlabel='Checkin service'>
Train and Test datasets are the two key concepts of machine learning, where the training dataset
is used to fit the model, and the test dataset is used to evaluate the model.
[109]
[110]
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2,random_state=1)
Logistic regression: a classification technique that models the probability of a binary outcome
as a function of the input features. The predicted probability is then compared with a threshold
to give one of a finite number of outcomes, like yes or no.
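A minimal sketch of the prediction step may help here. It assumes the weights and bias have already been learned; the values used are hypothetical, and in the project they are fitted by scikit-learn's `LogisticRegression`.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of class 1 for one row: sigmoid of the weighted sum."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Class label: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical weights for two features; a fitted model learns these from data
weights = [0.8, -0.5]
bias = -0.2

print(predict_proba([2.0, 1.0], weights, bias))
print(predict([2.0, 1.0], weights, bias))
print(predict([0.0, 3.0], weights, bias))
```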
[111]
from sklearn.linear_model import LogisticRegression
[112]
model=LogisticRegression()
model.fit(xtrain,ytrain)
[113]
ypred=model.predict(xtest)
ypred
array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0,
1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0,
1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0,
1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0,
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0,
1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1,
1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0,
1, 0])
Classification of Model
Predictive Modeling
The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. This technique can be performed on
structured or unstructured data, and its main goal is to identify the category or class to which
new data will belong.
[114]
[115]
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(ytest,ypred))
Display Confusion matrix
[116]
print(confusion_matrix(ytest,ypred))
model.score(xtest,ytest)
0.78
[121]
Chapter 6
Flask serves as a versatile foundation for web applications, offering developers a robust
set of tools and libraries while allowing for extensive customization. Its minimalistic design
philosophy encourages developers to adopt best practices and design patterns suited to their
project's unique requirements. Flask's modular architecture facilitates the integration of third-
party extensions, enabling developers to extend functionality with ease, whether it be
integrating authentication mechanisms, database management systems, or other advanced
features. This extensibility makes Flask a popular choice for building diverse web solutions,
from simple websites to complex enterprise applications, providing developers with the
freedom to tailor their projects to specific needs.
In contrast, Streamlit simplifies the process of creating interactive web applications
focused on data visualization and machine learning models. By abstracting away the
complexities of web development, Streamlit enables data scientists and machine learning
practitioners to quickly prototype and deploy web-based interfaces for their models and
analyses. With Streamlit, developers can leverage familiar Python syntax to create interactive
components such as sliders, buttons, and plots, allowing users to explore data and manipulate
parameters in real-time.
6.2.1 Forms
Fig 6.2: Signup
Fig 6.5: Flight delay Prediction
6.2.3 Result Analysis
Performance Metrics:
R-squared (R2) Score on Test Data: 0.9264
Mean Absolute Error (MAE): 6.38
Mean Squared Error (MSE): 116.81
Root Mean Squared Error (RMSE): 10.81
Analysis:
The Random Forest Regressor model achieved a high R-squared score of 0.9264
on the test data, indicating that it explains 92.64% of the variance in the target
variable.
The MAE of 6.38 suggests that, on average, the model's predictions are off by
approximately 6.38 minutes from the actual departure delay.
The RMSE of 10.81 is the square root of the MSE. Because it weights large errors
more heavily than the MAE, it indicates that a typical prediction deviates from the
actual value by roughly 10.81 minutes.
For the second regressor (GBR in the code listing), the MAE of 6.35 suggests that,
on average, the predictions are off by approximately 6.35 minutes from the actual
departure delay, which is similar to the Random Forest Regressor.
Its RMSE of 10.99 indicates a typical deviation of approximately 10.99 minutes
from the actual values, also similar to the Random Forest Regressor.
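The four regression metrics quoted in this section follow directly from the prediction errors. The sketch below computes them in plain Python on hypothetical delays, not the project's test set, and then checks that the reported Random Forest RMSE is the square root of the reported MSE.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R-squared for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, rmse, r2


# Hypothetical delays in minutes, not the project's test set
actual = [-10.0, 19.0, -6.0, 14.0, 4.0]
predicted = [-8.0, 15.0, -6.0, 20.0, 1.0]

mae, mse, rmse, r2 = regression_metrics(actual, predicted)
print(mae, mse, rmse, r2)

# Consistency check on the reported Random Forest figures: RMSE = sqrt(MSE)
reported_rmse = math.sqrt(116.8092135)
print(round(reported_rmse, 2))
```

Because the errors are squared before averaging, one large miss raises the RMSE far more than the MAE, which is why the RMSE is the larger of the two figures for both models.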
The confusion matrix reveals that out of 105 instances of class 0, 83 are correctly
classified (True Negatives), while 22 are incorrectly classified as class 1 (False Positives).
Similarly, out of 95 instances of class 1, 73 are correctly classified (True Positives), while 22
are incorrectly classified as class 0 (False Negatives). These counts give a testing accuracy of
(83 + 73) / 200 = 78.00%, which matches the score of 0.78 in the code listing. With a reported
training accuracy of 82.21%, the gap between training and testing is small, indicating
reasonable generalization performance.
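Using the four counts quoted above, the usual classification metrics can be worked out by hand. The sketch below assumes class 1 ("satisfied") is the positive class.

```python
# Counts quoted in the paragraph above
tn, fp = 83, 22   # actual class 0: correctly classified, misclassified as 1
fn, tp = 22, 73   # actual class 1: misclassified as 0, correctly classified

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted 1s that are truly 1
recall = tp / (tp + fn)      # share of actual 1s that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Since the numbers of false positives and false negatives are equal, precision, recall and F1 for class 1 coincide at 73/95.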
Data Loading and Exploration: The dataset was loaded, checked for missing values, and
examined through exploratory data analysis (EDA) to understand the structure and distribution
of the data.
Data Preprocessing: The data was preprocessed by handling categorical variables, converting
datetime features, and extracting relevant information from text features.
Feature Engineering: New features were created, such as the day and month of journey and the
hour and minute of departure and arrival times, and duration hours and minutes were extracted
from the duration column.
Handling Categorical Data: Categorical variables were encoded using techniques like one-hot
encoding and label encoding.
Feature Selection: Correlation analysis and feature importance from ExtraTreesRegressor were
used to select the most relevant features.
Model Building: A Random Forest Regression model was built to predict flight prices.
Model Evaluation: The model was evaluated using metrics like Mean Absolute Error (MAE),
Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared score.
Chapter 7
Designing test cases and scenarios for airline data analytics using machine learning
projects involves ensuring that the system functions accurately, efficiently, and reliably. Here's a
structured approach to designing test cases and scenarios:
Data Preprocessing:
Test Case 1: Verify that missing values in the dataset are handled appropriately (e.g., imputation,
deletion).
Test Case 2: Ensure that categorical variables are encoded properly (e.g., one-hot encoding, label
encoding).
Test Case 3: Validate that data scaling or normalization is applied correctly to numerical
features.
Feature Engineering:
Test Case 4: Confirm that feature selection techniques (e.g., correlation analysis, feature
importance) are applied accurately.
Test Case 5: Validate the creation of new features from existing ones (e.g., feature
transformations, interaction terms).
Model Training:
Test Case 6: Check that the appropriate machine learning algorithms are selected based on the
problem (e.g., regression, classification).
Test Case 7: Ensure that hyperparameters tuning is performed correctly (e.g., using cross-
validation, grid search).
Test Case 8: Validate the splitting of data into training and testing sets and that it is done
randomly and consistently.
Model Evaluation:
Test Case 9: Verify the accuracy of the model's predictions against a baseline (e.g., simple
heuristics).
Test Case 10: Validate model performance metrics (e.g., accuracy, precision, recall, F1-score).
Test Case 11: Ensure that the model's generalization ability is tested with unseen data (e.g.,
cross-validation, holdout set).
Test Case 19: Validate compliance with data protection regulations (e.g., GDPR, HIPAA).
Performance Testing:
Test Case 20: Measure the computational resources (e.g., memory, processing time) required for
model training and prediction.
Test Case 21: Test the scalability of the system to handle large volumes of data efficiently.
User Acceptance Testing:
Test Case 22: Engage stakeholders to validate that the system meets their requirements and
expectations.
Test Case 23: Solicit feedback from end-users to identify areas for improvement and
enhancement.
Documentation and Maintenance:
Test Case 24: Ensure that comprehensive documentation is provided for the system, including
model architecture, data sources, and usage instructions.
Test Case 25: Validate that the system is maintainable and easily upgradable with future
enhancements or bug fixes.
By following this structured approach and tailoring it to the specific requirements of your airline
data analytics project, you can ensure the reliability, accuracy, and efficiency of your machine
learning solution.
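Some of the test cases listed above can be written as small automated checks. The sketch below covers Test Case 1 and Test Case 9 in plain Python. The helper names `drop_missing` and `mean_absolute_error` and all of the numbers are hypothetical stand-ins; in the project the equivalent steps are `DataFrame.dropna` and `sklearn.metrics.mean_absolute_error`.

```python
def drop_missing(rows):
    """Remove rows that contain a None value (a stand-in for DataFrame.dropna)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def test_missing_values_are_removed():
    # Corresponds to Test Case 1
    rows = [{"delay": 5.0}, {"delay": None}, {"delay": -3.0}]
    cleaned = drop_missing(rows)
    assert len(cleaned) == 2
    assert all(r["delay"] is not None for r in cleaned)


def test_model_beats_mean_baseline():
    # Corresponds to Test Case 9; the numbers are hypothetical
    actual = [10.0, -5.0, 30.0, 0.0]
    model_pred = [8.0, -2.0, 25.0, 3.0]
    baseline_pred = [sum(actual) / len(actual)] * len(actual)
    model_mae = mean_absolute_error(actual, model_pred)
    baseline_mae = mean_absolute_error(actual, baseline_pred)
    assert model_mae < baseline_mae


test_missing_values_are_removed()
test_model_beats_mean_baseline()
print("all checks passed")
```

Functions named `test_*` in this style can also be collected by a test runner such as pytest without further changes.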
7.2 Validation
Validating machine learning models for airline data analytics involves several key steps
to ensure the reliability and accuracy of the models. Here's a generalized process you can
follow:
Gather relevant data from various sources such as flight schedules, ticket sales, weather
reports, maintenance logs, etc. Clean and preprocess the data; this may involve techniques like
data imputation, normalization, and feature engineering.
Conduct EDA to understand the characteristics and patterns in the data. Identify
correlations between different variables and their potential impact on flight operations.
Feature Selection:
Select the most relevant features that contribute to the predictive power of the model.
Use techniques like correlation analysis, feature importance, or domain expertise to guide
feature selection.
Choose appropriate machine learning algorithms suitable for the problem at hand (e.g.,
regression for predicting ticket prices, classification for predicting flight delays). Split the data
into training, validation, and test sets. Train the model on the training set and tune
hyperparameters using the validation set.
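A three-way split of this kind can be sketched as follows. The 60/20/20 proportions, the fixed seed, and the function name `train_val_test_split` are illustrative assumptions; the notebooks in this report use a two-way `train_test_split` only.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then cut into train, validation and test parts.

    The default 60/20/20 proportions are an illustrative assumption.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


row_ids = range(100)  # stand-in for dataset row indices
train, val, test = train_val_test_split(row_ids)
print(len(train), len(val), len(test))
```

Keeping the test part untouched until the end is what allows it to serve as an unbiased estimate of performance after the hyperparameters have been tuned on the validation part.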
Evaluation Metrics:
Select appropriate evaluation metrics based on the specific problem. For example,
accuracy, precision, recall, F1-score for classification tasks, Mean Absolute Error (MAE), Mean
Squared Error (MSE), or Root Mean Squared Error (RMSE) for regression tasks. Ensure the
chosen metrics align with business objectives and requirements.
Cross-Validation:
Evaluate the model's performance on the test set, which serves as an independent dataset
not used during training or validation. Assess if the model's performance meets the desired
criteria and business requirements.
Ensure the model's decisions are interpretable and transparent, especially in critical
applications like airline operations. Techniques such as feature importance analysis, SHAP
values, or model-specific interpretation methods can help explain the model's predictions.
Feedback Loop:
Establish a feedback loop to continuously improve the model based on new data and
insights gained from deployment. By following these steps, you can validate machine learning
models effectively for airline data analytics, ensuring their reliability and usefulness in real-
world applications.
7.3 Conclusion
Chapter 10
CONCLUSION
In essence, the advanced airline data analytics system embodies a holistic solution that
not only revolutionizes how airlines manage their operations but also elevates the passenger
experience. The real-time insights derived from data analytics empower airlines to make
informed decisions, adapt to market dynamics, and cultivate lasting customer loyalty. As the
aviation industry evolves, this integrated framework stands as a testament to the potential of
data-driven strategies in shaping the future of air travel.