Governors State University

OPUS Open Portal to University Scholarship

All Capstone Projects Student Capstone Projects

Fall 2022

Airline Search Engine Project

Arun Kailasa

Follow this and additional works at: https://opus.govst.edu/capstones

Recommended Citation
Kailasa, Arun, "Airline Search Engine Project" (2022). All Capstone Projects. 568.
https://opus.govst.edu/capstones/568

For more information about the academic degree, extended learning, and certificate programs of Governors State
University, go to http://www.govst.edu/Academics/Degree_Programs_and_Certifications/

Visit the Governors State Computer Science Department

This Capstone Project is brought to you for free and open access by the Student Capstone Projects at OPUS Open
Portal to University Scholarship. It has been accepted for inclusion in All Capstone Projects by an authorized
administrator of OPUS Open Portal to University Scholarship. For more information, please contact
[email protected].
AIRLINE SEARCH ENGINE PROJECT
By

Arun Kailasa
B. Tech, Vaagdevi College of Engineering, 2020

GRADUATE CAPSTONE SEMINAR PROJECT

Submitted in partial fulfillment of the requirements

For the Degree of Master of Science,

With a Major in Computer Science

Governors State University

University Park, IL 60484

2022
ABSTRACT

The Airline Search Engine Project is a tool that helps anyone find facts and data about airlines and airports. The raw
data set for this project is available in the .dat format and can be downloaded from [1].

The tool may also perform initial cleaning of the data where needed to form dimensional data. The cleaning steps
include data value unification, data type and size unification, deduplication, dropping columns, and correcting some
known errors.

The data is processed with Python and Spark and can be stored in distributed storage systems such as Hadoop and
Amazon S3. The development environments used in this project are Google Colab and PyCharm.

This tool can be run as a job on different clusters such as EMR (Elastic MapReduce), HDInsight, Cloudera, and
Databricks. It can turn terabytes of raw data into useful information, from which we can create reports for Data
Analysts, Data Scientists, and businesspeople.
Table of Contents
1 Project Description
1.1 Appendix A:
1.2 Appendix B:
1.3 Appendix C:
2 Architecture and flow of the Data Pipeline
3 Tools and Technologies
4 Project Structure
5 Project folder Hierarchy
6 Utility Code
7 Code for creating the Spark session
8 Transformation and Cleaning
9 Complete Project Code:
10 Project Output Screenshots
10.1 Find a list of Airports operating in the Country X
10.2 Find the list of Airlines having X stops
10.3 List of Airlines operating with codeshare
10.4 Find the list of Active Airlines in the United States
10.5 Which country (or) territory has the highest number of Airports
10.6 The top K cities with most Incoming Airlines
10.7 The top K cities with most Outgoing Airlines
10.8 Trip that connects two cities X and Y
10.9 Trip that connects X and Y with less than Z stops
10.10 All the cities reachable within d hops of a city
10.11 Find list of Airports operating in the Country X
10.12 Find the list of Airlines having X stops
10.13 List of Airlines operating with code share
10.14 Find the list of Active Airlines in the United States
10.15 Which country (or) territory has the highest number of Airports
10.16 The top K cities with most incoming Airlines
10.17 The top K cities with most outgoing Airlines
10.18 Trip that connects two cities X and Y
10.19 Trip that connects X and Y with less than Z stops
10.20 All the cities reachable within d hops of a city
11 AWS Output Screenshot
12 Acknowledgement
13 References:
1 Project Description

This tool processes the raw data sets listed in Appendix A and derives from them the useful facts listed in Appendix B.
It first builds dimensional data models from the raw data: the Airports, Airlines, Routes, Planes, and Countries
tables. The schema of those tables can be found in Appendix C.

1.1 Appendix A:
The raw data sets are:
1) Airport.dat – contains information related to airports, such as airport id and airport name.
2) Airlines.dat – contains information related to airlines, such as airline id and airline name [5].
3) Routes.dat – contains information related to routes, such as source airport and destination airport.
4) Plane – contains information related to planes, such as plane name.
5) Country – contains information related to countries, such as country name and iso_code [5].

1.2 Appendix B:
a. Find the list of Airports operating in the Country X.
b. Find the list of Airlines having X stops.
c. List of Airlines operating with code share.
d. Find the list of Active Airlines in the United States.

Airline aggregation:
e. Which Country (or) Territory has the highest number of Airports.
f. The top K cities with most Incoming/Outgoing Airlines.

Trip recommendation:
g. Define a trip as a sequence of connected routes. Find a trip that connects two cities X and Y (reachability).
h. Find a trip that connects X and Y with less than Z stops (constrained reachability).
i. Find all the cities reachable within d hops of a city (bounded reachability).
j. Fast transitive closure/connected component implemented in parallel/distributed algorithms.
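The connected-component item in the list above can be illustrated on a single machine. The following is only a minimal sketch, not the project's implementation and not a parallel or distributed algorithm: it assumes routes are given as (source, destination) pairs and treats each route as an undirected edge. The function name and the airport codes are hypothetical.

```python
def connected_components(routes):
    """Group airports into components, treating each route as an undirected edge."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        # Path halving keeps the union-find trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, destination in routes:
        root_a, root_b = find(source), find(destination)
        if root_a != root_b:
            parent[root_b] = root_a

    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in components.values())


# Made-up airport codes, for illustration only.
sample_routes = [("AAA", "BBB"), ("BBB", "CCC"), ("DDD", "EEE")]
components = connected_components(sample_routes)
```

Two airports are mutually reachable in the undirected sense exactly when they land in the same component, which is why union-find is a common starting point before moving to a distributed variant.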

1.3 Appendix C:

Table name: Airports

Column                  Type
airport_id              bigint
Name                    string
city                    string
country                 string
iata                    string
icao                    string
latitude                double
longitude               double
altitude                bigint
timezone                double
dst                     string
tz_database             string
type                    string
source                  string

Table name: Airlines

Column                  Type
Airlineid               bigint
Name                    string
Alias                   string
Iata                    string
Icao                    string
Callsign                string
Country                 string
active                  string

Table name: Routes

Column                  Type
Airline                 string
Airlineid               string
Source_airport          string
Source_airport_id       string
Destination_airport     string
Destination_airportid   string
Codeshare               string
Stops                   bigint
Equipment               string

Table name: Planes

Column                  Type
Name                    string
Iata                    string
Icao                    string

Table name: Countries

Column                  Type
Name                    string
Iso_code                string
Dafif_code              string
2 Architecture and flow of the Data Pipeline

The given data set is uploaded either to an Amazon S3 bucket [4, 6] or to the Hadoop Distributed File System (HDFS).
The uploaded data is processed with the Apache Spark engine [3], which typically runs on a cluster such as the Amazon
Elastic MapReduce (EMR) service or as a locally installed Spark. Once the data is processed, the output can be stored
in another Amazon S3 bucket or in HDFS. The output data can be viewed with tools such as Apache Superset, Tableau,
the Presto query engine, or Amazon Athena [6], or it can be registered as another Hive table [3].

Figure 1: Architecture and flow of the Data Pipeline [2].

3 Tools and Technologies

Google Colab, Spark, Python, AWS, PyCharm, HDFS, SQL, and AWS resources such as S3 buckets, Identity and Access
Management (IAM), AWS Glue Data Catalog, AWS Glue Crawler, and AWS Athena.

4 Project Structure

The Airline Search Engine Project is developed in an Integrated Development Environment (IDE), PyCharm [8], with the
necessary packages, PySpark and Spark [3, 11], installed.

Figure 2: PySpark version 3.1.2 and Spark version 3.1.2.

The pip list command shows the version used in this project: PySpark 3.1.2, matching Spark 3.1.2.

Figure 3: pip list command showing PySpark Version.

5 Project folder Hierarchy

A separate project is created for this work, with its own virtual environment for installing the necessary dependency
modules such as Pandas [10] and NumPy. The folder structure includes a separate folder for data loading/reading, and
shared Spark utility code is kept in its own folder, the util folder.

Figure 4: Project folder Hierarchy

6 Utility Code

Utility code was developed to read the Spark session configuration and to set the Spark configuration at run time. The
load_df utility was developed to read the data. The code is shown in the screenshot below.

Figure 5 : Utility Code
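Because the utility code survives only as a screenshot, the following is a plain-Python stand-in for the idea behind load_df, not the project's Spark code. It assumes the .dat files are header-less, comma-separated text and attaches the column names of the Airports table from Appendix C (lowercased here). The function name load_rows and the sample row are hypothetical.

```python
import csv
import io

# Column names following the Airports table in Appendix C (lowercased for this sketch).
AIRPORT_COLUMNS = [
    "airport_id", "name", "city", "country", "iata", "icao", "latitude",
    "longitude", "altitude", "timezone", "dst", "tz_database", "type", "source",
]


def load_rows(handle, columns):
    """Read a header-less, comma-separated .dat file into a list of dicts."""
    return [dict(zip(columns, row)) for row in csv.reader(handle)]


# A made-up row in the assumed layout, for illustration only.
sample = io.StringIO(
    '1,"Example Airport","Example City","Greenland","EXA","EXAM",'
    '64.19,-51.67,283,-3,"E","America/Godthab","airport","OurAirports"\n'
)
airports = load_rows(sample, AIRPORT_COLUMNS)
```

Every value is read as a string in this sketch; casting to the types listed in Appendix C would be a separate step.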

7 Code for creating the Spark session

Figure 6: Code for creating the Spark session

8 Transformation and Cleaning

The transformation and cleaning step replaces strings such as “\N” and “-” with na and replaces all null values with
the string na. The output after this transformation and cleaning is shown in the screenshot below.

Figure 7: Transformation and Cleaning
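The cleaning rule described in this section can be sketched on plain Python rows. This is an illustration of the rule only, not the Spark code in the screenshot. Two points are assumptions of this sketch rather than statements from the report: surrounding whitespace is stripped before comparison, and empty strings are also treated as missing. The names clean_value and clean_row are hypothetical.

```python
# Markers treated as missing; the empty string is an assumption of this sketch.
NULL_MARKERS = {"\\N", "-", ""}


def clean_value(value):
    """Map null values and null-like markers to the string 'na'."""
    if value is None:
        return "na"
    stripped = value.strip()
    return "na" if stripped in NULL_MARKERS else stripped


def clean_row(row):
    """Apply clean_value to every column of a row given as a dict."""
    return {column: clean_value(value) for column, value in row.items()}


# Made-up airline row, for illustration only.
raw = {"name": "Example Air", "alias": "\\N", "iata": "- ", "callsign": None}
cleaned = clean_row(raw)
```

Using one sentinel string for every missing value is what lets the later codeshare query filter with `codeshare != 'na'`.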

9 Complete Project Code:

Figure 8: Project Code

Figure 9: Complete Project Code

Figure 10: Complete Project Code

Figure 11: Spark Session Configuration Code

10 Project Output Screenshots

10.1 Find a list of Airports operating in the Country X

spark.sql("select *, count(*) over () as count from airports where country = 'Greenland'").show(100)

Output:

Figure 10.1: Output for list of Airports operating in the Country X (‘GREENLAND’)

10.2 Find the list of Airlines having X stops

spark.sql("select * from routes where stops > 0").show(100)

Output:

Figure 10.2: Output for list of Airlines having X stops
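The query above returns route rows with at least one stop. As a hedged illustration of the question itself, the sketch below takes X as a parameter and returns distinct airline codes. It works on plain dicts rather than Spark tables, and the function name and sample rows are hypothetical.

```python
def airlines_with_stops(routes, stops):
    """Return the sorted, distinct airline codes that fly a route with exactly `stops` stops."""
    return sorted({route["airline"] for route in routes if route["stops"] == stops})


# Made-up route rows, for illustration only.
sample_routes = [
    {"airline": "AA", "stops": 0},
    {"airline": "BB", "stops": 1},
    {"airline": "BB", "stops": 1},
    {"airline": "CC", "stops": 1},
]
one_stop_airlines = airlines_with_stops(sample_routes, 1)
```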

10.3 List of Airlines operating with codeshare
spark.sql("select * , count(*) over() as count from routes where codeshare != 'na' ").show(100)

Output:

Figure 10.3: Output for list of Airlines operating with codeshare

10.4 Find the list of Active Airlines in the United States

spark.sql("select *, count(*) over() as count from airlines where country = 'United States' and active = 'Y'").show(100)

Output:

Figure 10.4: Output for list of active Airlines in the United States

10.5 Which country (or) territory has the highest number of Airports
spark.sql("select count(*) as cnt, country from airports group by country order by cnt desc").show(20)

Output:

Figure 10.5: Output for the countries with highest number of Airports
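The group-by-and-sort pattern in this query can be mirrored in plain Python with collections.Counter. This is a sketch on made-up rows, not the project's code, and the function name is hypothetical.

```python
from collections import Counter


def countries_by_airport_count(airports):
    """Count airports per country, ordered from most to fewest airports."""
    return Counter(airport["country"] for airport in airports).most_common()


# Made-up airport rows, for illustration only.
sample_airports = [
    {"country": "Country A"},
    {"country": "Country B"},
    {"country": "Country A"},
    {"country": "Country C"},
    {"country": "Country A"},
    {"country": "Country B"},
]
ranking = countries_by_airport_count(sample_airports)
```

The first entry of the ranking answers the question directly; taking only that entry corresponds to adding `limit 1` to the SQL query.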

10.6 The top K cities with most Incoming Airlines

spark.sql("""select * from (select airports.airportid, airports.name, airports.city, airports.country,
tb2.incoming_flight_count from airports inner join (select count(*) as incoming_flight_count, destinationairportid
from routes group by destinationairportid) tb2 on airports.airportid = tb2.destinationairportid) otb order by
otb.incoming_flight_count desc""").show(100)

Output:

Figure 10.6: Output for the top cities with most incoming Airlines
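The query above counts incoming routes per airport and shows the first 100 rows. As a hedged sketch of a per-city, top-K variant, the code below joins routes to airports in plain Python, sums the counts by city, and keeps the K largest. Like the query, it counts routes rather than distinct airlines. The function name and the sample rows are hypothetical.

```python
from collections import Counter


def top_k_cities_by_incoming(airports, routes, k):
    """Return the k cities with the most incoming routes as (city, count) pairs."""
    city_of = {airport["airportid"]: airport["city"] for airport in airports}
    counts = Counter(
        city_of[route["destinationairportid"]]
        for route in routes
        # Mirrors the inner join: routes to unknown airports are skipped.
        if route["destinationairportid"] in city_of
    )
    return counts.most_common(k)


# Made-up rows, for illustration only.
sample_airports = [
    {"airportid": "1", "city": "City A"},
    {"airportid": "2", "city": "City A"},
    {"airportid": "3", "city": "City B"},
]
sample_routes = [{"destinationairportid": d} for d in ["1", "2", "3", "1", "na"]]
top_city = top_k_cities_by_incoming(sample_airports, sample_routes, 1)
```

Swapping destinationairportid for sourceairportid gives the outgoing variant.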

10.7 The top K cities with most Outgoing Airlines

spark.sql("""select * from (select airports.airportid, airports.name, airports.city, airports.country,
tb2.outgoing_flight_count from airports inner join (select count(*) as outgoing_flight_count, sourceairportid
from routes group by sourceairportid) tb2 on airports.airportid = tb2.sourceairportid) otb order by
otb.outgoing_flight_count desc""").show(100)

Output:

Figure 10.7: Output for top cities with most outgoing Airlines

10.8 Trip that connects two cities X and Y

spark.sql("""select * from routes where sourceairportid = '2613' and destinationairportid = '2531' """).show(100)

Output:

Figure 10.8: Output for trip that connects two cities X and Y
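The query above finds direct routes between two airport ids. Since Appendix B defines a trip as a sequence of connected routes, a multi-route trip can be sketched with a breadth-first search. This is an illustration on (source, destination) pairs of airport ids, not the project's code; the function name is hypothetical, and the intermediate id 1000 in the sample is made up.

```python
from collections import deque


def find_trip(routes, start, goal):
    """Breadth-first search; returns the airports on a trip with the fewest routes, or None."""
    graph = {}
    for source, destination in routes:
        graph.setdefault(source, []).append(destination)

    previous = {start: None}
    queue = deque([start])
    while queue:
        airport = queue.popleft()
        if airport == goal:
            # Walk the predecessor links back to the start.
            trip = []
            while airport is not None:
                trip.append(airport)
                airport = previous[airport]
            return trip[::-1]
        for neighbour in graph.get(airport, []):
            if neighbour not in previous:
                previous[neighbour] = airport
                queue.append(neighbour)
    return None


# Illustrative routes; only the endpoint ids 2613 and 2531 come from the query above.
sample_routes = [("2613", "1000"), ("1000", "2531"), ("2531", "2613")]
trip = find_trip(sample_routes, "2613", "2531")
```

To search by city rather than airport, the ids could first be mapped to cities through the Airports table.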

10.9 Trip that connects X and Y with less than Z stops

spark.sql("""select * from routes where sourceairportid = '2613' and destinationairportid = '2531' and stops < 1 """).show(100)

Output:

Figure 10.9: Output for trip that connects X and Y with less than Z stops
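The query above filters on the stops column of a single direct route. One possible reading of constrained reachability counts the intermediate airports of a multi-route trip instead. The sketch below follows that reading, which is an assumption of this example: a trip with s intermediate stops uses s + 1 routes, so "less than Z stops" allows at most Z routes. The function name and airport codes are hypothetical.

```python
def find_trip_within_stops(routes, start, goal, max_stops):
    """Return a trip from start to goal with fewer than max_stops intermediate stops, or None."""
    graph = {}
    for source, destination in routes:
        graph.setdefault(source, []).append(destination)

    frontier = [[start]]
    visited = {start}
    # Pass i examines trips made of i routes, so the loop covers 0..max_stops routes.
    for _ in range(max_stops + 1):
        next_frontier = []
        for trip in frontier:
            if trip[-1] == goal:
                return trip
            for neighbour in graph.get(trip[-1], []):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(trip + [neighbour])
        frontier = next_frontier
    return None


# Made-up airport codes, for illustration only.
sample_routes = [("A", "B"), ("B", "C"), ("C", "D")]
direct = find_trip_within_stops(sample_routes, "A", "B", 1)
too_far = find_trip_within_stops(sample_routes, "A", "D", 2)
```

Because the search expands level by level, the first trip found uses the fewest routes, so a None result means no trip satisfies the limit.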

10.10 All the cities reachable within d hops of a city

spark.sql("""select destinationairport from routes where stops = 1 """).show(100)

Output:

Figure 10.10: Output for all the cities reachable within d hops of a city
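The query above selects destinations of routes whose stops column equals 1. Bounded reachability over the route graph, where each route counts as one hop, could be sketched as a level-by-level expansion. This is an illustration on (source, destination) pairs, not the project's code; the function name and city codes are hypothetical.

```python
def cities_within_hops(routes, start, max_hops):
    """Return every city reachable from start using at most max_hops routes."""
    graph = {}
    for source, destination in routes:
        graph.setdefault(source, set()).add(destination)

    reachable = set()
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Expand one hop and drop anything already visited.
        frontier = {n for city in frontier for n in graph.get(city, ())} - seen
        if not frontier:
            break
        seen |= frontier
        reachable |= frontier
    return reachable


# Made-up city codes, for illustration only.
sample_routes = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
within_two = cities_within_hops(sample_routes, "A", 2)
```

The starting city itself is excluded from the result in this sketch.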

10.11 Find list of Airports operating in the Country X

spark.sql("select *, count(*) over () as count from airports where country = 'Greenland'").show(100)

Output:

Figure 10.11: Output for list of Airports operating in the country X

10.12 Find the list of Airlines having X stops

spark.sql("select * from routes where stops > 0").show(100)

Output:

Figure 10.12: Output for the list of Airlines having X stops

10.13 List of Airlines operating with code share

spark.sql("select * , count(*) over() as count from routes where codeshare != 'na' ").show(100)

Output:

Figure 10.13: Output for Airlines operating with code share

10.14 Find the list of Active Airlines in the United States

spark.sql("select *, count(*) over() as count from airlines where country = 'United States' and active = 'Y'").show(100)

Output:

Figure 10.14: Output for list of active airlines in the United States

10.15 Which country (or) territory has the highest number of Airports

spark.sql("select count(*) as cnt, country from airports group by country order by cnt desc").show(20)

Output:

Figure 10.15: Output for multiple countries having highest number of Airports

10.16 The top K cities with most incoming Airlines

spark.sql("""select * from (select airports.airportid, airports.name, airports.city,airports.country,

Output:

Figure 10.16: Output for top K cities with most incoming Airlines

10.17 The top K cities with most outgoing Airlines

spark.sql("""select * from (select airports.airportid, airports.name, airports.city,airports.country,

Output:

Figure 10.17: Output for top K cities with most outgoing Airlines

10.18 Trip that connects two cities X and Y

spark.sql("""select * from routes where sourceairportid = '2613' and destinationairportid = '2531' """).show(100)

Output:

Figure 10.18: Output for trip that connects two cities X and Y

10.19 Trip that connects X and Y with less than Z stops

spark.sql("""select * from routes where sourceairportid = '2613' and destinationairportid = '2531' and stops < 1 """).show(100)

Output:

Figure 10.19: Output for trip that connects X and Y with less than Z stops

10.20 All the cities reachable within d hops of a city

spark.sql("""select destinationairport from routes where stops = 1 """).show(100)

Output:

Figure 10.20: Output for all the cities reachable within d hops of a city

11 AWS Output Screenshot

Figure 30: AWS Crawlers page

Figure 31: AWS Tables

Figure 32: AWS Athena Query

Figure 33: AWS Athena Output

12 Acknowledgement

I would like to thank my major professor, Liu Yunchuan, for having faith in me and my talents and for continuing to believe that I
would be able to complete the project on schedule. This project was completed successfully thanks to that support, ongoing direction,
and insightful feedback. I also want to express my sincere gratitude to my mentor for being on my panel, working as my academic
advisor, helping me make all the important choices, and having faith in me.

13 References:
[1] http://openflights.org/data.html
[2] https://docs.aws.amazon.com/glue/latest/ug/tutorial-create-job.html
[3] https://spoddutur.github.io/spark-notes/spark-as-cloud-based-sql-engine-via-thrift-server.html
[4] https://docs.aws.amazon.com/s3/index.html
[5] https://www.iata.org/en/publications/directories/code-search/
[6] https://www.youtube.com/watch?v=8VOf1PUFE0I
[7] https://docs.aws.amazon.com/iam/index.html
[8] https://www.jetbrains.com/pycharm/learn/
[9] https://docs.python.org/3/library/index.html
[10] https://pandas.pydata.org/docs/
[11] https://spark.apache.org/docs/latest/api/python/index.html
