Abstract— The aviation industry is heavily influenced by customer reviews and
satisfaction. This study aims to identify the major factors that impact passenger
satisfaction in the aviation industry and to provide insights that can help airlines
improve their services, gain a competitive advantage, and achieve business success.
To this end, we employed several classification algorithms, including Logistic
Regression, SVM, Naive Bayes, LightGBM, AdaBoost, and XGBoost. Our results indicate
that the LightGBM classifier produced the highest accuracy. Our findings suggest that
the top five factors affecting passenger satisfaction are: (1) Inflight Wi-Fi service,
(2) Age, (3) Flight distance, (4) Customer type, and (5) Type of travel. These findings
can be used by airlines to prioritize their efforts and resources in order to enhance
customer satisfaction and improve their business performance.

Keywords— Aviation, Airline industry, Customer satisfaction, Classification, Data
mining, Inflight services, LightGBM

I. INTRODUCTION

The aviation industry has been booming since the Covid restrictions were eased.
Measuring customer satisfaction is a key way that businesses such as airlines evaluate
their performance [1]. Customer loyalty and passenger satisfaction are increasingly
acknowledged as key factors in determining business performance and as tactical
instruments for attaining a competitive edge. In order to improve consumer satisfaction
and ultimately boost revenues and profits, airline firms invest a significant amount of
money in providing high-quality services.

As delivering high-quality service becomes increasingly crucial for the survival and
competitiveness of airlines, the measurement of customer satisfaction in the airline
industry is becoming more frequent and relevant [2]. Airline carriers now need to
provide elevated services, as these sustain consumer loyalty. Customers who are
unsatisfied or disengaged naturally lead to fewer passengers on the plane and lower net
profits. It is crucial that customers have the best experience every time they fly. A
few of the factors that contribute to a better travel experience are on-time flights,
decent in-flight entertainment, refreshments, and greater legroom.

In this study, we analyse a dataset of more than 100,000 samples, perform exploratory
data analysis (EDA), and run different machine learning classification techniques to
predict whether a passenger was satisfied with the experience the airline provided. By
listing the criteria in order of decreasing influence, we also intend to help airlines
understand which component has the greatest impact on consumer satisfaction.

II. LITERATURE SURVEY

Eren Sezgen et al. [3] used a text mining approach to analyze online reviews and
determine the factors impacting passenger satisfaction. Their results show that the
factors that determine consumer satisfaction and dissatisfaction differ slightly
depending on the airline business model and service class.

Hayadi et al. [4] used the Random Forest algorithm for their study. Their results show
inflight Wi-Fi service to be an important factor in customer satisfaction.

We have used LightGBM, a boosting algorithm, with the aim of improving the accuracy.

R. Archana and M. V. Subha [5] found that various factors can affect customer
satisfaction with Indian Airlines. Their research showed that the impact of different
dimensions of airline service on passenger satisfaction and on the image of the airline
was significant and positive.

Rahim Hussain et al. [6] examined the relationship between service quality, service
provider reputation, client expectations, perceived value, client happiness, and brand
loyalty in a Dubai-based airline, using the SERVQUAL framework. Since only one airline
was included in the data collection, the generalizability of the conclusions is called
into question.

Hariguna Taqwa et al. [7] proposed a new method that combines the K-means and Naive
Bayes classifier algorithms to distinguish between positive and negative classes in
product review comments. The results showed that the K-means and Naive Bayes classifier
without manual data achieved a higher accuracy value of
Authorized licensed use limited to: De Montfort University. Downloaded on February 10,2025 at 03:42:31 UTC from IEEE Xplore. Restrictions apply.
improved variant of the gradient boosting method is called
XGBoost (Extreme Gradient Boosting). In comparison to
other gradient boosting approaches, XGBoost is ten times
faster and has strong psychometric properties.
XGBoost is characterized by the splits it performs, which are chosen by calculating
the variance gain [13]. For instance, let O be the training set on a fixed node of the
decision tree (DT). The variance gain of splitting feature j at a point d for this node
is given in (1) [13]:

$$V_{j|O}(d) = \frac{1}{n_O}\left(\frac{\left(\sum_{\{x_i \in O:\, x_{ij} \le d\}} g_i\right)^2}{n_{l|O}^{j}(d)} + \frac{\left(\sum_{\{x_i \in O:\, x_{ij} > d\}} g_i\right)^2}{n_{r|O}^{j}(d)}\right) \tag{1}$$

where $n_O = \sum I[x_i \in O]$, $n_{l|O}^{j}(d) = \sum I[x_i \in O: x_{ij} \le d]$, and
$n_{r|O}^{j}(d) = \sum I[x_i \in O: x_{ij} > d]$.

Here $n_O$ is the total number of observations in the training set O at this node,
$n_{l|O}^{j}(d)$ is the number of observations that fall to the left of the split point
($x_{ij} \le d$), $n_{r|O}^{j}(d)$ is the number of observations that fall to the right
of it ($x_{ij} > d$), and $g_i$ is the negative gradient of the loss function with
respect to the model output.

For a feature j, the DT algorithm selects

$$d_j^{*} = \operatorname{argmax}_{d} V_j(d) \tag{2}$$

and calculates the largest gain $V_j(d_j^{*})$, where $V_j(d)$ denotes $V_{j|O}(d)$ with
the node subscript omitted.

Then, the data is split according to feature $j^{*}$ at point $d_{j^{*}}$ into the left
and right child nodes.

For XGBoost, the information gain of each candidate split node must be compared, and
the largest one must be chosen for splitting. This comparison must take into account
the information gain over all samples. In comparison, LightGBM calculates the
information gain with a much smaller number of samples and is substantially more
efficient.

The LightGBM model works as follows:
i. The input data is divided into smaller subsets called "leaves" and organized into a
tree-like structure called a decision tree.
ii. The model uses the decision tree to make predictions based on the features of the
input data.
iii. For each tree in the model, the data is split into two subsets based on the value
of a selected feature. The process is repeated until the data is divided into leaves.
iv. The model uses the leaves of the decision tree to make predictions by taking the
average of the responses in each leaf.
v. The final prediction is made by combining the predictions of all the trees in the
model. In gradient boosting, this combination is a sum of the tree outputs rather than
a plain average.
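The variance gain in (1) and the split selection in (2) can be illustrated with a short Python sketch for a single feature. This is a minimal illustration under stated assumptions, not the implementation used by XGBoost or LightGBM: the function names `variance_gain` and `best_split` and the toy data are hypothetical, and candidate split points are simply the distinct feature values observed at the node.

```python
from typing import List, Tuple


def variance_gain(x: List[float], g: List[float], d: float) -> float:
    """Variance gain of splitting one feature at threshold d, following (1).

    x holds the feature values x_ij of the instances at the node and
    g holds their negative gradients g_i.
    """
    n = len(x)
    left = [gi for xi, gi in zip(x, g) if xi <= d]
    right = [gi for xi, gi in zip(x, g) if xi > d]
    if not left or not right:
        return 0.0  # degenerate split: one child would be empty
    return (sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)) / n


def best_split(x: List[float], g: List[float]) -> Tuple[float, float]:
    """Return (d*, gain) maximising the variance gain, following (2)."""
    # The largest value is skipped because it would leave the right child empty.
    candidates = sorted(set(x))[:-1]
    if not candidates:
        raise ValueError("need at least two distinct feature values to split")
    return max(((d, variance_gain(x, g, d)) for d in candidates),
               key=lambda pair: pair[1])


# Toy node (hypothetical data): the gradients change sign between x = 3 and x = 10.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
g = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
d_star, gain = best_split(x, g)
print(d_star, gain)  # prints 3.0 1.0
```

On this toy node the split at d = 3 separates the negative from the positive gradients exactly, which is why it yields the largest gain.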
LightGBM :
The open-source Gradient Boosting Decision Tree (GBDT) algorithm called 'LightGBM' was
developed by Microsoft. It employs a leaf-wise tree growth strategy, which involves
selecting the leaf node with the greatest gain in variance as the split point at each
iteration of tree construction [13]. LightGBM's multi-thread optimization and
depth-restricted leaf growth technique help to avoid the excessive memory consumption
of XGBoost, so that big data processing can be done more quickly, with fewer false
alarms, and with fewer missed detections [14].

LightGBM can be differentiated from other GBDT models by the way the variance gain is
calculated [13]. Given the same inputs as in the variance gain calculation above, the
splits in LightGBM occur considering weak and strong learners, that is, instances with
small and large gradients $g_i$. The training instances are ranked according to the
absolute values of their gradients in descending order. Then the top $a \times 100\%$
of instances with the larger gradients are kept to form an instance subset A. From the
remaining set $A^c$, formed by the $(1-a) \times 100\%$ of instances with smaller
gradients, a subset B of size $b \times |A^c|$ is randomly sampled. Finally, the
instances are split according to an estimated variance gain over the subset
$A \cup B$.

As presented in (3) [13], the variance gain is calculated as

$$\tilde{V}_j(d) = \frac{1}{n}\left(\frac{\left(\sum_{x_i \in A_l} g_i + \frac{1-a}{b}\sum_{x_i \in B_l} g_i\right)^2}{n_l^{j}(d)} + \frac{\left(\sum_{x_i \in A_r} g_i + \frac{1-a}{b}\sum_{x_i \in B_r} g_i\right)^2}{n_r^{j}(d)}\right) \tag{3}$$

where $A_l = \{x_i \in A: x_{ij} \le d\}$, $A_r = \{x_i \in A: x_{ij} > d\}$,
$B_l = \{x_i \in B: x_{ij} \le d\}$, and $B_r = \{x_i \in B: x_{ij} > d\}$. The
coefficient $\frac{1-a}{b}$ rescales the sum of the gradients over B to the size of
$A^c$.

Fig. 2. LightGBM Model Leaf-wise Growth

The benefits of using LightGBM :

1. Improved training speed and efficiency: LightGBM is known for its fast training
speed and high efficiency compared to other gradient boosting algorithms.

2. Enhanced accuracy: LightGBM is often able to achieve better accuracy in
classification and regression tasks compared to other algorithms, due to its ability to
handle large-scale data and incorporate feature interactions.

3. Reduced memory usage: LightGBM is designed to minimize memory usage, making it
well-suited for working with large datasets.

4. Support for parallel, distributed, and GPU learning: LightGBM supports parallel and
distributed training, as well as training on GPU,
allowing it to scale to larger datasets and make use
of advanced hardware resources.
5. Ability to handle large-scale data: LightGBM is
able to handle large-scale data efficiently, making it
a good choice for tasks that require processing large
datasets.
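The gradient-based sampling step described for LightGBM (rank by absolute gradient, keep the large-gradient instances, sample from the rest, and re-weight the sampled gradients) can be sketched as follows. This is a hedged illustration only: `goss_sample` and the toy gradients are hypothetical names and data, `a` is the fraction of large-gradient instances kept, and `b` is the sampling ratio. The sketch assumes that the random subset B contains b * n instances, with n the number of instances at hand, because under that assumption the factor (1 - a) / b equals the ratio |A^c| / |B| exactly. With a subset of size b * |A^c| the matching ratio would instead be 1 / b.

```python
import random
from typing import List, Tuple


def goss_sample(grads: List[float], a: float, b: float,
                rng: random.Random) -> Tuple[List[int], List[int], float]:
    """Return (A, B, weight) for gradient-based one-side sampling.

    A: indices of the top a * n instances by absolute gradient.
    B: indices of b * n instances drawn at random from the remainder
       (assumption of this sketch; requires b <= 1 - a).
    weight: the factor (1 - a) / b applied to the gradients in B.
    """
    n = len(grads)
    order = sorted(range(n), key=lambda i: abs(grads[i]), reverse=True)
    top_n = int(a * n)
    rand_n = int(b * n)
    A = order[:top_n]
    B = rng.sample(order[top_n:], rand_n)
    return A, B, (1.0 - a) / b


# Toy gradients (hypothetical data).
grads = [0.9, -0.1, 0.05, -0.8, 0.2, 0.1, -0.05, 0.3, -0.2, 0.15]
A, B, weight = goss_sample(grads, a=0.2, b=0.5, rng=random.Random(0))

# Re-weighted gradient sum over A and B, as in the numerators of (3),
# compared with the exact sum over all instances.
estimate = sum(grads[i] for i in A) + weight * sum(grads[i] for i in B)
exact = sum(grads)
print(A, sorted(B), weight, estimate, exact)
```

The two instances with the largest absolute gradients are always kept, while the remaining instances contribute only through the re-weighted random sample, which is where the reduction in the number of samples comes from.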
A cross-validation procedure was used to optimize the
performance of the model. This approach involves dividing
the dataset into multiple subsets, training the model on
different subsets, and evaluating its performance on the
remaining subsets. This helps to reduce the risk of overfitting
and improve the generalizability of the model.
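The cross-validation procedure can be sketched in plain Python as follows. The number of folds (k = 5), the helper names, and the threshold model are assumptions made for illustration; in the study the fitted model would be the LightGBM classifier, which is replaced here by a hypothetical stand-in so that the sketch stays self-contained.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle the indices 0..n-1 and return k (train, validation) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def cross_val_accuracy(fit, predict, X, y, k: int = 5) -> float:
    """Mean validation accuracy over k folds (assumes n >= k)."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in val])
        correct = sum(p == y[i] for p, i in zip(preds, val))
        scores.append(correct / len(val))
    return sum(scores) / k


def fit_threshold(xs: List[float], ys: List[int]) -> float:
    """Hypothetical stand-in model: midpoint between the two class means."""
    mean0 = sum(x for x, t in zip(xs, ys) if t == 0) / max(1, ys.count(0))
    mean1 = sum(x for x, t in zip(xs, ys) if t == 1) / max(1, ys.count(1))
    return (mean0 + mean1) / 2.0


def predict_threshold(threshold: float, xs: List[float]) -> List[int]:
    return [1 if x > threshold else 0 for x in xs]


# Toy one-feature dataset (hypothetical): label 1 when the feature is at least 10.
X = [float(i) for i in range(20)]
y = [1 if x >= 10 else 0 for x in X]
acc = cross_val_accuracy(fit_threshold, predict_threshold, X, y, k=5)
print(acc)
```

Each instance is used for validation exactly once, so the averaged score reflects performance on data that the model did not see during fitting.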
According to the results depicted in Fig. 5, the top 10 most influential features on
passenger satisfaction are:

1. Inflight Wi-Fi service
2. Age
3. Flight Distance
4. Customer Type - Loyal Customer
5. Type of Travel - Business travel
6. Baggage handling
7. Online boarding
8. Inflight service
9. Seat comfort
10. Inflight entertainment

that the classifier will rank a randomly chosen positive instance higher than a
randomly chosen negative instance. Fig. 6 shows an AUC score of 0.995, which is very
close to 1. This indicates that the classifier used has a good fit and is able to
distinguish between positive and negative instances with high accuracy.
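The ranking interpretation of the AUC used in this section, namely the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one, can be checked directly by counting pairs. The sketch below is an illustration with hypothetical labels and scores, not the data behind the reported value; `auc_pairwise` is a name introduced here, and ties are counted as half a correct pair by convention.

```python
from typing import List


def auc_pairwise(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive (0.4) is ranked below one negative (0.5).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = auc_pairwise(labels, scores)
print(auc)  # 8 of the 9 positive/negative pairs are ordered correctly
```

A value close to 1 therefore means that almost every satisfied passenger receives a higher score than almost every dissatisfied one.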
and suggestions, which helped us improve the quality of our research.

REFERENCES

[1] Clement Kong Wing Chow, “Customer satisfaction and service quality in the Chinese airline industry,” Journal of Air Transport Management, vol. 35, 2014, pp. 102–107.
[2] Stelios Tsafarakis, Theodosios Kokotas and Angelos Pantouvakis, “A multiple criteria approach for airline passenger satisfaction measurement and service quality improvement,” Journal of Air Transport Management, vol. 68, 2018, pp. 61–75.
[3] Eren Sezgen, Keith J. Mason and Robert Mayer, “Voice of airline passenger: A text mining approach to understand customer satisfaction,” Journal of Air Transport Management, vol. 77, 2019, pp. 65–74.
[4] Hayadi, B. Herawan, Jin-Mook Kim, Khodijah Hulliyah, and Husni Teja Sukmana, “Predicting Airline Passenger Satisfaction with Classification Algorithms,” International Journal of Informatics and Information Systems [Online], vol. 4, no. 1, 2021, pp. 82–94. Accessed 28 Nov. 2022.
[5] R. Archana and M. V. Subha, “A study on service quality and passenger satisfaction on Indian airlines,” International Journal of Multidisciplinary Research, vol. 2, issue 2, February 2012.
[6] Rahim Hussain, Amjad Al Nasser and Yomna K. Hussain, “Service quality and customer satisfaction of a UAE-based airline: An empirical investigation,” Journal of Air Transport Management, vol. 42, 2015, pp. 167–175.
[7] Hariguna Taqwa, Wiga Maulana Baihaqi, and Aulia Nurwanti, “Sentiment Analysis of Product Reviews as A Customer Recommendation Using the Naive Bayes Classifier Algorithm,” International Journal of Informatics and Information Systems [Online], vol. 2, no. 2, 2019, pp. 48–55. Accessed 28 Nov. 2022.
[8] Jin-Woo Park, Rodger Robertson and Cheng-Lung Wu, “The effect of airline service quality on passengers’ behavioural intentions: a Korean case study,” Journal of Air Transport Management, vol. 10, issue 6, 2004, pp. 435–439.
[9] G. C. Saha and Theingi, “Service quality, satisfaction, and behavioural intentions: A study of low-cost airline carriers in Thailand,” Managing Service Quality: An International Journal, vol. 19, no. 3, 2009, pp. 350–372.
[10] E. Lacic, D. Kowald, and E. Lex, “High Enough? Explaining and Predicting Traveler Satisfaction Using Airline Review,” HT 2016 - Proc. 27th ACM Conf. Hypertext Soc. Media, 2016, pp. 249–254.
[11] Nuriye Gures, Seda Arslan and Sevil Tun, “Customer Expectation, Satisfaction and Loyalty Relationship in Turkish Airline Industry,” International Journal of Marketing Studies, vol. 6, no. 1, 2014, pp. 66–74.
[12] Hyun-Jeong Ban and Hak-Seon Kim, “Understanding Customer Experience and Satisfaction through Airline Passengers’ Online Review,” Sustain., vol. 11, no. 15, 2019.
[13] M. R. Machado, S. Karray and I. T. de Sousa, “LightGBM: an Effective Decision Tree Gradient Boosting Method to Predict Customer Loyalty in the Finance Industry,” 2019 14th International Conference on Computer Science & Education (ICCSE), 2019, pp. 1111–1116, doi: 10.1109/ICCSE.2019.8845529.
[14] M. Tang et al., “An Improved LightGBM Algorithm for Online Fault Detection of Wind Turbine Gearboxes,” Energies, vol. 13, no. 4, p. 807, Feb. 2020, doi: 10.3390/en13040807.
[15] M. Gan, S. Pan, Y. Chen, C. Cheng, H. Pan, and X. Zhu, “Application of the Machine Learning LightGBM Model to the Prediction of the Water Levels of the Lower Columbia River,” J. Mar. Sci. Eng., vol. 9, p. 496, 2021, doi: 10.3390/jmse9050496.