
Report 3

This report analyzes an airline passenger satisfaction survey dataset using unsupervised machine learning techniques. The data was preprocessed by handling null values, removing outliers, and encoding categorical features. Feature engineering included min-max normalization and removal of highly correlated features. KMeans and DBSCAN clustering were applied after reducing the data to 3 dimensions with PCA. KMeans was run with 3 clusters, and DBSCAN with min_samples of 5 and epsilon of 0.5. The clusterings were evaluated and compared using silhouette scores. Visualizations of the clusters are provided. Limitations of the clustering algorithms are discussed and potential improvements are suggested. The analysis concludes by summarizing the insights gained.

Uploaded by

i221435

MEMOONA WAZIR

22i 1435
Machine Learning assignment #3
Report

a. Introduction: Briefly introduce the problem statement and the dataset.
Context
This dataset contains an airline passenger satisfaction survey.

Content
Gender: Gender of the passengers (Female, Male)
Customer Type: The customer type (Loyal customer, disloyal customer)
Age: The actual age of the passengers
Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel)
Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus)
Flight distance: The flight distance of this journey
Inflight wifi service: Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5)
Departure/Arrival time convenient: Satisfaction level of Departure/Arrival time convenient
Ease of Online booking: Satisfaction level of online booking
Gate location: Satisfaction level of Gate location
Food and drink: Satisfaction level of Food and drink
Online boarding: Satisfaction level of online boarding
Seat comfort: Satisfaction level of Seat comfort
Inflight entertainment: Satisfaction level of inflight entertainment
On-board service: Satisfaction level of On-board service
Leg room service: Satisfaction level of Leg room service
Baggage handling: Satisfaction level of baggage handling
Check-in service: Satisfaction level of Check-in service
Inflight service: Satisfaction level of inflight service
Cleanliness: Satisfaction level of Cleanliness
Departure Delay in Minutes: Number of minutes the departure was delayed
Arrival Delay in Minutes: Number of minutes the arrival was delayed
TARGET: Satisfaction: Airline satisfaction level (Satisfaction, neutral or dissatisfaction)
b. Data Preprocessing: Describe the data preprocessing steps performed in the analysis.
• Checked for null values and filled them with the column mean
• Detected outliers with the IQR rule and removed them from the data
• Converted the categorical columns into 0/1 indicator columns through one-hot encoding
• Removed the name and id columns through drop()
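The steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the code used in the report: the toy DataFrame, its values, the column selection, and the conventional 1.5 * IQR fence are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the survey data (values are invented).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Age": [25, 40, 31, 52, 38, 29],
    "Flight Distance": [460.0, 1200.0, np.nan, 850.0, 9000.0, 700.0],
})

# Drop identifier columns, which carry no information for clustering.
df = df.drop(columns=["id"])

# Fill missing numeric values with the mean of their column.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# IQR rule (assumed 1.5 * IQR fence): keep rows that fall inside the
# fences on every numeric column.
q1 = df[num_cols].quantile(0.25)
q3 = df[num_cols].quantile(0.75)
iqr = q3 - q1
inside = ((df[num_cols] >= q1 - 1.5 * iqr) &
          (df[num_cols] <= q3 + 1.5 * iqr)).all(axis=1)
df = df[inside]

# One-hot encode the categorical columns into 0/1 indicator columns.
df = pd.get_dummies(df, dtype=int)
```

In this toy frame the row with a flight distance of 9000 falls outside the upper fence and is dropped. The order of the steps matters: imputing before the IQR filter means the filled-in values also take part in the quartile computation.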

c. Feature Engineering: Describe the feature engineering tasks performed in the analysis.
• Normalized the data using min-max scaling
• Computed feature correlations and removed the highly correlated features
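One way these two steps might look in code is shown below. It is a sketch under assumptions: the synthetic columns "a", "b", and "c" and the 0.9 correlation cutoff are illustrative choices, since the report does not state which threshold was used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in features: "b" is almost a rescaled copy of "a".
rng = np.random.default_rng(0)
base = rng.normal(size=200)
features = pd.DataFrame({
    "a": base,
    "b": base * 2.0 + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

# Min-max normalization rescales every column to the range [0, 1].
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(features), columns=features.columns
)

# Absolute correlations; keep only the strict upper triangle so each
# pair of features is inspected once.
corr = scaled.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.9  # assumed cutoff, not taken from the report
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = scaled.drop(columns=to_drop)
```

For each highly correlated pair, this keeps the column that appears first and drops the later one, so the result depends on column order.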

d. Clustering: Describe the KMeans and DBSCAN algorithms used in the analysis and the performance metrics used to evaluate them. Also, describe the fine-tuning of the clustering algorithms and the comparison of their performance.

Feature extraction is performed with PCA, which reduces the dataset to 3 principal components using pca = PCA(n_components=3) and pca_data = pca.fit_transform(data).
KMeans clustering is performed with 3 clusters using kmeans = KMeans(n_clusters=3, random_state=42) and kmeans_labels = kmeans.fit_predict(pca_data). The silhouette score is calculated using kmeans_silhouette = silhouette_score(pca_data, kmeans_labels).
DBSCAN clustering is performed with an epsilon value of 0.5 and min_samples of 5 using dbscan = DBSCAN(eps=0.5, min_samples=5) and dbscan_labels = dbscan.fit_predict(pca_data). Here min_samples is the number of points required within distance eps of a point for it to count as a core point, not a minimum cluster size. The silhouette score is calculated using dbscan_silhouette = silhouette_score(pca_data, dbscan_labels); because DBSCAN labels noise points as -1, this call treats the noise points as one additional cluster.
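The calls quoted above can be assembled into a runnable sketch such as the one below. The synthetic make_blobs data, the n_init=10 argument, and the handling of DBSCAN noise points are assumptions added for the example; the PCA, KMeans, and DBSCAN parameters are the ones stated in the report. Unlike the report's call, this sketch scores DBSCAN on non-noise points only and skips the score when fewer than two clusters are found, since silhouette_score requires at least two distinct labels.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the preprocessed survey features (assumption).
X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=42)
data = MinMaxScaler().fit_transform(X)

# Reduce to 3 principal components, as in the report.
pca = PCA(n_components=3)
pca_data = pca.fit_transform(data)

# KMeans with 3 clusters; n_init is set explicitly here (assumption).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(pca_data)
kmeans_silhouette = silhouette_score(pca_data, kmeans_labels)

# DBSCAN with the report's parameters; label -1 marks noise points.
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(pca_data)

# Score only the non-noise points, and only if at least 2 clusters exist.
mask = dbscan_labels != -1
n_dbscan_clusters = len(set(dbscan_labels[mask]))
if n_dbscan_clusters >= 2:
    dbscan_silhouette = silhouette_score(pca_data[mask], dbscan_labels[mask])
else:
    dbscan_silhouette = None  # silhouette is undefined for a single cluster

print("KMeans silhouette:", kmeans_silhouette)
print("DBSCAN clusters:", n_dbscan_clusters, "silhouette:", dbscan_silhouette)
```

On min-max scaled data every feature lies in [0, 1], so eps=0.5 is a large radius and DBSCAN may merge everything into a single cluster; the guard above covers that case rather than raising an error.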

e. Results: Describe the results of the analysis and interpret the clusters
obtained.
f. Visualization: Include the visualization plots of the clusters obtained.

g. Limitations and Future Work: Identify the limitations and drawbacks of the clustering algorithms and suggest possible improvements.


h. Conclusion: Provide a summary of the analysis and the insights
obtained from it.
