26

November 10, 2024

0.1 Exercise 26: Flights Dataset - Time Series Visualization and Analysis
[169]: import pandas as pd

[171]: df=pd.read_csv('flights.csv')

[173]: df

[173]: year month passengers


0 1949 January 112
1 1949 February 118
2 1949 March 132
3 1949 April 129
4 1949 May 121
.. … … …
139 1960 August 606
140 1960 September 508
141 1960 October 461
142 1960 November 390
143 1960 December 432

[144 rows x 3 columns]

0.2 1. Load and overview


[210]: import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

[177]: flights = sns.load_dataset("flights")


print(flights.head())

year month passengers


0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121

0.3 2. Resample Data to Monthly
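[215]: # Build a first-of-month timestamp from the year and month columns.
flights['month'] = flights['month'].astype(str)
flights['date'] = pd.to_datetime(flights['year'].astype(str) + '-' + flights['month'] + '-01')

# The data is already monthly, so resampling to month-end ('M') leaves the
# values unchanged and only moves each timestamp to the end of its month.
# pandas 2.2 and later spell this alias 'ME'.
monthly_passengers = flights.groupby('date')['passengers'].mean().resample('M').mean()

print(monthly_passengers)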

date
1949-01-31 112.0
1949-02-28 118.0
1949-03-31 132.0
1949-04-30 129.0
1949-05-31 121.0

1960-08-31 606.0
1960-09-30 508.0
1960-10-31 461.0
1960-11-30 390.0
1960-12-31 432.0
Freq: M, Name: passengers, Length: 144, dtype: float64
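
The date construction above relies on pandas inferring the format of strings such as `1949-Jan-01`. A minimal sketch of one alternative, assuming the month abbreviations match the English `%b` names and using only the five rows printed in section 1 (`sample` and `by_month` are illustrative names, not part of the notebook):

```python
import pandas as pd

# First five rows of the flights data, copied from the printed output in section 1.
sample = pd.DataFrame({
    "year": [1949, 1949, 1949, 1949, 1949],
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "passengers": [112, 118, 132, 129, 121],
})

# Parse strings like "1949-Jan" with an explicit format instead of inference.
sample["date"] = pd.to_datetime(
    sample["year"].astype(str) + "-" + sample["month"], format="%Y-%b"
)

# Index by calendar month (a PeriodIndex) rather than by a month-end timestamp.
by_month = sample.set_index("date")["passengers"].to_period("M")
print(by_month)
```

A period index labels each value with the month itself, which avoids choosing between month-start and month-end timestamps.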

0.4 3. Yearly Aggregation


[181]: yearly_passengers = flights.groupby('year')['passengers'].sum()
print(yearly_passengers)

year
1949 1520
1950 1676
1951 2042
1952 2364
1953 2700
1954 2867
1955 3408
1956 3939
1957 4421
1958 4572
1959 5140
1960 5714
Name: passengers, dtype: int64
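
As a quick consistency check, the 1949 total can be reproduced from the twelve monthly values printed in section 7. The sketch below is illustrative only (`flights_1949` and `yearly_summary` are assumed names) and also shows that `agg` can return the total and the monthly average in one pass:

```python
import pandas as pd

# The twelve 1949 values, copied from the rows printed in section 7.
flights_1949 = pd.DataFrame({
    "year": [1949] * 12,
    "passengers": [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
})

# One groupby yields both the yearly total and the average month.
yearly_summary = flights_1949.groupby("year")["passengers"].agg(["sum", "mean"])
print(yearly_summary)
```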

0.5 4. 6-Month Rolling Average
[219]: monthly_passengers_rolling = monthly_passengers.rolling(window=6).mean()
print(monthly_passengers_rolling)

date
1949-01-31 NaN
1949-02-28 NaN
1949-03-31 NaN
1949-04-30 NaN
1949-05-31 NaN

1960-08-31 519.166667
1960-09-30 534.000000
1960-10-31 534.000000
1960-11-30 520.333333
1960-12-31 503.166667
Freq: M, Name: passengers, Length: 144, dtype: float64
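
The first five values are NaN because a 6-month window needs six observations before it can produce a mean. A small sketch on the 1949 values (copied from section 7; variable names are illustrative) contrasts the default with `min_periods=1`, which averages whatever is available so far:

```python
import pandas as pd

passengers_1949 = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
)

# Default: a value appears only once six observations are available.
strict = passengers_1949.rolling(window=6).mean()

# min_periods=1 removes the leading NaNs, at the cost of early values
# that are based on fewer than six months.
partial = passengers_1949.rolling(window=6, min_periods=1).mean()

print(strict.head(7))
print(partial.head(7))
```

Whether the partial averages are acceptable depends on the use; for plotting a trend line, the default NaNs are often the more honest choice.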

0.6 5. Line plot for Monthly Trends


[185]: import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(monthly_passengers.index, monthly_passengers, label='Monthly Passengers')

plt.plot(monthly_passengers_rolling.index, monthly_passengers_rolling, label='6-Month Rolling Average', color='lightcoral')

plt.title('Monthly Passengers with 6-Month Rolling Average')


plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.legend()
plt.show()

[186]: # Conclusion: The line plot shows monthly passenger counts over time, with
# regular fluctuations within each year.
# The 6-month rolling average dampens these seasonal swings and makes the overall
# upward trend in passenger numbers easier to see, indicating growing demand for
# air travel. Because the seasonal cycle is 12 months long, a 6-month window
# reduces it but does not remove it completely.
# Peaks in passenger numbers recur in particular months, suggesting seasonal
# travel patterns.
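
How much seasonality a rolling mean removes depends on the window length relative to the seasonal period. The sketch below uses synthetic data (a straight-line trend plus a 12-month sine cycle, not the flights data) to compare a 6-month and a 12-month window; all names and constants in it are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic series: linear trend plus a cycle that repeats every 12 months.
t = np.arange(48)
trend = pd.Series(100 + 2 * t)
series = trend + 10 * np.sin(2 * np.pi * t / 12)

# Largest gap between the rolling mean of the series and the rolling mean
# of the trend alone, i.e. how much of the cycle survives the smoothing.
residual_6 = (series.rolling(6).mean() - trend.rolling(6).mean()).abs().max()
residual_12 = (series.rolling(12).mean() - trend.rolling(12).mean()).abs().max()

print(f"largest seasonal residue, 6-month window:  {residual_6:.3f}")
print(f"largest seasonal residue, 12-month window: {residual_12:.3f}")
```

On this synthetic series the 12-month window cancels the cycle almost exactly, because every window covers one full period, while the 6-month window leaves part of it in place.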

0.7 6. Monthly Passenger Distribution


[225]: plt.figure(figsize=(12, 6))
sns.boxplot(x='month', y='passengers', data=flights)
plt.title('Monthly Passenger Distribution')
plt.xlabel('Month')
plt.ylabel('Number of Passengers')
plt.show()

[190]: # Conclusion: The boxplot shows the distribution of passenger counts for each
# calendar month, pooled across all years.
# Some months have clearly higher passenger counts than others.
# Because each box pools twelve years of a growing series, the spread within a box
# mostly reflects year-over-year growth rather than variation within a single year.
# Any points drawn beyond the whiskers mark unusually high counts for that month,
# which could be linked to holidays or vacation seasons.
# The median line within each box indicates the typical passenger count for that month.
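
Each box pools the same month across years with very different totals. One way to compare months on an equal footing is to divide each value by the mean of its own year. This is a different technique from the boxplot above, sketched here on the 1949 rows only (values copied from section 7; `seasonal_index` is an illustrative column name):

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
flights_1949 = pd.DataFrame({
    "year": [1949] * 12,
    "month": months,
    "passengers": [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
})

# Divide each month by the mean of its own year; values above 1 are
# busier than that year's average month.
yearly_mean = flights_1949.groupby("year")["passengers"].transform("mean")
flights_1949["seasonal_index"] = flights_1949["passengers"] / yearly_mean

print(flights_1949[["month", "seasonal_index"]].round(3))
```

Applied to the full dataset, a boxplot of this ratio by month would show seasonal differences without the growth trend stretching every box.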

0.8 7. Lagged Passengers


[228]: flights['lagged_passengers'] = flights['passengers'].shift(6)
print(flights.head(12))

year month passengers date lagged_passengers


0 1949 Jan 112 1949-01-01 NaN
1 1949 Feb 118 1949-02-01 NaN
2 1949 Mar 132 1949-03-01 NaN
3 1949 Apr 129 1949-04-01 NaN
4 1949 May 121 1949-05-01 NaN
5 1949 Jun 135 1949-06-01 NaN
6 1949 Jul 148 1949-07-01 112.0
7 1949 Aug 148 1949-08-01 118.0
8 1949 Sep 136 1949-09-01 132.0
9 1949 Oct 119 1949-10-01 129.0
10 1949 Nov 104 1949-11-01 121.0

11 1949 Dec 118 1949-12-01 135.0
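
A lagged column is mostly useful as an input to a comparison. As a minimal sketch, using the 1949 values printed above and illustrative variable names, the change relative to six months earlier is the difference between the series and its shifted copy:

```python
import pandas as pd

passengers_1949 = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
    index=pd.period_range("1949-01", periods=12, freq="M"),
)

lagged = passengers_1949.shift(6)        # value from six months earlier
change_vs_6m = passengers_1949 - lagged  # NaN for the first six months

print(change_vs_6m)
```

Note that `shift` works on row position, so the rows must already be sorted by date for the lag to mean "six months earlier".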

0.9 8. Trend Line for Annual Data


[231]: plt.figure(figsize=(12, 6))
plt.plot(yearly_passengers.index, yearly_passengers, marker='o', color='hotpink')

plt.title('Yearly Passenger Trend')


plt.xlabel('Year')
plt.ylabel('Total Passengers')
plt.grid()
plt.xticks(yearly_passengers.index)
plt.show()

[233]: # The line graph shows steady, substantial growth in annual passenger totals
# between 1949 and 1960.
# Every year's total is higher than the previous year's.
# Growth was not uniform: the yearly increase slowed in 1954 (+167) and 1958 (+151)
# and was largest in 1959 (+568) and 1960 (+574).
# Possible explanations include economic development, the expansion of
# transportation networks and rising incomes, although the data alone cannot
# confirm these causes.
# Overall, the chart shows a clearly positive trend, reflecting growing demand
# for air travel over the period.
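
Statements about growth from one year to the next can be checked directly against the yearly totals printed in section 3. A minimal sketch, with the totals copied from that output and `growth` as an illustrative name:

```python
import pandas as pd

# Yearly totals copied from the printed output in section 3.
yearly_passengers = pd.Series(
    [1520, 1676, 2042, 2364, 2700, 2867, 3408, 3939, 4421, 4572, 5140, 5714],
    index=pd.Index(range(1949, 1961), name="year"),
    name="passengers",
)

# Absolute and percentage change relative to the previous year.
growth = pd.DataFrame({
    "change": yearly_passengers.diff(),
    "pct_change": (yearly_passengers.pct_change() * 100).round(1),
})
print(growth)
```

The first row is NaN in both columns because 1949 has no previous year to compare against.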

0.10 9. Histogram of Monthly Passengers
[244]: plt.figure(figsize=(12, 6))
plt.hist(monthly_passengers, bins=12, color='pink', alpha=0.7)
plt.title('Histogram of Monthly Passengers')
plt.xlabel('Number of Passengers')
plt.ylabel('Frequency')
plt.grid()
plt.show()

[201]: # Conclusion: The histogram shows the frequency distribution of monthly
# passenger counts.
# Most months fall within a limited range of values, and fewer months have
# very high or very low counts.
# This distribution can inform airlines about typical passenger loads, aiding
# capacity planning and resource allocation.
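
To read exact bin counts instead of estimating bar heights, the same binning can be computed with `numpy.histogram`, which `plt.hist` uses internally. The sketch below runs on the twelve 1949 values from section 7 with four bins, both chosen only to keep the example small:

```python
import numpy as np

passengers_1949 = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(passengers_1949, bins=4)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```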

0.11 10. Area Plot for Yearly Totals


[239]: plt.figure(figsize=(12, 6))
plt.fill_between(yearly_passengers.index, yearly_passengers, color='deeppink', alpha=0.6)

plt.title('Area Plot for Yearly Total Passengers')


plt.xlabel('Year')
plt.ylabel('Total Passengers')
plt.grid()
plt.show()

[240]: # Conclusion: The area plot shows the total number of passengers for each year
# (an annual total, not a running cumulative sum), emphasizing the growth trend
# in air travel.
# The filled area allows a quick visual comparison between years; in this data
# every year is higher than the one before.
# This visualization conveys the overall growth of the airline industry and makes
# the trend over time easy to see.
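
The plot above fills the area under the yearly totals themselves. If a running cumulative total across years is wanted instead, it can be derived with `cumsum`. A minimal sketch, with the totals copied from section 3 and `cumulative_passengers` as an illustrative name:

```python
import pandas as pd

# Yearly totals copied from the printed output in section 3.
yearly_passengers = pd.Series(
    [1520, 1676, 2042, 2364, 2700, 2867, 3408, 3939, 4421, 4572, 5140, 5714],
    index=pd.Index(range(1949, 1961), name="year"),
    name="passengers",
)

# Running total: each entry is the sum of all years up to and including it.
cumulative_passengers = yearly_passengers.cumsum()
print(cumulative_passengers)
```

Passing `cumulative_passengers` to `plt.fill_between` in place of `yearly_passengers` would then draw the cumulative version of the chart.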
