Time Arima 002

Uploaded by

natthaweeilac
Copyright
© © All Rights Reserved
Here’s an extended version of the previous guide that includes steps for building and
forecasting with an ARIMA model.

### Steps to Make Time-Series Data Stationary and Forecast using ARIMA

1. **Load the Data**
2. **Preprocessing**
3. **Exploratory Data Analysis (EDA)**
4. **Time-Series Decomposition**
5. **Stationarity Testing**
6. **Differencing and Transformations**
7. **Autocorrelation and Partial Autocorrelation**
8. **Model Building and Forecasting using ARIMA**

Let's go through each step in detail.

### 1. Load the Data

Start by loading the Microsoft stock price data from a CSV file.

```python
import pandas as pd

# Load the data, parsing the Date column and using it as the index
df = pd.read_csv('MSFT_2015_2021.csv', parse_dates=['Date'], index_col='Date')

# Display the first few rows of the dataset
print(df.head())
```

### 2. Preprocessing

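Check for missing values, sort the data by date, and drop any rows without a closing price.

```python
# Check for missing values
print(df.isnull().sum())

# Sort the data by date
df = df.sort_index()

# Drop rows with a missing closing price (one simple option; forward-filling is another)
df = df.dropna(subset=['Close'])
```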

### 3. Exploratory Data Analysis (EDA)

Perform EDA to understand the data better.

```python
import matplotlib.pyplot as plt

# Plot the closing prices
plt.figure(figsize=(10, 6))
plt.plot(df['Close'], label='Close Price')
plt.title('Microsoft Stock Price (2015-2021)')
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.legend()
plt.show()

# Calculate moving averages
df['MA20'] = df['Close'].rolling(window=20).mean()
df['MA50'] = df['Close'].rolling(window=50).mean()

# Plot moving averages
plt.figure(figsize=(10, 6))
plt.plot(df['Close'], label='Close Price')
plt.plot(df['MA20'], label='20-Day MA')
plt.plot(df['MA50'], label='50-Day MA')
plt.title('Microsoft Stock Price with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```

### 4. Time-Series Decomposition

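Decompose the time series to observe the trend, seasonality, and residuals. Daily stock data has
gaps for weekends and holidays, so its frequency typically cannot be inferred from the index and
the seasonal period must be passed explicitly. The value 252 (roughly the number of trading days
in a year) is an assumption used here.

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the time series; period=252 assumes one seasonal cycle per trading year
decomposition = seasonal_decompose(df['Close'], model='multiplicative', period=252)

# Plot the decomposed components (plot() creates and returns its own figure)
fig = decomposition.plot()
fig.set_size_inches(12, 8)
plt.show()
```

### 5. Stationarity Testing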

Check if the time series is stationary using the Augmented Dickey-Fuller (ADF) test. The null
hypothesis of the test is that the series has a unit root (is non-stationary), so a small p-value
is evidence of stationarity.

```python
from statsmodels.tsa.stattools import adfuller

# Perform the ADF test
result = adfuller(df['Close'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:')
for key, value in result[4].items():
    print(f'  {key}: {value}')
```
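
The printed numbers still have to be turned into a decision. The sketch below shows one way to do that, assuming the conventional 5% significance level. `interpret_adf` is a hypothetical helper, not part of statsmodels, and the values passed to it are made up for illustration.

```python
def interpret_adf(statistic, p_value, critical_values, alpha=0.05):
    """Summarize an ADF result. The null hypothesis is that the series has a unit root."""
    # Rejecting the null (small p-value) is evidence that the series is stationary
    reject_null = p_value < alpha
    # Levels at which the statistic is more negative than the critical value
    passed_levels = [level for level, crit in critical_values.items() if statistic < crit]
    return {'stationary': reject_null, 'passed_levels': passed_levels}


# Illustrative (made-up) numbers, shaped like the values returned by adfuller()
example = interpret_adf(
    statistic=-1.2,
    p_value=0.67,
    critical_values={'1%': -3.43, '5%': -2.86, '10%': -2.57},
)
print(example)
```

With real output, the call would be `interpret_adf(result[0], result[1], result[4])`.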

### 6. Differencing and Transformations

If the series is not stationary (an ADF p-value above the chosen significance level, commonly
0.05), apply differencing and transformations.

```python
import numpy as np

# Apply log transformation to stabilize variance
df['Price_log'] = np.log(df['Close'])

# Apply first-order differencing to remove trend
df['Price_log_diff'] = df['Price_log'].diff()

# Plot the differenced series
plt.figure(figsize=(10, 6))
plt.plot(df['Price_log_diff'], label='Log Transformed and Differenced Series')
plt.title('Log Transformed and Differenced Series')
plt.xlabel('Date')
plt.ylabel('Log Price Difference')
plt.legend()
plt.show()

# Perform the ADF test on the differenced series (drop the leading NaN first)
result = adfuller(df['Price_log_diff'].dropna())
print('ADF Statistic after differencing:', result[0])
print('p-value after differencing:', result[1])
for key, value in result[4].items():
    print(f'Critical Value ({key}): {value}')
```
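
To make the two transformations concrete, here is a dependency-free sketch of what the log transform followed by `.diff()` computes. The prices are made-up numbers, not MSFT data, and the comparison with simple returns is added only for intuition.

```python
import math

prices = [100.0, 102.0, 101.0, 105.0]  # made-up closing prices

# Log transform, then first difference: log(p_t) - log(p_{t-1})
logs = [math.log(p) for p in prices]
log_diffs = [curr - prev for prev, curr in zip(logs, logs[1:])]

# Simple returns for comparison: (p_t - p_{t-1}) / p_{t-1}
simple_returns = [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

for ld, sr in zip(log_diffs, simple_returns):
    print(f'log diff = {ld:.4f}, simple return = {sr:.4f}')
```

For small day-to-day moves, each log difference is close to the percentage return, which is why the differenced series fluctuates around zero instead of trending.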

### 7. Autocorrelation and Partial Autocorrelation

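Plot the ACF and PACF of the differenced series to choose the ARIMA order: a PACF that cuts off
after lag p suggests the AR order, and an ACF that cuts off after lag q suggests the MA order.

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Plot ACF and PACF of the differenced log prices side by side
stationary_series = df['Price_log_diff'].dropna()
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
plot_acf(stationary_series, ax=axes[0], lags=30)
plot_pacf(stationary_series, ax=axes[1], lags=30)
plt.show()
```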
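
For intuition about what `plot_acf` draws, the sample autocorrelation at lag k can be computed by hand. This is a plain-Python sketch under the usual definition (lag-k autocovariance divided by lag-0 autocovariance). `sample_acf` is a hypothetical helper and the input series is made up.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)  # lag-0 autocovariance times n
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf


# A made-up series that alternates sign, so neighbouring values are negatively correlated
series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf = sample_acf(series, max_lag=2)
print(acf)
```

The lag-0 value is always 1. In this toy series the lag-1 value is strongly negative and the lag-2 value strongly positive, which is the kind of pattern one reads off the ACF plot.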
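
### 8. Model Building and Forecasting using ARIMA

Choose the ARIMA order from the ACF and PACF plots, fit the model on the log prices (with d=1 the
model performs the differencing internally), and forecast the next 30 trading days.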

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Order of ARIMA(p, d, q); example values, to be chosen from the ACF and PACF plots
p, d, q = 1, 1, 1

# Build and fit the ARIMA model on the log prices (d=1 differences them internally)
model = ARIMA(df['Price_log'].dropna(), order=(p, d, q))
fit_model = model.fit()

# Print model summary
print(fit_model.summary())

# Forecast the next 30 steps with confidence intervals
steps = 30
pred = fit_model.get_forecast(steps=steps)
pred_ci = pred.conf_int()

# The date index has no set frequency, so build the future dates explicitly
# (business days are an assumption; exchange holidays are ignored)
future_dates = pd.bdate_range(df.index[-1] + pd.offsets.BDay(1), periods=steps)

# Plot the results, converting the log-scale forecast back to prices
plt.figure(figsize=(12, 6))
plt.plot(df['Close'], label='Actual')
plt.plot(future_dates, np.exp(pred.predicted_mean.to_numpy()), label='Forecast')
plt.fill_between(future_dates,
                 np.exp(pred_ci.iloc[:, 0].to_numpy()),
                 np.exp(pred_ci.iloc[:, 1].to_numpy()),
                 color='k', alpha=0.1)
plt.title('Microsoft Stock Price Forecast')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```
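
The `np.exp` calls in the plot convert the log-scale forecast back to prices. The sketch below illustrates that step on a single made-up forecast value. The log-scale mean and standard error are assumptions chosen for illustration, and the interval uses the normal 95% multiplier 1.96.

```python
import math

log_forecast = 5.0   # made-up forecast of the log price
log_std_err = 0.1    # made-up standard error on the log scale
z = 1.96             # normal quantile for a 95% interval

lower_log = log_forecast - z * log_std_err
upper_log = log_forecast + z * log_std_err

# exp() maps the point forecast and both bounds back to the price scale
price_forecast = math.exp(log_forecast)
price_lower = math.exp(lower_log)
price_upper = math.exp(upper_log)

print(price_lower, price_forecast, price_upper)
```

The interval is symmetric on the log scale but not on the price scale: the upper bound sits further from the point forecast than the lower bound does. The back-transformed point forecast is the median of the implied price distribution, not its mean.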

### Conclusion

This end-to-end guide covers loading, preprocessing, EDA, time-series decomposition,
stationarity testing, differencing, transformations, and model building with ARIMA for
forecasting Microsoft stock price data from 2015-2021.

### Why ARIMA Cannot Support Non-Stationary Time-Series Data

**ARIMA** stands for AutoRegressive Integrated Moving Average. It is a widely used time-series
forecasting model whose autoregressive and moving-average parts assume stationary input; the
"Integrated" part exists to difference a non-stationary series until that assumption holds. Let’s
delve into the reasons why the AR and MA parts cannot be fitted directly to non-stationary data
and why stationarity is crucial for ARIMA models.

#### 1. **Model Assumptions**

ARIMA models are based on several key assumptions, one of the most critical being stationarity.
Stationarity means that the statistical properties of the time series, such as mean, variance, and
autocorrelation, are constant over time. Here’s why this assumption is important:

1. **Consistency in Parameters**: The parameters estimated from historical data are only useful
for prediction if the process that generated the data stays the same over time. Non-stationary
data, which has trends or a varying mean and variance, violates this assumption and leads to
unreliable predictions.
2. **Predictability**: A stationary series has a stable structure that past observations reveal.
A non-stationary series changes its behavior over time, so the ARIMA model has no stable
structure on which to base its predictions.

#### 2. **Mathematical Basis**

The ARIMA model combines three components:

- **AutoRegressive (AR)**: Relies on past values to predict future values.
- **Integrated (I)**: Involves differencing the data to make it stationary.
- **Moving Average (MA)**: Relies on past forecast errors to predict future values.

Each component relates to stationarity differently:

- **Autoregressive Component**: The AR part assumes a fixed linear relationship between an
observation and a specified number of lagged observations. For non-stationary data, this linear
relationship would not hold consistently.
- **Moving Average Component**: The MA part models each observation as a linear combination of
the current and past forecast errors. For non-stationary data, the patterns in the errors would
not be consistent, violating this assumption.
- **Integration Component**: The "I" in ARIMA represents differencing the data to achieve
stationarity. This step transforms non-stationary data into stationary data so that AR and MA
models can be effectively applied.
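
Written out, the three components combine as follows. This is the standard textbook form and is not specific to this guide. Here $x_t$ denotes the observed series, $B$ the backshift operator ($B x_t = x_{t-1}$), $c$ a constant, and $\varepsilon_t$ white-noise errors.

```latex
y_t = (1 - B)^d \, x_t, \qquad
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```

The first equation is the "I" step, differencing the series d times. The second is an ARMA(p, q) model for the differenced series $y_t$, and it is this second equation that assumes stationarity.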

#### 3. **Statistical Properties**

For the statistical properties of ARIMA to hold, the data (after differencing) must be stationary:
- **Constant Mean**: The mean of the series should not be a function of time.
- **Constant Variance**: The variance of the series should not change over time.
- **Autocorrelation**: The autocorrelation structure should be constant over time.

When the time series is non-stationary, these properties change over time, so AR and MA
coefficients estimated on one stretch of the data do not describe the next, and the model's
forecasts become unreliable.
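
A simulated random walk makes these properties easy to see. The sketch below is plain Python with a fixed seed. The walk is a stand-in for a price-like series, not the MSFT data, and `variance` is a hypothetical helper.

```python
import random

random.seed(0)

# Random walk: each value is the previous value plus a standard normal shock
steps = [random.gauss(0.0, 1.0) for _ in range(2000)]
walk = []
level = 0.0
for s in steps:
    level += s
    walk.append(level)

# First difference of the walk (recovers the shocks, apart from the first one)
diffed = [curr - prev for prev, curr in zip(walk, walk[1:])]


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


half = len(diffed) // 2
print('walk mean, first vs second half:', sum(walk[:1000]) / 1000, sum(walk[1000:]) / 1000)
print('diff variance, first vs second half:', variance(diffed[:half]), variance(diffed[half:]))
```

The two halves of the walk typically have quite different means, while the differenced series has roughly the same variance in both halves. That is the sense in which differencing restores constant statistical properties.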

#### 4. **Example and Solution**

**Non-Stationary Data Example**: Suppose you have a time series representing stock prices,
which typically show trends (upwards or downwards). Such data is non-stationary because the
mean and variance change over time.

**Solution**:
1. **Differencing**: Apply differencing to remove the trend and achieve stationarity.
2. **Transformation**: Use log transformation or other techniques to stabilize the variance.

```python
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Load the data and make sure it is in date order
df = pd.read_csv('MSFT_2015_2021.csv', parse_dates=['Date'], index_col='Date').sort_index()

# Log transformation to stabilize variance
df['Price_log'] = np.log(df['Close'])

# First-order differencing to remove trend; kept as a separate Series because
# assigning back to df would realign on the index and reinsert the leading NaN
log_diff = df['Price_log'].diff().dropna()

# Check for stationarity
result = adfuller(log_diff)
print('ADF Statistic:', result[0])
print('p-value:', result[1])

# Build and fit the ARIMA model; d=0 because the series is already differenced
model = ARIMA(log_diff, order=(1, 0, 1))
fit_model = model.fit()
print(fit_model.summary())
```
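
Because this model is fitted to the log differences, its forecasts are log returns, not prices. Here is a minimal sketch of the conversion back to price levels, using made-up numbers in place of real model output (`returns_to_prices` is a hypothetical helper):

```python
import math


def returns_to_prices(last_price, forecast_log_returns):
    """Accumulate forecast log returns onto the last observed price."""
    prices = []
    log_price = math.log(last_price)
    for r in forecast_log_returns:
        log_price += r                      # cumulative sum undoes the differencing
        prices.append(math.exp(log_price))  # exp undoes the log transform
    return prices


last_close = 200.0                # made-up last observed closing price
forecast = [0.01, -0.005, 0.002]  # made-up forecast log returns
price_path = returns_to_prices(last_close, forecast)
print(price_path)
```

Fitting with `order=(1, 1, 1)` on `df['Price_log']` instead leaves this undifferencing to the model, so only the `exp` step remains.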

### Conclusion

- **Stationarity is Essential**: The ARIMA model assumes that the underlying data is stationary.
Non-stationary data violates this assumption, leading to unreliable model parameters and
predictions.
- **Differencing and Transformation**: To use ARIMA on non-stationary data, it is necessary to
apply differencing and possibly transformations to achieve stationarity.
- **Mathematical and Statistical Reasons**: The reliance on past values and errors, and the need
for constant statistical properties, make stationarity a prerequisite for ARIMA models.
