Time Arima 002
Here’s an extended version of the previous guide that includes steps for building and
forecasting with an ARIMA model.
### Steps to Make Time-Series Data Stationary and Forecast using ARIMA
Start by loading the Microsoft stock price data from a CSV file.
```python
import pandas as pd
```
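The loading step can be sketched as follows. The file name and column layout are not shown in this guide, so the rows below are made-up placeholders read from an in-memory string; only the `Close` column name comes from the later steps, and the `Date` column is an assumption.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file, whose name and exact columns
# are not shown in this guide. Only the 'Close' column is used later on.
csv_text = """Date,Close
2021-01-04,100.0
2021-01-06,99.8
2021-01-05,101.5
2021-01-07,102.3
"""

# With a real file, replace io.StringIO(csv_text) with the file path.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
df = df.sort_index()  # make sure rows are in chronological order

print(df.head())
print(df.index.min(), "to", df.index.max())
```

Parsing the date column into a `DatetimeIndex` up front makes the later plotting, decomposition, and forecasting steps easier, since they all rely on time-ordered data.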
```python
# Check for missing values
print(df.isnull().sum())
```
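If the check reports gaps, one common option for price data is to carry the last observed value forward. The sketch below uses a small made-up frame; forward-filling is an assumption about how to treat the gaps, not something this guide prescribes.

```python
import numpy as np
import pandas as pd

# Made-up prices with two missing observations.
idx = pd.date_range("2021-01-04", periods=6, freq="B")
df = pd.DataFrame(
    {"Close": [100.0, np.nan, 101.0, 102.5, np.nan, 103.0]}, index=idx
)

print(df.isnull().sum())  # count of missing values per column

# Forward-fill: each gap takes the most recent known value.
df_filled = df.ffill()
print(df_filled.isnull().sum())
```

Dropping the rows with `df.dropna()` is an alternative when the gaps are rare and an irregular index is acceptable.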
```python
import matplotlib.pyplot as plt
```
Decompose the time series to observe the trend, seasonality, and residuals.
```python
from statsmodels.tsa.seasonal import seasonal_decompose
```
Check if the time series is stationary using the Augmented Dickey-Fuller (ADF) test.
```python
from statsmodels.tsa.stattools import adfuller
```
```python
import numpy as np

# Apply log transformation to stabilize variance
df['Price_log'] = np.log(df['Close'])
```
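The log transform is typically followed by first-order differencing, which turns log prices into log returns. The sketch below uses simulated prices; the column name `Price_log_diff` is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd

# Simulated prices: the cumulative sum of small random log returns.
rng = np.random.default_rng(3)
n = 250
log_returns = rng.normal(loc=0.0005, scale=0.01, size=n)
idx = pd.date_range("2021-01-04", periods=n, freq="B")
df = pd.DataFrame({"Close": 100.0 * np.exp(np.cumsum(log_returns))}, index=idx)

# Log transform stabilizes the variance; differencing removes the trend.
df["Price_log"] = np.log(df["Close"])
df["Price_log_diff"] = df["Price_log"].diff()

# The first difference is undefined for the first row, so drop it.
stationary = df["Price_log_diff"].dropna()
print(stationary.describe())
```

The differenced log series is what the stationarity test should be rerun on before choosing the model order.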
```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
```
Build ARIMA models based on the ACF and PACF plots and use them for forecasting.
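```python
from statsmodels.tsa.arima.model import ARIMA

# Make predictions
# `fit_model` is assumed to be a fitted ARIMA results object,
# i.e. the return value of ARIMA(...).fit().
start = len(df)
end = len(df) + 30  # Forecast for the next 30 days
pred = fit_model.get_forecast(steps=30)  # 30-step-ahead forecast
pred_ci = pred.conf_int()  # confidence intervals for the forecast
```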
### Conclusion
This end-to-end guide covers loading, preprocessing, EDA, time-series decomposition,
stationarity testing, differencing, transformations, and model building with ARIMA for
forecasting Microsoft stock price data from 2015 to 2021.
### Why ARIMA Cannot Support Non-Stationary Time-Series Data
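**ARIMA** stands for AutoRegressive Integrated Moving Average. It is a widely used model for
time series forecasting whose autoregressive and moving-average (ARMA) components assume a
stationary series. The "Integrated" part is what deals with non-stationarity: the data is differenced
`d` times before the ARMA components are fitted. The rest of this section explains why those
components cannot be fitted directly to non-stationary data and why stationarity is crucial for
ARIMA models.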
ARIMA models rest on several key assumptions, the most critical being stationarity: the statistical
properties of the time series, such as its mean, variance, and autocorrelation, are constant over
time. Concretely, the series being modeled (after any differencing) must satisfy:
- **Constant Mean**: The mean of the series should not be a function of time.
- **Constant Variance**: The variance of the series should not change over time.
- **Autocorrelation**: The correlation between two observations should depend only on the lag between them, not on when they occur.
When the time series is non-stationary, these properties change over time, so coefficients estimated
from one part of the series do not describe the rest, and the fitted model and its forecasts become
unreliable.
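The constant-variance condition can be illustrated by simulation. The sketch below generates many independent paths of white noise and of a random walk (both built from the same unit-variance shocks, an assumption made for illustration) and measures the variance across paths at several time points.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 400
shocks = rng.normal(size=(n_paths, n_steps))

white_noise = shocks                     # X_t = e_t
random_walk = np.cumsum(shocks, axis=1)  # X_t = X_{t-1} + e_t

# Variance across paths at each time step.
var_wn = white_noise.var(axis=0)
var_rw = random_walk.var(axis=0)

for t in (50, 100, 200, 400):
    print(
        f"t={t:3d}  Var(white noise)={var_wn[t - 1]:6.2f}  "
        f"Var(random walk)={var_rw[t - 1]:7.2f}"
    )
```

The white-noise variance stays near 1 at every time point, while the random-walk variance grows roughly in proportion to t. A model with fixed coefficients cannot describe a series whose spread keeps changing in this way.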
**Non-Stationary Data Example**: Suppose you have a time series representing stock prices,
which typically show trends (upwards or downwards). Such data is non-stationary because the
mean and variance change over time.
**Solution**:
1. **Differencing**: Apply differencing to remove the trend and achieve stationarity.
2. **Transformation**: Use log transformation or other techniques to stabilize the variance.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
```
### Conclusion
- **Stationarity is Essential**: The ARIMA model assumes that the underlying data is stationary.
Non-stationary data violates this assumption, leading to unreliable model parameters and
predictions.
- **Differencing and Transformation**: To use ARIMA on non-stationary data, it is necessary to
apply differencing and possibly transformations to achieve stationarity.
- **Mathematical and Statistical Reasons**: The reliance on past values and errors, and the need
for constant statistical properties, make stationarity a prerequisite for ARIMA models.