Time Series using LightGBM

Time series forecasting is a method used to predict future values based on past data points collected over time. This type of data appears in many real-life applications, such as predicting sales, stock prices, weather conditions or traffic patterns. One tool for time series forecasting is LightGBM (Light Gradient Boosting Machine), a fast and efficient machine learning algorithm developed by Microsoft. It is based on gradient boosting, which builds a strong prediction model from many smaller models called decision trees.

What is Time Series Data?

Time series data is a sequence of observations collected over time, usually at regular intervals, such as:

  • Daily stock prices
  • Hourly temperature readings
  • Monthly electricity usage

The key characteristic of time series data is that order matters. The value at time t depends on what happened before time t. For example, if we are forecasting tomorrow’s weather, we would use the data from today, yesterday and the days before that.

Why use LightGBM for Time Series

LightGBM is a great choice for time series forecasting because it handles missing data well, works efficiently with large datasets and supports a wide variety of features such as weather conditions, holidays and special events. It also allows the use of custom loss functions, which can be helpful when optimizing for specific forecasting goals. One of LightGBM’s biggest advantages is its fast training and prediction speed, making it suitable for real-time or large-scale forecasting tasks. Although it's not specifically designed for time series, with the right feature engineering and data transformation, LightGBM can deliver highly accurate and reliable forecasts.
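
As a quick illustration of the custom loss point, LightGBM's scikit-learn API accepts a callable objective that returns the gradient and hessian of the loss. The asymmetric squared error below is a hypothetical example of ours, not something from the article, and is only a minimal sketch of how such an objective could be wired in:

Python
import numpy as np
import lightgbm as lgb

def asymmetric_l2(y_true, y_pred):
    # Hypothetical loss: under-predictions (y_pred < y_true) are penalised
    # twice as much as over-predictions. LightGBM expects the gradient and
    # hessian of the loss with respect to y_pred.
    residual = y_pred - y_true
    grad = np.where(residual < 0, 4.0 * residual, 2.0 * residual)
    hess = np.where(residual < 0, 4.0, 2.0)
    return grad, hess

model = lgb.LGBMRegressor(objective=asymmetric_l2)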

Preparing Data for LightGBM

Since LightGBM is not built for time series, we need to manually add features that represent the time structure. Here’s how to prepare time series data:

1. Create Lag Features

Lag features represent past values of the series. For example:

  • lag_1 = value at time t-1
  • lag_2 = value at time t-2

This helps the model understand how the current value depends on the past.

Python
import pandas as pd
import numpy as np

# Create sample time series data
date_range = pd.date_range(start='2022-01-01', periods=100, freq='D')
values = np.random.randn(100)  # random values

# Create the DataFrame
df = pd.DataFrame({'date': date_range, 'value': values})

df['lag_1'] = df['value'].shift(1)
df['lag_2'] = df['value'].shift(2)

2. Create Rolling Statistics

Rolling features include moving averages or standard deviations over a time window:

Python
df['rolling_mean_3'] = df['value'].rolling(3).mean()
df['rolling_std_3'] = df['value'].rolling(3).std()

These features smooth out short-term noise and help the model pick up local trends in the series.

3. Add Date-Based Features

You can extract useful features from the date, such as:

  • Day of the week (Monday, Tuesday, etc.)
  • Month
  • Is it a weekend?

Python
df['day_of_week'] = df['date'].dt.dayofweek
df['month'] = df['date'].dt.month
df['is_weekend'] = df['day_of_week'].isin([5,6]).astype(int)

4. Remove Missing Values

Because lag and rolling features create NaN values at the beginning of the series, you’ll need to drop those rows:

Python
df = df.dropna()

Building a Time Series Model with LightGBM

Now that the data is ready, we can build the model.

Step 1: Installing LightGBM

You can install it from the command line using pip:

Python
pip install lightgbm

Step 2: Splitting Data

For time series, we must not shuffle the data. Instead, split it by time so the model is trained on the past and tested on the future:

Python
train = df[df['date'] < '2022-04-01']
test = df[df['date'] >= '2022-04-01']
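
If you want to validate the model on more than one cut, time-ordered cross-validation keeps the same no-shuffle rule. This is an optional addition on our part, sketched here with scikit-learn's TimeSeriesSplit, where each fold trains on the past and tests on the block that follows it:

Python
from sklearn.model_selection import TimeSeriesSplit
import lightgbm as lgb

X = df.drop(columns=['date', 'value'])   # all engineered features
y = df['value']

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Train on earlier rows, evaluate on the block that comes right after
    fold_model = lgb.LGBMRegressor()
    fold_model.fit(X.iloc[train_idx], y.iloc[train_idx])
    print(f"Fold {fold}: R^2 = {fold_model.score(X.iloc[test_idx], y.iloc[test_idx]):.3f}")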

Step 3: Defining Features and Target

Python
features = ['lag_1', 'lag_2', 'rolling_mean_3', 'day_of_week', 'month', 'is_weekend']
target = 'value'

X_train = train[features]
y_train = train[target]

X_test = test[features]
y_test = test[target]

Step 4: Training LightGBM Model

Python
import lightgbm as lgb

model = lgb.LGBMRegressor()
model.fit(X_train, y_train)
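
The call above uses LightGBM's default settings. In practice you will often tune a few parameters; the values below are illustrative choices of ours, not recommendations from the article:

Python
model = lgb.LGBMRegressor(
    n_estimators=500,     # number of boosting rounds (trees)
    learning_rate=0.05,   # shrinkage applied to each tree's contribution
    num_leaves=31,        # maximum leaves per tree, controls complexity
    random_state=42       # makes results reproducible
)
model.fit(X_train, y_train)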

Step 5: Making Predictions

Python
predictions = model.predict(X_test)

Evaluating the Model

To check how good your model is, use metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). MAE measures the average size of the errors, while RMSE penalizes larger errors more heavily.

Example:

Python
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))

print(f"MAE: {mae}")
print(f"RMSE: {rmse}")

Output:

MAE: 0.3466166706030231
RMSE: 0.4253669139921471

Plotting the Forecast

It’s always helpful to see how predictions look compared to actual values:

Python
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
plt.plot(test['date'], y_test, label='Actual')
plt.plot(test['date'], predictions, label='Predicted')
plt.xticks(test['date'], rotation=45)

plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Forecasting with LightGBM")
plt.legend()
plt.tight_layout()
plt.grid(True)
plt.show()

Output:

[Plot: "Time Series Forecasting with LightGBM" — actual vs. predicted values by date]

Limitations of LightGBM for Time Series

While LightGBM is a great tool, it has some limitations for time series:

  • It does not model time directly; you have to create time-based features yourself.
  • It cannot easily forecast multiple steps ahead. For that, you need to predict one step at a time and feed each prediction back into the model (a sketch of this recursive approach is shown after this list).
  • It does not handle long-term seasonality as well as some traditional models.
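
A minimal sketch of the recursive approach mentioned above, assuming the trained model, the df DataFrame and the feature names from the earlier steps are still available (the 7-day horizon is an arbitrary choice of ours):

Python
import numpy as np
import pandas as pd

features = ['lag_1', 'lag_2', 'rolling_mean_3', 'day_of_week', 'month', 'is_weekend']
history = list(df['value'])                      # all values known so far
future_dates = pd.date_range(df['date'].iloc[-1] + pd.Timedelta(days=1),
                             periods=7, freq='D')
forecasts = []

for date in future_dates:
    row = pd.DataFrame([[
        history[-1],                             # lag_1
        history[-2],                             # lag_2
        np.mean(history[-3:]),                   # rolling mean of the last known values
        date.dayofweek,                          # day_of_week
        date.month,                              # month
        int(date.dayofweek in (5, 6))            # is_weekend
    ]], columns=features)
    pred = model.predict(row)[0]
    forecasts.append(pred)
    history.append(pred)                         # feed the prediction back in

print(forecasts)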

Still, with the right feature engineering, LightGBM often beats traditional models on real-world datasets.

When to Use LightGBM for Time Series

LightGBM is a good choice when:

  • You have lots of data.
  • You want to include many external features like weather, holidays, events, etc.
  • You want faster training and prediction.
  • You need a strong baseline for performance.

However, if your data has very strong seasonality or trends and you don’t have many external features, classical models like ARIMA or Prophet might be a better fit.

