Time Series using LightGBM

Time series forecasting is a method used to predict future values based on past data points collected over time. This type of data appears in many real-life applications, such as predicting sales, stock prices, weather conditions or traffic patterns. One tool for time series forecasting is LightGBM (Light Gradient Boosting Machine), a fast and efficient machine learning algorithm developed by Microsoft. It is based on gradient boosting, which builds a strong prediction model from many smaller models called decision trees.

What is Time Series Data?

Time series data is a sequence of observations collected over time, usually at regular intervals, such as:

  • Daily stock prices
  • Hourly temperature readings
  • Monthly electricity usage

The key characteristic of time series data is that order matters. The value at time t depends on what happened before time t. For example, if we are forecasting tomorrow’s weather, we would use the data from today, yesterday and the days before that.

Why use LightGBM for Time Series

LightGBM is a great choice for time series forecasting because it handles missing data well, works efficiently with large datasets and supports a wide variety of features such as weather conditions, holidays and special events. It also allows the use of custom loss functions, which can be helpful when optimizing for specific forecasting goals. One of LightGBM’s biggest advantages is its fast training and prediction speed, making it suitable for real-time or large-scale forecasting tasks. Although it's not specifically designed for time series, with the right feature engineering and data transformation, LightGBM can deliver highly accurate and reliable forecasts.
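
As a quick illustration of the custom loss point, LightGBM's scikit-learn API accepts a callable objective that returns the gradient and hessian of the loss. The asymmetric squared error below is a hypothetical example of ours, not something from the article, and is only a minimal sketch of how such an objective could be wired in:

Python
import numpy as np
import lightgbm as lgb

def asymmetric_l2(y_true, y_pred):
    # Hypothetical loss: under-predictions (y_pred < y_true) are penalised
    # twice as much as over-predictions. LightGBM expects the gradient and
    # hessian of the loss with respect to y_pred.
    residual = y_pred - y_true
    grad = np.where(residual < 0, 4.0 * residual, 2.0 * residual)
    hess = np.where(residual < 0, 4.0, 2.0)
    return grad, hess

model = lgb.LGBMRegressor(objective=asymmetric_l2)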

Preparing Data for LightGBM

Since LightGBM is not built for time series, we need to manually add features that represent the time structure. Here’s how to prepare time series data:

1. Create Lag Features

Lag features represent past values of the series. For example:

  • lag_1 = value at time t-1
  • lag_2 = value at time t-2

This helps the model understand how the current value depends on the past.

Python
import pandas as pd
import numpy as np

# Create sample time series data
date_range = pd.date_range(start='2022-01-01', periods=100, freq='D')
values = np.random.randn(100)  # random values

# Create the DataFrame
df = pd.DataFrame({'date': date_range, 'value': values})

df['lag_1'] = df['value'].shift(1)
df['lag_2'] = df['value'].shift(2)

2. Create Rolling Statistics

Rolling features include moving averages or standard deviations over a time window:

Python
df['rolling_mean_3'] = df['value'].rolling(3).mean()
df['rolling_std_3'] = df['value'].rolling(3).std()

These features smooth out short-term noise and help the model pick up local trends in the series.

3. Add Date-Based Features

You can extract useful features from the date, such as:

  • Day of the week (Monday, Tuesday, etc.)
  • Month
  • Is it a weekend?

Python
df['day_of_week'] = df['date'].dt.dayofweek
df['month'] = df['date'].dt.month
df['is_weekend'] = df['day_of_week'].isin([5,6]).astype(int)

4. Remove Missing Values

Because lag and rolling features create NaN values at the beginning of the series, you’ll need to drop those rows:

Python
df = df.dropna()

Building a Time Series Model with LightGBM

Now that the data is ready, we can build the model.

Step 1: Installing LightGBM

You can install it from the command line using pip:

Python
pip install lightgbm

Step 2: Splitting Data

For time series, we must not shuffle the data. Instead, split it by time so the model is trained on the past and tested on the future:

Python
train = df[df['date'] < '2022-04-01']
test = df[df['date'] >= '2022-04-01']
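
If you want to validate the model on more than one cut, time-ordered cross-validation keeps the same no-shuffle rule. This is an optional addition on our part, sketched here with scikit-learn's TimeSeriesSplit, where each fold trains on the past and tests on the block that follows it:

Python
from sklearn.model_selection import TimeSeriesSplit
import lightgbm as lgb

X = df.drop(columns=['date', 'value'])   # all engineered features
y = df['value']

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Train on earlier rows, evaluate on the block that comes right after
    fold_model = lgb.LGBMRegressor()
    fold_model.fit(X.iloc[train_idx], y.iloc[train_idx])
    print(f"Fold {fold}: R^2 = {fold_model.score(X.iloc[test_idx], y.iloc[test_idx]):.3f}")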

Step 3: Defining Features and Target

Python
features = ['lag_1', 'lag_2', 'rolling_mean_3', 'day_of_week', 'month', 'is_weekend']
target = 'value'

X_train = train[features]
y_train = train[target]

X_test = test[features]
y_test = test[target]

Step 4: Training LightGBM Model

Python
import lightgbm as lgb

model = lgb.LGBMRegressor()
model.fit(X_train, y_train)
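
The call above uses LightGBM's default settings. In practice you will often tune a few parameters; the values below are illustrative choices of ours, not recommendations from the article:

Python
model = lgb.LGBMRegressor(
    n_estimators=500,     # number of boosting rounds (trees)
    learning_rate=0.05,   # shrinkage applied to each tree's contribution
    num_leaves=31,        # maximum leaves per tree, controls complexity
    random_state=42       # makes results reproducible
)
model.fit(X_train, y_train)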

Step 5: Making Predictions

Python
predictions = model.predict(X_test)

Evaluating the Model

To check how good your model is, use metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). MAE measures the average size of the errors, while RMSE penalizes larger errors more heavily.

Example:

Python
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))

print(f"MAE: {mae}")
print(f"RMSE: {rmse}")

Output:

MAE: 0.3466166706030231
RMSE: 0.4253669139921471

Plotting the Forecast

It’s always helpful to see how predictions look compared to actual values:

Python
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
plt.plot(test['date'], y_test, label='Actual')
plt.plot(test['date'], predictions, label='Predicted')
plt.xticks(test['date'], rotation=45)

plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Forecasting with LightGBM")
plt.legend()
plt.tight_layout()
plt.grid(True)
plt.show()

Output:

[Plot: "Time Series Forecasting with LightGBM" — actual vs. predicted values by date]

Limitations of LightGBM for Time Series

While LightGBM is a great tool, it has some limitations for time series:

  • It does not model time directly; you have to create time-based features yourself.
  • It cannot easily forecast multiple steps ahead. For that, you need to predict one step at a time and feed each prediction back into the model (a sketch of this recursive approach is shown after this list).
  • It does not handle long-term seasonality as well as some traditional models.
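
A minimal sketch of the recursive approach mentioned above, assuming the trained model, the df DataFrame and the feature names from the earlier steps are still available (the 7-day horizon is an arbitrary choice of ours):

Python
import numpy as np
import pandas as pd

features = ['lag_1', 'lag_2', 'rolling_mean_3', 'day_of_week', 'month', 'is_weekend']
history = list(df['value'])                      # all values known so far
future_dates = pd.date_range(df['date'].iloc[-1] + pd.Timedelta(days=1),
                             periods=7, freq='D')
forecasts = []

for date in future_dates:
    row = pd.DataFrame([[
        history[-1],                             # lag_1
        history[-2],                             # lag_2
        np.mean(history[-3:]),                   # rolling mean of the last known values
        date.dayofweek,                          # day_of_week
        date.month,                              # month
        int(date.dayofweek in (5, 6))            # is_weekend
    ]], columns=features)
    pred = model.predict(row)[0]
    forecasts.append(pred)
    history.append(pred)                         # feed the prediction back in

print(forecasts)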

Still, with the right feature engineering, LightGBM often beats traditional models on real-world datasets.

When to Use LightGBM for Time Series

LightGBM is a good choice when:

  • You have lots of data.
  • You want to include many external features like weather, holidays, events, etc.
  • You want faster training and prediction.
  • You need a strong baseline for performance.

However, if your data has very strong seasonality or trends and you don’t have many external features, classical models like ARIMA or Prophet might be a better fit.

