Time Series using LightGBM
Last Updated: 26 May, 2025
Time series forecasting predicts future values from past data points collected over time. This type of data appears in many real-life applications, such as predicting sales, stock prices, weather conditions or traffic patterns. One tool for time series forecasting is LightGBM (Light Gradient Boosting Machine), a fast and efficient gradient boosting framework developed by Microsoft. Gradient boosting builds a strong prediction model by combining many small decision trees, each one trained to correct the errors of the trees before it.
What is Time Series Data?
Time series data is a sequence of observations collected over time, usually at regular intervals, such as:
- Daily stock prices
- Hourly temperature readings
- Monthly electricity usage
The key characteristic of time series data is that order matters. The value at time t depends on what happened before time t. For example, if we are forecasting tomorrow’s weather we would use the data from today, yesterday and the days before that.
Why use LightGBM for Time Series
LightGBM is a good choice for time series forecasting because it handles missing values natively, works efficiently with large datasets and can combine many kinds of input features, such as weather conditions, holidays and special events. It also allows the use of custom loss functions, which can be helpful when optimizing for specific forecasting goals. One of LightGBM’s biggest advantages is its fast training and prediction speed, which makes it suitable for real-time or large-scale forecasting tasks. Although it is not specifically designed for time series, with the right feature engineering and data transformation LightGBM can produce accurate forecasts.
Preparing Data for LightGBM
Since LightGBM is not built for time series, we need to manually add features that represent the time structure. Here’s how to prepare time series data:
1. Create Lag Features
Lag features represent past values of the series. For example:
- lag_1 = value at time t-1
- lag_2 = value at time t-2
This helps the model understand how the current value depends on the past.
Python
import pandas as pd
import numpy as np
# Create sample time series data
date_range = pd.date_range(start='2022-01-01', periods=100, freq='D')
values = np.random.randn(100) # random values
# Create the DataFrame
df = pd.DataFrame({'date': date_range, 'value': values})
df['lag_1'] = df['value'].shift(1)
df['lag_2'] = df['value'].shift(2)
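To see what shift does, here is a minimal sketch on a four-point toy series (the values are made up for illustration). Each lag column is the original series moved down by that many rows, so the first rows have no past value and become NaN.

```python
import pandas as pd

# Toy series, made up for illustration only
toy = pd.DataFrame({'value': [10.0, 20.0, 30.0, 40.0]})

# Add several lag columns in a loop instead of writing one line per lag
for k in (1, 2):
    toy[f'lag_{k}'] = toy['value'].shift(k)

# lag_1 has one NaN at the top, lag_2 has two
print(toy)
```

The same loop works on the main DataFrame if more lags are needed; which lags are useful depends on the data.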
2. Create Rolling Statistics
Rolling features include moving averages or standard deviations over a time window:
Python
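# Shift by one step first so each window uses only past values, not the value being predicted
df['rolling_mean_3'] = df['value'].shift(1).rolling(3).mean()
df['rolling_std_3'] = df['value'].shift(1).rolling(3).std()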
These features summarize the recent level and variability of the series, which helps the model pick up short-term trends.
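One detail worth checking with rolling features is whether the window includes the current row. Called directly on the value column, rolling(3) ends at the current row, so the feature contains the value the model is asked to predict. A minimal sketch on a made-up five-point series shows how shifting by one step first keeps the window strictly in the past:

```python
import pandas as pd

# Toy series, made up for illustration only
toy = pd.DataFrame({'value': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Window ends at the current row: it includes the value we want to predict
toy['mean_incl_current'] = toy['value'].rolling(3).mean()

# Shift first: the window covers only the three previous rows
toy['mean_past_only'] = toy['value'].shift(1).rolling(3).mean()

print(toy)
```

In the last row, mean_incl_current averages 3, 4 and 5, while mean_past_only averages 2, 3 and 4. The past-only version costs one extra NaN row at the start.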
3. Add Date-Based Features
You can extract useful features from the date, such as:
- Day of the week (Monday, Tuesday, etc.)
- Month
- Is it a weekend?
Python
df['day_of_week'] = df['date'].dt.dayofweek
df['month'] = df['date'].dt.month
df['is_weekend'] = df['day_of_week'].isin([5,6]).astype(int)
4. Remove Missing Values
Because lag and rolling features create NaN values in the beginning, you’ll need to drop them:
Python
df = df.dropna()
Building a Time Series Model with LightGBM
Now that the data is ready, we can build the model.
Step 1: Installing LightGBM
You can install it using pip:
pip install lightgbm
Step 2: Splitting Data
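For time series, we must not shuffle the data, because a random split would let the model train on observations that come after the ones it is tested on. Instead, split it by time: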
Python
train = df[df['date'] < '2022-04-01']
test = df[df['date'] >= '2022-04-01']
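A single cut-off date gives only one estimate of performance. A common extension is to evaluate on several consecutive test blocks, each preceded by all earlier data. The sketch below is one way to generate such index splits. The function name and parameters are hypothetical, chosen for illustration, and scikit-learn's TimeSeriesSplit provides similar behaviour if you prefer a library implementation.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs; each test block follows its training data."""
    for i in range(n_splits):
        # Test blocks are laid out back to back at the end of the series
        test_start = n_samples - (n_splits - i) * test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Example: 12 time-ordered rows, 3 folds, 2 test rows per fold
for train_idx, test_idx in expanding_window_splits(n_samples=12, n_splits=3, test_size=2):
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

The indices can be passed to df.iloc on a DataFrame that is sorted by date, so every fold trains only on rows that precede its test rows.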
Step 3: Defining Features and Target
Python
features = ['lag_1', 'lag_2', 'rolling_mean_3', 'day_of_week', 'month', 'is_weekend']
target = 'value'
X_train = train[features]
y_train = train[target]
X_test = test[features]
y_test = test[target]
Step 4: Training LightGBM Model
Python
import lightgbm as lgb
model = lgb.LGBMRegressor()
model.fit(X_train, y_train)
Step 5: Making Predictions
Python
predictions = model.predict(X_test)
Evaluating the Model
To check how good your model is, use metrics such as mean absolute error (MAE) and root mean squared error (RMSE):
Python
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"MAE: {mae}")
print(f"RMSE: {rmse}")
Output (the sample data is random and unseeded, so your numbers will differ):
MAE: 0.3466166706030231
RMSE: 0.4253669139921471
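For reference, both metrics are simple to compute by hand. The sketch below uses made-up numbers and two small helper functions (hypothetical names, not part of any library) to show the arithmetic: MAE averages the absolute errors, while RMSE squares the errors before averaging and so penalizes large misses more heavily.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up values for illustration; the errors are 1, 0, -2, -1
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

print(mae(y_true, y_pred))   # (1 + 0 + 2 + 1) / 4 = 1.0
print(rmse(y_true, y_pred))  # sqrt((1 + 0 + 4 + 1) / 4) = sqrt(1.5)
```

RMSE is never smaller than MAE on the same errors, and the gap widens when a few errors are much larger than the rest.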
Plotting the Forecast
It’s always helpful to see how predictions look compared to actual values:
Python
import matplotlib.pyplot as plt
plt.figure(figsize=(6, 4))
plt.plot(test['date'], y_test, label='Actual')
plt.plot(test['date'], predictions, label='Predicted')
plt.xticks(test['date'], rotation=45)
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Time Series Forecasting with LightGBM")
plt.legend()
plt.tight_layout()
plt.grid(True)
plt.show()
Output:
Figure: Time Series Forecasting with LightGBM
Limitations of LightGBM for Time Series
While LightGBM is a great tool, it has some limitations for time series:
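- It does not model time directly; you have to create time-based features yourself.
- It does not forecast multiple steps ahead out of the box. A model trained on lag features predicts one step at a time, so for longer horizons each prediction has to be fed back in as an input for the next step, and errors can accumulate along the way.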
- It does not handle long-term seasonality as well as some traditional models.
Still, with the right feature engineering, LightGBM is often competitive with, and sometimes better than, traditional models on real-world datasets.
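The one-step-at-a-time approach mentioned in the limitations can be sketched as a short loop. This is a minimal illustration that assumes a model using only lag features, ordered lag_1, lag_2, and so on. MeanOfLags is a hypothetical stand-in that keeps the example runnable without a trained model; in practice any fitted regressor with a predict method, such as an LGBMRegressor, could take its place.

```python
import numpy as np


class MeanOfLags:
    """Hypothetical stand-in for a fitted model: predicts the mean of its inputs."""

    def predict(self, X):
        return np.asarray(X, dtype=float).mean(axis=1)


def recursive_forecast(model, history, n_lags, steps):
    """Forecast `steps` values, feeding each prediction back in as the newest lag."""
    window = list(history[-n_lags:])  # oldest value first
    forecasts = []
    for _ in range(steps):
        # Feature order is lag_1, lag_2, ...: most recent value first
        features = np.array([window[::-1]])
        next_value = float(model.predict(features)[0])
        forecasts.append(next_value)
        # Drop the oldest value and append the prediction as the newest one
        window = window[1:] + [next_value]
    return forecasts


print(recursive_forecast(MeanOfLags(), history=[1.0, 2.0, 4.0], n_lags=2, steps=3))
# prints [3.0, 3.5, 3.25]
```

With a model that also uses rolling or date-based features, those features would have to be recomputed inside the loop for each new step. Because every step builds on earlier predictions, errors can compound as the horizon grows.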
When to Use LightGBM for Time Series
LightGBM is a good choice when:
- You have lots of data.
- You want to include many external features like weather, holidays, events, etc.
- You want faster training and prediction.
- You need a strong baseline for performance.
However, if your data has very strong seasonality or trends and you don’t have many features, models like ARIMA or Prophet might be better.