Transport Demand Prediction using Regression

Last Updated : 21 Aug, 2024

Transport demand prediction is a crucial aspect of transportation planning and management. Accurate demand forecasts enable efficient resource allocation, improved service planning, and enhanced customer satisfaction. Regression analysis, a statistical method for modeling relationships between variables, is widely used for predicting transport demand. This article delves into the methodologies, challenges, and applications of regression models in transport demand prediction.

Table of Contents

Understanding Transport Demand Prediction
Role of Regression Analysis in Predicting Transport Demand
Implementing Regression Model for Transport Demand Prediction

Problem Statement
Approach to the Model
Step 1: Transport Demand: Data Loading and Preprocessing
Step 2: Data Preprocessing
Step 3: Build the Model
Step 4: Predictions and Evaluation of the Model
Application of Regression Analysis in Your Model

Understanding Transport Demand Prediction

Transport demand prediction involves estimating future demand for transportation services based on historical data and various influencing factors. It is essential for optimizing routes, scheduling, and resource allocation in public transport systems. Accurate predictions can lead to cost savings, improved service quality, and better infrastructure planning.

Key Factors Influencing Transport Demand

Several factors influence transport demand, including:

Economic Indicators: Economic growth, measured by GDP or regional output, significantly impacts transport demand. Economic downturns can reduce demand, while growth can increase it.
Demographic Factors: Population size, age distribution, and urbanization levels affect transport demand. Urban areas with high population densities typically have higher public transport usage.
Transport Infrastructure: The availability and quality of transport infrastructure, such as roads, railways, and public transit systems, influence demand. Improved infrastructure can induce demand by offering better service quality.
Technological Advancements: Innovations in transport technology, such as intelligent transport systems (ITS), can affect demand by improving service efficiency and reliability.
Social and Cultural Factors: Cultural events, holidays, and social behaviors impact transport patterns. For instance, peak travel times often coincide with holidays or major events.

Role of Regression Analysis in Predicting Transport Demand

Regression analysis is a powerful statistical tool used to model and understand the relationship between a dependent variable (in this case, transport demand measured by the number of seats sold) and one or more independent variables (such as travel date, travel time, origin, destination, vehicle type, etc.).

It enables transportation planners and data scientists to identify patterns in historical data and use these patterns to make predictions about future demand.
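
As a point of reference, the simplest member of this family, multiple linear regression, can be written as shown below. The symbols are generic textbook notation and do not come from the article's code: y is the demand (seats sold), x_1 to x_p are the predictors, the beta terms are coefficients estimated from historical data, and epsilon is the error term.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Each coefficient \beta_j is read as the expected change in demand for a one-unit change in x_j, holding the other predictors fixed.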

Why Regression Analysis?

The need for regression analysis in transport demand prediction stems from its ability to:

Quantify Relationships: Regression helps quantify the relationship between transport demand and various influencing factors like time of travel, routes, payment methods, and vehicle types.
Capture Trends: It can identify trends and patterns in historical data, such as peak travel times, popular routes, or the impact of holidays and weekends on demand.
Provide Predictive Power: By establishing a mathematical model that connects demand to key variables, regression analysis allows for accurate forecasting of future transportation needs.
Model Complexity: In simple cases, linear regression can be sufficient. However, transport demand is often affected by non-linear relationships between variables (e.g., the impact of peak hours on demand might not increase linearly). This is where more advanced regression models like Random Forests, Gradient Boosting, or XGBoost become useful. These models capture more complex interactions between the features.
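
The point about non-linear effects can be illustrated with a small sketch. It assumes a made-up dataset in which demand peaks around 08:00 and 18:00. The peak hours, noise level, and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical toy data: demand has two peaks, around 08:00 and 18:00
hour = rng.integers(0, 24, size=2000)
demand = (
    20
    + 15 * np.exp(-((hour - 8) ** 2) / 8.0)
    + 15 * np.exp(-((hour - 18) ** 2) / 8.0)
    + rng.normal(0, 2, size=2000)
)

X = hour.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, demand, test_size=0.2, random_state=0
)

# A straight line cannot follow two peaks; a tree ensemble can
linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

linear_r2 = linear.score(X_test, y_test)
forest_r2 = forest.score(X_test, y_test)
print(f"Linear Regression R^2: {linear_r2:.3f}")
print(f"Random Forest R^2:     {forest_r2:.3f}")
```

On data shaped like this, the tree-based model should score clearly higher, because a single slope on the hour feature cannot represent two separate peaks.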

Implementing Regression Model for Transport Demand Prediction

To build a practical model for transport demand prediction, we will follow a structured approach.

Problem Statement

The goal is to predict the number of seats sold for each ride on specific routes, dates, and times using historical data from Mobiticket. This prediction will help optimize resource allocation and improve service planning for public transport in Nairobi.

Approach to the Model

The dataset includes variables such as ride_id, seat_number, payment_method, travel_date, travel_time, travel_from, travel_to, car_type, and max_capacity. Our objective is to use these features to predict the number of seats sold (seat_number).

Steps to Build the Model:

Data Preprocessing:
- Feature Engineering: Create new features such as day of the week, hour of the day, or whether the travel date falls on a weekend or holiday.
- Handling Categorical Variables: Convert categorical variables such as payment_method, travel_from, travel_to, and car_type into numerical representations using one-hot encoding or label encoding.
- Handling Dates: Extract useful information from travel_date and travel_time (e.g., day, month, hour).
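- Normalization/Standardization: Standardize numerical features like max_capacity. This mainly helps scale-sensitive models such as Linear Regression; tree-based models like Random Forest are largely unaffected by feature scaling.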
Modeling:
- Train-Test Split: Split the data into training and test sets to evaluate model performance.
- Model Selection: Start with a simple regression model like Linear Regression, and then explore more complex models like Random Forest, Gradient Boosting, or XGBoost to capture non-linear relationships.
Model Evaluation: Evaluate the model's performance on the test set using metrics such as Mean Squared Error (MSE) and R-squared (R²).
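
The feature engineering ideas listed above can be sketched on a few rows. This is a minimal illustration, not the article's pipeline: the three rides and the holiday list are assumptions made up for the example, and only the column names follow the dataset description.

```python
import pandas as pd

# Hypothetical rides; column names follow the dataset described above
rides = pd.DataFrame({
    "travel_date": ["2024-01-01", "2024-01-06", "2024-01-10"],
    "travel_time": ["07:15", "19:40", "12:05"],
    "car_type": ["bus", "minibus", "bus"],
})

# Assumed holiday list, for illustration only
holidays = pd.to_datetime(["2024-01-01"])

dates = pd.to_datetime(rides["travel_date"])
rides["day_of_week"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
rides["is_weekend"] = (rides["day_of_week"] >= 5).astype(int)
rides["is_holiday"] = dates.isin(holidays).astype(int)
rides["hour"] = pd.to_datetime(rides["travel_time"], format="%H:%M").dt.hour

# One-hot encode the categorical column and drop the raw date/time columns
features = pd.get_dummies(
    rides.drop(columns=["travel_date", "travel_time"]),
    columns=["car_type"],
)
print(features)
```

The weekend and holiday flags give the model a direct signal for calendar effects instead of leaving it to infer them from the raw date.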

Step 1: Transport Demand: Data Loading and Preprocessing

Let's create a synthetic dataset that resembles the structure described for predicting the number of seats sold for each ride. This dataset will include features like ride_id, seat_number, payment_method, travel_date, travel_time, travel_from, travel_to, car_type, and max_capacity.

Python

import numpy as np
import pandas as pd
from datetime import timedelta, datetime

# Set random seed for reproducibility
np.random.seed(42)

# Parameters
n_samples = 10000  # Number of rides
locations = ['Location_A', 'Location_B', 'Location_C', 'Location_D', 'Location_E']
car_types = ['bus', 'minibus', 'van']
payment_methods = ['cash', 'mobile_payment', 'card']
start_date = datetime(2024, 1, 1)

# Generate data
ride_ids = np.arange(1, n_samples + 1)
travel_dates = [start_date + timedelta(days=np.random.randint(0, 365)) for _ in range(n_samples)]
travel_times = [datetime(2024, 1, 1, np.random.randint(0, 24), np.random.randint(0, 60)).time() for _ in range(n_samples)]
travel_from = np.random.choice(locations, n_samples)
travel_to = np.random.choice(locations, n_samples)
car_type = np.random.choice(car_types, n_samples)
max_capacity = np.random.choice([14, 30, 50], n_samples)
payment_method = np.random.choice(payment_methods, n_samples)

# Derive seat_number from capacity, time of day, and payment method, plus random noise.
# Note: car_type and the routes are sampled independently and do not affect seat_number.
seat_number = (np.random.poisson(lam=10, size=n_samples) 
               + (max_capacity / 2).astype(int) 
               + np.random.randint(0, 5, n_samples)
               - (np.array([t.hour for t in travel_times]) // 3)
               + (payment_method == 'mobile_payment').astype(int) * 5
              ).clip(1, max_capacity)  # Ensure seat_number is between 1 and max_capacity

# Create the DataFrame
data = pd.DataFrame({
    'ride_id': ride_ids,
    'travel_date': travel_dates,
    'travel_time': travel_times,
    'travel_from': travel_from,
    'travel_to': travel_to,
    'car_type': car_type,
    'max_capacity': max_capacity,
    'payment_method': payment_method,
    'seat_number': seat_number
})

data.to_csv("train_revised.csv", index=False)
data.head()

Output:

	ride_id	travel_date	travel_time	travel_from	travel_to	car_type	max_capacity	payment_method	seat_number
0	1	2024-04-12	17:29:00	Location_E	Location_E	van	50	cash	38
1	2	2024-12-14	11:47:00	Location_E	Location_D	bus	30	mobile_payment	27
2	3	2024-09-27	04:19:00	Location_B	Location_B	van	14	card	14
3	4	2024-04-16	11:20:00	Location_E	Location_D	bus	30	cash	28
4	5	2024-03-12	10:08:00	Location_A	Location_E	minibus	50	cash	36
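
Because origin and destination are sampled independently, some generated rides start and end at the same location (rows 0 and 2 above). A possible sanity check is sketched below on the five rows shown in the output. Whether to drop such rows is a judgment call, and the load_factor column is a hypothetical extra that is not used elsewhere in the article.

```python
import pandas as pd

# First five rows of the generated data, copied from the output above
sample = pd.DataFrame({
    "ride_id": [1, 2, 3, 4, 5],
    "travel_from": ["Location_E", "Location_E", "Location_B", "Location_E", "Location_A"],
    "travel_to": ["Location_E", "Location_D", "Location_B", "Location_D", "Location_E"],
    "max_capacity": [50, 30, 14, 30, 50],
    "seat_number": [38, 27, 14, 28, 36],
})

# Flag rides whose origin equals their destination
same_place = sample["travel_from"] == sample["travel_to"]
print(f"Rides with identical origin and destination: {same_place.sum()}")

# Keep only plausible routes and compute the share of seats sold
cleaned = sample.loc[~same_place].reset_index(drop=True)
cleaned["load_factor"] = cleaned["seat_number"] / cleaned["max_capacity"]
print(cleaned)
```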

Step 2: Data Preprocessing

Python

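from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Feature Engineering
data['travel_date'] = pd.to_datetime(data['travel_date'])
data['day_of_week'] = data['travel_date'].dt.dayofweek
data['month'] = data['travel_date'].dt.month
# travel_time holds datetime.time objects, which pd.to_datetime cannot parse directly,
# so convert them to 'HH:MM:SS' strings first
data['hour'] = pd.to_datetime(data['travel_time'].astype(str), format='%H:%M:%S').dt.hour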

# Drop irrelevant columns
X = data.drop(columns=['ride_id', 'seat_number', 'travel_date', 'travel_time'])
y = data['seat_number']

# Handling Categorical Variables and Scaling
categorical_features = ['payment_method', 'travel_from', 'travel_to', 'car_type']
numerical_features = ['max_capacity', 'day_of_week', 'month', 'hour']

categorical_transformer = OneHotEncoder(drop='first')
numerical_transformer = StandardScaler()

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features),
    ]
)

Step 3: Build the Model

Python

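from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)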

# Pipeline for the model
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestRegressor(random_state=42))
])
pipeline.fit(X_train, y_train)
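
The approach section suggests starting with a simple model before moving to more complex ones. One way to make that comparison is cross-validation, sketched below on a small stand-in dataset. The data, the reduced feature set, and the helper make_pipeline_for are all assumptions for illustration and are not part of the article's code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 1000

# Small hypothetical dataset, loosely mirroring the article's columns
demo = pd.DataFrame({
    "max_capacity": rng.choice([14, 30, 50], n),
    "hour": rng.integers(0, 24, n),
    "payment_method": rng.choice(["cash", "mobile_payment", "card"], n),
})
seats = (
    rng.poisson(10, n)
    + demo["max_capacity"] // 2
    - demo["hour"] // 3
    + (demo["payment_method"] == "mobile_payment") * 5
).clip(lower=1, upper=demo["max_capacity"])


def make_pipeline_for(model):
    """Wrap a regressor in the same preprocessing steps (hypothetical helper)."""
    prep = ColumnTransformer([
        ("num", StandardScaler(), ["max_capacity", "hour"]),
        ("cat", OneHotEncoder(drop="first"), ["payment_method"]),
    ])
    return Pipeline([("preprocessor", prep), ("model", model)])


candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(make_pipeline_for(model), demo, seats, cv=5, scoring="r2")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean R^2 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are refit on each training fold, which avoids leaking information from the validation fold.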

Step 4: Predictions and Evaluation of the Model

Python

from sklearn.metrics import mean_squared_error, r2_score

# Predict on the test set
y_pred = pipeline.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared (R²): {r2}")

Output:

Mean Squared Error (MSE): 8.306493985761055
R-squared (R²): 0.9101232043264215
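
MSE is expressed in squared seats, which is hard to read directly. Taking the square root gives the Root Mean Squared Error (RMSE) in seats, and Mean Absolute Error (MAE) is another metric in the same unit. The sketch below computes both on a handful of hypothetical values and also converts the MSE reported above. In the article's code, y_test and y_pred would take the place of the toy arrays.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted seat counts, for illustration only
y_true = np.array([38, 27, 14, 28, 36])
y_hat = np.array([35.0, 29.0, 14.0, 26.5, 37.0])

toy_mse = mean_squared_error(y_true, y_hat)
toy_rmse = np.sqrt(toy_mse)
toy_mae = mean_absolute_error(y_true, y_hat)
print(f"Toy MSE: {toy_mse:.2f}, RMSE: {toy_rmse:.2f}, MAE: {toy_mae:.2f}")

# RMSE implied by the MSE reported in the output above
reported_rmse = np.sqrt(8.306493985761055)
print(f"Reported RMSE: {reported_rmse:.2f} seats")
```

An MSE of about 8.31 therefore corresponds to a typical error of roughly 2.9 seats per ride.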

Application of Regression Analysis in Your Model

In the model above, regression analysis is applied to a synthetic dataset modeled on the Mobiticket data to predict the number of seats sold for specific rides. Here's how regression analysis plays a central role:

Identifying Key Variables: Regression analysis helps determine which features (e.g., time of travel, route, vehicle type) most strongly influence the number of seats sold. For instance, routes from high-demand locations or specific travel times (e.g., morning rush hour) might be more likely to have a higher number of seats sold.
Modeling Demand Patterns: Using regression techniques, your model can learn the demand patterns from historical data, which may include daily, weekly, or seasonal trends. For example, demand may rise on weekends or holidays, which the regression model can capture by incorporating time-based features.
Predicting Future Demand: Once the model is trained, it can predict the number of seats sold for future rides based on known factors like the date, time, and route. These predictions enable transport companies to allocate resources efficiently by scheduling additional vehicles on high-demand routes or adjusting schedules to match predicted demand.
Evaluating Model Performance: Regression models can be evaluated using metrics such as Mean Squared Error (MSE) and R-squared (R²). These metrics help assess how well the model fits the data and how accurately it predicts demand. Here, an R-squared of 0.91 on the held-out test set means the model explains about 91% of the variance in the number of seats sold. Because the data is synthetic and seat_number was generated from a known formula, this score shows that the pipeline works rather than how well it would perform on real ticketing data.

Conclusion

Transport demand prediction is a critical tool for optimizing transportation planning and resource management. Through regression analysis, predictive models can leverage historical data and key influencing factors to forecast future demand accurately. By adopting a structured approach to data preprocessing, feature engineering, and model selection, transportation planners can enhance service quality, improve operational efficiency, and better cater to passenger needs.

The presented methodology for predicting the number of seats sold in Nairobi's public transport highlights how data-driven decisions can shape the future of urban mobility, benefiting both service providers and passengers alike.

frisbevhwy
