Final Project Report
BELAGAVI
Project Report on
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted By
AISHWARYA 1BY20CS015
KUSHI L 1BY20CS087
LAVANYA N 1BY20CS094
Dr SATISH KUMAR T
ASSOCIATE PROFESSOR
Department of CSE,
BMSIT&M
2023-2024
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELAGAVI
BMS INSTITUTE OF TECHNOLOGY AND MANAGEMENT
YELAHANKA, BENGALURU – 560064
CERTIFICATE
This is to certify that the Project work entitled “Predictive Maintenance for
Automobiles” is a bonafide work carried out by Aishwarya(1BY20CS015), Ananya
Barath(1BY20CS022), Kushi L(1BY20CS087), Lavanya N(1BY20CS094), in partial
fulfillment for the award of Bachelor of Engineering Degree in Computer Science
and Engineering of the Visvesvaraya Technological University, Belagavi during the year
2023-2024. It is certified that all corrections/suggestions indicated for Internal Assessment
have been incorporated in this report. The project report has been approved as it satisfies the
academic requirements in respect of project work for B.E. Degree.
External VIVA-VOCE
1.
2.
ACKNOWLEDGEMENT
We are pleased to present this project report upon its successful completion. This
project would not have been possible without the guidance, assistance, and suggestions of
many individuals. We would like to express our deep sense of gratitude to each and every
person who has contributed to making this project a success.
First and foremost, we extend our heartfelt thanks to Dr. Sanjay H. A, Principal,
BMS Institute of Technology & Management, for his constant encouragement and
inspiration in undertaking this project.
We also express our sincere gratitude to Dr. Thippeswamy G, Professor and Head
of the Department, Computer Science and Engineering, BMS Institute of Technology &
Management, for his unwavering support and motivation throughout this endeavor.
We extend our heartfelt thanks to our project coordinators, Dr. Vidya R Pai and
Dr. Arunakumari B. N, Assistant Professor(s), Department of Computer Science and
Engineering, for their constant support and valuable advice during the project.
Special thanks are due to all the staff members of the Computer Science and
Engineering Department for their help and cooperation.
Finally, we would like to express our gratitude to our parents and friends for their
unwavering encouragement and support throughout the duration of this project.
By,
Aishwarya 1BY20CS015
Ananya Barath 1BY20CS022
Kushi L 1BY20CS087
Lavanya N 1BY20CS094
DECLARATION
Aishwarya 1BY20CS015
Kushi L 1BY20CS087
Lavanya N 1BY20CS094
ABSTRACT
Predictive maintenance (PdM) in the automobile sector assists firms in determining when a
machine or vehicle part requires servicing by applying techniques such as data mining, data
preprocessing, and machine learning. Predictive maintenance for vehicles is a vehicle
maintenance strategy that uses digital data analysis and machine learning algorithms to predict
future breakdowns of vehicle components and systems. It is transforming the automobile
industry through the use of machine learning technologies. Machine learning models may
assess historical data, monitor real-time information, and predict probable component failures
or maintenance needs by combining data from numerous sensors and systems within cars. This
predictive method has various benefits, including reduced downtime, minimized repair costs,
and increased overall safety for both drivers and passengers.
Chapter 1: Introduction
1.1 Background
5.2 Algorithms
5.4 Code
Chapter 6: Testing
7.10 XGBoost
Chapter 8: Conclusion
References
LIST OF FIGURES
Table 6.4.4 Test cases for Engine health and Battery RUL prediction
Predictive Maintenance for Automobiles Introduction
CHAPTER 1
INTRODUCTION
1.1 Background
Predictive maintenance is crucial in the automobile sector since vehicle emissions and fuel
consumption have a significant detrimental impact on the environment and the economy. In
view of the increased emphasis on sustainability and the need to decrease emission of
greenhouse gases, predictive maintenance can greatly enhance automobile efficiency and
reduce emissions by identifying and resolving any issues before they become critical ones.
The automotive industry relies heavily on predictive maintenance since it lowers operating
costs, streamlines maintenance schedules, and saves time for fleet managers, manufacturers,
and service providers. Cars can avoid unplanned breakdowns and ensure maximum
performance and safety while driving by anticipating maintenance needs and scheduling repairs
or part replacements at the most convenient periods.
Rapid advances in sensor and network technologies have produced a wealth of condition-
monitoring data, which may be used to estimate an equipment's remaining useful life and
prevent breakdowns through the application of artificial intelligence (AI) and complicated
mathematical models. This research evaluates the literature on predictive maintenance in the
automobile sector with an emphasis on AI, statistical inference, and stochastic approaches.
Physical and hybrid models perform well with small amounts of data, while deep learning
methods require larger datasets in order to forecast failures with more precision. Digital twin
technology estimates component lifespans and diagnoses problems to improve vehicle
performance and safety. There are also issues, like the scarcity of real datasets and the difficulty
of evaluating data that is only partially labelled.
The study focuses on predictive maintenance utilising machine learning approaches such as
Random Forest, Support Vector Machines, Artificial Neural Networks, and Gaussian Processes
to identify problems in automotive engine components. The training data
comes from a simulation testbed that replicates real driving conditions using industry-standard
testing cycles such as NEDC, EUDC, WLTP, and FTP-75. The testbed is used to diagnose
problems in turbocharged petrol engine systems. The purpose of the research is to reduce
pollutants, fuel consumption, and maintenance costs while satisfying the automaker goals of
longevity, safety, and reliability.
Paper [4]: Use of ML Techniques for Li-Ion Battery Remaining Useful Life Prediction-A
Survey
Automobile stability and safety depend heavily on the performance and dependability of
lithium-ion batteries. Recent research has applied machine learning to
provide accurate estimates of battery remaining useful life (RUL). Variables such as
voltage, current, and temperature are used as important markers of battery health in data-driven
approaches. Specifically, algorithms such as Decision Tree (DT) and Gradient Boosting (GB)
outperform other methods in RUL prediction on a wide range of performance criteria. These
methods provide excellent robustness, accuracy, and generality when assessing battery health
conditions. Testing confirms their effectiveness and shows how much better they are at
predicting battery RUL.
Paper [5]: Novel Method Based on Stacking Model for Remaining Useful Life Prediction
of Lithium-ion Batteries.
The study emphasises how important lithium-ion batteries are to cars because they are the main
component of energy storage in a variety of gadgets. These batteries deteriorate with repeated
charging, which affects dependability and performance over time. For Battery Management
Systems (BMS), it becomes essential to comprehend Remaining Useful Life (RUL). BMS can
predict faults, optimise maintenance, and extend battery life with accurate RUL prediction. In
order to improve battery health and resilience in the face of demanding operating conditions,
the study explores a variety of methodologies and Machine Learning (ML) techniques for RUL
prediction.
Lithium-ion batteries are essential for energy storage in a variety of industries. For sustainable
operation, it is essential to precisely predict their Remaining Useful Life (RUL). To anticipate
RUL, a unique approach that makes use of stacking models incorporates several health
indicators from battery performance data. For reliable predictions, this method combines
multiple linear regression, random forests, gradient boosting decision trees, and support vector
regression. The higher performance of the stacking model, with an 11.3 RUL prediction error,
Paper [7]: Data-driven strategies for predictive maintenance: Lesson learned from
automotive use case.
Although lithium-ion batteries are known for having a high energy density, their broad use is
hampered by dependability issues. In order to investigate this, a study analyses the first 100
cycles using multistage rapid charging data from MIT-Stanford datasets. Four essential
characteristics that are closely related to battery life are found by correlation research. With
modifications to training data and model types, these attributes are fed into a variety of machine
learning models. With an average error of 92.56 cycles, the top-performing model attains a
prediction error of 9.34%, indicating the effectiveness of this approach in precisely predicting
battery lifespan.
1.3 Motivation
The goal of this program is to address an acknowledged need in the automotive maintenance
sector for improved diagnostic and support systems, particularly in the areas of engine and
battery predictive maintenance. The automotive sector strives to maintain peak performance,
emphasizing the importance of identifying problems early and taking proper action to avoid
costly breakdowns. Outdated reactive maintenance approaches are expensive and reduce
vehicle performance. The imprecision of current diagnostic techniques causes delays in both
maintenance and repair.
As a result, the goal is to apply cutting-edge machine learning and data analysis approaches to
create accurate predictive models of engine health and battery remaining useful life (RUL).
BMSIT & M, Dept of CSE 2023-2024 4
Predictive maintenance uses machine learning and data analytics to identify possible faults
before they become big problems, lowering the likelihood of costly repairs and downtime. The
goal is to provide trustworthy diagnostic tools to auto mechanics so that they can spot problems
early and take appropriate action. This technique aims to reduce emissions and the environmental
impact of vehicle use by optimizing vehicle performance and maintenance schedules, while
also enhancing overall vehicle performance and economy.
The ultimate goal is to reduce operational costs while also improving vehicle dependability.
One of the project's objectives is to develop machine learning models that can accurately
predict when maintenance is required. These models will examine a variety of data sources,
including sensors, telematics systems, and maintenance records. The project's purpose is to
develop predictive maintenance procedures for various automotive parts, including brakes,
engines, and transmissions. The research is expected to produce significant benefits for the
automobile industry, such as greater driving enjoyment, overall sustainability, and efficiency in
vehicle maintenance and repair.
• Algorithm Selection and Training: Utilizing the prepared data, select suitable machine
learning algorithms (such as regression and classification) and train them. To find anomalies in
data streams, apply strategies such as machine learning algorithms or statistical methodologies.
• Model Evaluation: To make sure the models are effective, assess them using measures such
as accuracy, precision, recall, and F1-score. To assist users in being ready for maintenance or
replacement, the model should be able to estimate how long a machine will be useful. It ought
to be able to forecast the likelihood of a failure.
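The evaluation measures named above can be computed directly from the counts of correct and incorrect predictions. The following is a minimal sketch for a binary task; the function name classification_metrics and the sample labels are illustrative assumptions, not part of the project code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = engine needs maintenance, 0 = engine healthy.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In practice the equivalent functions in Scikit-learn's metrics module would normally be used; the hand-written version only makes the definitions explicit.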
1.6 Scope
The project's scope includes developing predictive maintenance solutions for battery remaining
useful life (RUL) estimation and engine condition prediction utilizing machine learning
techniques and sensor data analysis. The initiative intends to optimize resource utilization,
decrease downtime, and lower maintenance costs across many industrial sectors, including
manufacturing, transportation, energy, and aerospace, by concentrating on proactive
maintenance tactics. The project intends to improve overall asset management procedures and
operational efficiency through data-driven decision-making processes.
1.7 Challenges
In order to improve the precision and effectiveness of predictive maintenance solutions and
facilitate the early detection of possible problems, it is imperative to integrate state-of-the-
art sensors, intelligent machinery, and sophisticated business analytics tools.
Even though predictive maintenance solutions have many long-term advantages, many organizations
face high upfront costs that necessitate careful
consideration of investment strategies and ROI analysis.
CHAPTER 2
OVERVIEW
Predictive maintenance is emphasized heavily in this project, especially in the area of engine
condition prediction, where supervised machine learning algorithms are used to generate
precise predictions on the condition of automobile engines. By carefully examining vital engine
characteristics such as RPM, oil pressure, temperature, and coolant pressure, the prediction
algorithm may foresee possible defects or indications of a decline in performance, allowing for
the implementation of preventive maintenance.
A crucial component of maintaining electric and hybrid vehicles, battery remaining useful life
(RUL) estimation is also explored in this study. The model can forecast how long a battery will
last by continually monitoring important battery health parameters like voltage and current.
This allows for timely interventions like replacements or maintenance to be performed. In
general, the project seeks to use the advantages of predictive maintenance in order to lower the
total cost of automobile ownership, improve vehicle reliability, encourage environmentally
friendly modes of transportation, and guarantee effective use of battery resources.
CHAPTER 3
REQUIREMENT SPECIFICATION
Domain Requirement: Data on engine health and battery performance must be accessible
from pertinent databases.
Software Requirement: Python and the required machine learning libraries, including NumPy,
Pandas, and Scikit-learn, must be installed.
System Requirement: The predictive maintenance system's many components must establish
seamless communication with one another.
Cost Requirement: Taking into account the initial expenses related to putting the predictive
maintenance system into practice.
Requirement for precision: Reaching a high degree of precision in determining the health of
engines and calculating the batteries' remaining usable lives.
• Preprocessing of Data: Capacity to clean and prepare unprocessed data, including handling
outliers, missing values, and data normalization.
• Feature Engineering: This refers to techniques for feature extraction and selection that help
find pertinent features in data for predictive modeling.
• Validation and Training of Models: The ability to use relevant evaluation measures to
validate the performance of machine learning models that have been trained on historical data.
To maximize model accuracy, use hyperparameter adjustment and cross-validation.
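Cross-validation, mentioned in the last requirement above, can be illustrated with a small index generator. This is a sketch under the assumption of a simple interleaved k-fold scheme; the name k_fold_indices is hypothetical. During hyperparameter adjustment, each candidate setting would be trained and scored on every fold and the scores averaged.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    # Interleaved assignment: sample i goes to fold i % k.
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the current one is used for training.
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=10, k=5))
```

Each sample appears in exactly one test fold, so every record contributes to validation once and to training in the remaining rounds.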
Portability: The system can be easily adopted and used on multiple platforms and systems
because it is built to be integrated with current infrastructure and implemented in a variety of
industrial settings.
Scalability: As the size of the dataset and user base increase, the system's ability to manage
growing volumes of sensor data and meet rising computational needs will guarantee its
continued effectiveness.
Flexibility: Users can modify parameters and algorithms to meet particular maintenance
demands and operating requirements thanks to the solution's adjustable configurations and
flexible models.
Security: Throughout the predictive maintenance process, strong security measures are put in
place to safeguard private information and guarantee its availability, confidentiality, and
integrity.
Model Training: It should be possible for users to choose and hone various machine learning
models. Model evaluation and hyperparameter tuning should be possible with the system.
Prediction and Visualisation: Using trained models, users ought to be able to make
predictions. The system ought to produce visual aids that aid users in comprehending model
performance and data-driven insights.
Interactivity and Usability: The user interface should be simple to use and intuitive, making it
appropriate for users with different degrees of technical proficiency. It should be possible for
users to modify and engage with visualisations.
Access to battery performance data: Datasets with measures for battery performance, such
as cycle index, discharge time, voltage levels, and charging duration, are accessible.
Data Preprocessing Expertise: The ability to prepare datasets for analysis using methods
including cleaning, normalization, and feature extraction.
Machine Learning Skills: Proficiency in machine learning algorithms and methodologies for
model training, assessment, and forecasting.
CHAPTER 4
DETAILED DESIGN
Fig 4.1 depicts the system architecture for predictive maintenance on engines and
batteries. Initial data preprocessing includes
cleaning, handling of missing values, and normalization of raw datasets containing engine and
battery sensor data. To improve prediction accuracy, feature engineering then takes the
pertinent data and builds new features. Model selection then finds appropriate machine learning
techniques for estimating battery remaining useful life and predicting engine condition. Using
the preprocessed data, the selected models are trained, and their parameters are changed to
maximize their performance. In order to guarantee the dependability and efficiency of the
predictive maintenance system for engine and battery components, the model evaluation
process evaluates the prediction accuracy using pertinent criteria.
Fig 4.2 represents the workflow model, which contains the following modules.
Data Collection
Identify relevant data sources, such as engine sensor data (e.g., temperature,
pressure, RPM), maintenance logs or records, historical performance data, and external
datasets on engine status or battery conditions.
Data Preprocessing
Clean up the data by fixing issues like missing values. Handling outliers: Identify,
eliminate, or rectify any outliers that might have a negative impact on model
performance. Data normalization or scaling: Scale numerical characteristics to a similar
range to prevent larger magnitudes from dominating the model. Perform data
transformations and encoding as needed. Convert categorical variables to numerical
representations using one-hot or label encoding. If the feature scales differ, scale them.
To analyze model performance, divide the data into training and testing sets and use
methods like train-test split or cross-validation.
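Some of the preprocessing steps described above can be sketched with the Python standard library alone. The example below assumes a single numeric sensor column in which missing readings are stored as None; the helper names, the IQR-based outlier rule, and the sample RPM values are illustrative assumptions rather than the project's actual pipeline. Categorical encoding is not covered here.

```python
import random
import statistics


def impute_median(values):
    """Replace missing readings (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def min_max_scale(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


# Hypothetical engine RPM readings with one gap and one implausible spike.
rpm = [700, 820, None, 910, 6000, 760, 880, 790, 850, 805]
imputed = impute_median(rpm)
clipped = clip_outliers_iqr(imputed)
scaled = min_max_scale(clipped)
train, test = split_train_test(scaled)
```

Clipping is only one way of handling outliers; removing the affected rows is an equally valid choice when the spike reflects a sensor fault rather than a real operating condition.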
Feature Engineering
Examine the data to identify any potentially relevant characteristics or modifications. Create
new features based on domain expertise or insights gathered via data study. To extract crucial
information from raw data, generate rolling averages or aggregate data over time intervals.
Select the most relevant characteristics for the target variable and remove those that are
redundant or unnecessary.
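The rolling-average feature mentioned above can be sketched as follows. The window size, the function name rolling_mean, and the coolant temperature readings are assumptions chosen for illustration.

```python
from collections import deque


def rolling_mean(readings, window):
    """Trailing moving average; early entries average the readings seen so far."""
    buffer = deque(maxlen=window)
    averages = []
    for reading in readings:
        buffer.append(reading)
        averages.append(sum(buffer) / len(buffer))
    return averages


# Hypothetical coolant temperature readings over consecutive time steps.
coolant_temp = [80.0, 82.0, 81.0, 95.0, 97.0, 96.0]
temp_roll3 = rolling_mean(coolant_temp, window=3)

# Derived feature: deviation of each reading from its recent trend.
temp_delta = [t - m for t, m in zip(coolant_temp, temp_roll3)]
```

A sudden rise in the deviation feature can flag a change in operating conditions that the raw reading alone would not make obvious.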
Model Selection
A variety of machine learning algorithms must be explored for the engine and battery
predictive maintenance project, covering classification (engine condition prediction) and
regression (battery remaining useful life prediction). Random Forest, Logistic Regression, and
Gradient Boosting algorithms are appropriate for engine condition prediction due to their
ability to handle classification problems. Random Forest Regression, XGBoost Regression, and
Decision Tree Regression are good algorithms for forecasting the remaining useful life of
batteries.
Fig 4.3 displays the activity diagram and contains the following modules.
Collection of engine and battery data: The initial stage of the procedure involves gathering
data from the engine and battery. This data includes sensor readings such as voltage, current,
engine RPM, and fuel pressure.
Data preprocessing: To get the data ready for the machine learning model, it is first
preprocessed. This entails scaling, eliminating outliers, and cleaning the data.
Splitting the dataset into training and test data: Next, two sets of the preprocessed data are
created: a test set and a training set. The machine learning model is trained on the training set,
and its performance is assessed on the test set.
Training the model using training data: Subsequently the training data is used to train the
machine learning model. The RUL of a battery can be predicted by the model by recognising
patterns in the data.
Making prediction on test data: After the model has been trained, test data predictions are
made using it. Every battery in the test set has its RUL predicted by the model.
Engine health condition: The model predicts whether or not the engine needs maintenance. It
then makes a decision.
Battery RUL Prediction: The battery's replacement status is ascertained by the model based
on its prediction.
Fig 4.4 represents the use case diagram, which depicts the interaction between the system
administrator and the system's functionalities. The administrator starts the process by loading the
dataset, which includes pertinent sensor data for the battery and engine components. The
system can then forecast two primary conditions, the health of the engine and the remaining
battery life, thanks to the administrator's application of classification and regression
algorithms to the dataset. After making a prediction, the administrator monitors a number of
metrics to assess the predictive maintenance system's performance and accuracy. These metrics aid in directing
decision-making for maintenance actions and offer insights into how effective the algorithms
are.
Fig 4.5 displays the sequence diagram, in which the user begins the process by collecting
various data attributes from automobiles linked to the engine and battery parameters. The
dataset is made up of all these properties. After the information has been collected, it is used to
extract knowledge and perform classification tasks based on the characteristics. During
classification, the dataset's characteristics are classified and labelled using established criteria
or techniques. Following the classification process, classification results are generated using the
dataset's characteristics.
The model's performance is evaluated at the conclusion of the classification procedure and
result generation, using measures such as
recall, accuracy, precision, and other relevant metrics. The user is subsequently notified with
Fig 4.6.1 DFD-0 provides an overview of the system architecture and data flow at a high level.
This graphic depicts how the machine learning model receives input parameters from the
engine dataset, such as temperature measurements, engine RPM, fuel pressure, coolant
pressure, and lubricating oil pressure. These parameters are the qualities or attributes used by
the model to make predictions.
After the data has been entered into the model, the machine learning algorithms process it and
provide output predictions. It demonstrates how the system determines how accurate the
forecasts are. This evaluation step is critical for establishing the predictive maintenance
system's dependability and performance.
The DFD-1 diagram in Figure 4.6.2 displays the engine dataset's detailed process flow inside
the predictive maintenance system. First, the raw input data from the engine dataset is
preprocessed. The preprocessed data is then passed to the feature
engineering phase, where features are constructed to expose significant patterns and
correlations in the data.
The dataset is then separated into two categories, training and testing. The training dataset is
input into the appropriate machine learning algorithms to train the model. This procedure
entails fitting the algorithms to the data in order to uncover patterns and relationships that will
allow for precise predictions. Concurrently, the testing dataset is used for model assessment.
The model's performance and efficiency are evaluated using metrics such as accuracy, precision,
recall, and mean squared error.
CHAPTER 5
IMPLEMENTATION
The project is implemented in Python, which provides a strong basis for data analysis,
preprocessing, and machine learning activities. Python is known for its simplicity, readability,
and straightforward syntax, which make code development and maintenance more efficient.
Its adaptability allows for the implementation of a variety of algorithms and methodologies,
making it suited for a wide range of data science and machine learning applications.
Python's dynamic typing and automatic memory management make development processes
more efficient, enabling programmers to solely concentrate on problem solving instead of low-
level minute details. Furthermore, Python's huge standard library includes a wide range of
built-in functions and modules, which improves its capabilities for data manipulation, file
management, and other tasks. As a high-level, interpreted language, Python supports
procedural, object-oriented, and functional programming, and it is
popular in industry, academia, and research due to its ease of use and widespread community
support.
Python offers a wide range of frameworks and libraries for a variety of applications, including
data analysis, machine learning, web development, and scientific computing. The project makes
use of several built-in library functions, including:
NumPy (np): The foundational module for Python scientific computing is called NumPy.
Large multi-dimensional arrays and matrices are supported, and a number of mathematical
operations are available for effective manipulation of these arrays.
Pandas (pd): Pandas is an effective Python data analysis library. It provides data structures that
make data management, exploration, and cleaning simple, such as DataFrame and Series.
TensorFlow (tf): Google developed this open-source machine learning framework. In addition
to support for distributed computation and deployment across several platforms, it offers tools
for creating and refining deep learning models.
Matplotlib (plt): Matplotlib is a feature-rich Python visualization toolkit for static, animated,
and interactive graphics. Numerous other types of plots, charts, and graphs can be produced
using it.
train_test_split: Scikit-learn's train_test_split function divides a dataset into training
and testing subsets. It is frequently used to assess model performance in machine learning.
Pipeline: Scikit-learn's Pipeline class chains several preprocessing steps and a final estimator
into a single object, which is useful when data must be transformed and modelled sequentially.
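The way train_test_split and Pipeline fit together can be sketched as follows. The data here is synthetic and stands in for engine sensor columns; the feature meanings, the labelling rule, and the choice of StandardScaler with a Random Forest classifier are assumptions made for illustration, not the project's recorded configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three engine sensor columns (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Hypothetical labelling rule: 1 = engine needs maintenance.
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaling and classification are chained so both are fitted on training data only.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

Fitting the scaler inside the pipeline avoids leaking statistics from the test set into training, which is the main reason to chain the steps rather than scale the whole dataset up front.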
5.2 ALGORITHMS
The following are the algorithms used to predict the Engine condition and Battery RUL:
Random Forest Algorithm: Random Forest is a flexible and effective ensemble learning method
for both classification and regression applications. To lower the
danger of overfitting and increase the accuracy of the model, it builds numerous
decision trees during the training phase and outputs the majority vote of the individual
trees for classification or their average prediction for regression. Random Forest is known
for its robustness and efficacy across a variety of applications, since it can handle both
numerical and categorical features without requiring a great deal of preprocessing.
The capacity of Random Forest to capture intricate nonlinear correlations and interactions
between characteristics is one of its main advantages. This makes it an excellent choice for
situations in which the underlying data distribution is difficult to separate linearly. Furthermore,
Random Forest gives an inherent measure of feature relevance, which enables users to learn
which features have the most influence on predictions.
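The way a forest combines its trees can be sketched in a few lines: a majority vote for classification and a mean for regression. The per-tree outputs below are made-up values, and the function names are hypothetical; training of the individual trees is not shown.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Return the average of the individual tree predictions."""
    return mean(tree_outputs)


# Hypothetical per-tree outputs for one engine record and one battery record.
votes = ["needs_maintenance", "healthy", "needs_maintenance",
         "needs_maintenance", "healthy"]
engine_state = forest_classify(votes)
rul_estimate = forest_regress([410.0, 395.0, 420.0, 405.0])
```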
Pseudocode
Decision Tree Algorithm: Ideal for both regression and classification problems, the Decision
Tree algorithm is a flexible and easy-to-understand machine learning technique. It creates a
tree-like structure with internal nodes representing features and leaf nodes representing class
Decision trees are especially helpful for comprehending the underlying reasoning behind a
model's predictions because of their high interpretability. The decision path leading to a
specific outcome can be readily traced by following the branches of the tree, which makes it
useful for both explanatory and predictive purposes.
One of the main advantages of decision trees is their capacity to handle both numerical and
categorical data without requiring a lot of preprocessing. They can also handle missing values
by using methods like imputation or surrogate splits, and they are resistant to outliers.
Because decision trees can capture complex nonlinear relationships in the data, they are a
simple tool for modelling complex decision boundaries. Eq. (1) represents the Gini index.
\[ \text{Gini Index} = 1 - \sum_{i=1}^{n} (P_i)^2 = 1 - \left[ (P_+)^2 + (P_-)^2 \right] \tag{1} \]
where,
P_+ is the probability of positive class and
P_- is the probability of negative class
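As a worked illustration of the Gini index, the class probabilities can be estimated from label counts. This is a minimal sketch; the function name gini_index and the sample label lists are assumptions.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


pure = gini_index([1, 1, 1, 1])    # a single class: no impurity
mixed = gini_index([1, 1, 0, 0])   # evenly split between two classes
skewed = gini_index([1, 1, 1, 0])  # 1 - (0.75**2 + 0.25**2)
```

For two classes the index ranges from 0 for a pure node to 0.5 for an even split, so lower values indicate a better split.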
Pseudocode
GenDecTree(Engine_Data S, Engine_Features F) Steps:
1. If stopping_condition(S, F) = true then
   a. leaf = createNode()
   b. leaf.label = classify(S)
   c. return leaf
2. root = createNode()
3. root.test_condition = findBestSplit(S, F)
4. V = {v | v is a possible outcome of root.test_condition}
5. For each value v in V:
   a. S_v = {s | root.test_condition(s) = v and s belongs to S}
   b. child = GenDecTree(S_v, F)
   c. Add child as descendant of root and label the edge {root → child} as v
6. Return root
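The findBestSplit step in the pseudocode is where the Gini index is applied. The sketch below shows one possible version for a single numeric feature, choosing the threshold that minimises the weighted impurity of the two child nodes. The function names, the midpoint thresholds, and the sample rows are assumptions for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def find_best_split(rows, labels, feature_index):
    """Return (threshold, weighted_gini) with the lowest impurity for one feature."""
    best_threshold, best_score = None, float("inf")
    values = sorted({row[feature_index] for row in rows})
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature_index] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature_index] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical rows: (engine_rpm, oil_pressure); label 1 = needs maintenance.
rows = [(700, 3.1), (800, 2.9), (1500, 1.2), (1600, 1.0)]
labels = [0, 0, 1, 1]
threshold, impurity = find_best_split(rows, labels, feature_index=0)
```

A full tree builder would call this for every feature, keep the best result overall, and recurse on the two subsets as in steps 4 to 6.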
XGBoost's performance and speed optimisation is one of its primary features. To improve
computational efficiency and reduce overfitting, it makes use of a number of strategies,
including regularisation, tree pruning, and parallel processing. Because of this, XGBoost works
especially well with large-scale, highly dimensional datasets.
XGBoost works by iteratively adding new decision trees to the model, each one focusing on the
residuals or errors of the previous trees. It optimizes the model's performance by minimizing a
specified loss function through gradient descent, effectively learning from the mistakes of the
previous trees and improving predictive accuracy with each iteration. Eq. (2) represents the
XGBoost objective function.
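\[ L^{(t)} = \sum_{i=1}^{n} l\left( y_i, \hat{y}_i^{(t-1)} + f_t(x_i) \right) + \Omega(f_t) \tag{2} \]
The loss term can be seen as \( f(x + \Delta x) \), where \( x = \hat{y}_i^{(t-1)} \) and \( \Delta x = f_t(x_i) \).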
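The idea of adding trees that fit the residuals of the previous ones can be sketched with plain gradient boosting on squared error. This is not XGBoost itself: the regularisation term and the second-order approximation are omitted, the base learner is a one-split stump on a single feature, and the cycle and RUL values are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regressor on a 1-D feature by minimising squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Additive model: start from the mean, then fit each stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: charge cycles completed vs. remaining useful life (cycles).
cycles = [100, 200, 300, 400, 500, 600]
rul = [900.0, 800.0, 700.0, 600.0, 500.0, 400.0]
mean_rul = sum(rul) / len(rul)
baseline_error = mse(lambda x: mean_rul, cycles, rul)
boosted_error = mse(boost(cycles, rul), cycles, rul)
```

Each round reduces the training error a little further, which is the behaviour the objective in Eq. (2) formalises with an additional penalty on tree complexity.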
The Bagging Regressor operates by bootstrapping—a technique in which samples are selected
at random with replacement—to produce several subsets of the training data. A base regressor
model, like a decision tree or random forest, is then trained using each subset. To produce the
final result during prediction, the Bagging Regressor combines each of these base models'
individual predictions.
Reducing overfitting and variance in the model is one of the main benefits of using a Bagging
Regressor. Bagging helps to capture various facets of the underlying data distribution and
Pseudocode
Input: Data set battery_dataset = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)};
       Base learning algorithm L;
       Number of learning rounds K.
Process:
for k = 1, …, K:
    battery_k = Bootstrap(battery_dataset)   % Generate a bootstrap sample from battery_dataset
    h_k = L(battery_k)                       % Train a base learner h_k from the bootstrap sample
end
Output: H(x) = argmax_{y∈Y} ∑_{k=1}^{K} l(y = h_k(x))   % l(a) is 1 if a is true and 0 otherwise
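The bootstrap-and-combine loop can be sketched in plain Python as follows. Note one deliberate difference from the pseudocode: the pseudocode combines the base learners by voting (argmax), which suits classification, whereas this sketch averages their outputs, as is usual for regression. The base learner (a one-variable least-squares line), the function names, and the sample data are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def fit_line(sample):
    """Base learner L: least-squares fit y = a + b*x on a list of (x, y) pairs."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    if var == 0:  # degenerate bootstrap sample: fall back to a constant model
        return lambda x: y_bar
    b = sum((x - x_bar) * (y - y_bar) for x, y in sample) / var
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def bagging_fit(dataset, rounds, seed=0):
    """Train one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    learners = []
    for _ in range(rounds):
        bootstrap = rng.choices(dataset, k=len(dataset))
        learners.append(fit_line(bootstrap))
    return learners


def bagging_predict(learners, x):
    """Combine the base learners by averaging their predictions."""
    return mean([h(x) for h in learners])


# Hypothetical (cycle index, RUL) pairs with a little hand-made noise
battery_dataset = [(0, 1004), (10, 977), (20, 962), (30, 938), (40, 921),
                   (50, 903), (60, 878), (70, 861), (80, 842), (90, 818)]
learners = bagging_fit(battery_dataset, rounds=25)
print(bagging_predict(learners, 45))
```

Averaging over many bootstrap fits smooths out the influence of any single noisy sample, which is the variance reduction described above.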
Logistic Regression: When there are only two possible outcomes for the target variable, binary
classification tasks can be solved statistically using logistic regression. Logistic regression is
mostly used for classification, not regression, despite its name. It employs a logistic function,
commonly referred to as the sigmoid function, to model the likelihood that an instance belongs
to a specific class.
The logistic function, which maps the output to a value between 0 and 1, is applied after the
input features are linearly combined with weights in logistic regression. The likelihood that the
instance belongs to one of the two classes is indicated by this value. The instance is usually
classified into the positive class if the probability is greater than a predetermined threshold
(e.g., 0.5); if not, it is classified into the negative class.
The ease of use and interpretability of logistic regression are among its main benefits. The
model makes the relationship between the input features and the binary outcome easy to
understand. Furthermore, logistic regression is flexible for handling a variety of data types
because it can handle both numerical and categorical features. Equ (3) represents the sigmoid
function and Equ (4) represents the logistic regression equation.
$$f(x) = \frac{1}{1 + e^{-x}} \tag{3}$$
where,
e = base of natural logarithms
x = numerical value one wishes to transform
$$y = \frac{e^{(b_0 + b_1 x)}}{1 + e^{(b_0 + b_1 x)}} \tag{4}$$
where,
x = input value
y = predicted output
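The two equations can be checked numerically with a short sketch. The coefficient values b0 = -4.0 and b1 = 0.005 and the use of engine rpm as the single input are hypothetical choices for illustration; they are not fitted values from the project.

```python
import math


def sigmoid(x):
    """Equ (3): map any real value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_probability(x, b0, b1):
    """Equ (4): logistic regression with one input feature."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))


def classify(x, b0, b1, threshold=0.5):
    """Assign the positive class when the probability exceeds the threshold."""
    return 1 if predict_probability(x, b0, b1) > threshold else 0


b0, b1 = -4.0, 0.005  # hypothetical coefficients
for rpm in (400, 1200):
    print(rpm, round(predict_probability(rpm, b0, b1), 3), classify(rpm, b0, b1))
```

Equ (4) is simply Equ (3) applied to the linear combination b0 + b1·x, so `predict_probability(x, b0, b1)` and `sigmoid(b0 + b1 * x)` return the same value.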
Extra Trees Regressor: The Extra Trees Regressor plays a key role in forecasting the remaining
useful life (RUL) of lithium-ion batteries used in automobiles. As a member of the ensemble
learning family, the Extra Trees Regressor works similarly to random forests, generating
numerous decision trees to produce predictions. Unlike standard random forests, Extra Trees
adds randomness by choosing split thresholds at random for each candidate feature and by
building each tree from the full dataset rather than a bootstrap sample. This technique
increases tree variety, which may reduce overfitting while enhancing robustness.
One notable advantage of the Extra Trees Regressor is its ability to handle large datasets with
noisy or irrelevant features. In our project, where we deal with a large amount of sensor data
capturing various engine health measures, this capability is vital. By exploiting randomization
in feature selection and tree construction, the Extra Trees Regressor can capture complicated
relationships between input features and the target variable, such as engine condition or
remaining useful life.
Furthermore, the Extra Trees Regressor strikes a balance between bias and variance, which
makes it appropriate for the battery health prediction task. With its robust performance and
computational efficiency, the Extra Trees Regressor emerges as a reliable choice for estimating
the remaining useful life of lithium-ion batteries, improving the predictability and safety of
battery-powered automobiles.
Data collection: Collect engine parameter data, such as rpm, coolant pressure, fuel pressure,
lubricating oil temperature, and coolant temperature, from sensors mounted in cars or other
Data Preprocessing: To guarantee consistency and dependability in the data, carry out
preprocessing procedures such as addressing missing values, eliminating outliers, and
standardizing features for both the engine and battery datasets.
battery datasets. This study will aid in feature engineering and selection as well as provide
insights into the properties of the data.
Feature Engineering: Create new features for both datasets that could improve the models'
capacity for prediction. Features such as rolling statistics or ratios between parameters could be
helpful for the engine dataset. Features that capture patterns of degradation, like trend analysis
or time-series decomposition, can be built for the battery dataset.
Model Selection: Determine the best machine learning models for both tasks depending on the
nature of the data and the problem at hand. Decision Trees, Random Forests, and Support
Vector Machines may all be used to predict engine condition. For battery RUL prediction,
regression models such as Random Forest Regression, Gradient Boosting Regression, and Long
Short-Term Memory (LSTM) networks for time-series data may be appropriate.
Hyperparameter Tuning: Employ approaches such as grid search or random search to improve
model performance and generalization.
Model Evaluation: Utilize appropriate evaluation metrics like accuracy, precision, recall, F1-
score, mean squared error (MSE), root mean squared error (RMSE), or mean absolute error
(MAE) to assess the performance of the trained models for engine condition prediction and
battery RUL prediction.
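For the regression metrics named above, a minimal sketch of the calculations is given below. The RUL values are made-up numbers used only to show the arithmetic, and the function names are illustrative rather than taken from the project code.

```python
import math


def mean_squared_error(y_true, y_pred):
    """MSE: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """MAE: average of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted RUL values (in cycles)
actual_rul = [1000, 900, 800, 700]
predicted_rul = [990, 920, 790, 720]

print(mean_squared_error(actual_rul, predicted_rul))                 # 250.0
print(round(root_mean_squared_error(actual_rul, predicted_rul), 2))  # 15.81
print(mean_absolute_error(actual_rul, predicted_rul))                # 15.0
```

RMSE penalises large errors more heavily than MAE, so reporting both gives a sense of whether a few predictions are far off.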
5.4 CODE
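The following code snippets predict the engine condition.
3. Data Preprocessing: Create a custom transformer to add new attributes based on engine
features.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class AttributesAdder(BaseEstimator, TransformerMixin):
    ...  # transformer body omitted
4. Pipeline for adding new attributes and standardizing features to automate data preparation
# Add the engineered attributes, then scale every feature to zero mean and unit variance
engine_prep_pipe = Pipeline([
    ("attr_adder", AttributesAdder()),
    ("std_scaler", StandardScaler())
])
engine_data_prepared = engine_prep_pipe.fit_transform(engine_features.values)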
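engine_data_prepared[0, :]  # inspect the first prepared row
# Fit the logistic regression model (log_reg is assumed to be defined earlier)
log_reg.fit(X_train, y_train)
validation = log_reg.predict(X_test)
# Accuracy = number of correct predictions / number of test samples
score = sum(validation == y_test)
print(f"Score: {score / len(y_test)}")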
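2. Bagging Regressor
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error

br = BaggingRegressor(random_state=2301)
br.fit(X_train, y_train)
print(br.score(X_train, y_train))  # R^2 on the training set
print(br.score(X_test, y_test))    # R^2 on the test set
br_pred = br.predict(X_test)
br_rmse = np.sqrt(mean_squared_error(y_test, br_pred))  # root mean squared error
print(br_rmse)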
CHAPTER 6
TESTING
6.1 Testing in Machine Learning
Machine learning is the practice of using algorithms and statistical analysis to let a machine
learn from data without being explicitly programmed. The machine relies on an algorithm built
on a mathematical model. In the context of machine learning models, the term "testing" largely
refers to assessing model performance in terms of accuracy and precision. It should be
emphasized that "testing" means different things in traditional software development and in
machine learning model creation.
Because typical unit and integration tests do not capture how well a model predicts, machine
learning models are additionally tested on the accuracy of their predictions.
Accuracy: Accuracy is one metric used to evaluate classification methods. Informally, accuracy
is the fraction of predictions that our model gets right. Formally, accuracy is defined as
follows:
Accuracy = Number of correct predictions / Total number of predictions
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where
TP stands for True Positives,
TN for True Negatives,
FP for False Positives, and
FN for False Negatives.
Precision and recall are frequently employed as metrics to assess classification methods.
Precision (also known as positive predictive value) is the proportion of predicted positives
that are actually positive, whereas recall (also known as sensitivity) is the proportion of
actual positives that the model correctly identifies.
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
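The accuracy, precision, and recall formulas can be computed directly from the four confusion counts, as the sketch below shows. It also reports the F1-score, the harmonic mean of precision and recall. The label lists are invented for illustration and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical engine-condition labels and model predictions
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
validation = [1, 1, 0, 0, 0, 1, 1, 0]

print(confusion_counts(y_test, validation))        # (3, 3, 1, 1)
print(classification_metrics(y_test, validation))  # every metric equals 0.75 here
```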
A test dataset is one that is separate from the training dataset but has the same probability
distribution as the training dataset. If a model fits well on the training dataset, it should likewise
fit well on the test dataset. Thus, by comparing the expected and observed values, we may
determine how well our model works.
Machine learning testing starts with data preparation, which involves cleaning and
preprocessing the engine health and battery RUL datasets. We divided the data into training and
testing sets so that both sets were representative of the full distribution. The models, such as
logistic regression for engine condition prediction and regression models for battery RUL
prediction, are evaluated on the testing data after they have been trained on the training
data. Performance indicators such as accuracy, precision, recall, and F1-score are used to
evaluate the models' predictive ability.
Overall, testing in this project takes an intensive approach to ensuring the reliability,
accuracy, and interpretability of the machine learning models used for engine state prediction
and battery RUL calculation. It aims to develop robust prediction models that can successfully
support predictive maintenance efforts in automotive systems through rigorous testing and
evaluation.
The testing objectives set out the aims of the testing procedure throughout the project or
system development lifecycle. These goals are intended to ensure the functionality, quality,
and dependability of the software or product being tested. They usually involve confirming that
the program satisfies predetermined requirements, locating errors or malfunctions, evaluating
performance in different scenarios, guaranteeing usability and accessibility, and confirming
adherence to rules and guidelines. Additional testing objectives include evaluating the
software's security, scalability, and interoperability, and assessing its general readiness for
deployment and use in real-world circumstances. In the end, testing goals help to reduce risks,
raise user satisfaction, and increase the software's quality and dependability.
The project's testing goals include a comprehensive assessment of the battery RUL prediction
algorithms and engine condition models. This entails carefully examining how well predictions
match known data, confirming that the models can generalize well to new datasets, and
evaluating how robust the models are to fluctuations and noise in the input data. The models are
evaluated using metrics such as accuracy, precision, recall, and error rates in order to determine
which methods are the best fit. Furthermore, extensive real-world simulation tests and
deployment readiness evaluations are carried out to guarantee the dependability and usefulness
of the models in operational contexts. The project's goal is to provide reliable and accurate
predictive models for battery RUL estimation and engine condition prediction through these
testing methodologies.
The following are the testing objectives for the engine condition and the battery RUL.
Prediction correctness: By contrasting the predicted values with the actual conditions and
RUL values from the test dataset, assess the correctness of the battery RUL and engine
Model Generalization: Evaluate the models' capacity for generalization by putting them to the
test on unobserved data. To check if the models can accurately forecast fresh, unseen data,
divide the dataset into training and test sets. Then, assess the models' performance on the test
set.
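A hold-out split of this kind can be sketched with the standard library alone. The helper name `split_train_test`, the 80/20 ratio, and the seed are assumptions for illustration; the project itself may use a library routine with different settings.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction as the unseen test set."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of ten sample identifiers
samples = list(range(10))
train_rows, test_rows = split_train_test(samples)
print(len(train_rows), len(test_rows))  # 8 2
```

Because the test rows are never seen during training, the score measured on them is an estimate of how the model behaves on new data.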
Robustness Testing: Determine whether the models are resilient by varying or adding noise to
the input data and assessing how well they work. This helps ascertain whether the models'
predicted accuracy can be considerably affected by changes in ambient factors, sensor readings,
or battery usage habits.
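One simple way to run such a robustness check is to perturb the inputs with Gaussian noise of increasing strength and re-measure accuracy. In the sketch below the threshold rule stands in for a trained model, and the rpm values and noise levels are hypothetical.

```python
import random


def threshold_model(rpm, threshold=1000.0):
    """Hypothetical stand-in for a trained classifier: flag rpm above a threshold."""
    return 1 if rpm > threshold else 0


def accuracy(model, xs, ys):
    """Fraction of inputs for which the model output matches the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def add_gaussian_noise(xs, sigma, seed=0):
    """Return a noisy copy of the inputs (zero-mean Gaussian, std = sigma)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in xs]


rpms = [600, 700, 800, 900, 1100, 1200, 1300, 1400]
labels = [threshold_model(r) for r in rpms]  # clean labels from the clean inputs

for sigma in (0, 50, 200):
    noisy = add_gaussian_noise(rpms, sigma)
    print(sigma, accuracy(threshold_model, noisy, labels))
```

If accuracy falls sharply even at small noise levels, the model is sensitive to sensor variation and may need more training data or smoother features.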
Performance Metrics: To assess the efficacy of the models and pinpoint areas for
improvement, compute and examine performance metrics such as accuracy, precision, recall,
F1-score, mean squared error (MSE), root mean squared error (RMSE), and mean absolute error
(MAE).
Unit Testing: In order to make sure the software works as intended, each function, method, or
module is tested separately at this phase. The primary goal of unit tests is to confirm that brief,
discrete segments of code are valid. To verify that each unit functions independently of the rest
of the system, developers create unit tests. This guarantees that every component functions as
intended and aids in the early detection of bugs during the development process.
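As an example of what a unit test could look like in this project, the sketch below tests a small feature-engineering helper in isolation using Python's built-in unittest module. The helper `coolant_efficiency`, its input validation, and the test values are assumptions for illustration and are not taken from the project code.

```python
import unittest


def coolant_efficiency(engine_rpm, coolant_temp):
    """Hypothetical helper: coolant temperature scaled by 1 / engine rpm."""
    if engine_rpm <= 0:
        raise ValueError("engine_rpm must be positive")
    return coolant_temp / engine_rpm


class CoolantEfficiencyTest(unittest.TestCase):
    def test_known_row(self):
        # 81.63 / 700 is roughly 0.1166
        self.assertAlmostEqual(coolant_efficiency(700, 81.63), 0.1166, places=4)

    def test_rejects_non_positive_rpm(self):
        with self.assertRaises(ValueError):
            coolant_efficiency(0, 81.63)


# Run the two tests programmatically and keep the result object
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CoolantEfficiencyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```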
System Testing: System testing is a critical stage of the software development lifecycle,
during which the system as a whole is assessed to make sure it satisfies requirements and
operates as intended. At this stage, the behavior of the complete system is evaluated rather
than that of its constituent parts or units. System testing covers the integrated system's
interfaces, data flows, and interactions with users or other systems.
System testing assesses how well the complete predictive system functions and behaves in
terms of predicting engine status and battery RUL. In order to make sure that all of the parts—
predictive models, user interfaces, and data preprocessing—integrate as planned, testing is done
at this point. System tests verify that the program satisfies predetermined requirements and
operates correctly in a range of scenarios by examining end-to-end scenarios and user
interactions. During this testing step, the behavior of the system is verified holistically to make
sure it satisfies user expectations and makes accurate predictions.
Integration Testing: Integration testing is a vital phase of software development that verifies
the smooth operation of, and the interfaces between, separate units or components. Its main
objective is to make sure that the integrated components function as a whole, identifying
problems with data transfer, communication protocols, and integration points. Integration
testing uses techniques such as top-down or bottom-up testing to identify bugs in software
connections early on, reducing risks and making sure the program satisfies both functional and
non-functional criteria.
This process ensures smooth operation and accurate data interchange by confirming the
interactions and data flow between various software components. The integration of different
algorithms, preliminary data processing procedures, and other modules involved in predicting
engine status and battery RUL would be validated by integration testing in our project. In order
to verify the overall functionality of our predictive system, these tests concentrate on
identifying any problems, such as inappropriate designs or communication faults between
various elements of the system.
Regression Testing: Regression testing verifies that recent software changes or updates have
not caused any unforeseen side effects or regressions in functionality. In our project,
regression testing involves rerunning previously executed test cases to identify any
differences in predicted engine condition or battery RUL introduced by recent code
modifications. This testing process contributes to the stability and reliability of the
predictive system by discovering and dealing with any issues that arise during development or
upgrades, so that the software continues to work as planned over time.
Result Successful
• Model Creation: Creates models using algorithms and datasets. The expected
output is a successfully developed model, and the outcome is successful.
Table 6.4.2: Test cases for Model creation
Test Case Sl. No 2
Test Name Model creation
Result Successful
Result Successful
• Engine Health and Battery RUL Prediction: Predicts the engine health and
Remaining Useful Life (RUL) of the battery based on the trained models. The
expected outcome is a prediction of the engine health status and the RUL of the
battery, and the outcome is successful.
Table 6.4.4: Test cases for Engine health and Battery RUL prediction
Test Case Sl. No 4
Test Name Engine Health and Battery RUL Prediction
Result Successful
CHAPTER 7
EXPERIMENT RESULTS
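Sl. no | Cycle Index | Discharge Time (s) | Decrement 9.9-9.35V (s) | Max. Voltage Discharg. (V) | Min. Voltage Charg. (V) | Time at 11.41V (s) | Time constant current (s) | Charging time (s) | RUL
0 | 1 | 7138.11 | 3167.05 | 10.093 | 8.83 | 15017.19 | 18578.98 | 29643.32 | 3058.44

Statistic | Cycle Index | Discharge Time (s) | Decrement 9.9-9.35V (s) | Max. Voltage Discharg. (V) | Min. Voltage Charg. (V) | Time at 11.41V (s) | Time constant current (s) | Charging time (s) | RUL
std | 322.35 | 8.99e+04 | 4.12e+04 | 0.25 | 0.34 | 25089.67 | 6.78e+04 | 7.14e+04 | 886.76
min | 1.00 | 2.39e+01 | -1.09e+06 | 8.36 | 8.31 | -312.40 | 1.64e+01 | 1.64e+01 | 0.00
25% | 271.00 | 3.21e+03 | 8.79e+02 | 10.57 | 9.59 | 5030.16 | 7.05e+03 | 2.15e+04 | 761.86
50% | 560.00 | 4.28e+03 | 1.20e+03 | 10.74 | 9.82 | 8059.19 | 1.05e+04 | 2.28e+04 | 1515.47
75% | 833.00 | 5.24e+03 | 1.65e+03 | 10.92 | 10.07 | 11244.51 | 1.37e+04 | 2.41e+04 | 2307.58
max | 1134.00 | 2.63e+06 | 1.11e+06 | 12.00 | 12.04 | 674126.38 | 2.42e+06 | 2.42e+06 | 3116.20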
Data for various battery operation cycles, such as discharge time, voltage decrement, and
remaining useful life (RUL), are shown in Fig. 7.1. Each row denotes a cycle, described by its
time constants, charging time, maximum and minimum voltage, and other characteristics. The
information sheds light on battery degradation and performance across several cycles.
An overview of the data for several battery performance characteristics, such as cycle index,
discharge time, voltage decrement, and remaining useful life (RUL), is shown in Fig. 7.2. It
error.
Sl. no | Engine rpm | Lub oil pressure | Fuel pressure | Coolant pressure | Lub oil temp | Coolant temp | Engine Condition
The Random Forest Regressor is an ensemble learning method built on decision trees. It trains
numerous trees on random subsets of the data and averages their predictions. It is known for
its adaptability, resilience to overfitting, and capacity to work with large, high-dimensional
datasets.
Fig. 7.7 summarises the statistics of the engine dataset. The table covers the dataset's seven
variables: Engine rpm, Lub oil pressure, Fuel pressure, Coolant pressure, Lub oil temperature,
Coolant temperature, and Engine Condition. For every variable, it provides the count, mean,
standard deviation, minimum, 25th percentile (Q1), median (50th percentile), 75th percentile
(Q3), and maximum. These statistics provide information about the distribution, variability,
and central tendency of the data.
Sl. no | Engine rpm | Lub oil pressure | Fuel pressure | Coolant pressure | Lub oil temp | Coolant temp | Engine Condition | Coolant Efficiency | Oil Efficiency
0 | 700 | 2.49 | 11.79 | 3.17 | 84.14 | 81.63 | 1 | 0.11 | 0.000017
The feature engineering with the two additional columns is shown in Fig. 7.8. To compute the
"Coolant Efficiency" column, divide 1 by the "Engine rpm" and multiply the resulting number by
the "Coolant temp" column. To calculate the "Oil Efficiency" column, divide 1 by the product
of the "Engine rpm" and "Lub oil temp" columns; for the row shown, 1 / (700 × 84.14) is about
0.000017. These computed efficiency metrics offer more information about how well the coolant
and lubricating oil systems work to support engine operation and maintain proper temperatures.
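The two engineered columns can be reproduced for the sample row in Fig. 7.8 with a short sketch. The function name and the dictionary layout are hypothetical. The Oil Efficiency formula below uses the reciprocal of the product of engine rpm and lub oil temperature, because that is the reading that reproduces the 0.000017 listed in the figure; this is an inference from the sample row, not a statement about the project's source code.

```python
def add_efficiency_features(row):
    """Return a copy of the row with the two engineered efficiency columns."""
    rpm = row["Engine rpm"]
    enriched = dict(row)
    # Coolant Efficiency: (1 / engine rpm) * coolant temperature
    enriched["Coolant Efficiency"] = (1.0 / rpm) * row["Coolant temp"]
    # Oil Efficiency: 1 / (engine rpm * lub oil temperature), inferred from Fig. 7.8
    enriched["Oil Efficiency"] = 1.0 / (rpm * row["Lub oil temp"])
    return enriched


# Values taken from the sample row in Fig. 7.8
sample = {"Engine rpm": 700, "Lub oil temp": 84.14, "Coolant temp": 81.63}
features = add_efficiency_features(sample)
print(round(features["Coolant Efficiency"], 4))  # 0.1166
print(round(features["Oil Efficiency"], 6))      # 1.7e-05
```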
This code initializes a RandomForestClassifier model using its default settings and trains it
on the features (X_train) and matching target labels (y_train). Following training, the model
predicts the target labels for the validation set (X_test), and the accuracy score is the
fraction of predictions that match the actual labels (y_test). This accuracy score serves as
the performance evaluation of the trained model on the validation set.
An XGBoost model is first trained with default settings in this snippet of code, and its accuracy
is assessed on a test set. GridSearchCV is then used to search over a predetermined grid of
hyperparameters in order to optimize the model's performance through hyperparameter tuning.
After determining the optimal hyperparameters, a new XGBoost model is trained with them.
Lastly, the tuned model's accuracy is evaluated on the test set, offering an understanding of the
enhancement made possible by hyperparameter optimization.
CHAPTER 7
CONCLUSION
Through the use of machine learning and data-driven methodologies, the predictive
maintenance project for cars seeks to transform the automotive sector. By analysing sensor
data from engine and battery systems, the project predicts potential failures and maintenance
needs, improving vehicle performance, safety, and reliability.
In order to predict engine health, the system examines variables like engine RPM, fuel pressure,
lubricating oil pressure, and temperature. It then looks for anomalies and sends out early
warnings for maintenance or repairs. In contrast, the battery management system data is used
by the system to predict the battery's remaining useful life (RUL) and plan proactive
maintenance, which guarantees optimal battery performance and dependability.
A methodical approach is employed in the project, which includes phases for data collection,
preprocessing, feature engineering, model selection, training, and evaluation. Several machine
learning algorithms, including XGBoost, Random Forest, Logistic Regression, and Bagging
Regressor, are utilised in the development of precise predictive models for battery RUL
prediction and engine health prediction.
The automotive sector stands to gain from the application of these predictive maintenance
models in the form of decreased repair costs, decreased downtime, and enhanced driver and
passenger safety. The project lays the groundwork for an ecosystem of automotive maintenance
that is more dependable and efficient by utilising data analytics and machine learning.
CHAPTER 8
FUTURE ENHANCEMENTS
Predictive maintenance automation: Implementing automated maintenance scheduling and
intervention systems based on predictive models can reduce manual labour and streamline
maintenance workflows. Integration with current automotive maintenance systems and procedures
can further optimise operational efficiency and cost-effectiveness.
Extension to Fleet Management: Logistics firms and operators of commercial vehicles may
profit from the addition of predictive maintenance features to fleet management software. By
maximising vehicle utilisation, reducing downtime, and optimising fleet operations, predictive
maintenance models can reduce costs and raise customer satisfaction.
REFERENCES
[1] Andreas Theissler , Judith Pérez-Velázquez, Marcel Kettelgerdes, Gordon Elger Elsevier-
Reliability Engineering & System Safety (2021) Predictive maintenance enabled by machine
learning: Use cases and challenges in the automotive industry.
[2] I. Fabio Arena , Mario Collotta , Liliana Luca , Marianna Ruggieri and Francesco Gaetano
Termine Statistical and Stochastic Approaches for Predictive Maintenance in the Context of
Industry 4.0 (2021) Predictive Maintenance in the Automotive
[3] Iron Tessaro,Viviana Cocco Mariani and Leandro dos Santos Coelho First International
Electronic Conference on Actuator Technology: Materials, Devices and Applications(2020)
Machine Learning Models Applied to Predictive Maintenance in Automotive Engine
Components.
[4] Tiwari, A., Varshini, C. R. A., Jha, A., Annamalai, K. R., Deepa, K., & Sailaja, V. (2023).
Use of ML Techniques for Li-Ion Battery Remaining Useful Life Prediction-A Survey. In 2023
Fifth International Conference on Electrical, Computer and Communication Technologies
(ICECCT).
[5] Li, Z., Shi, Q., Xia, J., Wang, K., & Jiang, K. (2023). Novel Method Based on Stacking
Model for Remaining Useful Life Prediction of Lithium-ion Batteries. In 2023 26th
International Conference on Electrical Machines and Systems (ICEMS).
[6] Iron Tessaro,Viviana Cocco Mariani and Leandro dos Santos Coelho First International
Electronic Conference on Actuator Technology: Materials, Devices and Applications(2020)
Machine Learning Models Applied to Predictive Maintenance in Automotive Engine
Components.
[7] Danilo Giordano, Flavio Giobergia, Eliana Pastor, Antonio La Macchia, Tania Cerquitelli
Elsevier: Computers in Industry (2021) Data-driven strategies for predictive maintenance:
Lesson learned from automotive use case.
[8] Wu, Z, Jia, J, Liu, Y, Qi, Q, Yin, L, & Xiao, W. (2022). Prediction of Battery Remaining
Useful Life Based on Multi-dimensional Features and Machine Learning. In 2022 4th
International Conference on Smart Power & Internet Energy Systems (SPIES).
[9] Aydin, O., & Guldamlasioglu, S. (2017). Using LSTM networks to predict engine
condition on large scale data processing framework. In 2017 4th International Conference on
Electrical and Electronic Engineering (ICEEE).
[10] Sahasrabudhe, N., Asegaonkar, R., Deo, S., Umredkar, S., & Mundada, K. (2020).
Experimental Analysis of Machine Learning Algorithms used in Predictive Maintenance. In
2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT)
IEEE.
[12] S. Jafari and Y.-C. Byun, "Optimizing Battery RUL Prediction of Lithium-Ion Batteries
Based on Harris Hawk Optimization Approach Using Random Forest and LightGBM," in IEEE
[13] J. Hu, Y. Lu, and B. Lin, "RUL Prediction for Lithium-ion Batteries Using Combination
Forecasting based on SVR and LSTM," in 2021 China Automation Congress (CAC), IEEE
[14] Gou, B., Xu, Y., & Feng, X. (2020). State-of-Health Estimation and Remaining-Useful-
Life Prediction for Lithium-Ion Battery Using a Hybrid Data-Driven Method. IEE
[15] Wu, Y., Li, W., Wang, Y., & Zhang, K. (2019). Remaining Useful Life Prediction of
Lithium-Ion Batteries Using Neural Network and Bat-Based Particle Filter. IEEE