Machine Learning Based Battery Pack Health Prediction Using Real-World Data
Energy
journal homepage: www.elsevier.com/locate/energy
Yin-Yi Soo, Yujie Wang ∗, Haoxiang Xiang, Zonghai Chen
Department of Automation, University of Science and Technology of China, Hefei, 230027, China
Keywords: Electric vehicles; Battery pack; Real-world operating; Inconsistent coupling; Health prediction

Abstract: The complex operational conditions in real-world electric vehicles (EVs) contribute to the complexity of managing and maintaining battery packs. Adding to these challenges is the intricate task of modeling the inconsistent coupling among individual cells within these packs. This study addresses the ongoing challenges in modeling lithium-ion battery (LIB) cells within packs and estimating their state of health (SOH) for practical applications. It proposes a PCA-CNN-Transformer method to model and predict the SOH of a real-world EV battery pack. Three main contributions are presented: a novel attenuation SOH definition based on delivered energy, a methodology utilizing Principal Component Analysis (PCA) for cell modeling, and an SOH estimation model employing a CNN-Transformer architecture. To address both pack-level and cell-level modeling, a hierarchical feature extraction approach is proposed. The health features extracted from both levels are assessed using grey relational analysis and show a strong correlation with LIB SOH, exceeding 0.70. The proposed cell modeling method reduces data size by 96%, enhancing computational efficiency. Furthermore, integrating a 1D-CNN into the SOH estimation model overcomes the limitations of the attention mechanism, achieving an MAE of 0.0406 and an R-square of 0.9327, which improves on the original transformer network by 10.95%. This study also examines the performance of the informer and transformer models and explains why the informer model underperformed on this dataset.
∗ Corresponding author.
https://doi.org/10.1016/j.energy.2024.132856
Received 28 April 2024; Received in revised form 6 June 2024; Accepted 16 August 2024
Available online 17 August 2024
0360-5442/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Y.-Y. Soo et al. Energy 308 (2024) 132856
capacity analysis (ICA) method for pack-level battery SOH estimation in the real world while considering cell inconsistency [8]. Defining SOH based on battery impedance is difficult once the battery has been used for some time, owing to the irreversible aging process. To overcome the weakness of SOH based on internal resistance, Chen et al. proposed a LIB SOH based on ohmic internal resistance, which was evaluated from the correlation between capacity and internal resistance [9]. Nevertheless, traditional SOH definitions face constraints due to the computational limitations of the BMS and are not easy to employ in real applications. Addressing this concern, Xu et al. proposed a battery pack SOH based on maximum energy storage [10]. Hong et al. formulated several SOH definitions for battery packs based on incremental capacity, mileage, open circuit voltage, and time required to charge [11]. Tian et al. introduced an attenuation SOH model for battery packs considering monthly average temperature and mileage [12]. These diverse SOH definitions extend beyond the conventional capacity and impedance-based models, encompassing attributes like mileage and energy. However, implementing these models often requires complex and extensive data, which may not be readily available in real-world scenarios. To address this gap and enable real-world SOH estimation based on limited charging data, a novel attenuation SOH definition is proposed to capture changes in battery characteristics related to both capacity and internal resistance.

1.1.2. SOH estimation
The estimation of battery SOH can generally be divided into the model-parameter method and the data-driven method. The model-parameter method integrates a representative model with a parameter estimation algorithm to gauge battery SOH. Typically, the representative model is a physical model that necessitates prior knowledge of the battery's physical aspects. Examples of such models include the equivalent circuit model, empirical model, and electrochemical model [13–15]. Generally, model parameter optimization is a strong approach that is able to improve model performance intuitively and automatically through parameter optimization algorithms. However, model parameter optimization also has several weaknesses, such as local optima, model over-fitting, and computational cost.
Data-driven methods are popular in SOH models due to their advantages of flexibility and being model-free [16]. Data-driven methods for SOH estimation usually utilize the collected data and uncover hidden useful information through various data analysis methods such as support vector machine [17], Gaussian process regression [18], and artificial neural network (ANN) [19,20]. Xiang et al. introduced a two-level battery health diagnosis model using relaxation voltage and presented a novel Gaussian mixture ensemble learning method for SOH estimation [21]. Chang et al. proposed a Gaussian process regression for parameter prediction in LIB packs to overcome the inconsistency in battery cells [22]. Song et al. applied a feed-forward neural network and genetic algorithm to two real-world EV datasets with a big data platform for SOH estimation [7]. LSTM is also one of the popular RNN-based methods in SOH estimation, with strong generalization capabilities and training stability [23]. Yayan et al. proposed a stacked Bi-LSTM model structure for battery SOH prediction, which enables a robust training phase [24]. Zou et al. combined the convolutional network and informer network to extract spatial and temporal features from input data, then applied this to state of charge (SOC) estimation [25]. However, the conventional RNN-based approach struggles with lengthy input and output sequences and lacks flexibility in handling variable input sequence lengths. To resolve this weakness of the RNN-based approach, the attention-based transformer stands out as an intriguing data-driven network.
A Transformer network is an attention model that replaces the recurrent layers in encoder–decoder architectures with multi-headed self-attention [26]. The transformer network resolves the issue of handling scalable input sequence lengths and long input and output lengths. Gu et al. proposed a CNN-Transformer SOH model for NASA LIB [27]. This research utilizes CNN for parameter optimization on experimental data but does not extend the research to LIB packs and real-world data. However, although the Transformer network has emerged as a promising alternative to RNN-based models for capacity estimation, its attention mechanism has limitations in capturing spatial relationships within datasets. This is because the attention mechanism utilized in transformers assigns weights to all input positions, making it difficult to specifically focus on local variables or spatial relationships. To address this challenge in estimating the SOH of real-world LIB packs, this study proposes a modified transformer network. The proposed method aims to overcome the limitations of the attention mechanism and demonstrate effectiveness in real-world SOH estimation.

1.1.3. Features modeling
Health features (HFs) extraction is a fundamental process in the data-driven method when estimating battery SOH. However, the inconsistency of battery cells in packs presents significant challenges, as the varying aging progress of individual battery cells complicates SOH estimation. Current research categorizes HFs extraction into two main methods: mechanism-based and machine learning-based. Mechanism-based approaches rely on understanding battery behavior and extracting HFs based on electrochemical principles. One popular method is incremental capacity analysis (ICA) [28,29]. Liu et al. introduced a comprehensive comparison scheme of ICA for real-world pack data based on total voltage [30]. Yao et al. further utilized the incremental features from ICA to train a deep learning model based on a partial segment of charging data [31]. Additionally, some mechanism-based approaches involve formulating algorithms based on battery characteristics. Fan et al. proposed a simplified algorithm that is suitable for simulating series-parallel-connected battery packs with non-uniform parameters [32]. Tian et al. introduced a consistency evaluation for LIB cells based on multi-feature weighting [33]. Chang et al. created a consistency model with a Copula-based consistency method to model the inconsistency coupling of battery cells in a pack [34]. However, Chang's assumption of a normal distribution for battery cells is challenged by transitions from normal to Weibull distributions [2].
Machine learning-based approaches do not require intricate electrochemical knowledge but tend to be computationally intensive. Jorge introduced a wrapper for window exogenous LSTM to extract features from current, voltage, and temperature curves [35]. Xiong introduced Gaussian process regression for feature extraction on the IC curve and constructed an SOH model based on LSTM [36]. Guo et al. extracted HFs from charging profiles using principal component analysis (PCA) based on grey relational analysis (GRA); one advantage of GRA is its ability to capture non-linear correlation [37]. Yang et al. modeled battery pack inconsistency by proposing a hierarchical framework for capacity estimation, which generalized the matrix for the battery pack into pack-level and cell-level [38]. While this method effectively models and extracts correlated features for capacity estimation, its static evaluation based on average and standard deviation might not fully capture the dynamic nature of battery behavior in real-world scenarios. Additionally, whether mechanism-based or machine learning-based, these modeling methods do not alleviate the computational burden of extracting HFs for large battery packs, resulting in high computational costs.

1.2. Gap analysis and contributions

Researchers still encounter two major challenges in estimating SOH for real-world LIB packs. First, the lack of comprehensive and standard data for real-world EVs presents difficulties in formulating the LIB pack SOH. Recent research tried to extend the real-world LIB SOH definition based on mileage and temperature [11,12], but implementing these models often requires complex and extensive data that may not be readily available in practical situations. Moreover, current
research predominantly focuses on SOH based on capacity without assessing or evaluating the impact of internal resistance on battery SOH. Secondly, modeling and feature extraction for LIB cells in packs is complicated and computationally expensive. Although researchers have tried to model and extract features from battery cells [30,38], these methods were either computationally expensive or only tested under experimental conditions. Conversely, effective dimensional reduction techniques such as PCA have been employed in extracting features for single LIB cells [37]. The PCA method is useful for modeling complex and huge data, which makes it specifically suitable for LIB cell-level data. While PCA has been utilized in numerous studies to model the SOH of LIB, its application to LIB packs and modeling at the cellular level within the pack has yet to be explored, to the best of our knowledge. This paper introduces a novel approach by incorporating PCA specifically for LIB cell levels, rather than considering all features of LIB cells. It proposes a hierarchical feature extraction model that categorizes LIB pack features into cell-level and pack-level components.
In terms of SOH model prediction, transformer networks have been utilized in various studies. However, a persistent challenge lies in the limited ability of their attention mechanism to capture spatial and hierarchical relationships. Researchers have sought to address this by integrating CNN into transformer architectures [27]. However, the practical application of this CNN-Transformer fusion remains untested. This study pioneers the application of the CNN-Transformer network to real-world LIB pack data. Moreover, a comparative analysis of the performance of the CNN-Transformer network against the latest informer network is conducted. Additionally, this research investigates the impact of CNN enhancement and the performance of informer networks.
To address the aforementioned research gaps in real-world battery pack SOH estimation, this study focuses on developing a novel attenuation SOH definition based on energy and on modeling battery cell parameters while significantly reducing computational costs. Additionally, a modified transformer network is proposed to overcome the limitations of the attention mechanism and demonstrate effectiveness in real-world SOH estimation. The key contributions of this work are outlined as follows:

1. An attenuation SOH based on the energy delivered during charging is proposed. This model reveals the impact of increased internal resistance on the SOH of lithium-ion battery packs and only requires minimal charging profile information, such as charged energy, state of charge (SOC), and time. The proposed method successfully retrieves and models the battery SOH from random charging profile data with arbitrary timestamps and SOC.
2. A method for hierarchical feature extraction from both cell-level and pack-level data within a large interconnected LIB pack is introduced. This method partitions the LIB pack data into pack-level and cell-level segments. The proposed approach offers a streamlined and computationally efficient means of extracting information from both levels through a combination of data aggregation and machine learning techniques. The health features (HFs) extracted from both pack-level and cell-level data are assessed using grey relational analysis, demonstrating their correlation with battery SOH. This method effectively integrates information from both pack and cell levels into the SOH model, achieving high information retrieval with minimal computational cost.
3. Principal component analysis (PCA) is implemented to model the inconsistent coupling among individual LIB cells within a pack using real-world data. Used as an unsupervised machine learning technique, PCA was applied for feature extraction at the cell level, effectively condensing the vast dataset of individual LIB cells. This approach captured information from a substantial amount of cell-level data, resulting in a 96% reduction in data size and thus enhancing computational efficiency for the SOH estimation model. The proposed PCA method is also compared with several LIB cell modeling methods such as 1D-CNN and data aggregation.
4. A new data-driven hybrid model named CNN-Transformer is proposed as the SOH estimation model for the LIB pack. Unlike the traditional RNN-based methods, this research applies a Transformer network combined with CNN feature extraction for real-world LIB pack capacity estimation. The proposed model is evaluated and compared with the latest informer network and other ANN models such as LSTM, Bi-LSTM, and GRU, and it outperforms the other ANN models. The evaluation metrics are mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE). The results show that the proposed CNN-Transformer can deal with long-term dependencies.

1.2.1. Paper organization

The remainder of the paper is organized as follows. Section 2 introduces the battery datasets used in this paper, including the attenuation SOH model proposed in Section 2.2. Section 3 illustrates the principle of the proposed method in detail, including modeling the inconsistency between battery cells in the pack in Section 3.1 and the SOH estimation model in Section 3.2. Experimental verification is carried out and discussed in Section 4. The conclusions of the paper are summarized in Section 5.

2. Battery pack dataset and SOH estimation

2.1. Dataset description

The data employed in this research was collected from the battery module of a fully electric bus on line 18 in Hefei City between 2012 and 2013. This electric vehicle represents the world's first new energy bus route, which began on January 23, 2010. The battery module comprises 608 cells interconnected in parallel and series setups. Each cell is a LiFePO4 battery with a 10 Ah capacity and a rated voltage of 3.2 V. The battery module is organized into 152 series-connected groups, each consisting of 4 cells in parallel.
The battery management system (BMS) monitors and records the module's current and voltage onto a secure digital memory card. The BMS utilizes a distributed architecture with a Controller Area Network (CAN) consisting of one master and seven slaves. Each cluster incorporates two battery management chips (LTC 6802) that gather data on 24 battery voltages and 4 battery temperature nodes. The LEM CAB300 sensor measures the bus current and transmits it to the master via the CAN network. Recorded data includes timestamps, total charged energy, total voltage, total current, pack voltage, state of charge (SOC), maximum/minimum voltage, and other parameters. Data is sampled every second, collected from December to August. The electric vehicle operates in four states: acceleration, braking, parking, and constant speed driving, resulting in three battery pack operation modes: discharging, charging, and standby. Braking generates back electromotive force directed to the battery pack due to the presence of an energy feedback device in the EV. The dataset utilized is presented in Table 1.

2.2. Attenuation SOH model

This paper introduces a novel definition of SOH attenuation, which relies on the energy delivered during charging. Unlike the conventional SOH evaluation method involving internal resistance, which often struggles with determining the end of life (EoL), the proposed attenuation SOH model only requires minimal charging profile information such as total charged energy, timestamp, and SOC. This model effectively reveals the influence of internal resistance on SOH estimation. First, the charging data of the EV are extracted from the dataset based on the current. The current and voltage data extracted from the charging records of 152 cells are illustrated in Fig. 1. The charging pattern observed in the BMS data demonstrates constant
Table 1
BMS data.

Time      Total charged energy/Wh  Total voltage/mV  Current/A  SOC   V1/mV  ...  V152/mV
17:46:16  172                      564.8             −98.1      70.8  3358   ...  3372
17:47:15  181                      566               −98.2      71.2  3366   ...  3379
17:48:14  190                      566.6             −98.2      71.6  3399   ...  3382
17:50:11  208                      567.5             −98.3      72.4  3405   ...  3388
17:51:10  218                      567.9             −98.3      72.8  3408   ...  3390
17:52:08  227                      568.3             −98.3      73.2  3409   ...  3393
...       ...                      ...               ...        ...   ...    ...  ...
charging attributes, yet it encounters frequent current fluctuations, as illustrated in Fig. 1(a). These recurrent fluctuations in current result in continuous total voltage variations across the 152 individual cells, as illustrated in Fig. 1(b). The battery pack state of health is defined in formula (1):

$$SOH = \frac{E_c}{E_0} \tag{1}$$

where SOH is the battery pack state of health, $E_c$ is the charging energy delivered in cycle $c$, and $E_0$ is the maximum charging energy delivered when the battery is fresh. The charging energy delivered is calculated as:

$$E_c = \frac{E_{soc_{end}} - E_{soc_{start}}}{\int_{t_{soc_{start}}}^{t_{soc_{end}}} (soc_{end} - soc_{start})\,dt} = \frac{\Delta E}{\int_{t_{soc_{start}}}^{t_{soc_{end}}} \Delta SOC\,dt} \tag{2}$$

where $E_{soc_{start}}$ and $E_{soc_{end}}$ are the starting and ending records of the total charged energy; their difference, $\Delta E$, is the total energy charged in cycle $c$. $SOC_{start}$ is the initial SOC recorded by the BMS and $SOC_{end}$ is the SOC at the end of charging. $t$ is the charging time for cycle $c$, calculated from $t_{soc_{start}}$ and $t_{soc_{end}}$. The calculated SOH is then smoothed using the Kalman filter.

At the onset of the dataset, the electric vehicle (EV) is active during the winter season, characterized by temperatures ranging from −10 °C to 10 °C. The initial operation of the EV during winter prompts a sudden spike in SOH degradation. However, with the subsequent rise and stabilization of temperatures between approximately 30 °C and 50 °C, the degradation of the LIB pack SOH becomes consistent and declines steadily. In short, the SOH degradation patterns depicted in Fig. 2 illustrate the significant influence of environmental temperature.

3. Methodology

A data-driven approach known as PCA-CNN-Transformer is proposed to model the LIB pack at both cell and pack level. The proposed method is designed for continuous-time systems and is tailored to characterize the nonlinear degradation of SOH in real-world LIB pack datasets. The research flow is illustrated in Fig. 3. Initially, the charging data obtained from the BMS undergo extraction and analysis to assess current and voltage. Subsequently, the SOH of the LIB pack is computed using the attenuation SOH method outlined in Section 2.2. Following this, features from both the pack level and cell level are extracted using the hierarchical feature extraction method discussed in Section 3.1.
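As a rough illustration of the attenuation SOH of Eqs. (1) and (2) on which this workflow relies, the sketch below computes the SOH of one charging segment and smooths a sequence of SOH values. It is a minimal sketch under stated assumptions, not the authors' implementation: the `ChargeSegment` fields and function names are hypothetical, the integrand of Eq. (2) is read as the constant SOC difference so that the integral reduces to ΔSOC multiplied by the charging time, and the smoother is a generic scalar random-walk Kalman filter whose noise variances are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class ChargeSegment:
    """One charging segment from BMS records (field names are hypothetical)."""
    energy_start_wh: float  # total charged energy at segment start
    energy_end_wh: float    # total charged energy at segment end
    soc_start: float        # SOC at segment start
    soc_end: float          # SOC at segment end
    t_start_s: float        # timestamp at segment start, in seconds
    t_end_s: float          # timestamp at segment end, in seconds


def delivered_energy(seg: ChargeSegment) -> float:
    """Eq. (2), assuming the integral reduces to delta_SOC * charging time."""
    delta_e = seg.energy_end_wh - seg.energy_start_wh
    delta_soc = seg.soc_end - seg.soc_start
    duration = seg.t_end_s - seg.t_start_s
    if delta_soc <= 0 or duration <= 0:
        raise ValueError("segment must span a positive SOC change and duration")
    return delta_e / (delta_soc * duration)


def attenuation_soh(seg: ChargeSegment, e0: float) -> float:
    """Eq. (1): ratio of the cycle's delivered energy to the fresh-pack value."""
    return delivered_energy(seg) / e0


def kalman_smooth(values, process_var=1e-5, meas_var=1e-3):
    """Scalar random-walk Kalman filter; the variances are assumed, not sourced."""
    if not values:
        return []
    estimate, variance = values[0], 1.0
    smoothed = []
    for z in values:
        variance += process_var                # predict step
        gain = variance / (variance + meas_var)
        estimate += gain * (z - estimate)      # update with measurement z
        variance *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


# Synthetic example: same SOC window and duration, less energy accepted when aged.
fresh = ChargeSegment(0.0, 100.0, 50.0, 60.0, 0.0, 100.0)
aged = ChargeSegment(0.0, 90.0, 50.0, 60.0, 0.0, 100.0)
e0 = delivered_energy(fresh)
soh_aged = attenuation_soh(aged, e0)
```

Because SOH is a ratio of two values computed the same way, the units chosen for time and SOC cancel as long as they are used consistently for both $E_c$ and $E_0$.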
These extracted features are then input into the SOH estimation model as discussed in Section 3.2.

3.1. Modeling inconsistency between battery cells in pack

Throughout the cyclic aging process, the degradation of SOH is influenced not only by pack-level attributes but also by cell-level characteristics. However, extracting features from both pack-level and cell-level data presents distinct challenges. Additionally, the large number of battery cells within a pack significantly increases the computational costs of the extraction process. To address these challenges, this paper proposes a hierarchical feature extraction method from both cell-level and pack-level data, as shown in Fig. 4.

3.1.1. Data aggregation for pack-level features extraction

Six pack-level features from the BMS comprise the total voltage, maximum/minimum temperature, maximum/minimum voltage, and current. These attributes at the pack level are organized by cycle and aggregated using functions such as maximum, minimum, and average. Assume that $x$ is the attribute that undergoes the aggregation function, $x_c$ is the extracted feature for cycle $c$, and $x_{ic}$ represents the $i$th observation of feature $x$ in cycle $c$. The aggregate functions of maximum, minimum, and mean are given in formulas (3), (4) and (5), respectively:

Table 2
Features extracted from pack-level.

Features             Aggregation function  Feature extracted
Maximum Temperature  Maximum               Maximum Temperature
Minimum Temperature  Minimum               Minimum Temperature
Total Voltage        Mean                  Average Total Voltage
Current              Mean                  Average Current
Maximum Voltage      Maximum               Maximum Voltage
Minimum Voltage      Minimum               Minimum Voltage

original variables, the objective is to transform the data into a lower-dimensional space while maximizing its variance. Initially, the original data is standardized, followed by the computation of the covariance matrix $S$:

$$S = \frac{1}{n-1} X^{*T} X^{*} \tag{6}$$

where $X^{*}$ is the standardized data matrix. The eigenvectors and eigenvalues of $S$, denoted respectively as $u_i$ and $\lambda_i$, with $i = 1, \ldots, m$, are obtained using:

$$S u_i = \lambda_i u_i \tag{7}$$

where $S$ is an $m \times m$ matrix ($m$ being the number of variables), $u_i$ is an $m \times 1$ vector, and $\lambda_i$ is a scalar. The principal component scores can be computed as:
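The score equation itself is missing from this extract. Assuming the standard PCA projection, in which the scores are the standardized data multiplied by the leading eigenvectors, Eqs. (6) and (7) and the score computation can be sketched in NumPy as follows. The function name, the synthetic data, and the choice of six retained components are illustrative assumptions (six mirrors the PCA 1 to PCA 6 rows reported later in Table 5); this is not the authors' code.

```python
import numpy as np


def pca_scores(cell_matrix, n_components):
    """Project per-cycle cell features onto the leading principal components."""
    x = np.asarray(cell_matrix, dtype=float)
    n_samples = x.shape[0]
    # Standardize each column to zero mean and unit sample variance (X*).
    # Assumes no column is constant; a zero-variance column would divide by zero.
    x_std = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Eq. (6): covariance matrix of the standardized data, shape (m, m).
    s = x_std.T @ x_std / (n_samples - 1)
    # Eq. (7): eigen-decomposition. eigh is meant for symmetric matrices and
    # returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(s)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores: standardized data projected onto the first n_components eigenvectors.
    scores = x_std @ eigvecs[:, :n_components]
    explained_ratio = eigvals / eigvals.sum()
    return scores, eigvals, explained_ratio


# Synthetic stand-in for cell-level data: 200 cycles x 152 cell voltages (mV)
# that share a common trend plus independent noise.
rng = np.random.default_rng(0)
trend = rng.normal(size=(200, 1))
cells = 3300.0 + 20.0 * trend + rng.normal(scale=2.0, size=(200, 152))

scores, eigvals, explained_ratio = pca_scores(cells, n_components=6)
# Fraction of cell-level columns removed by keeping 6 of 152.
reduction = 1.0 - scores.shape[1] / cells.shape[1]
```

Keeping 6 of 152 columns removes 1 − 6/152 ≈ 96% of the cell-level inputs, which is in line with the reduction quoted in the text, although the paper's exact accounting is not shown in this extract. The cumulative sum of `explained_ratio` is the quantity usually inspected when choosing how many components to retain.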
Table 4
Network parameters and grid search range.

Network         Parameter  Range
1D-CNN
  Batch size    16         (8, 32)
  Filters       64         (32, 128)
  Pool size     2          (2, 8)
  Epochs        20         –
Transformer
  Batch size    3          (2, 8)
  Epochs        100        –
  Num heads     2          (1, 4)
  Embedding dim 16         (8, 32)
  Ffn units     32         (16, 64)
  Learning rate 0.001      (0.0005, 0.005)

where $\hat{x}_t$ is the final prediction result, and $W_p$, $H^h$ and $b_p$ are the weight, input and bias, respectively.
The attention mechanism processes the entire sequence equally with Q, K, and V, but does not focus specifically on local variables. This makes it difficult for transformer networks to capture local, hierarchical, and spatial relationships. To resolve this constraint of the transformer's attention mechanism, the raw data undergo 1D-CNN feature extraction before being fed into the transformer model.
A CNN consists of three main layers: input layers, hidden layers, and output layers. The input layer transfers the original data to the first hidden layer. The output layer generates the desired outputs for subsequent processing. The hidden layers consist of the fully connected layer, the max-pooling layer, and the convolutional layer. The 1D-CNN is used here to address the limitations of the attention mechanism in the transformer network. Consequently, a temporal feature is extracted from the LIB pack dataset and fed into the transformer network. The parameters of both the 1D-CNN and the transformer network are optimized using grid search, and their values are detailed in Table 4. The data is split into 80% training and 20% test data. The proposed method is evaluated against other ANN models using MAE, MAPE, RMSE, and $R^2$, which are given as:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{19}$$

$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{20}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{21}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{22}$$

where $y_i$ are the actual values, $\hat{y}_i$ are the predicted values, $\bar{y}$ is the mean of $y_i$, and $n$ is the number of testing samples.

4. Result and discussion

4.1. Comparison of modeling method for LIB cell-level

The proposed approach employs PCA as a machine-learning technique to model and extract valuable features from individual cells. Additionally, this study compares the features obtained through 1D-CNN and data aggregation methods at the cell level, as illustrated in Table 5 and Fig. 7. PCA demonstrated superior performance over the other feature extraction methods in capturing highly correlated features from individual cells. This superiority can be attributed to the strength of PCA in reducing huge volumes of data to a much smaller size. PCA conducts cumulative explained variance ratio analysis to ascertain the number of components required for a comprehensive representation of the dataset. Conversely, 1D-CNN encounters challenges in determining the output size for extracted features, resulting in elevated computational costs when grid search is employed to determine the number of output components. As a result, PCA successfully models and incorporates information from individual cells within the LIB pack, even in the presence of inconsistent coupling between them. Moreover, PCA significantly reduces computational costs by decreasing the cell-level input data size by 96%.

Table 5
Comparison of different modeling methods for LIB cell-level.

Modeling method                 GRC
Principal Component Analysis
  PCA 1                         0.8250
  PCA 2                         0.8146
  PCA 3                         0.7919
  PCA 4                         0.7891
  PCA 5                         0.7610
  PCA 6                         0.7573
1D-CNN
  CNN Feature                   0.6739
Data Aggregation
  Average V1                    0.1023
  Average V2                    0.09251
  Average V152                  0.1054

4.2. Enhancing battery SOH estimation models with 1D-CNN

The results of various ANN models with and without 1D-CNN are shown in Table 6 and Fig. 8. The performance of most ANN models improved notably with 1D-CNN feature extraction. This enhancement arises from the capability of 1D-CNN to extract hierarchical and spatial relationships within the dataset, enhancing the models' predictive capabilities. Specifically, in terms of R-square value, the performance of LSTM increased by 15.77%, Bi-LSTM by 97.96%, GRU by 11.02%, and transformer by 10.95%. However, the performance of CNN-Informers fails to improve and instead declines after the integration of the 1D-CNN method. This decline can be attributed to the poor performance of the informer network on this dataset. Both the informer network and CNN-Informers perform poorly on this dataset, as elaborated in Section 4.3.
It is noteworthy that the proposed method applies 1D-CNN to extract spatial relationships from the dataset and feeds them into the encoder of the ANN models as additional feature inputs. This approach differs from that of other researchers [25], who applied multi-layer CNN extraction and fed it into the position encoding layer of the Informer network. The proposed method serves as a simplified version of CNN enhancement for prediction models, which is easy to implement and can be broken down into several processes in real-world applications. Furthermore, feature extraction using 1D-CNN requires less computational cost than multi-layer CNN approaches.
Among all the ANN models, CNN-Transformer emerged as the most effective SOH estimation model when applied to the real-world dataset, as illustrated in Fig. 9. This is attributed to the robust capabilities of transformers, which effectively handle long-term dependencies, accommodate varying input sequence lengths, and excel in modeling the degradation of real-world lithium-ion battery packs. Furthermore, the fusion of CNN and Transformer overcomes the limitations of the original transformer's attention mechanism, thereby enhancing model performance. The proposed 1D-CNN feature extraction technique efficiently captures pertinent features from both the overall LIB pack and individual cell levels, facilitating their integration into the SOH estimation model.
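The metrics of Eqs. (19) to (22), and the relative R-square gains quoted in this section, can be reproduced with a few lines of plain Python. This is a minimal sketch: the function names are illustrative, the toy SOH values are made up for the check, and the two R-square inputs are the Transformer and CNN-Transformer values reported in Table 6. The gain is assumed to be the relative change in R-square, since the text does not state the formula explicitly.

```python
import math


def mae(y_true, y_pred):
    """Eq. (19): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Eq. (20): mean absolute percentage error (y_true must be non-zero)."""
    return 100.0 / len(y_true) * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Eq. (21): root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_square(y_true, y_pred):
    """Eq. (22): coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def relative_gain_percent(before, after):
    """Relative change of a score, in percent (assumed definition of the gain)."""
    return 100.0 * (after - before) / abs(before)


# Toy SOH values, only to exercise the functions.
y_true = [1.0, 0.9, 0.8]
y_pred = [1.0, 0.8, 0.8]
toy_r2 = r_square(y_true, y_pred)

# R-square of Transformer vs. CNN-Transformer as reported in Table 6.
transformer_gain = relative_gain_percent(0.8406, 0.9327)
```

With the rounded table values, `transformer_gain` comes out at roughly 10.96%, close to the 10.95% quoted in the text; the small gaps for the other models presumably stem from the table values being rounded.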
Table 6
SOH estimation result.

Model             MAE     MSE     RMSE    R-Square
LSTM              0.0659  0.0084  0.0915  0.7936
Bi-LSTM           0.1281  0.0299  0.1728  0.2632
GRU               0.0614  0.0070  0.0839  0.8262
Transformer       0.0587  0.0065  0.0804  0.8406
Informers         0.5522  0.4541  0.6738  −9.9039
CNN-LSTM          0.0445  0.0033  0.0574  0.9187
CNN-Bi-LSTM       0.1280  0.0194  0.1393  0.5209
CNN-GRU           0.0470  0.0034  0.0579  0.9172
CNN-Transformer   0.0406  0.0027  0.0523  0.9327
CNN-Informers     0.6548  0.5181  0.7198  −12.7955

4.3. Comparison between CNN-transformers and informers network

As demonstrated in Table 6 and Fig. 9, the Informer network failed to capture the degradation trend of the SOH dataset and exhibited poor performance in this study. This can be attributed to two primary factors: the limited versatility of the informer network and the nature of the LIB SOH data. In contrast, the transformer network offers greater versatility and has been widely adopted and applied to various tasks beyond time-series forecasting. Its modular design and effective self-attention mechanism make it highly adaptable and suitable for a wide range of applications, including capturing the non-linearity of LIB pack SOH degradation models.
Besides, the specialized design of the informer for long-sequence forecasting tasks limits its applicability to other types of tasks or datasets where long-range dependencies are not a significant factor. In the LIB dataset used in this study, the data sequences were reduced significantly after modeling and aggregating the cell-level and pack-level features by cycle. This resulted in shorter data sequences, in which long-range dependencies were not significant for the SOH model. Consequently, the informer failed to capture the SOH degradation trend effectively. By contrast, the transformer network, with its greater versatility and effective self-attention mechanism, is able to capture both local and global dependencies in a more scalable manner. The 1D-CNN enhancement further improved its performance by addressing the limitations of the attention mechanism, as discussed in Section 4.2.

5. Conclusions

In conclusion, modeling lithium-ion battery (LIB) cells within packs and estimating State-of-Health (SOH) for real-world LIB packs presents
ongoing challenges for researchers. Many researchers have implemented various models for LIB pack SOH estimation in experimental environments, but the performance of these LIB models in real-world scenarios remains untested. This paper aims to model the real-world LIB pack at both the cell level and the pack level, and then to predict the SOH while considering real-world factors such as temperature. It proposes a novel definition of SOH attenuation based on delivered energy, which requires only a minimal charging profile. In addition, it proposes a PCA-CNN-Transformer model for real-world SOH estimation. The paper introduces a novel approach to address the inconsistency among battery cells within packs by implementing hierarchical feature extraction based on both LIB cell-level and pack-level data. Principal Component Analysis (PCA) is utilized at the battery cell level to model and extract relevant features. Following feature extraction, the battery cell-level dataset is reduced by 96%, from 152 cell attributes to 6 PCA features. All extracted features undergo correlation analysis with the GRC and demonstrate a strong correlation with battery SOH, with GRC values exceeding 0.70. A CNN-Transformer is proposed as the SOH estimation model for real-world datasets. The fusion of CNN and Transformer overcomes limitations of the original transformer's attention mechanism, achieving an MAE of 0.0406 and an R-square of 0.9327, and improving the transformer model performance by 10.95%. The suggested PCA-CNN-Transformer model offers advantages for both government and business sectors, enhancing public services and policy-making. This improvement can drive progress across diverse fields, including public services, policy-making, and competitiveness.

The proposed PCA-CNN-Transformer method works well in real-world SOH model estimation. However, this work required a significant amount of high-quality and accurate data. Moreover, the effects of battery charging behavior, such as depth of discharge and charging speed, have not been accounted for in the SOH model. Integrating the charging behavior of the LIB pack into SOH model predictions should therefore be considered.

CRediT authorship contribution statement

Yin-Yi Soo: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation. Yujie Wang: Writing – review & editing, Supervision, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization. Haoxiang Xiang: Writing – review & editing, Methodology, Investigation. Zonghai Chen: Writing – review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.
Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 62373340).