Exploratory Data Analysis For Electric Vehicle Driving Range Prediction: Insights and Evaluation
Debani Prasad Mishra1, Prince Kumar1, Priyanka Rai1, Ayush Kumar1, Surender Reddy Salkuti2
1Department of Electrical Engineering, International Institute of Information Technology Bhubaneswar, Odisha, India
2Department of Railroad and Electrical Engineering, Woosong University, Daejeon, Republic of Korea
Corresponding Author:
Surender Reddy Salkuti
Department of Railroad and Electrical Engineering, Woosong University
17-2, Jayang-Dong, Dong-Gu, Daejeon, Republic of Korea
Email: [email protected]
1. INTRODUCTION
Over the years, the adoption of battery electric vehicles (BEVs) has been growing, but a major hindrance to their promotion and usage is the inaccurate display of the remaining battery energy (residual power). This problem contributes to range anxiety among drivers, caused by uncertainties in battery performance and other factors. The goal of this study is to tackle this problem by creating a model that can precisely predict the driving range of BEVs. This study introduces advanced machine learning (ML) techniques for accurately estimating the mileage of electric vehicles (EVs) by considering both internal and external factors. These factors include the use of heating, average speed, air conditioning, energy consumption, and route type. With better battery technology and the demand for low- or zero-emission vehicles, EVs are a strong contender to replace vehicles powered by internal combustion engines. Despite these advantages, EVs have not yet gained wide popularity with the general public. Because charging infrastructure is limited and the usable driving range is consequently shorter, BEV drivers may experience range anxiety: the worry that the battery will be depleted before they reach their destination [1]. To minimize range anxiety and increase the usability of EVs, applications are needed that help drivers reach their destinations safely without wasting a great deal of time or money. The primary goal of such applications is to predict the available driving range accurately. Drivers often keep as much as twenty percent of their battery charge in reserve as a precaution [2], which prevents that energy from being used efficiently.
Research on increasing the usable battery capacity or driving range of EVs builds on the driving habits of EV users. To utilize battery capacity optimally, Li et al. [3] presented a mixed distribution model describing daily vehicle miles traveled; their tests showed that the model could meet the demands of various drivers. Furthermore, Dong and Lin [4] introduced the concept of BEV viability by using a stochastic modelling approach to characterize the behaviour of BEV drivers, and examined the comfort levels of drivers with different driving traits in order to find ways to lessen range anxiety. They found, however, that the underlying factors are correlated even though BEV driving behaviour is stochastic. Brady and O’Mahony [5] studied the dependency structure among six variables using a nonparametric copula function and then applied a stochastic modelling approach, producing daily trip itineraries and charging profiles.
Deploying EVs is one of the most comprehensive approaches to reducing air pollution, and governments are therefore promoting the purchase and use of these vehicles in place of cars with internal combustion engines [6]. Global EV sales reportedly increased by 72% in 2018 compared with 2017, with a 2.1% rise in market share [7]. The small market share of EVs may seem odd given the benefits listed above and the presence of large companies in the sector, but it results from several factors, the most prominent of which are high purchase costs, long charging times compared with fossil-fuel cars, and limited range per charge [8]. Data-driven predictions, such as those generated by ML systems, benefit from a large training dataset [9]. A few papers have suggested sharing data between cars and the cloud so that users can benefit from the knowledge of other consumers, ultimately producing more accurate forecasts. Grubwinkler et al. [10] presented an energetic route map built through crowdsourcing, gathering data on the energy used by BEVs while driving along road segments. Tseng and Chau [11] used a participatory sensing approach to collect data from the general public for forecasting vehicle energy consumption. Straub et al. [12] proposed an alternative way of building an energetic roadmap: collecting driving profiles from the crowd and using machine learning techniques to fill in gaps in the data. This approach mitigates limitations in data coverage, resulting in a more complete and reliable energy roadmap.
In recent years, data-driven methods have been increasingly used to estimate energy consumption and driving range. Compared with traditional approaches, they are more reliable and cost-effective, largely because Internet of Things (IoT) technologies have reduced deployment costs. To reduce the expenses associated with installing sensors and transferring data from cars, much of the information is extracted from the vehicle's network and transmitted to the cloud, where machine learning algorithms can process it to offer a variety of helpful services [13]. One of the main problems with ML is an uneven distribution of the training data: in general, a model's ability to predict outcomes accurately on test data suffers when the distributions of the training and test sets differ.
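One simple way to detect the train/test mismatch described above is to compare each feature's empirical distribution in the two sets. The sketch below is a check the study itself does not describe. It computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distribution functions, using NumPy. The function name `ks_statistic` and the synthetic average-speed samples are illustrative assumptions.

```python
import numpy as np


def ks_statistic(train: np.ndarray, test: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_train(x) - F_test(x)|."""
    train = np.sort(np.asarray(train, dtype=float))
    test = np.sort(np.asarray(test, dtype=float))
    # Evaluate both right-continuous empirical CDFs at every observed value;
    # the largest gap between the two step functions occurs at one of these points.
    grid = np.concatenate([train, test])
    cdf_train = np.searchsorted(train, grid, side="right") / train.size
    cdf_test = np.searchsorted(test, grid, side="right") / test.size
    return float(np.max(np.abs(cdf_train - cdf_test)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical average-speed samples (km/h); the "shifted" set comes from faster trips.
    train_speed = rng.normal(50, 10, size=1000)
    same_speed = rng.normal(50, 10, size=1000)
    shifted_speed = rng.normal(70, 10, size=1000)
    print("same distribution:", round(ks_statistic(train_speed, same_speed), 3))
    print("shifted distribution:", round(ks_statistic(train_speed, shifted_speed), 3))
```

A value near 0 suggests that the feature is distributed similarly in both sets. A value near 1 suggests that a model trained on one set may generalize poorly to the other.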
2. REGRESSION MODELS
Regression models are widely used in data analysis and ML to predict a continuous target variable from a number of input variables. In this work, several popular regression models for estimating the driving range of EVs from a variety of input features are investigated. The machine learning algorithms used to model EV driving range [14], [15] are linear regression, random forest, and a deep multi-layer perceptron (MLP), the last two of which are non-linear techniques. Linear regression models the outcome as a linear function of the independent variables. The fitted line is the straight line (a hyperplane when there are several inputs) that most closely approximates the individual data points [16]. The algorithm aims to minimize the discrepancy between the actual values provided by the manufacturer and the predicted values; the model is given by (1).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon \qquad (1)$$
Here, the dependent variable $Y$ represents the EV's driving range, $X_1, X_2, \dots, X_n$ are the independent variables that affect driving range, $\beta_0$ is the intercept, $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients of the independent variables, and $\varepsilon$ is the disturbance (error) term. The coefficients are computed to minimize the sum of squared deviations between the actual and predicted values [17], [18].
An MLP is a neural network made up of several connected layers that transform the input into an output of the desired dimension. Neurons (or nodes) are connected so that the outputs of one layer are fed as inputs to the next. One layer serves as the input, one as the output, and there may be any number of hidden layers in between, each with any number of nodes.
Exploratory data analysis for electric vehicle driving range prediction … (Debani Prasad Mishra)
476 ISSN: 2252-8792
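The least-squares estimate of the coefficients in (1) can be sketched with NumPy. This is a minimal illustration on synthetic data, not the study's code. The helper names `fit_ols` and `predict`, the feature meanings in the comments (average speed and energy consumption), and the generating coefficients are hypothetical.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [beta_0, beta_1, ..., beta_n] minimizing sum((y - X_aug @ beta) ** 2)."""
    # Prepend a column of ones so that beta_0 acts as the intercept in (1).
    X_aug = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return beta


def predict(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate (1) without the error term: beta_0 + sum_i beta_i * X_i."""
    return beta[0] + X @ beta[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 200
    # Hypothetical inputs: X1 = average speed (km/h), X2 = energy consumption.
    X = np.column_stack([rng.uniform(20, 110, n), rng.uniform(10, 25, n)])
    eps = rng.normal(0, 2.0, n)                       # disturbance term epsilon
    y = 300.0 - 0.8 * X[:, 0] - 6.0 * X[:, 1] + eps   # synthetic "range" values
    beta = fit_ols(X, y)
    print("estimated coefficients:", np.round(beta, 2))
```

With little noise, the estimates should land close to the generating values (300, -0.8, -6.0). This shows that minimizing the squared deviations recovers the coefficients of (1).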
A deep MLP can capture complex relationships among features such as power, trip distance, energy consumption, and driving style, leading to more accurate range estimates. Thanks to its deep architecture and non-linear activation functions, it can uncover non-obvious patterns and correlations in the data that are difficult to capture with linear models such as linear regression. The network used here has four hidden layers of 64 neurons each, with the rectified linear unit (ReLU) activation function [19]. Compared with activation functions such as the sigmoid or hyperbolic tangent, ReLU lets the network learn more rapidly and does not saturate for positive inputs. ReLU is defined as max(0, x), where x is the input to the activation function: it returns the input if it is positive and zero otherwise. The network was trained with the Adam optimization algorithm on mini-batches of 32 samples; the batch size is the number of samples used in each training iteration.
The third algorithm used in this comparative study of EV driving range is random forest (RF) [20], [21]. Random forest regression is a supervised ensemble learning approach that combines the predictions of many decision trees to improve accuracy in regression problems. To build the random forest regressor, we imported the RandomForestRegressor class from the scikit-learn (sklearn) package, created an instance of it, and assigned it to a variable. We set the n_estimators argument to 50, so the forest consists of 50 trees. Calling the fit() method trains the model: each tree is grown on a bootstrap sample of the training data, and the forest predicts the average of the trees' predictions. Once training is complete, the model can generate predictions based on the patterns learned from the training data.
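The two non-linear configurations described above can be sketched with scikit-learn: a deep MLP with four hidden layers of 64 ReLU units, trained with Adam on mini-batches of 32, and a random forest with 50 trees. This is an illustrative sketch rather than the authors' code. It assumes scikit-learn's `MLPRegressor` for the deep MLP, since the paper does not name the library used for the MLP. The input scaling step, `max_iter`, `random_state`, the 80/20 split, and the synthetic data are also assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models(seed: int = 0) -> dict:
    # Deep MLP: four hidden layers of 64 ReLU units, Adam optimizer, batch size 32.
    # Inputs are standardized first (an assumption), which usually helps MLP training.
    mlp = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 64, 64, 64), activation="relu",
                     solver="adam", batch_size=32, max_iter=500, random_state=seed),
    )
    # Random forest: 50 trees, each grown on a bootstrap sample of the training data.
    rf = RandomForestRegressor(n_estimators=50, random_state=seed)
    return {"deep MLP": mlp, "random forest": rf}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 600
    # Synthetic stand-in for the dataset: quantity (kWh), avg speed (km/h), A/C flag.
    X = np.column_stack([rng.uniform(5, 45, n), rng.uniform(10, 110, n),
                         rng.integers(0, 2, n)])
    y = 6.0 * X[:, 0] - 0.05 * (X[:, 1] - 60) ** 2 - 10 * X[:, 2] + rng.normal(0, 5, n)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=0)
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        print(f"{name}: R^2 on test set = {model.score(X_test, y_test):.3f}")
```

Both models are built behind the same `fit`/`score` interface, which keeps the comparison loop identical for each one. `score` returns R² for regressors, matching the evaluation criterion used later in the paper.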
3.3.1. Univariate
During the univariate analysis, we evaluated individual variables in the dataset to determine their distributions and characteristics. For example, we computed descriptive statistics (mean, median, and standard deviation) for variables such as average speed, quantity, and energy consumption rate [26]. We also visualized the distributions using histograms, box plots, and density plots to identify any outliers or skewness in the data, as shown in Figure 1.
Int J Appl Power Eng, Vol. 13, No. 2, June 2024: 474-482
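The descriptive statistics and outlier screening described above can be sketched with pandas. The column names (`avg_speed`, `quantity`, `consumption`) and the sample values are hypothetical stand-ins for the study's dataset. The 1.5 × IQR rule is the convention commonly used for box-plot whiskers; it is assumed here rather than taken from the paper.

```python
import pandas as pd


def univariate_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, standard deviation and 1.5*IQR outlier count per numeric column."""
    rows = {}
    for col in df.select_dtypes("number").columns:
        s = df[col].dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        # Box-plot convention: points more than 1.5*IQR beyond the quartiles are flagged.
        outliers = ((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum()
        rows[col] = {"mean": s.mean(), "median": s.median(),
                     "std": s.std(), "n_outliers": int(outliers)}
    return pd.DataFrame(rows).T


if __name__ == "__main__":
    # Hypothetical trip records; real data would be loaded with pd.read_csv(...).
    trips = pd.DataFrame({
        "avg_speed": [32, 45, 51, 48, 60, 55, 47, 95, 5, 50],                    # km/h
        "quantity": [8.1, 12.4, 15.0, 10.2, 18.7, 14.3, 11.8, 41.5, 2.0, 13.1],  # kWh
        "consumption": [14.2, 15.1, 16.0, 14.8, 17.3, 15.5, 15.0, 19.9, 13.0, 15.4],
    })
    print(univariate_summary(trips))
    # Histograms and box plots (as in Figure 1) could be drawn with
    # trips.hist() or trips.boxplot(), which require matplotlib.
```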
3.3.2. Bivariate
In the bivariate analysis, we analyzed interactions between pairs of variables to find links and dependencies. For instance, we examined whether certain parameters affect the range of electric cars by using scatter plots of trip distance against other characteristics. Above 20 kWh, the quantity is directly proportional to the travel distance. As Figure 2 shows, a quantity in the 0 to 20 kWh range may not be sufficient to determine the trip distance by itself, so we incorporated other characteristics. For EVs with an energy consumption in the range of 10 to 20, the distance traveled is greater. Energy usage is also greater when the air conditioner is running and the park heating is not turned on.
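The pairwise checks described above can be sketched with pandas. The sketch computes a Pearson correlation between quantity and trip distance separately at or below 20 kWh and above 20 kWh. It also computes the mean energy consumption for each combination of the A/C and park-heating flags. The column names (`quantity`, `trip_distance`, `ac`, `park_heating`, `consumption`), the kWh/100 km unit for consumption, and the synthetic data are hypothetical. The 20 kWh split mirrors the threshold mentioned in the text.

```python
import numpy as np
import pandas as pd


def bivariate_checks(df: pd.DataFrame, threshold_kwh: float = 20.0):
    """Correlation of quantity vs trip distance on each side of a threshold, plus
    mean consumption for each combination of the A/C and park-heating flags."""
    low = df[df["quantity"] <= threshold_kwh]
    high = df[df["quantity"] > threshold_kwh]
    corr = {
        "quantity <= threshold": low["quantity"].corr(low["trip_distance"]),
        "quantity > threshold": high["quantity"].corr(high["trip_distance"]),
    }
    consumption_by_aux = df.groupby(["ac", "park_heating"])["consumption"].mean()
    return corr, consumption_by_aux


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 400
    quantity = rng.uniform(2, 45, n)              # kWh (hypothetical)
    ac = rng.integers(0, 2, n)                    # 1 = air conditioner on
    park_heating = rng.integers(0, 2, n)          # 1 = park heating on
    # Consumption assumed in kWh/100 km; coefficients are made up for illustration.
    consumption = 15 + 2 * ac - 1 * park_heating + rng.normal(0, 1, n)
    trip_distance = 100 * quantity / consumption + rng.normal(0, 5, n)  # km
    df = pd.DataFrame({"quantity": quantity, "trip_distance": trip_distance, "ac": ac,
                       "park_heating": park_heating, "consumption": consumption})
    corr, consumption_by_aux = bivariate_checks(df)
    print(corr)
    print(consumption_by_aux)
```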
3.3.3. Multivariate
In the multivariate analysis, we studied interactions among three or more variables to understand intricate patterns and relationships [27], [28]. For instance, Figure 3 uses a heat map to visualize the correlation matrix between trip distance and the other variables, including auxiliary loads. This allowed us to pinpoint the factors that are strongly correlated with, and may significantly affect, the EV driving range [29], [30]. Energy consumption has very few outliers, at speeds of 50 to 100 km/h; outliers in quantity occur at values greater than or equal to 40; and average speed has a few outliers below 10 km/h and above 80 km/h. Figure 4 depicts the flow of the EV range prediction process. By applying these data pre-processing approaches and conducting a thorough exploratory data analysis, we generated a clean and relevant dataset for further research and model building.
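A correlation matrix like the one visualized in Figure 3 can be computed with a few lines of pandas. The same sketch ranks the features by the strength of their linear association with trip distance. The column names and data are hypothetical. A heat map could then be drawn from the returned matrix with a plotting library such as seaborn; that step is not shown here.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str = "trip_distance"):
    """Pearson correlation matrix of the numeric columns, plus the other features
    sorted by the absolute value of their correlation with the target."""
    corr = df.select_dtypes("number").corr()
    ranking = corr[target].drop(target).abs().sort_values(ascending=False)
    return corr, ranking


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    n = 300
    df = pd.DataFrame({
        "quantity": rng.uniform(2, 45, n),       # kWh (hypothetical)
        "avg_speed": rng.uniform(10, 110, n),    # km/h (hypothetical)
        "ac": rng.integers(0, 2, n),             # auxiliary-load flag
    })
    df["consumption"] = 14 + 0.03 * df["avg_speed"] + 2 * df["ac"] + rng.normal(0, 1, n)
    df["trip_distance"] = 100 * df["quantity"] / df["consumption"]
    corr, ranking = rank_by_correlation(df)
    print(ranking)
```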
Here, $y_{\text{true}}$ denotes the true value of the target variable and $y_{\text{pred}}$ the predicted value (the model output). The lower the MSE, the closer the predicted values are to the actual results. The next evaluation criterion is the $R^2$ score, which measures how much of the variation in the target variable can be explained by the model's features [32]. It indicates how well the model explains the variability of the outcome variable and is formulated as (3) and (4).
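$$R^2(y_{\text{true}}, y_{\text{pred}}) = 1 - \frac{\sum (y_{\text{true}} - y_{\text{pred}})^2}{\sum (y_{\text{true}} - \bar{y})^2} \qquad (3)$$

$$\bar{y} = \frac{1}{n_{\text{samples}}} \sum y_{\text{true}} \qquad (4)$$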
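Equations (3) and (4), together with the mean squared error, can be computed directly with NumPy. The sketch below cross-checks these hand-written versions against scikit-learn's `mean_squared_error` and `r2_score`. The helper names `mse` and `r2` and the sample ranges are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse(y_true, y_pred) -> float:
    """Mean of the squared differences between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred) -> float:
    """R^2 score following equations (3) and (4)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    y_bar = y_true.mean()                       # equation (4)
    ss_res = np.sum((y_true - y_pred) ** 2)     # numerator of (3)
    ss_tot = np.sum((y_true - y_bar) ** 2)      # denominator of (3)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical actual vs predicted driving ranges (km).
    actual = [120.0, 150.0, 90.0, 200.0, 170.0]
    predicted = [115.0, 158.0, 95.0, 190.0, 172.0]
    print("MSE:", mse(actual, predicted), "sklearn:", mean_squared_error(actual, predicted))
    print("R^2:", round(r2(actual, predicted), 4),
          "sklearn:", round(r2_score(actual, predicted), 4))
```

For these values, the squared errors sum to 218 and the total sum of squares is 7320. The MSE is therefore 43.6 and R² = 1 - 218/7320 ≈ 0.970.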
The closer the $R^2$ score is to 1, the better the model performs. Table 1 displays the results of our comparative analysis of multiple regression models for EV range prediction. Graphs comparing the actual and predicted EV range values visualize the relationship between the two. Figure 5 shows a scatter plot of the actual versus predicted values for linear regression, Figure 6 a line plot of the actual versus predicted values for random forest regression, and Figure 7 a scatter plot of the actual versus predicted values for the deep MLP. Analyzing how the data points are distributed around the reference line gives a better understanding of how accurately and reliably each regression model captures the true EV range.
Figure 5. Linear regression’s actual vs predicted
Figure 6. Random forest’s actual vs predicted
All of the models performed well, with R-squared values ranging from 0.84 to 0.93. The deep MLP model achieved the highest R-squared, 0.93, and the random forest model also performed well, with an R-squared of 0.90. The linear regression model, the simplest of the three, scored comparatively lower, with an R-squared of 0.84.
5. CONCLUSION
This paper presents a comparative analysis of machine learning algorithms for predicting EV driving range. To this end, we examined a real-world dataset that includes various factors affecting EV range. To improve data quality and facilitate model training, we applied exploratory data analysis techniques during the data pre-processing phase, which allowed us to prepare the data successfully and develop a thorough understanding of it. We then implemented and evaluated several regression models: linear regression, random forest (RF), and a deep multilayer perceptron (MLP). The main goal of this work was to find the machine learning strategy that forecasts EV range most precisely, and the study allowed us to determine which strategy provides the best predictive performance. Our findings indicate that the deep MLP and random forest models outperform the traditional linear regression algorithm, with higher R2 scores and lower MAE and RMSE values. Future research could incorporate additional variables, such as battery health, charging infrastructure, traffic patterns, road slope, and driver behaviour, to further improve the accuracy of EV range prediction models. Furthermore, gradient-boosting methods such as XGBoost and LightGBM offer distinct opportunities for researchers and practitioners to develop precise, efficient, and trustworthy data-driven approaches for EV energy consumption studies.
ACKNOWLEDGEMENTS
This research work was supported by “Woosong University’s Academic Research Funding - 2023”.
REFERENCES
[1] M. Eisel, I. Nastjuk, and L. M. Kolbe, “Understanding the influence of in-vehicle information systems on range stress – Insights
from an electric vehicle field experiment,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 43, pp. 199–
211, 2016, doi: 10.1016/j.trf.2016.10.015.
[2] T. Franke, I. Neumann, F. Bühler, P. Cocron, and J. F. Krems, “Experiencing Range in an Electric Vehicle: Understanding
Psychological Barriers,” Applied Psychology, vol. 61, no. 3, pp. 368–391, 2012, doi: 10.1111/j.1464-0597.2011.00474.x.
[3] Z. Li, S. Jiang, J. Dong, S. Wang, Z. Ming, and L. Li, “Battery capacity design for electric vehicles considering the diversity of
daily vehicles miles traveled,” Transportation Research Part C: Emerging Technologies, vol. 72, pp. 272–282, 2016, doi:
10.1016/j.trc.2016.10.001.
[4] J. Dong and Z. Lin, “Stochastic Modeling of Battery Electric Vehicle Driver Behavior,” Transportation Research Record:
Journal of the Transportation Research Board, vol. 2454, no. 1, pp. 61–67, 2014, doi: 10.3141/2454-08.
[5] J. Brady and M. O’Mahony, “Modelling charging profiles of electric vehicles based on real-world electric vehicle charging data,”
Sustainable Cities and Society, vol. 26, pp. 203–216, 2016, doi: 10.1016/j.scs.2016.06.014.
[6] Y. Huang, H. Wang, A. Khajepour, H. He, and J. Ji, “Model predictive control power management strategies for HEVs: A
review,” Journal of Power Sources, vol. 341, pp. 91–106, 2017, doi: 10.1016/j.jpowsour.2016.11.106.
[7] M. Bohlsen, “EV Company News For The Month Of January 2019,” Seeking Alpha, 2019, [Online]. Available:
https://seekingalpha.com/amp/article/4237727-ev-company-news-month-january-2019.
[8] J. Hong, S. Park, and N. Chang, “Accurate remaining range estimation for Electric vehicles,” Proceedings of the Asia and South
Pacific Design Automation Conference, ASP-DAC, pp. 781–786, 2016, doi: 10.1109/ASPDAC.2016.7428106.
[9] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification. Hoboken, New Jersey, USA: John Wiley & Sons, 2012.
[10] S. Grubwinkler, T. Brunner, and M. Lienkamp, “Range prediction for EVs via crowd-sourcing,” 2014 IEEE Vehicle Power and
Propulsion Conference, VPPC 2014, pp. 1–6, 2014, doi: 10.1109/VPPC.2014.7007121.
[11] C.-M. Tseng and C.-K. Chau, “Personalized Prediction of Vehicle Energy Consumption Based on Participatory Sensing,” IEEE
Transactions on Intelligent Transportation Systems, vol. 18, no. 11, pp. 3103–3113, 2017, doi: 10.1109/TITS.2017.2672880.
[12] T. Straub, M. Nagy, M. Sidorov, L. Tonetto, M. Frey, and F. Gauterin, “Energetic map data imputation: A machine learning
approach,” Energies, vol. 13, no. 4, 2020, doi: 10.3390/en13040982.
[13] B. Zheng, P. He, L. Zhao, and H. Li, “A hybrid machine learning model for range estimation of electric vehicles,” 2016 IEEE
Global Communications Conference, GLOBECOM 2016 - Proceedings, pp. 1–6, 2016, doi: 10.1109/GLOCOM.2016.7841506.
[14] Spritmonitor.de, “Calculate and compare fuel consumption and car costs (in German: Spritverbrauch und Autokosten berechnen
und vergleichen),” Spritmonitor.de. https://spritmonitor.de/.
[15] L. Zhao, W. Yao, Y. Wang, and J. Hu, “Machine Learning-Based Method for Remaining Range Prediction of Electric Vehicles,”
IEEE Access, vol. 8, pp. 212423–212441, 2020, doi: 10.1109/ACCESS.2020.3039815.
[16] J. G. Hayes, R. P. R. De Oliveira, S. Vaughan, and M. G. Egan, “Simplified electric vehicle power train models and range
estimation,” 2011 IEEE Vehicle Power and Propulsion Conference, VPPC 2011, pp. 1–5, 2011, doi:
10.1109/VPPC.2011.6043163.
[17] H. Yu, F. Tseng, and R. McGee, “Driving pattern identification for EV range estimation,” 2012 IEEE International Electric
Vehicle Conference, IEVC 2012, pp. 1–7, 2012, doi: 10.1109/IEVC.2012.6183207.
[18] M. Nilsson, “Electric Vehicles: the Phenomenon of Range Anxiety,” ELVIRE Consortium, pp. 1–16, 2011.
[19] A. Bolovinou, I. Bakas, A. Amditis, F. Mastrandrea, and W. Vinciotti, “Online prediction of an electric vehicle remaining range
based on regression analysis,” 2014 IEEE International Electric Vehicle Conference, IEVC 2014, pp. 1–8, 2014, doi:
10.1109/IEVC.2014.7056167.
[20] H. He, J. Cao, and X. Cui, “Energy optimization of electric vehicle’s acceleration process based on reinforcement learning,”
Journal of Cleaner Production, vol. 248, p. 119302, 2020, doi: 10.1016/j.jclepro.2019.119302.
[21] A. Braun and W. Rid, “Assessing driving pattern factors for the specific energy use of electric vehicles: A factor analysis
approach from case study data of the Mitsubishi i–MiEV minicar,” Transportation Research Part D: Transport and Environment,
vol. 58, pp. 225–238, 2018, doi: 10.1016/j.trd.2017.11.011.
[22] J. Felipe, J. C. Amarillo, J. E. Naranjo, F. Serradilla, and A. Diaz, “Energy Consumption Estimation in Electric Vehicles
Considering Driving Style,” IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, pp. 101–106, 2015, doi:
10.1109/ITSC.2015.25.
[23] B. Wang, Y. Wang, K. Qin, and Q. Xia, “Detecting Transportation Modes Based on LightGBM Classifier from GPS Trajectory
Data,” International Conference on Geoinformatics, pp. 1–7, 2018, doi: 10.1109/GEOINFORMATICS.2018.8557149.
[24] H. A. Yavasoglu, Y. E. Tetik, and K. Gokce, “Implementation of machine learning based real time range estimation method
without destination knowledge for BEVs,” Energy, vol. 172, pp. 1179–1186, 2019, doi: 10.1016/j.energy.2019.02.032.
[25] S. Sautermeister, M. Falk, B. Baker, F. Gauterin, and M. Vaillant, “Influence of measurement and prediction uncertainties on
range estimation for electric vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 8, pp. 2615–2626,
2018, doi: 10.1109/TITS.2017.2762829.
[26] C. De Cauwer, J. Van Mierlo, and T. Coosemans, “Energy consumption prediction for electric vehicles based on real-world data,”
Energies, vol. 8, no. 8, pp. 8573–8593, 2015, doi: 10.3390/en8088573.
[27] C. H. Lee and C. H. Wu, “A Novel Big Data Modeling Method for Improving Driving Range Estimation of EVs,” IEEE Access,
vol. 3, pp. 1980–1993, 2015, doi: 10.1109/ACCESS.2015.2492923.
[28] S. R. Salkuti, “Emerging and Advanced Green Energy Technologies for Sustainable and Resilient Future Grid,” Energies, vol. 15,
no. 18, p. 6667, Sep. 2022, doi: 10.3390/en15186667.
[29] V. Sandeep, S. Shastri, A. Sardar, and S. R. Salkuti, “Modeling of battery pack sizing for electric vehicles,” International Journal
of Power Electronics and Drive Systems, vol. 11, no. 4, pp. 1987–1994, 2020, doi: 10.11591/ijpeds.v11.i4.pp1987-1994.
[30] S. R. Salkuti, “Energy Storage and Electric Vehicles: Technology, Operation, Challenges, and Cost-Benefit Analysis,”
International Journal of Advanced Computer Science and Applications, vol. 12, no. 4, pp. 40–45, 2021, doi:
10.14569/IJACSA.2021.0120406.
[31] S. A. Birrell, A. McGordon, and P. A. Jennings, “Defining the accuracy of real-world range estimations of an electric vehicle,”
2014 17th IEEE International Conference on Intelligent Transportation Systems, ITSC 2014, pp. 2590–2595, 2014, doi:
10.1109/ITSC.2014.6958105.
[32] C. Fiori, K. Ahn, and H. A. Rakha, “Power-based electric vehicle energy consumption model: Model development and
validation,” Applied Energy, vol. 168, pp. 257–268, 2016, doi: 10.1016/j.apenergy.2016.01.097.
BIOGRAPHIES OF AUTHORS
Debani Prasad Mishra received the B.Tech. in electrical engineering from the
Biju Patnaik University of Technology, Odisha, India, in 2006 and the M.Tech. in power
systems from IIT, Delhi, India in 2010. He received the Ph.D. degree in power
systems from Veer Surendra Sai University of Technology, Odisha, India, in 2019. He is
currently serving as Assistant Professor in the Department of Electrical Engineering,
International Institute of Information Technology Bhubaneswar, Odisha. His research interests
include the application of soft computing techniques in power systems, signal processing, and power
quality. He can be contacted at email: [email protected].