Group 24 Miniproject
approach
By
Group No.24
(Deepak Singh, TE-C, Roll No. 22)
Under the Guidance of
On
Name: Name:
Date:
Place: Mumbai
TABLE of CONTENTS
List of Figures
1.1 Motivation
1.2 Application
Chapter 4. Implementation
7.2 Introduction
7.4 Conclusion
List of References
LIST OF FIGURES
Data
The data for this model is fairly simple, as it has very few missing values.
The raw data consists of a training dataset with the input features and
their corresponding salaries.
This model can be used as a guide when determining salaries, since it gives
reasonable predictions when given a person's years of experience.
1.2 Application:-
The goal of this paper is to predict the salary of a person after a certain number of years. The application
is a computerized system that maintains the salary growth records of any field as a graph and predicts the
salary after a given time period. It takes the salary database of an organisation, checks the salary fields,
and plots a graph from this information so that salary growth can be observed visually. It then predicts the
salary for a given time period using the prediction algorithm. The same approach can also be applied to
other prediction problems.
A prediction is an assumption about a future event. It is often, though not always, based on knowledge
or experience. Because future events are not certain, exact data about the future is in many cases
impossible to obtain; a prediction is nevertheless useful when preparing plans for probable
developments. In this paper, the salary of an employee of an organization is predicted on the basis of
the previous salary growth rate: the salary history is observed, and from it the salary of a person
after a certain period of time is calculated automatically.
It helps to see the growth of any field. It can estimate a person’s salary by clustering and predict the
salary through the graph.
3. TECHNOLOGY USED
3.1 Hardware and Software requirement :
Hardware :
Software :
Libraries used:-
PyDrive / Colab Drive:-
-> Used to access Google Drive from the Colab VM.
NumPy:-
-> NumPy is a library for the Python programming language that adds support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level mathematical
functions to operate on these arrays.
Seaborn / Matplotlib:-
-> Used for data visualization.
-> Various scatter plots and other types of graphs can be produced using these
libraries.
4. IMPLEMENTATION
We searched extensively for a suitable dataset and finally found one on Kaggle.
The data for this model is fairly simple, as it has very few missing values.
The raw data consists of a training dataset with the features listed below and their
corresponding salaries. Twenty percent of this training dataset was split off into a test dataset
with corresponding salaries.
There is also a testing dataset that does not have any salary information available and was
used as a substitute for real-world data.
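The 80/20 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the fixed seed, and the synthetic (years of experience, salary) rows are assumptions made for the example and are not taken from the project code.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (years_of_experience, salary) rows, for illustration only
rows = [(float(years), 30000.0 + 9000.0 * years) for years in range(1, 31)]
train, test = train_test_split(rows)
```

Shuffling before splitting avoids a test set made up only of the last rows of the file, which matters if the dataset is sorted by experience.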
Information Used To Predict Salaries
Years of Experience: how many years of work experience the person has
Data preparation :-
After obtaining the dataset, we found that it contained a lot of unnecessary and
redundant data, so we preprocessed it to eliminate the unnecessary
elements.
We dropped a large number of columns in order to keep our project accurate and sustainable.
We eliminated rows that diverged extremely from the other rows.
Duplicate rows were also eliminated.
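A minimal sketch of the duplicate and outlier removal steps is given below. The report does not state which rule was used to detect extreme rows, so the interquartile-range (IQR) fence with k = 1.5 is an assumption, as are the function names and the sample rows.

```python
from statistics import quantiles

def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def drop_outliers(rows, column, k=1.5):
    """Drop rows whose value in `column` lies outside the IQR fences (assumed rule)."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4)      # quartiles of the chosen column
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]

# Hypothetical (years_of_experience, salary) rows: one duplicate, one extreme salary
raw = [(1.0, 39000), (2.0, 43000), (2.0, 43000), (3.0, 57000),
       (4.0, 61000), (5.0, 66000), (6.0, 900000)]
deduplicated = drop_duplicates(raw)
clean = drop_outliers(deduplicated, column=1)
```

Duplicates are removed first so that repeated rows do not distort the quartiles used for the outlier fences.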
Simple linear regression is used to estimate:
-> How strong the relationship is between two variables (e.g. the relationship between rainfall
and soil erosion).
-> The value of the dependent variable at a certain value of the independent variable (e.g. the
amount of soil erosion at a certain level of rainfall).
In simple linear regression, we aim to reveal the relationship between a single independent
variable (the input) and a corresponding dependent variable (the output). This can be
expressed as a single line: y = β0 + β1x + ε
Here, y represents the output or dependent variable, β0 and β1 are two unknown constants
that represent the intercept and the slope, respectively, and ε (epsilon) is the error
term.
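To make the roles of β0 and β1 concrete, the sketch below estimates them with the standard closed-form least-squares formulas. It is an illustration only: the function name and the five data points are hypothetical, and the project itself may have used a library routine instead.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) minimising the sum of squared errors of y = beta0 + beta1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Hypothetical data: salary grows by 5000 per year of experience
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40000.0, 45000.0, 50000.0, 55000.0, 60000.0]
beta0, beta1 = fit_simple_linear_regression(years, salary)
predicted_salary = beta0 + beta1 * 6.0   # prediction for 6 years of experience
```

Because the sample data lie exactly on a line, the error term ε is zero here; with real salary data the fitted line only approximates the points.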
We trained our model using simple linear regression and obtained an accuracy of around
82%.
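The report does not say how this accuracy figure was measured. For regression models a common choice is the coefficient of determination (R²), so the sketch below assumes that metric; the function name and the sample values are hypothetical and do not reproduce the 82% result.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical salaries (in thousands) and model predictions
actual = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 63.0, 69.0]
score = r2_score(actual, predicted)
```

An R² of 1.0 means the predictions match the data exactly, while 0.0 means the model does no better than always predicting the mean salary.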
5 CASE STUDY
7.2 Introduction:-
Vehicle energy consumption and pollutant emissions are key problems for the healthy and
sustainable development of urban transportation. With the continuous growth of car
ownership in China, the energy consumption of its private cars increased 4.2 times, from
13.12 to 68.34 million tons of standard coal, from 2005 to 2015. Based on growth of the
population, GDP, and the proportion of secondary and tertiary industries of China, the trend
of future transportation energy consumption can be predicted. The energy consumption of
private cars will continue to increase before 2020, when it is expected to reach 117.38
million tons of standard coal [1]. Therefore, reducing energy consumption has become one of
the most important challenges in the transportation field.
Among many factors that affect the energy consumption of vehicles, driving behavior plays
an important role. Research conducted by Ford Motor Company [2] shows that improvement
of driving behavior could improve fuel economy by 25% in the short term. Providing drivers
with continuous eco-driving feedback in the long term could lead to a 10% reduction in fuel
consumption. Hiraoka et al. [3] studied the influence of ecological driving behavior on fuel
consumption and found that giving feedback on fuel consumption information to drivers
could improve fuel economy by 10%. In addition, the eco-driving instructions given to
drivers could improve the fuel economy by approximately 15%. Ahn and Rakha [4] analyzed
the influence of drivers’ route choice on vehicle fuel consumption, and the results indicated
that energy consumption and exhaust emissions are significantly reduced by minimizing
high-emission driving behavior. Thus, it is important to study the correlation between driving
behavior and energy consumption and to use driving behavior to predict energy
consumption.
With the rapid development of mobile terminal technology, the application of mobile phone
sensors has been promoted. Mobile phone terminals have been used in the collection of driving
behavior data and for the warning of dangerous driving. Johnson and Trivedi [8] proposed a
system using dynamic time warping (DTW) and smartphone-based sensor fusion to detect
nonaggressive and aggressive driving behavior, which gave audible feedback when it detected
aggressive driving. Guido et al. [9] used the vehicle tracking data from smartphone sensors to
estimate the safety performance of driving (including the deceleration rate to avoid crashes and the
time to collision), and the crash risks in south-bound and north-bound lanes were analyzed. The
application of the mobile phone terminal in driving safety has played an important role in the
evaluation of vehicle fuel consumption. Because driving behavior data collected by mobile
terminals are more detailed and easier to popularize, they lay a foundation for enriching urban
road fuel consumption databases.
At present, the fuel consumption and emission data monitored by the statistical monitoring
platform for the Beijing Municipal Transportation Administration are mostly based on OBD
devices. The data collection objects are mainly taxi drivers, bus drivers, and truck drivers and do
not cover all transportation enterprises. The mobile phone terminal provides a possibility for a
larger scale of data collection. Fuel consumption cannot be directly collected by mobile phone
terminals, but it could be predicted accurately by exploring the correlation between mobile phone
and OBD data. At the same time, the driving behavior data collected by the mobile phone are
influenced by the types, placement, shaking (caused by vehicle vibration), and drivers’ usage of
the phone, resulting in the instability of the driving behavior data, so a lot of calibration work
needs to be done on the data. By constructing a fuel consumption prediction model, the application
of mobile phone data could be used to calculate the fuel consumption of vehicles, which saves the
installation cost of OBD equipment and provides a theoretical basis for traffic management
departments to more accurately monitor urban traffic fuel consumption.
This study proposes a vehicle fuel consumption prediction method based on Global Positioning
System (GPS) data collected from a smartphone. Taxi drivers participated in this experiment. By
matching the driving behavior data of the mobile phone and the fuel consumption data of the OBD
terminal, the driving behavior indexes that affect fuel consumption were screened, and the fuel
consumption prediction models were constructed using machine learning algorithms. The
prediction model of drivers’ individual fuel consumption based on mobile phone data could not
only further improve the real-time monitoring database of fuel consumption with strong error
tolerance but also provide technical support for macro control of urban transportation energy
consumption and effectiveness evaluation of the transportation energy policy.
Since mobile phones cannot obtain the data of vehicles’ fuel consumption directly, the driving behavior
data collected from mobile phones and the fuel consumption collected from OBD were matched, and the
fuel consumption prediction model was built. In the process of model construction, the data collected from
mobile phones and OBD were both applied. After the model was built, larger-scale traffic fuel consumption
was able to be predicted using only the driving behavior data collected from the mobile phones. The
framework of model construction is shown in Figure 1. The steps of fuel consumption prediction are as
follows:
(1) Data collection: natural driving behavior data of multiple drivers were collected using the GPS, linear
accelerometer, gyroscope, and other sensors of mobile phones. At the same time, real-time vehicle fuel
consumption data were collected by the OBD terminal installed in the vehicle.
(2) Index extraction: the data of the mobile phones and OBD terminals were combined based on time. By
comparing the consistency and difference of the driving behavior data of the two terminals, the indexes for
predicting vehicle fuel consumption based on mobile phone data were extracted.
(3) Model construction: the training set and test set were selected randomly, and the fuel consumption
prediction models were built using a back propagation (BP) neural network, a support vector machine, and
a random forest.
(4) Effect evaluation: the fuel consumption prediction models were built several times, and the accuracy and
efficiency of the three prediction models were compared.
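Step (2) combines the two data sources by time. The study does not describe the exact matching rule, so the sketch below assumes a nearest-timestamp match within a one-second tolerance; the function name, the record layout, and the sample values are all hypothetical.

```python
from bisect import bisect_left

def match_by_time(phone, obd, tolerance=1.0):
    """Pair each phone record with the nearest OBD record within `tolerance` seconds.

    phone: list of (timestamp_s, speed_kmh); obd: list of (timestamp_s, fuel_l_per_100km),
    with obd sorted by timestamp.
    """
    obd_times = [t for t, _ in obd]
    matched = []
    for t, speed in phone:
        i = bisect_left(obd_times, t)
        # The nearest OBD record is either just before or just after position i
        candidates = [j for j in (i - 1, i) if 0 <= j < len(obd)]
        if not candidates:
            continue
        j = min(candidates, key=lambda idx: abs(obd_times[idx] - t))
        if abs(obd_times[j] - t) <= tolerance:
            matched.append((t, speed, obd[j][1]))
    return matched

# Hypothetical records; the last phone record has no OBD reading nearby
phone = [(0.0, 10.0), (1.0, 12.5), (2.0, 15.0), (10.0, 20.0)]
obd = [(0.2, 6.1), (1.1, 6.8), (2.4, 7.5)]
matched = match_by_time(phone, obd)
```

Phone records with no OBD reading inside the tolerance are dropped rather than paired with a distant reading, which keeps mismatched samples out of the training data.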
Prediction Model
BP neural networks, support vector regression (SVR), and random forests are several common
prediction methods with high accuracy and operation efficiency. This study built three types of
prediction models, compared the difference in the prediction results, and finally chose the best
model for fuel consumption prediction.
BP Neural Network
An artificial neural network (ANN) is an operation model that mimics the process of neurons
transmitting perceptual information to the human brain. This method has the characteristics of
self-learning and high efficiency when processing nonlinear, unstructured, and large sample data.
The error back propagation algorithm (BP neural network) [10] is one of the most widely used
supervised learning algorithms in artificial neural networks. After the weights of the network are
randomly selected, the BP neural network uses the back propagation method to update weights to
minimize loss, and finally the connection weights of the network are determined.
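The weight update described above can be illustrated with a very small network. The sketch below is not the network used in the study: the architecture (one input, three tanh hidden units, one linear output), the learning rate, and the toy target y = x² are all assumptions chosen to keep the example short.

```python
import math
import random

def init_network(hidden=3, seed=0):
    """Randomly initialise the weights of a 1-input, 1-output network."""
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(hidden)],  # input -> hidden
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],  # hidden -> output
        "b2": 0.0,
    }

def forward(net, x):
    """Return the hidden activations and the network output for input x."""
    hidden = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output

def mean_squared_error(net, samples):
    return sum((forward(net, x)[1] - y) ** 2 for x, y in samples) / len(samples)

def train(net, samples, learning_rate=0.05, epochs=500):
    """Stochastic gradient descent on the loss 0.5 * (output - y)^2."""
    for _ in range(epochs):
        for x, y in samples:
            hidden, output = forward(net, x)
            error = output - y
            for k, h in enumerate(hidden):
                # Propagate the output error back through hidden unit k
                grad_hidden = error * net["w2"][k] * (1.0 - h * h)
                net["w2"][k] -= learning_rate * error * h
                net["w1"][k] -= learning_rate * grad_hidden * x
                net["b1"][k] -= learning_rate * grad_hidden
            net["b2"] -= learning_rate * error
    return net

# Toy data (assumption): learn y = x^2 on [-1, 1]
samples = [(x / 10.0, (x / 10.0) ** 2) for x in range(-10, 11)]
net = init_network()
loss_before = mean_squared_error(net, samples)
train(net, samples)
loss_after = mean_squared_error(net, samples)
```

Comparing `loss_before` with `loss_after` shows the effect of the weight updates: the error of the randomly initialised network is reduced as the gradients are propagated backwards from the output.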
Support Vector Regression
As a supervised machine learning algorithm, the support vector machine is mainly applied to
classification and regression problems [11]. By constructing kernel functions, the algorithm
transforms nonlinear problems into linear problems in a high-dimensional space, which gives the
problem a geometrical interpretation.
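As an illustration of a kernel function, the sketch below computes the radial basis function (RBF) kernel between feature vectors. The text does not say which kernel the study used, so the RBF choice, the `gamma` value, and the sample points are assumptions.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between a and b)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

def kernel_matrix(points, gamma=0.5):
    """Pairwise kernel values; the support vector machine works on this matrix."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

# Hypothetical two-dimensional feature vectors
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
similarity = rbf_kernel(points[0], points[1])
matrix = kernel_matrix(points)
```

The kernel value acts as an inner product in the high-dimensional space without that space ever being constructed explicitly: identical points give 1.0, and the value decays towards 0 as points move apart.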
Random Forest
A random forest (RF) is an effective ensemble method for prediction and classification. A
random forest is composed of a large number of decision trees. On the basis of decision trees,
randomness is added to both the rows (samples) and the columns (features) of the training data,
so as to avoid the potential overfitting problem of decision trees. For each tree, the training
sample is drawn with replacement, and the out-of-bag (OOB) data of each tree accounts for
approximately 37% of the total data. The main
calculation steps of the random forest regression algorithm are as follows:
First, k training sample sets were selected by sampling with replacement.
Second, m features were randomly selected from the n features of each training sample set as
candidates for splitting nodes, and k decision trees were generated. The node splitting of each
decision tree adopted the principle of minimum mean square error, which minimizes the sum of the
mean square deviations of the two groups of data after splitting. Finally, the predicted vehicle
fuel consumption was obtained by averaging the predicted values of the k decision trees.
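The three steps can be sketched with a deliberately simplified forest in which every tree is a single split (a depth-1 "stump") on one randomly chosen feature, i.e. m = 1. This is an assumption made to keep the example short; the study's trees are presumably deeper, and the synthetic data and all names below are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_stump(rows, targets, feature):
    """Pick the threshold on `feature` that minimises the squared error of the two groups."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # all feature values identical: always predict the mean
        mean = sum(targets) / len(targets)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])

def fit_forest(rows, targets, k=25, seed=0):
    """Fit k stumps on bootstrap samples; also return the average out-of-bag fraction."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest, oob_fractions = [], []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]    # step 1: sample with replacement
        oob_fractions.append(1.0 - len(set(idx)) / n)  # share of rows never drawn
        feature = rng.randrange(n_features)            # step 2: random feature (m = 1)
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest, sum(oob_fractions) / k

def predict(forest, row):
    """Step 3: average the predictions of all trees."""
    preds = [left if row[f] <= thr else right for f, thr, left, right in forest]
    return sum(preds) / len(preds)

# Synthetic data (assumption): fuel use rises with feature 0; feature 1 is noise
data_rng = random.Random(1)
rows = [(data_rng.uniform(0, 100), data_rng.uniform(0, 1)) for _ in range(200)]
targets = [5.0 + 0.1 * speed for speed, _ in rows]
forest, oob_fraction = fit_forest(rows, targets)
low_prediction = predict(forest, (10.0, 0.5))
high_prediction = predict(forest, (90.0, 0.5))
```

The measured `oob_fraction` comes out close to the 37% quoted above because the chance that a given row is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches 1/e ≈ 0.368.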
7.4 Conclusion:-
In this study, driving behavior data and fuel consumption data of taxi drivers, collected from mobile
phone and OBD terminals, respectively, were matched. The correlation between driving
behavior and fuel consumption was analyzed, and relevant driving behavior indicators affecting
fuel consumption were extracted through the filter-based feature selection method. Using the
seven selected driving behavior indicators (namely, average speed, ASEI, average acceleration,
average deceleration, acceleration time percentage, deceleration time percentage, and cruising
time percentage), three fuel consumption prediction models based on a BP neural network, SVR,
and a random forest were constructed.
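Six of the seven indicators can be derived from a speed time series, as sketched below. The exact definitions used in the study are not given here, so the sampling interval, the 0.1 m/s² threshold separating acceleration, deceleration, and cruising, and the sample trace are assumptions; ASEI is omitted because the text does not define it.

```python
def driving_indicators(speeds_kmh, dt=1.0, accel_threshold=0.1):
    """Compute driving behavior indicators from speeds sampled every `dt` seconds."""
    speeds_ms = [v / 3.6 for v in speeds_kmh]   # km/h -> m/s
    accels = [(b - a) / dt for a, b in zip(speeds_ms, speeds_ms[1:])]
    accelerating = [a for a in accels if a > accel_threshold]
    decelerating = [a for a in accels if a < -accel_threshold]
    n = len(accels)
    cruising = n - len(accelerating) - len(decelerating)
    return {
        "average_speed_kmh": sum(speeds_kmh) / len(speeds_kmh),
        "average_acceleration": sum(accelerating) / len(accelerating) if accelerating else 0.0,
        "average_deceleration": sum(decelerating) / len(decelerating) if decelerating else 0.0,
        "acceleration_time_pct": 100.0 * len(accelerating) / n,
        "deceleration_time_pct": 100.0 * len(decelerating) / n,
        "cruising_time_pct": 100.0 * cruising / n,
    }

# Hypothetical one-second speed trace in km/h
trace = [0.0, 18.0, 36.0, 36.0, 36.0, 18.0]
indicators = driving_indicators(trace)
```

Each trip is reduced to one row of indicators in this way, and those rows are what a prediction model would be trained on.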
The results of model error and the run time comparison analysis show that the three models could
predict fuel consumption accurately, and the random forest model had the highest accuracy and
efficiency, with an RMSE of 0.783 L/100 km, mean absolute percentage error (K) of 6.9%, and
model running time of 0.14 s. This finding is consistent with the research of Wickramanayake and
Bandara [15], which also shows that random forest models are most effective in predicting fuel
consumption based on driving behavior data. Wickramanayake and Bandara studied fuel consumption
prediction for buses, whereas this study focuses on the fuel consumption of taxicabs. At the same
time, the driving behavior data of this study are collected
from mobile phones with higher flexibility and complexity rather than a fixed GPS device. This
method could predict vehicle fuel consumption with high accuracy and efficiency based on cell
phone data and provide strong support for traffic management departments to monitor the
ecological levels of driving behavior of taxi drivers.
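The two error measures quoted above, RMSE and mean absolute percentage error, can be computed as in the sketch below. The function names and the sample fuel consumption values are hypothetical and do not reproduce the study's results.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data (e.g. L/100 km)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)

# Hypothetical measured and predicted fuel consumption in L/100 km
measured = [8.0, 10.0, 12.5]
estimated = [7.0, 11.0, 12.5]
error_rmse = rmse(measured, estimated)
error_mape = mape(measured, estimated)
```

RMSE penalises large individual errors more strongly, while the percentage error is easier to compare across vehicles with different consumption levels.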
It is worth emphasizing that in the early stage of model construction, driving behavior data
collected by mobile phones and fuel consumption data collected by OBD are applied. After the
prediction model is built, mobile phone data can be directly used to predict the daily fuel
consumption of drivers without installing OBD devices. Application of this method could change
the traditional way of fuel consumption acquisition, and the use of mobile phone data to evaluate
the ecological impacts of individual driving behavior could save the cost of equipment installation.
At the same time, since not all taxi drivers are willing to install OBD devices in their taxicabs, this
method could help increase the user data source, which could greatly improve the database size of
taxi fuel consumption. Therefore, the method in this study could improve the depth, breadth, and
refinement level of fuel consumption monitoring and management of taxi drivers’ driving
behavior, thus laying a theoretical foundation and providing technical support for the city to
reduce fuel consumption.
This study aims to propose a method to predict vehicle energy consumption using mobile phone
data. Although the sample size used in this study is limited, it provides a basis for larger scale and
more accurate fuel consumption prediction. In future research, the collection of samples will be
further expanded, and the fuel consumption under various road conditions, traffic conditions, and
weather conditions will be considered. Through data enrichment, model optimization, and
improvement of the prediction indicators, the method could lay a theoretical foundation for the
precise energy consumption supervision of taxi enterprises. Meanwhile, since taxicabs are
relatively homogenous, the fuel consumption prediction model in this study was fixed, taking only
taxi drivers as the research object. In future studies, more types of vehicles, such as buses and
trucks, could be considered. Differentiated fuel consumption prediction models based on different
vehicle types could be constructed to further improve the monitoring and management of urban
energy consumption.
Transportation is an important factor that affects energy consumption, and driving behavior is one of the
main factors affecting vehicle fuel consumption. This method lays a foundation for monitoring database
improvement and fine management of urban transportation fuel consumption.
LIST OF REFERENCES
1. H. Wang, “Energy consumption in transport: an assessment of changing trend, influencing
factors and consumption forecast,” Journal of Chongqing University of Technology (Social
Science), vol. 7, 2017.
4. K. Ahn and H. Rakha, “The effects of route choice decisions on vehicle energy
consumption and emissions,” Transportation Research Part D: Transport and Environment, vol.
13, no. 3, pp. 151–167, 2008.
5. K. Hu, J. Wu, and M. Liu, “Modelling of EVs energy consumption from perspective of
field test data and driving style questionnaires,” Journal of System Simulation, vol. 30, no.
11, pp. 83–91, 2018.
6. Z. Xu, T. Wei, S. Easa, X. Zhao, and X. Qu, “Modeling relationship between truck fuel
consumption and driving behavior using data from internet of vehicles,” Computer-Aided
Civil and Infrastructure Engineering, vol. 33, no. 3, pp. 209–219, 2018.
7. X.-h. Zhao, Y. Yao, Y.-p. Wu, C. Chen, and J. Rong, “Prediction model of driving energy
consumption based on PCA and BP network,” Journal of Transportation Systems
Engineering and Information Technology, vol. 5, pp. 185–191, 2016.
10. W. J. Zhang, S. X. Yu, Y. F. Peng, Z. J. Cheng, and C. Wang, “Driving habits analysis on
vehicle data using error back-propagation neural network algorithm,” in Computing,
Control, Information and Education Engineering, vol. 55, CRC Press, Guilin, China,
2015.
11. H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression
machines,” in Advances in Neural Information Processing Systems, pp. 155–161, MIT
Press, Cambridge, MA, USA, 1997.