Group 24 Miniproject

This document describes a project to predict salary using machine learning. A group of 4 students created a model that predicts salary based on years of experience and job type. They obtained a dataset from Kaggle containing salaries and experience levels. They preprocessed the data by removing unnecessary columns and duplicate values. Then they implemented machine learning algorithms and evaluated the models to predict salaries for new data.

Uploaded by

23SECOMPC KaranSingh

Salary Prediction Using Machine Learning Approach

By

Group No. 24

Deepak Singh, TE-C, Roll No. 22

Karan Singh, TE-C, Roll No. 23

Rahul Yadav, TE-C, Roll No. 61

Amay Sharma, TE-C, Roll No. 71

Under the Guidance of

Mrs. Shiwani Gupta

Assistant Professor

for the subject

Machine Learning
In

T.E. COMPUTER ENGINEERING

(Academic Year: 2021-22)
CERTIFICATE
This is to certify that
Deepak Singh, TE-C, Roll No. 22

Karan Singh, TE-C, Roll No. 23

Rahul Yadav, TE-C, Roll No. 61

Amay Sharma, TE-C, Roll No. 71

have satisfactorily completed the requirements of the T.E. Capstone Project Report

Salary Prediction Using Machine Learning Approach

Shiwani Gupta Dr. Harshali Patil

Subject In-charge HOD COMP

Examiners

1. Signature: …………………. 2. Signature: ………………….

Name: Name:
Date:

Place: Mumbai
TABLE of CONTENTS
List of Figures

Chapter 1. Introduction

1.1 Motivation

1.2 Application

Chapter 2. Problem Definition

Chapter 3. Technology Used

3.1 Hardware and Software Requirement

3.2 Description of libraries used

Chapter 4. Implementation

4.1 Data Description

4.2 Data Preparation

4.3 Choice of Model

4.4 Model Training and Validation

4.5 Hyperparameter Tuning and Evaluation

Chapter 5. Result and Analysis

Chapter 6. Conclusion and Future Scope

Chapter 7. Case Study

7.1 Problem Definition

7.2 Introduction

7.3 Executive Summary

7.4 Conclusion

7.5 Future Scope

List of References
LIST OF FIGURES

Figure 1. Dataset consisting of various parameters

Figure 2. Data Preprocessing

Figure 3. Data visualization

Figure 4. Frontend (working of project)

Figure 5. Model Architecture designed

Figure 6. Recommendation system working

1. INTRODUCTION
1.1 Motivation:-
The purpose of this project is to use data transformation and machine learning
to create a model that predicts a salary when given years of experience and job
type.

Data: The data for this model is fairly simple, as it has very few missing values.
The raw data consists of a training dataset with the features listed above and
their corresponding salaries.

Information used to predict salaries: Years Experience (how many years of
experience the person has).

This model can be used as a guide when determining salaries, since it gives
reasonable predictions when given information on years of experience.

1.2 Application:-

model.py trains the model and saves it to disk.

model.pkb is the pickled model.
Run App :-
app.py contains everything required for Flask and manages the APIs.
Procedure
Open a command prompt, go to the project directory, and then run python app.py
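
The save-and-load cycle described above can be sketched as follows. This is a minimal illustration, not the project's actual model.py: the coefficients, the file name salary_model.pkl, and the helper names are assumptions made for the example, and the model is stored as a plain dictionary of coefficients so the sketch needs only the standard library.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write the trained model to disk (the role the report gives to model.py)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read the pickled model back, as a serving script would do at start-up."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_salary(model, years_experience):
    """Apply the stored line: salary = intercept + slope * years."""
    return model["intercept"] + model["slope"] * years_experience


# Illustrative coefficients only; they are not values from the report.
model = {"intercept": 25000.0, "slope": 9000.0}
model_path = os.path.join(tempfile.mkdtemp(), "salary_model.pkl")  # hypothetical name

save_model(model, model_path)
restored = load_model(model_path)
print(predict_salary(restored, 5))
```

Loading the model once and reusing it for every request keeps training and serving separate, which matches the split between model.py and app.py described above.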
2. PROBLEM DEFINITION

The goal of this paper is to predict the salary of a person after a certain number of years. The system is
intended as a computerized tool that maintains the salary growth data of any field, presents it as a graph,
and predicts the salary after a given time period. The application takes the salary database from the
organisation, checks the salary fields, and plots a graph from this information so that salary growth can
be observed. It then predicts the salary for a given time period using the prediction algorithm. The same
approach can also be applied to other prediction problems.

A prediction is an assumption about a future event. A prediction is often, though not always, based
upon knowledge or experience. Because future events are not certain, exact data about the future is in
many cases impossible to obtain; a prediction is nevertheless useful in preparing plans for probable
developments. In this paper the salary of an employee of an organization is predicted on the basis of
the previous salary growth rate: the salary history is observed, and from it the salary of a person
after a certain period of time is calculated automatically.

This helps to show the growth of any field. It can estimate a person’s salary by clustering and predict the
salary through the graph.

3. TECHNOLOGY USED
3.1 Hardware and Software requirement :

Hardware :

4 GB RAM, 4+ cores, SSD storage.

Software :

IDE and framework - Jupyter Notebook, Flask

Python data science software stack (pip, conda)

Libraries used:-
PyDrive / Colab Drive:-
-> Used to access Google Drive from the Colab VM.
NumPy:-
-> NumPy is a library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level mathematical
functions to operate on these arrays.
Seaborn / Matplotlib :-
-> Used for data visualization.
-> Various scatter plots and other types of graphs can be produced using these
libraries.
Pandas:-
-> Used for handling structured data.
Scikit-learn:-
-> Scikit-learn is a free software machine learning library for the Python programming
language. It features various classification, regression and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with
the Python numerical and scientific libraries NumPy and SciPy.

4. IMPLEMENTATION

4.1 Data Description :-

After an extensive search, we found the required dataset on Kaggle. The data for this model
is fairly simple, as it has very few missing values.
The raw data consists of a training dataset with the features listed above and their
corresponding salaries. Twenty percent of this training dataset was split off into a test dataset
with corresponding salaries.
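
The 80/20 split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the report does not say how the rows were selected, so a seeded random shuffle is assumed, and the (years of experience, salary) pairs are invented for the example.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (years_experience, salary) pairs, invented for illustration.
rows = [(float(y), 30000.0 + 9000.0 * y) for y in range(1, 11)]
train_rows, test_rows = train_test_split_rows(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, so the same rows are held out each time the model is retrained.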

There is also a testing dataset that does not have any salary information and was
used as a substitute for real-world data.
Information used to predict salaries:
Years Experience: how many years of experience the person has.

4.2 Data preparation :-

After obtaining the dataset, we found that it contained a lot of unnecessary and
redundant data, so we preprocessed it to eliminate the unnecessary elements.
We dropped a large number of columns in order to make our project accurate and sustainable.
We eliminated rows that diverged extremely from the other rows (outliers).
Duplicate values were also eliminated.

In short, we kept only those parameters which were essential.
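
The three preprocessing steps above (dropping columns, removing duplicates, removing extreme rows) can be sketched as follows. The column names, the sample records, and the z-score rule with a limit of 2 standard deviations are all assumptions for illustration; the report does not state which columns were dropped or how "extreme divergence" was defined. With pandas, the first two steps would typically use DataFrame.drop and DataFrame.drop_duplicates; plain Python is used here to keep the sketch dependency-free.

```python
from statistics import mean, stdev


def prepare(records, keep=("years_experience", "salary"), z_limit=2.0):
    """Keep essential columns, drop duplicate rows, then drop extreme salaries."""
    # 1. Keep only the essential columns.
    trimmed = [tuple(r[k] for k in keep) for r in records]
    # 2. Remove duplicate rows while preserving the original order.
    unique = list(dict.fromkeys(trimmed))
    # 3. Remove rows whose salary lies more than z_limit standard deviations
    #    from the mean (an assumed outlier rule).
    salaries = [row[-1] for row in unique]
    mu, sigma = mean(salaries), stdev(salaries)
    return [row for row in unique if sigma == 0 or abs(row[-1] - mu) <= z_limit * sigma]


# Hypothetical raw records; employee_id stands in for an unnecessary column.
raw = [
    {"employee_id": i, "years_experience": y, "salary": s}
    for i, (y, s) in enumerate([
        (1, 40000.0), (2, 45000.0), (3, 50000.0), (4, 55000.0), (5, 60000.0),
        (6, 65000.0), (7, 70000.0), (8, 75000.0),
        (3, 50000.0),    # duplicate once employee_id is dropped
        (4, 900000.0),   # extreme salary
    ])
]
cleaned = prepare(raw)
print(len(cleaned))
```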

4.3 Choice of model:-

Machine learning provides a range of models and algorithms for solving problems in various
domains. Hence, it is important to choose the most appropriate model to achieve the best
performance. Simple linear regression is a statistical method that allows us to summarize and study
relationships between two continuous (quantitative) variables:
● One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
● The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
Because the other terms are used less frequently today, we use the terms "predictor" and "response"
for these variables. Simple linear regression gets its adjective "simple"
because it concerns the study of only one predictor variable. In contrast, multiple linear regression
gets its adjective "multiple" because it concerns the study of two
or more predictor variables.

4.3.1 Proposed Work

In this work we predict salaries depending on certain parameters using
machine learning. We performed exploratory data analysis, split the data into training and
testing sets, evaluated the model, and made predictions.
Simple linear regression is used to estimate the relationship between two quantitative
variables. You can use simple linear regression when you want to know:

● How strong the relationship is between two variables (e.g. the relationship between rainfall
and soil erosion).
● The value of the dependent variable at a certain value of the independent variable (e.g. the
amount of soil erosion at a certain level of rainfall).
In simple linear regression, we aim to reveal the relationship between a single independent
variable (the input) and a corresponding dependent variable (the output). This can be
written as a single line: y = β0 + β1x + ε

Here, y is the output or dependent variable, β0 and β1 are two unknown constants
that represent the intercept and the slope coefficient respectively, and ε (epsilon) is the
error term. In this project x is the years of experience and y is the salary.
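
As an illustration of how β0 and β1 can be estimated, the sketch below uses the standard closed-form least-squares solution: β1 is the covariance of x and y divided by the variance of x, and β0 = mean(y) − β1 · mean(x). The function name and the data are assumptions for the example; the report itself does not show how the fit was computed.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimising the squared error of y = beta0 + beta1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Invented data lying exactly on the line salary = 30000 + 9000 * years.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [39000.0, 48000.0, 57000.0, 66000.0, 75000.0]
beta0, beta1 = fit_simple_linear_regression(years, salary)
print(beta0, beta1)
```

Because the example data contain no noise, the fit recovers the intercept and slope used to generate them; with real salary data the error term ε would leave a residual.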

4.3.2 Model training and validation:-

We trained our model using simple linear regression and obtained an accuracy of around
82%.
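
The report does not state how the 82% figure was measured. For a regression model a common choice is the coefficient of determination R² (the value that scikit-learn regressors return from their score method), so the sketch below assumes that metric. The salary values are invented and do not reproduce the reported result.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out salaries and the model's predictions for them.
actual = [40000.0, 50000.0, 60000.0, 70000.0]
predicted = [42000.0, 48000.0, 61000.0, 69000.0]
score = r_squared(actual, predicted)
print(round(score, 4))
```

A score of 1.0 means the predictions match the held-out salaries exactly, while a score near 0 means the model does no better than always predicting the mean salary.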

5. RESULT AND ANALYSIS

We were able to predict salaries using machine learning, using a
simple linear regression model.
We tried many models, and of these simple linear regression turned out to be
the most beneficial, giving an accuracy of about 82%.
We also tried to improve the accuracy by introducing different models simultaneously.
Finally, we made a front end with the help of Flutter.

6. CONCLUSION and FUTURE SCOPE

We were able to build a project using machine learning which predicts salaries based
on certain parameters.
Using Flutter we were able to make a front end.
In future we will try to add more parameters and to use different models in order to
improve the accuracy of our project.
We will also try to incorporate more functions so that it can be used to predict
salaries accurately in the real world.
We will also try to increase the size of the training dataset and remove redundancy.

6.2 FUTURE SCOPE

The future scope which we plan to implement on our project is as follows:-
● Improve Model Architecture
● Expand Existing Dataset
● More Training of the Model to achieve better accuracy
● Better Evaluation Metric

7. CASE STUDY

7.1 Problem Definition :-

Transportation is an important factor that affects energy consumption, and driving behavior is one
of the main factors affecting vehicle fuel consumption. The purpose of this paper is to improve
fuel consumption monitoring databases based on mobile phone data. Based on the mobile phone
terminals and on-board diagnostic system (OBD) installed in taxis, driving behavior data and fuel
consumption data are extracted, respectively. By matching the driving behavior data collected by a
mobile phone with the fuel consumption data collected by OBD, the correlation between driving
behavior and fuel consumption is explored, so that vehicle fuel consumption could be predicted
based on mobile phone data. The fuel consumption prediction models are built using back
propagation (BP) neural network, support vector regression (SVR), and random forests. The
results show that the average speed, average speed except for idle (ASEI), average acceleration,
average deceleration, acceleration time percentage, deceleration time percentage, and cruising
time percentage are important indicators for fuel consumption evaluation. All three models could
predict fuel consumption accurately, with an absolute relative error less than 10%. The random
forest model is proved to have the highest accuracy and runs faster, making it suitable for wide
application. This method lays a foundation for monitoring database improvement and fine
management of urban transportation fuel consumption.

7.2 Introduction:-
Vehicle energy consumption and pollutant emissions are key problems for the healthy and
sustainable development of urban transportation. With the continuous growth of car
ownership in China, the energy consumption of its private cars increased 4.2 times, from
13.12 to 68.34 million tons of standard coal, from 2005 to 2015. Based on growth of the
population, GDP, and the proportion of secondary and tertiary industries of China, the trend
of future transportation energy consumption can be predicted. The energy consumption of
private cars will continue to increase before 2020, when it is expected to reach 117.38
million tons of standard coal [1]. Therefore, reducing energy consumption has become one of
the most important challenges in the transportation field.

Among many factors that affect the energy consumption of vehicles, driving behavior plays
an important role. Research conducted by Ford Motor Company [2] shows that improvement
of driving behavior could improve fuel economy by 25% in the short term. Providing drivers
with continuous eco-driving feedback in the long term could lead to a 10% reduction in fuel
consumption. Hiraoka et al. [3] studied the influence of ecological driving behavior on fuel
consumption and found that giving feedback on fuel consumption information to drivers
could improve fuel economy by 10%. In addition, the eco-driving instructions given to
drivers could improve the fuel economy by approximately 15%. Ahn and Rakha [4] analyzed
the influence of drivers’ route choice on vehicle fuel consumption, and the results indicated
that energy consumption and exhaust emissions are significantly reduced by minimizing
high-emission driving behavior. Thus, it is important to study the correlation between driving
behavior and energy consumption and to use driving behavior to predict energy
consumption.

At present, there is a significant volume of research on prediction models of energy consumption
based on driving behavior. Hu et al. [5] conducted some real vehicle tests and a questionnaire
survey to study the influence of driving style on the fuel consumption of electric vehicles on urban
roads and constructed a prediction model for the fuel consumption of electric vehicles. Xu et al.
[6] constructed two kinds of truck fuel consumption prediction models using driving behavior data
obtained from the Internet of vehicles. The dynamic relationship between truck fuel consumption
and truck drivers’ driving behavior was described using an energy consumption index, and a
generalized regression neural network model was established to predict truck fuel consumption.
Zhao et al. [7] built a fuel consumption prediction model of urban road sections based on driving
behavior by applying a machine learning algorithm, and the model could intuitively show the
distribution characteristics of fuel consumption in basic sections of the Beijing expressway.
Data sources supporting the studies of fuel consumption prediction are mostly based on the data
collected from the main controller of the vehicle, and an on-board diagnostic system (OBD) in
conjunction with a questionnaire. The controller and OBD are limited by the equipment
installation cost and drivers’ installation willingness, so can only realize small-scale data
management for small areas and with high uncertainty. The data collection form of a questionnaire
also lacks flexibility, and it is difficult to guarantee the quality of the data.

With the rapid development of mobile terminal technology, the application of mobile phone
sensors has been promoted. Mobile phone terminals have been used in the collection of driving
behavior data and for the warning of dangerous driving. Johnson and Trivedi [8] proposed a
system using dynamic time warping (DTW) and smartphone-based sensor fusion to detect
nonaggressive and aggressive driving behavior, which gave audible feedback when it detected
aggressive driving. Guido et al. [9] used the vehicle tracking data from smartphone sensors to
estimate the safety performance of driving (including the deceleration rate to avoid crashes and the
time to collision), and the crash risks in south-bound and north-bound lanes were analyzed. The
application of the mobile phone terminal in driving safety has played an important role in the
evaluation of vehicle fuel consumption. Because driving behavior data collected by mobile
terminals are more detailed and easier to popularize, they lay a foundation for enriching urban
road fuel consumption databases.
At present, the fuel consumption and emission data monitored by the statistical monitoring
platform for the Beijing Municipal Transportation Administration are mostly based on OBD
devices. The data collection objects are mainly taxi drivers, bus drivers, and truck drivers and do
not cover all transportation enterprises. The mobile phone terminal provides a possibility for a
larger scale of data collection. Fuel consumption cannot be directly collected by mobile phone
terminals, but it could be predicted accurately by exploring the correlation between mobile phone
and OBD data. At the same time, the driving behavior data collected by the mobile phone are
influenced by the types, placement, shaking (caused by vehicle vibration), and drivers’ usage of
the phone, resulting in the instability of the driving behavior data, so a lot of calibration work
needs to be done on the data. By constructing a fuel consumption prediction model, the application
of mobile phone data could be used to calculate the fuel consumption of vehicles, which saves the
installation cost of OBD equipment and provides a theoretical basis for traffic management
departments to more accurately monitor urban traffic fuel consumption.

This study proposes a vehicle fuel consumption prediction method based on Global Positioning
System (GPS) data collected from a smartphone. Taxi drivers participated in this experiment. By
matching the driving behavior data of the mobile phone and the fuel consumption data of the OBD
terminal, the driving behavior indexes that affect fuel consumption were screened, and the fuel
consumption prediction models were constructed using machine learning algorithms. The
prediction model of drivers’ individual fuel consumption based on mobile phone data could not
only further improve the real-time monitoring database of fuel consumption with strong error
tolerance but also provide technical support for macro control of urban transportation energy
consumption and effectiveness evaluation of the transportation energy policy.

7.3 Executive Summary-

Since mobile phones cannot obtain the data of vehicles’ fuel consumption directly, the driving behavior
data collected from mobile phones and the fuel consumption collected from OBD were matched, and the
fuel consumption prediction model was built. In the process of model construction, the data collected from
mobile phones and OBD were both applied. After the model was built, larger-scale traffic fuel consumption
was able to be predicted using only the driving behavior data collected from the mobile phones. The
framework of model construction is shown in Figure 1. The steps of fuel consumption prediction are as
follows:
(1) Data collection: natural driving behavior data of multiple drivers were collected using the GPS, linear
accelerometer, gyroscope, and other sensors of mobile phones. At the same time, real-time vehicle fuel
consumption data were collected by the OBD terminal installed in the vehicle.
(2) Index extraction: the data of mobile phones and OBD terminals were combined based on time. By
comparing the consistency and difference of driving behavior data of the two terminals, the indexes for
predicting vehicle fuel consumption based on mobile phone data were extracted.
(3) Model construction: the training set and test set were selected randomly, and the fuel consumption
prediction models were built using a back propagation (BP) neural network, a support vector machine, and
a random forest.
(4) Effect evaluation: the fuel consumption prediction models were built several times, and the
accuracy and efficiency of the three prediction models were compared.
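
Step (2) above, combining the two data sources based on time, can be sketched as a join on a shared timestamp. The record layout, units, and sample values are assumptions for illustration; the text only states that the phone and OBD data were combined based on time.

```python
def match_by_time(phone_records, obd_records):
    """Join phone and OBD records that share the same timestamp (in seconds)."""
    # Index the OBD fuel readings by timestamp for constant-time lookup.
    fuel_by_time = {t: fuel for t, fuel in obd_records}
    # Keep only phone samples that have an OBD reading at the same second.
    return [(t, speed, fuel_by_time[t]) for t, speed in phone_records if t in fuel_by_time]


# Hypothetical samples: (timestamp_s, speed_kmh) from the phone,
# (timestamp_s, fuel_rate) from the OBD terminal.
phone = [(0, 0.0), (1, 12.5), (2, 20.0), (4, 31.0)]
obd = [(1, 0.9), (2, 1.4), (3, 1.6), (4, 2.1)]
matched = match_by_time(phone, obd)
print(matched)
```

Timestamps present in only one source are dropped here; a real pipeline might instead interpolate or match to the nearest reading.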

Prediction Model

BP neural networks, support vector regression (SVR), and random forests are several common
prediction methods with high accuracy and operation efficiency. This study built three types of
prediction models, compared the difference in the prediction results, and finally we chose the best
model for fuel consumption prediction.

BP Neural Network

An artificial neural network (ANN) is an operation model that mimics the process of neurons
transmitting perceptual information to the human brain. This method has the characteristics of
self-learning and high efficiency when processing nonlinear, unstructured, and large sample data.
The error back propagation algorithm (BP neural network) [10] is one of the most widely used
supervised learning algorithms in artificial neural networks. After the weights of the network are
randomly selected, the BP neural network uses the back propagation method to update weights to
minimize loss, and finally the connection weights of the network are determined.
A method to predict vehicle fuel consumption based on mobile terminals is proposed.
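
The weight-update loop described for the BP neural network can be sketched as follows. This is a minimal illustration, not the network used in the study: the single hidden layer of four tanh units, the learning rate, the number of epochs, and the toy target y = x² are all assumptions chosen to keep the example small.

```python
import math
import random


def train_bp_network(xs, ys, hidden=4, lr=0.05, epochs=1000, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient descent.

    Returns the loss (mean of 0.5 * error^2) measured before each update,
    so the first entry is the loss of the randomly initialised network.
    """
    rng = random.Random(seed)
    # Weights start at random values, as described above.
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs + 1):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            out = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = out - y
            loss += 0.5 * err * err / n
            # Backward pass: propagate the output error to every weight.
            g_b2 += err / n
            for j in range(hidden):
                g_w2[j] += err * h[j] / n
                d_hidden = err * w2[j] * (1.0 - h[j] ** 2)
                g_w1[j] += d_hidden * x / n
                g_b1[j] += d_hidden / n
        losses.append(loss)
        # Gradient-descent update of all connection weights.
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return losses


xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]
losses = train_bp_network(xs, ys)
print(losses[0], losses[-1])
```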

Support Vector Regression (SVR)

As a supervised machine learning algorithm, support vector machines are mainly applied to
classification problems and regression problems [11]. The support vector machine algorithm
transforms nonlinear problems into linear problems in high-dimensional space by constructing
kernel functions, which gives the problem a geometrical explanation.
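
Two building blocks of support vector regression can be sketched briefly: a kernel function, which measures similarity as if the inputs had been mapped into a high-dimensional space, and the epsilon-insensitive loss, which ignores errors smaller than epsilon. The text does not say which kernel the study used, so the radial basis function (RBF) kernel and the parameter values below are assumptions.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between a and b)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))      # identical points
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))      # points further apart
print(epsilon_insensitive_loss(10.0, 10.05))   # error within the tube
print(epsilon_insensitive_loss(10.0, 10.5))    # error outside the tube
```

The kernel returns its maximum value for identical points and decays towards zero as points move apart, which is what lets a linear method in the mapped space fit a nonlinear relationship in the original inputs.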

Random Forest

A random forest (RF) is an effective ensemble method for prediction and classification. A
random forest is composed of a large number of decision trees. On the basis of decision trees,
random processes are added to the row and column vectors, so as to avoid the potential overfitting
problem of decision trees. For each tree, the training sample is sampled with replacement, and the
out-of-bag (OOB) data in each tree accounts for approximately 37% of the total data. The main
calculation steps of the random forest regression algorithm are as follows:

First of all, k groups of training sample sets were selected by sampling with replacement.
Secondly, m features were randomly selected from n features in each training sample set as
splitting nodes, and k decision trees were generated. The node splitting of each decision tree
adopted the principle of minimum mean square error, which minimizes the sum of mean square
deviations of two groups of datasets after splitting. Finally, the predicted vehicle fuel consumption
was obtained by averaging the predicted values of the k decision trees.
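
Two details of the random forest description can be checked with a short sketch: the out-of-bag share of roughly 37% that results from sampling with replacement, and the final averaging over the k trees. When n rows are drawn n times with replacement, the chance that a given row is never drawn is (1 − 1/n)^n, which approaches 1/e ≈ 0.368. The sample size and per-tree predictions below are invented for illustration.

```python
import random


def expected_oob_fraction(n_samples):
    """Probability that a given row is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n_samples) ** n_samples


def simulated_oob_fraction(n_samples, seed=0):
    """Draw one bootstrap sample and measure the share of rows left out of it."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


def forest_predict(tree_predictions):
    """Regression forest output: the mean of the individual tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


print(expected_oob_fraction(10000))
print(simulated_oob_fraction(10000))
print(forest_predict([6.0, 7.0, 8.0]))  # hypothetical per-tree predictions
```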

7.4 Conclusion:-

In this study, driving behavior data and fuel consumption data of taxi drivers collected from OBD
and mobile phone terminals, respectively, were matched. The correlation between driving
behavior and fuel consumption was analyzed, and relevant driving behavior indicators affecting
fuel consumption were extracted through the filter-based feature selection method. Using the
seven selected driving behavior indicators (namely, average speed, ASEI, average acceleration,
average deceleration, acceleration time percentage, deceleration time percentage, and cruising
time percentage), three fuel consumption prediction models based on a BP neural network, SVR,
and a random forest were constructed.
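
The seven driving behavior indicators named above can be computed from a speed trace along the following lines. The definitions here are assumptions, since the text does not give them: speed is taken to be sampled once per second in km/h, an interval counts as acceleration or deceleration when the speed change exceeds a threshold of 0.5 km/h per second, idle means zero speed, and cruising means a moving interval with a speed change inside the threshold.

```python
def driving_indicators(speeds_kmh, threshold=0.5):
    """Compute seven indicators from speeds sampled once per second (assumed)."""
    moving = [v for v in speeds_kmh if v > 0]
    # Speed change over each one-second interval.
    deltas = [b - a for a, b in zip(speeds_kmh, speeds_kmh[1:])]
    accel = [d for d in deltas if d > threshold]
    decel = [d for d in deltas if d < -threshold]
    # Cruising: steady speed while moving; steady intervals at zero speed are idle.
    cruise = [d for v, d in zip(speeds_kmh, deltas) if abs(d) <= threshold and v > 0]
    n = len(deltas)
    return {
        "average_speed": sum(speeds_kmh) / len(speeds_kmh),
        "asei": sum(moving) / len(moving) if moving else 0.0,
        "average_acceleration": sum(accel) / len(accel) if accel else 0.0,
        "average_deceleration": sum(decel) / len(decel) if decel else 0.0,
        "acceleration_time_pct": 100.0 * len(accel) / n,
        "deceleration_time_pct": 100.0 * len(decel) / n,
        "cruising_time_pct": 100.0 * len(cruise) / n,
    }


# Hypothetical trace: idle, speed up, cruise, slow down, idle.
trace = [0.0, 0.0, 10.0, 20.0, 20.0, 20.0, 10.0, 0.0, 0.0]
indicators = driving_indicators(trace)
print(indicators)
```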

The results of model error and the run time comparison analysis show that the three models could
predict fuel consumption accurately, and the random forest model had the highest accuracy and
efficiency, with an RMSE of 0.783 L/100 km, mean absolute percentage error (K) of 6.9%, and
model running time of 0.14 s. This finding is consistent with the research of Wickramanayake and
Bandara [15], which also shows that random forest models are most effective in predicting fuel
consumption based on driving behavior data. The research object of Wickramanayake and
Bandara is the fuel consumption prediction of the bus, and this study focuses on the fuel
consumption of the taxicabs. At the same time, the driving behavior data of this study are collected
from mobile phones with higher flexibility and complexity rather than a fixed GPS device. This
method could predict vehicle fuel consumption with high accuracy and efficiency based on cell
phone data and provide strong support for traffic management departments to monitor the
ecological levels of driving behavior of taxi drivers.
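
The two error measures quoted above, RMSE and mean absolute percentage error, can be computed as in the sketch below. The fuel consumption values are invented for illustration and do not reproduce the reported 0.783 L/100 km or 6.9%.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical fuel consumption values in L/100 km.
actual = [8.0, 10.0, 12.5]
predicted = [7.0, 11.0, 12.5]
print(rmse(actual, predicted))
print(mape(actual, predicted))
```

RMSE penalises large individual errors more heavily, while the percentage error is independent of the unit, which is why the two are usually reported together.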

It is worth emphasizing that in the early stage of model construction, driving behavior data
collected by mobile phones and fuel consumption data collected by OBD are applied. After the
prediction model is built, mobile phone data can be directly used to predict the daily fuel
consumption of drivers without installing OBD devices. Application of this method could change
the traditional way of fuel consumption acquisition, and the use of mobile phone data to evaluate
the ecological impacts of individual driving behavior could save the cost of equipment installation.
At the same time, since not all taxi drivers are willing to install OBD devices in their taxicabs, this
method could help increase the user data source, which could greatly improve the database size of
taxi fuel consumption. Therefore, the method in this study could improve the depth, breadth, and
refinement level of fuel consumption monitoring and management of taxi drivers’ driving
behavior, thus laying a theoretical foundation and providing technical support for the city to
reduce fuel consumption.

This study aims to propose a method to predict vehicle energy consumption using mobile phone
data. Although the sample size used in this study is limited, it provides a basis for larger scale and
more accurate fuel consumption prediction. In future research, the collection of samples will be
further expanded, and the fuel consumption under various road conditions, traffic conditions, and
weather conditions would be considered. Through the data enrichment, model optimization, and
improvement of the prediction indicators, the method could lay a theoretical foundation for the
precise energy consumption supervision of taxi enterprises. Meanwhile, since taxicabs are
relatively homogenous, the fuel consumption prediction model in this study was fixed, taking only
taxi drivers as the research object. In future study, more types of vehicles, such as buses and
trucks, could be considered. Differentiated fuel consumption prediction models based on different
vehicle types could be constructed to further improve the monitoring and management of urban
energy consumption.

7.5 Future Scope:-

Transportation is an important factor that affects energy consumption, and driving behavior is one of the
main factors affecting vehicle fuel consumption. This method lays a foundation for improving the monitoring
database and for the fine-grained management of urban transportation fuel consumption.

LIST OF REFERENCES
1. H. Wang, “Energy consumption in transport: an assessment of changing trend, influencing
factors and consumption forecast,” Journal of Chongqing University of Technology (Social
Science), vol. 7, 2017.

2. J. N. Barkenbus, “Eco-driving: an overlooked climate change initiative,” Energy Policy,
vol. 38, no. 2, pp. 762–769, 2010.

3. T. Hiraoka, Y. Terakado, S. Matsumoto, and S. Yamabe, “Quantitative evaluation of
eco-driving on fuel consumption based on driving simulator experiments,” in Proceedings of the 16th
ITS World Congress and Exhibition on Intelligent Transport Systems and Services, Stockholm,
Sweden, September 2009.

4. K. Ahn and H. Rakha, “The effects of route choice decisions on vehicle energy
consumption and emissions,” Transportation Research Part D: Transport and Environment, vol.
13, no. 3, pp. 151–167, 2008.

5. K. Hu, J. Wu, and M. Liu, “Modelling of EVs energy consumption from perspective of
field test data and driving style questionnaires,” Journal of System Simulation, vol. 30, no.
11, pp. 83–91, 2018.

6. Z. Xu, T. Wei, S. Easa, X. Zhao, and X. Qu, “Modeling relationship between truck fuel
consumption and driving behavior using data from internet of vehicles,” Computer-Aided
Civil and Infrastructure Engineering, vol. 33, no. 3, pp. 209–219, 2018.

7. X.-h. Zhao, Y. Yao, Y.-p. Wu, C. Chen, and J. Rong, “Prediction model of driving energy
consumption based on PCA and BP network,” Journal of Transportation Systems
Engineering and Information Technology, vol. 5, pp. 185–191, 2016.

8. D. A. Johnson and M. M. Trivedi, “Driving style recognition using a smartphone as a
sensor platform,” in Proceedings of the 2011 14th International IEEE Conference on
Intelligent Transportation Systems (ITSC), pp. 1609–1615, Toronto, Canada, October
2011.

9. G. Guido, A. Vitale, V. Astarita, F. Saccomanno, V. P. Giofré, and V. Gallelli, “Estimation
of safety performance measures from smartphone sensors,” Procedia - Social and
Behavioral Sciences, vol. 54, pp. 1095–1103, 2012.

10. W. J. Zhang, S. X. Yu, Y. F. Peng, Z. J. Cheng, and C. Wang, “Driving habits analysis on
vehicle data using error back-propagation neural network algorithm,” in Computing,
Control, Information and Education Engineering, vol. 55, CRC Press, Guilin, China,
2015.

11. H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression
machines,” in Advances in Neural Information Processing Systems, pp. 155–161, MIT
Press, Cambridge, MA, USA, 1997.
.
.
.
