Medicial

This document discusses the application of machine learning (ML) techniques to predict health insurance costs based on various factors such as age, sex, and BMI. It evaluates the performance of nine regression models, with XGBoost Regression achieving the highest accuracy. The study highlights the potential of ML to streamline insurance cost analysis and suggests future improvements through web application development.

Uploaded by

Balivada Chiru venkata satyasai

ABSTRACT

Insurance is a policy that reduces or eliminates the costs of losses caused by
various risks. The price of insurance is influenced by a number of factors,
and these factors shape how insurance plans are developed. Machine learning
(ML) can make the setting of insurance policy terms in the insurance industry
more efficient. In this work, we use individual and local health data to
forecast insurance amounts for various categories of people. To compare their
effectiveness, nine regression models were used: Linear Regression, XGBoost
Regression, Lasso Regression, Random Forest Regression, Ridge Regression,
Decision Tree Regression, KNN Model, Support Vector Regression, and Gradient
Boosting Regression. The models were trained on the dataset and then tested
by comparing their predictions with the actual values, after which their
accuracy was compared. The best model is XGBoost, with an MAE of 2381.567, an
MSE of 19806356.6067, an RMSE of 4450.4333, and an R squared value of 0.8681.
Gradient Boosting and Random Forest, with R squared values of 0.8679 and
0.8371, respectively, are the two other top models.

Keywords: Healthcare; Insurance; Regression; Machine Learning; Prediction; Data Analysis.

INTRODUCTION

Digital health is a rapidly growing sector worldwide. The number of digital
health businesses has doubled globally during the last five years. Health
insurance faces two significant obstacles in industrialized nations: rising
health care costs and a growing number of people without coverage. As a
result, a broad-based political movement to address these issues is emerging,
and governments in the region have pledged hundreds of millions of dollars to
advance the digital health industry. Individual health insurance plays a
crucial role in the healthcare system, particularly for people with rare
diseases, for whom medical and preventive insurance can help reduce treatment
expenses. The world in which we live is dangerous and unpredictable. Its
dangers include disease, death, and the loss of assets such as houses,
companies, and buildings. People's happiness and health are fundamental to
their existence. However, because risks cannot always be avoided, the
financial industry has developed a number of products that shield people and
businesses from them. These products pay out money to compensate for losses,
so the costs of some risks are reduced or even eliminated. Medical insurance
is a crucial component of the medical industry. However, medical spending is
challenging to predict because most of it is attributable to patients with
rare conditions. Numerous ML algorithms and deep learning approaches are used
for data prediction, and they are compared in terms of training time and
accuracy. Most machine learning algorithms require only a short training time,
but their predictions are not particularly accurate. Deep learning models can
find hidden patterns, but their long training time limits their real-time use.

BACKGROUND
Medical insurance is a necessary component of the medical industry, yet
medical spending is challenging to predict. The ML algorithms and deep
learning techniques used for such prediction are evaluated on training time
and accuracy: most machine learning algorithms train quickly but are less
accurate, whereas deep learning models can find hidden patterns but take too
long to train for real-time use.
Several regression models were implemented in this report, including Linear
Regression, XGBoost Regression, Lasso Regression, Random Forest Regression,
Ridge Regression, Decision Tree Regression, KNN Model, Support Vector
Regression, and Gradient Boosting Regression. XGBoost and Gradient Boosting
Regression were found to be the most accurate, each with an R squared value of
about 0.87 (86 percent or more). The major objective of this study is to
introduce a new methodology for estimating insurance costs.
LITERATURE REVIEW
This section reviews research on data exploration and machine learning
methods. Several articles have addressed the topic of claim prediction. In
2019, Pesantez-Narvaez et al. [6] used telematics data to forecast motor
insurance claims. On a relatively small number of cases, their study compared
the effectiveness of logistic regression and XGBoost in predicting the
occurrence of accidents. The results indicated that logistic regression is a
better model than XGBoost because of its interpretability and predictive
ability.
The research listed above identifies claims problems without taking predicted
cost and claim scope into account. To predict healthcare costs, however, we
use advanced statistical methods, ML techniques, and deep neural networks.

METHODOLOGY

A. Dataset Description
We obtained the dataset from the Kaggle website [5] to build the cost
prediction model. The dataset has seven attributes, listed in Table 1, and is
split into two parts: training data and test data. The majority of the data
(about 80%) is used for training, and the remaining 20% is used for testing.
The training set is used to build a model that forecasts medical insurance
costs, and the test set is used to evaluate the regression model. The table
below describes the dataset.
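The 80/20 split used in this study can be sketched in plain Python as follows. This is a minimal illustration, assuming a shuffled random split with a fixed seed; the paper does not say how its split was made, and the function name `split_train_test` and the seed value 42 are hypothetical choices. If scikit-learn was used, `sklearn.model_selection.train_test_split(..., test_size=0.2)` performs the equivalent operation.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(rows: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `rows` and hold out `test_fraction` of them for testing.

    Returns (train, test). The fixed seed makes the split reproducible.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # private RNG: global random state is unchanged
    n_test = math.ceil(len(shuffled) * test_fraction)  # round the test size up
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # 1338 row indices stand in for the 1338 records of the dataset.
    train, test = split_train_test(range(1338))
    print(len(train), len(test))  # 1070 268
```

Rounding the test size up mirrors scikit-learn's convention, so 1338 records give 268 test rows and 1070 training rows.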

Table 1: Overview of the Dataset

Our dataset has 1338 rows and 7 columns. Our target is the charges variable,
which holds floating-point values. The largest group of individuals in our
dataset is aged between 18 and 22.5, and the majority are male. Few have more
than three children, and the most common BMI range is 29.26 to 31.16. The
dataset covers four regions: northeast, northwest, southeast, and southwest.
The largest concentration of smokers is in the southeast; across the whole
dataset, 1064 of the 1338 individuals are non-smokers. We investigate the data
to determine how the various factors are related. Our target column is
"charges", which depends on every other column. We first examine our
dataset's statistical measures.

Table 2: Statistical Measurement
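Statistics of the kind reported in Table 2 (count, mean, standard deviation, minimum, quartiles, and maximum, as produced by pandas' `DataFrame.describe()`) can be computed per numeric column with Python's standard library. The sketch below assumes Table 2 follows the `describe()` convention of a sample standard deviation and linearly interpolated quartiles; the helper name `describe_column` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def describe_column(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for one numeric column, in the style of pandas' describe()."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # "inclusive" quartiles interpolate linearly between data points,
    # which matches pandas' default percentile calculation.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    print(describe_column([18, 25, 33, 40, 64]))
```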

B. Data Analysis
Here are some data visualizations.
Figure-1: Distribution of age values

Figure-2: Sex Distribution

Figure-3: BMI Distribution

Figure-4: Children Count
Figure-5: Checking Smokers and Non-Smokers
Figure-6: Distribution of Charge Values
Only numerical columns are summarized; the standard deviation and mean are absent for the
categorical variables, which we pre-process later. In the "charges" column, the mean is
higher than the median, which implies that health insurance charges are right-skewed.
Visualizing the distribution makes this clear, so we begin by plotting the distribution of
the charges column.
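The mean-versus-median argument can be made quantitative with Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation, which is positive when a long right tail pulls the mean above the median. This is our illustration, not a step the paper reports; the function name and the sample values are hypothetical.

```python
import statistics
from typing import Sequence


def pearson_median_skewness(values: Sequence[float]) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / std.

    Positive values indicate right skew (mean above median), negative values
    indicate left skew, and 0 means the mean and median coincide.
    """
    std = statistics.stdev(values)  # sample standard deviation
    if std == 0:
        raise ValueError("values are constant; skewness is undefined")
    return 3 * (statistics.mean(values) - statistics.median(values)) / std


if __name__ == "__main__":
    # Hypothetical charges: most are modest, one is very large.
    charges = [1200.0, 2500.0, 3100.0, 4000.0, 38000.0]
    print(pearson_median_skewness(charges) > 0)  # True: right-skewed
```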

Figure-7: Visualization of the relationship between two variables

Figure-8: Plot of age, charges, and sex

Figure-9: FacetGrid of children and charges

C. Data Pre-processing
Apart from the target, three columns are numerical and three are
categorical. Our machine learning models cannot be fitted on
categorical text values, so we assign numerical labels to those
categories. In the "sex" field, we change "female" to 1 and "male"
to 0. We likewise convert the other two categorical columns,
"smoker" and "region", to numerical values. The results of the
conversion are shown in the table below.
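A minimal sketch of this label encoding, assuming each record is a Python dictionary keyed by column name. Only the "sex" mapping (female to 1, male to 0) comes from the text; the "smoker" and "region" codes below are hypothetical, since the paper does not list them, and `ENCODINGS` and `encode_record` are illustrative names.

```python
from typing import Dict, Mapping

# Only the "sex" codes are stated in the paper; the "smoker" and "region"
# codes are hypothetical placeholders.
ENCODINGS: Dict[str, Dict[str, int]] = {
    "sex": {"male": 0, "female": 1},
    "smoker": {"no": 0, "yes": 1},
    "region": {"northeast": 0, "northwest": 1, "southeast": 2, "southwest": 3},
}


def encode_record(record: Mapping[str, object]) -> Dict[str, object]:
    """Return a copy of `record` with each categorical field replaced by its integer code."""
    encoded = dict(record)
    for column, codes in ENCODINGS.items():
        value = encoded[column]
        if value not in codes:
            raise ValueError(f"unexpected value {value!r} in column {column!r}")
        encoded[column] = codes[value]
    return encoded


if __name__ == "__main__":
    # An illustrative record, not a row from the dataset.
    print(encode_record({"age": 30, "sex": "female", "bmi": 28.5, "children": 1,
                         "smoker": "no", "region": "southeast", "charges": 5000.0}))
```

One design caveat: integer codes for "region" impose an arbitrary order, which linear models such as Linear, Ridge, and Lasso Regression will treat as meaningful; one-hot encoding is the usual alternative for unordered categories.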

D. Model Specification
The goal of the study is to forecast insurance costs from a variety
of factors: age, sex, number of children, location, BMI, and whether
or not a person smokes. Together, these characteristics allow us to
calculate the price of health insurance. Several regression models
are used in this study for this purpose. The data is split into two
portions: 80% is used for model training and 20% for model testing.
For each model, we compute the Mean Absolute Error (MAE), Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared
value to measure how accurately it predicts costs, and we then
compare the models on these numbers.
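For reference, the four metrics can be computed from a model's test-set predictions as follows. This is a plain-Python sketch with a hypothetical function name; scikit-learn provides equivalent functions (`mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`). R squared is computed with the usual definition, 1 - SS_res / SS_tot.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float],
                       predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and R squared for one model's predictions on the test set."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all actual values are equal")
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    print(regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
```

Because RMSE is the square root of MSE, the XGBoost figures can be cross-checked: the square root of 19806356.6067 is approximately 4450.4333. R squared also becomes negative when a model predicts worse than always guessing the mean of the test targets.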

Table 3: Model Performance

Regression Model               R squared   MAE         RMSE
Linear Model                   0.7447      4267.2138   6191.6908
XGBoost Regression             0.8681      2381.5670   4450.4333
Lasso Regression               0.7447      4267.1646   6191.7253
Random Forest Regression       0.8371      2747.4557   4944.7328
Ridge Regression               0.7448      4273.4540   6190.8000
Decision Tree Regression       0.7003      3324.3656   6708.4718
K-Nearest Neighbors            0.0394      8592.5456   12010.8927
Support Vector Regression      -0.099      6401.6428   12851.5588
Gradient Boosting Regression   0.8679      2383.9140   4453.8285

Our data is first obtained from Kaggle. Using visualization tools, we then
analyze the data. Next, the data is cleaned so that it exactly matches the
input format the machine learning models require. We then apply regression
techniques to the training data. After the models have been tested, they are
ready for cost forecasting. The flowchart that follows illustrates the entire
process.
RESULTS & DISCUSSION
Table 4 displays our top and bottom regression models. According to the findings,
insurance costs can be predicted with the best-performing model. In our case,
XGBoost Regression is the best regression model on all three metrics in Table 3,
while K-Nearest Neighbors is the worst by MAE; by R squared and RMSE, Support
Vector Regression performs worst. Anyone can estimate their insurance expenses
using the best model.
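Which model counts as worst depends on the metric chosen. The sketch below ranks the Table 3 scores, transcribed as printed, by each metric; it is an illustration for checking the table, not part of the paper's pipeline, and the names `TABLE_3` and `best_and_worst` are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Scores(NamedTuple):
    r2: float    # R squared: higher is better
    mae: float   # mean absolute error: lower is better
    rmse: float  # root mean squared error: lower is better


# Test-set scores as printed in Table 3.
TABLE_3: Dict[str, Scores] = {
    "Linear Model": Scores(0.7447, 4267.2138, 6191.6908),
    "XGBoost Regression": Scores(0.8681, 2381.5670, 4450.4333),
    "Lasso Regression": Scores(0.7447, 4267.1646, 6191.7253),
    "Random Forest Regression": Scores(0.8371, 2747.4557, 4944.7328),
    "Ridge Regression": Scores(0.7448, 4273.4540, 6190.8000),
    "Decision Tree Regression": Scores(0.7003, 3324.3656, 6708.4718),
    "K-Nearest Neighbors": Scores(0.0394, 8592.5456, 12010.8927),
    "Support Vector Regression": Scores(-0.099, 6401.6428, 12851.5588),
    "Gradient Boosting Regression": Scores(0.8679, 2383.9140, 4453.8285),
}


def best_and_worst(metric: str) -> Tuple[str, str]:
    """Return (best model, worst model) for one metric: 'r2', 'mae' or 'rmse'."""
    if metric not in Scores._fields:
        raise ValueError(f"unknown metric {metric!r}")
    higher_is_better = metric == "r2"
    ranked = sorted(TABLE_3, key=lambda name: getattr(TABLE_3[name], metric),
                    reverse=higher_is_better)
    return ranked[0], ranked[-1]


if __name__ == "__main__":
    for metric in Scores._fields:
        print(metric, best_and_worst(metric))
    # r2 ('XGBoost Regression', 'Support Vector Regression')
    # mae ('XGBoost Regression', 'K-Nearest Neighbors')
    # rmse ('XGBoost Regression', 'Support Vector Regression')
```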

Figure-10: Predicted Cost Using XGBoost Regression

Figure-11: Predicted Cost Using K-Nearest Neighbors.
Figure-13: Medical Insurance cost prediction system demo result

CONCLUSION
This study applies ML regression models to forecast health insurance prices
from the factors provided in the Kaggle Medical Cost Personal dataset [5].
Table 4 lists the outcomes. By predicting insurance rates from a variety of
factors, insurance companies may attract consumers and save time. Machine
learning can significantly reduce the manual effort involved in price
analysis, since ML models compute costs quickly, whereas doing so by hand
would take a person a long time. Machine learning techniques can also handle
large volumes of data. In the future, the work might be improved by building a
web application based on the XGBoost or Gradient Boosting algorithm and by
using a larger dataset than the one used in this study.

REFERENCES
[1] "Digital Health 150: The Digital Health Startups Transforming the Future
of Healthcare | CB Insights Research," CB Insights Research, 2022.
[Online]. Available: https://www.cbinsights.com/research/report/digital-health-startups-redefining-healthcare.
[Accessed: 10-Sep-2022].
[2] J. H. Lee, "Pricing and reimbursement pathways of new orphan drugs in
South Korea: A longitudinal comparison," Healthcare, Multidisciplinary
Digital Publishing Institute, vol. 9, no. 3, p. 296, 2021.
[3] S. Gupta and P. Tripathi, "An emerging trend of big data analytics with
health insurance in India," in 2016 International Conference on Innovation
and Challenges in Cyber Security (ICICCS-INBUSH), IEEE, Feb. 2016,
pp. 64-69.
[4] N. Shakhovska, S. Fedushko, I. Shvorob and Y. Syerov, "Development of
mobile system for medical recommendations," Procedia Computer
Science, vol. 155, pp. 43–50, 2019.
[5] "Medical Cost Personal Datasets," Kaggle. [Online]. Available:
https://www.kaggle.com/datasets/mirichoi0218/insurance
[6] J. Pesantez-Narvaez, M. Guillen, and M. Alcañiz, "Predicting Motor
Insurance Claims Using Telematics Data—XGBoost versus Logistic
Regression," Risks, vol. 7, no. 2, p. 70, Jun. 2019, doi:
10.3390/risks7020070.
[7] M. Hanafy and O. Mahmoud, "Predict Health Insurance Cost by using
Machine Learning and DNN Regression Models," International Journal of
Innovative Technology and Exploring Engineering, vol. 10, no. 3, pp. 137-
143, 2021, doi: 10.35940/ijitee.c8364.0110321.
