
Diabetes Mellitus: Early Detection using Machine Learning
1st Satish Singh Mekale, 2nd Maumita Chakraborty, 3rd Chiradeep Mukherjee
Computer Science & Engineering, Institute of Engineering & Management, Kolkata, University of Engineering and Management, Kolkata, India



Abstract: This research investigates the use of hyperparameter optimization and ensemble learning in the early detection of diabetes mellitus (DM). Existing methodologies for DM prediction are evaluated through an extensive literature review, with a focus on their strengths and drawbacks. The primary goal is to increase the accuracy of DM prediction by building a robust ensemble model that incorporates several machine learning techniques. The proposed technique optimizes feature selection and hyperparameter tuning via evolutionary algorithms, leading to increased performance. The model is evaluated against several other techniques using accuracy, precision, recall, and F1 score. It is expected that combining genetic algorithm optimization with ensemble learning would significantly improve DM prediction accuracy. At 90%, the accuracy of the ensemble model was higher than that of the individual models: Random Forest (82%), Logistic Regression (80%), and Support Vector Machine (54%). This study advances the area of diabetes prediction and highlights the need for feature selection, hyperparameter tuning, and rigorous ensemble formation in machine learning models used in healthcare.

Keywords: Diabetes Mellitus, Early detection, Ensemble learning, Genetic algorithm

I. INTRODUCTION
Diabetes is a chronic metabolic disorder affecting the body's ability to convert food into energy. One feature that distinguishes this disease is high blood glucose levels, which over time may seriously damage the cardiovascular system, blood vessels, eyes, kidneys, and nerves if ignored. [1] Type 1, type 2, and gestational diabetes are the three subtypes of the general disease known as diabetes [2].

Type 1 diabetes is distinguished by the body producing too little insulin, and type 2 diabetes is characterized by insufficient use or manufacture of insulin. [3] Some of the signs of diabetes include increased thirst and urination, unexplained weight loss, and impaired eyesight. Methods shown to be beneficial in reducing problems associated with type 2 diabetes are available. They include regular physical exercise, a nutrient-rich diet, abstaining from smoking, and maintaining healthy levels of blood pressure and cholesterol (Ghosh et al., 2021). Treatment for diabetes includes regulating blood pressure, cholesterol, and blood glucose levels; it also involves eating a balanced diet and exercising frequently.

Unusually high blood sugar levels are a hallmark of the chronic hormonal disorder known as diabetes mellitus. [4] This is brought on by a relative or absolute insulin deficiency in the body. The causes of diabetes vary, but the large majority of cases are classified as either type 1 or type 2 diabetes.

The hallmark of type 1 diabetes is the immune system destroying the pancreatic cells that produce insulin. This leads to an inadequate supply of insulin, which in turn causes hyperglycaemia, an elevated amount of glucose in the blood. About ten to fifteen percent of those who have diabetes have type 1 diabetes. [5] Type 2 diabetes is distinguished from type 1 diabetes by the presence of peripheral tissue resistance, which causes insulin production to be inconsistent. This kind of diabetes affects 85–90% of all people with a diagnosis of the condition.

Type 2 diabetes symptoms usually manifest later in life, but type 1 diabetes symptoms often manifest during childhood or adolescence. However, the clinical manifestations and duration of the illness may vary widely, and some individuals might not initially exhibit symptoms that are consistent with either type 1 or type 2 diabetes. Late-onset cases of type 1 diabetes develop more slowly than early-onset cases, although the disease may occur at any age. [6] However, the age at which type 2 diabetes is diagnosed is lower than it has ever been, reaching even into childhood and adolescence. In certain situations it may not be possible to arrive at a precise diagnosis until some time has elapsed.

Both forms of diabetes are characterised by an increase in glucose synthesis in the liver and a reduction in glucose absorption in the muscles and adipose tissue when the condition is not well managed. Individuals who have type 1 diabetes are at risk of diabetic ketoacidosis, a serious condition brought on by lipolysis, the breakdown of fat cells. In individuals who have type 2 diabetes, the insulin that is still active usually slows the breakdown of fat and the creation of ketones.

Changes in lifestyle, especially in emerging nations, and a rise in type 1 diabetes in children are making diabetes more widespread globally. [7] Diabetes is prevalent across all levels of athletic competition, and its occurrence is growing as the number of elite athletes rises and type 2 diabetes becomes more common in younger individuals. Furthermore, as a result of advancements in treatment options for both forms of diabetes, a greater number of diabetic individuals are now able to participate in competitions of the highest calibre.

A. OBJECTIVES
i. The main objective is to develop a prediction model for the early detection of diabetes mellitus using machine learning techniques.
ii. To enhance the prediction capability of the model by using a Genetic Algorithm to select the most relevant features from the dataset.
iii. To integrate several models using ensemble learning techniques to improve prediction accuracy and strengthen the robustness of the predictive model.

II. BACKGROUND

A. Genetic Algorithm
The Genetic Algorithm (GA) is a search-based optimization technique founded on the principles of genetics and natural selection. It is widely used to identify optimal or near-optimal solutions to challenging problems that would otherwise take a very long time to solve. It is often used to address optimization problems, as well as in research and machine learning.
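As a rough illustration of how a genetic algorithm can search over feature subsets, the sketch below evolves binary masks in which a 1 keeps a feature and a 0 drops it. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function name `run_ga`, the tournament selection scheme, the parameter defaults, and the toy fitness function (which pretends that features 0, 2, and 5 are the informative ones) are all hypothetical. In practice the fitness would be something like the cross-validated accuracy of a classifier trained on the selected features.

```python
import random

def run_ga(n_features, fitness, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks and return the best mask found."""
    rng = random.Random(seed)
    # Each chromosome is a list of 0/1 flags: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        def select():
            # Tournament selection: the fitter of two random individuals wins.
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                # Single-point crossover: the child inherits from both parents.
                cut = rng.randint(1, n_features - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)

        population = children
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best

# Hypothetical toy fitness: reward the (assumed) informative features and
# charge a small penalty per selected feature to favour compact subsets.
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best_mask = run_ga(n_features=8, fitness=toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

The per-feature penalty in the toy fitness is one simple way to discourage large subsets; a real fitness function would trade off predictive accuracy against subset size in a similar way.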
A genetic algorithm determines its results by modelling the chromosomal behaviour and genetic structure of a population. The underlying ideas are as follows.
• Every chromosome encodes one candidate solution to the problem; the set of all chromosomes makes up the population.
• A fitness function scores each member of the population, and a higher fitness indicates a better solution. The fittest individuals are chosen to be the parents of the next generation, so that good solutions are carried forward.
• Offspring produced by crossover possess traits from both their parents. A mutation is a small random change to a chromosome's genetic structure.

B. Ensemble Learning
Ensemble methods use multiple learning algorithms to achieve better predictive performance than could be obtained from any one of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of a specific, finite set of alternative models, but it typically allows a much more flexible structure to exist among those alternatives. Statistical ensembles in statistical mechanics are typically used to study random phenomena.

III. LITERATURE REVIEW
The study in [8] aimed to use important features, build a machine learning prediction system, and find the classifier that best matches clinical outcomes. The simplest Bayesian model achieved the highest accuracy, 82.30%. In addition, the work generalizes dataset feature selection to improve classification accuracy.

The work in [9] investigated six domains in connection with the detection, diagnosis, and self-management of diabetes mellitus: datasets; data preparation; feature extraction; machine learning-based identification, classification, and diagnosis; AI-based help; and performance evaluation. Researchers working on this topic may find it useful, since it provides detailed information about DM detection and self-management strategies.

The study in [10] used machine learning to examine disease risk factors. Machine learning algorithms efficiently extract information by creating predictive models from the diagnostic medical datasets of diabetes patients. The experiments indicate that the C4.5 decision tree outperformed the other machine learning algorithms in accuracy.

The research in [11] presented a machine learning-based diabetes diagnostic system. The diabetic dataset, an epidemiological dataset constructed from patient history, was used to assess the proposed method. Statistical analysis of the experimental findings showed that the proposed approach can identify diabetes in an e-healthcare setting.

The work in [12] employed deep neural networks to recognize diabetic retinopathy in fundus photographs. The research simplifies neural learning using the AlexNet CNN. The MESSIDOR database contains 1200 fundus images. The training and testing results were summarized in a confusion matrix. The research reported CNN accuracies of 99.3% and 88.3% on the training and testing sets, respectively.

The research in [13] indicates that deep neural networks trained with five-fold and ten-fold cross-validation may be used to diagnose diabetes. The Pima Indian Diabetes (PID) dataset is available in the UCI Machine Learning Repository. Experimental results demonstrate that the proposed method performs well in five-fold cross-validation.

The study in [14] focused mainly on the use of fuzzy support vector machines and F-score feature selection to identify and classify DM. The prediction of diabetic individuals achieved an accuracy of 89.02%. Furthermore, the method optimizes the number of fuzzy rules while maintaining accuracy.

For locations where medical specialists are scarce, the research in [15] offers a proactive diagnostic strategy for DM that aims to support or augment early diagnosis. The experimental results show that the diabetes dataset that was compiled contains useful data for predicting the presence of diabetes mellitus.

IV. METHODOLOGY
The main goal of this study is the early detection of diabetes mellitus. Our approach employs a Genetic Algorithm for feature selection, coupled with hyperparameter tuning using Grid Search CV to refine the models trained on the selected features. Additionally, we leverage ensemble learning techniques to enhance the accuracy of our predictions.

Data set Description
The first dataset originates from the National Institute of Diabetes and Digestive and Kidney Diseases. Its purpose is to predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. The instances were selected from a larger database under several constraints. In particular, all patients are Pima Indian women at least 21 years old.
https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database/code

The second dataset was obtained via direct surveys of patients at the Sylhet Diabetes Hospital in Sylhet, Bangladesh, with physician approval. It includes the signs and symptoms of patients who are newly diagnosed with diabetes or at risk of developing it.
https://archive.ics.uci.edu/dataset/529/early+stage+diabetes+risk+prediction+dataset

Pre-processing:
Data pre-processing identifies missing values and determines whether to impute or remove them. Normalizing or standardizing numerical features ensures that they share a common scale. Categorical variables may need to be encoded numerically before model training. Feature engineering creates or transforms features to improve model prediction.

Feature Selection:
The most relevant parameters for predicting the development of diabetes mellitus are identified during the
feature selection stage using a genetic algorithm (GA). This involves iteratively evaluating a range of feature combinations: the predictive power of each combination is assessed, and the population evolves so that the most informative subsets rank highest. The GA refines the selection through genetic operators such as crossover and mutation, finally choosing the subset of features that contributes most to the prediction task. By using GA-based feature selection, we aim to reduce overfitting, increase model efficiency, and raise diabetes prediction accuracy.

Train test split:
After all pre-processing and GA-based feature selection are finished, the dataset is split into training and testing sets according to a user-specified split ratio. The models are then trained on the training data and tested on the test data.

Model Building:
We develop the model using an ensemble learning technique that combines several base models, such as logistic regression, decision trees, and support vector machines. To increase overall prediction accuracy and robustness, we aggregate the predictions of the individual models using strategies such as bagging, boosting, and stacking. By combining the insights of the various models, our ensemble learning approach seeks to offer a strong tool for the early identification of diabetes mellitus, supporting proactive healthcare interventions and better patient outcomes.

Hyperparameter Tuning:
Hyperparameter tuning uses methods such as Grid Search CV to improve the hyperparameters of the machine learning models. This entails systematically testing different hyperparameter settings to determine which combination produces the best performance for each model. By fine-tuning these parameters, the models may attain higher accuracy and better generalization to previously unseen data, improving their overall performance in prediction tasks.

Train the network:
The proposed ensemble learning model is trained on the training data, as are other models such as logistic regression, random forest, and SVM. We evaluate and compare the performance of the proposed model with that of the existing models. The following metrics may be used to assess the model's performance.

Performance metrics
The effectiveness of a technique is assessed using the accuracy, sensitivity, precision, and F1-score derived from the confusion matrix. In the equations that follow, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Accuracy: The proportion of all instances that were classified correctly.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Sensitivity: Recall, frequently also called sensitivity, is the percentage of actual positive labels that the system detects correctly.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

Precision: The proportion of positive predictions that are correct. Positive predictive value is another name for this idea.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

F1-Score: A single score that combines precision and recall as their harmonic mean.
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{4}$$

Specificity: The proportion of actual negatives that the algorithm classifies correctly.
$$\text{Specificity} = \frac{TN}{TN + FP}$$

Figure 1. Flowchart
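To make the evaluation step concrete, the sketch below computes the metrics of this section from confusion-matrix counts and applies them to a simple hard-voting ensemble. It is a minimal sketch under stated assumptions. Hard (majority) voting is used here only because it is the simplest aggregation rule; it is not bagging, boosting, or stacking, and it is not necessarily the aggregation used in this study. The labels and per-model predictions are made-up illustrative values, not results from either dataset, and the helper names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the most common predicted label."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, precision, F1-score, and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def safe_div(a, b):
        # Avoid division by zero when a class is never present or predicted.
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "specificity": safe_div(tn, tn + fp),
    }

# Made-up labels and predictions from three hypothetical base models.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 1, 1, 0]
model_b = [1, 0, 0, 1, 0, 0, 1, 1]
model_c = [0, 0, 1, 1, 0, 0, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
for name, pred in [("A", model_a), ("B", model_b),
                   ("C", model_c), ("ensemble", ensemble)]:
    print(name, metrics(y_true, pred))
```

In this toy example each base model errs on different samples, so the majority vote corrects every individual mistake. Real base models tend to make correlated errors, so the gain from voting is usually smaller.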

