EPD An Integrated Modeling Technique To Classify BC
Abstract- Over the past two decades, breast cancer (BC) has become the second most common cause of cancer death and remains prevalent in low- and middle-income countries. Although many technologies have been developed and implemented in the medical field, none can yet cure the disease completely. Novel techniques are therefore needed to avoid unnecessary deaths by detecting the disease at an early stage. In this study, we draw on key studies of the related cells and risk factors and design a novel EPD (EDA, PCA, and DT) model to classify abnormal cells as either benign (B) or malignant (M). EPD combines three major techniques: Exploratory Data Analysis (EDA) to visualize the raw data, Principal Component Analysis (PCA) to select the most promising features, and a Decision Tree (DT) to predict the disease from these features. The findings suggest that this combined approach serves doctors and healthcare organizations better against BC than the individual techniques.

I. INTRODUCTION
BC is the second most common cause of cancer death worldwide and remains prevalent. Many developed nations have initiated BC screening programs to improve the quality of their medical care. Nevertheless, BC continues to be the top or second leading cause of cancer death in women in those nations, including among those taking part in screening [1, 2]. This demonstrates that too many women are not receiving enough mammographic screening. To lower breast cancer mortality, early detection must be significantly improved. Underdiagnosis, or failing to detect disease at an early enough stage to prevent morbidity and mortality from breast cancer, is the main issue with current breast screening programs. A cancer diagnosis at a metastatic stage can be avoided with early detection through routine screening. There is therefore a need for technologies that reduce avoidable deaths. Figure 1 shows breast mammograms of both sides containing normal and abnormal cells.

Fig. 1. Representation of a human breast mammogram on both left and right-hand sides

Healthcare has already seen huge advances through the use of machines, applications, and other technologies. In this regard, ML and DL technologies have also made a great impact by providing a better diagnosis of the disease at the early stages. Deep learning (DL) has recently been used more and more to diagnose breast cancer and predict treatment outcomes, and the results are optimistic [3–5]. In particular, several studies have used DL on MRI or ultrasound to predict response to neoadjuvant chemotherapy in breast cancer patients [6-12].

Thus, in this research, we aimed to develop a novel ML model to classify the abnormal cells present in the human breast, so that patients can be treated at the initial stage. Although many models have already been developed, we expect our proposed EPD model to prove the best alternative for doctors. Before classification, the raw images must be properly visualized and the most appropriate features selected to get the best results.

II. MATERIAL & METHOD

A. Material
In this research, we used the Wisconsin Breast Cancer Database (WBCD) from the freely accessible “UCI machine learning repository” [13]. The dataset provides details on tumour traits that were calculated from a digitized image of a breast mass obtained by fine-needle aspiration (FNA). Ten features are measured for each observation and describe the tumour's size, density, texture, symmetry, and other aspects of the cell nuclei visible in the image. For each image, the average, standard deviation, and "worst" mean (the mean of the three largest values) of these features were calculated, yielding 30 features. The categorical target feature indicates the tumour's nature, i.e. benign or malignant.

B. Method
Given the extent and amount of data collected in healthcare-related fields, it is essential to make good decisions and support the reporting of outcomes. The effective use of data visualization can affect and facilitate decision-making.
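As a minimal sketch of the material described above, the diagnostic Wisconsin data can be inspected in Python before any modeling. This assumes scikit-learn is installed and that its bundled copy of the dataset (load_breast_cancer) matches the UCI data cited in [13]; the variable names are illustrative and not taken from the paper.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy stands in for the UCI download.
data = load_breast_cancer()
X, y = data.data, data.target  # X: feature matrix, y: integer class labels

# Map the integer labels to their names (0 = "malignant", 1 = "benign").
class_counts = Counter(str(data.target_names[label]) for label in y)

print(X.shape)                        # number of observations and features
print(dict(class_counts))             # observations per class
print(list(data.feature_names[:10]))  # the ten "mean" features
```

The first ten columns are the per-image averages; the remaining twenty hold the standard-error and "worst" versions of the same ten measurements.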
979-8-3503-3936-9/23/$31.00 ©2023 IEEE
Authorized licensed use limited to: Siksha O Anusandhan University. Downloaded on August 11,2023 at 03:50:53 UTC from IEEE Xplore. Restrictions apply.
Additionally, feature selection has become increasingly advantageous for predicting the disease with novel ML models. A Decision Tree (DT) is a tree-like model that makes a decision at each level based on the results of the previous nodes and provides the respective outcome based on those decisions. The algorithm follows conditional statements to perform this operation. By combining all these techniques, as shown in figure 2, our proposed EPD model classifies the disease better than any individual operation.

Fig. 2. Workflow of the EPD model
(diagram labels: WBCD Database; EPD: EDA, PCA, DT; Calculate accuracy_score, recall_score and Design Confusion_matrix; Plot Decision Tree)

1) EDA: The strength of data visualization lies in its capacity to highlight patterns that might otherwise go unnoticed. The method of employing visual techniques to study data is called exploratory data analysis (EDA) [14]. Thus, to perform statistical analysis and to find trends and patterns in the raw data, we take EDA as the first step of our proposed approach. The number of concave points, the perimeter, the area, and the nuclear radius all show strong positive linear relationships with malignancy, as shown in figure 3.
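The linear relationships mentioned in the EDA step can be checked numerically. The following is a hedged sketch rather than the authors' code: it assumes scikit-learn's bundled copy of the dataset and its column names ("mean radius", "mean concave points", and so on), and it uses Pearson correlation as one possible measure of a linear relationship.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
# scikit-learn encodes malignant as 0; flip so that 1 means malignant.
malignant = (data.target == 0).astype(float)

names = list(data.feature_names)
selected = ["mean radius", "mean perimeter", "mean area", "mean concave points"]

# Pearson correlation of each selected feature with malignancy.
corr_with_malignancy = {
    name: float(np.corrcoef(X[:, names.index(name)], malignant)[0, 1])
    for name in selected
}

# Pairwise correlations among the selected features themselves.
cols = [names.index(name) for name in selected]
feature_corr = np.corrcoef(X[:, cols], rowvar=False)

for name, r in corr_with_malignancy.items():
    print(f"{name}: r = {r:.2f}")
print(np.round(feature_corr, 2))
```

Radius, perimeter, and area are geometrically tied to one another, so their pairwise correlations are expected to be close to 1.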
Fig. 4. Feature selection using PCA projection
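A PCA projection of the kind named in this caption can be sketched as follows. The number of components and the standardization step are assumptions made for illustration; the paper does not state how many components were kept.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# PCA is variance-based, so the features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

# Assumption: two components, chosen only so the projection can be plotted.
pca = PCA(n_components=2)
X_proj = pca.fit_transform(X_scaled)

explained = pca.explained_variance_ratio_
print(X_proj.shape)
print(explained, explained.sum())
```

The explained variance ratio indicates how much of the spread in the 30 original features survives in the projection, which is one way to judge how many components are worth keeping.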
3) Apply Decision Tree (DT): The DT is one of the simplest algorithms and can capture a nonlinear relationship between the features and the outcome. Scaling is typically not needed for decision trees. The best parameters, including the depth of the tree, the split criterion, and the minimum number of samples for a leaf node, can be found with the GridSearchCV function in Python (scikit-learn), which exhaustively searches for the optimal model parameters by cross-validated grid search over a parameter grid. Figure 6 shows that concave points (CP) are skewed, whereas all other features are bell-shaped. We therefore built the DT with CP <= 0.049 as the first split, as shown in figure 7. At each node, the left-hand branch is taken when the condition is true and the right-hand branch when it is false; at the first split the left-hand side is labelled “benign” and the right-hand side “malignant”. That means an image is classified as “benign” only if CP <= 0.049 and R_mean <= 14.975, which leads to the leaf with Entropy = 0.154; otherwise it is classified as “malignant”.
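The parameter search described here can be sketched with scikit-learn as below. The 70/30 stratified split, the random seed, and the candidate values in the grid are all assumptions made for illustration (a 70/30 split yields 398 training samples, which agrees with the 239 + 159 samples shown in figure 7, but the paper does not state the split). The metrics are the ones named in the workflow of figure 2.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Assumption: a stratified 70/30 split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hypothetical grid over the three parameters named in the text.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# GridSearchCV refits the best model on the full training set by default.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# pos_label=0 because scikit-learn encodes malignant as 0 in this dataset.
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
cm = confusion_matrix(y_test, y_pred)

print(search.best_params_)
print(accuracy, malignant_recall)
print(cm)
```

Limiting max_depth and raising min_samples_leaf in the grid also acts as a simple guard against overfitting, since both keep the tree from memorizing small groups of training samples.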
CP <= 0.049
    T: R_mean <= 14.975 (E = 0.322, S = 239, V = [225, 14], C = B)
        T: E = 0.154, S = 224, V = [219, 5], C = B
        F: E = 0.971, S = 15, V = [6, 9], C = M
    F: R_mean <= 16.205 (E = 0.58, S = 159, V = [22, 137], C = M)
        T: E = 0.858, S = 78, V = [22, 56], C = M
        F: E = 0.0, S = 81, V = [0, 81], C = M

CP = Concave Points, E = Entropy, S = Samples, V = Value, C = Class, B = Benign, M = Malignant, T = True, F = False

Fig. 7. Tree representation for specifying class level as either “benign” or “malignant”
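The entropy values in figure 7 can be reproduced from the class counts V at each node. The sketch below assumes the usual base-2 Shannon entropy, E = -sum(p_i * log2(p_i)), where p_i is the share of class i at the node; the node labels in the dictionary are illustrative names, not identifiers from the paper.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count:  # an empty class contributes 0 to the sum
            p = count / total
            result -= p * math.log2(p)
    return result


# Class counts V copied from the nodes of figure 7.
nodes = {
    "R_mean <= 14.975": [225, 14],
    "R_mean <= 16.205": [22, 137],
    "leaf 1": [219, 5],
    "leaf 2": [6, 9],
    "leaf 3": [22, 56],
    "leaf 4": [0, 81],
}
node_entropy = {name: round(entropy(v), 3) for name, v in nodes.items()}
for name, e in node_entropy.items():
    print(name, e)
```

A node with entropy 0.0, such as the last leaf, contains samples of a single class only, whereas a value near 1 (the second leaf) marks an almost even mix of the two classes.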
IV. CONCLUSION
As discussed, BC is a highly common disease that affects one in ten women worldwide. Early diagnosis is therefore essential to reduce this mortality. In this article, we have proposed a model, namely EPD, that combines three operations to perform BC classification. We now know from the EDA that area, perimeter, and radius are closely connected. Because of this, it would be preferable to eliminate all of the "worst" features, including perimeter and area. That is why the PCA technique is used to limit unnecessary features. Besides that, the DT decides at each level based on the previous nodes, using conditional statements. Therefore, EPD performs better than each individual operation. Apart from that, figure 7 suggests that the complete model could change if the training set is slightly altered, because trees are extremely sensitive to noise in the input data. This hampers the interpretability of the model. To greatly reduce overfitting in the future, we are working on techniques such as pruning, specifying a minimum number of samples per leaf, and defining a maximum depth for the tree.

REFERENCES
[1] American Cancer Society. Cancer Facts and Figures 2022. Atlanta, Ga: American Cancer Society; 2022. https://www.cancer.org/cancer/breast-cancer/about/how-common-is-breast-cancer.html. Accessed November 30, 2022.
[2] Breast cancer burden in EU-27. ECIS – European Cancer Information System. https://ecis.jrc.ec.europa.eu/pdf/Breast_cancer_factsheet-Oct_2020.pdf. Accessed November 30, 2022.
[3] Q. Hu, H. M. Whitney, and M. L. Giger, “A deep learning methodology for improved breast cancer diagnosis using multiparametric MRI,” Sci. Rep. 10, 10536, 2020. https://doi.org/10.1038/s41598-020-67441-4.
[4] S. Prusty, S. K. Dash, and S. Patnaik, “A novel transfer learning technique for detecting breast cancer mammograms using VGG16 bottleneck feature,” ECS Transactions, 107(1), 733, 2022.
[5] W. C. Ou, D. Polat, and B. E. Dogan, “Deep learning in breast radiology: current progress and future directions,” Eur. Radiol. 31, 4872–4885, 2021. https://doi.org/10.1007/s00330-020-07640-9.
[6] M. El Adoui, S. Drisis, and M. Benjelloun, “Multi-input deep learning architecture for predicting breast tumor response to chemotherapy using quantitative MR images,” Int. J. Comput. Assist. Radiol. Surg. 15, 1491–1500, 2020. https://doi.org/10.1007/s11548-020-02209-9.
[7] S. Prusty, S. Patnaik, and S. K. Dash, “SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer,” Frontiers in Nanotechnology, 4, 972421, 2022.
[8] S. Joo, et al., “Multimodal deep learning models for the prediction of pathologic response to neoadjuvant chemotherapy in breast cancer,” Sci. Rep. 11, 18800, 2021. https://doi.org/10.1038/s41598-021-98408-8.
[9] Y. H. Qu, et al., “Prediction of pathological complete response to neoadjuvant chemotherapy in breast cancer using a deep learning (DL) method,” Thorac. Cancer 11, 651–658, 2020. https://doi.org/10.1111/1759-7714.13309.
[10] S. G. P. Prusty, and S. Prusty, “Time Series Analysis of SAR-Cov-2 Virus in India Using Facebook’s Prophet,” In Meta Heuristic Techniques in Software Engineering and Its Applications: METASOFT 2022 (pp. 72-81). Cham: Springer International Publishing.
[11] J. Gu, et al., “Deep learning radiomics of ultrasonography can predict response to neoadjuvant chemotherapy in breast cancer at an early stage of treatment: A prospective study,” Eur. Radiol. 32, 2099–2109, 2022. https://doi.org/10.1007/s00330-021-08293-y.
[12] M. Jiang, et al., “Ultrasound-based deep learning radiomics in the assessment of pathological complete response to neoadjuvant chemotherapy in locally advanced breast cancer,” Eur. J. Cancer 147, 95–105, 2021. https://doi.org/10.1016/j.ejca.2021.01.028.
[13] UCI Machine Learning Repository, Breast Cancer Wisconsin (Diagnostic). http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
[14] S. Prusty, S. Patnaik, and S. K. Dash, “Exploratory Data Analysis on SARS-CoV-2 Variants in India: especially Omicron (B. 1.1. 529) as of 6th December 2021,” In 2022 International Conference on Decision Aid Sciences and Applications (DASA), 2022, (pp. 94-99). IEEE.
[15] E. Zdravevski, B. Risteska Stojkoska, M. Standl, and H. Schulz, “Automatic machine-learning based identification of jogging periods from accelerometer measurements of adolescents under field conditions,” PLoS ONE 2017, 12, e0184216.