
Original Investigation

Early Detection of Breast Cancer in MRI Using AI

Lukas Hirsch, Yu Huang, Hernan A. Makse, Danny F. Martinez, Mary Hughes, Sarah Eskreis-Winkler, Katja Pinker, Elizabeth A. Morris, Lucas C. Parra, Elizabeth J. Sutton

Rationale and Objectives: To develop and evaluate an AI algorithm that detects breast cancer in MRI scans up to one year before radiologists typically identify it, potentially enhancing early detection in high-risk women.

Materials and Methods: A convolutional neural network (CNN) AI model, pre-trained on breast MRI data, was fine-tuned using a retrospective dataset of 3029 MRI scans from 910 patients. These contained 115 cancers that were diagnosed within one year of a negative MRI. The model aimed to identify these cancers, with the goal of predicting cancer development up to one year in advance. The network was fine-tuned and tested with 10-fold cross-validation. Mean age of patients was 52 years (range, 18–88 years), with an average follow-up of 4.3 years (range, 1–12 years).

Results: The AI detected cancers one year earlier with an area under the ROC curve of 0.72 (0.67–0.76). Retrospective analysis by a radiologist of the top 10% highest-risk MRIs, as ranked by the AI, could have increased early detection by up to 30% (35/115, CI: 22.2–39.7%, 30% sensitivity). A radiologist identified a visual correlate to biopsy-proven cancers in 83 of the 115 prior-year MRIs (83/115, CI: 62.1–79.4%). The AI algorithm identified the anatomic region where cancer would be detected in 66 cases (66/115, CI: 47.8–66.5%), with both agreeing in 54 cases (54/115, CI: 37.5–56.4%).

Conclusion: This novel AI-aided re-evaluation of "benign" breasts shows promise for improving early breast cancer detection with MRI. As datasets grow and image quality improves, this approach is expected to become even more impactful.

Key Words: Breast cancer; Magnetic resonance imaging; Early detection; Deep learning.
© 2024 The Association of University Radiologists. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
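The proportion confidence intervals quoted in this article are exact binomial (Clopper-Pearson) intervals (see Confidence Intervals in the Methods). As an illustrative sketch, not taken from the authors' released code, the interval for the 35/115 early detections can be reproduced with the Python standard library alone by bisection on the binomial tail probabilities:

```python
from math import comb

def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); exact summation, fine for modest n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) CI for a binomial proportion k/n.

    The lower bound solves P(X >= k | p) = alpha/2 and the upper bound
    solves P(X >= k+1 | p) = 1 - alpha/2; both tails are monotone in p,
    so each bound is found by simple bisection.
    """
    def solve(target: float, tail) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if tail(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(alpha / 2, lambda p: binom_tail_ge(k, n, p))
    upper = 1.0 if k == n else solve(1 - alpha / 2, lambda p: binom_tail_ge(k + 1, n, p))
    return lower, upper

lo, hi = clopper_pearson(35, 115)
print(f"35/115: {lo:.1%} - {hi:.1%}")  # approximately 22.2% - 39.7%
```

Running this reproduces approximately 22.2–39.7%, the CI reported for the 30%-sensitivity operating point; the other ratios in the abstract can be checked the same way.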

SUMMARY STATEMENT

Predicting the probability of developing breast cancer from MRI using AI has the potential to meaningfully improve early detection.

Acad Radiol 2025; 32:1218–1225
From the City College of New York, 160 Convent Ave, New York, New York 10031, USA (L.H., Y.H., H.A.M., L.C.P.); Memorial Sloan Kettering Cancer Center, 300 E 66th St, Floors 1–4, New York, New York 10065, USA (D.F.M., M.H., S.E.-W., K.P., E.A.M., E.J.S.); University of California, Davis, 1 Shields Ave, Davis, California 95616, USA (E.A.M.). Received July 22, 2024; revised October 11, 2024; accepted October 12, 2024. Address correspondence to: L.C.P. e-mail: [email protected]
© 2024 The Association of University Radiologists. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). https://doi.org/10.1016/j.acra.2024.10.014

INTRODUCTION

In the United States, over 500,000 women undergo annual supplemental screening breast MRI (1). Women are enrolled in such supplemental MRI screening because they are at increased risk of breast cancer (2). After a negative screening mammogram, supplemental screening MRI can detect an additional 15–18 cancers per 1000 high-risk women (3). Due to the higher sensitivity of MRI over mammography (4,5), we propose that it may be possible to estimate an individual's risk of developing cancer within the next year, based on their most recent negative screening MRI.

Retrospective studies suggest that 34–47% of detected cancers were present already in prior MRI exams (6–8). More recent studies report visual correlates in the preceding year in up to 75% of screen-detected cancers (9–11). While these studies benefit from the hindsight of retrospective review, they suggest an opportunity for earlier lesion detection, even though radiologists already demonstrate excellent sensitivity with MRI (12). In addition to local cancer signs, global risk factors may be identifiable. For example, background parenchymal enhancement may be a predictor of breast cancer development (13–16). While it is not clear how relevant global features are for short-term prediction, they are in principle available for automated analysis.

Our proposed framework hypothesizes that current MRI exams contain information about the outcome of the next annual screening, and AI can predict breast cancer occurrence based on a current benign MRI. We intend for the AI to evaluate all MRIs deemed normal or probably benign by radiologists within the

Academic Radiology, Vol 32, No 3, March 2025 EARLY DETECTION OF BREAST CANCER IN MRI

high-risk screening population. By analyzing the current MRI, the AI identifies higher probability cases, which a radiologist can then re-evaluate. This study will determine in a retrospective analysis how many cancers could potentially have been detected early through this re-evaluation. Additionally, the network identified regions of concern. We evaluate whether these match the location of future cancers.

MATERIALS AND METHODS

Patient Sample

The evaluation used retrospective data from 910 women who underwent breast MRI at a tertiary Cancer Center in the United States, for screening purposes, with consecutive screening exams of up to 12 years (Fig. S1). We selected patients with sagittal-plane MRIs in the period between 2002 and 2014. The use of this retrospective data was approved by the institutional review board with a waiver of informed consent, and all procedures were HIPAA compliant. Patient information was removed, and MRIs were saved with anonymized identifiers before analysis.

Inclusion criteria for all MRIs were a BI-RADS assessment ≤ 3 and a follow-up exam within 15 months (referred to as "1 year"). We included all individuals with screen-detected cancer at the follow-up exam, alongside a group of randomly selected screening patients with no cancer detected during that timeframe. This resulted in 3029 sagittal MRIs. These sagittal MRIs were separated into left and right breast images, with the exception of unilateral studies (there were 930 unilateral exams). This yielded 5128 individual breast images (Fig 1). Breasts were labeled "benign" (n = 4965) if two years of imaging with BI-RADS ≤ 3 or a negative biopsy were documented. Breasts were labeled (future) "malignant" if the follow-up MRI led to a malignant pathology finding (n = 163), noting that all were initially deemed benign.

A breast radiologist with 10 years of clinical experience reviewed all 163 MRIs of breasts that developed cancer within a year and excluded 48 meeting the following criteria: MRI at diagnosis unavailable (n = 30), post-lumpectomy change (n = 6), axillary recurrence (n = 2), or biopsy change obscuring visualization (n = 10). This left 115 breasts (from 112 patients) for cancer location analysis and BI-RADS feature evaluation (Fig 1). The radiologist identified the cancer's anatomical location as the slice number containing the largest lesion ("index slice").

Lesion Sizes

To obtain an unbiased estimate of lesion size at both time points, we used an automatic

Figure 1. Patient sample. Partitions for training and testing of the AI algorithm were done per patient. Results were evaluated per breast and
per exam.


segmentation tool that has been previously validated (17). We selected a contiguous segmented area at the location of the index lesion, and measured its length along the principal axis in 2D. This metric was confirmed in four cases measured by the radiologist using a conventional clinical approach (see Fig. S2).

Development of a Cancer Detection and Localization Network

The low prevalence of cancer in the screening population poses challenges for both detection (18,19) and network training due to the large number of parameters. To address this, we leveraged an existing network that had been trained on a large dataset of 11,000 patients from the same clinical site (20). For details and hyperparameters of the pre-trained model see the supplementary material. The patients from this earlier work did not overlap with the current patient cohort. This pre-trained 2D convolutional neural network is designed to detect breast cancer in current MRIs by assigning a probability of containing cancer to each 2D sagittal slice, with the overall output being the maximum probability across all slices of a breast. We fine-tuned this network to predict cancer in subsequent MRIs using the data described previously. We employed 10-fold cross-validation for fine-tuning and testing. Patients do not overlap between training and testing, to prevent bias when estimating generalization to unseen patients. However, in the training set there can be multiple images from the same patients, including previous exams and the contralateral breast. This means that similar images may have both benign and malignant labels. This is a form of within-patient control that may help identify subtle differences without overtraining on accidental differences between patients. Each fold used 90% of patient MRIs for training and 10% for testing. Due to the small data size resulting from low prevalence, no validation set was used. We fine-tuned only the last two layers (610 parameters) and used a fixed 50 epochs of training, as we did not expect significant overtraining with this parameter count. This was confirmed in a post-hoc analysis by re-training the same folds, reserving 10% of the data in each fold to track loss during training. We observed no increase in the validation loss during training (Fig. S3). Positive examples were taken from the index slice where cancer eventually developed (we were certain that it contained a tumor for this slice only), while negative examples used a slice from the center of benign breasts and another randomly selected slice (this sufficed as we had a large number of benign exams) (see Methods, Localization of Cancer Lesions, and Fig. S4). To compensate for strong class imbalance, we enriched malignant cases and used a "focal loss" for training (21). A high gamma parameter in the focal loss (we used gamma = 5) reduces the loss for samples with high certainty for the correct class, while increasing it for those with low certainty. This mitigates class imbalance by focusing on more challenging rare examples, rather than the abundant easy examples. Details on data acquisition, preprocessing, harmonization, and demographics are provided in the Supplement.

Localization of Cancer Lesions

The 2D CNN estimates the probability of future cancer detection for each slice in an MRI volume, thus predicting the location of cancer as the slice with the highest probability. We used the segmentation of the tumor in the follow-up MRI (see Lesion Sizes previously) and coregistered it to the current MRI using the "NiftyReg" software (22) (see Supplement). We considered the machine's prediction "correct" if the highest predicted probability fell within one slice of the segmented lesion location; otherwise it was a "miss". We also used the resulting segmentation to compute the size, which was incorporated as a feature into the BI-RADS assessment process (see Table S1).

Cost and Benefit in Early Detection

A variety of measures are used to quantify performance in binary classification problems. All measures are defined based on four possible outcomes, namely, the number of classifications that are true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Here, "positive" means that cancer will be present in the next exam and "negative" means that cancer will not be present. The traditional approach in the context of diagnosis is to quantify the tradeoff between sensitivity and specificity:

Sensitivity = TP/(TP + FN) (1)

Specificity = TN/(TN + FP) (2)

This is captured by the conventional receiver operating characteristic (ROC) curve. In the context of early detection, sensitivity can be seen as a "fraction of re-evaluated cancers", following our proposal to re-evaluate the highest risk exams. These re-evaluations may lead to early detections, and therefore we take sensitivity as a measure of the 'benefit' of re-evaluations. On the flip side, re-evaluating exams comes at a 'cost' in the sense of additional work for radiologists, which we quantify with the:

Re-evaluation rate = (TP + FP)/(TP + FP + TN + FN) (3)

Due to the low prevalence of cancers, we note that the re-evaluation rate is approximately equal to 1 − specificity (Fig. S5). Therefore, this cost-benefit tradeoff is approximately captured by the ROC curve.

An additional 'cost' is the potential recalls for a biopsy that will yield a benign pathology. The worst-case cost would be incurred if all re-evaluations resulted in a recall. This worst-case cost is captured by the false discovery rate (FDR):

False discovery rate = FP/(TP + FP) (4)

In practice, however, we expect fewer recalls upon re-evaluation. FDR can be selected based on the traditional performance metric for recalls, namely, the positive predictive value (PPV) (23), since FDR = 1 − PPV. In statistics, sensitivity is


Figure 2. AI-estimated probability of developing breast cancer one year in advance from the current cancer-free breast MRI in a clinical screening population, and cost-benefit analysis. (a) Cross-validation ROC curve for 12-month cancer prediction (cross-validation performance). The suggested operating point for sensitivity is selected at 30% (circle), resulting in a specificity of 90%. (b) Distribution of future screening outcomes for all breasts based on AI-derived probability from the current cancer-free MRI. The histogram is in logarithmic scale to better visualize the low prevalence of screen-detected cancers (n = 115). (c) Trade-off between the false discovery rate (FDR) and sensitivity (effectively, a "precision-recall" curve). One can select the operating point based on the desired benefit (sensitivity, vertical arrow) or, alternatively, the acceptable cost (FDR, horizontal arrow). (d) Relation between FDR and AI probability to determine the decision threshold for re-evaluations. The operating point in panel (a) (circle) corresponds to a decision threshold of 0.64 in panel (b) (dotted line).

also known as "recall", while 1 − FDR is known as precision (beware that "recall" carries a different meaning in statistics and radiology). Therefore, this cost-benefit tradeoff, captured by sensitivity and FDR, represents a precision-recall relationship, which is recommended in low-prevalence scenarios where ROC analysis is less appropriate (24,25).

Confidence Intervals

All confidence intervals represent 95% confidence. For ratios, they were computed using the Clopper-Pearson exact method. AUC-ROC values were computed using bootstrapping with replacement.

RESULTS

We trained a network to predict the outcome of the next scheduled screening from the current MRI, distinguishing 115 future screen-detected cancers from 4965 breasts that remained benign. The distribution of AI-predicted cancer probability in this cohort is shown in Figure 2b. On this test set, the network


Figure 3. Localization and predicted probability of future cancers. Each of the four panels shows the healthy breast in the current MRI (left)
and the cancer in the subsequent MRI (right), with the cancer highlighted in yellow. In the current MRI, the slice is selected by AI, while in the
subsequent MRI, it is selected by the radiologist. The numeric value in the top-left corner indicates the predicted one-year cancer risk for this
breast. N indicates the number of screen-detected cancers in each category, totaling 115. Panels on top show true positive predictions with
matching (a) or non-matching localizations (b), and panels on the left show matching localization for successful (a) or missed (c) early
detections. (Color version of figure is available online.)

achieved an area under the receiver operating characteristic curve (ROC–AUC) of 0.72 (CI: 0.67–0.77, N = 115, standard deviation across the 10 folds = 0.07) when evaluating individual breast MRIs (Fig 2a). When evaluating at the exam level (n = 112, with 3 exams having malignancy in both breasts), the AUC was 0.66 (CI: 0.61–0.71). This prediction task is considerably more challenging than diagnosis on the current exam, as it focuses on breasts deemed cancer-free by radiologists (92% of MRIs had BI-RADS ≤ 3), and cancers only become apparent within a year in 2% of cases (Fig 4), consistent with reported cancer rates (26).

We propose to re-evaluate high-risk cases that exceed a given probability of developing cancer as predicted by the AI (Fig 2b, dotted line). To determine this decision threshold, we consider the tradeoff between benefit and costs as defined in the Methods. If we select the desired benefit (sensitivity), we incur a cost in terms of radiologist time (re-evaluation rate) and a worst-case cost of recalls (false discovery rate, FDR). As an example, we selected a sensitivity of 30% (Fig 2c, vertical arrow), which means potentially detecting one-third of all cancers early. This results in a FDR of 94% (Fig 2c, horizontal arrow), which corresponds to a positive predictive value (PPV) of 6%. The cost in terms of radiologist time can be read from the ROC curve (Fig 2a, see Methods). At a sensitivity of 30%, we obtain a specificity of 90%, which corresponds to re-evaluating approximately the top 10% of all cases. At this sensitivity level, there were 35 true positives (early detections) and 80 false negatives (N = 115 total). When evaluating at the level of exams, a 10% re-evaluation rate could potentially detect cancer earlier in 23% of cancers in this cohort (circle in Fig. S6b).

Cancer Localization

The network assigns a probability of future cancer presence to each MRI slice, potentially guiding radiologists during re-evaluation for decision referral. We evaluated the accuracy of this localization using the index slice as ground truth (see Methods, Localization of Cancer Lesions).

Examples of correct vs. incorrect AI localization are shown in Figure 3 (left vs. right). First, we evaluated localization independently of classification. Overall, the AI selected the correct location for future cancer in 57% of cases (66/115, CI: 47.8–66.5%). Next, we analyzed localization separately in true positive and false negative early detections. Of the 35 true positives, 25 had correct localization (Fig 3a), indicating the AI correctly localized 71% of the cases recommended for re-evaluation (25/35, CI: 53.7–85.4%; reported in Fig 4 as "Correct location and detection"). These may be easier for radiologists to detect upon re-evaluation. In 10/35 true positive cases, the network did not select the correct slice, focusing instead on a different mass (Fig 3b). Closer examination of these 10 cases revealed that in all instances, a visually more suspicious mass


Figure 4. Summary of early detection and localization results. Each circle represents the total number of breasts examined for screening. Areas are scaled to the fraction of cases. Left: of all benign exams, most will remain benign at the next screening exam (green area) and a small fraction will have a cancer diagnosis (2%, in orange). The AI tool suggests re-evaluating 10% of breasts (blue circle). Center: of all cancers that will be detected in the subsequent screening exam, the AI tool recommends re-evaluating 30% (blue overlap: "AI: Correct detection upon re-evaluation"). The AI tool also correctly flagged the location where cancers would be found the next year in 57% of all cancers (red circle: "AI: Correct location"). Right: of the correct detections recommended for re-evaluation (blue circle), a large portion were also correctly localized (71% overlap between red and blue circles: "AI: Correct location and detection upon re-evaluation"). (Color version of figure is available online.)

influenced the model's decision. Notably, in four out of these 10 cases, the model located the correct lesion with a probability above the high-risk threshold of 0.64 (Fig. S7). Among the 80 false negatives, the AI selected the correct slice in over half (41/80, Fig 3c). While these cases had a probability below the detection threshold (Fig 2a), they likely had evidence of future malignancy at the correct location. For some false negatives, the AI did not correctly locate the future cancer (Fig 3d), representing genuinely challenging cases with no obvious evidence of future malignancy reported by the AI.

Figure 4 summarizes the proposed approach and results. After the initial reading by the radiologist, BI-RADS 1–3 cases are evaluated by AI (Fig 4, green). Of these, 2% develop cancer in the next year's exam (Fig 4, orange; 27.3% of these were BI-RADS 3). The AI ranks cases by the probability of developing cancer. If selecting the top 10% for re-evaluation, a radiologist would see 30% of all cancers detected in the subsequent exams (Fig 4: "AI: Correct detection"). AI-flagged regions of concern contained the correct cancer location in 57% of cases.

Characteristics of Detected Cancers

A breast radiologist retrospectively reviewed all cancer cases (N = 115; see exclusion criteria in Methods) without having access to the AI output. We refer to a "visual correlate" as a visible abnormality at the same location where the cancer is detected in the subsequent exam (examples in Fig. S8). Of the total 115 tumors, the radiologist identified visual correlates on the prior MRI in 83 cases, the machine flagged 66 correct locations, and both agreed in 54 cases. Of the lesions identified on directed review, 77.4% (89/115, CI: 68.7–84.7%) were less than 0.5 cm and significantly smaller on average than at the time of diagnosis (0.56 ± 0.25 cm vs. 1.02 ± 0.56 cm, t(156) = −5.5, p = 1e-7). The radiologist also provided BI-RADS features for all visible lesions on pre-diagnosis and diagnostic MRIs (Table S1), separated by AI-determined probability. The AI model was more likely to assign low probability to images without a visual correlate of cancer (Table S1 note 1, "No visual correlate": 38% vs. 6% of cases, z = 3.5, p = 0.00046). A greater percentage of mass-type lesions were classified as high-probability by the model (Table S1, note 2, 21% vs. 6%, z = 2.3, p = 0.02). There were no notable differences between high- and low-probability AI predictions in terms of shape, margin, internal enhancement, T2 signal, or distribution between focus- and mass-type lesions. Similarly, non-mass enhancement lesions exhibited no notable differences in their characteristics across the two probability groups. We did not find any significant difference in the cancer pathology between high- vs. low-probability AI predictions (Fig. S9).

Finally, we noted that neither age nor family history of cancer was predictive of cancer in this patient sample (age: rank-sum test, p = 0.09, W = 1.71; family history: Chi-square test statistic = 0.43, p = 0.51, df = 1). AUC-ROC was not significantly different when demographic information was held constant (Fig. S10).

DISCUSSION

We presented a method to forecast the likelihood of breast cancer developing within one year based on the current MRI. Re-evaluating 10% of cases with the highest AI-predicted probability would have captured 30% of cases diagnosed in the next screening exam. The radiologist may decide on a shorter follow-up or a biopsy based on these re-evaluations. Upon directed review, the radiologist identified a visual correlate on the MRI taken prior to diagnosis in 72% of cases, which is consistent with published rates (9). In most cases, the visual correlates were less than 0.5 cm. However, these should not be definitively considered "misses" or "false negative" MRIs, since a critical part of screening for early


detection is diagnosing an interval change. Such interval change is considered suspicious and guides management for all these cases.

It is important to note that the sensitivity of 30% reported here is for cancers that would otherwise have remained undetected until the next exam. These are additional cancers above and beyond those that have already been detected with high sensitivity by the radiologist. We have selected an operating point for the AI based on a desired sensitivity, and reported the resulting cost in terms of re-evaluation rate. Alternatively, we could have selected an acceptable cost and noted the resulting benefit. For instance, the recommended benchmark for radiologists currently is a PPV of 15% for tissue diagnosis, and a PPV of 4.4% for abnormal interpretation (27). At a sensitivity of 30%, the PPV for re-evaluation based on the AI would have been 6%. If radiologists recalled only half of these re-evaluated cases, they would approximate the recommended PPV for tissue diagnosis, and detect at least an additional 15% of tumors, which is a clinically meaningful improvement.

Within a high-risk population, we observed a ROC–AUC of 0.72 in predicting cancer at a 1-year follow-up (0–15 months) using MRI. It is worth noting that recent studies utilizing AI for mammography have reported AUC values ranging from 0.66 to 0.84 in predicting outcomes 1–2 years in advance (28–30). An independent validation reported 1-year prediction AUCs of 0.62–0.71 for various AI models (31). Similar AUC results (0.68–0.73) were obtained in a multi-institutional validation study employing MIRAI (32). However, it is crucial to recognize that all these studies were conducted within a general mammographic screening population. In contrast, predicting cancer development within the high-risk population may be more challenging. This population undergoes highly sensitive yearly MRI screening (33), and cancers have already been removed earlier through mammographic screening. We are not aware of short-term prediction studies based on MRI in this population. The only available study reported an AUC of 0.63 at a 5-year follow-up (34), highlighting the complexity of this task compared to the broader mammography population. We have used deep-learning methods here, but it is worth noting that traditional machine-learning techniques may still be beneficial for lesion characterization, diagnosis, and segmentation, especially when only smaller datasets are available (35–37).

Artificial intelligence research often focuses on algorithm performance rather than clinically relevant outcomes (38). For example, most cancer diagnosis studies only report ROC–AUC, and MRI risk prediction studies often focus on AUC-ROC (34) without considering clinical workflow impact. More recent studies report the concordance index (c-index) (28). These metrics quantify AI performance but do not directly address the effort-benefit tradeoff or early detection implications. For breast cancer, early detection is crucial for improving treatment outcomes, making even small increases impactful. However, re-evaluating MRIs takes effort. Therefore, the benefit of early detection (added sensitivity) must be balanced against the effort of the re-evaluation rate, approximately 1 minus specificity at low prevalence. We suggest a 10% re-evaluation rate might be acceptable if the benefit is 30% earlier cancer detection. The ROC curve, approximating an effort-benefit curve, allows selecting a different tradeoff if desired.

Limitations of this study include a relatively small number of screen-detected cancers and the inclusion of only sagittal scans from one clinical site. Performance was reported primarily for individual breasts, as we suggest direct re-evaluation of the relevant breast. However, it is more difficult to declare a patient cancer-free in both breasts, which explains the drop in AUC for exam-level performance. AI performance and robustness will likely improve with the higher-resolution axial MRIs now routine in clinical practice, with the use of prior-year MRIs to assess lesion changes, and with the growing multi-site datasets necessary for robust deep learning. Nevertheless, this study provides proof-of-principle and baseline performance for early detection.

DATA AVAILABILITY

Datasets analyzed in the current study are not public due to patient confidentiality. However, risk prediction and outcome information for statistical evaluation of the results are available together with the code.

DECLARATION OF COMPETING INTEREST

All authors declare no financial or non-financial competing interests.

ACKNOWLEDGMENT

We want to thank Joanne Chin for extensive and thorough proofreading of earlier versions of this manuscript.

AUTHOR CONTRIBUTIONS

LH designed the computational methods, analyzed the data, programmed the network, generated figures, and wrote the manuscript. YH performed all the image preprocessing. HAM contributed to the design of the study and models, as well as writing the manuscript. MH segmented images. DM provided imaging and clinical data. LCP designed the overall approach and analysis methods and wrote the manuscript. EJS provided imaging and clinical data, evaluated the predictions of the network, evaluated BI-RADS features on all cancers, and edited the manuscript. SEW and KP provided extensive input to the manuscript. EM's contributions include formulating the overall study design, data anonymization and curation, result interpretation, and manuscript review.
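The effort-benefit bookkeeping used in the Discussion follows Eqs. (1)–(4) of the Methods. The sketch below simply restates those definitions in Python; the confusion-matrix counts are illustrative only, chosen to mimic the cohort size (115 future cancers among 5080 analyzed breasts) at the 30%-sensitivity, 90%-specificity operating point, and are not the study's actual counts:

```python
def cost_benefit(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Cost-benefit quantities from Eqs. (1)-(4): sensitivity (benefit),
    specificity, re-evaluation rate (radiologist effort), and false
    discovery rate (worst-case recall cost, equal to 1 - PPV)."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),            # Eq. (1)
        "specificity": tn / (tn + fp),            # Eq. (2)
        "re_evaluation_rate": (tp + fp) / total,  # Eq. (3)
        "fdr": fp / (tp + fp),                    # Eq. (4)
    }

# Illustrative counts (not the study's confusion matrix): 115 cancers with
# 30% sensitivity gives tp = 35, fn = 80; 4965 benign breasts with 90%
# specificity gives tn = 4468, fp = 497.
m = cost_benefit(tp=35, fp=497, tn=4468, fn=80)
print(m)

# At low prevalence the re-evaluation rate is close to 1 - specificity,
# which is why the ROC curve doubles as an effort-benefit curve.
assert abs(m["re_evaluation_rate"] - (1 - m["specificity"])) < 0.01
```

Because cancers are rare, the flagged set TP + FP is dominated by false positives, so the re-evaluation "cost" tracks 1 − specificity and the FDR stays high even at a clinically useful sensitivity.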


CODE AVAILABILITY

The underlying code for the network, the trained parameters, and the code and data needed for the statistical analysis of the results are available on GitHub: lkshrsch/BreastCancerDiagnosisMRI.

APPENDIX A. SUPPORTING INFORMATION

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.acra.2024.10.014.

REFERENCES

1. Wernli KJ, DeMartini WB, Ichikawa L, et al. Patterns of breast magnetic resonance imaging use in community practice. JAMA Intern Med 2014; 174:125–132.
2. Bevers TB, Helvie M, Bonaccio E, et al. Breast cancer screening and diagnosis, Version 3.2018, NCCN clinical practice guidelines in oncology. J Natl Compr Cancer Netw 2018; 16:1362–1389.
3. Chiarelli AM, Prummel MV, Muradali D, et al. Effectiveness of screening with annual magnetic resonance imaging and mammography: results of the initial screen from the Ontario High Risk Breast Screening Program. J Clin Oncol 2014; 32:2224–2230.
4. Roganovic D, Djilas D, Vujnovic S, Pavic D, Stojanov D. Breast MRI, digital mammography and breast tomosynthesis: comparison of three methods for early detection of breast cancer. Bosn J Basic Med Sci 2015; 15:64–68.
5. Zhang Y, Ren H. Meta-analysis of diagnostic accuracy of magnetic resonance imaging and mammography for breast cancer. J Cancer Res Ther 2017; 13:862–868.
6. Vreemann S, Gubern-Merida A, Lardenoije S, et al. The frequency of missed breast cancers in women participating in a high-risk MRI screening program. Breast Cancer Res Treat 2018; 169:323–331.
7. Yamaguchi K, Schacht D, Newstead GM, et al. Breast cancer detected on an incident (second or subsequent) round of screening MRI: MRI features of false-negative cases. Am J Roentgenol 2013; 201:1155–1163.
8. Pages EB, Millet I, Hoa D, Doyon FC, Taourel P. Undiagnosed breast cancer at MR imaging: analysis of causes. Radiology 2012; 264:40–50.
9. Korhonen KE, Zuckerman SP, Weinstein SP, et al. Breast MRI: false-negative results and missed opportunities. RadioGraphics 2021; 41:645–664.
10. Gubern-Mérida A, Vreemann S, Martí R, et al. Automated detection of breast cancer in false-negative screening MRI studies from women at increased risk. Eur J Radiol 2016; 85:472–479.
11. Gilbert FJ, Warren RML, Kwan-Lim G, et al. Cancers in BRCA1 and BRCA2 carriers and in women at high risk for breast cancer: MR imaging and mammographic features. Radiology 2009; 252:358–368.
12. Zhang L, Tang M, Min Z, Lu J, Lei X, Zhang X. Accuracy of combined dynamic contrast-enhanced magnetic resonance imaging and diffusion-weighted imaging for breast cancer detection: a meta-analysis. Acta Radiol 2016; 57:651–660.
13. King V, Brooks JD, Bernstein JL, Reiner AS, Pike MC, Morris EA. Background parenchymal enhancement at breast MR imaging and breast cancer risk. Radiology 2011; 260:50–60.
14. Pike MC, Pearce CL. Mammographic density, MRI background parenchymal enhancement and breast cancer risk. Ann Oncol 2013; 24:viii37–viii41.
15. Dontchos BN, Rahbar H, Partridge SC, et al. Are qualitative assessments of background parenchymal enhancement, amount of fibroglandular tissue on MR images, and mammographic density associated with breast cancer risk? Radiology 2015; 276:371–380.
16. Hu X, Jiang L, You C, Gu Y. Fibroglandular tissue and background parenchymal enhancement on breast MR imaging correlates with breast cancer. Front Oncol 2021; 11:616716.
17. Hirsch L, Huang Y, Luo S, et al. Radiologist-level performance by using deep learning for segmentation of breast cancers on MRI scans. Radiol Artif Intell 2022; 4(1).
18. Laws A, Mulvey TM, Jalbert N, et al. Baseline screening MRI uptake and findings in women with ≥ 20% lifetime risk of breast cancer. Ann Surg Oncol 2020; 27:3595–3602.
19. Ghoncheh M, Pournamdar Z, Salehiniya H. Incidence and mortality and epidemiology of breast cancer in the world. Asian Pac J Cancer Prev 2016; 17:43–46.
20. Hirsch L. Breast Cancer Detection on MRI with a Deep Neural Network. GitHub repository, https://github.com/lkshrsch/BreastCancerDiagnosisMRI; 2023.
21. Lin T-Y, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision; 2017:2980–2988.
22. Modat M, Ridgway GR, Taylor ZA, et al. Fast free-form deformation using graphics processing units. Comput Methods Programs Biomed 2010; 98:278–284.
23. Monaghan TF, Rahman SN, Agudelo CW, et al. Foundational statistical principles in medical research: sensitivity, specificity, positive predictive value, and negative predictive value. Medicina 2021; 57:503.
24. Ozenne B, Subtil F, Maucort-Boulch D. The precision–recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol 2015; 68:855–859.
25. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 2015; 10:e0118432.
26. Fazeli S, Snyder BS, Gareen IF, et al. Patient-reported testing burden of breast magnetic resonance imaging among women with ductal carcinoma in situ: an ancillary study of the ECOG-ACRIN Cancer Research Group (E4112). JAMA Netw Open 2021; 4:e2129697.
27. Morris EA, Comstock CE, Lee CH, Lehman CD, Ikeda DM. ACR BI-RADS® Magnetic Resonance Imaging. In: ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System. American College of Radiology; 2013.
28. Yala A, Mikhael PG, Strand F, et al. Toward robust mammography-based models for breast cancer risk. Sci Transl Med 2021; 13:578.
29. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020; 577:89–94.
30. Lotter W, Diab AR, Haslam B, et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat Med 2021; 27:244–249.
31. Arasu VA, Habel LA, Achacoso NS, et al. Comparison of mammography AI algorithms with a clinical risk model for 5-year breast cancer risk prediction: an observational study. Radiology 2023; 307:e222733.
32. Yala A, Mikhael PG, Strand F, et al. Multi-institutional validation of a mammography-based breast cancer risk model. J Clin Oncol 2022; 40:1732–1740.
33. Kriege M, Brekelmans CT, Boetes C, et al. Efficacy of MRI and mammography for breast-cancer screening in women with a familial or genetic predisposition. N Engl J Med 2004; 351:427–437.
34. Portnoi T, Yala A, Schuster T, et al. Deep learning model to assess cancer risk on the basis of a breast MR image alone. Am J Roentgenol 2019; 213:227–233.
35. Uhlig J, Uhlig A, Kunze M, et al. Novel breast imaging and machine learning: predicting breast lesion malignancy at cone-beam CT using machine learning techniques. Am J Roentgenol 2018; 211:W123–W131.
36. Wang S, Sun Y, Li R, et al. Diagnostic performance of perilesional radiomics analysis of contrast-enhanced mammography for the differentiation of benign and malignant breast lesions. Eur Radiol 2022; 32:639–649.
37. Militello C, Rundo L, Dimarco M, et al. 3D DCE-MRI radiomic analysis for malignant lesion prediction in breast cancer patients. Acad Radiol 2022; 29:830–840.
38. Varoquaux G, Cheplygina V. Machine learning for medical imaging: methodological failures and recommendations for the future. Npj Digit Med 2022; 5:1–8.
