Artificial Convolutional Neural Network in Object Detection and Semantic Segmentation for Medical Imaging Analysis
In the era of digital medicine, a vast number of medical images are produced every day. There is a great demand for intelligent equipment for adjuvant diagnosis to assist medical doctors in different disciplines. With the development of artificial intelligence, algorithms based on the convolutional neural network (CNN) have progressed rapidly. CNN and its extension algorithms play important roles in medical image classification, object detection, and semantic segmentation. While medical image classification has been widely reported, object detection and semantic segmentation of medical images are rarely described. In this review article, we introduce the progress of object detection and semantic segmentation in medical imaging studies. We also discuss how to accurately define the location and boundary of diseases.

Keywords: medical images, convolutional neural network, object detection, semantic segmentation, analysis
INTRODUCTION
In routine medical practice, a large number of medical images are produced in the process of various examinations, such as radiology, ultrasound, endoscopy, ophthalmology, and pathology. Radiological images include X-ray, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography-computed tomography (PET-CT). Ultrasound images include normal ultrasound images and color Doppler ultrasound images. Endoscopic images contain white light endoscopy (WLE), chromoendoscopy (CE), and magnifying endoscopy-narrow-band imaging (ME-NBI). The images of ophthalmology deal with optical coherence tomography (OCT), while pathological images cover gross images and microscopic images (Figure 1). Clinical doctors have to spend a great deal of time screening and evaluating these images.

With the development of artificial intelligence (AI), AI industries have gradually entered the medical field and become involved in medical imaging analysis, helping doctors to solve diagnostic problems and improve efficiency (1). AI is a branch of computer science for designing and executing tasks originally carried out by human intelligence (2). Machine learning (ML) is a family of technologies that use computers to perform repetitive and well-defined tasks (3-5). ML includes supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Supervised learning means that the training dataset is labeled by medical experts. Unsupervised learning means that the training dataset is unlabeled. Semi-supervised learning means that part of the training data is labeled while the rest is unlabeled. Reinforcement learning receives feedback to obtain the learning information and update the model parameters.
FIGURE 1 | Categories of medical images. The radiological images (Purple) contain X-Ray, CT, MRI, and PET. The ultrasound images (Red) contain normal ultrasound
and color Doppler images. The pathological images (Green) include gross images and microscopic images. The endoscopic images (Blue) include WLE, CE, ME-NBI
and so on. The images of ophthalmology include OCT (Orange).
Deep learning (DL) is a new direction in ML, which builds a computational model by simulating the neural network structure of the human brain (5, 6). DL is often used in the analysis of high-dimensional data, including image classification, object detection, and semantic segmentation. The convolutional neural network (CNN) is the representative algorithm of DL.

CNN AND ITS EXTENSION

The research of CNN can be traced back to 1962, when Hubel and Wiesel analyzed the structure of the visual cortex in the cat brain and found that biological visual information is transferred through multi-layer receptive fields (7). Researchers then tried to construct similar algorithms to make machines recognize images.

The construction and application of CNN developed rapidly after 1980. For instance, Simonyan et al. developed VGG16 (13 convolutional layers and three fully connected layers) and VGG19 (16 convolutional layers and three fully connected layers) (8). Szegedy et al. developed the GoogLeNet model in 2014, which changes the convolution modules to use small convolution kernels and reduce computational complexity (9). He et al. constructed the ResNet model, which accelerates network convergence and improves image classification by increasing the network depth (10). New models combining features of the above models were gradually constructed, such as DenseNet and Inception-ResNet-v2. MobileNet is a lightweight CNN model introduced by Google at the Conference on Computer Vision and Pattern Recognition in 2017. This model utilizes depth-wise separable convolutions to compress model parameters and improve computing speed (11). As a lightweight CNN, MobileNet can be deployed on mobile equipment for on-device prediction. To optimize speed and accuracy, Tan et al. introduced EfficientDet, which contains eight model structures, from EfficientDet D0 to EfficientDet D7 (12).
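To make the role of these backbones concrete, the following is a minimal sketch of reusing a pretrained CNN as a feature extractor, assuming PyTorch and torchvision are available; the model choice, batch size, and input size are illustrative, not settings taken from the cited studies.

```python
import torch
import torchvision.models as models

# Load an ImageNet-pretrained backbone; VGG16, ResNet-50, or MobileNetV2
# could equally be chosen here, as all are available in torchvision.
backbone = models.resnet50(pretrained=True)

# Drop the final fully connected classifier to keep only the feature extractor.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

# A dummy batch standing in for preprocessed medical images (RGB, 224 x 224).
images = torch.randn(4, 3, 224, 224)

with torch.no_grad():
    features = feature_extractor(images)  # shape: (4, 2048, 1, 1) for ResNet-50

print(features.shape)
```

These pooled features can then feed a task-specific head, which is the pattern the detection and segmentation models below all build on.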
Image classification is the most popular application of CNN algorithms. Recently, scientists have tried to extend traditional CNN algorithms to object detection and semantic segmentation. The purpose of object detection is to determine whether objects from predefined categories are present. For instance, it can be used to determine the existence and region of tumors in organs or tissues on medical images. If the target is present, its spatial location is indicated. The object in an image is marked by a frame (a bounding box) with the confidence score shown on top of the bounding box (13). Object detection can perform many tasks such as lesion location, lesion tracking, and image discrimination. The application of object detection in medical images is extremely wide. Semantic segmentation is another algorithm, in which the computer segments images based on the pixels presented in the images. The semantics refers to the content of the image, and the segmentation means that different objects in the image are separated pixel by pixel. In semantic segmentation analysis, each pixel in the image is labeled (14).
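The difference between the two kinds of output can be illustrated with a small, purely hypothetical sketch: a detector returns boxes with class labels and confidence scores, while a segmentation network returns a dense H × W label map obtained by a per-pixel argmax over class scores.

```python
import torch

# Object detection output: one (x1, y1, x2, y2) box per detected object,
# plus a class label and the confidence score shown on top of the box.
boxes = torch.tensor([[34.0, 52.0, 180.0, 211.0]])  # one hypothetical lesion box
labels = torch.tensor([1])                          # 1 = "lesion" in this toy label map
scores = torch.tensor([0.93])                       # confidence for the box

# Semantic segmentation output: the network emits per-class scores for
# every pixel; taking the argmax over the class axis labels each pixel.
H, W, num_classes = 256, 256, 2
logits = torch.randn(num_classes, H, W)   # stand-in for network output
label_map = logits.argmax(dim=0)          # H x W matrix of per-pixel class labels

print(boxes.shape, label_map.shape)  # torch.Size([1, 4]) torch.Size([256, 256])
```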
THE COMMON ALGORITHMS FOR OBJECT DETECTION

Object detection in medical images refers to identifying the location of lesions and classifying different objects. Popular algorithms include R-CNN, Fast R-CNN, Faster R-CNN, FPN, SSD, YOLO, CenterNet, and EfficientDet (15, 16). Object detection has two steps: (1) target feature extraction, and (2) classifying and positioning the objects. The target feature extraction is carried out automatically by the CNN. There are two types of frameworks for object detection: the two-stage detection framework and the one-stage detection framework. The former includes a preprocessing step for generating object detection proposals and a step for object detection; the latter integrates both steps in a single process. The two-stage framework contains two parts. The first part extracts CNN features of regions from images without category information. The second part uses a category-specific classifier to determine the category labels. The one-stage framework includes the SSD, YOLO, CenterNet, and EfficientDet series, which are relatively fast but less accurate. Most object detection algorithms obtain the predicted box through the prior box technique, adjusting the parameters of the prior boxes to produce the predicted box. By contrast, the CenterNet algorithm provides center point detection, presenting the center point of the object in the predicted box (Figure 2).
The R-CNN is the region-based CNN, built on the framework of AlexNet. Processing begins from the input image; proposed regions are then extracted and CNN features are computed to achieve region classification (14). The Fast R-CNN solves some problems of R-CNN and improves detection speed and quality. This framework uses a softmax classifier and class-specific bounding box regression simultaneously, and its speed is increased three to ten times in training and testing. The Faster R-CNN also utilizes a CNN to extract features and obtains regions of interest (ROI) using a region proposal network (RPN). The most important improvement of Faster R-CNN is to establish an integrated, simpler, and faster object detection framework relying on CNNs. Lin et al. introduced the feature pyramid model into Faster R-CNN to establish the feature pyramid network (FPN), which achieves state-of-the-art results without sacrificing speed or memory and is more suitable for small object detection. The backbone of the feature pyramid model utilizes ResNet with three additional parts: a bottom-up pathway, a top-down pathway, and lateral connections (17). However, the disadvantage of the two-stage framework is its requirement for large computational resources.

To overcome this shortcoming, scientists developed the one-stage detection strategy, in which all computation is encapsulated in a single network. YOLO is the abbreviation of You-Only-Look-Once. YOLO solves object detection as a regression problem: a single inference on the input image yields the positions and categories of all objects in the image. YOLO originated from GoogLeNet and contains 24 convolutional layers and two fully connected layers, using a 1 × 1 convolution layer and a 3 × 3 convolution layer to replace the Inception structure (18, 19). The Single Shot MultiBox Detector (SSD) (15) is faster than YOLO and competitive in accuracy with region-based detectors such as Faster R-CNN. SSD inherits the method of transforming detection into regression and completes region proposals and classification in one stage, improving running speed and detection accuracy compared with other frameworks. Most one-stage frameworks adjust the parameters of prior boxes to enumerate all potential object locations and then classify the objects, which takes much time and reduces detection efficiency. The CenterNet model achieves improvements in both speed and accuracy: key-point estimation is utilized to find the central point and regress the other object properties, so that all potential objects do not need to be enumerated to reach high accuracy. The EfficientDet designed by Tan et al. uses EfficientNet as the backbone and constructs a bi-directional feature pyramid network (BiFPN) to obtain continuous fusion of up-sampling and sub-sampling (12).
FIGURE 2 | The sketch of object detection. (A) A medical image from an endoscopic examination. (B) In the prior box technique, a group of prior boxes is created during object detection. (C) A predicted box is presented. (D) The same medical image from the endoscopic examination. (E) The CenterNet model is used for center point detection. (F) Only the center point of the object is presented in the predicted box.
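For readers who want to experiment with a two-stage detector, torchvision ships a reference Faster R-CNN implementation; the sketch below shows the head swap typically used when fine-tuning on a custom lesion class, with the class count and input image as placeholders rather than settings from the studies discussed here.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster R-CNN with a ResNet-50 + FPN backbone, pretrained on COCO.
model = fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the classification head for a two-class problem
# (background + lesion); the class count here is a placeholder.
num_classes = 2
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model.eval()
images = [torch.randn(3, 512, 512)]  # one stand-in endoscopic frame

with torch.no_grad():
    predictions = model(images)

# Each prediction carries bounding boxes, labels, and confidence scores.
print(predictions[0]["boxes"].shape, predictions[0]["scores"].shape)
```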
THE COMMON ALGORITHMS FOR SEMANTIC SEGMENTATION

Before deep learning was applied to computer vision, researchers often used TextonForest or Random Forest to construct classifiers for semantic segmentation. With the development of deep learning, especially the emergence of CNN, computer algorithms not only classify images accurately but also perform better on segmentation. Images are composed of many pixels, and in image semantic segmentation, computer algorithms segment images based on the semantics of the pixels presented in them. The input is a three-channel RGB image of H × W × 3, and the output is a corresponding H × W matrix whose elements indicate the semantic label of the corresponding pixels. The analytic results of semantic segmentation not only identify the objects, but also mark the boundaries of each object. The current popular algorithms for semantic segmentation include FCN, SegNet, PSPNet, DeepLab, and UNet (20-22).

The Fully Convolutional Network (FCN) uses a fully convolutional architecture for end-to-end training of segmentation. FCN modifies the structure of VGG16 and other networks to generate segmented images of the same size from inputs of non-fixed size. The most important change in FCN is that all fully connected layers are replaced by convolutional layers. The structure of FCN includes convolution, upsampling, and skip connections. FCN can accept input images of any size and avoids redundant storage and computation of convolutions. In 2015, Badrinarayanan et al. proposed SegNet, a new framework for semantic segmentation (23). This framework is based on FCN and constructs an encoder-decoder structure based on VGG16. It was the first to utilize a symmetric network structure and performed well on semantic segmentation. Also based on FCN, PSPNet utilizes a pre-trained ResNet101 as the feature extraction layer and introduces a pyramid pooling module to identify the prior information of the context in the image; it shows excellent understanding and identification of complex scenes (24).

Ronneberger et al. proposed the U-Net algorithm for semantic segmentation in 2015. UNet analysis needs two steps: feature extraction with sub-sampling, followed by up-sampling. Since the network structure is like the letter "U," it is called UNet (22). The UNet model is simple, with few parameters, and is suitable for the analysis of medical images with small datasets. In medical imaging analysis, it is easy to overfit if more parameters are involved in the model; therefore, the UNet model performs well in most medical imaging analyses. To improve efficiency, Zhou et al. introduced UNet++, a nested UNet architecture for medical image segmentation. UNet++ borrowed the dense connections of DenseNet and improved the skip connection structure of UNet (25). The VNet, which has a structure similar to UNet, was constructed to satisfy the need to analyze 3D images in CT and MRI (26).

In the DeepLab framework (DeepLab v1), atrous convolution was used in combination with a CNN for semantic segmentation. To optimize performance, DeepLab v2 added a new module, atrous spatial pyramid pooling (ASPP), which utilizes atrous convolution instead of a fully connected layer to obtain multi-scale information and reduce computation. DeepLab v3 improved the ASPP module with one 1 × 1 convolution and three 3 × 3 convolutions. This framework is a generic framework that can be applied to any network, such as VGG and ResNet. For DeepLab v3+, a simple and efficient decoder module was designed to improve segmentation results (21, 27).
FIGURE 3 | Algorithms of CNN backbones, object detection, and semantic segmentation. (A) CNN backbones. To achieve feature extraction in medical images, many CNN models can be selected, such as VGG16, VGG19, GoogLeNet, ResNet, Inception-ResNet-v2, Xception, DenseNet, MobileNet, and EfficientDet. (B) Object detection algorithms include two types. The two-stage framework (left branch) includes R-CNN, Fast R-CNN, Faster R-CNN, and FPN, with high accuracy. The one-stage framework (right branch) includes SSD, YOLO, CenterNet, and EfficientDet, with fast speed but lower accuracy. (C) Semantic segmentation algorithms are divided into the FCN, DeepLab, and UNet series. FCN (left branch) is the first algorithm that uses a fully convolutional network without fully connected layers; SegNet and PSPNet are based on it. DeepLab (middle branch) is a novel algorithm under development. UNet (right branch) is the most popular algorithm; UNet++, VNet, and MIFNet are derivatives of UNet.
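As a rough illustration of the "U" structure described above, here is a deliberately tiny encoder-decoder with a single skip connection; real UNet variants use more levels, more channels, and normalization layers, so this is a sketch of the idea only.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A two-level UNet-like sketch: sub-sample, up-sample, one skip connection."""

    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                        # sub-sample
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # up-sample
        # After concatenating the skip connection, channels double (16 + 16).
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, 1)          # per-pixel class scores

    def forward(self, x):
        e = self.enc(x)                      # encoder features (skip source)
        m = self.mid(self.down(e))           # bottleneck at half resolution
        d = self.up(m)                       # back to full resolution
        d = self.dec(torch.cat([d, e], 1))   # skip connection: concat encoder features
        return self.head(d)                  # (N, num_classes, H, W)

mask_logits = TinyUNet()(torch.randn(1, 3, 128, 128))
print(mask_logits.shape)  # torch.Size([1, 2, 128, 128])
```

The skip connection is the essential ingredient: it lets the decoder recover the fine boundary detail that sub-sampling discards, which is why U-shaped models are favored for delineating lesion margins.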
THE PERFORMANCE COMPARISON OF AVAILABLE ALGORITHMS

There are a large number of open-source packages for running CNN programs. The convolutional architecture for fast feature embedding (Caffe) was born in Berkeley, California, and is now hosted by the Berkeley Vision and Learning Center (BVLC). Caffe is an early framework with high performance and seamless switching between CPU and GPU modes, and it supports the Windows, Linux, and Mac platforms (28). With the emergence of TensorFlow and PyTorch, however, Caffe is no longer the first choice. TensorFlow was open-sourced by Google in November 2015 and updated to TensorFlow 1.0 in 2017 (29). Keras is a re-encapsulation of TensorFlow that supports fast experimentation, allowing researchers to quickly turn ideas into results (30). PyTorch is the Python version of Torch, a neural network framework specifically targeted at GPU-accelerated deep artificial neural network programming. Compared with Caffe and TensorFlow, PyTorch became the most popular framework in 2019. As an open-source framework from Facebook, PyTorch is compact, easy to use, and supports dynamic graphs (31).

The performance of object detection and semantic segmentation algorithms is highly dependent on the data. To avoid overfitting, image augmentation methods can be used to ensure a sufficient input data size, including flipping, cropping, rotation, translation, noise injection, random erasing, mixing images, and so on (32). The advantages and disadvantages of the algorithms of object detection and semantic segmentation introduced above (Figure 3) are listed in Table 1.
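For the augmentation methods just listed, a minimal torchvision-based pipeline might look as follows; the probabilities and magnitudes are arbitrary illustrative choices, and methods such as noise injection and image mixing would require custom transforms.

```python
from torchvision import transforms

# A stand-in augmentation pipeline covering several of the methods above:
# flipping, cropping, rotation, translation, and random erasing.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # flipping
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),        # cropping
    transforms.RandomRotation(degrees=15),                      # rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # translation
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),                           # random erasing
])

# During training: augmented = augment(pil_image) for each loaded image.
```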
TABLE 1 | Advantages and limitations of object detection and semantic segmentation models.

Object detection: two-stage framework
- R-CNN. Advantages: CNN extracts features automatically. Limitations: extracts features in every object box using CNN; slow speed and low accuracy.
- Fast R-CNN. Advantages: single feature extraction by CNN; fast and high accuracy. Limitations: extracts ROI by selective search; slow.
- Faster R-CNN. Advantages: adds an RPN to extract ROI; high object detection rate. Limitations: detecting the whole image by RPN is slow.
- FPN. Advantages: adds the feature pyramid model; good for small object detection. Limitations: slow compared with one-stage frameworks.

Object detection: one-stage framework
- YOLO. Advantages: based on GoogLeNet; fast. Limitations: poor performance for small object detection; more parameters and higher GPU occupation than SSD.
- SSD. Advantages: balances the advantages of YOLO and Faster R-CNN, with high detection speed and a high object detection rate. Limitations: poorer performance on small object detection than Faster R-CNN.
- CenterNet. Advantages: balances speed and accuracy; uses key-point estimation to find the central point. Limitations: difficult to deal with the coincidence of two object centers.
- EfficientDet. Advantages: introduces BiFPN to obtain continuous fusion of up-sampling and sub-sampling. Limitations: parameter setting relies on experience.

Semantic segmentation
- FCN. Advantages: fully convolutional (no fully connected layer). Limitations: low accuracy of feature maps with high GPU occupation.
- SegNet. Advantages: the first symmetric network. Limitations: slow speed.
- UNet. Advantages: simple U-shaped structure with few parameters; suitable for segmentation with small numbers of medical images. Limitations: difficult to obtain a uniform standard of sub-sampling and up-sampling.
- DeepLab. Advantages: uses atrous convolutional layers. Limitations: atrous convolution occupies high GPU.
- PSPNet. Advantages: uses the pyramid pooling module to identify prior information; excellent understanding and identification of complex scenes. Limitations: the ResNet101 backbone makes processing slow.
The performance of deep learning algorithms can be evaluated by several parameters. Researchers optimize their models using the indexes of accuracy, specificity, sensitivity, recall, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). As a specific index for evaluating training results in the field of object detection, the mean average precision (mAP) is introduced. The AP value is derived from a curve of all precision and recall values: the horizontal coordinate represents recall, and the vertical coordinate represents precision. The region under this curve is the AP value of one class, and the mAP is the average AP over all classes. In semantic segmentation, intersection over union (IoU) is used to evaluate testing results. IoU refers to the ratio of the intersection to the union of the predicted region and the marked region. The higher the IoU value, the better the model.
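The IoU definition above, the intersection of the predicted and annotated regions divided by their union, translates directly into a few lines of code; this sketch computes it for binary segmentation masks.

```python
import torch

def iou(pred_mask: torch.Tensor, true_mask: torch.Tensor) -> float:
    """IoU between two binary masks: |intersection| / |union|."""
    pred = pred_mask.bool()
    true = true_mask.bool()
    intersection = (pred & true).sum().item()
    union = (pred | true).sum().item()
    return intersection / union if union > 0 else 1.0  # both empty: perfect match

# Toy 4 x 4 masks whose foregrounds overlap in 2 of 6 pixels.
pred = torch.tensor([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 0, 0],
                     [0, 0, 0, 0]])
true = torch.tensor([[0, 1, 1, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 0],
                     [0, 0, 0, 0]])
print(iou(pred, true))  # intersection 2, union 6 -> 0.333...
```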
THE APPLICATION OF OBJECT DETECTION IN MEDICAL IMAGE ANALYSIS

Different types of algorithms can be applied to different medical image analyses. Endoscopy is an essential tool for the diagnosis of digestive diseases: it makes lesions of the digestive tract visible, and biopsies can be taken for histology. It is often used for early diagnosis or postoperative follow-up of cancers. However, inexperienced doctors may overlook some atypical lesions, because most of those lesions arise from atrophic mucosa, which results in false-negative findings. Object detection algorithms can detect lesions automatically and assist diagnosis during endoscopic examination. Hirasawa et al. used SSD to diagnose gastric cancer in chromoendoscopic images. The training dataset consisted of 13,584 images, and the test dataset included 2,296 images from 77 gastric lesions in 69 patients. The SSD performed well in extracting suspicious lesions and evaluating early gastric cancer: the time spent analyzing the 2,296 images was 47 s, and the overall sensitivity was 92.2%. This means the SSD model can analyze a large number of endoscopic images in a short period of time, greatly reducing the load on endoscopists (33). Wu et al. proposed an object detection model, ENDOANGEL, for real-time gastrointestinal endoscopic examination. ENDOANGEL can efficiently extract suspicious lesions and evaluate their severity, and it has been utilized in many hospitals in China to assist clinical diagnosis (34). Gao et al. analyzed peri-gastric metastatic lymph nodes in CT images using Faster R-CNN.
FIGURE 4 | Evolutionary tree of AI algorithms for medical image analysis. The CNN model is the backbone algorithm, from which object detection and semantic segmentation are developed. The two algorithms are further divided into different branches, including FCN, UNet, and DeepLab, and Faster R-CNN, SSD, and YOLO. The ends of each branch correspond to applications in various medical images. Endo refers to endoscopic images; OCT refers to optical coherence tomography images.
The analysis was divided into two stages: an initial learning stage for training and a precise learning stage for fine-tuning and testing. The results showed that, for the nodule classes in the training and validation sets, the mAP was 0.5019 and the AUC was 0.8995 in the initial learning stage. In the precise learning stage, the mAP and AUC rose to 0.7801 and 0.9541, an obvious improvement over the initial learning stage. Thus, the Faster R-CNN model had high judgment effectiveness and recognition accuracy for the CT diagnosis of peri-gastric metastatic lymph nodes (16).

THE APPLICATION OF SEMANTIC SEGMENTATION IN MEDICAL IMAGE ANALYSIS

For accurate delineation of lesion borders, semantic segmentation based on CNN backbones has potential applications. The DeepLab series of algorithms provides a great choice for accurate delineation of tumor margins. In the examination of cancers, accurate delineation of the tumor margin is critical for the choice of treatment and surgical resection, especially when the resection is performed under the endoscope. Luo et al. developed the Gastrointestinal Artificial Intelligence Diagnostic System (GRAIDS), based on DeepLab v3+, for the diagnosis of upper gastrointestinal cancers in endoscopic images. Given endoscopic images of the upper gastrointestinal tract, the model provides two outputs: a standard two-class task for lesion classification and a semantic segmentation task capturing the tumor regions. The accuracy of the system was 95.5% on the internal validation dataset, 92.7% on the prospective validation dataset, and 91.5 to 97.7% on the external validation datasets. The diagnostic sensitivity of GRAIDS was similar to that of expert endoscopists and superior to that of non-expert endoscopists (35).
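torchvision also provides a reference DeepLabV3 implementation, so a margin-delineation experiment in the spirit of the systems above could start from a sketch like the following; the two-class setup and the input are placeholders, and this does not reproduce GRAIDS itself.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# DeepLabV3 with a ResNet-50 backbone; num_classes=2 stands in for
# background vs. tumor region and is an illustrative choice.
model = deeplabv3_resnet50(pretrained=False, num_classes=2)
model.eval()

image = torch.randn(1, 3, 480, 480)  # stand-in endoscopic frame

with torch.no_grad():
    out = model(image)["out"]        # (1, 2, 480, 480) per-pixel class scores

tumor_mask = out.argmax(dim=1)       # (1, 480, 480) predicted tumor region
print(tumor_mask.shape)
```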
UNet and its extension models are a series of algorithms for achieving semantic segmentation in the medical field. An et al. reported that the UNet++ model can delineate the resection margins of early gastric cancer under indigo carmine chromoendoscopy or white light endoscopy (36). Besides accurate delineation of tumor margins, Piantadosi et al. constructed and modified a DCNN model based on UNet to segment breast parenchyma from other tissues in 3D breast MRI images. There were two datasets: the first was a private dataset, and the second was a public breast MR image dataset. After training and testing, the results showed that the modified model performed better, with median dice similarity coefficients (DSC) of 96.60 and 95.78% on the two datasets (37). At present, the contradiction between the large number of pathological images and the shortage of pathologists is a worldwide problem, so there is a great opportunity in the field of pathology for deep learning algorithms. Cai et al. constructed a multi-input model called MIFNet to segment lesions in pathological images, increasing the dice coefficient to 81.87%. This was great progress, because the dice coefficient of some existing segmentation models was relatively low, i.e., 67.73% for UNet and 63.89% for SegNet. They believed that the semantic segmentation algorithm was suitable for analyzing pathological images (38).
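Since several of these studies report results as a dice coefficient, its computation is worth making explicit; the sketch below implements the standard definition, twice the intersection divided by the sum of the two mask sizes, for binary masks.

```python
import torch

def dice_coefficient(pred_mask: torch.Tensor, true_mask: torch.Tensor) -> float:
    """Dice similarity coefficient: 2*|intersection| / (|pred| + |true|)."""
    pred = pred_mask.bool()
    true = true_mask.bool()
    intersection = (pred & true).sum().item()
    total = pred.sum().item() + true.sum().item()
    return 2.0 * intersection / total if total > 0 else 1.0

# Toy masks: each has 4 foreground pixels, overlapping in 2.
pred = torch.zeros(4, 4, dtype=torch.bool); pred[:2, :2] = True
true = torch.zeros(4, 4, dtype=torch.bool); true[:2, 1:3] = True
print(dice_coefficient(pred, true))  # 2*2 / (4+4) = 0.5
```

For partial overlaps the dice value always exceeds the IoU of the same masks (here 0.5 vs. 0.33), so the two metrics should not be compared directly across papers.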
In addition, the ENDOANGEL model mentioned above not only realizes automatic object detection during endoscopic examination, but also performs semantic segmentation. Wickstrøm et al. utilized the semantic segmentation models FCN, UNet, and SegNet to analyze endoscopic images of colorectal polyps; the results showed that FCN performed better than the other two models (39).

CONCLUSION

Both object detection and semantic segmentation algorithms are based on CNN. They are widely applied in various fields of medical imaging study, particularly the digestive system, respiratory system, endocrine system, cardiovascular system, brain, eye, and breast. These algorithms can be used to analyze multiple types of images, including radiological images (CT, MRI, and PET), pathological images, ultrasound images, and endoscopic images. The development of the various algorithms and their applications is presented in Figure 4.

However, there are some limitations of object detection and semantic segmentation in the analysis of medical images. In the model training stage, a large number of medical images is needed. In addition, both object detection and semantic segmentation belong to supervised algorithms, which require experienced doctors to label images. Therefore, future studies should focus on how to use limited medical images to obtain good training results.

AUTHOR CONTRIBUTIONS

YY and RY designed the study. RY wrote the manuscript. YY reviewed and revised the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING

This work was partially supported by grants from the Shanghai Science and Technology Committee (18411953100 and 20DZ2201900), the National Key R&D Program of China (2017YFC0908300 and 2016YFC1303200), the National Natural Science Foundation of China (82072602 and 81772505), the Cross-Institute Research Fund of Shanghai Jiao Tong University (YG2017ZD01), and the Shanghai Collaborative Innovation Center for Translational Medicine (TM202001, TM201617, and TM201702). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
REFERENCES

1. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. (2019) 25:30-6. doi: 10.1038/s41591-018-0307-0
2. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, et al. Deep learning: a primer for radiologists. Radiographics. (2017) 37:2113-31. doi: 10.1148/rg.2017170077
3. Erickson BJ, Korfiatis P, Akkus Z, Kline TL. Machine learning for medical imaging. Radiographics. (2017) 37:505-15. doi: 10.1148/rg.2017160130
4. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. (2015) 521:436-44. doi: 10.1038/nature14539
5. Pesapane F, Codari M, Sardanelli F. Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur Radiol Exp. (2018) 2:35. doi: 10.1186/s41747-018-0061-6
6. Yang YJ, Bang CS. Application of artificial intelligence in gastroenterology. World J Gastroenterol. (2019) 25:1666-83. doi: 10.3748/wjg.v25.i14.1666
7. Hubel DH, Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J Physiol. (1962) 160:106-54. doi: 10.1113/jphysiol.1962.sp006837
8. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv. (2014) 1409.1556.
9. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA (2015). p. 1-9. doi: 10.1109/CVPR.2015.7298594
10. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV (2016). p. 770-778.
11. Yu D, Xu Q, Guo H, Zhao C, Lin Y, Li D. An efficient and lightweight convolutional neural network for remote sensing image scene classification. Sensors (Basel). (2020) 20:1999. doi: 10.3390/s20071999
12. Tan M, Pang R, Le QV. EfficientDet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA (2020). p. 10778-10787.
13. Venhuizen FG, Bram VG, Bart L, Freekje VA, Vivian S, Sascha F, et al. Deep learning approach for the detection and quantification of intraretinal cystoid fluid in multivendor optical coherence tomography. Biomed Opt Express. (2018) 9:1545. doi: 10.1364/BOE.9.001545
14. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH (2014). p. 580-587. doi: 10.1109/CVPR.2014.81
15. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: single shot MultiBox detector. In: European Conference on Computer Vision. Cham: Springer (2016). p. 21-37. doi: 10.1007/978-3-319-46448-0_2
16. Gao Y, Zhang ZD, Li S, Guo YT, Wu QY, Liu SH, et al. Deep neural network-assisted computed tomography diagnosis of metastatic lymph nodes from gastric cancer. Chin Med J. (2019) 132:2804-11. doi: 10.1097/CM9.0000000000000532
17. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017). p. 936-944. doi: 10.1109/CVPR.2017.106
18. Zhuang Z, Liu G, Ding W, Raj ANJ, Qiu S, Guo J, et al. Cardiac VFM visualization and analysis based on YOLO deep learning model and modified 2D continuity equation. Comput Med Imaging Graph. (2020) 82:101732. doi: 10.1016/j.compmedimag.2020.101732
19. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016). p. 779-88.
20. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2015). p. 7-12.
21. Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: European Conference on Computer Vision. Cham: Springer (2018). p. 833-851.
22. Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI). Cham: Springer (2015). p. 234-41.
23. Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell. (2017) 39:2481-95.
24. Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI (2017). p. 6230-9.
25. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: a nested U-net architecture for medical image segmentation. Deep Learn Med Image Anal Multimodal Learn Clin Decis Support. (2018) 11045:3-11. doi: 10.1007/978-3-030-00889-5_1
26. Milletari F, Navab N, Ahmadi SA. V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). (2016). p. 565-571. doi: 10.1109/3DV.2016.79
27. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille A. Semantic image segmentation with deep convolutional nets and fully connected CRFs. Computer Science. (2014) 4:357-361.
28. Sladojevic S, Arsenovic M, Anderla A, Culibrk D, Stefanovic D. Deep neural networks based recognition of plant diseases by leaf image classification. Comput Intell Neurosci. (2016) 2016:3289801. doi: 10.1155/2016/3289801
29. Ju Y, Wang X, Chen X. Research on OMR recognition based on convolutional neural network tensorflow platform. In: 2019 11th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA). Qiqihar (2019). p. 688-691.
30. Vani AK, Raajan RN, Winmalar DH, Sudharsan R. Using the keras model for accurate and rapid gender identification through detection of facial features. In: 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). Erode (2020). p. 572-574.
31. Florencio F, Valenca T, Moreno ED, Colaço M Jr. Performance analysis of deep learning libraries: TensorFlow and PyTorch. J Comput Sci. (2019) 15:785-99. doi: 10.3844/jcssp.2019.785.799
32. Ma B, Guo Y, Hu W, Yuan F, Zhu Z, Yu Y, et al. Artificial intelligence-based multiclass classification of benign or malignant mucosal lesions of the stomach. Front Pharmacol. (2020) 11:572372. doi: 10.3389/fphar.2020.572372
33. Hirasawa T, Aoyama K, Tanimoto T, Ishihara S, Shichijo S, Ozawa T, et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer. (2018) 21:653-60. doi: 10.1007/s10120-018-0793-2
34. Wu L, Zhou W, Wan X, Zhang J, Shen L, Hu S, et al. A deep neural network improves endoscopic detection of early gastric cancer without blind spots. Endoscopy. (2019) 51:522-31. doi: 10.1055/a-0855-3532
35. Luo H, Xu G, Li C, He L, Luo L, Wang Z. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: a multicentre, case-control, diagnostic study. Lancet Oncol. (2019) 20:1645-54. doi: 10.1016/S1470-2045(19)30637-0
36. An P, Yang D, Wang J, Wu L, Zhou J, Zeng Z, et al. A deep learning method for delineating early gastric cancer resection margin under chromoendoscopy and white light endoscopy. Gastric Cancer. (2020) 23:884-92. doi: 10.1007/s10120-020-01071-7
37. Piantadosi G, Sansone M, Fusco R, Sansone C. Multi-planar 3D breast segmentation in MRI via deep convolutional neural networks. Artif Intell Med. (2020) 103:101781. doi: 10.1016/j.artmed.2019.101781
38. Cai L, Gao J, Zhao D. A review of the application of deep learning in medical image classification and segmentation. Ann Transl Med. (2020) 8:713. doi: 10.21037/atm.2020.02.44
39. Wickstrøm K, Kampffmeyer M, Jenssen R. Uncertainty and interpretability in convolutional neural networks for semantic segmentation of colorectal polyps. Med Image Anal. (2020) 60:101619. doi: 10.1016/j.media.2019.101619

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2021 Yang and Yu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.