Bone Fracture Detection Through the Two-Stage System of Crack-Sensitive Convolutional Neural Network
Abstract
Automated fracture detection is an essential part of a computer-aided telemedicine system. Fractures can occur in almost any bone of the human body due to accidental injuries such as slipping. In fact, many hospitals lack experienced surgeons to diagnose fractures. Therefore, computer-aided diagnosis (CAD) reduces the burden on doctors and helps identify fractures. We present a new classification network, the Crack-Sensitive Convolutional Neural Network (CrackNet), which is sensitive to fracture lines. Our two-stage system first detects each bone region with Faster R-CNN and then recognizes whether each region is fractured by using CrackNet. A total of 1,052 images are used to test our system, of which 526 are fractured images and the rest are non-fractured images. We assess the performance of our proposed system with X-ray images from Haikou People's Hospital, achieving 90.11% accuracy and 90.14% F-measure, and our system outperforms other two-stage systems.
Keywords: Crack-Sensitive Convolutional Neural Network, Schmid Filters,
Fracture Detection, Faster RCNN
∗ Corresponding author
Email address: [email protected] (Yixin Luo)
1. Introduction
Fractures often occur in infants, the elderly and young people due to falls, crashes, fights and other accidents [1]. Many doctors use medical images to judge whether a bone fracture has occurred. With the development of sophisticated medical machines, there are many ways to obtain multiple kinds of high-quality medical images, such as X-ray, computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound [2]. It is common to determine the presence and severity of a bone fracture through visual inspection of an X-ray image when seeking suitable treatment [3]. An experienced doctor needs a lot of time to inspect where a bone fracture happened in an X-ray image. However, many hospitals lack experienced radiologists to deal with these medical images. In order to assist doctors in bone fracture detection, computer-aided diagnosis (CAD) has been widely used in the analysis of medical images, and it has received increasing attention in recent years [4].
Previous work [3, 4] in bone fracture detection consists of three major steps: (1) X-ray image denoising, (2) feature extraction, and (3) image classification. One commonality of these previous works is that they focused on either a single anatomical region or a single type of fracture [3], e.g. the tibia (open fracture) or the arm and femoral neck (subtle fracture). The method of [4] could only recognize whether the bone image was fractured; the fracture region could not be identified. In practice, however, expert doctors have to detect fractures in different anatomical parts. Thus, a more practical system would detect bone fractures on different types of bones in the human body. Building such a system is very challenging due to the large variations across different types of bone. We propose a system that possesses this universal capability.
In this study, we propose a system which adopts the idea of using Faster R-CNN and a Crack-Sensitive Convolutional Neural Network (CrackNet) to detect bone fractures. It is of great importance for doctors to diagnose where bone fractures happen with X-ray images. Previous methods only detect the fracture of a single bone region, such as the distal radius [3]. However, an X-ray image may contain fractures of different types of bones. We first use Faster R-CNN to detect the bounding box of each bone in the X-ray image and classify the bone. The second step is to detect fractures in the different types of bone regions by using the Crack-Sensitive CNN. This two-stage system outputs possible fracture regions in an X-ray image to relieve doctors' burden. As shown in Figure 1, our system produces the bones' segmentation and the fracture region. More results are presented in the supplementary experiments.
Most importantly, our contributions are as follows:
1. The two-stage system proposed in this paper finds possible fracture regions in an X-ray image to relieve doctors' burden.
2. We propose a new Crack-Sensitive Convolutional Neural Network, which has good feature expression for bone fracture recognition. As we show in the experiments, CrackNet can better reflect the fracture line.
3. We are the first to use Faster R-CNN to detect 20 different types of bones.
Figure 1: An X-ray image processed by our system. The category represented by each bounding box in the middle picture corresponds to the category written in the rectangle below, which has the same color as the bounding box. The bounding box in the third image represents the fractured region.
2. Related Work
X-ray, which creates images of any bone including the hand, wrist, hip, pelvis and so on, is one of the oldest and most frequently used imaging forms in clinical medicine [2]. A typical bone ailment is the fracture, which occurs when bone cannot withstand outside forces such as direct blows, twisting injuries or falls [1]. Fractures are cracks in bones and are defined as a medical condition in which there is a break in the continuity of the bone [1]. Detection and correct treatment of fractures are important since a wrong diagnosis often causes ineffective patient management, increased dissatisfaction and expensive litigation [5]. Bone fracture detection is a challenging task, especially in the presence of noise. It differs from traditional object detection in several key aspects: 1. Different bones in X-ray images vary greatly in scale [5]. In the human bone structure diagram, there are different types of bones such as the skull, wrist, radius and so on. 2. Different types of fractures have different textures and shapes, including transverse fractures, open fractures, simple fractures, spiral fractures and so on [7]. As reported in [8], adaptive windowing, boundary tracing and the wavelet transform were used to extract features in pelvic CT images, and a registered active shape model was then used to detect fractures. Cao et al. [9] used stacked random forests based on feature fusion to detect fractures in X-ray images. After edge and shape features are extracted from bone, multiple classifiers, such as Back Propagation Neural Network, K-Nearest Neighbor, Support Vector Machine, Max/Min Rule and Product Rule, can be fused into a combined classifier to detect fractures [10, 11]. Among others, mathematical morphology has been widely used in bone fracture detection [12]. These methods use the entire image to determine whether it is fractured [4, 7, 12], but cannot determine which bone region is fractured.
In previous work [13], an entropy-based thresholding approach was used for segmenting a bone in X-ray images from its surrounding flesh region. Many studies detected fractures in one bone of the human body. In [14], the authors proposed a system to automatically detect fractures in hand bones using X-ray images by applying filtering algorithms to remove noise, edge detection methods to detect edges, the Wavelet and Curvelet transforms to extract features, and classification algorithms such as a decision tree. Chai et al. [15] proposed a Gray Level Co-occurrence Matrix (GLCM) based algorithm to detect a fracture in the femur if one existed. Also in [16], the authors detected femur fractures by using a modified Canny edge detection algorithm to extract the femur contour, measuring the neck-shaft angle from the contour, and then using the neck-shaft angle to build classification algorithms. In [17], the authors use techniques such as pre-processing, segmentation, edge detection and feature extraction to preprocess X-ray / CT images, and then use different types of classifiers, such as a decision tree (DT), a neural network (NN) and a meta-classifier, to classify fractured and non-fractured images, achieving a good accuracy of 85% on 40 images. The paper [7] used an entropy-based segmentation method with adaptive thresholding-based contour tracing to localize the line of break for easy visualization of the fracture in long-bone digital X-ray images, then identified its orientation and assessed the extent of damage to the bone. Mahendran and Baboo [18] proposed a fusion classification technique for automatic detection of the existence of fractures in the tibia bone (one of the long bones of the leg). Another study [19, 2] adopted deep convolutional networks (ConvNets) for the automated detection of posterior element fractures of the spine in CT images. These methods only detect bone fractures in medical images of a specific bone [17, 18, 2, 7, 16, 15]; they cannot detect fractures in medical images of different types of bones in the human body.
In order to assist doctors in fracture detection, we need to determine the specific area of the fracture in X-ray images. First, we segment each bone in a medical image. For segmentation of bone, previous work has used segmentation entropy quantitative assessment (SEQA) [20], the classical Canny edge detector [21, 22] and genetic algorithms [23] to segment medical images. Also in [24], 2D and 3D CNNs were used for automatic proximal femur segmentation in structural MR images. However, these methods could not classify different types of bone. In this paper, we consider prior information related to bone fracture and integrate traditional approaches such as Schmid filters into a CNN, named CrackNet, which is sensitive to the fracture line. We propose a two-stage system: first, we use Faster R-CNN to detect bones, and then CrackNet to identify fractures.
3. Related Background Knowledge
This section introduces some basic concepts used in our method.
3.1. R-CNN
Object detection aims not only to recognize an object in an image, but also to localize every object by drawing a bounding box of proper size around it. Object detection is widely used in practice. R-CNN performs detection in the following steps:
1. Generating a set of category-independent region proposals from the input image (e.g., by selective search).
2. Passing every region into a convolutional neural network to get its feature map.
3. a. Classifying each region on its feature map with class-specific SVMs.
b. Refining the bounding box (x, y, w, h) around the object on its feature map by linear regression.
Training the R-CNN framework is divided into multiple steps, which is relatively cumbersome: it needs to fine-tune the CNN to extract features, train SVMs to classify positive and negative samples, and train the bounding box regressor to get the correct prediction position. In addition, it takes a long time to train the network. Therefore, we do not use the R-CNN framework for bone detection.
Figure 3: Architecture of R-CNN for detecting every single bone. This image is from [25].
3.2. Faster R-CNN
With the development of object detection in past years, many detection frameworks have appeared. Fast R-CNN [26] and Faster R-CNN [27] are based on the region proposal method; Faster R-CNN, which builds on R-CNN, shows the best recognition accuracy among these frameworks. Faster R-CNN first finds candidate regions with a Region Proposal Network (RPN), as shown in Figure 4. Finally, it outputs both the bounding box surrounding an object and the corresponding category of that object. Its main components are the following:
Figure 4: Architecture of Faster R-CNN for detecting every single bone. An image undergoes a CNN to extract feature maps, and then an RPN to obtain feasible regions. Finally, the network performs regression and classification on these regions.
1. A set of basic CNN layers, composed of Conv + ReLU + Pooling, used to extract the Feature Map of the input image. Usually one can choose VGG16 [28] with 13 convolutional layers or ResNet101 [29] with 101 convolutional layers. The Feature Map extracted by the Conv layers is used by the RPN to generate candidate regions and by the fully connected layers for classification and border regression.
2. The Region Proposal Network (RPN), whose input is the Feature Map extracted by the previous convolutional layers and whose output is a series of candidate regions.
3. The RoI pooling layer, whose input is the Feature Map extracted by the convolutional layers and the candidate RoIs generated by the RPN. Its function is to convert the region corresponding to each RoI in the Feature Map into a fixed-size H × W feature map and feed it into the fully connected layers.
4. The fully connected layers, whose input is the fixed-size H × W feature map of each RoI. The network then judges the category of each RoI through SoftMax and a cross-entropy loss function, and refines the bounding box through a smooth L1 loss function.
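To make the detection stage concrete, here is a minimal sketch using torchvision's off-the-shelf Faster R-CNN. The paper's own implementation is not public, so the ResNet-50-FPN backbone, the file name and the 0.5 score threshold below are illustrative stand-ins.

```python
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Stock Faster R-CNN (ResNet-50-FPN backbone) as a stand-in for the
# ResNet101-based detector described in this paper.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("xray.png").convert("RGB")    # hypothetical input path
x = F.to_tensor(image)                           # PIL image -> CHW float tensor

with torch.no_grad():
    outputs = model([x])                         # one dict per input image

# Each dict holds 'boxes' (N x 4, x1y1x2y2), 'labels' and 'scores'.
for box, label, score in zip(outputs[0]["boxes"],
                             outputs[0]["labels"],
                             outputs[0]["scores"]):
    if score > 0.5:                              # assumed confidence threshold
        print(label.item(), float(score), box.tolist())
```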
3.3. Convolutional Neural Network
A small filter matrix applied to an image by convolution is called a convolutional kernel, and more and more deep neural networks using convolutional kernels (so-called convolutional neural networks, CNNs) show a powerful capability of representation [30]. In addition, a CNN can extract both local and global features of an input image, since the receptive field of a layer relative to the input grows with increasing depth, which means deeper layers can extract more "global" features. For example, shallow layers may recognize straight lines or winding curves in a medical image, while deeper layers may recognize the whole shape of a bone or even whether it is fractured. Therefore, with increasing depth in a CNN, the extracted features become more and more abstract and capture the latent information of the input image.
Figure 5: The residual structure, from [29]. Weight refers to the convolution operation in the convolutional network, and addition refers to the unit addition operation.
In our system, we use ResNet [29], which is very deep and shows a very powerful ability of representation, as our recognition architecture. As reported in [29], this network can learn more subtle features and generalizes better. As shown in Figure 5, the residual branch (right side of the figure) is generally composed of two or three convolution operations; the residual branch of the ResNet used in our experiments is a three-layer convolution. In ResNet, x_l may have a different number of Feature Maps from x_{l+1}; in this case, a 1 × 1 convolutional layer (left side of Figure 5) is needed to increase or decrease the dimensionality. ResNet stacks many structures like the one in Figure 5.
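As a concrete illustration, below is a minimal PyTorch sketch of the three-layer (bottleneck) residual structure of Figure 5, following the standard design of [29]; the channel sizes are placeholders.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        # Residual branch: 1x1 reduce -> 3x3 -> 1x1 expand (three convolutions).
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection when x_l and x_{l+1} differ in shape (left side of Figure 5).
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Unit addition of the residual branch and the (projected) identity.
        return self.relu(self.residual(x) + self.shortcut(x))

block = Bottleneck(256, 64, 256)  # example: a standard ResNet stage block
```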
4. Method
In this section, we first introduce how to detect 20 different types of bones, and then how to recognize whether they are fractured.
4.1. Bone Detection Based on Faster R-CNN
Given an X-ray image, doctors first have to recognize every type of bone and where it is located in the human body. Our system has the capability to detect every single bone as experienced doctors do, and we achieve this with Faster R-CNN.
A doctor typically examines one bone region at a time and judges whether it is fractured. Therefore, we detect all specific bones at once with Faster R-CNN and leave it to the recognition system to infer which of them are fractured.
In our detection system, we split all human bones into 20 different types of bones according to human anatomy [31], where the single bones within a specific bone region are alike, but totally different from the bones of other regions in length, thickness, relative location in the body, and so on [31]. In our paper, the 20 different types of bones of the human anatomy are the skull, clavicle, scapula, rib, humerus, radius, ulna, metacarpal, carpal, phalanx, finger bone, vertebrae, pelvis, femur, patella, tibia, fibula, calcaneus, tarsal, and metatarsus. In this way, we maximize the difference between bone regions, which is a good property for the detection task. What is more, the bones' segmentation follows human anatomy, which provides friendlier guidance for doctors.
4.2. Fracture Recognition Based on the Crack-Sensitive Convolutional Neural Network
After obtaining the bounding box and type of each bone, we need to figure out whether the bone region is fractured. There are many conventional methods. For example, one can extract image feature vectors [15] such as texture features, edge features and wavelet features. Then, one can classify images into two categories (fractured and non-fractured) with traditional machine learning algorithms such as SVM [11] and random forests [9]. With the development of deep learning, image recognition algorithms based on convolutional neural networks are becoming predominant. In recent years, the most advanced neural network for the classification task in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) has been the residual network [29].
4.2.1. Traditional Texture Filters
Doctors judge a fracture by looking at the fracture line, which is texture information in the medical image [1]. To better separate fractures from non-fractures, we first enhance the image texture information and then classify it with ResNet. Common image texture filters include the Sobel filter [32], the Laplace filter [33], the Gabor filter [34] and the Schmid filter [35]. Our experimental verification found that the Schmid filter is the best for fracture identification. As mentioned in [35], the Schmid filter is rotation invariant and can capture an invariant texture description. For bone images, the Schmid filter can describe the bones' edges and the fracture line. The Schmid filter generates its transform matrix through a kernel function, and the convolution operation is then carried out with this fixed matrix. Its kernel function is as follows:
F(r, σ, τ) = cos(2πτr/σ) · exp(−r²/(2σ²)),   r = √(x² + y²),   (1)
where σ is the standard deviation of the Gaussian, τ is the number of cycles of the harmonic function within the Gaussian envelope of the filter, and (x, y) represents the coordinate position of a pixel.
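As a concrete illustration, the sketch below builds a Schmid kernel from Equation 1 and applies it to a grayscale image. The kernel size, the zero-mean and L1 normalization, and the (σ, τ) values are our assumptions, not settings reported in the paper.

```python
import numpy as np
from scipy.signal import convolve2d

def schmid_kernel(sigma, tau, size=31):
    # Sample Equation 1 on a size x size grid centered at the origin.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.sqrt(x ** 2 + y ** 2)
    k = np.cos(2 * np.pi * tau * r / sigma) * np.exp(-r ** 2 / (2 * sigma ** 2))
    k -= k.mean()                  # zero DC response (assumed normalization)
    return k / np.abs(k).sum()     # L1 normalization (assumed)

def schmid_response(image, sigma=8.0, tau=2.0):
    # image: 2-D grayscale array; returns the texture response map.
    return convolve2d(image, schmid_kernel(sigma, tau), mode="same")
```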
The common convolution module following the Schmid convolutional layer in Figure 6 is ResNet. The difference between the Schmid convolutional layer and an ordinary convolutional layer lies in the kernel. On the one hand, the features obtained by the Schmid convolutional layer are rotation invariant. On the other hand, the kernel of the Schmid convolutional layer has only two parameters, through Equation 1. The Schmid convolutional layer thus reduces the number of learnable parameters needed to generate the convolution kernel and strengthens the extraction of fracture-line features. As the later experimental results show, adding the Schmid convolutional layer to CrackNet improves the recall rate on fracture images. For bone fracture recognition, the common convolution module used is ResNet, because ResNet has better feature expression.
Figure 6: CrackNet. The input to the network is a three-channel image. For each channel, a convolution with the parameterized filter kernel obtained from Equation 1 yields a feature map. The resulting two-dimensional feature maps (three channels) are then stacked into a three-dimensional feature map along the last dimension. Finally, the new feature map is passed through multiple common convolutional layers and a fully connected classification layer to obtain the final output (category scores).
In this way, we have defined the forward propagation of CrackNet. As for backward propagation, the parameters of the Schmid kernel function are updated through the chain rule. Suppose that L is the loss function, w is each convolutional kernel, η is the learning rate, and p denotes the generating parameters (σ and τ) of the Schmid kernel function. Then the update δ and the updated parameters p∗ are
calculated in the following equation (from [37]):
δ = ∂L/∂p = (∂L/∂w)(∂w/∂p),   p∗ = p − ηδ,   (2)
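One possible realization of such a layer is sketched below: rebuilding the kernel from (σ, τ) in every forward pass lets automatic differentiation apply exactly the chain rule of Equation 2 to the two generating parameters. The class name, kernel size and single-channel layout are illustrative assumptions, not the paper's released code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SchmidConv(nn.Module):
    def __init__(self, size=15, sigma=8.0, tau=2.0):
        super().__init__()
        # The only learnable parameters p = (sigma, tau) of Equation 2.
        self.sigma = nn.Parameter(torch.tensor(sigma))
        self.tau = nn.Parameter(torch.tensor(tau))
        half = size // 2
        ys, xs = torch.meshgrid(torch.arange(-half, half + 1.0),
                                torch.arange(-half, half + 1.0),
                                indexing="ij")
        self.register_buffer("r", torch.sqrt(xs ** 2 + ys ** 2))

    def forward(self, x):                    # x: (N, 1, H, W), one channel
        # Kernel w(p) built per Equation 1; autograd supplies dw/dp for Eq. 2.
        w = (torch.cos(2 * math.pi * self.tau * self.r / self.sigma)
             * torch.exp(-self.r ** 2 / (2 * self.sigma ** 2)))
        w = (w - w.mean()).view(1, 1, *w.shape)   # zero-mean kernel (assumed)
        return F.conv2d(x, w, padding=w.shape[-1] // 2)
```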
In conclusion, we obtain all the separate bone regions of different types with Faster R-CNN and recognize whether they are fractured with CrackNet.
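Putting the two stages together, a minimal sketch of the inference pipeline follows; `detector` and `cracknet` are hypothetical handles for the trained Faster R-CNN and CrackNet models.

```python
# Stage 1: detect and classify every bone; stage 2: classify each crop.
def detect_fractures(image, detector, cracknet, score_thresh=0.5):
    results = []
    for box, bone_type, score in detector(image):     # stage 1: bones
        if score < score_thresh:
            continue
        x1, y1, x2, y2 = (int(v) for v in box)
        patch = image[y1:y2, x1:x2]                   # crop the bone region
        if cracknet(patch):                           # stage 2: fractured?
            results.append((bone_type, (x1, y1, x2, y2)))
    return results
```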
5. Experiments
In this section, we evaluate the performance of the detection of different types of bones based on Faster R-CNN, the region-wise classification based on CrackNet, and the overall results of the proposed two-stage system.
5.1. Experimental Setup
5.1.1. Dataset
The dataset consists of 3,053 X-ray images, of which 112 are from the website Radiopaedia [38] and the rest are collected from hospital DICOM files. As shown in Table 1, we divide the entire dataset into two parts: 2,001 images (dataset1) are used for training and testing the object detection network and the recognition network, and the remaining images (dataset2) are used to compare our proposed two-stage system with other methods. For the object detection network, we use 1,800 images from dataset1 for training and 201 images from dataset1 for testing. For the recognition network, we use the 20 different types of bone regions extracted from 194 images of dataset1 as the training data and those from 48 images of dataset1 as the testing data. As shown in Figure 7, the images cover five major parts of the body, namely the skull, upper trunk, lower trunk, lower limb and upper limb. Image resolutions vary, e.g., 3052 × 3052 and 1024 × 889. The dataset information is shown in Table 1.
Class         dataset1   dataset2
skull               34         22
lower trunk        483        252
upper trunk        484        252
upper limb         500        263
lower limb         500        263
total            2,001      1,052

Table 1: Skeletal image data distribution information.
Figure 7: Example images from the dataset: (a) lower limb, (b) lower limb, (c) upper limb, (d) upper trunk, (e) lower trunk, (f) skull.
5.1.2. Metrics
In the work of region patches classification, there are only four possible
outcomes of applying the classifier on any instance. These outcomes are:
- True Positive (TP) which refers to the fractured images that are correctly
315 labelled as fractured.
- True Negative (TN) which refers to the normal (non-fractured) images
that are correctly labelled as normal.
- False Positive (FP) which refers to the normal images that are incorrectly
labelled as fractured.
320 - False Negative (FN) which refers to the fractured images that are incor-
rectly labelled as normal.
The performance of the proposed system is evaluated in terms of accuracy, precision, sensitivity, specificity, and F-measure, defined below [39]. The sensitivity is the proportion of all positive examples that are correctly classified; it measures the classifier's ability to identify positive cases, and the recall is the same as the sensitivity. The specificity is the proportion of all negative cases that are correctly classified; it measures the classifier's ability to identify negative cases. The precision is the proportion of examples classified as positive that are actually positive. The F-measure is a comprehensive evaluation indicator; a high value indicates that the classifier performs well on both precision and recall.
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (3)
Precision = TP / (TP + FP)   (4)
Recall (Sensitivity) = TP / (TP + FN)   (5)
Specificity = TN / (TN + FP)   (6)
F-measure = (2 · Precision · Recall) / (Precision + Recall)   (7)
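For reference, Equations 3-7 translate directly into code; this helper assumes raw confusion-matrix counts and nonzero denominators.

```python
# Accuracy, precision, recall/sensitivity, specificity and F-measure
# computed from confusion-matrix counts, per Equations 3-7.
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # recall == sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f_measure
```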
IoU is a measure of the overlap between the box predicted by the object detection algorithm and the marked box in the original image, and the accuracy of object detection can be assessed through this value. Its calculation formula is as follows:
IoU = area(detection result ∩ ground truth) / area(detection result ∪ ground truth)   (8)
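For axis-aligned boxes, Equation 8 reduces to a few lines; the (x1, y1, x2, y2) corner convention below is our assumption.

```python
# IoU of two axis-aligned boxes given as (x1, y1, x2, y2), per Equation 8.
def iou(box_a, box_b):
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```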
Mean average precision (mAP) [40] is the average of the APs over multiple categories. The closer its value is to 1, the better the detection framework; it is an important indicator for object detection algorithms. AP is the area enclosed by the Precision-Recall curve and the abscissa; the higher the AP value, the better the classifier's performance. The Precision-Recall curve is drawn by varying the probability threshold for the positive class, letting the classifier label the test set at each threshold, and plotting the Precision and Recall values obtained at the different thresholds. The abscissa is the Recall value and the ordinate is the Precision value. In object detection, the Precision-Recall curve is drawn by setting different IoU thresholds to obtain TP and FP. When calculating the AP of each category, the curve is first smoothed: for each point, the largest Precision value to its right is taken and the points are connected; the area enclosed by the smoothed curve and the Recall axis is then the AP value.
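The smoothing and integration just described is the standard all-point interpolated AP; a sketch follows, assuming the inputs are already sorted by ascending recall.

```python
import numpy as np

# Interpolated AP: make precision monotonically non-increasing by taking,
# at each point, the largest precision to its right, then integrate the
# smoothed curve over recall.
def average_precision(recalls, precisions):
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])          # right-to-left max smoothing
    idx = np.where(r[1:] != r[:-1])[0]      # recall change points
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```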
In the work of bone localization in an X-ray image, we use the common metric: mean average precision (mAP) [40]. All experiments are run on one NVIDIA TITAN X.
Figure 8: (a) the original image, (b) the artificial annotation of (a), (c) the output for (a) produced by Faster R-CNN.
We use the ResNet101 architecture in Faster R-CNN to extract image features. During training, the maximum number of iterations is 70,000, the optimizer is SGD, the base learning rate is 0.001 and the step size is 50,000. In addition, we use a fine-tuning strategy to train ResNet101: we first train the ResNet101 model on the ImageNet dataset and save its weight parameters, and ResNet101 is initialized with these weights during the training of Faster R-CNN. As shown in Table 2, the mAP on the test set is 0.82555, together with the average precision of each category. The test performance on an image using Faster R-CNN is shown in Figure 8. From these results, we can precisely and quickly locate each bone and extract its region from an image through Faster R-CNN. Previous work can only segment bone, but cannot tell which type each bone is.
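As an illustration of this training schedule, the sketch below uses PyTorch; `model` and `data_loader` are hypothetical handles, and the decay factor (gamma = 0.1) and per-iteration stepping are our assumptions, since only the base learning rate and step size are reported.

```python
import torch

# SGD with base lr 0.001 and a step decay at iteration 50,000 of 70,000,
# mirroring the reported configuration; gamma = 0.1 is an assumed value.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50000, gamma=0.1)

model.train()
for step, (images, targets) in enumerate(data_loader):   # hypothetical loader
    if step >= 70000:
        break
    losses = model(images, targets)      # torchvision-style loss dict
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                     # stepped per iteration here
```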
Table 2: The average precision (AP) of each bone category on the test set.
After locating each bone in an X-ray image, we need to determine whether the bone region is fractured. There are 242 original X-ray images, taken from the images used to evaluate Faster R-CNN, for training and testing CrackNet. By marking the location of each bone, 5,743 bone patches are cropped from the 242 images, of which 723 are fracture patches. In this classification, a fracture patch is a positive example. In the experiments, the data of each class are divided into five equal parts: three of them are selected as the training set, one as the validation set and the rest as the test set. For training, to balance the data, the fracture patches are expanded to 4,016 by rotating the images and changing the image backgrounds. The test set contains 145 fracture patches and 1,004 non-fracture patches. We then train each model five times to reduce accidental error. The stochastic gradient descent optimization algorithm and the fine-tuning strategy are used during training, with a batch size of 50, an initial learning rate of 0.001 and a weight decay of 0.0005. The learning rate is reduced to one fifth every 4,000 iterations. Each class of data is queued, and the amount of data per category in each batch is equal during training. We report the performance of our algorithm on the test set after 12,000 iterations, averaged over five runs. ResNet with original images, ResNet with Gabor-filtered images, ResNet with Sobel-filtered images, ResNet with Laplace-filtered images, ResNet with Schmid-filtered images and CrackNet with original images are compared. As shown in Figure 9, these four kinds of texture filtering can reflect the fracture-line information of X-ray images and can better identify the bone fracture. The test results of these models are shown in Table 3. Except for the plain ResNet model, the models have the same computational complexity; the ResNet model has the least.
Figure 9: (a) the original image of a fractured clavicle, (b) the Schmid response of (a), (c) the Gabor response of (a), (d) the Laplace response of (a), (e) the Sobel response of (a).
As shown in Table 3, the method that first preprocesses the image with a texture filter and then classifies it with a convolutional neural network is good for fracture recognition. The experimental results show that, among the four texture filtering operators, the Schmid filter has the best effect on fracture recognition. Moreover, the Crack-Sensitive CNN is better than the method that first preprocesses the image with a Schmid filter and then classifies it with ResNet. In terms of recall, CrackNet is the best, 3% higher than the other networks; the Schmid convolutional layer is therefore more sensitive to fractures. In terms of specificity, ResNet is the best, and the other networks perform the same. In conclusion, CrackNet is more suitable for bone fracture images. What is more, the recall in Table 3 is clearly low overall. The main reason is that there are many kinds of fractures, and the texture information of each kind of fracture image is different.
The test dataset consists of 940 images converted from DICOM files, of which 470 are fracture images and the rest non-fractured, plus 112 images downloaded from Radiopaedia [38], 56 of which are fracture images and the rest non-fracture images. On the one hand, we compare our system with other two-stage systems on the X-ray images; on the other hand, we compare our system with other methods on the Radiopaedia dataset.
Accuracy     0.9011   0.8909   0.8859
Precision    0.8973   0.8910   0.8861
Recall       0.9049   0.8909   0.8859
F-measure    0.9014   0.8910   0.8860

Table 4: The performance of our system (first column) and two other two-stage systems on the X-ray dataset.

Table 5: The performance of our system and other methods on the Radiopaedia dataset.
As shown in Table 4, the best method is Faster R-CNN combined with the Crack-Sensitive CNN, with an accuracy higher than 90% and an F-measure higher than 90% on the X-ray images. As shown in Tables 4 and 5, the system proposed in this paper clearly shows the best results. From the recall, it can be seen that our system is more sensitive to the fracture line and can more accurately identify fractures. As shown in Figure 10, images processed by the proposed system yield the fracture region. This assists doctors in fracture detection and shortens their diagnosis time. In addition, the input to our system can be an image containing any of the different types of bones.
Figure 10: (a) an input X-ray image of a fractured calcaneus, (b) an input X-ray image of a fractured ulna, (c) an input X-ray image of a fractured patella, (d) the bone fracture region of (a) detected by our system, (e) the bone fracture region of (b) detected by our system, (f) the bone fracture region of (c) detected by our system.
6. Conclusion
In this paper, we proposed a novel system to quickly and systematically detect fractures in an X-ray image. We have shown that the power of deep learning techniques can be harnessed to provide fast and accurate solutions for automated medical image analysis. We presented a new classification network, CrackNet, which is sensitive to fracture lines and identifies fractures more accurately. The proposed system can not only identify whether a bone is fractured, but also localize the bone in the X-ray image, helping doctors quickly detect bone fractures. Extensive experiments on the Radiopaedia dataset confirmed the efficacy of our proposed system, achieving 88.39% accuracy, 87.5% recall and 89.09% precision, outperforming other methods on the bone fracture detection task.
In the future, this two-stage system could be changed into a one-stage system in which Faster R-CNN and CrackNet are trained jointly instead of in two stages.
7. Acknowledgements
We would like to express our full thanks to Shiwei Wang and Haikou People's Hospital for providing medical data. We would also like to thank the Data Science Laboratory of the University of Science and Technology of China for its support.
References
[1] D. B. Burr, Introduction - bone turnover and fracture risk, Journal of Musculoskeletal & Neuronal Interactions 3 (4) (2003) 408–409.
ture, Research Journal of Pharmacy & Technology 10 (11) (2017) 1994–2002.
Imaging 2012 (2012) 1.
[9] Y. Cao, H. Wang, M. Moradi, P. Prasanna, T. F. Syeda-Mahmood, Fracture detection in x-ray images through stacked random forests feature fusion, in: Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, IEEE, 2015, pp. 801–805.
classifiers for bone fracture detection in x-ray images, in: Image Processing, 2005. ICIP 2005. IEEE International Conference on, Vol. 1, IEEE, 2005, pp. I–1149.
[12] J. Liang, B.-C. Pan, Y.-H. Huang, X.-Y. Fan, Fracture identification of x-ray image, in: Wavelet Analysis and Pattern Recognition (ICWAPR), 2010 International Conference on, IEEE, 2010, pp. 67–73.
[14] M. Al-Ayyoub, I. Hmeidi, H. Rababah, Detecting hand bone fractures in
x-ray images., JMPT 4 (3) (2013) 155–168.
[17] T. Anu, M. M. R. Raman, Detection of bone fracture using image processing methods, International Journal of Computer Applications (0975–8887).
[18] S. Mahendran, S. S. Baboo, An enhanced tibia fracture detection tool using image processing and classification fusion techniques in x-ray images, Global Journal of Computer Science and Technology 11 (14) (2011) 23–28.
[21] C. Bhabatosh, et al., Digital image processing and analysis, PHI Learning
Pvt. Ltd., 2011.
[24] C. M. Deniz, S. Hallyburton, A. Welbeck, S. Honig, K. Cho, G. Chang, Segmentation of the proximal femur from mr images using deep convolutional neural networks, arXiv preprint arXiv:1704.06176.
[26] R. Girshick, Fast r-cnn, arXiv preprint arXiv:1504.08083.
[27] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, in: Advances in neural information processing systems, 2015, pp. 91–99.
[28] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
[29] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[32] I. Sobel, History and definition of the sobel operator, Retrieved from the World Wide Web.
[35] C. Schmid, Constructing models for content-based image retrieval, in: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, Vol. 2, IEEE, 2001, pp. II–II.
[36] S. Luan, B. Zhang, C. Chen, X. Cao, Q. Ye, J. Han, J. Liu, Gabor convolutional networks, arXiv preprint arXiv:1705.01450.
[37] Y. Ma, Y. Luo, Z. Yang, Pcfnet: Deep neural network with predefined convolutional filters, Neurocomputing 382 (2020) 32–39.
[38] Radiopaedia, http://radiopaedia.org/.
[39] I. H. Witten, E. Frank, M. A. Hall, C. J. Pal, Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann, 2016.
Conflicts of Interest Statement
The authors declare that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria, educational grants, or participation in speakers' bureaus) in the subject matter of this article.