
An Intelligent Knowledge Extraction Framework For Recognizing Identification Information From Real-World ID Card Images

This document describes an intelligent framework for recognizing identification information from images of ID cards. The framework first uses a multi-operator algorithm to conduct marginal detection and localize the ID card region using an SVM classifier. It then segments linguistic characters from the card region using an improved projection algorithm and recognizes specific characters using an eight-layer convolutional neural network. Extensive experiments on a Chinese ID card dataset validated that the proposed method improves the efficiency of Chinese ID card recognition.


Received May 18, 2019, accepted July 2, 2019, date of publication July 18, 2019, date of current version November 25, 2019.


Digital Object Identifier 10.1109/ACCESS.2019.2929816

An Intelligent Knowledge Extraction Framework for Recognizing Identification Information From Real-World ID Card Images
LIN ZUO¹, WENYU CHEN², HONG QU², LI HUANG², ZHENG WANG², AND YONG CHEN³
¹School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
²School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
³School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China

Corresponding authors: Lin Zuo ([email protected]) and Wenyu Chen ([email protected])


This work was supported in part by the National Science Foundation of China under Grant 61877009 and Grant 61573081, and in part by
the Sichuan Provincial Science and Technology Plan Project under Grant 2018GZ0396.

ABSTRACT In this work, we study the problem of recognizing identification (ID) information from unconstrained real-world images of ID cards, a task with extensive practical applications. Manual processing is impractical because of its unaffordable labor and time costs as well as the unreliable quality of manual labeling. In this paper, we propose an intelligent framework for automatically recognizing ID information from images of ID cards. Specifically, we first conduct marginal (edge) detection using a multi-operator algorithm and then localize the ID card region among all proposed candidate regions with an SVM classifier. Next, we segment linguistic characters from the card region with an improved projection algorithm. Finally, we recognize the specific characters with an eight-layer convolutional neural network. We perform extensive experiments on a Chinese ID card dataset to validate the effectiveness and efficiency of the proposed method. The experimental results demonstrate the superiority of the proposed method over other existing schemes.

INDEX TERMS Identification information recognition, intelligent framework, convolutional neural network.

I. INTRODUCTION
In recent years, extracting textual information from images has received considerable attention, driven by the rapid development of information technology and the resulting demand for accessible information. The explosive growth of smart devices has been improving the quality of our daily lives. For instance, we take photos with cell phone cameras to record and share significant occasions and information (e.g., important documents, numbers, and street names). However, extracting text information such as personal details and identifying numbers printed on cards (e.g., ID cards and credit cards) is challenging. Such information is commonly used in banks, airports, security companies, and other places that require highly accurate identification and recording. Traditionally, this task is completed by manually identifying the information and entering it into a system, but neither reliability nor efficiency can be guaranteed. The rapid development of machine vision techniques enables automatic recognition of information from images with certain additional equipment. However, some challenging issues remain to be addressed:
• Constrained place: The additional equipment usually requires ID cards to be processed in a fixed position with the assistance of professional technicians. In other words, customers have to go in person to a specified location to have their ID cards processed. This is quite time-consuming and the user experience is poor, especially when the operation is urgent, such as a stock exchange transaction.
• Strict environments: The additional equipment is highly sensitive to illumination; light that is too bright or too weak may affect the recognition results, which inevitably incurs the expense of professional technicians to assist the operation.
• Recognition accuracy: Owing to the scarcity of real-world training images and effective recognition approaches, existing identification systems cannot guarantee recognition accuracy. Moreover, the robustness of their performance should be further enhanced.

The associate editor coordinating the review of this manuscript and approving it for publication was Shiqi Wang.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
165448 VOLUME 7, 2019
L. Zuo et al.: Intelligent Knowledge Extraction Framework for Recognizing Identification Information

In a nutshell, banks, financial institutions, and corporations demand a highly accurate approach for extracting customer information from images in a timely manner, enabling customers to perform the task by themselves as quickly as possible. The problem is similar to vehicle license plate recognition (LPR), which processes natural images photographed by outdoor cameras without intervention from staff. Such images are often not readily processable because of complex backgrounds and a wide range of illumination conditions. The challenging issue in LPR is localizing the license plate, which is similar to the ID card recognition problem in this study. The purpose of our study is to promptly segment characters from images with various backgrounds and illumination conditions [1]. Unlike LPR, in which a high-quality camera is used, our problem is much more complex, because images may be taken by a diversity of phone cameras of different qualities [2], [3]. After the ID card region is located, another important stage is character recognition, which typically uses statistical pattern matching [4]-[6]. This method, however, is sensitive to diverse image datasets, to noise, and to the robustness of the particular learning algorithm [7], [8]. Another method extensively used for pattern matching is the deep artificial neural network. Deep artificial neural networks are an important advance in solving recognition problems [9], [10]; they excel at discovering intricate structures and require very little human assistance [11].

In [12], a BP neural network was used to recognize Chinese ID card numbers, and ID card number recognition was greatly improved. However, that study focused only on the ID card numbers while ignoring other text information that is equally important. In [13], an approximate Bayes-optimal linear discriminant analysis (BLDA) was presented for Chinese handwritten character recognition; this model significantly reduced the search space of BLDA and achieved good recognition accuracy. The proposed model not only recognized the numbers but also successfully recognized the Chinese characters on ID cards, which differ greatly from handwritten characters. It is worth mentioning that Google has an open-source system, namely Tesseract, which researchers have widely used for Western character recognition [14]. Unfortunately, Tesseract does not perform well on Chinese characters.

In this paper, we propose an intelligent framework for automatically extracting ID information from images of ID cards; the flowchart of the framework is illustrated in Fig. 1. The edges of the ID card are detected by a multi-operator algorithm, and the ID card region is located by an SVM classifier. An improved projection algorithm is employed to segment the linguistic characters from the card region, and a tailored eight-layer convolutional neural network recognizes the specific characters. A set of experiments is conducted to examine the effectiveness and efficiency of the proposed method. The results demonstrate that the proposed method can improve the efficiency of Chinese ID card recognition.

FIGURE 1. The flowchart of our proposed scheme.

FIGURE 2. The raw images usually have many problems. For example, the useful information in the left image occupies less than 40% of it. The right one is a distorted image because the camera is not vertically aligned with the ID card.

II. LOCALIZING ID CARD REGION
A universal ID card recognition system should be able to obtain highly accurate ID card regions even in complicated situations. Because the quality of images taken by different people varies, we borrowed ideas from image dehazing, including image enhancement-based approaches and image restoration-based approaches [15], [16]. Since images may be taken by different people, various image issues are to be expected. An image may be a copy of an ID card with a large proportion of empty space, as shown in Fig. 2 (left), where the useful information occupies less than 40% of the picture; or it may be a photo taken by a mobile phone or other handheld device, as depicted in Fig. 2 (right), where, depending on how the device was held, the image is not vertically aligned with the ID card. Such distortion may directly affect ID card identification. For these and other reasons, the original image must be preprocessed to acquire the exact ID card region.

We use three steps to remove noise and obtain an accurate position of the ID card. In the first step, the foreground and background are separated, the color image is binarized to remove noise, and multiple operators are used to detect the edges of the ID card. The multi-operator algorithm sorts out all eligible rectangular regions, called candidates. In the second step, an SVM model is trained to classify all the candidates and filter out obviously incorrect regions. In the third step, a confidence algorithm is used to calculate the most likely ID card region. The details are given below.
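As a concrete preview of the first step, the following is a minimal pure-Python sketch (not the authors' implementation) of the (2k+1) × (2k+1) Gaussian kernel defined in Section II-A and of the Sobel gradient direction from Section II-B. Assumptions not stated in the paper: the kernel is renormalized so its weights sum to 1, filtering is done without padding, the direction is computed with the four-quadrant arctangent atan2 so that it can take the value π, and the kernels are slid over the image without flipping (correlation), which is the convention under which G_x is positive for an edge that is lighter on the right. The names `gaussian_kernel`, `filter2d`, and `sobel_direction` are hypothetical; a production system would normally call an image-processing library instead.

```python
import math

# Sobel kernels as written in Section II-B.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gaussian_kernel(k, sigma):
    """(2k+1) x (2k+1) Gaussian kernel; indices x, y run from 1 to 2k+1, centre (k+1, k+1).

    Weights are renormalised to sum to 1 (an assumption: the continuous
    formula is only approximately normalised on a small discrete grid).
    """
    size = 2 * k + 1
    coef = 1.0 / (2.0 * math.pi * sigma ** 2)
    kernel = [
        [coef * math.exp(-((x - (k + 1)) ** 2 + (y - (k + 1)) ** 2) / (2.0 * sigma ** 2))
         for y in range(1, size + 1)]
        for x in range(1, size + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


def filter2d(image, kernel):
    """Slide `kernel` over `image` (a list of rows) without padding or flipping."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def sobel_direction(image):
    """Per-pixel gradient direction Theta = atan2(G_y, G_x), in radians."""
    gx = filter2d(image, SOBEL_X)
    gy = filter2d(image, SOBEL_Y)
    return [[math.atan2(y, x) for x, y in zip(rx, ry)] for rx, ry in zip(gx, gy)]


# Usage on tiny synthetic images.
dark_to_light = [[0, 0, 10, 10]] * 3   # vertical edge, lighter on the right
light_to_dark = [[10, 10, 0, 0]] * 3   # vertical edge, lighter on the left
smoothed = filter2d([[0, 0, 10, 10, 10]] * 5, gaussian_kernel(1, 1.0))
print(len(smoothed), len(smoothed[0]))   # 3 3 (valid-mode output)
print(sobel_direction(dark_to_light))    # [[0.0, 0.0]]
print(sobel_direction(light_to_dark))    # both entries equal math.pi
```

If the kernels were applied as a true convolution (flipped), the signs of G_x and G_y would invert and the two directions would swap, so the filtering convention has to match how the direction is interpreted.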

A. DENOISING
To remove noise and smooth the image, we first apply Gaussian low-pass filtering and image binarization, which help eliminate noise and interference. Filtering is the most basic operation in image processing. In the broadest sense of the word ‘‘filtering’’, the value of the filtered image at a given location is a function of the values of the input image in a small neighborhood of the same location. Gaussian low-pass filtering computes a weighted average of the pixel values in the neighborhood [17], and it can significantly reduce the noise in the images. It is worth mentioning that, although binarization is powerful for denoising, it has its own drawbacks, the most important of which is the loss of useful color properties once images are binarized.

The Gaussian filter kernel of size (2k+1) × (2k+1) is given by
$$G_{xy} = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{(x-(k+1))^{2} + (y-(k+1))^{2}}{2\sigma^{2}}\right), \qquad 1 \le x, y \le 2k+1$$
The input images are convolved with this Gaussian kernel to smooth them and eliminate noise and interference.

B. MULTI-OPERATOR EDGE DETECTION
Using multiple operators to detect edges is the core of our proposed method. Without this stage, we cannot obtain the edge (marginal) information of the ID card needed for the subsequent segmentation and recognition.

Sobel operator: The Sobel operator is a well-known edge detection algorithm in image processing and machine vision [18]; it produces an image with emphasized edges. It convolves the original image with two 3 × 3 kernels, one for horizontal changes and one for vertical changes, to approximate the derivatives. Compared with other edge operators, the Sobel operator has two main merits. First, it has a smoothing effect on random image noise. Second, because it takes the difference of two rows or two columns, the edge elements on both sides are enhanced [19].
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * A$$
where A is the source image, G_x and G_y are two images whose points contain the horizontal and vertical derivative approximations, respectively, and ∗ denotes 2-D convolution. We then calculate the gradient direction as
$$\Theta = \operatorname{atan2}(G_y, G_x)$$
where Θ is 0 for a vertical edge that is lighter on the right side and π for a vertical edge that is lighter on the left side.

Canny operator: The Canny edge detector uses a multi-stage algorithm to detect a wide range of edges in images and is often regarded as an optimal edge detector [20]. It has a simple approximate implementation in which edges are marked at maxima of the gradient magnitude of a Gaussian-smoothed image. Empirical experience in machine vision suggests that Canny edge detection provides good and reliable detection.

We use multiple parameter groups of the Sobel operator and the Canny edge detector together. The multiple parameter groups and the combination of results ensure that the algorithm can acquire exact edge information. In fact, experiments show that the combination of the two operators outperforms either single operator, as shown in Section V.

FIGURE 3. Intermediate result of the multi-operator algorithm, which can acquire exact edge information. Mosaics are added to protect the sensitive information on the ID card.

C. DISCRIMINATING THE CANDIDATES
Executing the multi-operator algorithm yields several candidate images, most of which are not useful. We therefore classify them. In this study, we use a Support Vector Machine (SVM) model and a confidence algorithm to select the useful images before proceeding to the next step. The SVM [21] is a class of machine learning models used for classification and regression. Given a set of labeled training samples, each belonging to one of two categories (positive or negative), the SVM learns from these samples to classify a new image into one of the two classes. The result of training an SVM is a classification model that can be used to classify new examples. In practice, the SVM exhibits excellent generalization performance (accuracy on test sets) and has strong theoretical motivation in statistical learning theory [22].

In our case, we have 1200 positive and 1200 negative samples. The positive samples are all regular ID card images with complete information, whereas the negative samples are various kinds of irregular images with incomplete information, as shown in Fig. 4. To ensure the correctness of the final result and reduce computational complexity, we use the confidence algorithm to verify the results produced by the SVM and keep the final images that satisfy the criteria. Because we use multiple parameter groups and combine the results of two operators, we may obtain more than one final image. Usually, all of them are correct ID card images, and we directly select the first one for the next step.

FIGURE 4. The training samples for the SVM model.

D. IMAGE CORRECTION
Because we use real-world ID card data, many images are tilted and distorted and require correction. To correct these images, we adopt corner detection, which is widely used in machine vision for tasks such as image localization, video tracking, and target recognition. In our system, corner detection focuses on the edges of the ID card's outline and analyzes the distribution of corner points. After the corner points are acquired, we find the inclination of the image using the Hough transform [23], a highly accurate algorithm for detecting shapes such as straight lines. Because the edges of ID cards are highly linear, we can easily repair tilted images by computing their angle of inclination, as shown in Fig. 5.

FIGURE 5. Correcting a skewed image by checking its lines.

III. SEGMENTING LINGUISTIC CHARACTERS
To recognize the ID card, the characters and numbers must be separated from the ID card image; the quality of this separation directly affects recognition accuracy. In this study, the segmentation method is based on row scanning of the original ID card image along with projection information from vertical scanning.

The segmentation falls into two types: line segmentation and character segmentation. Line segmentation uses relative positions in the ID card image, because ID cards have uniform sizes and styles. In China, every ID card has the same layout, with the facial photo on the right side and the personal information on the left side. The position, size, and line spacing of the text are all fixed. Based on this characteristic, the text regions can be roughly determined from pixel positions once the exact region of the ID card in the image is obtained.

Character segmentation uses vertical projection information, modified in this study to accommodate the complex structure of Chinese characters. Vertical projection requires the image to be binarized before segmentation; the projection is then calculated for every pixel column of the binary image. The naive projection works well for Latin characters, because letters sit next to each other and projection can be applied efficiently. However, Chinese characters differ from Western words. Chinese characters have particular structures, and even a single character may contain a gap, such as the Chinese character in Fig. 6. When separating Chinese characters, the naive projection algorithm therefore often produces recognition errors. To solve this problem, we improve the projection algorithm to adapt it to Chinese character segmentation. Experimental results show that the improved projection algorithm is much more accurate for Chinese character segmentation.

FIGURE 6. The problem of the naive projection algorithm on Chinese characters: the algorithm always splits characters with a left-right structure.

The improved algorithm proceeds as follows:
1) The width and spacing of numbers and Chinese characters are recorded and used as the threshold of the segmentation criterion.
2) When the algorithm starts scanning, it records the start position $s_{CH}$ and end position $e_{CH}$ of character n (n = 1, 2, ...) and then compares the width $e_{CH} - s_{CH}$ with the threshold:

• If $e_{CH} - s_{CH}$ is very close to the pre-specified threshold, we accept the segmentation.
• If $e_{CH} - s_{CH}$ is less than the pre-specified threshold, the region may be part of a Chinese character with a left-right structure that was segmented incorrectly. The algorithm then merges the current region with the next region to the right and compares the result with the threshold again. Usually, the process is repeated twice to obtain a correct segmentation.
• In the last case, $e_{CH} - s_{CH}$ is greater than the threshold, which means the region may contain adhesive (touching) characters that must be segmented again.

By using the physical position to segment text lines and the modified vertical projection method to segment characters, the segmentation accuracy increases significantly. The method handles cases specific to Chinese characters, such as structural problems and adhesive characters. The segmentation results are shown in Fig. 7.

FIGURE 7. The results from characters segmentation.

IV. CHARACTER RECOGNITION
Machine learning technologies are powerful in many aspects of modern society [9], such as detecting objects in images [24], audio recognition [25], path planning for mobile robots [26], and classic optimization problems [27]. Currently, the most common form of machine learning is supervised learning. To train a supervised model for object identification in images, a large dataset of objects with their ‘‘labels’’ must be collected; the labels are then used to correct the model's output. Many studies have demonstrated that supervised learning is very useful for solving recognition problems, and supervised learning models are increasingly being adopted in practical applications, especially in image recognition and natural language processing. For example, neural network models achieve an accuracy greater than 98% on the MNIST handwritten digit corpus [28]. Although machine learning is very successful in Western character and digit recognition, limited research effort has been devoted to Chinese character recognition, because training models for Chinese characters requires a relatively large dataset. In fact, there are more than 5000 common Chinese characters, considerably more than in Western alphabets. The core problem of acquiring enough character labels to train the model remains.

FIGURE 8. Our dataset layout.

We use image post-processing to edit real ID card images, replacing the words on the card with ‘‘raw’’ words that do not exist in the training dataset, ignoring syntax and grammar. It is worth mentioning that we chose to edit real ID card images instead of creating word images directly, because the words on ID cards have their own font, size, and style. This ensures that the artificial word data is consistent with the real words, so that it is useful for training our model. Figs. 9 and 10 show the ‘‘man-made’’ image and the segmentation results, respectively. With this approach, our training dataset was enlarged to 3000 words. Our experimental results show that the recognition accuracy of our model improved substantially after the training dataset was augmented with artificial data.

FIGURE 9. To enlarge our training dataset, we used ‘‘raw’’ words that did not exist in our earlier training dataset to ‘‘create’’ some ‘‘artificial’’ ID cards, ignoring syntax and grammar.

FIGURE 10. The segmentation result of an ‘‘artificial’’ ID card. The segmented pieces (samples) are used as real data when we train our model. The recognition accuracy increased significantly after these artificial samples were added to the training set.

Deep learning has widespread applications, from web search to content filtering on social networks to recommendations on e-commerce websites [9]. In speech recognition, deep learning structures have achieved record-breaking results on a standard small-vocabulary benchmark [29] and are developing quickly on large-vocabulary tasks [30]. Among the many kinds of neural network models is the convolutional neural network (CNN). The CNN is designed to process data that come in the form of multiple arrays, such as an image composed of arrays of pixels [31]. It is a particular type of deep, feedforward network that is much easier to train and generalizes better than networks with full connectivity between adjacent layers [32], [33]. The CNN has many practical achievements in image processing and is widely used in machine vision. In the early 1990s, the CNN model was already used for face recognition. Today, CNNs are the dominant models for recognition and detection tasks [9], [34], [35].

A CNN model consists of a series of layers. Most CNN models use one convolution layer and one pooling layer to obtain features, followed by a fully connected layer as hidden layers and then a classification layer for output; all hidden layers can be stacked freely. To obtain better features, our CNN model differs from the conventional structure: it stacks, twice, a block of two convolutional layers and one max-pooling layer. The CNN has fully connected layers for nonlinear classification. The inputs of the model are images of Chinese characters, and the outputs are the corresponding recognition results. The architecture of our model is shown in Fig. 11.

FIGURE 11. The model architecture.

Convolutional layer: Generally, a classical CNN model with one convolution layer followed by a max-pooling layer performs well at extracting features. Empirical experience suggests that deep architectures with multiple layers can achieve good performance. Because the convolution layers play a critical role in this study, we add one more convolution layer before the max-pooling layer to extract deeper and more significant information. The input of a convolutional layer is a feature map $z_i(u, v) \in \mathbb{R}^{d}$, where $(u, v) \in \Omega_i$ are image coordinates and d is the number of scalar features. The output is a new feature map $z_{i+1}$:
$$z_{i+1}^{k} = g_i\left(W_i^{k} * z_i + b_i^{k}\right)$$
where $W_i^{k}$ and $b_i^{k}$ denote the k-th filter kernel and bias, respectively, ∗ denotes convolution, and $g_i(\cdot)$ is a nonlinear activation function. In our study, the Rectified Linear Unit (ReLU) [36] is used; we introduce it later.

Max-pooling layer: After the two convolution layers, we apply a max-pooling layer
$$h_i = \max_{(u, v)} z_2^{i}(u, v)$$
to capture the most important features, where the subscript 2 indicates that pooling operates on the feature map of the second convolutional layer. Max-pooling provides translation invariance: the same feature remains active even when the image undergoes translations. This is very useful because our images come from real-world photos, which usually have a slight left or right translation or tilt, yet the model must still classify them accurately regardless of position.

Fully connected layer: We use a fully connected layer $h = [h_1, \ldots, h_k]^{T}$ for classification, where k is the number of filters.

In addition to the above layers, the neural network has an input layer and an output layer. The input layer receives the Chinese character images introduced before, while the output layer is a softmax layer that classifies characters and outputs the recognition results [37]. Inspired by natural language processing (NLP) tasks such as language modeling [38], [39] and machine translation [40], [41], we use the vocabulary probability as our output. In this approach, the dimension of the softmax layer equals the vocabulary size, and each output represents the probability of the corresponding word. Although the softmax layer has a huge number of parameters, this method is a promising way to predict words in NLP tasks. We present and analyze the experimental results in Section V.

V. EXPERIMENTS
In this section, we present and discuss our experimental results.

To handle the data problem, we first searched for available datasets on the Internet. However, this did not work, because ID cards use peculiar fonts and the available datasets are unsuitable. Moreover, there are very few references on Chinese ID card identification. Although some companies have developed ID card identification SDKs for limited use and commercial applications, their principles and algorithms are neither publicized nor reported in the literature, so we cannot make a full comparison with these methods. To solve this issue, we created a training dataset ourselves. We used the method described in Section III to obtain many character fragments and then classified them by hand. This set has 1000 labels and more than 10000 pieces; each label has a folder storing the pieces that belong to it, with tens of image samples per label, including the 10 digits and the capital letter ‘‘X’’, a special symbol on Chinese ID cards. Each character sample is cut out from a real-world ID card image so that our model can obtain very accurate results. The dataset is shown in Fig. 7.

However, in experiments, our model did not work well with this training dataset, because many characters still could not be identified. We checked the intermediate results and found that those characters did not exist in the training dataset. In other words, the dataset was still too small for identification. Unsurprisingly, as mentioned earlier, there are more than 5000 common Chinese characters, and our training dataset covered only one-fifth of them. To overcome this barrier, we tried a ‘‘man-made’’ method to extend our training dataset.
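The character fragments in this dataset are produced with the segmentation method of Section III. To make its accept/merge/split rules concrete, here is a minimal pure-Python sketch that works on a vertical projection profile. It is an illustration under assumptions, not the authors' code: a single expected character width `char_width` (the paper records separate widths for digits and Chinese characters), a ±20% tolerance band standing in for ‘‘very close to the threshold’’, at most two merges (the paper says the merge is usually repeated twice), and splitting over-wide runs at the weakest column near each expected boundary. All function names are hypothetical.

```python
def column_projection(binary):
    """Vertical projection: count of foreground (1) pixels in each column."""
    return [sum(column) for column in zip(*binary)]


def raw_segments(projection):
    """Naive projection: maximal runs of non-empty columns as [start, end) pairs."""
    segments, start = [], None
    for col, value in enumerate(projection):
        if value and start is None:
            start = col
        elif not value and start is not None:
            segments.append((start, col))
            start = None
    if start is not None:
        segments.append((start, len(projection)))
    return segments


def split_wide(projection, start, end, char_width):
    """Cut an over-wide (adhesive) run at the weakest column near each expected boundary."""
    pieces = max(2, round((end - start) / char_width))
    cuts = [start]
    for j in range(1, pieces):
        guess = start + round(j * (end - start) / pieces)
        lo, hi = max(cuts[-1] + 1, guess - 2), min(end - 1, guess + 2)
        if lo <= hi:
            cuts.append(min(range(lo, hi + 1), key=lambda c: projection[c]))
    cuts.append(end)
    return list(zip(cuts[:-1], cuts[1:]))


def improved_segments(projection, char_width, tol=0.2, max_merges=2):
    """Accept runs near char_width, merge narrow ones, split wide ones."""
    low, high = char_width * (1 - tol), char_width * (1 + tol)
    runs = raw_segments(projection)
    result, i = [], 0
    while i < len(runs):
        start, end = runs[i]
        merges = 0
        # Too narrow: probably one half of a left-right character; absorb the next run.
        while end - start < low and i + 1 < len(runs) and merges < max_merges:
            if runs[i + 1][1] - start > high:
                break  # merging would swallow the neighbouring character
            i += 1
            end = runs[i][1]
            merges += 1
        if end - start > high:
            # Too wide: probably touching (adhesive) characters.
            result.extend(split_wide(projection, start, end, char_width))
        else:
            result.append((start, end))
        i += 1
    return result


# Hypothetical projection of one text line, expected character width 10 columns.
profile = [0] * 48
profile[0:4] = [5] * 4      # left half of a left-right character
profile[5:10] = [5] * 5     # right half, after a one-column gap
profile[13:23] = [6] * 10   # an ordinary character
profile[26:46] = [7] * 20   # two touching characters...
profile[36] = 1             # ...joined by a thin stroke at column 36

print(raw_segments(profile))            # [(0, 4), (5, 10), (13, 23), (26, 46)]
print(improved_segments(profile, 10))   # [(0, 10), (13, 23), (26, 36), (36, 46)]
```

In this example the naive projection splits the left-right character into two pieces and treats the touching pair as one; the sketched rules merge the first two runs into a single 10-column character and cut the 20-column run at its thinnest column.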

A. MULTI-OPERATOR EXPERIMENTS
ID card photos are generally taken on the spot, so we must first locate the ID card in the picture. As mentioned in Section II, we use multiple operators to obtain the ID card profile and then acquire the region of the ID card in an image. Initially, we used only the Sobel operator to detect the ID card profile, but the results were unstable. Because a wide variety of images will be collected, a single operator may not be able to locate the ID card in every image. The worst result is shown in Fig. 12(a), where the location of the ID card was not identified correctly. This problem arises because, as noted before, the shape of the ID card varies greatly across images. We therefore combined the Sobel operator and the Canny operator and used five groups of parameters to handle complex situations. As expected, the profiles of most ID cards can be extracted in this way; the result is shown in Fig. 12(b).

FIGURE 12. Multi-operator experiments.

TABLE 1. The performances of three activation functions; the higher the score, the better the model.

B. CHARACTER RECOGNITION EXPERIMENTS
In this section, we present our recognition experiments with the convolutional neural network model. Because our model will be used in practical applications, we invited human raters to evaluate it based on their perception of the recognition quality.
Dataset and Evaluation: As mentioned before, we use two datasets to test our model. The smaller one includes 1000 Chinese character image samples, while the larger one includes more than 3000 image samples. We carried out both a human evaluation and a reference-label evaluation; the human evaluation reflects human perception of the model's recognition quality. We asked human raters to rate the recognition results in a side-by-side comparison.

165454 VOLUME 7, 2019


L. Zuo et al.: Intelligent Knowledge Extraction Framework for Recognizing Identification Information

TABLE 2. The performances of different hidden-layer structures. Models 1 and 2 are basic models, and d denotes the number of times the basic model is stacked. Models 3 and 4 are twice-stacked versions of models 1 and 2, respectively, and model 5 is model 2 stacked three times.

TABLE 3. Evaluation of recognition quality by human side-by-side evaluation. The numbers denote recognition rates (%). The front of the ID card includes the holder's name, gender, birthday, nation (ethnic group), address and ID card number (1-6 in the table). The back has two lines of text giving the expiry date and the issuing authority.

Training details: Our character recognition model uses a seven-layer CNN to output a probability p(c|x) over our dataset alphabet C. This alphabet includes 122 (or 300) Chinese characters, 10 digits and the special capital letter “X”, giving a total of 133 (or 311) classes. The input z1 = x of the CNN consists of cropped gray-scale character images of 30 × 30 pixels. Each image is convolved with 64 filters of size 3 × 3, passed through the activation function, and pooled over 2 × 2 windows. The loss function is the cross-entropy, and all the parameters of the model are jointly optimized to minimize this loss over a training set using Stochastic Gradient Descent (SGD) and back-propagation.
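The layer operations listed in the training details are a 3 × 3 convolution, an element-wise activation, 2 × 2 pooling and a cross-entropy loss on p(c|x). They can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes max pooling and no padding, since the text does not state the padding. It shows a single filter instead of 64 and omits SGD and back-propagation. All function names are hypothetical. The three activation functions compared in the following experiments are included so they can be swapped in.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


# The three activation functions compared in Table 1.
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    return max(0.0, x)


ACTIVATIONS = {"sigmoid": sigmoid, "tanh": math.tanh, "relu": relu}


def conv2d_valid(img: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation without padding (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w)] for y in range(h)]


def activate(fmap: Matrix, f: Callable[[float], float]) -> Matrix:
    """Apply an activation function element-wise to a feature map."""
    return [[f(v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """-log p(c|x) for the true class c = target."""
    return -math.log(softmax(logits)[target])
```

Under the no-padding assumption, a 30 × 30 input shrinks to a 28 × 28 feature map after one 3 × 3 convolution and to 14 × 14 after 2 × 2 pooling. With uniform logits over the 133-class alphabet, the cross-entropy equals ln 133 ≈ 4.89. A practical implementation would use a deep learning framework rather than nested Python loops.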
Because the recognition accuracy of neural networks is affected by the choice of activation function, we compared our model with three activation functions:
• Sigmoid function: a bounded, differentiable real function with range (0, 1), computed as $\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}$.
• Tanh (hyperbolic tangent): the tanh function is a solution to a nonlinear boundary value problem. It is bounded in (−1, 1) and computed as $\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$.
• ReLU (Rectified Linear Unit) [42]: a simple function expressed as $f(x) = \max\{0, x\}$. ReLU has become a standard nonlinear activation function for CNNs because it is simple to compute and significantly reduces the number of training iterations.
We used 100 ID card images to test the accuracy of the three activation functions, with all other parameters the same as in our model. Table 1 shows the performances of the three activation functions. Evidently, the ReLU function improves the recognition quality in all cases. Therefore, we chose ReLU as the model's activation function.
The performance of a neural network also depends on the structure of its hidden layers and can be greatly influenced by their number. A conventional convolutional neural network usually places a pooling layer after each convolutional layer. We tested several layer combinations to find a highly accurate model. All combinations were tested on the same 100 ID card images used in the activation function test; the results are shown in Table 2. The three-times-stacked CNN structure achieves higher accuracy. However, we chose the twice-stacked CNN structure because it takes less time.
Based on the above experimental results, our model uses the ReLU activation function, two convolutional layers and one max-pooling layer at the end. The results show that our model is reliable and satisfies the accuracy requirements of many scenarios. Before our work, most security companies filled in identification information manually, resulting in intensive human labor and a high error rate. To eliminate laborious manual typing and increase processing speed, we developed the proposed approach. We also introduced human evaluation to illustrate the advantages of our algorithm in reducing manual labor and improving the accuracy rate. We used 120 real-world ID cards that were not included in our training samples to evaluate the recognition rate. The results of the human side-by-side evaluation are shown in Table 3. Because there are anti-counterfeiting marks and patterns on the back side, it is not easy to locate the character areas and recognize the information there. Moreover, the separators in the validity-period dates strongly interfere with recognition, leading to errors. Therefore, the recognition rate on the back is often lower than on the front. It should be noted that all the training data come from real-world ID cards. Owing to the complexity of Chinese characters, it is normal that the character recognition rate is slightly lower than the digit recognition rate.

C. MODEL RESULT EXHIBITION
Our model includes four parts: image processing, ID card location, character segmentation and recognition. The overall performance of the model depends on the performance of these four parts. Therefore, each section of the model
is equally important. The results of our model are shown in Fig. 13. The original picture is a natural photograph, such as one a person might casually take. Our model extracts the ID card region from the complex background and recognizes its contents. The result is not 100% correct: as shown in Fig. 13, the first digit of the ID number is wrong. Even so, the method is acceptable and can significantly reduce human workload.

FIGURE 13. The recognition result.

It is also worth mentioning that our Chinese character dataset contains only about 3000 labeled character images in total, including both synthetic and real data, and does not cover all common Chinese characters. Our future work aims to address these shortcomings to improve our model.

VI. CONCLUSION
In this paper, we proposed an effective machine learning approach for real-world Chinese ID card recognition, including all the techniques that are critical to ensuring its accuracy. To the best of our knowledge, we are the first to use deep learning to recognize Chinese ID cards and the first to publicly report such a use. We expect this work to help drive the academic development of OCR-based identification techniques. On our real-world ID card dataset, the recognition accuracy of our method approaches 94%. This can significantly reduce human work and makes the method well suited for industrial use. Moreover, to overcome the limited size of the real-world ID card dataset, we built our own ID card character training dataset to train a convolutional neural network. Experimental results show that our model performs well and that its recognition results are reliable, with satisfactory accuracy for practical scenarios.

REFERENCES
[1] B. Shan, “Vehicle license plate recognition based on text-line construction and multilevel RBF neural network,” JCP, vol. 6, no. 2, pp. 246–253, 2011.
[2] B. Yang, X. Zhang, L. Chen, H. Yang, and Z. Gao, “Edge guided salient object detection,” Neurocomputing, vol. 221, pp. 60–71, Jan. 2017.
[3] Z. Wang, G. Xu, Z. Wang, and C. Zhu, “Saliency detection integrating both background and foreground information,” Neurocomputing, vol. 216, pp. 468–477, Dec. 2016.
[4] G. Cheng, J. Han, L. Guo, X. Qian, P. Zhou, X. Yao, and X. Hu, “Object detection in remote sensing imagery using a discriminatively trained mixture model,” ISPRS J. Photogramm. Remote Sens., vol. 85, no. 9, pp. 32–43, 2013.
[5] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, “Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 6, pp. 3325–3337, Jun. 2015.
[6] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, “Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 1, pp. 109–113, Jan. 2012.
[7] L. Zhang, L. Zhang, D. Tao, and X. Huang, “Sparse transfer manifold embedding for hyperspectral target detection,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 2, pp. 1030–1043, Feb. 2014.
[8] G. Cheng, P. Zhou, and J. Han, “Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7405–7415, Dec. 2016.
[9] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[10] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, “Deep learning for visual understanding: A review,” Neurocomputing, vol. 187, pp. 27–48, Apr. 2016.
[11] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, “A survey of deep neural network architectures and their applications,” Neurocomputing, vol. 234, pp. 11–26, Apr. 2017.
[12] X. S. Xin, Q. Song, L. Yang, and W. H. Jia, “Recognition of the smart card iconic numbers,” in Proc. Int. Conf. Electron., Inf. Comput. Eng., vol. 44, 2016, Art. no. 02087.
[13] C. Yao and G. Cheng, “Approximative Bayes optimality linear discriminant analysis for Chinese handwriting character recognition,” Neurocomputing, vol. 207, pp. 346–353, Sep. 2016.
[14] W. Wu, J. Liu, and L. Li, “Text recognition in mobile images using perspective correction and text segmentation,” Int. J. Signal Process., Image Process. Pattern Recognit., vol. 9, no. 10, pp. 171–178, 2016.
[15] J.-B. Wang, N. He, L.-L. Zhang, and K. Lu, “Single image dehazing with a physical model and dark channel prior,” Neurocomputing, vol. 149, pp. 718–728, Feb. 2015.
[16] Z.-J. Zhu, Y. Wang, and G.-Y. Jiang, “Unsupervised segmentation of natural images based on statistical modeling,” Neurocomputing, vol. 252, pp. 95–101, Apr. 2017.
[17] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. ICCV, Jan. 1998, vol. 98, no. 1, pp. 839–846.
[18] S. Gupta and S. G. Mazumdar, “Sobel edge detection algorithm,” Int. J. Comput. Sci. Manage. Res., vol. 2, no. 2, pp. 1578–1583, Feb. 2013.
[19] W. Gao, X. Zhang, L. Yang, and H. Liu, “An improved sobel edge detection,” in Proc. 3rd IEEE Int. Conf. Comput. Sci. Inf. Technol. (ICCSIT), vol. 5, Jul. 2010, pp. 67–71.
[20] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[21] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[22] V. N. Vapnik, Statistical Learning Theory, vol. 3. New York, NY, USA: Wiley, 1998.
[23] D. H. Ballard, “Generalizing the Hough transform to detect arbitrary shapes,” Pattern Recognit., vol. 13, no. 2, pp. 111–122, 1981.
[24] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 779–788.

[25] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proc. 26th Annu. Int. Conf. Mach. Learn. (ACM), 2009, pp. 609–616.
[26] H. Qu, S. X. Yang, A. R. Willms, and Z. Yi, “Real-time robot path planning based on a modified pulse-coupled neural network model,” IEEE Trans. Neural Netw., vol. 20, no. 11, pp. 1724–1739, Nov. 2009.
[27] H. Qu, Z. Yi, and H. Tang, “Improving local minima of columnar competitive model for TSPs,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 6, pp. 1353–1362, Jun. 2006.
[28] D. Ciresan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” 2012, arXiv:1202.2745. [Online]. Available: https://arxiv.org/abs/1202.2745
[29] A. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 14–22, Jan. 2012.
[30] G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition,” IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30–42, Jan. 2012.
[31] H. L. H. Li, J. Chen, and Z. Chi, “CNN for saliency detection with low-level feature integration,” Neurocomputing, vol. 226, pp. 212–220, Feb. 2017.
[32] Y. Le Cun, B. Boser, J. S. Denker, R. E. Howard, W. Hubbard, L. D. Jackel, and D. Henderson, “Handwritten digit recognition with a back-propagation network,” in Proc. Adv. Neural Inf. Process. Syst., 1990, pp. 396–404.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[34] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, “Efficient object localization using convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 648–656.
[35] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” 2014, arXiv:1312.6229. [Online]. Available: https://arxiv.org/abs/1312.6229
[36] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[37] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 160–167.
[38] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, “Language modeling with gated convolutional networks,” in Proc. 34th Int. Conf. Mach. Learn., vol. 70, 2017, pp. 933–941.
[39] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” 2018, arXiv:1810.04805. [Online]. Available: https://arxiv.org/abs/1810.04805
[40] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[41] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 2014, arXiv:1409.0473. [Online]. Available: https://arxiv.org/abs/1409.0473
[42] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011, pp. 315–323.

WENYU CHEN received the Ph.D. degree in computer science from the University of Electronic Science and Technology of China, Chengdu, in 2009, where he is currently a Professor with the School of Computer Science and Engineering. His research interests include computing theory, machine intelligence and pattern recognition.

HONG QU received the Ph.D. degree in computer science from the University of Electronic Science and Technology of China, Chengdu, in 2006. From 2014 to 2015, he was a Senior Visiting Scholar with the Humboldt University of Berlin. He is currently a Professor with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interests include artificial intelligence and neural networks.

LI HUANG is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. Her research interests include deep learning and natural language processing.

ZHENG WANG received the bachelor's and Ph.D. degrees from Zhejiang University, China, in 2017 and 2011, respectively. From 2014 to 2015, he was a Visiting Scholar with DLR, German Aerospace Center, funded by CSC. He is currently a Postdoctoral Research Fellow with the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC). His current research interests include cross-media analysis, computer vision, and machine learning.

LIN ZUO received the Ph.D. degree in computer science from the University of Electronic Science and Technology of China, Chengdu, in 2011. From 2009 to 2010, she was a Visiting Predoctoral Fellow with Northwestern University, Evanston, IL, USA. She is currently an Associate Professor with the School of Information and Software Engineering, University of Electronic Science and Technology of China. Her research interests include artificial intelligence and neural networks.

YONG CHEN is currently pursuing the degree with the School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China. His research interests include computer vision and visual SLAM.
