
NUMBER PLATE DETECTION USING CNN

A Project Report

Submitted in partial fulfilment of the


requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY

In
ELECTRONICS AND COMMUNICATION ENGINEERING

By

A. DIVYASRI
B. SRUTHILAYA
KARTHEEK
ANAND PAUL

Under the esteemed guidance of

Smt. B. REVATHI, M.Tech
Asst. Professor, ECE Department

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

S.R.K.R. ENGINEERING COLLEGE


(Affiliated to ANDHRA UNIVERSITY, VISAKHAPATNAM)
(Recognized by A.I.C.T.E., Accredited By NAAC., NEW DELHI)

CHINNA AMIRAM, BHIMAVARAM- 534 204


(2016 – 2020)

S.R.K.R. ENGINEERING COLLEGE


(Affiliated to ANDHRA UNIVERSITY, VISAKHAPATNAM)
(Recognized by A.I.C.T.E., Accredited by N.B.A., NEW DELHI)
CHINNA AMIRAM, BHIMAVARAM - 534 204
(2016 - 2020)

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

BONAFIDE CERTIFICATE
This is to certify that this project work entitled
“NUMBER PLATE DETECTION USING CNN”
is the bonafide work of

Mr/Miss……………………………………………………………………………….
Regd. No. ……………………………. of final year B.Tech along with his/her batch
mates, submitted in partial fulfilment of the requirements for the award of the Degree of
Bachelor of Technology in Electronics and Communication
Engineering during the academic year 2016-2020.
Guide: Head of the Department:

Mrs. B. REVATHI, M.Tech                        Dr. N. UDAY KUMAR
Asst. Professor, ECE Department                Head, Department of ECE


CERTIFICATE OF EXAMINATION

This is to certify that we have examined the report and hereby accord our
approval of it as a project carried out and presented in a manner required for its
acceptance in partial fulfilment of the requirements for the award of the Degree of
BACHELOR OF TECHNOLOGY in ELECTRONICS AND COMMUNICATION
ENGINEERING, for which it has been submitted.

This approval does not necessarily endorse or accept every statement made,
opinion expressed or conclusion drawn as recorded in the project; it only signifies the
acceptance of the report for the purpose for which it is submitted.


ACKNOWLEDGEMENT

Our most sincere and grateful acknowledgement to our alma mater SAGI RAMA

KRISHNAM RAJU ENGINEERING COLLEGE for giving us the opportunity to fulfill our

aspirations and for successful completion of the project.

We are highly indebted to our project guide, Asst. Prof. Smt. B. Revathi, Department of
Electronics and Communication Engineering, for giving valuable and timely suggestions for
the project work.

We convey our sincere thanks to Dr. N. Uday Kumar, Head of the Department of
Electronics and Communication Engineering, for his kind cooperation in the successful
completion of the project work.

We are grateful to our principal for providing us with the necessary facilities to carry out
our project.

We extend our sense of gratitude to all our teaching and non-teaching staff and all our

friends, who indirectly helped us in this endeavour.


DECLARATION

This is to certify that the project entitled “AUTOMATIC NUMBER PLATE
RECOGNITION USING CNN”, submitted by A. DIVYASRI (18B91A0409),
B. SRUTHILAYA (18B91A0421), KARTHEEK (18B91A0457) and
ANAND PAUL (18B91A0410) in partial fulfilment of the requirement for the award of the
degree of B.Tech in Electronics and Communication Engineering of S.R.K.R. Engineering
College, affiliated to JNTU, KAKINADA, comprises only our original work, and due
acknowledgement has been made in the text to all other materials used.

A. DIVYASRI (18B91A0409)

B. SRUTHILAYA (18B91A0421)

KARTHEEK (18B91A0457)

ANAND PAUL (18B91A0410)
CONTENTS

ABSTRACT

Traffic control and vehicle owner identification have become major issues in every country. It
can be difficult to identify the owner of a vehicle that violates traffic laws and drives too fast; as
a result, it can be impossible to apprehend and punish such individuals, because traffic officers
may be unable to retrieve the vehicle number from a moving vehicle due to the vehicle's speed.

Automatic Number Plate Recognition (ANPR) was developed as one of the solutions to this
problem. We have proposed an Optical Character Recognition (OCR) technique using the
ResNet-50 network model, which has higher character-level accuracy rates, with the goal of
addressing current challenges such as working in low and bright light, blurred images, and
detecting fast-moving vehicles. Characters in the number plate are pre-processed and segmented
using morphological operations, and we have achieved a good recognition rate. Initially we
achieved an accuracy of 90%. Because some printed characters are easily confused (like ‘B’ and
‘8’), character recognition is sometimes mismatched; conditional rules are applied to such
characters to get the best possible results. Finally we achieved an accuracy of more than 90%.

CHAPTER 1: INTRODUCTION
1.1 Chapter overview:

Technologies toward smart vehicles, smart cities, and intelligent transportation systems continue
to transform many facets of human life. As a consequence, technologies such as automatic
number plate recognition (ANPR) have become part of our everyday activities. Moreover, the
concept of ANPR is promising to contribute toward various use cases while eliminating human
intervention.

Automatic number plate recognition is a computer vision practice that allows devices to read
license number plates on vehicles quickly and automatically, without any human interaction.
Hence, ANPR is used to capture and identify any number plate accurately through the use of
video or photo footage from cameras.

ANPR is one of the most accurate applications of computer vision systems. Systems for
automated number plate recognition use optical character recognition (OCR) to read vehicle
registration plates. Cameras capture high-speed images of number plates, and software for image
processing is used to detect characters, verify the sequence of those characters, and convert the
number plate image to text.

Typical ANPR systems include a digital image capture unit (camera), a processing unit, and
different algorithms for video analytics. In addition, the use of infrared lighting allows such
systems to capture vehicle registration plates at night, making it possible to operate ANPR at all
hours of the day.

1. Firstly, the ANPR camera captures images that contain a license plate (video stream or
photo).
2. Then, the plate is detected using machine learning and computer vision processes (object
detection).
3. Finally, OCR software is applied to the detected plate area to return the license plate
number in text format. The converted number is usually stored in a database for
integration with other IT systems.

1.2 Motivation:

Like CCTV, automatic number plate recognition systems can provide you with details of when
someone was at your premises, whenever they are required. The images taken by these cameras
can be used as evidence and can provide valuable information for investigations.

1.3 Problem definition:

Given an input image of a vehicle, the system should extract the number plate from the image
and search the database for the recognized plate number. It should recognize number plates even
in low-light or shadowed conditions.
1.4 Work carried out in thesis:

We have learned the basics of Python along with modules such as NumPy, Matplotlib,
TensorFlow, Pillow and cv2 (OpenCV). We have learned basic machine learning models such as
the KNN classifier, linear regression and logistic regression, as well as basic CNN models and
comparisons between different CNN models. We have also learnt different image processing
techniques such as Canny edge detection, the Sobel operator, and vertical and horizontal
projection profiles.

CHAPTER 2: BASICS OF DIGITAL IMAGE PROCESSING

The image of a vehicle whose number plate is to be recognised is taken from a digital camera
and then loaded onto a local computer for further processing. OpenCV (Open Source Computer
Vision) is a library of programming functions mainly aimed at real-time computer vision; in
simple language, it is a library used for image processing, and it is mainly used for all operations
related to images. Python, being a versatile language, is used here as the programming language.
Python and its modules like NumPy, SciPy, Matplotlib and other special modules provide the
functionality needed to cope with the flood of pictures. To enhance the number plate recognition
further, we use a median filter: it eliminates noise while preserving high-frequency content such
as edges, which makes it important for edge detection. Since number plates are generally
rectangular, we need to detect the edges of the rectangular plate. Image processing mainly
involves the following steps:

1. Image acquisition: This is the first step or process of the fundamental steps of digital image
processing. Image Acquisition is the capturing of an image by any physical device (in this case
the primary camera of the computer) so as to take the input as a digital image in the computer.

2. Image Enhancement: Image enhancement is among the simplest and most appealing

areas of digital image processing. Basically, the idea behind enhancement techniques is to bring
out detail that is obscured, or simply to highlight certain features of interest in an image. Such as,
changing brightness & contrast etc. In this step the quality or rather the clarity of the input image
is enhanced and the image is made clear enough to be processed.

3. Morphological Processing: Morphological operations apply a structuring element to an input


image, creating an output image of the same size. The image is converted to a binary image,
making it easier to apply structural extraction and to extract any structure matching a particular
mathematical model, in this case a license plate.

4. Segmentation: Segmentation procedures partition an image into its constituent parts or


objects. In general, autonomous segmentation is one of the most difficult tasks in digital image
processing. A rugged segmentation procedure brings the process a long way toward a successful
solution of imaging problems that require objects to be identified individually.

5. Representation: Representation and description almost always follow the output of a


segmentation stage, which usually is raw pixel data, constituting either the boundary of a region
or all the points in the region itself. Choosing a representation is only part of the solution for
transforming raw data into a form suitable for subsequent computer processing. Description
deals with extracting attributes that result in some quantitative information of interest or are basic
for differentiating one class of objects from another.

6. Recognition: Recognition is the process that assigns a label, such as “Plate”, to an object
based on its descriptors.
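
As an illustration of this pipeline, the following is a minimal OpenCV sketch of the acquisition,
enhancement and edge detection steps described above; the input file name is a placeholder, and
the median-filter size and Canny thresholds are illustrative assumptions:

import cv2

# Acquisition: load the captured image (placeholder file name)
img = cv2.imread('car.jpg')

# Enhancement: convert to grayscale and suppress noise with a median filter
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
denoised = cv2.medianBlur(gray, 5)       # 5x5 median filter

# Edge detection: highlight the rectangular plate boundary
edges = cv2.Canny(denoised, 100, 200)    # thresholds chosen empirically
cv2.imwrite('edges.jpg', edges)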

CHAPTER 3: INTRODUCTION TO OCR

3.1 OPTICAL CHARACTER RECOGNITION

OCR (optical character recognition) is the use of technology to distinguish printed or


handwritten text characters inside digital images of physical documents, such as a scanned paper
document. The basic process of OCR involves examining the text of a document and translating
the characters into code that can be used for data processing. OCR is sometimes also referred to
as text recognition.

OCR systems are made up of a combination of hardware and software that is used to convert
physical documents into machine-readable text. Hardware, such as an optical scanner or
specialized circuit board is used to copy or read text while software typically handles the
advanced processing. Software can also take advantage of artificial intelligence (AI) to
implement more advanced methods of intelligent character recognition (ICR), like identifying
languages or styles of handwriting.
Fig 3.1: The different areas of character recognition

The process of OCR is most commonly used to turn hard copy legal or historic documents into
PDFs. Once placed in this soft copy, users can edit, format and search the document as if it was
created with a word processor.

How optical character recognition works:

The first step of OCR is using a scanner to process the physical form of a document. Once all
pages are copied, OCR software converts the document into a two-color, or black and white,
version. The scanned-in image or bitmap is analyzed for light and dark areas, where the dark
areas are identified as characters that need to be recognized and light areas are identified as
background.

The dark areas are then processed further to find alphabetic letters or numeric digits. OCR
programs can vary in their techniques, but typically involve targeting one character, word or
block of text at a time. Characters are then identified using one of two algorithms:

1. Pattern recognition: OCR programs are fed examples of text in various fonts and formats,
which are then used to compare against, and recognize, characters in the scanned document
(a toy sketch of this approach follows the list).
2. Feature detection: OCR programs apply rules regarding the features of a specific letter or
number to recognize characters in the scanned document. Features could include the
number of angled lines, crossed lines or curves in a character for comparison. For
example, the capital letter “A” may be stored as two diagonal lines that meet with a
horizontal line across the middle.
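
As a toy illustration of the pattern-recognition approach, the sketch below slides a stored
character template over a scanned page using OpenCV's matchTemplate. The file names and the
similarity threshold are illustrative assumptions; real OCR engines compare against many
templates per character, font and size:

import cv2

# Hypothetical inputs: a scanned page and a template image of the letter 'A'
page = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('template_A.png', cv2.IMREAD_GRAYSCALE)

# Score the template against every location on the page
scores = cv2.matchTemplate(page, template, cv2.TM_CCOEFF_NORMED)
_, max_score, _, max_loc = cv2.minMaxLoc(scores)

if max_score > 0.8:                      # empirical similarity threshold
    print("Found an 'A' at", max_loc, "with score", max_score)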

3.2 The History of OCR

The origin of character recognition can actually be traced as far back as 1870. This was the
year that C.R. Carey invented the retina scanner, an image transmission system using a mosaic
of photocells. Further developments are as follows.

• The first attempts of OCR are credited to Tauschek in 1929 and Handel in 1933. In 1929
Gustav Tauschek obtained a patent on OCR in Germany, followed by Handel who obtained a US
patent on OCR in 1933. Tauschek's machine was a mechanical device that used templates and a
photodetector.

• In 1950, David H. Shepard addressed the problem of converting printed messages into
machine language for computer processing and built a machine to do this. Shepard founded
Intelligent Machines Research Corporation (IMR), which went on to deliver several of the
world's first OCR systems used in commercial operation.

• In about 1965, Reader's Digest and RCA collaborated to build an OCR document reader
designed to digitize the serial numbers on Reader's Digest coupons returned from
advertisements. This reader was followed by a specialized document reader installed at TWA,
where the reader processed airline ticket stock. The readers processed documents at a rate of
1,500 documents per minute and checked each document, rejecting those they were not able to
process correctly. The product became part of the RCA product line as a reader designed to
process 'turnaround documents' such as utility and insurance bills returned with payments.

• The United States Postal Service has been using OCR machines to sort mail since 1965 based
on technology devised primarily by the prolific inventor Jacob Rabinow. The first use of OCR in
Europe was by the British General Post Office (GPO). Canada Post has been using OCR systems
since 1971. OCR systems read the name and address of the addressee at the first mechanized
sorting center and print a routing barcode on the envelope based on the postal code. Envelopes
may then be processed with equipment based on simple barcode readers.

• In 1974, Ray Kurzweil started the company Kurzweil Computer Products, Inc. and led
development of the first omni-font optical character recognition system: a computer program
capable of recognizing text printed in any normal font. He created a reading machine for the
blind in 1976. This device required the invention of two enabling technologies: the CCD flatbed
scanner and the text-to-speech synthesizer.

• In 1978, Kurzweil Computer Products began selling a commercial version of the optical
character recognition computer program. LexisNexis was one of the first customers, buying the
program to upload paper legal and news documents onto its nascent online databases. Two
years later, Kurzweil sold his company to Xerox, which had an interest in further
commercializing paper-to-computer text conversion. Kurzweil Computer Products became a
subsidiary of Xerox known as ScanSoft, now Nuance Communications.
CHAPTER 4: IMAGE SEGMENTATION AND EDGE BASED METHOD

4.1 IMAGE SEGMENTATION:

In computer vision, image processing is crucial. The image segmentation
technique divides an image into separate segments based on their feature attributes.
There are different segmentation techniques:

1. Thresholding Method:
Threshold values are identified from the peaks of the image's histogram. This is the
simplest method of image segmentation, and no prior knowledge is required; its limitation is
that it relies strongly on distinct histogram peaks. Binary images can be created via thresholding
(a minimal sketch follows this list of techniques).

2. Edge Based Method:
An edge representation of an image significantly reduces the amount of data that needs to be
processed. The ability to extract the exact edge line with good orientation, based on
discontinuity detection, is the most important attribute of the edge detection technique.

There are different types of Edge detection techniques:


a) Roberts Edge detection:
It measures a 2-D spatial gradient on an image in a straightforward and quick
way.

b) Sobel Edge detection:


It is used to calculate the estimated absolute gradient magnitude at each
location in a grayscale image.

c) Prewitt edge detection:

The Prewitt operator estimates the magnitude and direction of an edge.

d) Kirsch edge detection:

It is regarded as a single mask that may be rotated in eight different compass


directions.

e) Canny edge detection:

Canny is a highly effective approach that removes noise
from the image before finding its edges.

3. Region-based method:

This method is based on splitting an image into homogeneous regions, which makes it
more noise-resistant; it is beneficial when defining similarity criteria is simple. The
disadvantage is that the process is time and memory intensive.

4. Clustering Method:
It is based on homogeneous cluster division. This is primarily used to solve
problems in real time. However, establishing the membership function is difficult in this case. As
a result, we will not always prefer this strategy.

5. Watershed Method:

This method is based on the results of topological interpretation. It is more


stable, detects boundaries, and is continuous. However, the gradient calculation is complicated. It
locates the lines that run along ridge tops.

6. PDE Based Method:

It is based on the working of differential equations. It is the fastest
method of all and is best suited to time-critical applications. The disadvantage is its higher
computational complexity.
7. ANN Based Method :

It is based on simulating the learning process for decision making. The
advantage is that there is no need to write explicit rule-based programs, but a lot of time is
spent on training with images.
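
As a minimal sketch of the thresholding method (technique 1 above), Otsu's algorithm picks the
threshold automatically from the image histogram; the file names here are placeholders:

import cv2

gray = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)

# Otsu's method selects the threshold that best separates the two
# histogram peaks, producing a binary (two-segment) image
thresh_value, binary = cv2.threshold(gray, 0, 255,
                                     cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print('Chosen threshold:', thresh_value)
cv2.imwrite('segmented.png', binary)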

4.2 EDGE BASED METHOD:

Image processing on computers serves two purposes: first, to create visuals that are more
appropriate for people to observe and recognise; second, to enable computers to recognise and
understand images automatically. The most fundamental features of an image are its edges,
which carry a great deal of the image's internal information. As a result, edge detection is one
of the most important research areas in image processing.

- Edges are significant local changes of intensity in an image.


- Edges typically occur on the boundary between two different regions in an image.
• Goal of edge detection
- Produce a line drawing of a scene from an image of that scene.
- Important features can be extracted from the edges of an image (e.g., corners,
lines, curves).
- These features are used by higher-level computer vision algorithms (e.g., recognition).

There are numerous methods for detecting edges (Sobel [1,2], Prewitt [3], Roberts [4],
Canny [5]). These techniques have been presented for identifying image transitions.
Edges define boundaries and are hence a critical issue in image processing. Places with
strong intensity contrasts (a jump in intensity from one pixel to the next) are called
edges. Edge detection minimizes the quantity of data in an image and filters away
unnecessary information while keeping the image's crucial structural qualities. Edge
detection can be done in a variety of ways. The majority of approaches, however, may
be divided into two categories: gradient and Laplacian. The gradient approach finds edges
by looking for maxima and minima in the image's first derivative.

Figure 3.1: Signal with an edge shown by the jump in intensity

If we take the gradient of this signal (which, in one dimension, is just the first derivative with
respect to t) we get the following:
Figure 3.2: Gradient of the above signal

The derivative clearly reveals a maximum located at the centre of the edge in the original
signal. This method of locating an edge is characteristic of the "gradient filter" family of edge
detection filters, which includes the Sobel method. If the gradient value exceeds a certain
threshold, a pixel position is deemed an edge location. Edges have higher pixel intensity values
than the surrounding intensities, so after setting a threshold we can compare the gradient value
to it and detect an edge wherever the threshold is exceeded. Furthermore, the second derivative
is zero when the first derivative is at its maximum, so finding the zeros of the second derivative
provides another way of locating an edge.

Figure 3.3: Second derivative of the signal

The Sobel operator performs a 2-D spatial gradient measurement on an image and
emphasizes regions of high spatial gradient that correspond to edges. Typically, it is used to
find the approximate absolute gradient magnitude at each point in an input grayscale image.
Compared to other edge operators, Sobel has two main advantages:
1. Because of the averaging factor it introduces, it has some smoothing effect on the
random noise of the image.
2. Because it is the differential of two rows or two columns, the edge elements on both
sides are enhanced, so that the edge appears thick and bright.
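
A minimal Sobel sketch, assuming OpenCV and a placeholder input file: the two kernels give
the horizontal and vertical derivatives, and their magnitude highlights the edges:

import cv2
import numpy as np

gray = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)

# First derivatives in x and y (3x3 Sobel kernels)
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude; large values correspond to edges
magnitude = np.sqrt(gx ** 2 + gy ** 2)
edges = np.uint8(np.clip(magnitude, 0, 255))
cv2.imwrite('sobel_edges.png', edges)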

4.3 AN EDGE DETECTION MODEL BASED ON CANNY OPERATOR :

Canny Edge Detection is a popular edge detection algorithm. It was developed by John F.
Canny in 1986. It is a multi-stage algorithm and we will go through each stage.

1. Noise Reduction

Since edge detection is susceptible to noise in the image, the first step is to remove the
noise with a 5x5 Gaussian filter.

2. Finding Intensity Gradient of the Image

The smoothened image is then filtered with a Sobel kernel in both the horizontal and vertical
directions to get the first derivative in the horizontal direction (Gx) and the vertical direction
(Gy). From these two images, we can find the edge gradient and direction for each pixel as
follows:

Edge_Gradient(G) = sqrt(Gx^2 + Gy^2)
Angle(theta) = arctan(Gy / Gx)

The gradient direction is always perpendicular to edges. It is rounded to one of four angles
representing the vertical, horizontal and two diagonal directions.
3. Non-maximum Suppression

After getting the gradient magnitude and direction, a full scan of the image is done to
remove any unwanted pixels which may not constitute an edge. For this, every pixel is
checked to see whether it is a local maximum in its neighborhood in the direction of the
gradient. Check the image below:

Fig 3.4: Non-maximum suppression

Point A is on the edge (in the vertical direction). The gradient direction is normal to the edge.
Points B and C are along the gradient direction. So point A is checked against points B and C
to see if it forms a local maximum. If so, it is considered for the next stage; otherwise, it is
suppressed (set to zero).

In short, the result you get is a binary image with “thin edges”.

4. Hysteresis Thresholding :

This stage decides which of the detected edges are really edges and which are not. For this, we
need two threshold values, minVal and maxVal. Any edges with an intensity gradient greater
than maxVal are sure to be edges, and those below minVal are sure to be non-edges and are
discarded. Those that lie between the two thresholds are classified as edges or non-edges based
on their connectivity: if they are connected to “sure-edge” pixels, they are considered part of an
edge; otherwise, they are discarded. See the image below:
Fig 3.5: Hysteresis thresholding

Edge A is above maxVal, so it is considered a “sure edge”. Although edge C is below maxVal,
it is connected to edge A, so it is also considered a valid edge and we get the full curve. Edge B,
although above minVal and in the same region as edge C, is not connected to any “sure edge”,
so it is discarded. It is therefore very important to select minVal and maxVal appropriately to
get the correct result.

This stage also removes small pixel noises on the assumption that edges are long
lines. So what we finally get is strong edges in the image.
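
In OpenCV, the whole multi-stage pipeline above is exposed as a single call; a minimal sketch,
where the minVal and maxVal thresholds are illustrative and must be tuned per image:

import cv2

gray = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)

# minVal=100, maxVal=200: gradients above 200 are sure edges, those below
# 100 are discarded, and the rest are kept only if connected to sure edges
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite('canny_edges.png', edges)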

4.4 Comparison of Sobel operator and Canny edge detectors

Sobel Operator:

The Sobel operator's greatest advantage is its simplicity. The Sobel method approximates the
gradient magnitude, and the operator can also recognise edges and their orientations; detecting
edges and their orientations is straightforward with this cross operator because of that
approximation. The Sobel approach, on the other hand, has several downsides. It is affected by
noise: as the amount of noise in the image rises, the magnitude of the edges deteriorates, and as
the magnitude of the edges diminishes, the Sobel operator's accuracy suffers. Overall, the Sobel
approach is unable to detect thin and smooth edges accurately.

Canny Edge Detector:

The first benefit is that any noise in the image can be eliminated with a Gaussian filter. The
second benefit is that the signal is enhanced relative to the noise, since non-maximum
suppression produces ridges that are one pixel wide. The third benefit is that the thresholding
method improves edge identification, especially in noisy situations. Adjustable parameters
influence the efficiency of the Canny technique: fine filters are preferred for detecting small,
crisp lines, since they generate less blurring, while large filters are preferred for identifying
larger, smoother edges, at an increased likelihood of blurring. The main downside of the Canny
edge detector is that it takes a long time to compute due to its complexity.

Operator | Strengths | Weaknesses
Sobel | Simple; detects edges and their orientation | Inaccurate and sensitive to noise
Canny | Smoothing effect to remove noise; good localization and response; enhances the signal-to-noise ratio; immune to noisy environments | Difficult to implement with a real-time response; time-consuming

Table 1: Strengths and weaknesses of the two edge detectors.

CHAPTER 5: INTRODUCTION TO NEURAL NETWORKS

The phrase "neural networks" conjures up images of brain activity, of robots that resemble
brains, and perhaps of the science-fiction undertones of the Frankenstein mythos. One of the
main goals of this chapter is to demystify neural networks and demonstrate how, while they do
have something to do with brains, their study also intersects with other fields of science,
engineering, and mathematics. Although some mathematical notation is required to
quantitatively explain particular principles, procedures, and structures, the goal is to do so in as
non-technical a manner as possible. Nonetheless, all symbols and expressions will be explained
as they arise, so that they do not get in the way of the essentials: the thoughts and ideas being
discussed.

5.1 NEURAL NETWORKS:

A neural network is an interconnected assembly of simple processing elements, units or


nodes, whose functionality is loosely based on the animal neuron. The processing ability of the
network is stored in the interunit connection strengths, or weights, obtained by a process of
adaptation to, or learning from, a set of training patterns

To flesh this out a little, we first take a quick look at some basic neurobiology. The human
brain consists of an estimated 10^11 (100 billion) nerve cells or neurons, a highly stylized
example of which is shown in the figure (essential components of a neuron, shown in stylized
form). Neurons communicate via electrical signals that are short-lived impulses or "spikes" in
the voltage of the cell wall or membrane. The interneuron connections are mediated by
electrochemical junctions called synapses, which are located on branches of the cell referred to
as dendrites. Each neuron typically receives many thousands of connections from other neurons
and is therefore constantly receiving a multitude of incoming signals, which eventually reach
the cell body. Here, they are integrated or summed together in some way and, roughly speaking,
if the
resulting signal exceeds some threshold then the neuron will "fire" or generate a voltage impulse
in response. This is then transmitted to other neurons via a branching fiber known as the axon. In
determining whether an impulse should be produced or not, some incoming signals produce an
inhibitory effect and tend to prevent firing, while others are excitatory and promote impulse
generation. The distinctive processing ability of each neuron is then supposed to reside in the
type—excitatory or inhibitory—and strength of its synaptic connections with other neurons.

TYPES OF NEURAL NETWORKS

5.2 Artificial Neural Networks:

An artificial neural network (ANN) is a hardware or software system that mimics the way
neurons in the brain and nervous system work. Artificial neural networks are a type of deep
learning technology that falls under the Artificial Intelligence umbrella.

Deep learning is a subset of machine learning that employs a variety of neural networks. Because
these algorithms are based on how our brains work, many scientists feel they are our best chance
at achieving true AI (Artificial Intelligence).

5.3 Different Types of Neural Networks

Different types of neural networks use different principles in determining their own rules.
There are many types of artificial neural networks, each with unique strengths. The following
subsections describe the main types and their applications.
5.3.1) Feedforward Neural Network – Artificial Neuron

One of the most basic types of artificial neural networks is feedforward. The data in a
feedforward neural network flows via many input nodes before reaching the output node.

To put it another way, data moves in only one direction, from the input tier to the output node.
This is also known as a front-propagating wave, and it is normally accomplished with the help
of a classifying activation function.

There is no backpropagation in this sort of neural network; data moves in one direction only. A
feed-forward neural network may have a single layer or multiple hidden layers.

The sum of the products of the inputs and their weights is determined in a feed-forward neural
network.

Fig 4.2 Feedforward Neural Network – Artificial Neuron


Feedforward neural networks are used in technologies like face recognition and computer
vision. This is because the target classes in these applications are hard to classify.

A simple feedforward neural network is equipped to deal with data which contains a lot
of noise. Feedforward neural networks are also relatively simple to maintain.

5.3.2) Radial Basis Function Neural Network:

A radial basis function considers the distance of any point relative to the center. Such
neural networks have two layers. In the inner layer, the features are combined with the radial
basis function.

Then the output of these features is taken into account when calculating the same output
in the next time-step. Here is a diagram which represents a radial basis function neural network.

The radial basis function neural network is applied extensively in power restoration
systems. In recent decades, power systems have become bigger and more complex.

This increases the risk of a blackout. This neural network is used in the power restoration
systems in order to restore power in the shortest possible time.

5.3.3) Multilayer Perceptron:

A multilayer perceptron has three or more layers. It is used to classify data that cannot be
separated linearly. It is a type of artificial neural network that is fully connected. This is because
every single node in a layer is connected to each node in the following layer.
A multilayer perceptron uses a nonlinear activation function (mainly a hyperbolic tangent
or logistic function). Here’s what a multilayer perceptron looks like

Fig 4.3 Multilayer Perceptron

This type of neural network is applied extensively in speech recognition and machine
translation technologies.

5.3.4) Convolutional Neural Network:

A convolutional neural network (CNN) is a variation of the multilayer perceptron. A CNN may
have one or more convolutional layers, and these layers can be fully connected or pooled.

The convolutional layer performs a convolutional operation on the input before transferring the
result to the next layer. The network can be much deeper but with fewer parameters thanks to
this convolutional process.
Convolutional neural networks excel at image and video recognition, natural language
processing, and recommender systems because of this ability.

Convolutional neural networks excel at semantic parsing and paraphrase identification as well.
They're also used in picture categorization and signal processing.

CNNs are also used in agriculture for picture analysis and recognition where meteorological
features are important.

Fig 4.5 Convolutional Neural Network

(f * g)(i) = Σ_{j=1}^{m} g(j) · f(i − j + m/2)

which is nothing but a dot product of the input function and a kernel function. In the case of
image processing, it is easier to visualize a kernel as sliding over an entire image, changing the
value of each pixel in the process.
Fig 4.6: Kernel matrix
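
The kernel-sliding view can be written directly in NumPy; a minimal sketch for a
single-channel image with no padding and stride 1 (the edge-detecting kernel is just an
example):

import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; each output pixel is the
    dot product of the kernel and the image patch under it."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 3x3 edge-detecting kernel applied to a random "image"
kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])
print(convolve2d(np.random.rand(6, 6), kernel).shape)   # (4, 4)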

Pooling: Pooling is a sample-based discretization process. The objective is to down-sample an
input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and
allowing assumptions to be made about the features contained in the binned sub-regions.

Fig 4.7: Pooling
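
A minimal sketch of 2x2 max pooling in NumPy, halving each spatial dimension (this assumes
the dimensions are divisible by 2):

import numpy as np

def max_pool_2x2(feature_map):
    """Down-sample by taking the maximum of each 2x2 sub-region."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.arange(16).reshape(4, 4)
print(max_pool_2x2(fm))    # [[ 5  7] [13 15]]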


Fully connected: Fully connected layers connect every neuron in one layer to every neuron in
another layer. It is in principle the same as the traditional multi-layer perceptron neural network
(MLP). The flattened matrix goes through a fully connected layer to classify the images.

Receptive field: In neural networks, each neuron receives input from some number of
locations in the previous layer. In a fully connected layer, each neuron receives input from every
element of the previous layer. In a convolutional layer, neurons receive input from only a
restricted subarea of the previous layer. Typically, the subarea is of a square shape (e.g., size 5
by 5). The input area of a neuron is called its receptive field. So, in a fully connected layer, the
receptive field is the entire previous layer. In a convolutional layer, the receptive area is smaller
than the entire previous layer.

Weights: Each neuron in a neural network computes an output value by applying a


specific function to the input values coming from the receptive field in the previous layer. The
function that is applied to the input values is determined by a vector of weights and a bias
(typically real numbers). Learning, in a neural network, progresses by making iterative
adjustments to these biases and weights.

The vector of weights and the bias are called filters and represent particular features of
the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can
share the same filter. This reduces memory footprint because a single bias and a single vector of
weights are used across all receptive fields sharing that filter, as opposed to each receptive field
having its own bias and vector weighting.

Distinguishing features:

3D volumes of neurons: The layers of a CNN have neurons arranged in 3 dimensions:


width, height and depth where each neuron inside a convolutional layer is connected to only a
small region of the layer before it, called a receptive field. Distinct types of layers, both locally
and completely connected, are stacked to form a CNN architecture.

Local connectivity: following the concept of receptive fields, CNNs exploit spatial locality
by enforcing a local connectivity pattern between neurons of adjacent layers. The architecture
thus ensures that the learned "filters" produce the strongest response to a spatially local input
pattern. Stacking many such layers leads to non-linear filters that become increasingly global
(i.e. responsive to a larger region of pixel space) so that the network first creates representations
of small parts of the input, then from them assembles representations of larger areas.

Shared weights: In CNNs, each filter is replicated across the entire visual field. These
replicated units share the same parameterization (weight vector and bias) and form a feature
map. This means that all the neurons in a given convolutional layer respond to the same feature
within their specific response field. Replicating units in this way allows for the resulting feature
map to be equivariant under changes in the locations of input features in the visual field, i.e. they
grant translational equivariance.

Pooling: In a CNN's pooling layers, feature maps are divided into rectangular sub-regions, and
the features in each rectangle are independently down-sampled to a single value, commonly by
taking their average or maximum value. In addition to reducing the sizes of feature maps, the
pooling operation grants a degree of translational invariance to the features contained therein,
allowing the CNN to be more robust to variations in their positions.

Together, these properties allow CNNs to achieve better generalization on vision


problems. Weight sharing dramatically reduces the number of free parameters learned, thus
lowering the memory requirements for running the network and allowing the training of larger,
more powerful networks.

Spatial arrangement: Three hyperparameters control the size of the output volume of the
convolutional layer: the depth, stride and zero-padding.

● The Depth of the output volume controls the number of neurons in a layer that connect to
the same region of the input volume. These neurons learn to activate different features in
the input. For example, if the first convolutional layer takes the raw image as input, then
different neurons along the depth dimension may activate in the presence of various
oriented edges, or blobs of color.
● Stride controls how depth columns around the spatial dimensions (width and height) are
allocated. When the stride is 1 then we move the filters one pixel at a time. This leads to
heavily overlapping receptive fields between the columns, and also to large output
volumes. When the stride is 2 then the filters jump 2 pixels at a time as they slide around.
Similarly, for any S > 0 integer a stride of S causes the filter to be translated by S units at
a time per output. In practice, stride lengths of S >= 3 are rare. The receptive fields
overlap less and the resulting output volume has smaller spatial dimensions when stride
length is increased.
● Sometimes it is convenient to pad the input with zeros on the border of the input volume.
The size of this padding is a third hyper parameter. Padding provides control of the
output volume and spatial size. In particular, sometimes it is desirable to exactly preserve
the spatial size of the input volume.
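
With these hyperparameters, the spatial size of the output volume can be computed directly: for
an input of width W, filter size F, zero-padding P and stride S, the output width is
((W - F + 2P) / S) + 1, rounding the division down. For example, a 224-pixel-wide input with a
7x7 filter, stride 2 and padding 3 (the first layer of ResNet-50) gives ((224 - 7 + 6) / 2) + 1 = 112.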

5.3.5) Modular Neural Network

A modular neural network has a number of different networks that function


independently and perform sub-tasks. The different networks do not really interact with or signal
each other during the computation process. They work independently towards achieving the
output.

As a result, a large and complex computational process can be done significantly faster
by breaking it down into independent components. The computation speed increases because the
networks are not interacting with or even connected to each other. Here’s a visual representation
of a Modular Neural Network.
Fig 4.8 Modular Neural Network

5.3.6) Sequence-to-Sequence Models:

A sequence to sequence model consists of two recurrent neural networks. There’s an


encoder that processes the input and a decoder that processes the output. The encoder and
decoder can either use the same or different parameters. This model is particularly applicable in
those cases where the length of the input data is not the same as the length of the output data.

Sequence-to-sequence models are applied mainly in chatbots, machine translation, and


question answering systems.

CHAPTER 6 : ADVANTAGES AND DISADVANTAGES

ADVANTAGES:

● Added security
ANPR largely acts as a deterrent. The knowledge that their number plate is being recorded and
checked is usually enough to stop criminal behavior in advance. ANPR is also useful for the
police, who can browse the data collected and check for suspicious vehicles, or vehicles that
were involved in a crime. Because the data is stored for a short while, ANPR records can provide
both alibis and incriminating evidence. ANPR also provides security on a lower level, such as
open workplace parking where it can manage permit parking for staff vehicles, or recognise a
vehicle that has previously been banned from your premises. ANPR offers an extra measure of
security for both public and private use.
● Automated service
ANPR cameras are an efficient and cost-effective way to monitor parking solutions. In car parks,
they negate the need for parking wardens. Thanks to their high-accuracy readings and 24/7
operation, they are more efficient than most individuals and therefore provide a more dependable
service. They also offer a confrontation-free parking solution, which some have found to be
beneficial when delivering fines to drivers. Parking management teams often find that both
traffic personnel and ANPR cameras work well together, especially in traffic and parking
enforcement, where staff can rely on ANPR to provide the necessary information, minimizing
the time they spend on the streets.

● Real-time benefits
ANPR is beneficial to many industries thanks to the real-time imaging it offers. Historically,
number plate recording would take time, and then longer still to send out penalty notices to those
who violate traffic laws. With ANPR however, number plates can be recognised and checked
against the database almost instantaneously. From this, it takes as little as 48 hours to issue a
penalty notice. The fast nature of these cameras allows for an immediate response to criminal
activity, making sure no unwanted behavior goes unchecked.

● Cost-effective

As well as being easier and more efficient, ANPR technology is also one of the most cost-
effective solutions for managing your car park. You will be able to cut costs and reduce the need
for security personnel when you choose this smart solution. Many companies will also issue
fines to anyone picked up by their ANPR system that shouldn’t be on their private property or
anyone that has exceeded the maximum time limit. This can bring in extra money for the
company and may even end up paying for this security solution.
DISADVANTAGES:

● Privacy Concerns
Using ANPR cameras raises privacy concerns for many people, who dislike the idea of their data
being stored for months. There is the concern that storing information could lead to data leaks
and theft, or misuse of their personal information. People also dislike the idea of their
whereabouts being known at all times. However, ANPR is not considered an infringement on an
individual’s privacy, and the data is always stored securely and should only be accessed for good
reason by a senior official.

● Extreme circumstances
While a great addition to a car park, ANPR is not a fool-proof method. ANPR cameras may
struggle to work in adverse weather conditions, such as heavy rain, or snow, where the number
plate is obscured or distorted. These cameras also rely on sensible driving from cars. For
example, if, when leaving the car park, you were too close to the car in front and your number
plate was obscured, the cameras may not recognise that you had left, and could end up
overcharging you. Not only this, some ANPR cameras are not advanced enough to recognise
number plates that vary from the standard, such as vanity or foreign plates. In these situations, it
would be useful to mix both personnel and automated systems.

● Human behavior
A disadvantage of ANPR parking systems is that they rarely take into account human error and
behavior. ANPR systems do not usually consider giving a grace period when you enter a car
park. This means that those drivers who enter the car park and don’t find a space can be charged,
as the camera saw them enter and leave, but can find no matching ticket. Similarly, a mistyped
ticket at a ticket machine, for example, using the letter ‘O’ instead of a zero, can result in a fine,
as the system cannot find a matching ticket to the number plate the cameras read.

CHAPTER 7: SYSTEM MODEL


Input image → Grayscale conversion → Number plate detection → Character-level
segmentation → CNN-based training & testing phase → Recognized text

Fig 7.1 SYSTEM MODEL

7.1 PREPROCESSING:
● The aim of pre-processing is an improvement of the image data that suppresses
unwanted distortions or enhances some image features important for further
processing.
● Pre-processing includes the following stages

Binarization.

Skew removal.

Noise removal.

Processed image.

7.1.1 Binarization:

Binarization is an active research area in the field of Document Image Processing.


Binarization converts grey-level images into binarized images. Document image binarization is
the most important step in pre-processing scanned documents in order to preserve all or most
subcomponents such as text, background and image. Binarization computes the threshold value
that differentiates object pixels from background pixels. Colour and grey-level image processing
consume a lot of computational power, whereas binarized images decrease the computational
load and increase the efficiency of a system. Binarization has many applications, such as
medical image processing, document image analysis and face recognition. Binarization methods
can be classified into two categories, global and adaptive. Global methods find a single
threshold value for the entire image, while adaptive methods use local information obtained
around each candidate pixel to calculate a threshold value for every pixel. If the illumination of
the input image is not uniform, local methods may perform better; if the image is evenly
illuminated, global methods can work better. But global methods cannot handle image
degradations and are not able to remove noise, while local methods are significantly more
time-consuming and computationally expensive. Fast and accurate algorithms are therefore
necessary for document image binarization.
The following commands achieve binarization:

# Convert the original image to greyscale, then binarize with Otsu's threshold
greyscale_image = cv2.cvtColor(orig_img, cv2.COLOR_BGR2GRAY)
_, binary_image = cv2.threshold(greyscale_image, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)

where orig_img is the original image.

7.1.2 Skew removal:

The image obtained after scanning an open book page usually suffers from various scanning
artefacts. One major artefact is skew. This defect reduces the quality of the scanned images and
causes many problems for document image analysis; it is difficult for an Optical Character
Recognizer (OCR) to understand such documents. Some effective methods exist to rectify this
error.
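
One common rectification estimates the rotation angle from the minimum-area rectangle around
the text pixels and rotates the image back. A minimal sketch, assuming a binarized image whose
text pixels are white; note that the angle convention of minAreaRect differs across OpenCV
versions, so the normalization step may need adjusting:

import cv2
import numpy as np

def deskew(binary):
    # Coordinates of all text (non-zero) pixels
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    # Angle of the tilted minimum-area rectangle around them
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    # Rotate the image about its centre to cancel the skew
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)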

7.1.3 Noise removal:

Images are often degraded by noise, which can be introduced during image capture,
transmission, and so on. Noise removal is an important task in image processing, and in general
the result of noise removal has a strong influence on the quality of subsequent image processing
techniques. Several techniques for noise removal are well established in colour image
processing. The nature of the noise removal problem depends on the type of noise corrupting
the image. In the field of image noise reduction, several linear and nonlinear filtering methods
have been proposed. Linear filters are not able to effectively eliminate impulse noise, as they
have a tendency to blur the edges of an image; nonlinear filters, on the other hand, are suited to
dealing with impulse noise. Several nonlinear filters based on classical and fuzzy techniques
have emerged in the past few years. For example, most classical filters that remove noise
simultaneously blur the edges, while fuzzy filters can combine edge preservation and smoothing.
Compared to other nonlinear techniques, fuzzy filters are also able to represent knowledge in a
comprehensible way.

7.1.4 Processed image:


A processed image is enhanced over the original image if it allows the observer to better
perceive the desirable information in the image.

Fig 7.2 Processed Image

7.2 Number Plate Detection:


Taking the image as input, we apply a ‘Haar cascade’ that is pre-trained to detect Indian license
plates; the scale-factor parameter specifies how much the input image is scaled between
detection passes for better detection of the number plate.

The Haar cascade is used for its line and edge detection features, and it is fast at computing
these features thanks to the use of integral images.

● Read the .xml cascade file. Convert the image to grayscale and find contours to detect
the number plate in the image, constraining its width and height
● Read the input image file and mark the number plate
● Convert the image with the detected number plate to grayscale again
● Perform a morphological transform on the image
● Perform edge detection, then crop the image so as to focus on the number plate (a
minimal detection sketch follows the figures below)

Fig 7.4: Creating the bounding box. Fig 7.5: Detected number plate
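
A minimal sketch of this detection stage, assuming OpenCV; the cascade .xml file name is a
placeholder for the pre-trained Indian-plate cascade used in the project, and the scaleFactor and
minNeighbors values are illustrative:

import cv2

# Load the pre-trained Haar cascade (placeholder file name)
plate_cascade = cv2.CascadeClassifier('indian_license_plate.xml')

img = cv2.imread('car.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor rescales the image between detection passes; minNeighbors
# filters out candidate boxes with too few overlapping detections
plates = plate_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

for (x, y, w, h) in plates:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # bounding box
    plate_crop = gray[y:y + h, x:x + w]                         # crop for OCR
cv2.imwrite('detected.jpg', img)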


7.3 Character segmentation:
Finally, this stage segments the word image into characters using the gaps between two
characters. We have implemented a vertical projection profile in order to extract each character
image from a word image.

Step 1: First, find the pixel intensities of the image along its width, add each to the running
total of the previous values, and place the results in a list (a cumulative column profile).

Step 2: Now split the list wherever the same value occurs twice in a row (if there is no character
at a position, the pixel intensity there is 0, so adding it to the previous value gives the same
value).

Step 3: Subtract each value in the list from the next one to obtain the character widths.

Step 4: We now have a list of cleanly segmented character boundaries (a minimal sketch of this
procedure follows).
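
A minimal sketch of this projection-profile segmentation, assuming a binary word image whose
character pixels are non-zero. It is a simplified version of the steps above: columns with zero
intensity are gaps, and transitions between gap and non-gap columns mark character boundaries:

import numpy as np

def segment_characters(binary_word):
    # Step 1: column-wise pixel intensities (vertical projection profile)
    profile = binary_word.sum(axis=0)
    in_char = profile > 0
    # Steps 2-3: boundaries where a gap turns into a character or back
    boundaries = [col for col in range(1, len(in_char))
                  if in_char[col] != in_char[col - 1]]
    # Step 4: consecutive pairs of boundaries delimit one character each
    return boundaries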

Fig 7.6: Word segmentation


Neural Network Description:

Fig 7.7 Resnet-50 Model

Our neural network consists of a convolutional neural network that is used for feature
extraction. The output of the convolutional network is then connected to the ResNet network.

The ResNet network uses filters of different sizes to extract features at different levels of the
image. Our network uses four convolutional filters of different sizes (1x1 and 3x3). The outputs
of the filters are concatenated by a concatenation layer, and the most important features are
extracted using the max-pool layer.
The creation of the different layers of the neural network can be achieved by using the
following commands in Python:
# ResNet first layer: 7x7 convolution with 64 filters (TF1-style API)
W_conv1 = tf.compat.v1.get_variable(
    'W_conv1', shape=[7, 7, 1, 64],
    initializer=tf.compat.v1.variance_scaling_initializer())
b_conv1 = tf.Variable(tf.constant(0.1, shape=[64]), name='b_conv1')

# Convolution + ReLU, followed by 3x3 max-pool subsampling
conv1 = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1],
                     padding='SAME', name='conv1')
h_conv1 = tf.nn.relu(conv1 + b_conv1, name='h_conv1')
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='h_pool1')

Training and Testing:

First, the images are converted to greyscale, which removes unwanted detail, and then fed into
our neural network. The dataset contains more than 2,000 samples; the entire dataset is divided
into training and testing parts, with 80% of the dataset going to training and 20% of the data to
testing. We trained with around 2,000 images, 20,000 training steps and a batch size of 64, and
finally achieved an accuracy of more than 90%.
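
A minimal sketch of this 80/20 split, assuming the samples are already loaded as NumPy arrays;
scikit-learn's train_test_split is one common way to do it, and the placeholder data below stands
in for the real character images and labels:

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder dataset: ~2000 grayscale character images and their labels
images = np.random.rand(2000, 32, 32)
labels = np.random.randint(0, 36, size=2000)   # 26 letters + 10 digits

# 80% of the samples for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))   # 1600 400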
CHAPTER 8: RESULTS

Fig 8.1: Input image    Fig 8.2: Output


Fig 8.3 Training Graph

The X-axis represents the training steps and the Y-axis represents the accuracy. The graph starts
at an accuracy of 25%, increases as the number of batches increases, and saturates at around
20,000 training steps. Continued training after this point leads to overfitting and decreases the
test accuracy of the model, so the highest accuracy is obtained at this saturation point.

CHAPTER 9: CONCLUSION

In conclusion, it is difficult to retrieve the vehicle number from a moving vehicle due to the
vehicle's speed. We have proposed an Optical Character Recognition (OCR) technique using the
ResNet-50 network model, which has higher character-level accuracy rates. The time required
for training the model is less for ResNet than for other models. The Haar cascade algorithm
efficiently performs the detection of the number plate in an input image, and the segmentation
analysis efficiently performs character segmentation of the number plate. Initially we achieved
an accuracy of 90%. Because some printed characters are easily confused (like ‘B’ and ‘8’),
character recognition is sometimes mismatched; conditional rules are applied to such characters
to get the best possible results. Finally we achieved an accuracy of more than 90%.


CHAPTER 10: REFERENCES

[1] V. Gnanaprakash, N. Kanthimathi, N. Saranya, “Automatic number plate recognition using
deep learning,” IOP Conference Series: Materials Science and Engineering.
[2] I.V. Pustokhina, D.A. Pustokhin, J.J.P.C. Rodrigues, D. Gupta, A. Khanna, K. Shankar,
C. Seo, G.P. Joshi, “Automatic Vehicle License Plate Recognition Using Optimal K-Means
with Convolutional Neural Network for Intelligent Transportation Systems,” IEEE Access, 8,
92907–92917.
[3] G. Kumar, A. Barman, M. Pal, “License Plate Tracking using Gradient based
Segmentation,” IEEE Region 10 Annual International Conference, Proceedings/TENCON,
1737–1740, 2019, doi:10.1109/TENCON.2019.8929688.
[4] A. Kashyap, B. Suresh, A. Patil, S. Sharma, A. Jaiswal, “Automatic Number Plate
Recognition,” Proceedings - IEEE 2018 International Conference on Advances in Computing,
Communication Control and Networking, ICACCCN 2018, 838–843, 2018.
[5] M. Mondal, P. Mondal, N. Saha, P. Chattopadhyay, “Automatic number plate recognition
using CNN based self-synthesized feature learning,” 2017 IEEE Calcutta Conference,
CALCON 2017 - Proceedings, 378–381, 2018.
CHAPTER 11: SOURCE CODE
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2

sys.path.append('../src')
from ocr.normalization import word_normalization, letter_normalization
from ocr import page, words
from ocr.helpers import implt, resize
from ocr.tfhelpers import Model
from ocr.newdatahelpers import idx2char
from ocr.segment2 import segment

%matplotlib inline
plt.rcParams['figure.figsize'] = (15.0, 10.0)

# Paths to the input image and the trained character classifier
IMG = '../data/pages/v2.jpeg'
LANG = 'en_hw'
MODEL_LOC_CHARS = '../models/char-clas/' + LANG + '/CharClassifier'
CHARACTER_MODEL = Model(MODEL_LOC_CHARS)

# Load the input image (converted from OpenCV's BGR to RGB for display)
image = cv2.cvtColor(cv2.imread(IMG), cv2.COLOR_BGR2RGB)
implt(image)

# Detect and crop the page region, then locate and sort word bounding boxes
crop = page.detection(image)
implt(crop)
boxes = words.detection(crop)
lines = words.sort_words(boxes)


def recognise(img):
    """Recognition using the character model."""
    # Pre-process the word image (normalize height to 60 px, correct tilt)
    img = word_normalization(
        img,
        60,
        border=False,
        tilt=True,
        hyst_norm=True)

    # Pad the word image on the left and right before separating letters
    img = cv2.copyMakeBorder(
        img,
        0, 0, 30, 30,
        cv2.BORDER_CONSTANT,
        value=[0, 0, 0])

    # Find the gaps between characters (projection-based segmentation)
    gaps = segment(img)

    chars = []
    for i in range(len(gaps) - 1):
        # Crop one character between consecutive gaps and normalize it
        char = img[:, gaps[i]:gaps[i + 1]]
        implt(char)
        char, dim = letter_normalization(char, is_thresh=True, dim=True)
        # TODO Test different values
        if dim[0] > 4 and dim[1] > 4:
            chars.append(char.flatten())

    chars = np.array(chars)
    word = ''
    if len(chars) != 0:
        # Classify all character images in one batch, then map the
        # predicted class indices back to characters
        pred = CHARACTER_MODEL.run(chars)
        for c in pred:
            word += idx2char(c)
    return word


implt(crop)
for line in lines:
    print(" ".join([recognise(crop[y1:y2, x1:x2]) for (x1, y1, x2, y2) in line]))
