Application Example: Photo OCR Problem Description and Pipeline
Photo OCR
Problem description and pipeline
Machine Learning
The Photo OCR problem
Andrew Ng
Photo OCR pipeline
1. Text detection
2. Character segmentation
3. Character classification
A N T
Photo OCR pipeline
Image → Text detection → Character segmentation → Character recognition
Application example:
Photo OCR
Sliding windows
Text detection and pedestrian detection
Supervised learning for pedestrian detection
x = pixels in 82x36 image patches
Sliding window detection
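The sliding-window idea can be sketched roughly as follows. This is a minimal illustration, not the lecture's implementation: the function names, the step size of 8 pixels, and the `mean_brightness` stand-in for a trained classifier are all assumptions made so the snippet runs on its own. Only the 82x36 patch size comes from the slides.

```python
import numpy as np

def sliding_windows(image, window=(82, 36), step=8):
    """Yield (row, col, patch) for every window position in a 2-D grayscale image."""
    win_h, win_w = window
    img_h, img_w = image.shape
    for row in range(0, img_h - win_h + 1, step):
        for col in range(0, img_w - win_w + 1, step):
            yield row, col, image[row:row + win_h, col:col + win_w]

def detect(image, classify, window=(82, 36), step=8, threshold=0.5):
    """Return the top-left corners of windows whose classifier score exceeds threshold."""
    hits = []
    for row, col, patch in sliding_windows(image, window, step):
        if classify(patch) > threshold:
            hits.append((row, col))
    return hits

def mean_brightness(patch):
    # Hypothetical placeholder for a trained classifier's score.
    return float(patch.mean())

# Toy image: one bright 82x36 rectangle on a dark background.
image = np.zeros((200, 100))
image[40:122, 16:52] = 1.0
hits = detect(image, mean_brightness, threshold=0.99)
print(hits)
```

A smaller step scans more window positions and costs more classifier evaluations; a larger step is cheaper but can skip over an object.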
Text detection
1. Text detection
2. Character segmentation
3. Character classification
A N T
Application example:
Photo OCR
Getting lots of data: Artificial data synthesis
Character recognition
A N T
I Q A
Artificial data synthesis for photo OCR
Abcdefg (sample text rendered in several different fonts)
Real data
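One way to picture synthetic character data is to paste a character shape onto a random background crop. The sketch below is only an assumption-laden illustration: the slide uses real fonts, whereas here a hand-made bitmap `glyph_T` stands in for a rendered character, and `synthesize_example`, the shift range, and the intensity ranges are hypothetical choices.

```python
import numpy as np

def synthesize_example(glyph_mask, background, rng, max_shift=2):
    """Paste a glyph (boolean mask) onto a random background crop at a random offset."""
    h, w = glyph_mask.shape
    out_h, out_w = h + 2 * max_shift, w + 2 * max_shift
    # Pick a random crop of the background, slightly larger than the glyph.
    top = rng.integers(0, background.shape[0] - out_h + 1)
    left = rng.integers(0, background.shape[1] - out_w + 1)
    patch = background[top:top + out_h, left:left + out_w].copy()
    # Shift the glyph by a few pixels and draw it in a random dark intensity.
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    ink = rng.uniform(0.0, 0.3)
    region = patch[dy:dy + h, dx:dx + w]   # view into patch, so writes modify patch
    region[glyph_mask] = ink
    return patch

rng = np.random.default_rng(0)

# Hypothetical 7x5 bitmap of the letter "T" (stand-in for a font rendering).
glyph_T = np.zeros((7, 5), dtype=bool)
glyph_T[0, :] = True
glyph_T[:, 2] = True

# Stand-in for a light textured background image, values in [0.5, 1.0).
background = rng.uniform(0.5, 1.0, size=(64, 64))

examples = [synthesize_example(glyph_T, background, rng) for _ in range(5)]
labels = ["T"] * len(examples)
```

Each call yields a differently positioned character on a different background crop, with the label known for free because the character was chosen by the generator.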
Synthesizing data by introducing distortions
Original audio: [www.pdsounds.org]
Distortion introduced should be representative of the type of
noise/distortions in the test set.
Audio: background noise, bad cellphone connection
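As a rough sketch of the audio case, a clean clip can be mixed with a background recording at a chosen signal-to-noise ratio. The function name `add_background_noise`, the SNR parameter, and the synthetic sine waves standing in for a clean recording and for machinery hum are all assumptions for illustration; in practice the noise would be a real recording of the kind expected in the test set.

```python
import numpy as np

def add_background_noise(clean, noise, snr_db):
    """Mix noise into clean audio at the requested signal-to-noise ratio (in dB).

    Assumes noise is at least as long as clean.
    """
    noise = noise[:len(clean)]
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10 ** (snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

t = np.linspace(0.0, 1.0, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)        # stand-in for a clean recording
hum = 0.5 * np.sin(2 * np.pi * 60 * t)     # stand-in for recorded machinery noise
noisy = add_background_noise(clean, hum, snr_db=10)
```

Each (clean clip, background recording, SNR) combination gives a new labeled example, since the transcript of the clean clip still applies.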
Usually does not help to add purely random/meaningless noise
to your data.
x_i = intensity (brightness) of pixel i
x_i ← x_i + random noise
[Adam Coates and Tao Wang]
Discussion on getting more data
1. Make sure you have a low-bias classifier before expending the
effort (plot learning curves). E.g. keep increasing the number
of features or the number of hidden units in a neural network until
you have a low-bias classifier.
2. “How much work would it be to get 10x as much data as we
currently have?”
- Artificial data synthesis
- Collect/label it yourself
- “Crowd source” (E.g. Amazon Mechanical Turk)
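The learning-curve check in point 1 can be sketched as below. This is a simplified illustration under stated assumptions: it uses least-squares regression on synthetic data instead of a classifier, and `learning_curve` and the data-generating setup are hypothetical. The point is only the diagnostic pattern: if training and cross-validation error are both high, the model has high bias and more data is unlikely to help until features are added.

```python
import numpy as np

def learning_curve(X_train, y_train, X_cv, y_cv, sizes):
    """Return (train_errors, cv_errors) for least-squares fits on growing training subsets."""
    train_err, cv_err = [], []
    for m in sizes:
        Xm, ym = X_train[:m], y_train[:m]
        theta, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
        train_err.append(np.mean((Xm @ theta - ym) ** 2) / 2)
        cv_err.append(np.mean((X_cv @ theta - y_cv) ** 2) / 2)
    return train_err, cv_err

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=400)
y = x ** 2 + rng.normal(scale=0.1, size=x.size)    # quadratic target with small noise
sizes = [10, 50, 100, 300]

# Straight-line model: too few features, so both errors stay high (high bias).
X_line = np.column_stack([np.ones_like(x), x])
line_train, line_cv = learning_curve(X_line[:300], y[:300], X_line[300:], y[300:], sizes)

# Adding the x^2 feature removes the bias; both errors drop close to the noise level.
X_quad = np.column_stack([X_line, x ** 2])
quad_train, quad_cv = learning_curve(X_quad[:300], y[:300], X_quad[300:], y[300:], sizes)

for m, tr, cv in zip(sizes, line_train, line_cv):
    print(f"line model, m={m}: train={tr:.3f} cv={cv:.3f}")
```

Only once the curves look like the second model's (low training error, with a gap to cross-validation error that shrinks as m grows) is it worth asking how much effort 10x more data would take.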
Application example:
Photo OCR
Ceiling analysis: What part of the pipeline to work on next
Estimating the errors due to each component (ceiling analysis)
Image → Text detection → Character segmentation → Character recognition
What part of the pipeline should you spend the most time
trying to improve?
Component                 Accuracy
Overall system            72%
Text detection            89%
Character segmentation    90%
Character recognition     100%
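The arithmetic behind the table can be sketched as follows, assuming the usual reading of a ceiling analysis: each row gives overall system accuracy once that component, and every component before it, is replaced by ground-truth output. The accuracy figures come from the table; the function name `ceiling_gains` is a hypothetical helper.

```python
# Overall accuracy as each stage, in pipeline order, is given perfect output.
stages = [
    ("Overall system", 72),
    ("Text detection", 89),
    ("Character segmentation", 90),
    ("Character recognition", 100),
]

def ceiling_gains(stages):
    """Map each component to the accuracy gained by making it perfect."""
    gains = {}
    for (_, prev_acc), (name, acc) in zip(stages, stages[1:]):
        gains[name] = acc - prev_acc
    return gains

gains = ceiling_gains(stages)
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
print("Largest potential gain:", best)
```

Under this reading, perfect text detection is worth 17 points, perfect character segmentation only 1, and perfect character recognition 10, so text detection is the most promising component to work on.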
Another ceiling analysis example
Face recognition from images
(Artificial example)
Pipeline components: Camera image, Preprocess (remove background), Eyes segmentation, Mouth segmentation