
Application Example: Photo OCR Problem Description and Pipeline

The document discusses photo optical character recognition (OCR) and methods to improve machine learning models for photo OCR tasks. It describes the photo OCR pipeline, which includes text detection, character segmentation, and character classification. It then discusses using artificial data synthesis and introducing distortions to generate additional training data when the amount of real data is limited. Finally, it provides examples of using ceiling analysis to determine which part of the machine learning pipeline should be focused on for improvement.

Application example: Photo OCR
Problem description and pipeline

Machine Learning
The Photo OCR problem

Andrew Ng
Photo OCR pipeline
1. Text detection

2. Character segmentation

3. Character classification
A N T

Photo OCR pipeline

Image → Text detection → Character segmentation → Character recognition
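
The pipeline can be read as three functions chained together. The following is a minimal sketch under stated assumptions, not the lecture's implementation: `run_photo_ocr`, the `(row, col, height, width)` region convention, and the three stub stages are all hypothetical names introduced for illustration.

```python
def run_photo_ocr(image, detect_text, segment_chars, classify_char):
    """Chain the stages: text regions -> character patches -> recognized letters."""
    words = []
    for region in detect_text(image):
        patches = segment_chars(image, region)
        words.append("".join(classify_char(p) for p in patches))
    return words

# Stub stages for illustration only; real stages would be learned models.
def fake_detect(image):
    return [(0, 0, 1, 3)]            # one region: (row, col, height, width)

def fake_segment(image, region):
    row, col, _height, width = region
    return [image[row][col + i] for i in range(width)]

def fake_classify(patch):
    return patch.upper()

print(run_photo_ocr([["a", "n", "t"]], fake_detect, fake_segment, fake_classify))
```

Passing the stages in as arguments keeps each one replaceable, which is also what makes it possible to swap in ground-truth output for a single stage later on.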

Application example: Photo OCR
Sliding windows

Text detection Pedestrian detection

Supervised learning for pedestrian detection
x = pixels in 82x36 image patches

Positive examples Negative examples


Sliding window detection
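
A sliding-window detector can be sketched as a loop over patch positions. This is an illustrative sketch, not the lecture's code: the 82x36 window comes from the pedestrian slide, while the stride of 4 pixels, the function names, and the `classifier` callable (assumed to return 1 for a detection) are hypothetical. A fuller detector would typically repeat the scan with larger windows rescaled to the classifier's input size, which is omitted here.

```python
def sliding_windows(img_h, img_w, win_h=82, win_w=36, stride=4):
    """Yield the (top, left) corner of every win_h x win_w window that fits."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield top, left

def detect(image, classifier, win_h=82, win_w=36, stride=4):
    """Run the classifier on each patch; return corners where it predicts 1."""
    hits = []
    for top, left in sliding_windows(len(image), len(image[0]), win_h, win_w, stride):
        patch = [row[left:left + win_w] for row in image[top:top + win_h]]
        if classifier(patch) == 1:
            hits.append((top, left))
    return hits

# Usage with a blank 100x50 image and a stand-in classifier that always fires.
blank = [[0] * 50 for _ in range(100)]
print(len(detect(blank, lambda patch: 1)))
```

A smaller stride examines more patches and costs more classifier calls; a larger stride is cheaper but can step over an object.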

Text detection


Positive examples Negative examples

Text detection

[David Wu]


1D Sliding window for character segmentation

Positive examples Negative examples
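
The same idea in one dimension can be sketched as follows. This assumes, as a labeled assumption, that a positive example is a window centered on the gap between two characters; `find_splits` and the toy `is_split` classifier are hypothetical stand-ins for a trained model.

```python
def find_splits(line_width, win_w, is_split, stride=1):
    """Slide a window along a line of text; return centers where is_split fires."""
    splits = []
    for left in range(0, line_width - win_w + 1, stride):
        if is_split(left, left + win_w):
            splits.append(left + win_w // 2)
    return splits

# Toy stand-in classifier: columns 10 and 20 are the gaps between characters.
gaps = {10, 20}

def toy_is_split(lo, hi):
    return (lo + hi) // 2 in gaps

print(find_splits(30, 6, toy_is_split))
```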


Application example: Photo OCR
Getting lots of data: Artificial data synthesis

Character recognition

A N T

I Q A

Artificial data synthesis for photo OCR

Real data
Font samples: "Abcdefg" shown in five fonts

[Adam Coates and Tao Wang]


Artificial data synthesis for photo OCR

Real data Synthetic data

[Adam Coates and Tao Wang]
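
One way to picture synthesizing character data from scratch is to paste a glyph onto a background and vary its appearance. The sketch below is illustrative only and is not the method behind the figures: the 5x3 `T_MASK` stands in for a rendered font glyph, the flat background is a simplification of a real image background, and all names and intensity ranges are hypothetical.

```python
import random

# Hypothetical 5x3 binary mask standing in for a rendered font glyph of "T".
T_MASK = [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]

def synthesize(glyph, size, background, ink, top, left):
    """Paste a binary glyph mask onto a flat size x size background patch."""
    patch = [[background] * size for _ in range(size)]
    for r, row in enumerate(glyph):
        for c, bit in enumerate(row):
            if bit:
                patch[top + r][left + c] = ink
    return patch

def make_examples(glyph, label, n, size=8, seed=0):
    """Generate n labeled patches with random brightness and glyph position."""
    rng = random.Random(seed)
    h, w = len(glyph), len(glyph[0])
    examples = []
    for _ in range(n):
        background = rng.uniform(0.5, 1.0)   # light background intensity
        ink = rng.uniform(0.0, 0.3)          # dark character intensity
        top = rng.randint(0, size - h)
        left = rng.randint(0, size - w)
        examples.append((synthesize(glyph, size, background, ink, top, left), label))
    return examples
```

Because the label is known at generation time, every synthetic patch comes labeled for free.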


Synthesizing data by introducing distortions

[Adam Coates and Tao Wang]
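
As a rough illustration of amplifying one example into several, the sketch below applies a horizontal shear to a small image stored as a list of rows. The shear is a stand-in chosen for simplicity; the specific warps used for the figures are not stated in the slides, and `shear` and its parameters are hypothetical.

```python
def shear(image, max_shift, fill=0):
    """Shift row r sideways by round(max_shift * r / (h - 1)) pixels, padding with fill."""
    h, w = len(image), len(image[0])
    out = []
    for r, row in enumerate(image):
        s = round(max_shift * r / (h - 1)) if h > 1 else 0
        s = max(-w, min(w, s))                    # never shift past the image width
        if s >= 0:
            out.append([fill] * s + row[:w - s])  # shift right
        else:
            out.append(row[-s:] + [fill] * (-s))  # shift left
    return out

original = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
# Each distorted copy keeps the label of the original example.
distorted = [shear(original, shift) for shift in (-2, 2)]
```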


Synthesizing data by introducing distortions: Speech recognition

Original audio:

Audio on bad cellphone connection

Noisy background: Crowd

Noisy background: Machinery

[www.pdsounds.org]
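
For the noisy-background cases, one common approach is to add a background recording to the clean clip at a chosen signal-to-noise ratio. The sketch below is an assumption-laden illustration rather than the procedure used for these clips: audio is represented as plain lists of samples, and `mix_at_snr` and the SNR parameter are hypothetical. It does not model the bad-cellphone-connection distortion.

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean, scaled so signal power / noise power equals snr_db."""
    n = len(clean)
    looped = [noise[i % len(noise)] for i in range(n)]   # repeat noise to length n
    p_clean = sum(x * x for x in clean) / n
    p_noise = sum(x * x for x in looped) / n
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * v for c, v in zip(clean, looped)]

# Usage: a sine tone as the "original audio", a square wave as the "background".
tone = [math.sin(2 * math.pi * i / 20) for i in range(200)]
background = [1.0, -1.0]
noisy = mix_at_snr(tone, background, snr_db=10)
```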

Synthesizing data by introducing distortions
Distortion introduced should be representative of the type of
noise/distortions in the test set.
Audio: background noise, bad cellphone connection
Usually does not help to add purely random/meaningless noise
to your data.
E.g. x_i = intensity (brightness) of pixel i
     x_i ← x_i + random noise
[Adam Coates and Tao Wang]

Discussion on getting more data
1. Make sure you have a low-bias classifier before expending the
   effort (plot learning curves). E.g. keep increasing the number
   of features/number of hidden units in the neural network until
   you have a low-bias classifier.
2. “How much work would it be to get 10x as much data as we
   currently have?”
   - Artificial data synthesis
   - Collect/label it yourself
   - “Crowdsource” (e.g. Amazon Mechanical Turk)
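
The "how much work" question in item 2 is often a quick back-of-the-envelope calculation. The sketch below shows the arithmetic for labeling by hand; the figures of 1,000 current examples and 10 seconds per label are hypothetical inputs chosen for illustration, not numbers from the slides.

```python
def labeling_hours(current_examples, seconds_per_label, factor=10):
    """Hours of hand-labeling needed to grow the dataset to factor x its size."""
    new_examples = current_examples * (factor - 1)
    return new_examples * seconds_per_label / 3600

# 1,000 examples today, 10 s per label: 9,000 new labels are needed.
print(labeling_hours(1000, 10))
```

Under these assumed inputs the answer is 25 hours, a few days of work for one person.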

Application example: Photo OCR
Ceiling analysis: What part of the pipeline to work on next

Estimating the errors due to each component (ceiling analysis)

Image → Text detection → Character segmentation → Character recognition

What part of the pipeline should you spend the most time
trying to improve?

Component                 Accuracy
Overall system            72%
Text detection            89%
Character segmentation    90%
Character recognition     100%
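
Each row gives the overall accuracy once that component and all earlier ones are replaced by ground-truth output, so the difference between consecutive rows is the most that perfecting a component could add. A small sketch of that subtraction follows; the function name `ceiling_gains` is hypothetical, and the numbers are the ones in the table.

```python
def ceiling_gains(baseline, ceilings):
    """ceilings: ordered (component, accuracy with ground truth up to that component)."""
    gains, previous = [], baseline
    for name, accuracy in ceilings:
        gains.append((name, round(accuracy - previous, 1)))
        previous = accuracy
    return gains

photo_ocr = [("Text detection", 89.0),
             ("Character segmentation", 90.0),
             ("Character recognition", 100.0)]

for name, gain in ceiling_gains(72.0, photo_ocr):
    print(f"{name}: up to +{gain} points")
```

Text detection offers up to 17 points and character recognition up to 10, while perfect character segmentation would add only 1.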

Another ceiling analysis example
Face recognition from images (artificial example)

Camera image → Preprocess (remove background) → Face detection
  → Eyes segmentation, Nose segmentation, Mouth segmentation
  → Logistic regression → Label

Another ceiling analysis example

Component                         Accuracy
Overall system                    85%
Preprocess (remove background)    85.1%
Face detection                    91%
Eyes segmentation                 95%
Nose segmentation                 96%
Mouth segmentation                97%
Logistic regression               100%
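
With more components it helps to rank them by potential gain. The sketch below does this for the face recognition numbers above; the variable names are hypothetical, and rounding to one decimal place is only there to hide floating-point residue.

```python
stages = [("Overall system", 85.0),
          ("Preprocess (remove background)", 85.1),
          ("Face detection", 91.0),
          ("Eyes segmentation", 95.0),
          ("Nose segmentation", 96.0),
          ("Mouth segmentation", 97.0),
          ("Logistic regression", 100.0)]

# Gain for a component = its row's accuracy minus the previous row's accuracy.
gains = {name: round(accuracy - previous, 1)
         for (_, previous), (name, accuracy) in zip(stages, stages[1:])}

for name in sorted(gains, key=gains.get, reverse=True):
    print(f"{name}: up to +{gains[name]} points")
```

Face detection comes out on top at 5.9 points, while perfect background removal would add only 0.1, so effort spent on preprocessing is unlikely to pay off.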