OpenCV 3 Blueprints - Sample Chapter
Expand your knowledge of computer vision by building amazing projects with OpenCV 3
Joseph Howse
Steven Puttemans
Quan Hua
Utkarsh Sinha
Joseph Howse lives in Canada. During the cold winters, he grows a beard
and his four cats grow thick coats of fur. He combs the cats every day. Sometimes,
the cats pull his beard.
Joseph has been writing for Packt Publishing since 2012. His books include
OpenCV for Secret Agents, OpenCV 3 Blueprints, Android Application Programming
with OpenCV 3, Learning OpenCV 3 Computer Vision with Python, and Python Game
Programming by Example.
When he is not writing books or grooming cats, Joseph provides consulting,
training, and software development services through his company, Nummist
Media (http://nummist.com).
Preface
Open source computer vision projects, such as OpenCV 3, enable all kinds of users
to harness the forces of machine vision, machine learning, and artificial intelligence.
By mastering these powerful libraries of code and knowledge, professionals and
hobbyists can create smarter, better applications wherever they are needed.
This is exactly where this book is focused: it guides you through a set of hands-on
projects and templates that will teach you to combine powerful techniques to
solve your specific problems.
As we study computer vision, let's take inspiration from these words:
"I saw that wisdom is better than folly, just as light is better than darkness."
Ecclesiastes, 2:13
Let's build applications that see clearly, and create knowledge.
Chapter 4, Panoramic Image Stitching Application Using Android Studio and NDK,
focuses on the project of building a panoramic camera app for Android with
the help of OpenCV 3's stitching module. We will use C++ with Android NDK.
Chapter 5, Generic Object Detection for Industrial Applications, investigates ways
to optimize your object detection model, make it rotation invariant, and apply
scene-specific constraints to make it faster and more robust.
Chapter 6, Efficient Person Identification Using Biometric Properties, is about building a
person identification and registration system based on biometric properties of that
person, such as their fingerprint, iris, and face.
Chapter 7, Gyroscopic Video Stabilization, demonstrates techniques for fusing data
from videos and gyroscopes, stabilizing videos shot on your mobile phone, and
creating hyperlapse videos.
The choice of hardware is crucial to these problems. Different cameras and lenses
are optimized for different imaging scenarios. However, software can also make or
break a solution. On the software side, we will focus on the efficient use of OpenCV.
Fortunately, OpenCV's videoio module supports many classes of camera systems,
including the following:
Mac: QuickTime
Other depth cameras via the proprietary Intel Perceptual Computing SDK
Photo cameras via libgphoto2, which is open source under the GPL license.
For a list of libgphoto2's supported cameras, see http://gphoto.org/proj/libgphoto2/support.php.
Note that the GPL license is not appropriate for use in closed
source software.
Industrial cameras via the XIMEA API
The videoio module is new in OpenCV 3. Previously, in OpenCV
2, video capture and recording were part of the highgui module,
but in OpenCV 3, the highgui module is only responsible for GUI
functionality. For a complete index of OpenCV's modules, see the
official documentation at http://docs.opencv.org/3.0.0/.
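For example, the module split is visible in the headers that even a minimal capture-and-display loop needs. The following is a sketch; the camera index (0) and the window name are arbitrary choices:

    #include <opencv2/videoio.hpp>  // capture and recording, in OpenCV 3
    #include <opencv2/highgui.hpp>  // GUI functionality only, in OpenCV 3

    int main() {
        cv::VideoCapture capture(0);  // open the default camera
        if (!capture.isOpened())
            return 1;
        cv::Mat frame;
        while (capture.read(frame)) {
            cv::imshow("Camera", frame);
            if (cv::waitKey(1) >= 0)  // quit on any keypress
                break;
        }
        return 0;
    }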
However, we are not limited to the features of the videoio module; we can use
other APIs to configure cameras and capture images. If an API can capture an array
of image data, OpenCV can readily use the data, often without any copy operation
or conversion. As an example, we will capture and use images from depth cameras
via OpenNI2 (without the videoio module) and from industrial cameras via the
FlyCapture SDK by Point Grey Research (PGR).
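To illustrate the no-copy case, the following sketch wraps a frame from some hypothetical third-party SDK in a cv::Mat header; the callback's name and parameters are assumptions for illustration, while the Mat constructor is standard OpenCV:

    #include <opencv2/core.hpp>

    // Hypothetical callback: a third-party camera SDK hands us 8-bit
    // grayscale pixel data along with its dimensions and row stride.
    void onExternalFrame(unsigned char *data, int width, int height,
                         size_t stride) {
        // This constructor does not copy the pixels; the Mat header
        // simply points at the SDK's buffer.
        cv::Mat frame(height, width, CV_8UC1, data, stride);
        // ... run any OpenCV processing on frame here ...
        // The SDK's buffer must outlive frame, since no copy was made.
    }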
We will learn about the differences among categories of cameras, and we will test the
capabilities of several specific lenses, cameras, and configurations. By the end of the
chapter, you will be better qualified to design either consumer-grade or industrial-grade vision systems for yourself, your lab, your company, or your clients. I hope to
surprise you with the results that are possible at each price point!
Radio waves radiate from certain astronomical objects and from lightning.
They are also generated by wireless electronics (radio, Wi-Fi, Bluetooth,
and so on).
Microwaves radiated from the Big Bang and are present throughout
the Universe as background radiation. They are also generated by
microwave ovens.
Far infrared (FIR) light is an invisible glow from warm or hot things such
as warm-blooded animals and hot-water pipes.
Near infrared (NIR) light radiates brightly from our sun, from flames, and
from metal that is red-hot or nearly red-hot. However, it is a relatively weak
component in commonplace electric lighting. Leaves and other vegetation
brightly reflect NIR light. Skin and certain fabrics are slightly transparent
to NIR.
Visible light radiates brightly from our sun and from commonplace
electric light sources. Visible light includes the colors red, orange, yellow,
green, blue, and violet (in order of decreasing wavelength).
As we see in this example, a simplistic model (an RGB pixel) might hide
important details about the way data are captured and stored. To build efficient
image pipelines, we need to think about not just pixels, but also channels and
macropixels: neighborhoods of pixels that share some channels of data and are
captured, stored, and processed in one block. Let's consider three categories of
image formats:
An image from a monochrome camera can be efficiently stored and processed in its
raw format or (if it must integrate seamlessly into a color imaging pipeline) as the Y
plane in a planar YUV format. Later in this chapter, in the sections Supercharging the
PlayStation Eye and Supercharging the GS3-U3-23S6M-C and other Point Grey Research
cameras, we will discuss code samples that demonstrate efficient handling of various
image formats.
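As a preview of those samples, consider how cheaply a planar YUV buffer yields a grayscale view. The following sketch assumes an I420 (planar YUV 4:2:0) buffer, in which the Y plane is simply the first width * height bytes; the function name and parameters are illustrative:

    #include <opencv2/core.hpp>

    // View an I420 buffer's luminance (Y) plane as a grayscale image.
    // I420 stores the full-resolution Y plane first, followed by
    // quarter-resolution U and V planes, so the whole buffer spans
    // height * 3 / 2 rows of width bytes each.
    cv::Mat grayViewOfI420(unsigned char *buffer, int width, int height) {
        cv::Mat yuv(height * 3 / 2, width, CV_8UC1, buffer);
        return yuv.rowRange(0, height);  // the Y plane; no data is copied
    }

Because no copy is made, the returned Mat is only valid for as long as the underlying buffer is.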
Until now, we have covered a brief taxonomy of light, radiation, and color: their
sources, their interaction with optics and sensors, and their representation as
channels and neighborhoods. Now, let's explore some more dimensions of image
capture: time and space.
Field of view (FOV) is the extent of the lens's vision. Typically, FOV is measured as
an angle, but it can be measured as the distance between two peripherally observable
points at a given depth from the lens. For example, a FOV of 90 degrees may also be
expressed as a FOV of 2m at a depth of 1m or a FOV of 4m at a depth of 2m. Where
not otherwise specified, FOV usually means diagonal FOV (the diagonal of the lens's
vision), as opposed to horizontal FOV or vertical FOV. A longer lens has a narrower
FOV. Typically, a longer lens also has higher resolution and less distortion. If our
subject falls outside the FOV, we miss the subject completely! Toward the edges of
the FOV, resolution tends to decrease and distortion tends to increase, so preferably
the FOV should be wide enough to leave a margin around the subject.
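For reference, the relationship between angular and linear FOV follows from basic trigonometry, assuming a rectilinear (non-fisheye) lens:

    w = 2 * d * tan(θ / 2)

where θ is the angular FOV, d is the depth, and w is the linear FOV at that depth. For θ = 90 degrees, tan(45 degrees) = 1, so w = 2m at a depth of 1m, matching the example above.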
The camera's throughput is the rate at which it captures image data. For many
computer vision applications, a visual event might start and end in a fleeting
moment, and if the throughput is low, we might miss the moment completely or
our image of it might suffer from motion blur. Typically, throughput is measured
in frames per second (FPS), though measuring it as a bitrate can be useful, too
(a sketch for measuring the actual frame rate appears after the following list).
Throughput is limited by the following factors:
Shutter speed (exposure time): For a well-exposed image, the shutter speed
is limited by lighting conditions, the lens's aperture setting, and the camera's
ISO speed setting. (Conversely, a slower shutter speed allows for a narrower
aperture setting or slower ISO speed.) We will discuss aperture settings after
this list.
The type of shutter: A global shutter synchronizes the capture across all
photosites. A rolling shutter does not; rather, the capture is sequential such
that photosites at the bottom of the sensor register their signals later than
photosites at the top. A rolling shutter is inferior because it can make an
object appear skewed when the object or the camera moves rapidly. (This is
sometimes called the "Jell-O effect" because of the video's resemblance to a
wobbling mound of gelatin.) Also, under rapidly flickering lighting, a rolling
shutter creates light and dark bands in the image. If the start of the capture is
synchronized but the end is not, the shutter is referred to as a rolling shutter
with global reset.
The interface between the camera and host computer: Common camera
interfaces, in order of decreasing bit rates, include CoaXPress full, Camera
Link full, USB 3.0, CoaXPress base, Camera Link base, Gigabit Ethernet,
IEEE 1394b (FireWire full), USB 2.0, and IEEE 1394 (FireWire base).
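As a cross-check on a camera's nominal specifications, we can time the camera ourselves. The following sketch measures actual throughput by timing a fixed number of frame grabs; the camera index (0) and the frame count are arbitrary choices:

    #include <opencv2/core.hpp>
    #include <opencv2/videoio.hpp>
    #include <cstdint>
    #include <cstdio>

    int main() {
        cv::VideoCapture capture(0);  // open the default camera
        if (!capture.isOpened())
            return 1;
        const int numFrames = 120;  // enough frames to average out jitter
        cv::Mat frame;
        int64_t start = cv::getTickCount();
        for (int i = 0; i < numFrames; i++)
            capture.read(frame);
        double seconds = (cv::getTickCount() - start) /
                         cv::getTickFrequency();
        std::printf("Measured throughput: %.1f FPS\n",
                    numFrames / seconds);
        return 0;
    }

The measured rate may fall short of the nominal rate reported by capture.get(cv::CAP_PROP_FPS), because the real rate also depends on the exposure time and the interface's bandwidth.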