
Computer Graphics

Samit Bhattacharya
Assistant Professor
Department of Computer Science and Engineering
IIT Guwahati
Preface
The term computer graphics roughly refers to the field of study that deals with the display
mechanism (the hardware and software) of a computer. In the early days, for most of us, a
computer meant what we got to see on the monitor. Then came the laptops, in which the
display and the CPU tower (along with the peripheral keyboard unit) were combined into
a single and compact unit for easy portability. The cathode ray tube (CRT) displays were
replaced by the liquid crystal display (LCD) technology. However, the idea of a computer was
still restricted to the display screen for a majority of the users.
For the younger generation, the personal computer (PC) is no longer the ‘computer’. It has
been replaced by a plethora of devices, although laptops have managed to retain their charm
and appeal due to portability. These devices come in various shapes and sizes with varying
degrees of functionality. The most popular of these is the ubiquitous smartphone. Although
much smaller in size than a PC, smartphones are comparable to the very powerful PCs of
yesteryear, with powerful multicore processing units, high-resolution displays, and large
memory. Then we have the tablets (or tabs), which are slightly larger in size (although still
much smaller than a PC), and the phablets, having features of both a phone and a tab. Such
devices also include wearable computers such as the smartwatch or Google Glass. Even
televisions nowadays have many computing elements, which has led to the concept of smart
TVs. This is made possible by a rapid change in technology, including display technology.
Instead of the CRT, we now have devices that use LCD, plasma panel, light-emitting diode
(LED), organic light-emitting diode (OLED), thin-film transistor (TFT), and so on, for the
design of display units.
However, regardless of what the current state of the art in computing technology is, the idea of
a ‘computer’ is shaped primarily by what we get to ‘see’ on the display unit of a computing
system. Since perception matters the most in the popularity of any system, it is important
for us to know the components of a computing system that give rise to this perception—the
display hardware and the associated software and algorithms. Therefore, it is very important
to learn the various aspects of computer graphics to understand the driving force behind the
massive change in consumer electronics that is sweeping the world at present.

ABOUT THE BOOK


Computer Graphics is a textbook aimed at undergraduate students of computer science and
engineering, information technology, and computer applications. It seeks to provide a thor-
ough understanding of the core concepts of computer graphics in a semester-level course.
The contents of this book are designed for a one-semester course on the subject keeping in
mind the difficulty faced by a first-time learner of the subject. The book aims to help students
in self-learning through illustrative diagrams, examples, and practice exercises.

Brief Contents
Preface v
Features of the Book x
Detailed Contents xv

1. Overview of Computer Graphics 1
2. Modeling Transformations 50
3. Color Models and Texture Synthesis 94
4. 3D Viewing 109
5. Clipping 130
6. Hidden Surface Removal 148
7. Rendering 167
8. Graphics Hardware and Software 194

Appendix A: Mathematical Background 251
Appendix C: Ray-tracing Method for Surface Rendering 268
Bibliography 275
Index 279
About the Author 285

CHAPTER 1
Overview of Computer Graphics
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the field of computer graphics and its application areas
• Trace the historical development of the field
• Understand the various components of a computer graphics system
• Have a basic understanding of the display hardware in terms of the cathode ray tube
display technology
• Identify the stages of the image synthesis process

INTRODUCTION
With a computer, we can do, and usually do, a lot of things. We create documents and presenta-
tions. For example, consider the screenshot in Fig. 1.1, which was taken during the preparation
of this book with MS Word™.
Notice the components present in the image. The primary components, of course, are the
(alphanumeric) characters. These characters were entered using a keyboard. While in a doc-
ument, the alphanumeric characters are the most important, there are other equally important
components that are part of any word processing software. In this figure, these components
are the menu options and the editing tool icons on top. Some of these options are shown
as text while the others are shown as images (icons). Thus, we see a mix of characters and
images that constitute the interface of a typical word processing system.
Next, consider Fig. 1.2, which is an interface of a Computer-aided Design (CAD) tool. It
shows the design of some machinery parts on a computer screen, along with some control
buttons on the right-hand side. The part itself is constructed from individual components,
with specified properties (dimension, etc.). An engineer can specify the properties of those
individual components and try to assemble them virtually on the computer screen, to check
if there is any problem in the specifications. This saves time, effort, and cost, as the engineer
does not need to actually develop a physical prototype and perform the specification checks.
This is the advantage of using a CAD tool.
Figure 1.3 shows two instances of visualization, another useful activity done with com-
puters.

Fig. 1.1 Screen capture of a page during document preparation using MS Word

Fig. 1.2 A CAD system interface—the right-hand side contains buttons to perform various engineering tasks

Figure 1.3(a) shows the visualization of a DNA molecule. It shows that, with the
aid of computers, we could see something that is not possible with the naked eye. Such a
type of visualization is called scientific visualization, where we try to visualize things that
occur in nature and that we cannot otherwise see. Figure 1.3(b), on the other hand, shows
an instance of traffic in a computer network. Basically, it shows the status of the network
at that instant, such as the active nodes, the active links, the data flow path, and so on.
As you can see, this figure shows something that is not natural (i.e., it shows information
about some man-made entity). Visualization of this latter kind is known as information
visualization.
Fig. 1.3 Two examples of visualization (a) Visualization of a DNA molecule (b) Network visualization

Each of the aforementioned points is basically an example of the usage of computer
graphics. The spectrum of such applications is very wide. In fact, it is difficult to list
all applications as virtually everything that we see around us involving computers con-
tains some applications of computer graphics. Apart from the examples we saw and the
typical desktop/laptop/tablet/palmtop applications that we traditionally refer to as com-
puters, computer graphics techniques are used in the mobile phones we use, information
kiosks at popular spots such as airports, ATMs, large displays at open air music con-
certs, air traffic control panels, the latest movies to hit the halls, and so on. The appli-
cations are so diverse and widespread that it is no exaggeration to say that for a lay-
man in this digital age, the term computer graphics has become synonymous with the
computer.
In all these examples, we see instances of images displayed on a computer screen. These
images are constructed with objects, which are basically geometric shapes (characters and
icons) with colors assigned to them. When we write a document, we are dealing with letters,
numbers, punctuation marks, and symbols. Each of these is an object, which is rendered on the
screen with a different style and size. In the case of drawing, we deal with basic shapes such
as circles, rectangles, curves, and so on. For animation videos or computer games, we are
dealing with virtual characters, which may or may not be human-like. The images or parts
thereof can be manipulated (interacted with) by a user with input devices such as mouse,
keyboard, joystick, and so on.
The question is, how can a computer do all these things? We know that computers under-
stand only the binary language, that is, the language of 0s and 1s. Letters of an alpha-
bet, numbers, symbols, or characters are definitely not strings of 0s and 1s, or are they?
How can we represent such objects in a language understood by computers so that they
can be processed by the computer? How can we map from the computer's language to
something that we can perceive (with physical properties such as shape, size, and color)? In
other words, how can we create or represent, synthesize, and render imagery on a com-
puter display? This is the fundamental question that is studied in the field of computer
graphics.
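To make the representation question concrete, here is a minimal sketch (not taken from the book) of one simple way a letter can indeed be reduced to 0s and 1s: a bitmap, in which each 1 marks a pixel to be excited. The 5 × 5 pattern and the helper function are illustrative assumptions, not a standard font or API.

# A minimal sketch (illustrative, not from the book): a letter represented as a
# bitmap, i.e., a grid of 0s and 1s in which 1 means "excite this pixel".
LETTER_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def show(bitmap):
    """Print the bitmap, drawing '#' for 1 (lit pixel) and '.' for 0."""
    for row in bitmap:
        print("".join("#" if bit else "." for bit in row))

show(LETTER_T)   # the grid of 0s and 1s is what the computer actually stores

Real systems store character shapes far more compactly (for example, as outlines), but the principle of mapping objects to binary data is the same.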
This fundamental question can further be broken down into a set of four basic questions.
1. Imagery is constructed from its constituent parts. How to represent those parts?
2. How to synthesize the constituent parts to form a complete realistic imagery?
3. How to allow the users to manipulate the imagery constituents on-screen?
4. How to create the impression of motion?
Computer graphics seeks the answer to these questions. A couple of things are impor-
tant here. First, the term computer screen here is used in a very broad sense and encom-
passes all sorts of displays including small displays on the mobile devices such as smart
phones, tablets, etc., interactive white boards, interactive table tops, as well as large dis-
plays such as display walls. Obviously, these variations in displays indicate corresponding
variations in the underlying computing platforms. The second issue is that computer graphics
seeks efficient solutions to these questions. As the computing platforms vary, the term
efficiency refers to ways of making, or trying to make, optimal use of the resources of a
given platform. For example, displaying something on mobile phone screens requires tech-
niques that are different from displaying something on a desktop. This is because of the
differences in CPU speed, memory capacity, and power consumption issues in the two
platforms.
Let us delve a little deeper. In the early era of computers, displays were terminal
units capable of showing only characters. In subsequent developments, the ability to show
complex 2D images was introduced. With the advance of technology, the memory
capacity and processor speeds of computing systems have increased greatly. Along with that,
display technology has also improved significantly. Consequently, our ability to display
complex processes such as 3D animation in a realistic way has improved to a great extent.
There are two aspects of a 3D animation. One is to synthesize frames; the other is to com-
bine them and render them in a way that generates the effect of motion. Both of these are complex and
resource-intensive tasks, which are the main areas of activity in present-day computer
graphics.
Thus, computer graphics can be described in brief as the process of rendering static
images or animation (a sequence of images) on computer screens in an efficient way. In the
subsequent chapters, we shall look into the details of this process.

1.1 HISTORICAL DEVELOPMENT OF THE FIELD


The evolution of the field of computer graphics is intricately linked to the evolution
of the computer itself. We cannot describe the history of computer graphics without
mentioning the developments that took place in shaping the concept and technology of
present-day computers. However, we shall restrict ourselves to a con-
cise account of the development of the field, mentioning only the major milestones in
the process.
The term computer graphics was first coined by William Fetter of Boeing Corp. in
1960. In 1981, Sylvan Chasen of Lockheed Corp. proposed a phase-wise classification of the
development of the field. He mentioned four distinct phases in the development process.
1. Conception to birth or the gestational period (1950–1963)
2. Childhood (1964–1970)
3. Adolescence (1970–1981)
4. Adulthood (1981–)
Conception to birth The gestational period clearly coincides with the early developmental
phase of the computing technology itself. What we take for granted (the interactive graphical
user interface, or GUI) in any computing system today was not present even in the
imagination of people back then. An early system from this phase, which demonstrated
the power of computer graphics, was the SAGE (Semi Automatic Ground Environment)
air defense system of the US Air Force (see Fig. 1.4). The system was a product of the
Whirlwind project (started in 1945 at the MIT, USA). In this project, the system received
positional data related to an aircraft from a radar station. This radar data was shown as an
aircraft on a CRT screen, superimposed on a geographical region drawn on the screen. An
input device, called a light gun or light pen, was used by the operators to request iden-
tification information about the aircraft (see Fig. 1.4). When the light gun was pointed
at the symbol for the plane on the screen, an event was sent to the Whirlwind, which
then sent text about the plane’s identification, speed, and direction to be displayed on
the screen.
Fig. 1.4 SAGE system with light pen
Source: https://design.osu.edu/carlson/history/lesson2.html

Although the SAGE (Whirlwind) system demonstrated traces of interactive graphics (with
the use of light pens to provide input to the system), the true potential of interactive computer
graphics caught people's attention after the development of the Sketchpad by Ivan Suther-
land in 1963, as part of his doctoral dissertation at MIT. The Sketchpad used the light pen
to create engineering drawings directly on the CRT screen (see Fig. 1.5). Precise drawings
could be created, manipulated, duplicated, and stored. The Sketchpad was the first GUI long
before the term was coined and pioneered several concepts of graphical computing, includ-
ing memory structures to store objects, rubber-banding of lines, the ability to zoom in and
out on the display, and the ability to make perfect lines, corners, and joints. This achievement
led many to acknowledge Sutherland as the grandfather of interactive computer graphics.
In addition to the SAGE and the Sketchpad systems, the gestational period saw devel-
opment of many other influential systems such as the first computer game (Spacewar,
developed by Steve Russell and team in 1961 on a PDP-1 platform) and the first CAD sys-
tem (DAC-1 by IBM, formally demonstrated in 1964 though the work started in 1959, see
Fig. 1.6).
Fig. 1.5 The use of the Sketchpad software with a light pen to create precise drawings
Source: https://design.osu.edu/carlson/history/lesson3.html

Fig. 1.6 The first CAD system by IBM (called DAC-1)—DAC stands for Design Augmented by Computer

Adolescence In 1971, Intel released the first commercial microprocessor (the 4004). This
brought in a paradigm shift in the way computers are made, having a profound impact on
the growth of the field of computer graphics. In addition, the adolescence period (1970–
1981) saw both the development of important techniques for realistic and 3D graphics as
well as several applications of the nascent field, particularly in the field of entertainment
and movie making, which helped to popularize the field. Notable developments during this
period include the works on lighting models, texture and bump mapping, and ray tracing.
This period also saw the making of movies such as Westworld (1973, the first movie to
make use of computer graphics) and Star Wars (1977). The worldwide success of Star Wars
demonstrated the potential of computer graphics.
Adulthood The field entered its adulthood period (1981 onwards) standing on the plat-
form created by these pioneering works and early successes. That year saw the release of
the IBM PC, which helped computers proliferate among the masses. In order
to cater to this new and emerging market, the importance of computer graphics was felt
more intensely. The focus shifted from graphics for experts to graphics for laymen. This
shift in focus accelerated work on developing new interfaces and interaction techniques
and eventually gave rise to a new field of study: human-computer interaction, or HCI
for short.
The development of software and hardware related to computer graphics has become
a self-sustaining cycle now. As more and more user-friendly systems emerge, they create
more and more interest among people. This in turn brings in new enthusiasm and invest-
ments in innovative systems. The cycle is certainly helped by the huge advancements in
processor technology (from CPUs to GPUs), storage (from MB to TB), and display technology
(from CRTs to touch screens and situated walls). These technological advancements have brought
in a paradigm shift in the field. It is now possible to develop algorithms to generate photo-
realistic 3D graphics in real time. Consequently, the appeal and application of computer
graphics have increased manifold. The presence of all these factors implies that the field is
growing rapidly and will continue to grow in the foreseeable future. Table 1.1 highlights the
major developments in computer graphics.
Table 1.1 Major developments in computer graphics

Phase                Period         Major developments
Gestational period   1950–1963      The SAGE system
                                    The Sketchpad system
                                    Spacewar—the first computer game
                                    First CAD system
Childhood            1964–1970      Mainly consolidation of the earlier ideas
Adolescence          1971–1980      First commercial microprocessor by Intel Corp.
                                    Development of techniques for 3D realistic graphics
                                    (lighting models, bump mapping, ray tracing)
                                    Application of computer graphics to movie making
                                    (Westworld, Star Wars)
Adulthood            1981–Present   Release of the first personal computer (IBM PC) in 1981

1.2 MAJOR ISSUES AND CONCERNS IN COMPUTER GRAPHICS


In the formative stages of the field, the primary concern was the generation of 2D scenes. Although
still in use, 2D graphics is no longer the thrust area. Its place has been taken over by
3D graphics and animations. Consequently, there are three primary concerns in computer
graphics today.
Modeling Creating and representing the geometry of objects in the 3D world
Rendering Creating and displaying a 2D image of the 3D objects
Animation Describing how the image changes over time
Modeling deals not only with the modeling of solid geometric objects but also with the modeling
of phenomena such as smoke, rain, and fire. Rendering deals with displaying the modeled
objects/scenes on the screen. The issues here are many and include color and illumination
(simulating optics), determination of visible surfaces (with respect to a viewer position),
introducing texture patterns on object surfaces (to mimic realism), transforming from a 3D
description to a 2D image, and so on. Imparting motion to the objects to simulate movement
is dealt with in animation. Modeling of motion and interaction between objects are the key
concerns here.
Hardware-related issues While these are primarily issues that are addressed in graphics
software, there are many hardware-related issues such as the following:
1. The quality and cost of the display technology, with the trade-off between the two being
an important consideration
2. The selection of the appropriate interaction devices (what type of input devices should
we have to make the interaction intuitive)
3. Design of specialized graphic devices to speed up the rendering process
In fact, many graphic algorithms are nowadays executed in hardware for better per-
formance. How to design such hardware at an affordable cost is the primary concern
here.
In subsequent chapters of this book, we shall explore how the issues are addressed.
However, we shall restrict ourselves primarily to the discussion of the issues addressed
in software. Nevertheless, in the next section, we shall briefly introduce the basic
architecture of a graphic system, which will aid in the understanding of further
discussions.

1.3 PRELIMINARIES: BASICS OF GRAPHICS SYSTEM


In computer graphics, what do we do? We synthesize a 2D image that can be displayed on
a screen. Figure 1.7 shows the schematic of a generic system architecture that is followed
by modern-day graphics systems. In the figure, the task of image generation is delegated to a
separate system called the display controller, which takes its input from the CPU (host com-
puter in the figure) as well as external input devices (mouse, keyboard, etc.). The generated
image is stored in digital form in a video memory, which is a (dedicated) part of the memory
hierarchy of the system. The stored image is taken as input by the video controller, which

converts the digital image to analog voltages that drive electro-mechanical arrangements,
which ultimately render the image on the screen.

Fig. 1.7 Generic architecture of a graphics system (host computer, display commands, interaction data, input devices, display controller, video memory, video controller, display screen)
Let us delve a little deeper to understand the working of the graphics system. The
process of generating an image for rendering is a multi-stage process, involving lots of
computation (we shall discuss these computations in subsequent parts of the book). If
all these computations are to be carried out by the CPU, then the CPU may get very
little time for other computational tasks. As a result, the system cannot do much
except graphics. In order to avoid such situations and increase system efficiency, the
task of rendering is usually carried out by a dedicated component of the system (the
graphics card in our computers) having its own processing unit (called the GPU, or graphics
processing unit). The CPU, when encountering the task of displaying something, simply
assigns the task to this separate graphics unit, which is termed the display controller
in Fig. 1.7.

What is a frame buffer? Why is it needed?

The video memory depicted in Fig. 1.7 for a raster scan system is more generally known as the frame buffer. The buffer contains one location corresponding to each pixel location. Thus its size is equal to the resolution of the screen. As we mentioned, the display processor in the display controller performs the computations and the video controller performs the actual rendering. The display processor works at the speed of the CPU (nanosecond scale). However, the video controller is typically an electro-mechanical arrangement, which is much slower (millisecond scale). Consequently, there is a mismatch between the speeds of these two components. If the output of the display controller is fed directly as the input of the video controller, the resulting image can get distorted, as the next input may come before the video controller finishes processing the current input. To synchronize their operation, the frame buffer is used. In fact, if we use a single frame buffer, the synchronization problem may still occur. Consequently, at least two frame buffers are used in practice (called double buffering), called the primary and secondary buffers. The video controller takes input from the primary buffer. During this process, the display controller fills up the secondary buffer. Once the secondary buffer is filled up, it is designated as primary and the primary becomes secondary (role reversal), and the process repeats.
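The double-buffering idea in the box can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions (a frame is a small list of intensity values, and refreshing is simulated by a print); it is not how an actual display controller or video controller is implemented.

# A toy sketch of double buffering: the display controller writes into the
# secondary buffer while the video controller reads the primary buffer; once a
# frame is complete, the two buffers swap roles.
WIDTH, HEIGHT = 4, 3                          # a tiny "screen" for illustration

def new_buffer():
    return [[0] * WIDTH for _ in range(HEIGHT)]

primary = new_buffer()                        # read by the simulated video controller
secondary = new_buffer()                      # written by the simulated display controller

def render_frame(buffer, frame_no):
    """Stand-in for the display controller: fill the buffer with intensity values."""
    for y in range(HEIGHT):
        for x in range(WIDTH):
            buffer[y][x] = frame_no           # pretend intensity value

def refresh(buffer):
    """Stand-in for the video controller: 'draw' the primary buffer."""
    print(buffer)

for frame_no in range(1, 4):
    render_frame(secondary, frame_no)         # fill the back (secondary) buffer
    primary, secondary = secondary, primary   # role reversal (buffer swap)
    refresh(primary)                          # the screen always shows a complete frame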
Thus the display controller generates the image to be displayed on the screen. The gen-
erated image is in digital format (strings of 0s and 1s). The place where it is stored is the
video memory, which in modern systems is part of the separate graphics unit (the VRAM
in the graphic card). The display screen, however, contains picture elements or pixels (such
as phosphor dots and gas-filled cells). The pixels are arranged in the form of a grid. When
these pixels are excited by electrical means, they emit light with specific intensities, which
gives us the sensation of a colored image on the screen. The mechanism for exciting pixels
is the responsibility of the video controller, which takes as input the digital image stored in
the memory and activates suitable electro-mechanical mechanisms so that the pixels can
emit light.

Graphic Devices
Graphic devices can be divided into two broad groups, based on the method used for
rendering (or excitation of pixels): (a) vector scan devices and (b) raster scan devices.
Vector scan devices In vector scan (also known as random-scan, stroke-writing, or cal-
ligraphic), an image is viewed as composed of continuous geometric primitives such as
lines and curves. Clearly, this is what most of us intuitively think about images. From
the system’s perspective, the image is rendered by rendering these primitives. In other
words, a vector scan device excites only those pixels of the grid that are part of these
primitives.
Raster scan devices In contrast, in raster scan devices, the image is viewed as repre-
sented by the whole pixel grid. In order to render a raster image, it is therefore necessary
to consider all the pixels. This is achieved by considering the pixels in sequence (typically
left to right, top to bottom). In other words, the video controller starts with the top-left
pixel. It checks if the pixel needs to be excited. Accordingly, it excites the pixel or leaves
it unchanged. It then moves to the next pixel on the right and repeats the steps. It con-
tinues till it reaches the last pixel in the row. Afterward, the controller considers the first
pixel in the next (below the current) row and repeats the steps. This continues till the
right-bottom pixel of the grid. The process of such sequential consideration of the pixel
grid is known as scanning. Each row in the pixel grid in a scanning system is called a
scan line. This difference between the two rendering methods is illustrated in Fig. 1.8.

What are vector and raster graphics?

There are two terms closely related to the vector and raster scan methods, namely vector graphics and raster graphics. These terms are used to indicate the nature of image representation. When an image is represented in terms of continuous geometric primitives such as lines and curves, we call it vector graphics. On the other hand, images represented in terms of a pixel grid are known as raster graphics. Note that they indicate only the image representation and not the underlying hardware rendering process, unlike the vector and raster scan. Thus, we can use a raster scan method to render a vector graphic.

Comparison between vector and raster graphics

As Fig. 1.8 demonstrates, in vector graphics we need to excite only a subset of the whole pixel grid. The problem is, designing a mechanism to selectively excite pixels on a pixel grid requires high precision and complex hardware. As opposed to this, the scanning-based rendering in raster devices requires simpler hardware to implement. Consequently, the vector devices are costlier than raster devices. Also, due to selective exciting, vector graphics is good for rendering wireframe (i.e., outline) images. For complex scenes, flicker is visible as the rendering mechanism has to make lots of random movements. Raster systems do not have these problems. For these reasons, the displays that we use are mostly raster systems.
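The raster scanning order described above can be expressed directly as two nested loops. The sketch below is a simplified illustration (a binary frame buffer and a print in place of actual pixel excitation), not real controller code.

# A toy raster scan: visit every pixel in scan-line order (left to right,
# top to bottom) and "excite" only those whose frame buffer value is 1.
frame_buffer = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]

def raster_scan(buffer):
    for row, scan_line in enumerate(buffer):        # one scan line at a time
        for col, value in enumerate(scan_line):     # left to right along the line
            if value:                               # excite or leave unchanged
                print(f"excite pixel at row {row}, column {col}")

raster_scan(frame_buffer)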
Refreshing An important related concept is refreshing. The light emitted from the pixel
elements, after excitation, starts decaying over time. As a result, the scene looks faded
on the screen. Also, since the pixels in a scene get excited at different points of time,
they do not fade in sync. Consequently, the scene looks distorted. In order to avoid such
undesirable effects, what is done is to keep on exciting the pixels periodically. Such peri-
odic excitation of the same pixels is known as refreshing. The number of times a scene
is refreshed per second is called the refresh rate, expressed in Hz (Hertz, the frequency
unit). The typical refresh rate required to give a user the perception of a static image is at
least 60 Hz.
So far, we have discussed the broad concepts of the display system, namely the video
controller, the pixel grid, and the raster and vector displays. The discussion was very generic
and applies to any graphic system. Let us understand these generic concepts in terms of an
actual system, namely the cathode ray tube (CRT) displays. Although CRTs are no longer in
wide use, a discussion on CRTs serves a pedagogical purpose as it enables us to discuss all the
relevant concepts.

Fig. 1.8 Difference between vector and raster scan devices (a) Image to be rendered on the
pixel grid (b) Vector scan method—only those pixels through which the line passes are excited
(black circles) (c) Raster scan method—all pixels are considered during scanning—white cir-
cles show the pixels not excited; black circles show excited pixels; arrows show scanning
direction

Differentiate between refresh rate and frame rate. Does a higher frame rate ensure better image quality?

The frame rate of a computer is how often a video processing device can calculate the intensity values for the next frame to be displayed, that is, the rate at which the frame buffer is filled with new intensity values. Refresh rate refers to how often the display device can actually render the image. No, frame rate is not a measure of how good the display is. Too high a frame rate is not beneficial, as any frames sent for rendering above the display's refresh rate are not rendered and are simply lost. Thus it is important to have a display capable of high refresh rates in order to synchronize the two.
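The relationship between refresh rate, resolution, and the time available per pixel can be seen with a small calculation. The numbers below (a 100 × 100 display refreshed at 60 Hz) are assumed for illustration, and retrace times are ignored for simplicity.

# A sketch of the per-pixel time budget implied by a given refresh rate: with
# rows x cols pixels redrawn refresh_rate_hz times per second, this is the
# maximum time the video controller can spend on each pixel.
def time_per_pixel_us(rows, cols, refresh_rate_hz):
    frame_time_us = 1_000_000 / refresh_rate_hz     # time budget for one refresh
    return frame_time_us / (rows * cols)            # budget per pixel

print(f"{time_per_pixel_us(100, 100, 60):.2f} microseconds per pixel")   # ~1.67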

1.3.1 CRT Displays


The CRT technology, invented in 1897, ruled computer displays until recently. We are all
familiar with it. A typical CRT display is shown in Fig. 1.9, with a schematic of the working
of the technology. As the figure shows, CRT displays contain a long funnel-shaped vacuum
tube (that gives them the bulky shape). At one end of the tube (the back side) is a compo-
nent known as the electron gun, which is a heated metal cathode. The heating is achieved
by passing current through a filament inside the cathode. When heated, the metallic cath-
ode surface generates negatively charged electrons. Through the use of controlled deflection
mechanism (the first and second anodes in the figure with high positive voltages), these
electrons are focused into a narrow electron beam (the cathode ray). The front side of the
tube (the screen) is coated with phosphor dots (in a grid form). The electron beams are
guided towards specific points (phosphor dots) on the grid with the use of another deflec-
tion mechanism. In the deflection mechanism, the electron beam is deflected horizontally or
vertically using electrical or magnetic fields created by the vertical or horizontal deflection
plates shown in the figure. Once the beam strikes a phosphor dot, the dot emits photons
(light) with intensity proportional to that of the beam. The light emission by phosphor dots
gives us the impression of image on the screen.

Fig. 1.9 (a) A typical CRT (b) Schematic representation of the inner working of a CRT (heater, cathode, control grid, first and second anodes, vertical and horizontal deflecting plates, graphite coating, and a phosphor-coated screen)

In summary, therefore, in a CRT, the electron gun generates cathode rays, which hit
the phosphor dots to emit light, eventually giving us the sensation of an image. Now, let us
go back to the terms we introduced in the previous section and try to understand those
concepts in light of the CRT displays. The video controller of Fig. 1.7 is the system
responsible for generating the requisite voltages/fields to generate cathode rays of appropriate
intensity and guide the rays to hit specific phosphor dots (pixels) on the screen. In vec-
tor displays, the video controller guides the electron gun to only the pixels of interest. In
case of raster systems, the electron gun actually moves in a raster sequence (left to right,
top to bottom).
What we have discussed so far assumes that each pixel is a single phosphor dot. By vary-
ing the intensity of the cathode ray, we can generate light of different intensities. This gives
us images having different shades of gray at the most. In the case of color displays, the
arrangement is a little different. We now have three electron guns instead of one. Similarly,
each pixel position is composed of three phosphor dots, each corresponding to the red (R),

green (G), and blue (B) colors. As you know, these three are called the primary colors.
Any color can be generated by mixing these three in appropriate quantities. The same
process is simulated here. Each electron gun corresponds to one of the R, G, and B dots.
By controlling electron gun intensities separately, we can generate different shades of R,
G, and B for each pixel, resulting in a new color. Figure 1.10 shows the schematic of
the process.

How is the resolution of a CRT screen determined?

As mentioned before, the phosphor dots (pixels) are arranged in the form of a grid on the screen. This grid is typically referred to as the resolution of the screen and expressed as C × R (C = number of columns, R = number of rows). How do we determine the resolution? This is determined on the basis of the properties of the pixels. The intensity of the light emitted by a phosphor dot, after it is hit by the cathode ray, follows a certain distribution around the point of impact (typically assumed to be the center of the pixel). The intensity gradually falls off as we move away from the center. This change in intensity follows a Gaussian distribution. Now, when two pixels are placed side by side, the lights emitted by each of them will be distinguishable only if the overlap of their intensity distributions is outside of 60% of the peak intensity distribution of each. Otherwise, we will not be able to distinguish between the spots clearly. The resolution is determined based on this principle. The idea is illustrated in the following figure.

Illustration of emitted intensity distribution around pixels in a CRT device (two intensity profiles, each peaking at its pixel center, overlapping at 60% of peak intensity)
There are two ways in which computer graphics with color displays is implemented. In the first
method, the individual color information for each of the R, G, and B components of a pixel
is stored directly in the corresponding location of the frame buffer. This method is called
direct coding. Although the video controller gets the necessary information from the frame
buffer directly to drive the electron guns, the method requires a large frame buffer to store
all possible color values (that is, all possible combinations of the RGB values; this set is
also known as the color gamut). An alternative scheme, used primarily during the early periods
of computers when the memory was not so cheap, makes use of a color look-up table (CLT).
In this scheme, a separate look-up table (a portion of memory) is used. Each entry (row) of
the table contains a specific RGB combination. There are N such combinations (entries) in
the table. The frame buffer location in this scheme does not contain the color itself. Instead,
a pointer to the appropriate entry in the table that contains the required color is stored. The
scheme is illustrated in Fig. 1.11.
Note that the scheme is based on a premise: we only require a small fraction of the whole
color gamut in practice and we, as designers, know those colors. If this assumption is not
valid (i.e., we want to generate high-quality images with a large number of colors), then the
CLT method will not be of much use.

Picture tube

Electron guns

Electron beams
Color signals
Electron beams Shadow mask

Screen
Phosphor dots

Fig. 1.10 Schematic of a color CRT. The three beams generated by the three electron guns
are passed through a shadow mask—a metallic plate with microscopic holes that direct the
beams to the three phosphor dots of a pixel.

Fig. 1.11 Schematic of the color look-up scheme: each frame buffer location stores an index into the look-up table, and each table row stores an (R, G, B) combination, for example index 1 → (102, 255, 53), index 2 → (255, 255, 204), index 4 → (255, 102, 153), and index 7 → (102, 0, 51)

1.4 GRAPHICS PIPELINE: STAGES OF RENDERING PROCESS


In the preceding discussion, we were talking about the color values stored in the frame
buffer. How are these values obtained? As we mentioned, the display processor computes
these values in stages. These stages together are known as the graphics pipeline.
Object representation At the very beginning, we need to define the objects that will be
part of the image that we see on the display. There are several object representation tech-
niques available to cater to the task of efficient creation and manipulation of images (scenes).
Modeling transformation The objects are defined in their own (local) coordinate systems.
We need to put them together to construct the image, which has its own coordinate
system (known as the world coordinate). This process of putting individual objects into a
scene is known as the modeling (geometric) transformation, which is the second stage of the
pipeline.
Lighting Once the (3D) scene is constructed, the objects need to be assigned colors, which
is our third stage. Color is a psychological phenomenon linked to the way light behaves (i.e.,
the laws of optics). Thus, in order to assign color to our scene, we need to implement methods
that mimic the optical laws.

Highlight the advantage of the color look-up scheme over direct coding with an example.

Suppose each of R, G, and B is represented with 8 bits. That means we can have between 0 and 255 different shades for each of these primary colors. Consequently, the total number of colors (the color gamut) that we can generate is 256 × 256 × 256 = 16 M.
Direct coding: The size of each frame buffer location is 8 × 3 = 24 bits. Thus, the size of the frame buffer for a system with resolution 100 × 100 will be 24 × 100 × 100 = 234 Kb.
Color look-up scheme: Assume that out of the 16 M possible colors, we know that in our image we shall use only 256 colors (combinations of R, G, and B). We shall keep these 256 colors in the look-up table. Thus the size of the table is 256 entries, with each row holding a 24-bit color value. Each table location requires 8 bits to address. Therefore, to access any table location, we need to have 8 bits in each frame buffer location. What will be our storage requirement for the 100 × 100 screen? It is the frame buffer size + the table size. In other words, the total storage will be (8 × 100 × 100) + (256 × 24) = 84 Kb, much less than the 234 Kb required for direct coding.
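The arithmetic in the example above can be reproduced with a short script, together with the look-up step itself (the frame buffer stores an index; the table stores the 24-bit RGB value). All the numbers are the example's assumptions (100 × 100 screen, 24-bit color, 256-entry table), and the tiny frame buffer at the end is purely illustrative.

# A sketch of the storage comparison from the worked example, plus the look-up step.
ROWS = COLS = 100          # screen resolution assumed in the example
BITS_PER_COLOR = 24        # 8 bits each for R, G, and B
TABLE_ENTRIES = 256        # number of colors actually used in the image
INDEX_BITS = 8             # bits needed to address 256 table entries

direct_bits = ROWS * COLS * BITS_PER_COLOR
lookup_bits = ROWS * COLS * INDEX_BITS + TABLE_ENTRIES * BITS_PER_COLOR
print(f"direct coding : {direct_bits / 1024:.1f} Kb")    # about 234.4 Kb
print(f"look-up scheme: {lookup_bits / 1024:.1f} Kb")    # about 84.1 Kb

# The look-up itself: the frame buffer holds indices, the table holds RGB triples.
color_table = {1: (102, 255, 53), 2: (255, 255, 204), 7: (102, 0, 51)}
frame_buffer = [[7, 1], [2, 7]]                # a tiny 2 x 2 buffer of indices
for row in frame_buffer:
    print([color_table[index] for index in row])   # RGB values sent to the guns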

Viewing pipeline/transformation After the aforementioned stages, we have a 3D scene
with appropriate colors. The next task is to map it to the 2D device coordinate system. This is
analogous to taking a snapshot of a scene with a camera. Mathematically, taking the snapshot
involves several intermediate operations. First, we set up a camera (also called view) coordinate sys-
tem. Then the world coordinate scene is transferred to the view coordinate system (known
as the viewing transformation). From there, we transfer the scene to the 2D view plane (the
projection transformation).
For projection, we need to define a region in the viewing coordinate space (called view
volume). Objects inside the volume are projected. The objects that lie outside are removed
from the consideration for projection. The process of removing objects outside view volume
is called clipping. Along with clipping, another task is performed at this stage. When we
project, we consider a viewer position. With respect to that position, some objects will be
fully visible, some partially visible, while some will be altogether invisible, although all of
the objects are within the view volume. In order to capture this viewing effect, the process
of hidden surface removal (also known as the visible surface detection) is carried out. Once
both clipping and hidden surface removal are performed, the scene is projected on the view
plane.
From the view plane, the 2D projected scene is transferred to a region on the device
coordinate system (called viewport). The process is known as the window-to-viewport
transformation.
This series of transformations (viewing, projection, and viewport), along with the tasks
of clipping and hidden surface removal, is sometimes referred to as the viewing pipeline and
constitutes the fourth stage of the graphics pipeline.

Scan conversion The device coordinate space is continuous. However, the display, as
we have seen before, contains a pixel grid, which is a discrete space. Therefore, we need
to transfer the viewport on the (continuous) device coordinates to the (discrete) screen coor-
dinate system. This process is called scan conversion (also called rasterization). An added
concern here is how to minimize the distortions (called the aliasing effect) that result from the
transformation from continuous to discrete space. Anti-aliasing techniques are used during
the scan conversion stage to minimize such distortions. Scan conversion with anti-aliasing
together forms the last and final stage of the 3D graphics pipeline.
The stages of the pipeline are shown in Fig. 1.12. We shall discuss each of these stages
in subsequent chapters of the book. However, the sequence of the stages mentioned here is
purely theoretical. In practice, the sequence may not be followed strictly. For example, the
assigning of colors (third stage) may be performed after projection to reduce computation.
Similarly, the hidden surface removal may be performed after projection.

Object representation (first stage): local coordinate
Modeling transformation (second stage): local to world coordinate
Lighting (third stage): world coordinate
Viewing pipeline (fourth stage):
    Viewing transformation: world to view coordinate
    Clipping: view coordinate
    Hidden surface removal: view coordinate
    Projection transformation: 3D view to 2D view coordinates
    Window-to-viewport transformation: view to device coordinate
Scan conversion (fifth stage): device to screen coordinate

Fig. 1.12 Stages of the 3D graphics pipeline (the indented entries are the substages of the fourth stage, the viewing pipeline; the coordinate system in which each stage operates is indicated alongside)

How does object space differ from image space?

The pipeline we discussed before works in the object space, as it starts from the object definition, culminating in image generation. All intermediate operations are done on the objects. On the other hand, there are methods where the operations start from pixels (images) and move backward (to object definitions). Such methods are said to work in the image space. An example is the ray-tracing method. However, in this book, we shall concentrate on the object space methods only.
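The flow of the pipeline can also be summarized as a chain of function calls. The sketch below is only a schematic: every function is a placeholder standing for the computations covered in the later chapters, and the names are chosen here for illustration.

# A schematic sketch of the 3D graphics pipeline as a chain of stages.
# Each function is a placeholder; the real computations appear in later chapters.
def model_transform(objects):      return objects   # local to world coordinates
def light(scene):                  return scene     # assign colors
def view_transform(scene):         return scene     # world to view coordinates
def clip(scene):                   return scene     # drop parts outside the view volume
def remove_hidden_surfaces(scene): return scene     # keep only visible surfaces
def project(scene):                return scene     # 3D view to 2D view plane
def to_viewport(image):            return image     # view to device coordinates
def scan_convert(image):           return image     # device to discrete pixel grid

def graphics_pipeline(objects):
    scene = light(model_transform(objects))
    scene = remove_hidden_surfaces(clip(view_transform(scene)))
    return scan_convert(to_viewport(project(scene)))

pixels = graphics_pipeline(["object definitions go here"])
print(pixels)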

1.5 ROLE OF GRAPHIC LIBRARIES


In the preceding discussions, we outlined the theoretical background of computer graphics.
However, a programmer need not always implement the stages of the pipeline in
order to make the system work. Instead, the programmer can use the application program-
ming interfaces (APIs) provided by the graphics libraries to perform the pipeline stages.

Examples of graphics libraries include OpenGL (which stands for Open Graphics Library)
and DirectX (by Microsoft).
These APIs are essentially predefined sets of functions, which, when invoked with the
appropriate arguments, perform the specific tasks. Thus, these functions eliminate the need
for the programmer to know every detail of the underlying system (the processor, mem-
ory, and OS) to build a graphics application. For example, the function ‘xyz’ in OpenGL
assigns the color abc to a 3D point. Note that the color assignment does not require the pro-
grammer to know details such as how color is defined in the system, how such information
is stored (which portion of the memory) and accessed, how the operating system manages
the call, which processor (CPU/GPU) handles the task, and so on. Graphics applications
such as painting systems, CAD tools, video games, or animations are developed using these
functions.
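As a concrete illustration of this idea (the book's 'xyz' is only a placeholder name), the sketch below uses the legacy OpenGL immediate-mode API through the PyOpenGL binding to color and draw a single 3D point. It assumes PyOpenGL with GLUT support is installed; it is a minimal sketch of how calls such as glColor3f hide the underlying details, not an example taken from the book.

# A minimal sketch (assuming PyOpenGL and GLUT are available): assigning a color
# to a 3D point and drawing it with legacy OpenGL calls, without the programmer
# touching the frame buffer, the OS, or the GPU directly.
from OpenGL.GL import (GL_COLOR_BUFFER_BIT, GL_POINTS, glBegin, glClear,
                       glColor3f, glEnd, glFlush, glPointSize, glVertex3f)
from OpenGL.GLUT import (GLUT_RGB, GLUT_SINGLE, glutCreateWindow, glutDisplayFunc,
                         glutInit, glutInitDisplayMode, glutInitWindowSize,
                         glutMainLoop)

def display():
    glClear(GL_COLOR_BUFFER_BIT)        # clear the frame buffer
    glColor3f(1.0, 0.0, 0.0)            # assign a color (red) to what follows
    glPointSize(10.0)
    glBegin(GL_POINTS)
    glVertex3f(0.0, 0.0, 0.0)           # a 3D point; the library does the rest
    glEnd()
    glFlush()

if __name__ == "__main__":
    glutInit()
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB)
    glutInitWindowSize(300, 300)
    glutCreateWindow(b"point demo")
    glutDisplayFunc(display)
    glutMainLoop()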

SUMMARY
In this chapter, we have touched upon the background required to understand topics discussed
in later chapters. We have briefly seen some of the applications and discussed the history of
the field, along with the current issues. We were also introduced to the generic architecture of
a graphics system and got a brief overview of important concepts such as the display controller, video
controller, frame buffer, pixels, vector and raster devices, CRT displays, and color coding meth-
ods. We shall make use of this information in the rest of the book. The other important concept
we learnt is the graphics pipeline. The rest of the book shall cover the stages of the pipeline. In
the following chapter, we introduce the first stage: the object representation techniques.

BIBLIOGRAPHIC NOTE
There are many online sources that give a good introductory idea of computer graphics. The
website https://design.osu.edu/carlson/history/lessons.html includes in-depth discussion, with
illustrative images, of the historical evolution and application areas of computer graphics. The
textbooks on computer graphics by Hearn and Baker [2004] and Foley et al. [1995] also contain
comprehensive introductions to the field. There is a rich literature on the application of computer
graphics techniques to domains as diverse as aircraft design Bouquet [1978], energy
exploration Gardner and Nelson [1983], scientific data visualization Hearn and Baker [1991],
and visualization of music Mitroo et al. [1979]. A good source to learn about the past work,
current trends, and research directions in the field is the issues of the well-known journals and
conference proceedings in the field. Links to various bibliographic resources can be found at
http://www.cs.rit.edu/~ncs/graphics.html. Also see the Bibliographic Note section of Chapter 11
for more references on graphics hardware.

KEY TERMS
Color look-up scheme – color management scheme used in computer graphics in which the
color information is stored in a separate table.
CRT – stands for cathode ray tube, a technology that was very popular until recent times for designing
display screens
Direct coding – color management scheme used in computer graphics in which the color
information is stored in the frame buffer itself

Display controller – generic name given to the component of a graphics system that converts
abstract object definitions to bit strings
Electron gun – hardware to excite pixels on a CRT screen
Frame buffer – video memory of raster scan systems
Frame rate – rate (frames/second) at which the frame buffer is filled up
Graphics pipeline – set of stages in sequence that are used to synthesize an image from object
definitions
Input device – hardware for interacting with screen
Pixel – a picture element or each point on the grid used to design display screens
Refresh rate – rate (times/second) at which the screen is redrawn
Refreshing – process by which the screen is redrawn periodically
SAGE system – the first computer graphics system
Sketchpad – the first interactive computer graphics system
Vector and raster graphics – the two types of image representation techniques used in computer
graphics
Vector and raster scan – the two types of techniques used to render images on the screen
Video controller – hardware used to render the actual on-screen image from the bit strings stored
in video memory
Video memory – memory unit used to store the bit strings that represent a (synthesized) image
Visualization – the techniques to visualize real or abstract objects or events

EXERCISES
1.1 At the beginning of this chapter, we learnt a few areas of application of computer graphics.
As we mentioned, such applications are numerous. Find out at least 10 more applications
of computer graphics (excluding those mentioned at the beginning).
1.2 Suppose you are trying to develop a computer game for your iPhone. Make a list of all the
issues that are involved (Hint: combine the discussions of Sections 1.2 and 1.4).
1.3 In Figure 1.7, the generic architecture of a typical graphics system is shown. As we know,
almost all the electronic devices we see around us contain some amount of graphics.
Therefore, they can be dubbed as graphics systems. In light of the understanding of the
generic architecture, identify the corresponding components (i.e., the display, the two con-
trollers, video memory, I/O mechanism) for the following types of devices (you can select
any one commercial product belonging to each of the categories and try to find out the
name of the underlying technologies).
(a) Digital watch
(b) Smartphone
(c) Tablet
(d) ATM
(e) HD TV
1.4 Explain double buffering. Why do we need it?
1.5 While trying to display a scene on the screen, you observe too much flickering. What may
be the cause for this? How can the flickering be overcome?
1.6 In some early raster displays, refreshing was performed at 30 Hz, which was half that of
the minimum refresh rate required to avoid flicker. This was done due to technological lim-
itations as well as to reduce system cost. In order to avoid flickers in such systems, the
scan lines were divided into odd set (1, 3, 5 · · · ) and even set (2, 4, 6 · · · ). In each cycle,
only one set of lines was scanned. For example, first scan odd set, then even set, then odd
set, and so on. This method is called interlacing. List, with proper explanation, all the factors
that determine whether the method would work.

1.7 Assume you are playing an online multiplayer game on a system having a 100 × 100 color
display with 24 bits for each color. A double buffering technique is used for rendering. Your
system is receiving 5 MBps of data from the server over the network. Assuming no data
loss, will you experience flicker?
1.8 What is the main assumption behind the working of the color look-up table? Suppose you
have a graphic system with a display resolution of 64 × 64. The color look-up table has
16 entries. Calculate the percentage saving of space due to the use of the table. Assume
colors are represented by 24 bits.
1.9 Discuss why vector graphics is good for drawing wire frames but not filled objects. Suppose
that you are a gaming enthusiast. Will you prefer a vector-based system or a raster-based
system?
1.10 In a raster scan system, the scanning process starts (from top-left pixel) with the application
of a vertical sync pulse (Vp). The time between the excitation of the last (right-most) pixel
in the current scan line and that of the first (left-most) pixel of the next scan line is known
as the horizontal retrace time (HT). Scanning for the next line starts with the application of
a horizontal sync pulse (Hp). The time it takes to reset the scanning process for the next
frame (i.e. the time gap between the excitation of the bottom-right pixel of the current frame
and the top-left pixel of the next frame) is known as the vertical retrace time (VT). Calculate
the desirable value of M for a CRT raster device having resolution M × 100 with HT = 5 μs,
VT = 500 μs, and 1 μs electron gun movement time between two pixels along a scan line.
1.11 Assume you have a raster device with screen resolution = 100 × 100. It is a color display
with 9 bits/pixel. What should be the access rate (i.e., time required to access each bit) of
the video memory to avoid flicker in this system?
1.12 Consider the two objects shown in Fig. 1.13(a).
We want to render a scene in which the bar (the right object in (a)) is inside the hole of
the cube (left object in (a)), as shown in (b). Discuss the tasks performed in each pipeline
stage for rendering the scene.


Fig. 1.13 The object and the scene of Exercise 1.12

CHAPTER 3
Modeling Transformations

Learning Objectives
After going through this chapter, the students will be able to
• Get an idea of modeling transformations
• Learn about the four basic modeling transformations—translation, rotation, shearing, and
scaling—in both two and three dimensions
• Understand the homogeneous coordinate system used for representing modeling trans-
formations
• Have a basic understanding of the matrix representations of modeling transformations
• Learn and derive composite transformations from the basic transformations through
matrix multiplications, both in two and three dimensions

INTRODUCTION
We have come across different methods and techniques to represent objects in Chapter 2.
These techniques, however, allow us to represent objects individually, in what is called
local/object coordinates. In order to compose a scene, the objects need to be assembled
together in the so-called scene/world coordinate system. That means, at the time of defining
the objects, the shape, size, and position of the objects are not important. However, when
individual objects are assembled in a scene, these factors become very important. Conse-
quently, we have to perform operations to transform objects (from their local coordinates to
the scene/world coordinate). The stage of the graphics pipeline in which this transformation
takes place is known as the modeling transformation. Figure 3.1 illustrates the idea.
Thus, modeling transformation effectively implies applying some operations on the object definitions (in local coordinates) to transform them into components of the world coordinate scene. There are several such operations possible. However, all these operations can be derived from the following basic operations (the letter in parentheses beside each name shows the common notation for the transformation).
Translation (T) Translates the object from one position to another position
Rotation (R) Rotates the object by some angle in either the clockwise or anticlockwise
direction around an axis


Fig. 3.1 In (a), two objects are defined in their local coordinates. In (b), these objects are
assembled (in world scene coordinate) to compose a complex scene.
Note: In a scene, objects are used multiple times, at different places, and in different sizes. The operations required to transform objects from their local coordinates to world coordinates are collectively called modeling transformations.

Scaling (S) Reduces/Increases the size of the object


Shearing (Sh) Changes the shape of the object (although, strictly speaking, this is not a basic
transformation as it can be derived from a composition of rotation and scaling, we shall treat
it as basic in this book)
As you can see, these operations change the geometric properties (shape, size, and
location) of the objects. Hence, these are also called geometric transformations.
We shall adopt a different approach in describing the transformations. Although our main focus is the 3D graphics pipeline, we start with the description of 2D transformations and then show how these are applied in 3D. This is done for simplicity, and we shall take this approach
in the later chapters also. In this chapter, we shall learn about the four basic modeling trans-
formations, their representation, and the composition of these basic transformations to derive
new transformations.

3.1 BASIC TRANSFORMATIONS


Among the four basic transformations, translation is the simplest. In order to translate a point p(x, y) to a new location p′(x′, y′), we displace the point by amounts tx and ty along the X and Y directions, respectively, as shown in Fig. 3.2. If the displacement is along the positive X-axis, it is a positive displacement; otherwise, the displacement is negative. The same is true for vertical displacements.
It is clear from the figure that the new point can be obtained by simply adding the dis-
placements to the corresponding coordinate values. Thus, we have the relationships shown
in Eq. 3.1 between the old and new coordinates of the point for translation.

x′ = x + tx (3.1a)

y′ = y + ty (3.1b)

Unlike translation where linear displacement takes place, rotation involves angular dis-
placement. In other words, the point moves from one position to another on a circular track
about some axis. For simplicity, let us assume that we want to rotate a point around the
Z-axis counterclockwise by an angle φ. The scenario is shown in Fig. 3.3.


Fig. 3.2 Illustration of the translation operation: the point p(x, y) is displaced by tx and ty to p′(x′, y′)

As the figure shows, the new coordinates can be expressed in terms of the old coordinates
as in Eq. 3.2, where r is the radius of the circular trajectory.

x = r cos θ, y = r sin θ (3.2a)

x′ = r cos (θ + φ) = r cos θ cos φ − r sin θ sin φ = x cos φ − y sin φ (3.2b)
y′ = r sin (θ + φ) = r sin θ cos φ + r cos θ sin φ = x sin φ + y cos φ (3.2c)

By convention, counterclockwise angular movements are taken as positive, as in the


aforementioned derivation, whereas clockwise movement is taken as negative. Thus, in
case of clockwise movement, the displacement angle (φ) in Eq. 3.2 should be replaced
with (−φ).
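The relationships in Eqs 3.1 and 3.2 map directly to code. The following minimal Python sketch (our illustration; the function names and sample values are not from the text) applies the two equations to a single point.

    import math

    def translate(x, y, tx, ty):
        # Eq. 3.1: displace the point by tx and ty
        return x + tx, y + ty

    def rotate(x, y, phi_deg):
        # Eq. 3.2: counterclockwise rotation by phi about the origin (Z-axis)
        phi = math.radians(phi_deg)
        return (x * math.cos(phi) - y * math.sin(phi),
                x * math.sin(phi) + y * math.cos(phi))

    print(translate(1.0, 0.0, 2.0, 3.0))   # (3.0, 3.0)
    print(rotate(1.0, 0.0, 90))            # approximately (0.0, 1.0)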
In both the transformations (translation and rotation), we have shown the transformations
applied on a point. For an object, we shall simply apply the operation on the points that
make up the surfaces of the object (e.g., apply operation on all the points on the vertex list
for a regular object or on the control points for an object defined with spline surfaces, etc.).
Note that, in this way we can change the orientation of an object by applying rotation on the
surface points.
With translation and rotation, we can change the position and orientation of objects.
Scaling allows us to change (increase or decrease) the object size. Mathematically, scaling a point is defined as multiplying its coordinates by some scalars, called the scaling factors. Thus, given a point p(x, y), we can scale it by a factor sx along the X-axis (direction) and sy along the Y-axis (direction) to get the new point p′(x′, y′), as shown in Eq. 3.3.

x′ = sx x (3.3a)

y′ = sy y (3.3b)

Fig. 3.3 The counterclockwise rotation operation about Z-axis by an angle φ


When the scaling factor is the same along both the X and Y directions, the scaling is called uniform. Otherwise, it is differential scaling. Thus, in order to scale up/down any object,
we need to apply Eq. 3.3 on its surface points, with scaling factor greater/less than one, as
illustrated in Fig. 3.4. Note in the figure that the application of scaling repositions the object
also (see the change in position of vertices in the figure).
We have so far seen transformations that can change the position and size of an object.
With shearing transformation, we can change the shape of an object. The general form
of the transformation to determine the new point (x′ , y′ ) from the current point (x, y) is
shown in Eq. 3.4, where shx and shy are the shearing factors along the X and Y directions,
respectively.

x′ = x + shx y (3.4a)

y′ = y + shy x (3.4b)

Similar to the scaling operation, we can apply Eq. 3.4 to all the surface points of the object
to shear it, as illustrated in Fig. 3.5. Note in the figure that shearing may also reposition the
object like scaling (see the change in position of vertices in the figure).
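In the same spirit, the sketch below (ours; the vertex values are illustrative and not taken from Fig. 3.4 or Fig. 3.5) applies Eqs 3.3 and 3.4 to every vertex of a simple polygon, which is how a whole object is scaled or sheared.

    def scale(x, y, sx, sy):
        # Eq. 3.3: scaling with respect to the origin
        return sx * x, sy * y

    def shear(x, y, shx, shy):
        # Eq. 3.4: shearing along the X and Y directions
        return x + shx * y, y + shy * x

    square = [(0, 0), (2, 0), (2, 2), (0, 2)]          # illustrative vertices
    print([scale(x, y, 0.5, 1/3) for x, y in square])  # differential scaling
    print([shear(x, y, 0.5, 0) for x, y in square])    # horizontal shear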

Fig. 3.4 Illustration of the scaling operation with sx = 1/2 and sy = 1/3. Figure 3.4(a) is shrunk to the shape shown in Fig. 3.4(b). Note that the new vertices are obtained by applying Eq. 3.3 on the vertices of the object.

Fig. 3.5 The shearing of the object along the horizontal direction, characterized by a positive shear factor along the X-direction (shx = 1/2) and a shear factor of 0 along the Y-direction. Figure 3.5(a) is distorted to the shape shown in Fig. 3.5(b). The new vertices are obtained by applying Eq. 3.4 on the vertices of the object.


3.2 MATRIX REPRESENTATION AND HOMOGENEOUS COORDINATE SYSTEM


The equation form of the transformations in Table 3.1 is not very useful in developing graphics packages/libraries in a modular way. Instead, an alternative equivalent representation is used, in which each of these transformations is represented as a matrix. For example, scaling can be represented in matrix form as

    S = [ sx  0  ]
        [ 0   sy ]

Given a point P(x, y), we can represent it as a (column) vector

    P = [ x ]
        [ y ]

Then, the new vertices of a scaled object can be obtained by multiplying the matrices together (i.e., P′ = S.P). A similar approach can be adopted for rotation and shearing also.
However, if we choose to adopt 2 × 2 matrices for representing transformations, a problem occurs for translation. We cannot represent translation with a 2 × 2 matrix. In order to address this issue, we take recourse to a mathematical trick known as the homogeneous coordinate system. A homogeneous coordinate system is an abstract representation technique in which we represent a 2D point P(x, y) with a 3-element vector Ph(xh, yh, h), with the relationship x = xh/h, y = yh/h. The term h is the homogeneous factor and can take any non-zero value.

Table 3.1 Four basic types of geometric transformations along with their characteristics

Translation [T(tx, ty)]
  General form: x′ = x + tx, y′ = y + ty

Rotation [R(φ)]
  General form: x′ = x cos φ − y sin φ, y′ = x sin φ + y cos φ
  Remarks: φ = angle of rotation; a clockwise rotation corresponds to a negative angle (replace φ with −φ) and an anticlockwise rotation to a positive angle.

Scaling [S(sx, sy)]
  General form: x′ = sx x, y′ = sy y
  Remarks: sx, sy are the scaling factors along the X and Y axes, respectively; they can take any real value (= 1 if no scaling). A scaling factor > 1 increases the size, a factor < 1 decreases it. Along with the size, the position may change.

Shear [Sh(shx, shy)]
  General form: x′ = x + shx y, y′ = y + shy x
  Remarks: shx, shy are the shearing factors along the X and Y axes, respectively; they can take any real value (= 0 if no shear). Along with the shape, the position may change.


Homogeneous coordinate system


A convenient way to represent transformations in matrix form. It is an abstract 3D representation of 2D points (in general, an n-D point is represented by an (n + 1)-vector):

    P = [ x ]  →  Ph = [ xh ]
        [ y ]          [ yh ]
                       [ h  ]

The homogeneous factor h can take any non-zero value. To get back the original coordinates, perform the following operations: x = xh/h, y = yh/h. The point (xh, yh, 0) is assumed to be at infinity, and the point (0, 0, 0) is not allowed.

Any point of the form (xh , yh , 0) is assumed to be at infinity and the point (0, 0, 0) is not
allowed in this coordinate system.
When we represent geometric transformations in the homogeneous coordinate system, our earlier 2 × 2 matrices transform to 3 × 3 matrices (in general, any N × N transformation matrix is converted to an (N + 1) × (N + 1) matrix). Also, for geometric transformations, we consider h = 1 (in later chapters, we shall see other transformations with h ≠ 1). With these changes, the matrices for the four basic transformations are given in Table 3.2.

Table 3.2 Matrix representation (in homogeneous coordinates) of the four basic geometric transformations

Translation [T(tx, ty)]:
    [ 1  0  tx ]
    [ 0  1  ty ]
    [ 0  0  1  ]

Rotation [R(φ)]:
    [ cos φ  −sin φ  0 ]
    [ sin φ   cos φ  0 ]
    [ 0       0      1 ]

Scaling [S(sx, sy)]:
    [ sx  0   0 ]
    [ 0   sy  0 ]
    [ 0   0   1 ]

Shear [Sh(shx, shy)]:
    [ 1    shx  0 ]
    [ shy  1    0 ]
    [ 0    0    1 ]
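The matrices of Table 3.2 are straightforward to build in code. The NumPy sketch below is our own illustration (the helper names T, R, S, and Sh simply mirror the notation of the table) and shows a point, written in homogeneous form with h = 1, being transformed by matrix multiplication.

    import numpy as np

    def T(tx, ty):
        return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

    def R(phi_deg):
        c, s = np.cos(np.radians(phi_deg)), np.sin(np.radians(phi_deg))
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

    def S(sx, sy):
        return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

    def Sh(shx, shy):
        return np.array([[1, shx, 0], [shy, 1, 0], [0, 0, 1]], dtype=float)

    P = np.array([2, 3, 1], dtype=float)   # the point (2, 3) with h = 1
    print(T(5, 5) @ P)                     # translated point: [7. 8. 1.]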

3.3 COMPOSITION OF TRANSFORMATIONS


Consider Fig. 3.6. Figure 3.6(a) shows an object (the rectangle ABCD with length 2 units and height 1 unit) in its local coordinates. This object is used to define the chimney of the house in Fig. 3.6(b) (in the world coordinate scene).


Fig. 3.6 Example of composite transformation. The object ABCD of Fig. 3.6(a) is transformed to the object A′B′C′D′ in Fig. 3.6(b) after application of a series of transformations.

Clearly, the transformation of the object ABCD (in local coordinate) to A′ B′ C′ D′ (in
world coordinate) is not possible with a single basic transformation. In fact, we need two
transformations: scaling and translation.
How can we calculate the new object vertices? We shall follow the same procedure as before,
namely multiply the current vertices with the transformation matrix. Only, here we have a
transformation matrix that is the composition of two matrices, namely the scaling matrix and
the translation matrix. The composite matrix is obtained by multiplying the two matrices in
sequence, as shown in the following steps.
Step 1: Determine the basic matrices
Note that the object is halved in length while the height remains the same. Thus, the scaling matrix is

    S = [ 0.5  0  0 ]
        [ 0    1  0 ]
        [ 0    0  1 ]

The current vertex D(0, 0) is now positioned at D′(5, 5). Thus, there are 5 unit displacements along both the horizontal and vertical directions. Therefore, the translation matrix is

    T = [ 1  0  5 ]
        [ 0  1  5 ]
        [ 0  0  1 ]
Step 2: Obtain the composite matrix
The composite matrix is obtained by multiplying the basic matrices in sequence. We follow the right-to-left rule in forming the multiplication sequence. The first transformation applied on the object is the rightmost in the sequence. The next transformation is placed on its left, and we continue in this way till the last transformation. Thus, our composite matrix for the example is obtained as follows.

    M = T.S = [ 1  0  5 ] [ 0.5  0  0 ]   [ 0.5  0  5 ]
              [ 0  1  5 ] [ 0    1  0 ] = [ 0    1  5 ]
              [ 0  0  1 ] [ 0    0  1 ]   [ 0    0  1 ]
Step 3: Obtain new coordinate positions
Next, multiply the surface points with the composite matrix as before, to obtain
the new surface points. In this case, we simply multiply the current vertices with the
composite matrix to obtain the new vertices.

    
    A′ = M.A = [ 0.5  0  5 ] [ 2 ]   [ 6 ]
               [ 0    1  5 ] [ 0 ] = [ 5 ]
               [ 0    0  1 ] [ 1 ]   [ 1 ]

    B′ = M.B = [ 0.5  0  5 ] [ 2 ]   [ 6 ]
               [ 0    1  5 ] [ 1 ] = [ 6 ]
               [ 0    0  1 ] [ 1 ]   [ 1 ]

    C′ = M.C = [ 0.5  0  5 ] [ 0 ]   [ 5 ]
               [ 0    1  5 ] [ 1 ] = [ 6 ]
               [ 0    0  1 ] [ 1 ]   [ 1 ]

    D′ = M.D = [ 0.5  0  5 ] [ 0 ]   [ 5 ]
               [ 0    1  5 ] [ 0 ] = [ 5 ]
               [ 0    0  1 ] [ 1 ]   [ 1 ]
Note that the results obtained are in homogeneous coordinates. In order to obtain the
Cartesian coordinates, we divide the homogeneous coordinate values with the homogeneous
factor, which is 1 for geometric transformations. Thus, the Cartesian coordinates of the final
vertices are as follows:
A′ = (6/1, 5/1) = (6, 5),
B′ = (6/1, 6/1) = (6, 6),
C′ = (5/1, 6/1) = (5, 6), and
D′ = (5/1, 5/1) = (5, 5)
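The arithmetic of this example can be verified with a few lines of NumPy (a sketch of ours; the vertices are stored as homogeneous column vectors).

    import numpy as np

    S = np.array([[0.5, 0, 0], [0, 1, 0], [0, 0, 1]])
    T = np.array([[1, 0, 5], [0, 1, 5], [0, 0, 1]])
    M = T @ S                          # right-to-left rule: scale first, then translate

    ABCD = np.array([[2, 0, 1],        # A
                     [2, 1, 1],        # B
                     [0, 1, 1],        # C
                     [0, 0, 1]]).T     # D (vertices as columns)
    print(M @ ABCD)                    # columns: A'(6,5), B'(6,6), C'(5,6), D'(5,5)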
In composite transformations, we multiply basic matrices. We know that matrix multi-
plication is not commutative. Therefore, the sequence is very important. If we form the
sequence wrongly, then we will not get the correct result. In the previous example, if we
form the composite matrix M as M = S.T, then we will get the wrong vertices (do the
calculations and check for yourself).
How did we decide the sequence in the previous example, namely first scaling and then translation? Remember that in scaling, the position changes. Therefore, if we translate first to the final position and then scale, the vertex positions would change again. Instead, if we first scale with respect to the fixed point D (the origin) and then translate the object (by applying the same displacement to all the vertices), then the problem of repositioning of the vertices does not occur. That is precisely what we did.
The example before is a special case where the fixed point was the origin itself. In general,
the fixed point can be anywhere in the coordinate space. In such cases, we shall apply the
aforementioned approach with slight modification.
Suppose we want to scale with respect to the fixed point F(x, y). In order to determine the
composite matrix, we assume the following sequence of steps.
1. The fixed point is translated to origin (−x and −y units of displacements in the horizontal
and vertical directions, respectively).
2. Scaling is performed with respect to origin.
3. The fixed point is translated back to its original place.


Fig. 3.7 Example of scaling with respect to an arbitrary fixed point: (a) object definition, (b) object position in world coordinates

Thus, the composite matrix M = T(tx = x, ty = y).S(sx , sy ).T(tx = −x, ty = −y). Figure
3.7 illustrates the concept. This is a modification of Fig. 3.6. In this, the object is now defined
(Fig. 3.7(a)) with the vertices A(7, 5), B(7, 6), C(5, 6), and D(5, 5). Its world coordinate posi-
tion is shown in Fig. 3.7(b). Note that the scaling is done keeping D(5, 5) fixed. Hence, the
composite matrix M = T(tx = 5, ty = 5)S(sx = 0.5, sy = 1)T(tx = −5, ty = −5).
A similar situation arises in the case of rotation and shearing. In rotation, so far we
assumed that the object is rotated around the Z-axis. In other words, we assumed rotation
with respect to the origin through which the Z-axis passes. Similar to scaling, we can derive
the rotation matrix with respect to any fixed point in the XY coordinate space through which
the rotation axis (parallel to the Z-axis) passes. We first translate the fixed point to origin
(aligning the axis of rotation with the Z-axis), rotate the object, and then translate the point
back to its original place. Thus, the composite matrix is M = T(tx = x, ty = y).R(φ).T(tx =
−x, ty = −y), where (x, y) is the fixed point coordinate. Composite matrix for shearing
with respect to any arbitrary (other than origin) fixed point is derived in a similar way:
M = T(tx = x, ty = y).Sh(shx , shy ).T(tx = −x, ty = −y).
What happens when more than one basic transformation is applied to an object with
respect to any arbitrary fixed point? We apply the same process to obtain the compos-
ite transformation matrix. We first translate the fixed point to origin, perform the basic

Rotation, scaling, and shearing with respect to any arbitrary fixed point (other than origin)
The transformation matrix is obtained as a composition of basic transformations. All follow the same
procedure.
1. Translate the fixed point to origin.
2. Perform the transformation (rotation/scaling/shearing).
3. Translate the fixed point back to its original place.
The transformation matrix at the fixed point (x, y) is

    M = T(tx = x, ty = y).R(φ).T(tx = −x, ty = −y)          for rotation
    M = T(tx = x, ty = y).S(sx, sy).T(tx = −x, ty = −y)     for scaling
    M = T(tx = x, ty = y).Sh(shx, shy).T(tx = −x, ty = −y)  for shearing
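In code, the three cases collapse into a single helper that wraps any basic matrix between the two translations. The sketch below is ours (it re-creates the small T and S builders used earlier) and applies the result to a vertex of Fig. 3.7.

    import numpy as np

    def T(tx, ty):
        return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

    def S(sx, sy):
        return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

    def about_fixed_point(basic, x, y):
        # Translate the fixed point to the origin, apply the basic transformation,
        # then translate the fixed point back (matrices compose right to left).
        return T(x, y) @ basic @ T(-x, -y)

    M = about_fixed_point(S(0.5, 1.0), 5, 5)   # scaling of Fig. 3.7 about D(5, 5)
    print(M @ np.array([7, 5, 1.0]))           # vertex A(7, 5) maps to (6, 5)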


Fig. 3.8 Example of composite transformations with respect to an arbitrary fixed point (5, 5). The object in Fig. 3.8(a) is transformed as shown in Fig. 3.8(b). Note that two basic transformations are involved: scaling and rotation.

transformations in sequence, and then translate the fixed point back to its original place.
An example is shown in Fig. 3.8.
Figure 3.8(a) shows the object (cylinder) with length 2 units and diameter 1 unit, defined
in its own (local) coordinate. The cylinder is placed on the roof of the house in Fig. 3.8(b)
(world coordinate), after scaling it horizontally by half and rotating it 90◦ anticlockwise with
respect to the fixed point (5,5). How to compute the new (world coordinate) position of the
object? We apply the approach outlined before.
Step 1: Obtain the composite matrix.
(a) Translate the fixed point (5,5) to origin.
(b) Scale by 1/2 in X-direction.
(c) Rotate anticlockwise by 90◦ .
(d) Translate the fixed point back to (5,5).
Composite matrix M = T(tx = 5, ty = 5).R(90°).S(sx = 0.5, sy = 1).T(tx = −5, ty = −5)

    = [ 1  0  5 ] [ 0  −1  0 ] [ 0.5  0  0 ] [ 1  0  −5 ]   [ 0    −1  10  ]
      [ 0  1  5 ] [ 1   0  0 ] [ 0    1  0 ] [ 0  1  −5 ] = [ 0.5   0  2.5 ]
      [ 0  0  1 ] [ 0   0  1 ] [ 0    0  1 ] [ 0  0   1 ]   [ 0     0  1   ]

Step 2: Multiply the surface points (column vectors) with the composite matrix to obtain the new positions (left as an exercise).
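A quick numerical check of the composite matrix derived in Step 1 (our sketch):

    import numpy as np

    T5  = np.array([[1, 0, 5], [0, 1, 5], [0, 0, 1]], dtype=float)
    R90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    Sx  = np.array([[0.5, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
    Tm5 = np.array([[1, 0, -5], [0, 1, -5], [0, 0, 1]], dtype=float)

    M = T5 @ R90 @ Sx @ Tm5
    print(M)   # gives [[0, -1, 10], [0.5, 0, 2.5], [0, 0, 1]]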

3.4 TRANSFORMATIONS IN 3D
Three-dimensional transformations are similar to 2D transformations, with some minor
differences.
1. We now have 4 × 4 transformation matrices (in homogeneous coordinate system) instead
of 3 × 3. However, the homogeneous factor remains the same (h = 1).
2. In 2D, all rotations are defined about the Z-axis (or an axis parallel to it). However, in 3D, we have three basic rotations, one about each of the principal axes X, Y, and Z. Also,
the transformation matrix for rotation about any arbitrary axis (any axis of rotation other than the principal axes) is more complicated than in 2D.
3. The general form of the shearing matrix is more complicated than in 2D.
In shearing, we can now define distortion along one or two directions keeping one direc-
tion fixed. For example, we can shear along X and Y directions, keeping Z direction fixed.
Therefore, the general form looks different from the one in 2D.

3.4.1 3D Shearing Transformation Matrix


In the general form, there are six shearing factors; each can take any real value (zero if there is no shear along that particular direction). The factors shxy and shxz are used to shear along the Y and Z directions, respectively, leaving the x coordinate value unchanged. Similarly, shyx and shyz are used to shear along the X and Z directions, respectively, leaving the y coordinate value unchanged. The factors shzx and shzy are used to shear along the X and Y directions, respectively, leaving the z coordinate value unchanged.

    Shearing[shxy, shxz, shyx, shyz, shzx, shzy] = [ 1     shxy  shxz  0 ]
                                                   [ shyx  1     shyz  0 ]
                                                   [ shzx  shzy  1     0 ]
                                                   [ 0     0     0     1 ]
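A corresponding builder in Python might look as follows (our sketch; all factors default to zero, i.e., no shear).

    import numpy as np

    def shear3d(shxy=0, shxz=0, shyx=0, shyz=0, shzx=0, shzy=0):
        # General 4x4 shearing matrix in homogeneous coordinates
        return np.array([[1,    shxy, shxz, 0],
                         [shyx, 1,    shyz, 0],
                         [shzx, shzy, 1,    0],
                         [0,    0,    0,    1]], dtype=float)

    print(shear3d(shxy=0.5) @ np.array([1, 1, 1, 1.0]))   # [1.5 1.  1.  1. ]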

The transformation matrices for translation, rotation, and scaling in 3D are shown in
Table 3.3, which are similar to their 2D counterparts, except that there are three rotation
matrices in 3D.
The composite matrix for scaling and shearing with respect to any arbitrary fixed point is
determined in a similar way as in 2D, namely translate the fixed point to origin, perform the
operation and then translate the point back to its original place. Rotation about any arbitrary
axis, however, is more complicated as discussed next.
Table 3.3 The matrix representation (in homogeneous coordinates) of the three basic geometric transformations in 3D

Translation [T(tx, ty, tz)]:
    [ 1  0  0  tx ]
    [ 0  1  0  ty ]
    [ 0  0  1  tz ]
    [ 0  0  0  1  ]

Rotation about X-axis [RX(φ)]:
    [ 1  0      0       0 ]
    [ 0  cos φ  −sin φ  0 ]
    [ 0  sin φ   cos φ  0 ]
    [ 0  0      0       1 ]

Rotation about Y-axis [RY(φ)]:
    [ cos φ   0  sin φ  0 ]
    [ 0       1  0      0 ]
    [ −sin φ  0  cos φ  0 ]
    [ 0       0  0      1 ]

Rotation about Z-axis [RZ(φ)]:
    [ cos φ  −sin φ  0  0 ]
    [ sin φ   cos φ  0  0 ]
    [ 0       0      1  0 ]
    [ 0       0      0  1 ]

Scaling [S(sx, sy, sz)]:
    [ sx  0   0   0 ]
    [ 0   sy  0   0 ]
    [ 0   0   sz  0 ]
    [ 0   0   0   1 ]

3.4.2 3D Rotation About Any Arbitrary Axis


The schematic of 3D rotation about any arbitrary axis is shown in Fig. 3.9, where we want to
rotate an object by an angle θ counterclockwise around an axis of rotation passing through
the two points P1 and P2.
As shown in Fig. 3.9, the transformation matrix is a composition of five transformations.
1. Translation to origin (T[−x, −y, −z]: (x, y, z) is the coordinate of P2).
2. Alignment of the axis of rotation with the Z-axis (it can be X- or Y-axis also).
(a) Rotation about X-axis, to put the axis of rotation on the XZ plane [RX (α): α is the
angle of rotation about X-axis].
(b) Rotation about Y-axis, to align the axis of rotation with Z-axis [RY (β): β is the angle
of rotation about Y-axis].
3. Rotation of the object about the Z-axis [RZ(θ): θ is the angle of rotation of the object about the axis of rotation].
4. Reverse rotation about the Y- and X-axes to bring the axis of rotation back to its original orientation.
5. Translation of the axis of rotation back to its original position.
Therefore, the composite transformation matrix is

    M = T⁻¹.RX⁻¹(α).RY⁻¹(β).RZ(θ).RY(β).RX(α).T

Note that the inverse rotations essentially change the sign of the angle. For example, if α is defined counterclockwise, then the inverse rotation will be clockwise. In other words, RX⁻¹(α) = RX(−α).
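The composition is mechanical once the alignment angles are known. The Python sketch below (ours) assumes that α and β have already been determined from the direction of the rotation axis and that P2 = (px, py, pz); it simply multiplies the seven matrices in the order given above (all angles in degrees).

    import numpy as np

    def T3(tx, ty, tz):
        M = np.eye(4); M[:3, 3] = [tx, ty, tz]; return M

    def RX(d):
        c, s = np.cos(np.radians(d)), np.sin(np.radians(d))
        return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0., 0, 0, 1]])

    def RY(d):
        c, s = np.cos(np.radians(d)), np.sin(np.radians(d))
        return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0., 0, 0, 1]])

    def RZ(d):
        c, s = np.cos(np.radians(d)), np.sin(np.radians(d))
        return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0., 0, 0, 1]])

    def rotate_about_axis(alpha, beta, theta, px, py, pz):
        # M = T^-1 . RX^-1(alpha) . RY^-1(beta) . RZ(theta) . RY(beta) . RX(alpha) . T
        return (T3(px, py, pz) @ RX(-alpha) @ RY(-beta) @
                RZ(theta) @ RY(beta) @ RX(alpha) @ T3(-px, -py, -pz))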


Fig. 3.9 Schematic of rotation about any arbitrary axis in 3D. Initial position; Step 1: translate the line to the origin; Step 2: align the line with the Z-axis (rotate about the X- and Y-axes); Step 3: rotate the object (about the Z-axis); Step 4: rotate the line back to its original orientation; Step 5: translate the line back to its original position.

Example 3.1
An object ABCD is defined in its own coordinate as A(1, 1, 0), B(3, 1, 0), C(3, 3, 0), and D(1, 3, 0).
The object is required to construct a partition wall A′ B′ C ′ D′ in a world-coordinate scene (A′ cor-
responds to A and so on). The new vertices are A′ (0, 0, 0), B′ (0, 4, 0), C ′ (0, 4, 4), and D′ (0, 0, 4).
Calculate the composite transformation matrix to perform the task.

Solution1 The situation is depicted in Fig. 3.10.

Fig. 3.10 The initial and final positions of the object

Initially the square is in the XY plane with each side equal to 2 units and center at (2, 2, 0). The final
square is on the YZ plane with side equal to 4 units and the center at (0, 2, 2). The transformations
required are as follows:
1 It can be done in multiple ways. The solution presented here is one of those.


1. Translate center to origin → T(−2, −2, 0).


2. Rotate by 90◦ anticlockwise around z-axis → RZ (90◦ ).
3. Rotate by 90◦ anticlockwise around y-axis → RY (90◦ ).
4. Scale by 2 in Y and Z direction → S(1, 2, 2).
5. Translate center to (0, 2, 2) → T(0, 2, 2).
The composite matrix M = T(0, 2, 2)S(1, 2, 2)RY (90◦ )RZ (90◦ )T(−2, −2, 0).
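The composite matrix can be checked numerically. The sketch below is ours; it builds the five matrices with small helper functions (angles in degrees) and applies M to the original vertices.

    import numpy as np

    def T3(tx, ty, tz):
        M = np.eye(4); M[:3, 3] = [tx, ty, tz]; return M

    def S3(sx, sy, sz):
        return np.diag([sx, sy, sz, 1.0])

    def RY(d):
        c, s = np.cos(np.radians(d)), np.sin(np.radians(d))
        return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0., 0, 0, 1]])

    def RZ(d):
        c, s = np.cos(np.radians(d)), np.sin(np.radians(d))
        return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0., 0, 0, 1]])

    M = T3(0, 2, 2) @ S3(1, 2, 2) @ RY(90) @ RZ(90) @ T3(-2, -2, 0)

    ABCD = np.array([[1, 1, 0, 1], [3, 1, 0, 1],      # A, B
                     [3, 3, 0, 1], [1, 3, 0, 1]]).T   # C, D (as columns)
    print(np.round(M @ ABCD))
    # columns: A'(0, 0, 0), B'(0, 4, 0), C'(0, 4, 4), D'(0, 0, 4)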

Example 3.2
Consider a circular track (assume negligible width, radius = 4 units, centered at origin). The track
is placed on the XZ plane. A sphere (of unit radius) is initially resting on the track with center on
the +Z axis, with a black spot on it at (0, 0, 6). When pushed, the sphere rotates around its own
axis (parallel to Y axis) at a speed of 1◦ /min as well as along the track (complete rotation around
the track requires 6 hrs). Assume all rotations are anticlockwise. Suppose an observer is present at
(3, 0, 7). Will the black spot be visible to the observer after the sphere rotates and slides down for an hour and a half?

Solution The situation is illustrated in Fig. 3.11.


Fig. 3.11 Initial configuration

In the problem, we are required to determine the position of the black spot after one and half hours.
We need to determine the transformation matrix M, which, when multiplied to the initial position
of the black spot, gives its transformed location. Clearly, M is a composition of two rotations, one
for rotation of the sphere around its own axis (Raxis ) and the other for the rotation of the sphere
around the circular track (Rtrack ).
Since Raxis is performed around an axis parallel to Y-axis, we can formulate Raxis as
a composition of translation (of the axis to the Y-axis), the actual rotation with respect to
the Y axis and reverse translation (of the axis to its original place). Therefore, Raxis =
T(0, 0, 5).RY(θ).T(0, 0, −5). Since the sphere can rotate around its axis at a speed of 1°/min, in one and a half hours it can rotate 90°. Therefore, θ = 90°.


At the same time, the sphere is rotating along the circular track with a speed of 360°/6 = 60°/hour. Therefore, in one and a half hours, the sphere can move 90° along the track.
Therefore,

    M = Rtrack(90°).Raxis(90°)
      = RY(90°).T(0, 0, 5).RY(90°).T(0, 0, −5)

      = [ 0   0  1  0 ] [ 1  0  0  0 ] [ 0   0  1  0 ] [ 1  0  0   0 ]
        [ 0   1  0  0 ] [ 0  1  0  0 ] [ 0   1  0  0 ] [ 0  1  0   0 ]
        [ −1  0  0  0 ] [ 0  0  1  5 ] [ −1  0  0  0 ] [ 0  0  1  −5 ]
        [ 0   0  0  1 ] [ 0  0  0  1 ] [ 0   0  0  1 ] [ 0  0  0   1 ]

      = [ −1  0  0   5 ]
        [ 0   1  0   0 ]
        [ 0   0  −1  5 ]
        [ 0   0  0   1 ]
Thus, the position of the point after one and a half hours is

    P′ = M.P = [ −1  0  0   5 ] [ 0 ]   [  5 ]
               [ 0   1  0   0 ] [ 0 ] = [  0 ]
               [ 0   0  −1  5 ] [ 6 ]   [ −1 ]
               [ 0   0  0   1 ] [ 1 ]   [  1 ]

The transformed position of the black spot (5, 0, −1) is clearly not visible from the observer’s
position.
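Again, a few lines of NumPy (our sketch) confirm the result.

    import numpy as np

    def T3(tx, ty, tz):
        M = np.eye(4); M[:3, 3] = [tx, ty, tz]; return M

    def RY(d):
        c, s = np.cos(np.radians(d)), np.sin(np.radians(d))
        return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0., 0, 0, 1]])

    M = RY(90) @ T3(0, 0, 5) @ RY(90) @ T3(0, 0, -5)
    print(np.round(M @ np.array([0, 0, 6, 1.0])))   # the spot moves to (5, 0, -1)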

SUMMARY
In this chapter, we have learnt to construct a scene from basic object definitions. The
scene is constructed by applying transformations on the objects. In the process, we learnt
the concepts of local coordinate (the coordinate system in which the object is defined)
and world coordinate (the coordinate in which the objects are assembled to construct
the scene).
There are four basic transformations, which are used either individually or in
sequence to perform various transformations on an object. The four are translation,
rotation, scaling, and shearing, which can change the position, shape, and size of an
object.
While we can represent transformations in analytical form, it is more convenient to represent
them as matrices for implementing graphics systems. In order to be able to represent all trans-
formations in matrix form, we use the homogeneous coordinate system, which is essentially an
abstraction and a mathematical trick. 2D transformations are represented as 3 × 3 matrices in
homogeneous coordinate system.
When we compose two or more basic transformations, we follow the right-to-left rule,
meaning that the first transformation is placed as the rightmost, the second transformation is
placed on the left, and so on till the last transformation. Then we multiply the transformation


matrices together to obtain the composite transformation. However, while composing the basic
transformations, it is very important to arrange them in proper sequence. Otherwise, the
composite matrix will be wrong.
The transformations in 3D are performed in almost the same manner as 2D transformations. The only notable differences are (a) the matrices are 4 × 4, (b) there are three basic rotation matrices, one for each of the X, Y, and Z axes (as opposed to one in 2D), (c) the way the shearing matrix looks, and (d) the way rotation about any arbitrary axis takes place.
After the first two stages (object definition and geometric transformations), we now know
how to construct a scene. The next task is to assign colors to it, so that it looks realistic. Color
assignment is the next stage of the graphics pipeline, which we shall discuss in the next chapter.

BIBLIOGRAPHIC NOTE
The Graphics Gems series of books (Glassner [1990], Arvo [1991], Kirk [1992], Heckbert [1994]
and Paeth [1995]) contains useful additional information on geometric transformation. Blinn and
Newell [1978] contains discussion on homogeneous coordinates in computer graphics. More
discussion on the use of homogeneous coordinates in computer graphics can be found in Blinn
[1993].

KEY TERMS
Composite transformation – composition (by matrix multiplication) of two/more basic modeling
transformations to obtain a new transformation
Differential scaling – when the amounts of change to an object size along the axial directions
(X , Y , and Z) are not the same
Homogeneous coordinate system – an abstract mathematical technique to represent any
n-dimensional point with a (n + 1) vector
Local/Object coordinates – the Cartesian coordinate reference frame used to represent individ-
ual objects
Modeling transformation – transforming objects from local coordinates to world coordinates
through some transformation operation
Rotation – the basic modeling transformation that changes the angular position of an object
Scaling factor – the amount of change of object size along a particular axial direction
Scaling – the basic modeling transformation that changes the size of an object
Scene/World coordinates – the Cartesian coordinate reference frame used to represent a scene comprising multiple objects
Shearing factor – the amount of change in shape of an object along a particular axial direction
Shearing – the basic modeling transformation that changes the shape of an object
Translation – the basic modeling transformation that changes the position of an object
Uniform scaling – when the amounts of change to an object size along the axial directions (X , Y ,
and Z) are the same

EXERCISES
3.1 What is the primary objective of the modeling transformation stage? In which coordinate
system(s) does it work?
3.2 Discuss how matrix representation helps in implementing modeling transformation in
computer graphics.


3.3 Suppose you want to animate the movement of a pendulum, fixed at the point (0,5). Ini-
tially, the pendulum was on the Y-axis (along −Y direction) with its tip touching the origin.
The movement is gradual with a rate of 10◦ /s. It first moves counterclockwise for 9 s, then
returns (gradually) to its original position, then moves clockwise for 9 s, then again returns
(gradually) to its original position and continues in this manner. Determine the transforma-
tion matrix for the pendulum in terms of t. Use the matrix to determine the position of the
pendulum tip at t = 15s.
3.4 Derive the matrices for the following 2D transformations.
(a) Reflecting a point about origin
(b) Reflecting a point about the X-axis
(c) Reflecting a point about the Y-axis
(d) Reflecting a point about any arbitrary point
3.5 Explain homogeneous coordinate system. Why do we need it in modeling
transformation?
3.6 Although we have treated the shearing transformation as basic, it can be considered as a
composite transformation of rotation and scaling. Derive the shearing transformation matrix
from rotation and scaling matrices.
3.7 Consider a line with end points A(0, 0) and B(1, 1). After applying some transformation on
it, the new positions of the end points have become A′ (0, −1) and B′ (−1, 0). Identify the
transformation matrix.
3.8 A triangle has its vertices at A(1, 1), B(3, 1), and C(2, 2). Modeling transformations are
applied on this triangle which resulted in new vertex positions A′ (−3, 1), B′ (3, 1), and
C′ (2, 0). Obtain the transformation matrix.
3.9 A square plate has vertices at A(1, 1), B(−1, 1), C(−1, 3), and D(1, 3). It is translated by
5 units along the +X-direction and then rotated by 45◦ about P(0, 2). Determine the final
coordinates of the plate vertices.
3.10 Consider an object made up of a triangle ABC with A(1, 2), B(3, 2), and C(2, 3),
on top of a rectangle DEBA with D(1, 1) and E(3, 1). Calculate the new posi-
tion of the vertices after applying the following series of transformations on the
object.
(a) Scaling by half along the X-direction, with respect to the point (2, 2)
(b) Rotation by 90◦ counterclockwise, with respect to the point (3, 1)
3.11 Consider Fig. 3.12. In this figure, the thin circular ring (with radius = 1 unit) is rolling down
the inclined path with a speed of 90◦ /s. Assuming the ring rolls down along a straight

Fig. 3.12 Initial position of the ring (the inclined path runs between the points (0, 0, 20) and (20, 0, 0), with p a marked point on the ring)


line without any deviation, determine the transformation matrix to obtain the coordinate
of any surface point on the ring at time t. Use the matrix to determine the position of
p at t = 10s.
3.12 Consider a sphere of diameter 5 units, initially (time t = 0) centered at the point
(10,0,0). The sphere rotates around the Z-axis counterclockwise with a speed of 1◦ /min.
An ant can move along the vertical great circular track (parallel to the XZ plane) on
the sphere counterclockwise. It is initially (t = 0) located at the point (10,0,5) and
can cover 1 unit distance along the track in 1 sec. (assume π ≈ 3). Determine the
composite matrix for the ant’s movement and use it to compute the ant’s position
at t = 10 min.

CHAPTER 5
Color Models and Texture Synthesis
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the physiological process behind the perception of color
• Learn about the idea of color representation through the use of color models
• Understand additive color models and learn about RGB and XYZ models
• Understand subtractive color models and learn about the CMY model
• Learn about the HSV color model, which is popularly used in interactive painting/drawing
software applications
• Get an overview of the three texture synthesis techniques: projected texture, texture mapping, and solid texturing

INTRODUCTION
The fourth stage of the graphics pipeline, namely the coloring of 3D points, has many
aspects. One aspect is the use of the lighting models to compute color values. This has
been discussed in Chapter 4. In the related discussions, we mentioned the basic principle
governing color computation, namely that color of a point is a psychological phenomenon
resulting from the intensities of the reflected light incident on our eyes. In this chapter, we
shall see in some detail what happens inside our eye that gives us a sensation of color. Along
with that, we shall also have a look at different color models, which are essentially alternative
representations of color aimed at simplifying color manipulation. Color models are the sec-
ond aspect of understanding the fourth stage. In addition, we shall have some discussion on
texture synthesis, which acts as an improvement of the simple lighting models to generate
more realistic effects.

5.1 PHYSIOLOGY OF VISION


We came across the fact that color is a psychological phenomenon. The physiology of our
visual system gives rise to this psychological behavior. Figure 5.1 shows a schematic of the
physiology of our visual system, marking the important components.
The light rays incident on the eye pass through the cornea, the pupil, and the lens in that
sequence to finally reach the retina. In between, the rays get refracted (by the cornea and


Fig. 5.1 Schematic diagram of our visual system, showing the cornea, aqueous humor, pupil, iris, lens, ciliary muscles, vitreous humor, retina, central fovea, sclera, and optical nerve


the lens) so as to focus the images on the retina. The amount of light entering the eye is
controlled by the iris (by dilating or constricting the pupil size). The retina is composed of
optical nerve fibres and photoreceptors that are sensitive to light. There are two types of
photoreceptors—rods and cones. The concentration of rods is more in the peripheral region
of the retina, whereas the cones are mainly present in a small central region of the retina
known as the fovea. More than one rod can share an optic nerve. In such cases, the nerve
pools the stimulation by all the connected rods and aids sensitivity to lower levels of light.
In contrast, there is more or less one optic nerve fibre for each cone, which aids in image
resolution or acuity. Vision accomplished mainly with cones is known as photopic vision,
whereas the vision accomplished mainly by rods is called scotopic vision. Only in photopic
vision, we can perceive colors; in scotopic vision, weak lights are visible as a series of grays.
As we know, what we call visible light actually refers to a spectrum of frequencies of
electromagnetic (light) waves. At one end of the spectrum is the red light (frequency:
4.3 × 1014 Hz, wavelength: 700 nm) and at the other end is the violet light (frequency: 7.5 ×
1014 Hz, wavelength: 400 nm). Light waves within this range are able to excite the cones in
our eye, giving the photopic vision (i.e., perception of color). There are three types of cones
present in the eye: L or R, which are most sensitive to the red light; M or G, which are most
sensitive to the green light (wavelength: 560 nm); and S or B, which are most sensitive to
blue light (wavelength: 430 nm). Perception of color results from the stimulation of all the
three cone types together (thus, this is also referred to as the tristimulus theory of vision).

5.2 COLOR MODELS


In computer graphics, we are interested in synthesizing colors so as to create realistic scenes.
The presence of metamers (see the following boxed text) makes this possible in principle.
Metamers imply that we can create any color without actually worrying about the optics


What is metamerism? How is the idea helpful in computer graphics?


Light incident on our eye is composed of different frequencies (a light spectrum). The component frequencies excite the three cone types L, M, and S in different ways, giving us the sensation of a particular color. However, a particular color perception can result from different spectra. That means, the sensation of a color C resulting from an incident light spectrum S1 can also result from a different light spectrum S2. This optical behavior is known as metamerism, and the different spectra that result in the sensation of the same color are known as metamers.
Metamers imply that we don't need to know the exact physical process behind the perception of a particular color. Instead, we can always come up with an artificial way to generate that sensation (metamers). Thus, it is possible to generate realistic scenes artificially, without knowing the actual optical process.

behind it (how it is perceived in our eye). What we can do is to come up with a set of
basic (or primary) colors. We then mix these colors in appropriate amounts to synthesize
the desired color. This gives rise to the notion of color models (i.e., ways to represent and
manipulate colors).

5.2.1 RGB Color Model


As we saw before, there are three cone types in the retina: L, M, and S. Cones of type L get
excited mostly by red light, M by green light, and S by blue light. Therefore, the incident
light excites these three cones in different ways, which results in the photopic vision (i.e.,
color perception). Therefore, we can think of color as a mixture of the three primary colors:
red, green, and blue. We need to mix the primary colors in appropriate amounts to synthesize
a desired color. This model of color, where we assume that any color is a combination of
the red, green, and blue colors in appropriate amount, is known as the RGB model (see
Fig. 5.2a). This is an additive model in the sense that any color is obtained by adding proper
amounts of red, green, and blue colors.

Fig. 5.2 The RGB model. In (a), the basic idea is illustrated with the three primary light waves and the color-matching amounts of each required to generate colors across the visible wavelengths (400-700 nm). The 3D color cube due to the RGB model is shown in (b): black at (0, 0, 0), red at (1, 0, 0), green at (0, 1, 0), blue at (0, 0, 1), yellow at (1, 1, 0), cyan at (0, 1, 1), magenta at (1, 0, 1), and white at (1, 1, 1). A color is represented as a point in the cube, corresponding to specific amounts of red, green, and blue.


Since there are three primaries, we can think of a color as a point in a three dimensional
color space. The three axes of the space correspond to the three primary colors. If we are
using the normalized values of the colors (within the range [0, 1]), then we can visualize the
RGB model as a 3D color cube, as shown in Fig. 5.2(b). The cube is the color gamut (i.e., set
of all possible colors) that can be generated by the RGB model. The origin (the absence of the primaries) represents the black color, whereas we get the white color when all the primaries are present in equal amounts. The diagonal connecting the black and white colors represents
the shades of gray. The yellow color is produced when only the red and green colors are
added in equal amounts in the absence of blue. Addition of only blue and green in equal
amounts with no red produces cyan, while the addition of red and blue in equal amounts
without green produces magenta.

5.2.2 XYZ Color Model


Although the RGB model is very intuitive owing to its direct correspondence to the tris-
timulus color theory, it has its limitations. For some of the colors in the visible light
spectrum (colors around the 500 nm wavelength, see Fig. 5.2a), the additive model fails.
The colors in this region cannot be obtained by adding any amounts of the three pri-
mary colors. In fact, this is a problem for any color model based on naturally occur-
ring primary colors: no model can account for all the colors in the visible spectrum.
In order to overcome this problem, the Commission Internationale de l’Eclairage (CIE)
proposed three hypothetical lightwaves (usually denoted by the letters X, Y, and Z) for
use as primary colors. The idea is, we should be able to generate any wavelength λ in
the visible range by positive combinations of the three primaries, as illustrated in Fig.
5.3. This model is known as the XYZ model due to the name given to the primary
colors.
By definition, the XYZ model is additive. Therefore, any color C can be represented in
the XYZ color space (which is three-dimensional like the RGB color space) as an additive
combination of the primaries using the unit vectors along the axes (corresponding to the

Fig. 5.3 The figure illustrates the three hypothetical lightwaves designed for the XYZ model and the amounts in which they should be mixed to produce a color in the visible range (390-710 nm). Compare this figure with Fig. 5.2(a) and notice the difference.


three primaries), as shown in Eq. 5.1.

C = X X̂ + Y Ŷ + Z Ẑ (5.1)

For convenience, the amounts X , Y , and Z of the primary colors used to generate C
are represented in normalized forms. Calculations of the normalized forms are shown
in Eq. 5.2.

x = X / (X + Y + Z)    (5.2a)
y = Y / (X + Y + Z)    (5.2b)
z = Z / (X + Y + Z)    (5.2c)

Since x + y + z = 1, we can represent any color by specifying just the x and y


amounts. The normalized x and y are called the chromaticity values. If we plot the chro-
maticity values, we get a tongue-shaped curve as shown in Fig. 5.4. This curve is known
as the CIE chromaticity diagram. The spectral (pure) colors are represented by points
along the curve. However, the line joining the violet and red spectral points, called the

Fig. 5.4 CIE chromaticity diagram for the visible color spectrum. The spectral colors (violet near 400 nm through blue, cyan, green, yellow, orange, and red near 700 nm) lie along the tongue-shaped boundary, the line of purples joins its two ends, and the achromatic point E lies at (x = 1/3, y = 1/3).


Importance of the CIE chromaticity diagram


The CIE chromaticity diagram represents the whole range of perceptible colors in 2D. As we have already mentioned, no other set of natural primaries (representing a color model) can generate all possible colors. Thus, any set of primaries (such as red, green, and blue) generates only a subset of the colors represented by the CIE chromaticity diagram. Given the x and y values, we can plot the primaries on the chromaticity diagram and then join those points using lines to visualize the color gamut represented by the set of primaries. In this way, we can visualize and compare the color gamuts generated by different models (sets of primaries). Thus, the chromaticity diagram allows us to compare different color models. As an example, the following figure shows the RGB color gamut within the chromaticity diagram.

(The figure shows the triangular RGB color gamut plotted inside the CIE chromaticity diagram, with the R, G, and B primaries at its vertices and the white point in its interior.)

purple line, is not part of the spectrum. Interior points in the diagram represent all possi-
ble visible colors. Therefore, the diagram is a 2D representation of the XYZ color gamut.
The point E represents the white-light position (a standard approximation for average
daylight).
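Computing the chromaticity values from given tristimulus amounts is a direct application of Eq. 5.2. A small Python sketch of ours:

    def chromaticity(X, Y, Z):
        # Eq. 5.2: since x + y + z = 1, the pair (x, y) alone locates a color
        s = X + Y + Z
        return X / s, Y / s

    # Equal tristimulus amounts land at the achromatic point E (1/3, 1/3)
    print(chromaticity(1.0, 1.0, 1.0))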

5.2.3 CMY Color Model


As we saw before, the RGB color model is additive. That means, a color is generated by
adding appropriate amounts of the primaries red, green, and blue. The video display devices
such as the CRT are designed using this color model. In those devices, each pixel location
contains three dots, each corresponding to one of the primary colors. These dots are excited
individually, resulting in the emission of the corresponding primary lights with appropriate
intensity. When we see it, they appear to represent the color. However, in many hard-copy
devices such as printers and plotters, a different process is used. In such cases, a color picture
is produced by coating a paper with color pigments (inks). When we look at the paper,


reflected light from the points containing the pigments comes to our eyes, giving us the perception of color. This process is, however, not additive. The color perception results from the subtraction of primaries.
We can form a subtractive color model with the primaries cyan, magenta, and yellow.
Consequently, the model is called the CMY model. The primary cyan is a combination of
blue and green (see Fig. 5.2b). Thus, when white light is reflected from a cyan pigment on a
paper, the reflected light contains these two colors only and the red color is absorbed (sub-
tracted). Similarly, the green component is subtracted by the magenta pigment, and the primary
yellow subtracts the blue component of the incident light. Therefore, we get the color due to
the subtraction of the red, green, and blue components from the white (reflected) light by the
primaries.
We can depict the CMY model as a color cube in 3D in the same way we did it for the
RGB model. The cube is shown in Fig. 5.5. Note how we can find the location of the cor-
ner points. Clearly origin (absence of the primaries) represents the white light. When all the
primaries are present in equal amounts, we get black color since all the three components of
the white light (red, green, and blue) are absorbed. Thus, the points on the diagonal joining
white and black represents different shades of gray. When only cyan and magenta are present
in equal amount without any yellow color, we get blue because red and green are absorbed.
Similarly, presence of only cyan and yellow in equal amount without any magenta color
results in green and the red color results from the presence of only the yellow and magenta
in equal amounts without cyan.
The CMY model for hardcopy devices is implemented using an arrangement of three ink
dots, much in the same way three phosphor dots are used to implement the RGB model
on a CRT. However, sometimes four ink dots are used instead of three, with the fourth
dot representing black. In such cases, the color model is called the CMYK model with
K being the parameter for black. For black and white or gray-scale images, the black dot
is used.

Fig. 5.5 CMY color cube, with white at (0, 0, 0), cyan at (1, 0, 0), magenta at (0, 1, 0), yellow at (0, 0, 1), blue at (1, 1, 0), red at (0, 1, 1), green at (1, 0, 1), and black at (1, 1, 1). Any color within the cube can be described by subtracting the corresponding primary values from white light.


We can convert between the CMY and RGB representations through simple subtraction of column vectors. In order to obtain the CMY representation from the RGB representation, we use the following subtraction.

    [ C ]   [ 1 ]   [ R ]
    [ M ] = [ 1 ] − [ G ]
    [ Y ]   [ 1 ]   [ B ]

The opposite conversion (from CMY to RGB) is done in a likewise manner, as given here.

    [ R ]   [ 1 ]   [ C ]
    [ G ] = [ 1 ] − [ M ]
    [ B ]   [ 1 ]   [ Y ]
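With normalized values in [0, 1], the two conversions are simple complements, as the following Python sketch of ours shows.

    def rgb_to_cmy(r, g, b):
        return 1 - r, 1 - g, 1 - b

    def cmy_to_rgb(c, m, y):
        return 1 - c, 1 - m, 1 - y

    print(rgb_to_cmy(1.0, 0.0, 0.0))   # pure red -> (0.0, 1.0, 1.0) in CMY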

5.2.4 HSV Color Model


If you are using a drawing application, you need some interface to color the pictures you
draw. The interface can be built based on the color models we discussed in the preceding
sections. For example, there can be an interface through which you enter the R, G, and B
values in some format and the system generates the corresponding color using the additive
model. Clearly, this is not very intuitive from a user's perspective: in order to generate a color, you need to know the amounts of the primary colors.
Artists follow a different approach to color their pictures. In order to generate a specific
color, they first choose a saturated (pure) color (e.g., red, green, blue, yellow, etc.). Next,
they add white to the pure color to decrease saturation. This is known as tint. In order to gen-
erate shades, black is added to the pure color. Both white and black can also be added to the
pure color to generate tones. Thus, any color is generated by the process of tinting, shading,
or toning. From an artist’s (user’s) perspective, this approach is more intuitive than the one
where we need to know the amount of the primary colors required to generate a color.
The intuitive color generation process is replicated with the HSV color model. The
acronym HSV stands for Hue, Saturation, and Value. The color space represented by the
model is the hexcone shown in Fig. 5.6. The pure colors are represented by the boundary
of the hexagon on top. These colors are identified by the hue (H) value, which is the angle
the color makes with the axis that connects the white (at the center) and red colors on the
hexagon (white to red is the positive axis direction about which the hue angle is measured).
Thus, red has a hue value of 0° whereas cyan has a value of 180°. The saturation (S) value
determines the purity of the color: any color on the boundary has the highest S value of 1.0;
as we move towards the center of the hexcone, the saturation value decreases towards the
minimum (0.0). The parameter value (V) indicates the brightness (intensity) of the color. On
the hexagonal plane, the brightness is the maximum (1.0). As we move towards the apex of
the hexcone, V decreases till it reaches 0.0.
We can relate the operations on the HSV model to that of the intuitive coloring process.
When we are on any cross-sectional plane (same or parallel to the top hexagon) of the hex-
cone (fixed V) and moving from any boundary color (fixed H) towards the center of the


Fig. 5.6 HSV color space: a hexcone with V along the vertical axis (1.0 at the top hexagon, 0.0 at the apex, which is black), the pure hues red (0°), yellow, green (120°), cyan, blue (240°), and magenta around the top hexagon, white at its center, and S increasing from the center towards the boundary. Movement from the boundary towards the center on the same plane represents tinting; movement across planes parallel to the V axis represents shading; and movement across planes from the boundary towards the center represents toning.

plane, we are reducing S. Reduction in S value is equivalent to the tinting process. Alterna-
tively, when we move from a point on any cross-sectional plane (same or parallel to the top
hexagon) towards the hexcone apex, we are only changing V. This is equivalent to the shad-
ing process. Any other movement (from boundary to the center across planes) in the hexcone
represents the toning process.

5.3 TEXTURE SYNTHESIS


When we apply lighting models to assign colors (or shades of gray), we get objects with
smooth surfaces. However, the surfaces that we see around us contain complex geometric
patterns or textures on top of the surface colors and are usually rough. As an example,
consider the textured surface shown in Fig. 5.7. Such textures or roughness cannot be
synthesized with the lighting model alone. Various techniques are used to make the syn-
thesized surfaces look realistic. In this section, we shall discuss the various texture synthesis
techniques.

Fig. 5.7 A block made from wood. Notice the patterns (texture) on the surface. It is not
possible to generate such patterns with only the lighting model.


MIPMAP
MIPMAPs are a special type of projected texture method. MIP stands for Multum In Parvo, or many
things in a small space. In this technique, a series of texture maps with decreasing resolutions, for the
same texture image, are stored, as illustrated in the following figure.

Original texture

1/4

1/16
1/64
etc
1 pixel

These different maps are used to paste textures at different places of the surface, so as to get a
realistic effect. The method is useful for imposing
textures on surfaces with perspective projection,
as shown in the following figure. As we can see in
the following figure, the texture on the near part is
synthesized with the largest texture map; smaller
sized maps are subsequently used to synthe-
size textures on the far side. Obviously, MIPMAP
takes more storage space for storing the different
versions of the same texture image.
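
As a rough illustration of how such a pyramid can be built, the Python sketch below repeatedly averages 2 × 2 blocks of a square texture whose side is a power of two. The function name build_mipmaps and the nested-list image representation are assumptions made only for this example.

    def build_mipmaps(texture):
        """texture: a square 2^n x 2^n grid of gray values; return the list of successively halved maps."""
        levels = [texture]
        while len(levels[-1]) > 1:
            src = levels[-1]
            n = len(src) // 2
            # each new texel is the average of a 2 x 2 block of the previous level
            dst = [[(src[2*i][2*j] + src[2*i][2*j+1] +
                     src[2*i+1][2*j] + src[2*i+1][2*j+1]) / 4.0
                    for j in range(n)] for i in range(n)]
            levels.append(dst)
        return levels

    pyramid = build_mipmaps([[0, 1, 1, 0],
                             [1, 0, 0, 1],
                             [1, 0, 0, 1],
                             [0, 1, 1, 0]])
    print([len(level) for level in pyramid])   # [4, 2, 1]: original, 1/4, 1/16 resolution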

We can broadly categorize the various texture synthesis techniques into the following
three types1 .
1. Projected texture
2. Texture mapping
3. Solid texture

5.3.1 Projected Texture


As we know, a computer screen can be viewed as a 2D array of pixels. The scenes are gener-
ated by assigning appropriate color values to these pixels. Thus we have a 2D array of pixel
values. On the surface, we want to generate a particular texture. What we can do is somehow
create the texture pattern and paste it on the surface.

1 The names of the categories, however, are not standard. You may find different names used to denote the
same concepts.


Example 5.1
Texture mapping method
Consider the situation shown in Fig. 5.8. On the left side is a (normalized) texture map defined in
the (u, w) space. This map is to be ‘pasted’ on a 50 × 50 square area in the middle of the object
surface as shown in the right side figure. What are the linear mappings we should use?

Fig. 5.8

Solution As we can see, the target surface is a square of side 100 units, on a plane parallel to the
XY plane. Therefore, the parametric representation of the target area (middle of the square) is,

x = θ with 25 ≤ θ ≤ 75
y = φ with 25 ≤ φ ≤ 75
z = 100

Now let us consider the relationships between the parameters in the two spaces with respect to the
corner points.
The point u = 0, w = 0 in the texture space is mapped to the point θ = 25, φ = 25.
The point u = 1, w = 0 in the texture space is mapped to the point θ = 75, φ = 25.
The point u = 0, w = 1 in the texture space is mapped to the point θ = 25, φ = 75.
The point u = 1, w = 1 in the texture space is mapped to the point θ = 75, φ = 75.
We can substitute these values into the linear mappings θ = Au + B, φ = Cw + D to determine
the constant values. The values thus determined are: A = 50, B = 25, C = 50, and D = 25 (left as
an exercise for the reader). Therefore, the mappings we can use to synthesize the particular texture
on the specified target area on the cube surface are θ = 50u + 25, φ = 50w + 25.
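
The result of Example 5.1 can be verified with a few lines of code. The function name map_uv_to_surface and the corner check below are ours, added only to confirm the constants A = 50, B = 25, C = 50, and D = 25.

    def map_uv_to_surface(u, w, A=50, B=25, C=50, D=25):
        """Linear texture-space to surface-space mapping: theta = A*u + B, phi = C*w + D."""
        return A * u + B, C * w + D

    # the four corners of the texture map land on the corners of the 50 x 50 target area
    for u, w in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print((u, w), '->', map_uv_to_surface(u, w))
    # (0, 0) -> (25, 25), (1, 0) -> (75, 25), (0, 1) -> (25, 75), (1, 1) -> (75, 75)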

This idea is implemented in the projected texture method. We create a texture image, also
known as texture map from synthesized or scanned images. The map is a 2D array of color
values. Each value is called a texel. There is a one-to-one correspondence between the texel
array and the pixel array. We now replace the pixel color values with the corresponding texel
values to mimic the ‘pasting of the texture on the surface’. The replacement can be done in
one of the following three ways.


1. We can replace the pixel color value on a surface point with the corresponding texel value.
This is the simplest of all.
2. Another way is to blend the pixel and texel values. Let C be the color after blending the
pixel value Cpixel and the texel value Ctexel. Then, we can use the following equation for
smooth blending of the two: C = (1 − k)Cpixel + kCtexel, where 0 ≤ k ≤ 1 (a small code sketch of this option is given after this list).
3. Sometimes a third approach is used in which we perform a logical operation (AND, OR)
between the two values (pixel and texel) represented as bit strings. The outcome of the
logical operation is the color of the surface point.
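
A minimal Python sketch of the blending option (item 2 above) follows; the per-channel tuple representation of colors and the function name blend are our own choices for the example.

    def blend(c_pixel, c_texel, k):
        """Blend a surface color with a texel color: C = (1 - k) * C_pixel + k * C_texel, with 0 <= k <= 1."""
        return tuple((1 - k) * p + k * t for p, t in zip(c_pixel, c_texel))

    surface = (0.8, 0.2, 0.2)   # reddish surface color from the lighting model
    texel   = (0.4, 0.3, 0.1)   # brownish texel from the texture map
    print(blend(surface, texel, 0.0))   # k = 0: texture ignored, pixel color kept
    print(blend(surface, texel, 1.0))   # k = 1: texel replaces the pixel color
    print(blend(surface, texel, 0.5))   # equal mix of the two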
Projected texture method is suitable when the target surfaces are relatively flat and facing
the reference plane (roughly related to the screen, as we shall see later). However, for curved
surfaces, it is not very useful and we require some other method, as discussed in the next
section.

5.3.2 Texture Mapping


On a curved surface, simple pasting of pre-synthesized texture patterns does not work. We
go for a more general definition of the texture map. Now, we assume the texture map to be
defined in a 2D texture space, whose principal axes are typically denoted by the letters u
and w. The object surface is represented in parametric form, usually denoted by the symbols
θ and φ. Two mappings are then defined from the texture space to the object space as:
θ = f (u, w), φ = g(u, w).
In the simplest case, the mapping functions are assumed to be linear. Therefore, we can
write: θ = Au + B, φ = Cw + D. A, B, C, and D are constants, whose values can be
determined from the relationships between known points in the two spaces (for example, the
corner points of the texture map and the corresponding surface points as illustrated in the
example).

5.3.3 Solid Texture


While texture mapping is useful to deal with curved surfaces, it is still difficult to use in many
situations. In texture mapping, we need to define the mapping between the two spaces. How-
ever, for complex surfaces, it is difficult to determine the mapping. Also in situations where
there should be some ‘continuity’ of the texture between adjacent surfaces (see Fig. 5.9),
the methods we discussed so far are not suitable. In such cases, we use the solid texturing
method.
In this method, a texture is defined in a 3D texture space (note that in the previous meth-
ods, texture maps were defined in 2D), whose principal axes are usually denoted by the
letters u, v, and w. When rendering an object, the object is placed in the texture space
through transformations. Thus a point P(x,y,z) on the object surface is transformed to a point
P′ (u,v,w) in the texture space. The color value associated with P′ is then used to color the
corresponding surface point.
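
One common way to realize this idea is a procedural 3D texture, in which the color is computed directly from the (u,v,w) coordinates, so adjacent surfaces of the same object automatically share a continuous pattern. The checker function below is a generic illustration of the concept and not an algorithm taken from the text.

    def solid_checker(u, v, w, size=1.0):
        """Return a color for the 3D texture-space point (u, v, w): alternating dark/light cubes of the given size."""
        parity = (int(u // size) + int(v // size) + int(w // size)) % 2
        return (0.9, 0.9, 0.9) if parity == 0 else (0.3, 0.2, 0.1)

    # two surface points of the same object, on different faces, get consistent colors
    print(solid_checker(0.5, 0.5, 0.5))   # inside the first cell
    print(solid_checker(1.5, 0.5, 0.5))   # neighbouring cell along u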


Fig. 5.9 An example situation where solid texturing method is required. The white lines
show the surface boundaries. Note the continuation of the texture patterns across adjacent
surfaces.

SUMMARY
In this chapter, we have discussed the fundamental idea behind the sensation of color, through a
brief discussion on the physiology of vision. We learnt that the three cone type photoreceptors
are primarily responsible for our perception of color, which gives rise to the Tristimulus theory
of color. We also learnt that the existence of metamers, which are the various spectra related
to the generation of a color, makes it possible to synthesize colors artificially, without mimicking
the exact natural process.
Next, we discussed the idea of representing colors through color models. These
models typically use a combination of three primary colors to represent any arbitrary color. Two
types of combinations are used in practice: additive and subtractive. We discussed two additive
color models, namely the RGB and the XYZ models. The CMY model is discussed to illustrate
the idea of the subtractive model. Finally, we discussed the HSV model, which is primarily used
to design user interfaces for interactive painting/drawing applications.
The third topic we learnt about in this chapter is the synthesis of textures/patterns on a surface, to
make the surfaces look realistic. Three types of texture synthesis techniques were introduced. In
the projected texture method, a texture pattern (obtained from a synthesized or scanned image)
is imposed on a surface through the use of blending functions. In the texture mapping technique,
a texture map/pattern defined in a two-dimensional space is mapped to the object space. The
solid texturing method extends the texture mapping idea to three dimensions, in which a 3D
texture pattern is mapped to object surfaces in three dimensions.
Once the coloring is done, we transform the scene defined in the world coordinate system
to the eye/camera coordinate system. This transformation, known as the view transformation,
is the fourth stage of the 3D graphics pipeline, which we shall learn in Chapter 6.

BIBLIOGRAPHIC NOTE
More discussion on human visual system and our perception of light and color can be found
in Glassner [1995]. Wyszecki and Stiles [1982] contains further details on the science of color.
Color models and their application to computer graphics are described in Durrett [1987], Hall
[1989], and Travis [1991]. Algorithms for various color applications are presented in the Graphics
Gems series of books (Glassner [1990], Arvo [1991], Kirk [1992], Heckbert [1994], and
Paeth [1995]). More on texture mapping can be found in Demers [2002].


KEY TERMS
Additive model – a color model that represents arbitrary colors as a sum (addition) of primary
colors
CIE chromaticity diagram – the range of all possible colors represented in two dimensions
CMY color model – a subtractive color model in which the cyan, magenta, and yellow are the
primary colors
Color gamut – the set of all possible colors that can be represented by a color model
Color models – the ways to represent and manipulate colors
Cones – one type of photoreceptors that help in image resolution or acuity
HSV color model – a color model typically used to design the user interfaces in painting
applications
Metamerism – the phenomenon that color perception can result from different spectra
Metamers – different spectra that give rise to the perception of the same color
MIPMAP – Multum In Parvo Mapping, which is a special type of projected texturing technique
Photoreceptors – the parts of the eye that are sensitive to light
Primary colors – the basic set of colors in any color model, which are combined together to
represent arbitrary colors
Projected texture – the technique to synthesize texture on a surface by blending the surface color
with the texture color
RGB color model – an additive color model in which the red, green, and blue colors are the three
primary colors
Rods – one type of photoreceptors that are sensitive to lower levels of light
Solid texturing – the texture mapping technique applied in three dimensions
Subtractive model – a color model that represents arbitrary colors as a difference (subtraction)
of primary colors
Texel – the color of each pixel in the texture map grid
Texture image/texture map – a grid of color values obtained from a synthesized/scanned image
Texture mapping – the technique to synthesize texture on a surface by mapping a texture defined
in the texture space to the object space
Tristimulus theory of vision – the theory that color perception results from the activation of the
three cone types together
Visible light – a spectrum of frequencies of the electromagnetic light wave (between 400 nm and
700 nm wavelength)
XYZ color model – an additive standardized color model in which there are three hypothetical
primary colors denoted by the letters X, Y, and Z.

EXERCISES
5.1 Explain the process by which we sense colors.
5.2 It is not possible to exactly mimic the lighting process that occurs in nature. If so, how does
computer graphics work?
5.3 What is the basis for developing models with three primary colors such as RGB?
5.4 Discuss the limitation of the RGB model that is overcome with the XYZ model.
5.5 Briefly discuss the relationship between the RGB and the CMY models. When is the CMY
model useful?
5.6 Explain the significance of using the HSV model instead of the RGB or CMY models.
5.7 Mention the three broad texture synthesis techniques.
5.8 Explain the key idea of the projected texture method. How are the texels and pixels
combined?
5.9 Explain the concept of MIPMAP. How is it different from projected texture methods?


5.10 Discuss how texture mapping works. In what respect is it different from projected texture
methods? When do we need it?
5.11 Discuss the basic idea behind solid texturing. When do we use it?
5.12 Consider a cube ABCDEFGH with side length 8 units; E is at origin, EFGH on the XY plane,
AEHD on the YZ plane and BFEA on the XZ plane (defined in a right-handed coordinate
system). The scene is illuminated with an ambient light with an intensity of 0.25 units and
a light source at location (0,0,10) with an intensity of 0.25 unit. An observer is present at
(5,5,10). Assume ka = kd = ks = 0.25 and ns = 10 for the surfaces. We want to render
a top view of the cube on the XY plane. A texture map is defined in the uw space as a
circle with center at (1.0,1.0) and radius 1.0 unit. Any point p(u,w) within this circle has
intensity u/(u+w). We need to map this texture on a circular region of radius 3 units in the
middle of the top of the cube. What would be the color of the points P1(4,2,8) and P2(3,1,8),
assuming Gouraud shading (ignore attenuation)? [Hint: See Example 4.1 in Chapter 4 and
Example 5.1].


CHAPTER 6
3D Viewing
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the 3D viewing transformation stage and its importance in computer
graphics
• Set-up the 3D viewing coordinate reference frame
• Understand the mapping from the world coordinate frame to the viewing coordinate
frame
• Get an overview of the parallel and perspective projections with subcategories
• Learn to perform parallel projection of a 3D scene in the view coordinate frame to the
view plane
• Learn to perform perspective projection of a 3D scene in the view coordinate frame to
the view plane
• Understand the concept of canonical view volumes
• Learn to map objects from clipping window to viewport

INTRODUCTION
Let us recollect what we have learnt so far. First, we saw how to represent objects of varying
complexities in a scene (Chapter 2). Then, we saw how to put those objects together through
modeling transformations (Chapter 3). Once the objects are put together to synthesize the
scene in the world coordinate system, we learnt how to apply colors to make the scene real-
istic (Chapters 4 and 5). All these discussions up to this point, therefore, equipped us to
synthesize a realistic 3D scene in the world coordinate system. When we show an image on
a screen, however, we are basically showing a projection of a portion of the 3D scene.
The process is similar to that of taking a photograph. The photo that we see is basically a
projected image of a portion of the 3D world we live in. In computer graphics, this process
is simulated with a set of stages. The very first of these stages is to transform the 3D world
coordinate scene to a 3D view coordinate system (also known as the eye or camera coordi-
nate system). This process is generally known as the 3D viewing transformation. Once this
transformation is done, we then project the transformed scene onto the view plane. From the
view plane, the objects are projected onto a viewport in the device coordinate system. In this
chapter, we shall discuss the 3D viewing, projection, and viewport transformation stages.

6.1 3D VIEWING TRANSFORMATION


Let us go back to our photograph analogy and try to understand the process in a little more
detail. What is the first thing we do? We point our camera to a particular direction with a
specific orientation, so as to capture the desired part of the scene. Then, we set our focus
and finally click the picture. Focusing is the most important part here. Through this mech-
anism, we get to know (or at least estimate) the quality and coverage of the picture taken.
To set focus, we look-at the scene through the viewing mechanism provided in the camera.
Note the difference: instead of looking at the scene directly, we are looking at it through the
camera. In the former case, we are looking at the scene in its world coordinate system. In
the latter case, we are looking at a different scene, one that is changed by the arrangement
of lenses in the camera, to aid us in our estimation. Therefore, in taking a photograph with
a camera, we actually change or transform the 3D world coordinate scene to a description
(in another coordinate system) characterized by the camera parameters (position and orien-
tation). The latter coordinate system is generally called the view coordinate system and the
transformation is known as the viewing transformation.
In order to simulate the viewing transformation in computer graphics, we need two things.
First, we need to define the view coordinate system and then, we perform the transformation.
Let us first see how we can define (or setup) the view coordinate system.

6.1.1 Setting up a View Coordinate System


The process that we are trying to simulate is visualized in Fig. 6.1. Note the three axes
xview , yview , and zview orthogonal to each other. These three are the basis vectors essentially
defining the view coordinate system.
Although we have used the common notation (x, y, and z) in Fig. 6.1 to denote the three
basis vectors of the view coordinate system, they are usually denoted by the three vectors u,
v, and n. In the subsequent discussion, we shall use the latter notation.

Yworld Camera
Yview

Xview

Zview
Xworld

Object

Zworld

Fig. 6.1 Visualization of the view coordinate system, defined by the mutually orthogonal
xview , yview , and zview axes. The xworld , yworld , and zworld axes define the world coordinate
system.



Fig. 6.2 Illustration of the determination of the three basis vectors for the view coordinate
system
How do we set up the view coordinate system? The first thing is to determine the origin, where the three axes meet. This is easy. We assume that the camera is represented as a point in the 3D world coordinate system. We simply choose this point as our origin (denoted by o). However, determining the vectors u, v, and n that meet at this origin is tricky.
When we try to bring something into focus with our camera, the first thing we do is to choose a point (in the world coordinate). This is the center of interest or the look-at point (denoted by p). Then, using simple vector algebra, we can see that n = o − p, as depicted in Fig. 6.2. Finally, we normalize n as n̂ = n/|n| to get the unit basis vector.
Next, we specify an arbitrary point (denoted by pup) along the direction of our head while looking through the camera. This is the view-up direction. With this point, we determine the view-up vector Vup = pup − o (see Fig. 6.2). Then, we get the unit basis vector v̂ = Vup/|Vup|.
We know that the unit basis vector û is perpendicular to the plane spanned by n̂ and v̂ (see Fig. 6.2). Hence, û = v̂ × n̂ (i.e., the vector cross-product assuming a right-handed coordinate system). Since both n̂ and v̂ are unit vectors, we do not need any further normalization.

Example 6.1
Consider Fig. 6.3. We are looking at the square object with vertices A(2,1,0), B(2,3,0), C(2,3,3)
and D(2,1,3). The camera is located at the point (1,2,2) and the look-at point is the cen-
ter of the object (2,2,2). The up direction is parallel to the positive z direction. What is
the coordinate of the center of the object, after its transformation to the viewing coordinate
system?

Solution First, we determine the three unit basis vectors for the viewing coordinate system.



Fig. 6.3

The camera position o is (1,2,2) and the look-at point p is (2,2,2). Therefore n = o − p
= (−1, 0, 0) = n̂.
Since it is already mentioned that the up direction is parallel to the positive z direction, we can
directly determine that v̂ = (0, 0, 1) without any further calculations. Note that this is another way
of specifying the up vector (instead of specifying a point in the up direction and computing the
vector).
Finally, the cross product of the two vectors (i.e., v̂ × n̂) gives us û = (0, −1, 0).
After the three basis vectors are determined, we compute the transformation matrix Mw2v, which
is a composition of translation and rotation (i.e., Mw2v = R.T).
Since the camera position is (1,2,2), we have

    T = [ 1  0  0  −1 ]
        [ 0  1  0  −2 ]
        [ 0  0  1  −2 ]
        [ 0  0  0   1 ]

From the unit basis vectors that we have already derived [i.e., n̂(−1, 0, 0), û(0, −1, 0), v̂(0, 0, 1)],
we have

    R = [  0  −1  0  0 ]
        [  0   0  1  0 ]
        [ −1   0  0  0 ]
        [  0   0  0  1 ]
Therefore,

    Mw2v = R.T = [  0  −1  0  0 ] [ 1  0  0  −1 ]   [  0  −1  0   2 ]
                 [  0   0  1  0 ] [ 0  1  0  −2 ] = [  0   0  1  −2 ]
                 [ −1   0  0  0 ] [ 0  0  1  −2 ]   [ −1   0  0   1 ]
                 [  0   0  0  1 ] [ 0  0  0   1 ]   [  0   0  0   1 ]


Thus, the new coordinates of the object center (2,2,2) are,

    C′ = Mw2v C = [  0  −1  0   2 ] [ 2 ]   [  0 ]
                  [  0   0  1  −2 ] [ 2 ] = [  0 ]
                  [ −1   0  0   1 ] [ 2 ]   [ −1 ]
                  [  0   0  0   1 ] [ 1 ]   [  1 ]

In other words, the object center gets transformed to the point (0,0,−1) in the view coordinate
system.

Determination of the viewing coordinate system

We are given the camera parameters:
1. Camera position (origin of the coordinate system) o
2. View-up point pup
3. Center of interest or the look-at point p

From these parameters, the three unit basis vectors are determined as follows.
1. Determine n̂ (the unit basis vector opposite to the looking direction): n = o − p; n̂ = n/|n|
2. Determine v̂ (the unit basis vector along the view-up direction): Vup = pup − o; v̂ = Vup/|Vup|
3. Determine û (the third unit basis vector): û = v̂ × n̂
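
These three steps translate directly into code. The sketch below is a minimal Python version; the function name view_basis, the tuple-based vector helpers, and the up point (1,2,3) (our own choice of a point along the +z view-up direction of Example 6.1) are assumptions made for the illustration.

    import math

    def view_basis(camera, look_at, up_point):
        """Compute the unit basis vectors (u, v, n) of the view coordinate system
        from the camera position o, the look-at point p, and a point along the view-up direction."""
        def sub(a, b):   return tuple(ai - bi for ai, bi in zip(a, b))
        def norm(a):     l = math.sqrt(sum(ai * ai for ai in a)); return tuple(ai / l for ai in a)
        def cross(a, b): return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

        n = norm(sub(camera, look_at))       # n: opposite to the looking direction
        v = norm(sub(up_point, camera))      # v: along the view-up direction
        u = cross(v, n)                      # u: perpendicular to both (right-handed)
        return u, v, n

    # the setting of Example 6.1: camera (1,2,2), look-at (2,2,2), up along +z
    u, v, n = view_basis((1, 2, 2), (2, 2, 2), (1, 2, 3))
    print(u, v, n)   # (0.0, -1.0, 0.0) (0.0, 0.0, 1.0) (-1.0, 0.0, 0.0)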

6.1.2 Viewing Transformation


Once we set up the coordinate system, the next task is to transform the scene description
from the world coordinate to the view coordinate system. In order to understand the process,
let us consider Fig. 6.4. The point P is an arbitrary point in the world coordinate, which we
need to transform to the view coordinate. Assume that the view coordinate origin (the cam-
era position) has the coordinates (ovx , ovy , ovz ) and the three basis vectors are represented
as û(ux , uy , uz ), v̂(vx , vy , vz ), and n̂(nx , ny , nz ). We have to set up the transformation matrix


Fig. 6.4 Visualization of the transformation process. P is any arbitrary point, which has to
be transformed to the view coordinate system. This requires translation and rotation.


How can we generate different 3D viewing effects?

We have learnt about the process of transforming a world coordinate description to the view coordinate. There are several quantities involved in this process. By manipulating one or more of these quantities, we can generate different viewing effects.
1. If we wish to generate a composite display consisting of multiple views from a fixed camera position, we can keep the camera position fixed but systematically change the basis vector n̂ (by choosing different look-at points).
2. We can keep n̂ fixed but change the camera position (hypothetically by moving it along n̂) in order to generate panning effects seen in animations.
3. In order to view an object from different positions, we can move the camera position around the object (note that n̂ also changes).
the object (note that n̂ also changes).
position, we can keep the camera position fixed

Mw2v , which, when multiplied to P, gives the transformed point P′ in the view coordinate
system (i.e., P′ = Mw2v .P).
In order to do so, we need to find out the sequence of transformations required to align
the two coordinate systems. We can achieve this by two operations: translation and rotation.
We first translate the view coordinate origin to the world coordinate origin. The necessary
translation matrix T is (in homogeneous form),

    T = [ 1  0  0  −ovx ]
        [ 0  1  0  −ovy ]
        [ 0  0  1  −ovz ]
        [ 0  0  0    1  ]

Then, we rotate the view coordinate frame to align it with the world coordinate frame.
The rotation matrix R is (in homogeneous form),

    R = [ ux  uy  uz  0 ]
        [ vx  vy  vz  0 ]
        [ nx  ny  nz  0 ]
        [  0   0   0  1 ]

This sequence is then applied in reverse order on P. Thus, we get Mw2v = R.T. Hence,
P′ = (R.T)P.
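
A compact sketch of this composition Mw2v = R.T is given below; the 4 × 4 list-of-lists representation and the helper names world_to_view_matrix and transform are our own choices, and the example values are the basis vectors of Example 6.1.

    def world_to_view_matrix(origin, u, v, n):
        """Build Mw2v = R.T from the view-coordinate origin (camera position)
        and the unit basis vectors u, v, n, all given in world coordinates."""
        ox, oy, oz = origin
        T = [[1, 0, 0, -ox],
             [0, 1, 0, -oy],
             [0, 0, 1, -oz],
             [0, 0, 0,  1]]
        R = [[u[0], u[1], u[2], 0],
             [v[0], v[1], v[2], 0],
             [n[0], n[1], n[2], 0],
             [0,    0,    0,    1]]
        # matrix product R.T
        return [[sum(R[i][k] * T[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

    def transform(M, p):
        """Apply a 4 x 4 matrix to a point (x, y, z), using homogeneous coordinates."""
        x, y, z = p
        col = (x, y, z, 1)
        return tuple(sum(M[i][k] * col[k] for k in range(4)) for i in range(4))

    M = world_to_view_matrix((1, 2, 2), u=(0, -1, 0), v=(0, 0, 1), n=(-1, 0, 0))
    print(transform(M, (2, 2, 2)))   # the object center of Example 6.1 -> (0, 0, -1, 1)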

6.2 PROJECTION
When we see an image on a screen, it is two-dimensional (2D). The scene in the view coor-
dinate system, on the other hand, is three-dimensional (3D). Therefore, we need a way to
transform a 3D scene to a 2D image. The technique to do that is projection. In general, pro-
jection allows us to transform objects from n dimensions to (n − 1) dimensions. However,
we shall restrict our discussion on projections from 3D to 2D.
In computer graphics, we project the 3D objects onto the 2D view plane (see Fig. 6.2). In
order to do that, we define an area (usually rectangular) on the view plane that contains the
projected objects. This area is known as the clipping window. We also define a 3D volume
in the scene, known as the view volume. Objects that lie inside this volume are projected on
the clipping window. Other objects are discarded (through the clipping process that we shall
discuss in Chapter 7). Note that the entire scene is not projected; instead, only a portion of
it enclosed by the view volume is projected. This approach gives us flexibility to synthesize
images as we want. The trick lies in choosing an appropriate view volume, for which we
require an understanding of the different types of projections.

6.2.1 Types of Projections


The basic idea of projection is as follows: we want to project a 3D object on a 2D view
plane. From each point on the object, we generate straight lines towards the view plane.
These lines are known as projectors. These lines intersect the view plane. The intersection
points together give us the projected image.
Depending on the nature of the projectors, projections can be broadly classified into two
types: parallel and perspective. In parallel projection, the projectors are parallel to each
other. This is not the case in perspective projection, in which the projectors are not parallel
and converge to a center of projection. The idea is illustrated in Fig. 6.5. Note that the center
of projection is at infinity for parallel projection.


Fig. 6.5 Two types of projection (a) Parallel (b) Perspective



Fig. 6.6 The different types of anomalies associated with perspective projection. Foreshortening is
depicted in (a). Note the projected points (A′ ,B′ ) for object AB and (C′ ,D′ ) for object CD, although both
are of the same size. In (b), the concept of vanishing points is illustrated. View confusion is illustrated
in (c).
Since the projectors converge at a point, perspective projection gives rise to several
anomalies (i.e., the appearance of the object in terms of shape and size gets changed).

Perspective foreshortening If two objects of the same size are placed at different distances
from the view plane, the distant object appears smaller than the near objects (see Fig. 6.6(a)).

Vanishing points Lines that are not parallel to the view plane appear to meet at some point
on the view plane after projection. The point is called vanishing point (see Fig. 6.6(b)).

View confusion If the view plane is behind the center of projection, objects in front of the
center of projection appear upside down on the view plane after projection (see Fig. 6.6(c)).
As you can see, the anomalies actually help in generating realistic images since this is the
way we perceive objects in the real world. In contrast, the shape and size of objects are
preserved in parallel projection. Consequently, such projections are not used to generate

[Taxonomy:
    Projections
        Parallel
            Orthographic: Multi-view; Axonometric (Isometric, Dimetric, Trimetric)
            Oblique: Cavalier, Cabinet
        Perspective: One-point, Two-point, Three-point]

Fig. 6.7 Taxonomy of different projection types


realistic scenarios (such as in computer games or animations). Instead, they are more useful
for graphics systems that deal with engineering drawings (such as CAD packages).
Although we have mentioned two broad categories of projection, there are many sub-
categories under parallel and perspective projection. The complete taxonomy of projections
is shown in Fig. 6.7.
When projectors are perpendicular to the view plane, the resulting projection is called
orthographic. There are broadly two types of orthographic projections. In multiview ortho-
graphic projection, principal object surfaces are parallel to the view plane. Three types of
principal surfaces are defined: top (resulting in top view), front (resulting in front view), and
side (resulting in side view). The three views are illustrated in Fig. 6.8.
In contrast, no principal surface (top, front, or side) is parallel to the view plane in
axonometric orthographic projection. Instead, they are at certain angles with the view plane
(see Fig. 6.9). Note that the principal faces can make three angles with the view plane as


Fig. 6.8 Three types (top, side, and front view) of multiview orthographic projections


Fig. 6.9 Axonometric orthographic projection where principal surfaces are at certain angles
with the view plane


illustrated in Fig. 6.9. Depending on how many of these angles are equal to each other, three
types of axonometric projections are defined: isometric, when all the three angles are equal
to each other; dimetric, when two of the three angles are equal to each other; and trimetric,
when none of the angles is equal to the other.
When the projectors are not perpendicular to the view plane (but parallel to each other),
we get oblique parallel projection, as illustrated in Fig. 6.10. In oblique projection, if lines


Fig. 6.10 Oblique parallel projection where projectors are not perpendicular to the view
plane


Fig. 6.11 Three types of perspective projections, depending on the number of vanishing
points (a) One-point (b) Two-point (c) Three-point


perpendicular to the view plane are foreshortened by half after projection, it is known as
cabinet projection. When there is no change in the perpendicular lines after projection, it is
called cavalier projection.
Recall that in perspective projection, we have the idea of vanishing points. These are
basically perceived points of convergence of lines that are not parallel to the view plane.
Depending on the orientation of the object with respect to the view plane, we can have
between one to three vanishing points in the projected figure. Depending on the number of
vanishing points, we define perspective projections as one-point (see Fig. 6.11a), two-point
(see Fig. 6.11b) or three-point (see Fig. 6.11c).
In the next section, we shall outline the fundamental concepts involved in computing
projected points on the view plane. However, we shall restrict our discussion to the two
broad classes of projections. Details regarding the individual projections under these broad
classes will not be discussed, as such details are not necessary for our basic understanding
of projection. In all subsequent discussions, the term parallel projection will be used to refer
to parallel orthographic projections only.

6.2.2 Projection Transformation


Let us come back to our camera analogy. What we capture on film is 2D, after projection
of the objects on the film plane (view plane). The same idea is implemented in computer
graphics. After the objects are transformed to the view coordinate system, those are pro-
jected on the view plane. This is achieved through the projection transformations. Similar
to all the other transformations we have encountered so far (modeling and view transforma-
tions), projection transformations are also performed with matrix multiplication (between
the coordinate vectors and the projective transformation matrices). In this section, we shall
see how the matrices are derived.
Recall that projection requires us to define a view volume, which is the region in the 3D
view coordinate system that encloses all the objects to be projected. The shape of the view
volume depends on the type of projection we want. For parallel projection, the view volume
takes the shape of a rectangular parallelepiped, as shown in Fig. 6.12(a). The shape is defined
by its six planes (near, far, top, bottom, right and left). The near plane is the view plane on
which the clipping window is present. A frustum is used to represent the view volume for
perspective projection, as shown in Fig. 6.12(b), defined by the six planes.
In terms of the view volumes, let us try to understand the derivation of the projection
matrices. We begin with parallel projection. Consider Fig. 6.13. The point P is in the view
volume with coordinate (x, y, z). It is projected as the point P′ (x′ , y′ , z′ ) on the clipping win-
dow. Assuming that the near plane is at a distance d along the −z direction, the coordinate
of P′ can be derived simply as x′ = x, y′ = y and z′ = −d.
Therefore, the transformation matrix for parallel projection is,

    Tpar = [ 1  0  0   0 ]
           [ 0  1  0   0 ]
           [ 0  0  0  −d ]
           [ 0  0  0   1 ]



Fig. 6.12 The shape of the view volumes for the two basic projection types. The parallelepiped
in (a) is used for parallel projection and the frustum in (b) is used for perspective
projection.


Fig. 6.13 Illustration for derivation of the transformation matrix for parallel projection

Hence, in terms of matrix multiplication, we can write,

    P″ = [ x″ ] = Tpar P = [ 1  0  0   0 ] [ x ]
         [ y″ ]            [ 0  1  0   0 ] [ y ]
         [ z″ ]            [ 0  0  0  −d ] [ z ]
         [ w  ]            [ 0  0  0   1 ] [ 1 ]

Note that the multiplication is performed in the homogeneous coordinate system. Therefore,
the real coordinates of P′ are x′ = x″/w, y′ = y″/w, and z′ = z″/w.
Obviously, derivation of the transformation matrix for perspective projection involves a
little more calculation. Consider Fig. 6.14, which shows a side view seen along the −x
direction. We need to derive the transformation matrix that projects the point P (x, y, z) to
the P′ (x′ , y′ , z′ ).
The original and the projected points are part of two similar triangles. As a result, we
can say y/y′ = −z/−d, or y′ = y(d/z). In a similar way, we can derive the projected x coordinate



Fig. 6.14 Illustration for the derivation of the transformation matrix for perspective projection

x′ as x′ = x(d/z). It is obvious that z′ = −d. We can use these expressions to construct the
transformation matrix in homogeneous coordinate form as,

    Tpsp = [ 1  0   0   0 ]
           [ 0  1   0   0 ]
           [ 0  0  −1   0 ]
           [ 0  0  1/d  0 ]

Finally, in terms of matrix multiplication, we can write,

    P″ = [ x″ ] = Tpsp P = [ 1  0   0   0 ] [ x ]
         [ y″ ]            [ 0  1   0   0 ] [ y ]
         [ z″ ]            [ 0  0  −1   0 ] [ z ]
         [ w  ]            [ 0  0  1/d  0 ] [ 1 ]

As in the case of parallel projection, we need to compute the coordinates of the projected
point as x′ = x″/w, y′ = y″/w, and z′ = z″/w, since the transformation matrix is in homogeneous
form.
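
Both matrices can be exercised with a short script. The helper names project_parallel and project_perspective below are ours; the homogeneous divide follows the expressions above.

    def apply_homogeneous(M, p):
        """Multiply a 4 x 4 matrix with the homogeneous point (x, y, z, 1) and divide by w."""
        x, y, z = p
        col = (x, y, z, 1)
        xx, yy, zz, w = (sum(M[i][k] * col[k] for k in range(4)) for i in range(4))
        return (xx / w, yy / w, zz / w)

    def project_parallel(p, d):
        """Parallel (orthographic) projection onto the near plane z = -d."""
        Tpar = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, -d], [0, 0, 0, 1]]
        return apply_homogeneous(Tpar, p)

    def project_perspective(p, d):
        """Perspective projection onto the near plane z = -d, center of projection at the origin."""
        Tpsp = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 1.0 / d, 0]]
        return apply_homogeneous(Tpsp, p)

    # the point (0, 0, -1) of Examples 6.2 and 6.3, projected on the plane z = -0.5
    print(project_parallel((0, 0, -1), 0.5))      # -> (0.0, 0.0, -0.5)
    print(project_perspective((0, 0, -1), 0.5))   # -> (-0.0, -0.0, -0.5)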

Example 6.2
What would be the coordinates of the object center in Example 6.1 on a view plane z = −0.5, if
we want to synthesize images of the scene using parallel projection? Assume that the view volume is
sufficiently large to encompass the whole transformed object.

Solution The coordinates of the point in the view coordinate system are (0, 0, −1). We also know
that the transformation matrix for parallel projection is

    Tpar = [ 1  0  0    0  ]
           [ 0  1  0    0  ]
           [ 0  0  0  −0.5 ]
           [ 0  0  0    1  ]


Hence, the coordinates of the point after projection will be,

    Tpar P = [ 1  0  0    0  ] [  0 ]   [   0  ]
             [ 0  1  0    0  ] [  0 ] = [   0  ]
             [ 0  0  0  −0.5 ] [ −1 ]   [ −0.5 ]
             [ 0  0  0    1  ] [  1 ]   [   1  ]

In other words, the point will be projected to (0,0,−0.5) on the view plane.

Example 6.3
What would happen to the point if we perform a perspective projection on the view plane z = −0.5,
with the view coordinate origin as the center of projection? Assume that the view volume is
sufficiently large to encompass the whole transformed object.
Solution We proceed as before. The transformation matrix for perspective projection is

    Tpsp = [ 1  0   0     0 ]
           [ 0  1   0     0 ]
           [ 0  0  −1     0 ]
           [ 0  0  1/0.5  0 ]
The point before projection is at (0,0,−1). The new coordinates of the point after projection will be,

    Tpsp P = [ 1  0   0     0 ] [  0 ]   [  0 ]
             [ 0  1   0     0 ] [  0 ] = [  0 ]
             [ 0  0  −1     0 ] [ −1 ]   [  1 ]
             [ 0  0  1/0.5  0 ] [  1 ]   [ −2 ]
The derived point is in homogeneous form as before. However, we have −2 as the homogeneous
factor. Thus, the projected point is (0/−2, 0/−2, 1/−2). In other words, the point will be projected to
(0,0,−0.5) on the view plane.

6.2.3 Canonical View Volume and Depth Preservation


An important stage in the graphics pipeline is clipping, which we shall discuss in Chap-
ter 7. The objective of this stage is to remove all the objects that lie outside the view
volume. As we shall see, such removal often requires lots of calculation to determine
object surface-view volume boundary intersection points. If such intersection calculations
are to be performed with respect to any arbitrary view volume, computation time may
increase significantly. Instead, what we can do is to define the clipping procedures with
respect to a standard view volume. This standard volume is known as the canonical view
volume (CVV).
Figure 6.15 shows the CVV for parallel projection. Note that the CVV is a cube within
the range [−1,1] along the x, y, and z directions1 . As you can see, any arbitrary view vol-
ume can be transformed to the CVV using the scaling operations along the three axial
directions.
Canonical view volume for perspective projection is a little tricky. For ease of clipping
computations, perspective view frustums are also transformed to parallel CVV (i.e., the
cube within the range [−1,1] along the x, y, and z directions). Clearly, this transforma-
tion is not as straightforward as in the case of parallel projection and involves composition
of two types of modeling transformations: shearing and scaling. The idea is illustrated
in Fig. 6.16.
With the idea of CVV, let us now try to understand the sequence of transformations that
take place when a point P in the world coordinate is projected on the view plane. First, it
gets transformed to the view coordinate system. Next, the view volume in which the point
lies is transformed to CVV. Finally, the point in the CVV is projected. In matrix notation,
we can write this series of steps as: P′ = Tproj .Tcvv .TVC P, where Tproj is the projection
transformation matrix, Tcvv is the matrix for transformation to the canonical view volume
and TVC is the matrix for transformation to the view coordinate.
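
For the parallel case, the arbitrary-volume-to-CVV step amounts to a translate and scale along each axis. The sketch below shows one possible normalization, assuming (our assumption, consistent with the corner labels of Fig. 6.15) a parallel view volume bounded by [xmin, xmax] and [ymin, ymax] in x and y, and by the near and far distances along −z.

    def parallel_to_cvv(p, xmin, xmax, ymin, ymax, near, far):
        """Map a view-coordinate point p = (x, y, z) inside the parallel view volume to the CVV [-1, 1]^3.
        The corner (xmin, ymin, -near) goes to (-1, -1, 1) and (xmax, ymax, -far) goes to (1, 1, -1)."""
        x, y, z = p
        xn = 2.0 * (x - xmin) / (xmax - xmin) - 1.0
        yn = 2.0 * (y - ymin) / (ymax - ymin) - 1.0
        zn = (2.0 * z + far + near) / (far - near)
        return (xn, yn, zn)

    print(parallel_to_cvv((0, 0, -1), -2, 2, -2, 2, near=1, far=10))   # -> (0.0, 0.0, 1.0): a point on the near plane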

[An arbitrary view volume with corners (Xmin, Ymin, −near) and (Xmax, Ymax, −far) is mapped to the CVV with corners (−1, −1, 1) and (1, 1, −1).]

Fig. 6.15 Canonical view volume for parallel projection


Fig. 6.16 The canonical view volume for perspective projection.Note that the frustum should
be sheared along x and y directions and scaled along z direction to obtain the CVV.

1 Another variation is also used for parallel projection in which the CVV is a unit cube that lies within the
range [0,1] along each of the three axial directions.


There is one more thing that we need to remember in projection. We mentioned
that in projection, 3D points are mapped to 2D. This is achieved by removing the z
(depth) component of the points. However, in the implementation of graphics pipeline,
the depth component is not removed. In other words, while computing the transformed
coordinates of a point, its original z (depth) value is preserved in a separate stor-
age known as the z(depth)-buffer. The depth information is required to perform a later
stage of the graphics pipeline, namely hidden surface removal, which we shall discuss
in Chapter 8.

6.3 WINDOW-TO-VIEWPORT TRANSFORMATION


So far, we have discussed the steps involved in transforming a point in the world coordinate
to the clipping window on the view plane. Note that the clipping window is the near plane of
the canonical view volume (see Fig. 6.15). For the ease of computations, it is assumed that
the window is at zero depth (i.e., z = 0). Moreover, since the x and y extents of the window
are fixed [between −1 to 1], the coordinates of the points in the window have a fixed range,
irrespective of their actual position in the world coordinate. For this reason, the clipping
window on the CVV is often called the normalized window.
Clipping window is an abstract and intermediate concept in the process of image syn-
thesis. The points on the clipping window constitute the objects that we want to show
on the screen. However, the scene may or may not occupy the whole screen. The idea
is illustrated in Fig. 6.17, in which the content of the clipping window is displayed
on a portion of the screen. Hence, we should distinguish between the following two
concepts.

Window This is the same as (normalized) clipping window. The world coordinate objects
that we want to display are projected on this window.

Viewport The objects projected on the window may be displayed on the whole screen or
a portion of it. The rectangular region on the screen, on which the content of the window is
rendered, is known as the viewport.


Fig. 6.17 The difference between window and viewport. Window contains the objects to be
displayed (left figure). Viewport is the region on the screen, where the window contents are
displayed (right figure).



Fig. 6.18 Window-to-viewport mapping. Note that the window coordinates are generalized
in the illustration instead of the normalized coordinates.

Note that the viewport is defined in the device space. In other words, it is defined with
respect to the screen origin and dimensions. So, one more transformation is required to
transfer points from the window (in the view coordinate system) to the viewport (in the
device coordinate system). Let us try to understand the derivation of the transformation
matrix.
Consider Fig. 6.18. The point (Wx, Wy) in the window is transformed to the viewport
point (Vx, Vy). The window lies within [Wx1, Wx2] along the X axis and [Wy1, Wy2]
along the Y axis. The viewport ranges between [Vx1, Vx2] and [Vy1, Vy2] along the X
and Y directions, respectively. In order to maintain the relative position of the point in the
viewport, we must have,

    (Wx − Wx1)/(Wx2 − Wx1) = (Vx − Vx1)/(Vx2 − Vx1)
This relation can be rewritten as,

Vx = sx.Wx + tx

where,

    sx = (Vx2 − Vx1)/(Wx2 − Wx1) and tx = sx.(−Wx1) + Vx1.
Maintenance of relative position of the point in the viewport also implies that,

    (Wy − Wy1)/(Wy2 − Wy1) = (Vy − Vy1)/(Vy2 − Vy1)

From this relation, we can derive the y-coordinate of the point in viewport Vy in a way
similar to that of Vx as,

Vy = sy.Wy + ty

where,

    sy = (Vy2 − Vy1)/(Wy2 − Wy1) and ty = sy.(−Wy1) + Vy1.


From the expressions for Vx and Vy as derived here, we can form the viewport
transformation matrix Tvp as,

    Tvp = [ sx  0  tx ]
          [  0 sy  ty ]
          [  0  0   1 ]

Thus, to get the point on the viewport Pvp (x′ , y′ ) from the window point Pw (x, y), we need
to perform the following matrix multiplication.

    [ x″ ] = Tvp Pw = [ sx  0  tx ] [ x ]
    [ y″ ]            [  0 sy  ty ] [ y ]
    [ w  ]            [  0  0   1 ] [ 1 ]

The coordinates of Pvp are computed as x′ = x″/w, y′ = y″/w, since the matrices are in
homogeneous form.
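
The same computation in code; the function name window_to_viewport and the tuple arguments are our own choices.

    def window_to_viewport(wx, wy, win, vp):
        """Map a point (wx, wy) in the clipping window to the viewport.
        win = (Wx1, Wx2, Wy1, Wy2), vp = (Vx1, Vx2, Vy1, Vy2)."""
        Wx1, Wx2, Wy1, Wy2 = win
        Vx1, Vx2, Vy1, Vy2 = vp
        sx = (Vx2 - Vx1) / (Wx2 - Wx1)
        sy = (Vy2 - Vy1) / (Wy2 - Wy1)
        tx = sx * (-Wx1) + Vx1
        ty = sy * (-Wy1) + Vy1
        return (sx * wx + tx, sy * wy + ty)

    # the normalized-window point (0, 0) of Example 6.4 lands at the viewport center
    print(window_to_viewport(0, 0, win=(-1, 1, -1, 1), vp=(4, 6, 4, 8)))   # -> (5.0, 6.0)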

Example 6.4
Let us assume that the point is projected on a normalized clipping window (as you can see, the
projected point in either parallel or perspective projection is (0,0,−0.5), which lies at the cen-
ter of the normalized window). We want to show the scene on a viewport having lower left and
top right corners at (4,4) and (6,8) respectively. What would the position of the point be in the
viewport?

Solution Since the clipping window is normalized, we have Wx1 = −1, Wx2 = 1, Wy1 = −1
and Wy2 = 1. Also, from the viewport specification, we have Vx1 = 4, Vx2 = 6, Vy1 = 4 and
Vy2 = 8. Therefore, sx = (6 − 4)/(1 − (−1)) = 1, sy = (8 − 4)/(1 − (−1)) = 2, tx = 1.(−(−1)) + 4 = 5,
and ty = 2.(−(−1)) + 4 = 6. Thus, the viewport transformation matrix is,

    Tvp = [ 1  0  5 ]
          [ 0  2  6 ]
          [ 0  0  1 ]

The new coordinates of the point after viewport transformation will be,

    [ 1  0  5 ] [   0  ]   [ −2.5 ]
    [ 0  2  6 ] [   0  ] = [  −3  ]
    [ 0  0  1 ] [ −0.5 ]   [ −0.5 ]

Since the derived point is in homogeneous form with −0.5 as the homogeneous factor, the transformed
point is (−2.5/−0.5, −3/−0.5). In other words, the viewport coordinates of the projected point
will be (5,6).


Steps performed in the 3D viewing stage

In the 3D viewing stage of the graphics pipeline, objects from the world coordinate are transformed to the viewport objects through a series of transformations. These are,

View transformation The world coordinate objects are transformed to the view coordinate system. This step involves setting-up of the view coordinate system and transforming the objects to it.

Canonical view volume We define a view volume and transform it to the canonical view volume.

Projection transformation The objects within the canonical view volume are projected on the (normalized) clipping window (the near face of the canonical view volume).

Viewport transformation From the clipping window, the projected objects are transformed to the viewport defined in the device coordinate system.

Thus, there are altogether four transformations that take place in this stage. Along with those, this stage also involves setting-up of the view coordinate system and the view volume. Two other important components of this stage are clipping, in which objects outside the view volume are removed (before projection), and hidden surface removal, both of which we shall discuss in Chapters 7 and 8.

SUMMARY
In this chapter, we learnt about the process of transforming objects from world coordinate to
viewport, which is an important and essential part of the image synthesis process. The trans-
formation takes place in distinct stages, which are analogous to the process of capturing a
picture with your camera. The very first step is to set-up the view coordinate system. In this
stage, the three orthogonal unit basis vectors of the view coordinate system are determined
from three input parameters: the camera position or the view coordinate origin, the look-at point,
and the view-up point (or view-up vector). After the formation of the view coordinate system, we
transform the object to the view coordinate system.
The transformed objects are still in 3D. We transform them to the 2D view plane through projection.
We learnt about the two basic types of projections: parallel and perspective. In parallel
projection, the projectors are parallel to each other. The projectors converge to a center of projection
in perspective projection. Before projection, we first define a view volume, a 3D region that
encloses the objects we want to be part of the image. For parallel projection, the view volume
takes the shape of a rectangular parallelepiped. It takes the shape of a frustum for perspective
projection. For computational efficiency in subsequent stages of the graphics pipeline, the view
volumes are transformed to the canonical view volume, which is a cube with all its points lying within
the range [−1,−1,−1] to [1,1,1]. Transformation to the canonical view volume requires scaling for
a parallel view volume and a combination of shear and scale for a perspective view volume.
The objects in the canonical volume are projected on its near or view plane, which acts as the
normalized clipping window. The window is in the view coordinate system. A final transformation is
applied on the points in the window to transform them to points in the viewport, which is defined
in the device coordinate system.
As we have seen, a total of four transformations take place in this stage: world to view coordinate
transformation, view volume to canonical view volume transformation, projection transformation,
and window-to-viewport transformation. When we define a view volume, the objects that lie outside
need to be removed. This is done in the clipping stage, which we shall discuss in Chapter 7.
Moreover, to generate realistic images, we need to perform hidden surface removal, which we
shall learn in Chapter 8.


BIBLIOGRAPHIC NOTE
Please refer to the bibliographic note of Chapter 7 for further reading.

KEY TERMS
Axonometric projection – a type of parallel projection in which the principal object surfaces are
not parallel to the view plane
Canonical view volume – a standardized view volume
Center of interest/Look-at point – the point in the world coordinate frame with respect to which
we focus our camera while taking a photograph
Center of projection – the point where the projectors meet in perspective projection
Clipping window – a region (usually rectangular) of the view plane
Oblique projection – a type of parallel projection in which the projectors are not perpendicular to
the view plane
Orthographic projection – a type of parallel projection in which projectors are perpendicular to
the view plane.
Parallel projection – a type of projection in which the projectors are parallel to each other
Perspective foreshortening – an effect due to perspective projection in which the closer objects
appear larger
Perspective projection – a type of projection in which the projectors meet at a point
Projection – the process of mapping an object from an n-dimensional space to an n − 1
dimensional space
Projectors – lines that originate from the object points to be projected and intersect the view plane
Vanishing points – an effect due to perspective projection in which lines that are not parallel to the
view plane appear to meet at a point after projection
View confusion – an effect due to perspective projection in which the objects appear upside down
after projection
View coordinate – the coordinate reference frame used to represent a scene with respect to
camera parameters
View plane – the plane on which a 3D object is projected
View volume – a 3D region in space (in the view coordinate system) that is projected on the view
plane
View-up vector – a vector towards the direction of our head while taking a photograph with a
camera
Viewing transformation – the process of mapping a world coordinate object description to the
view coordinate frame
Viewport – a rectangular region on the display screen where the content of the window is rendered
Window – a term used to denote the clipping window on the view plane

EXERCISES
6.1 Discuss the similarities between taking a photograph with a camera and transforming a
world coordinate object to the view plane. Is there any difference?
6.2 Mention the inputs we need to construct a view coordinate system.
6.3 Explain the process of setting-up of the view coordinate system.
6.4 How do we transform objects from world coordinate to view coordinate? Explain.
6.5 What are the broad categories of projections? Discuss their difference(s).


6.6 Why are perspective projections preferable over parallel projections to generate realistic
effects? When do we need parallel projection?
6.7 Illustrate with diagrams the view volumes associated with each of the two broad projection
types.
6.8 Why do we need canonical view volumes? Discuss how we can transform arbitrary volumes
to canonical forms.
6.9 Derive the transformation matrices for parallel and perspective projections.
6.10 How is viewport different from window? Derive the window-to-viewport transformation
matrix.
6.11 Does the projection transformation truly transform objects from 3D to 2D? Discuss.
6.12 Consider a spherical object centered at (1,1,1) with a radius of 1 unit. A camera is located
at the point (1,1,4) and the look-at point is (1,1,2). The up direction is along the negative Y
direction. Answer the following.
(a) Determine the coordinates of the point Pv in the view coordinate system to which the
point P(1,1,2) is transformed.
(b) Assume a sufficiently large view volume that encloses the entire sphere after transfor-
mation. Its near plane is z = −1 and the clipping window is defined between [−10,−10]
to [10,10]. What would the position Pw of the point Pv on the clipping window be after
we perform (i) parallel and (ii) perspective projection? Ignore canonical transformations.
(c) Assume a view port defined between [2,3] (lower left corner) and [5,6] (top right corner)
in the device coordinate system. Determine the position of Pw on this viewport after
window-to-viewport transformation.


CHAPTER 7
Clipping
Learning Objectives
After going through this chapter, the students will be able to
• Understand the idea of clipping in two and three dimensions
• Learn about point clipping in two and three dimensions
• Learn the Cohen–Sutherland line clipping algorithm in two and three dimensions
• Understand the working of the parametric line clipping algorithm with the Liang–Barsky
algorithm
• Know about the fill area clipping issues
• Learn about the Sutherland–Hodgeman fill area clipping algorithm for two and three
dimensions
• Understand the steps of the Weiler–Atherton fill area clipping algorithm
• Learn the algorithm to convert a convex polygon into polygonal meshes, which is required
for three-dimensional fill area clipping

INTRODUCTION
In Chapter 6, we discussed the concept of view volume. As you may recall, before projec-
tion on the view plane, we define a 3D region (in the view coordinate system) that we call the
view volume. Objects within this region are projected on the view plane while objects that
lie outside the volume boundary are discarded. An example is shown in Fig. 7.1. The process
of discarding objects that lie outside the volume boundary is known as clipping. How does
the computer discard (or clip) objects? We employ some programs or algorithms for this
purpose, which are collectively known as the clipping algorithms. In this chapter, we shall
learn about these algorithms.
Two things are to be noted here. Recall the important concept we learnt in Chapter 6,
namely the canonical view volume (the cube). The algorithms we discuss in this chapter
shall assume that the clipping is done against canonical view volumes only. Moreover, for
the ease of understanding the algorithms, we shall first discuss the clipping algorithms
in 2D. Clipping in 3D is performed by extending the 2D algorithms, which we shall
discuss next.



Fig. 7.1 Concept of clipping. The objects that lie outside the boundary, either fully or partially,
are to be discarded or clipped. This is done with the clipping algorithms.

7.1 CLIPPING IN 2D
Unlike the view volume which is a 3D concept, we shall assume a view window, which is a
square-shaped region on the view plane, to discuss 2D clipping algorithms. This is equivalent
to assuming that the view volume and all the objects are already projected on the view
plane. The view volume is projected to form the window. Other objects are projected to form
points, lines, and fill-areas (e.g., an enclosed region such as a polygon). Thus, our objective
is to clip points, lines, and fill-areas with respect to the window.
The simplest is point clipping. Given a point with coordinate (x,y), we simply check if
the coordinate lies within the window boundary. In other words, if wxmin ≤ x ≤ wxmax
AND wymin ≤ y ≤ wymax , we keep the point; otherwise, we clip it out. (wxmin , wxmax ) and
(wymin , wymax ) are the minimum and maximum x and y coordinate values of the window,
respectively.
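
To make the test concrete, a minimal Python sketch of 2D point clipping is given below; the function name and the way the window is passed are our own choices, not part of the text.

def clip_point(x, y, wxmin, wymin, wxmax, wymax):
    # Keep the point only if it lies within (or on) the window boundary
    return wxmin <= x <= wxmax and wymin <= y <= wymax

# Window with corners (2,2) and (4,4), as in the examples that follow
print(clip_point(3, 3, 2, 2, 4, 4))   # True  -> retain the point
print(clip_point(5, 3, 2, 2, 4, 4))   # False -> clip it out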
Line clipping is not so easy, however. We can represent any line segment with its end
points. For clipping, we can check the position of these end points to decide whether to clip
the line or not. If we follow this approach, one of the following three scenarios can occur,
which is illustrated in Fig. 7.2.
1. Both the end points are within the window boundary. In such cases, we don’t clip the line.
2. One end point is inside and the other point is outside. Such lines must be clipped.
3. Both the end points are outside. We cannot say for sure if the whole line is outside the
window or part of it lies inside (see Fig. 7.2). Thus, we have to check for line–boundary
intersections to decide if the line needs to be clipped or not.
As you can see in Fig. 7.2, when both the line end points are outside of the window, the
line may be fully or partially outside. We cannot determine this from just the position of
the end points. What we can do is to determine if there are intersection points of the line
and the window boundaries (see Appendix for calculation of intersection points between
two lines). Thus, given a line with both end points outside, we have to check for line–
window intersection with all four window boundary line segments. Clearly, the process is time-consuming.


Fig. 7.2 Three scenarios for line clipping. For the line L1 , both the line endpoints are inside
the window, so we don’t clip the line. For L2 , one end point is inside and the other one is
outside, so we clip it. In case of L3 and L4 , both the end points are completely outside the
window. However, L3 is partially inside and needs further checking to determine the portion
to be clipped.

In a real-world application, which may require thousands of line-clipping operations in rendering a scene, this is not practical and we need more efficient clipping algorithms.

7.1.1 Cohen–Sutherland Line Clipping Algorithm


The Cohen–Sutherland line clipping algorithm is one efficient way of performing line clip-
ping. In this algorithm, the world space (the window and its surrounding) is assumed to be
divided into nine regions. The regions are formed by extending the window boundaries, as
shown in Fig. 7.3. Each of these regions has a 4-bit unique code as identifier. Each bit in
the code indicates the position (above, below, right, or left, in that order from left to right)
of the region with respect to the window, as shown in Fig. 7.3. For example, a code of 1001
indicates that the corresponding region is situated above left of the window.

Above left   Above    Above right          1001   1000   1010
Left         Window   Right                0001   0000   0010
Below left   Below    Below right          0101   0100   0110

Bit 3: Above    Bit 2: Below    Bit 1: Right    Bit 0: Left
(The 4-bit code and significance of each bit)

Fig. 7.3 The nine regions of the Cohen–Sutherland algorithm. The left figure on top shows
the regions while the right figure shows the region codes. The four bit code is explained in
the bottom figure. Note that the window gets a code 0000.


Given the two end points of a line, the algorithm first assigns region codes to the
end points. Let an end point be denoted by P(x, y) and the window be specified by
(xmin , xmax , ymin , ymax ) (i.e., the x and y extents of its boundary). Then, we can determine
region code of P through the following simple check.

Bit 3 = sign(y − ymax )


Bit 2 = sign(ymin − y)
Bit 1 = sign(x − xmax )
Bit 0 = sign(xmin − x)

where sign(a) = 1 if a is positive, 0 otherwise.


Once the region codes are assigned to both the end points, the following checks are
performed and the corresponding action taken.

Algorithm 7.1 Cohen–Sutherland line clipping algorithm

1: Input: A line segment with end points PQ and the window parameters (xmin , xmax , ymin , ymax )
2: Output: Clipped line segment (NULL if the line is completely outside)
3: for each end point with coordinate (x,y), where sign(a) = 1 if a is positive, 0 otherwise do
4: Bit 3 = sign (y − ymax )
5: Bit 2 = sign (ymin − y)
6: Bit 1 = sign (x − xmax )
7: Bit 0 = sign (xmin − x)
8: end for
9: if both the end point region codes are 0000 then
10: RETURN PQ.
11: else if logical AND (i.e., bitwise AND) of the end point region codes ≠ 0000 then
12: RETURN NULL
13: else
14: for each boundary bi where bi = above, below, right, left, do
15: Check corresponding bit values of the two end point region codes
16: if the bit values are same, then
17: Check next boundary
18: else
19: Determine bi -line intersection point using line equation
20: Assign region code to the intersection point
21: Discard the line from the end point outside bi to the intersection point (as it is outside the
window)
22: if the region codes of both the intersection point and the remaining end point are 0000 then
23: Reset PQ with the new end points
24: end if
25: end if
26: end for
27: RETURN modified PQ
28: end if

i i

i i
i i

“Chapter-7” — 2015/9/15 — 9:37 — page 134 — #5


i i

134 Computer Graphics

1. If both the end point region codes are 0000, the line is completely inside the window.
Retain the line.
2. If logical AND (i.e., bitwise AND) of the end point region codes is not equal to 0000, the
line is completely outside the window. Discard the entire line.
However, when none of these above cases occur, the line is partially inside the window
and we need to clip it. For clipping, we need to calculate the line intersection point with
window boundaries. This is done by taking one end point and following some order for
checking, e.g., above, below, right, and left. For each boundary, we compare the correspond-
ing bit values of the two end point region codes. If they are not the same, the line intersects
that particular boundary. Using the line equation, we determine the intersection point and
assign the region code to the intersection point as before. In the process, we discard the line
segment outside the window. Next, we compare the two new end points to see if they are
completely inside the window. If not, we take the other end point and repeat the process. The
pseudo-code of the algorithm is shown in Algorithm 7.1.
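
A small Python sketch of the procedure is given below. The bit layout follows the region codes described above; the helper names are ours, and the loop re-computes the region code of a clipped end point instead of walking the boundaries in a fixed order, which is a common simplification of Algorithm 7.1 that produces the same result.

ABOVE, BELOW, RIGHT, LEFT = 8, 4, 2, 1          # bits 3, 2, 1, 0 of the region code

def region_code(x, y, xmin, xmax, ymin, ymax):
    # Assign the 4-bit region code of point (x, y) with respect to the window
    code = 0
    if y > ymax: code |= ABOVE
    if y < ymin: code |= BELOW
    if x > xmax: code |= RIGHT
    if x < xmin: code |= LEFT
    return code

def cohen_sutherland_clip(x1, y1, x2, y2, xmin, xmax, ymin, ymax):
    # Return the clipped segment as a pair of end points, or None if fully outside
    c1 = region_code(x1, y1, xmin, xmax, ymin, ymax)
    c2 = region_code(x2, y2, xmin, xmax, ymin, ymax)
    while True:
        if c1 == 0 and c2 == 0:                 # both codes 0000: retain the line
            return (x1, y1), (x2, y2)
        if c1 & c2 != 0:                        # bitwise AND != 0000: discard the line
            return None
        c_out = c1 if c1 != 0 else c2           # pick an end point that lies outside
        if c_out & ABOVE:
            x, y = x1 + (x2 - x1) * (ymax - y1) / (y2 - y1), ymax
        elif c_out & BELOW:
            x, y = x1 + (x2 - x1) * (ymin - y1) / (y2 - y1), ymin
        elif c_out & RIGHT:
            x, y = xmax, y1 + (y2 - y1) * (xmax - x1) / (x2 - x1)
        else:                                   # LEFT
            x, y = xmin, y1 + (y2 - y1) * (xmin - x1) / (x2 - x1)
        if c_out == c1:                         # move the outside end point to the boundary
            x1, y1, c1 = x, y, region_code(x, y, xmin, xmax, ymin, ymax)
        else:
            x2, y2, c2 = x, y, region_code(x, y, xmin, xmax, ymin, ymax)

# Example 7.2: clip P(3,3)-Q(5,2) against the window with corners (2,2) and (4,4)
print(cohen_sutherland_clip(3, 3, 5, 2, 2, 4, 2, 4))    # ((3, 3), (4, 2.5))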

Example 7.1
Consider the line segment AB in Fig. 7.4.

Fig. 7.4 The line segment A(5,3)B(6,2) and the clipping window with corners (2,2) and (4,4)
From the figure, we see that xmin = 2, xmax = 4, ymin = 2, and ymax = 4. Also, A(5,3) and
B(6,2). The first step is to determine the region codes of A and B (lines 3–8 of Algorithm 7.1).
Let’s consider A first. We can see that for A,

Bit 3 = sign (3 − 4) = sign (−1) = 0


Bit 2 = sign (2 − 3) = sign (−1) = 0
Bit 1 = sign (5 − 4) = sign (1) = 1
Bit 0 = sign (2 − 5) = sign (−3) = 0

since sign(a) = 0 if a ≤ 0. Thus, the region code of A is 0010. Similarly the region code of B
is derived as 0010. The next step (lines 9–27 of Algorithm 7.1) is the series of checks. The first
check fails as both the end points are not 0000. However, the second check succeeds since the
logical AND of AB is 0010 (i.e., ≠ 0000). Hence, we do not need to go any further. The line is
totally outside the window boundary. We do not need to clip it and discard it as a whole.


Example 7.2
Consider the line segment PQ in Fig. 7.5.

Fig. 7.5 The line segment P(3,3)Q(5,2) and the clipping window with corners (2,2) and (4,4)

We have xmin = 2, xmax = 4, ymin = 2, and ymax = 4. Also, P(3,3) and Q(5,2). We first determine
the region codes of P and Q (lines 3–8 of Algorithm 7.1). Let us consider P first. We can see that
for P,

Bit 3 = sign (3 − 4) = sign (−1) = 0


Bit 2 = sign (2 − 3) = sign (−1) = 0
Bit 1 = sign (3 − 4) = sign (−1) = 0
Bit 0 = sign (2 − 3) = sign (−1) = 0

Thus, the region code of P is 0000. Similarly the region code of Q is derived as 0010. The next step
(lines 9–27 of Algorithm 7.1) is the series of checks. The first check fails as both the end points
are not 0000. The second check also fails as the logical AND of PQ is 0000. Hence, we need to
determine line–boundary intersection.
From the end points, we can derive the line equation as y = −(1/2)x + 9/2 (see Appendix for the
derivation of line equation from end points). Now, we have to check for the intersection of this
line with the boundaries following the order: above, below, right, left. The aforementioned bit val-
ues (bit 3) of P and Q are the same. Hence, the line does not cross above boundary (lines 16–19
of Algorithm 7.1). Similarly, we see that it does not cross the below boundary. However, for the
right boundary, the two corresponding bits (bit 1) are different. Hence, the line crosses the right
boundary.
The equation of the right boundary is x = 4. Putting this value in the line equation, we get
the intersection point as Q′(4, 5/2). We discard the line segment Q′Q since Q is outside the right
boundary (line 21 of Algorithm 7.1). Thus, the new line segment becomes PQ′ . We determine
the region code of Q′ as 0000. Since both P and Q′ have region code 0000, the algorithm resets
PQ by changing Q to Q′ (lines 22–23 of Algorithm 7.1). Finally, we check the left boundary.
Since there is no intersection (bit 0 is same for both end points), the algorithm returns PQ′
and stops.


Example 7.3
Consider the line segment MN in Fig. 7.6.

Fig. 7.6 The line segment M(1,3)N(5,2) and the clipping window with corners (2,2) and (4,4)
Here we have xmin = 2, xmax = 4, ymin = 2, and ymax = 4 and the two end points M(1,3) and
N(5,2). We determine the region code for M first as,

Bit 3 = sign (3 − 4) = sign (−1) = 0


Bit 2 = sign (2 − 3) = sign (−1) = 0
Bit 1 = sign (1 − 4) = sign (−3) = 0
Bit 0 = sign (2 − 1) = sign (1) = 1

Thus, the region code of M is 0001. Similarly the region code of N is derived as 0010. The next
step (lines 9–27 of Algorithm 7.1) is the series of checks. The first check fails as both the end
points are not 0000. The second check also fails as the logical AND of MN is 0000. Hence, we
need to determine line–boundary intersection points.
From the end points, we can derive the line equation as y = −(1/4)x + 13/4 (see Appendix for the
derivation). Next, we check for the intersection of this line with the boundaries following the order:
above, below, right, left. These bit values (bit 3) of M and N are the same. Hence, the line does not
cross above boundary (lines 16–19 of Algorithm 7.1). Similarly, we see that it does not cross the
below boundary (bit 2 is same for both). However, for the right boundary, the two corresponding
bits (bit 1) are different. Hence, the line crosses the right boundary.
The equation of the right boundary is x = 4. Putting this value in the line equation, we get
the intersection point as N′(4, 9/4). We discard the line segment N′N since N is outside the right
boundary (line 21 of Algorithm 7.1). Thus, the new line segment becomes MN ′ . We determine the
region code of N ′ as 0000.
We now have two new end points M and N ′ with the region codes 0001 and 0000, respectively.
The boundary check is now performed for the left boundary. Since the bit values are not the same,
we check for intersection of the line segment MN ′ with the left boundary. The equation of the left
boundary is x = 2. Putting this value in the line equation, we get the intersection point as M′(2, 11/4).
We discard the line segment MM ′ since M is outside the left boundary (line 21 of Algorithm 7.1).
Thus, the new line segment becomes M ′ N ′ . We determine the region code of M ′ as 0000.
Since both M ′ and N ′ have the region code 0000, the algorithm resets the line segment to M ′ N ′
(lines 22–23 of Algorithm 7.1). As no more boundary remains to be checked, the algorithm returns
M ′ N ′ and stops.


7.1.2 Liang–Barsky Line Clipping Algorithm


The Cohen–Sutherland algorithm works well when the number of lines that can be clipped
without further processing (i.e., trivially accepted or rejected) is large compared to the size of the input set of lines. However, it
still has to perform some boundary–line intersection calculations. There are other faster line-
clipping methods developed to reduce the intersection calculation further, based on more
efficient tests. The algorithm proposed by Cyrus and Beck (see the bibliographic note for
reference) was among the earliest attempts in this direction, which is based on parametric
line equation. Later, a more efficient version was proposed by Liang and Barsky. The basic
idea of the Liang–Barsky algorithm is as follows.
Given a line segment with endpoints P(x1 , y1 ) and Q(x2 , y2 ), we can represent the line in
parametric form as,

x = x1 + uΔx, where Δx = x2 − x1
y = y1 + uΔy, where Δy = y2 − y1

where 0 ≤ u ≤ 1 is the parameter. Given the window parameters (xmin , xmax , ymin , ymax ),
the following relationships should hold for the line to be retained.

xmin ≤ x1 + uΔx ≤ xmax
ymin ≤ y1 + uΔy ≤ ymax

We can rewrite these relations in a compact form as u·pk ≤ qk, where k = 1, 2, 3, 4. Thus,

p1 = −Δx, q1 = x1 − xmin
p2 = Δx, q2 = xmax − x1
p3 = −Δy, q3 = y1 − ymin
p4 = Δy, q4 = ymax − y1

where k = 1, 2, 3, 4 corresponds to the left, right, below, and above window boundaries, in
that order. If for any k for a given line, pk = 0 AND qk < 0, we discard the line as it is com-
pletely outside the window. Otherwise, we calculate two parameters u1 and u2 that define the
line segment within the window. In order to calculate u1, we first calculate the ratio rk = qk/pk
for all those boundaries for which pk < 0. Then, we set u1 = max{0, rk}. Similarly for u2, we
calculate the ratio rk = qk/pk for all those boundaries for which pk > 0 and then set u2 = min{1, rk}.
If u1 > u2, the line is completely outside, so we discard it. Otherwise, the end points of the
clipped line are calculated as follows.
1. If u1 = 0, there is one intersection point, which is calculated as x2 = x1 + u2Δx, y2 =
y1 + u2Δy (note that the other end point remains the same).
2. Otherwise, there are two intersection points (i.e., both the end points need to be changed).
The two new end points are calculated as x′1 = x1 + u1Δx, y′1 = y1 + u1Δy and
x2 = x1 + u2Δx, y2 = y1 + u2Δy.
The pseudocode of the algorithm is shown in Algorithm 7.2.


Algorithm 7.2 Liang–Barsky line clipping algorithm

1: Input: A line segment with end points P(x1 , y1 ) and Q(x2 , y2 ), the window parameters
(xmin , xmax , ymin , ymax ). A window boundary is denoted by k where k can take the values 1, 2, 3,
or 4 corresponding to the left, right, below, and above boundary, respectively.
2: Output: Clipped line segment
3: Calculate Δx = x2 − x1 and Δy = y2 − y1
4: Calculate p1 = −Δx, q1 = x1 − xmin
5: Calculate p2 = Δx, q2 = xmax − x1
6: Calculate p3 = −Δy, q3 = y1 − ymin
7: Calculate p4 = Δy, q4 = ymax − y1
8: if pk = 0 and qk < 0 for any k = 1, 2, 3, 4 then
9: Discard the line as it is completely outside the window
10: else
11: Compute rk = qk/pk for all those boundaries k for which pk < 0. Determine parameter u1 = max{0, rk}.
12: Compute rk = qk/pk for all those boundaries k for which pk > 0. Determine parameter u2 = min{1, rk}.
13: if u1 > u2 then
14: Eliminate the line as it is completely outside the window
15: else if u1 = 0 then
16: There is one intersection point, calculated as x2 = x1 + u2Δx, y2 = y1 + u2Δy
17: Return the two end points (x1, y1) and (x2, y2)
18: else
19: There are two intersection points, calculated as x′1 = x1 + u1Δx, y′1 = y1 + u1Δy and x2 = x1 + u2Δx, y2 = y1 + u2Δy
20: Return the two end points (x′1 , y′1 ) and (x2 , y2 )
21: end if
22: end if
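
The following is a compact Python sketch of the same procedure; the function name and the return convention (None for a rejected line) are our own.

def liang_barsky_clip(x1, y1, x2, y2, xmin, xmax, ymin, ymax):
    # Clip the segment (x1,y1)-(x2,y2) against the window; return the new end points or None
    dx, dy = x2 - x1, y2 - y1
    p = [-dx, dx, -dy, dy]                        # k = 1..4: left, right, below, above
    q = [x1 - xmin, xmax - x1, y1 - ymin, ymax - y1]
    u1, u2 = 0.0, 1.0
    for pk, qk in zip(p, q):
        if pk == 0:
            if qk < 0:                            # parallel to this boundary and outside it
                return None
        else:
            r = qk / pk
            if pk < 0:
                u1 = max(u1, r)                   # candidate entry parameter
            else:
                u2 = min(u2, r)                   # candidate exit parameter
    if u1 > u2:                                   # the line lies completely outside
        return None
    return (x1 + u1 * dx, y1 + u1 * dy), (x1 + u2 * dx, y1 + u2 * dy)

# Example 7.4: clip P(3,3)-Q(5,2) against the window with corners (2,2) and (4,4)
print(liang_barsky_clip(3, 3, 5, 2, 2, 4, 2, 4))  # ((3.0, 3.0), (4.0, 2.5))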

Example 7.4
We will show here the running of the algorithm for one of the three examples (Example 7.2) used
to illustrate the Cohen–Sutherland algorithm. The working of the Liang–Barsky algorithm for the
other two examples can be understood in a likewise manner and left as an exercise for the reader.
Now let us reconsider the line segment PQ in Example 7.2. The figure is reproduced here for
convenience.

(The line segment P(3,3)Q(5,2) and the clipping window with corners (2,2) and (4,4); see Fig. 7.5.)

From the figure, we see that xmin = 2, xmax = 4, ymin = 2, and ymax = 4. Also, P(3,3) and Q(5,2).
We first calculate Δx = 2, Δy = −1, p1 = −2, q1 = 1, p2 = 2, q2 = 1, p3 = 1, q3 = 1, p4 = −1,
and q4 = 1 (lines 3–7 of Algorithm 7.2). Since the condition pk = 0 and qk < 0 for any k is not
true, the first condition (lines 8–9 of Algorithm 7.2) fails. Hence, we calculate u1 and u2 .
Note that p1, p4 < 0. Hence we calculate r1 = −1/2 and r4 = −1. Thus, u1 = max{0, −1/2,
−1} = 0 (line 11 of Algorithm 7.2). Also, p2, p3 > 0. Hence we calculate r2 = 1/2 and r3 = 1.
Thus, u2 = min{1, 1/2, 1} = 1/2 (line 12 of Algorithm 7.2). Since u1 < u2, the line is not eliminated
(lines 13–14 of Algorithm 7.2). However, u1 = 0, so the next condition is satisfied (line 15 of
Algorithm 7.2). So, we calculate the single intersection point: x2 = 4, y2 = 5/2. Thus, the end
points of the clipped line segment returned are (3,3) and (4, 5/2) (lines 16–17 of Algorithm 7.2).

7.1.3 Fill-area Clipping: Sutherland–Hodgeman Algorithm


As we saw, the previous algorithms are used for clipping lines. In many situations, we have
to clip polygons with respect to the window. Although we can apply the line clippers to the
individual edges of the polygon, the approach is neither efficient nor convenient; there are
better algorithms designed specifically for polygons. In the following discussion, we look at one such algorithm,
namely the Sutherland–Hodgeman polygon clipping algorithm.
The basic idea of the algorithm is as follows: we start with four clippers or the lines that
define the window boundaries. Each clipper takes as input a list of ordered pair of vertices
(i.e., edges) and produces another list as output. We impose an order to the clipper for check-
ing. Let it be left clipper, followed by right clipper, followed by bottom clipper, followed by
top clipper. The original polygon vertices are given as input to the first (i.e., left) clipper. To
create the vertex list, a naming convention of the vertices is followed (either clockwise or
anti-clockwise). Let us assume anti-clockwise naming of the vertices. For each clipper, the
output vertex list is generated in the following way.
Let the input vertex list to a clipper be denoted by the set V = {v1 , v2 , · · · , vn } where vi
denotes the ith vertex. Then, for each edge in the list (i.e., successive vertex pair) (vi, vj),
we do the following:
1. If vi is inside and vj outside of the clipper, return the intersection point of the clipper
with the edge (vi , vj ).
2. If both the vertices are inside the clipper, return vj .
3. If vi is outside and vj inside of the clipper, return the intersection point of the clipper
with the edge (vi , vj ) and vj .
4. If both the vertices are outside the clipper, return NULL.
The terms inside and outside are to be interpreted differently for different clippers. For the
left clipper, if a vertex is on its right side, then the vertex is inside; otherwise, it is outside.
For the right clipper, a vertex is inside if it is on the left side; otherwise it is outside. For the
top clipper, a vertex below means the vertex is inside, else the vertex is outside. Similarly,
for the bottom clipper, an inside vertex implies it is above the clipper, otherwise it is outside.
In all the cases, we assume that a vertex is inside if it is on the clipper. The pseudocode of
the Sutherland–Hodgeman algorithm is shown in Algorithm 7.3.


Algorithm 7.3 Sutherland–Hodgeman fill-area clipping algorithm

1: Input: Four clippers: cl = xmin , cr = xmax , ct = ymax , cb = ymin corresponding to the left, right,
top, and bottom window boundaries, respectively. The polygon is specified in terms of its vertex list
Vin = {v1 , v2 , · · · , vn }, where the vertices are named anti-clockwise.
2: for each clipper in the order cl , cr , ct , cb do
3: Set output vertex list Vout = NULL, i = 1, j = 2
4: repeat
5: Consider the vertex pair vi and vj in Vin
6: if vi is inside and vj outside of the clipper then
7: ADD the intersection point of the clipper with the edge (vi , vj ) to Vout
8: else if both the vertices are inside the clipper then
9: ADD vj to Vout
10: else if vi is outside and vj inside of the clipper then
11: ADD the intersection point of the clipper with the edge (vi , vj ) and vj to Vout
12: else
13: ADD NULL to Vout
14: end if
15: until all edges (i.e., consecutive vertex pairs) in Vin are checked
16: Set Vin = Vout
17: end for
18: Return Vout
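
A minimal Python sketch of the procedure for a rectangular window is given below; the clipper order and the inside tests follow the discussion above, while the function names and data layout are our own.

def sutherland_hodgeman_clip(vertices, xmin, ymin, xmax, ymax):
    # vertices: list of (x, y) pairs of the polygon, named anti-clockwise
    clippers = [
        ('left',   lambda p: p[0] >= xmin),
        ('right',  lambda p: p[0] <= xmax),
        ('bottom', lambda p: p[1] >= ymin),
        ('top',    lambda p: p[1] <= ymax),
    ]

    def intersection(p1, p2, boundary):
        # Intersection of the edge p1-p2 with the given window boundary
        (x1, y1), (x2, y2) = p1, p2
        if boundary == 'left':
            return (xmin, y1 + (y2 - y1) * (xmin - x1) / (x2 - x1))
        if boundary == 'right':
            return (xmax, y1 + (y2 - y1) * (xmax - x1) / (x2 - x1))
        if boundary == 'bottom':
            return (x1 + (x2 - x1) * (ymin - y1) / (y2 - y1), ymin)
        return (x1 + (x2 - x1) * (ymax - y1) / (y2 - y1), ymax)   # top

    output = list(vertices)
    for boundary, inside in clippers:
        if not output:
            break                                  # everything has been clipped away
        input_list, output = output, []
        for j, vj in enumerate(input_list):
            vi = input_list[j - 1]                 # previous vertex (wraps around)
            if inside(vj):
                if not inside(vi):                 # outside-to-inside edge
                    output.append(intersection(vi, vj, boundary))
                output.append(vj)                  # vj is inside, so keep it
            elif inside(vi):                       # inside-to-outside edge
                output.append(intersection(vi, vj, boundary))
    return output

# Exercise 7.10: square (3,3),(7,3),(7,7),(3,7) against the window with corners (1,1) and (5,5)
print(sutherland_hodgeman_clip([(3, 3), (7, 3), (7, 7), (3, 7)], 1, 1, 5, 5))
# The clipped polygon is the square with corners (3,3), (5,3), (5,5), and (3,5)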

Example 7.5
Consider the polygon with vertices {1,2,3} (named anti-clockwise) shown in Fig. 7.7. We wish
to determine the clipped polygon (i.e., the polygon with vertices {2′ ,3′ ,3′′ ,1′ ,2}) following the
Sutherland–Hodgeman algorithm.

Fig. 7.7 The triangle {1,2,3}, the clipping window, and the clipped polygon with vertices {2′,3′,3″,1′,2}
We check the vertex list against each clipper in the order left, right, top, bottom (the outer for
loop, line 2 of Algorithm 7.3). For the left clipper, the input vertex list Vin = {1, 2, 3}. The pair
of vertices to be checked for the left clipper are {1,2}, {2,3}, and {3,1} (the inner loop, line 4 of
Algorithm 7.3). For each of these pairs, we perform the checks (lines 6–13 of Algorithm 7.3) to
determine Vout for the left clipper. We start with {1,2}. Since both the vertices are on the right side
of the left clipper (i.e., both are inside), we set Vout = {2}. Similarly, after checking {2,3}, we set
Vout = {2, 3} and after checking {3,1}, the final output list becomes Vout = {2, 3, 1}.


In the next iteration of the outer loop (check against right clipper), we set Vin = Vout = {1, 2, 3}
and Vout = NULL. Thus the three pair of vertices to be checked are {1,2}, {2,3}, and {3,1}.
In {1,2}, both the vertices are inside (i.e., they are on the left side of the right clipper); hence
Vout = {2}. For the next pair {2,3}, we notice that vertex 2 is inside while vertex 3 is out-
side. Thus, we compute the intersection point 2′ of the right clipper with the edge {2,3} and set
Vout = {2, 2′ }. For the remaining pair {3,1}, vertex 3 is outside (on the right side) and vertex 1
inside (on the left side). Thus, we calculate the intersection point 3′ of the edge with the clipper
and set Vout = {2, 2′ , 3′ , 1}. The inner loop stops as all the edges are checked.
Next, we consider the top clipper. We set Vin = Vout = {2, 2′ , 3′ , 1} and Vout = NULL. The pair
of vertices to be checked are {2,2′ }, {2′ ,3′ }, {3′ ,1}, and {1,2}. Since both the vertices of {2,2′ } are
inside (i.e., below the clipper), Vout = {2′ }. Similarly, after checking {2′ ,3′ }, we set Vout = {2′ , 3′ }
as both are inside. In the pair {3′ ,1}, the vertex 3′ is inside whereas the vertex 1 is outside (i.e.,
above the clipper). Hence, we calculate the intersection point 3′′ between the clipper and the edge
and set Vout = {2′ , 3′ , 3′′ }. In the final edge {1,2}, the first vertex is outside while the second ver-
tex is inside. Thus, we calculate the intersection point 1′ between the edge and the clipper and set
Vout = {2′ , 3′ , 3′′ , 1′ , 2}. After this, the inner loop stops.
Finally, we check against the bottom clipper. Before the checking starts, we set Vin = Vout =
{2′ , 3′ , 3′′ , 1′ , 2} and Vout = NULL. As all the vertices are inside (i.e., above the clipper), after the
inner loop completes, the output list becomes Vout = {2′ , 3′ , 3′′ , 1′ , 2} (i.e., same as the input list,
check for yourself).
Thus, after the input polygon is checked against all the four clippers, the algorithm returns the
vertex list {2′ ,3′ ,3′′ ,1′ ,2} as the clipped polygon.

7.1.4 Fill-area Clipping: Weiler–Atherton Algorithm


The Sutherland–Hodgeman algorithm works well when the fill-area is a convex polygon to
be clipped against a rectangular clipping window. The Weiler–Atherton algorithm provides
a more general fill-area clipping procedure. It can be used for any type of polygon fill-area
(concave or convex) against any polygonal clipping window.
In the Sutherland–Hodgeman algorithm, we processed the edges of the fill-area following
only a particular order and performed clipping. However, the processing is done differently
in the Weiler–Atherton algorithm. Here, we start with processing the fill-area edges in a
particular order (typically anti-clockwise). We continue along the edges till we encounter an
edge that crosses to the outside of the clip window boundary. At the intersection point, we
make a detour: we now follow the edges of the clip window (along the same direction main-
taining the traversal order). We continue our traversal along the edges of the clip boundary
till we encounter another fill-area edge that crosses to the inside of the clip window. At this
point, we resume our polygon edge traversal again along the same direction. The process
continues till we encounter a previously processed intersection point. So, the two rules of
traversal followed in the Weiler–Atherton algorithm are the following:
1. From an intersection point due to an outside-to-inside fill-area edge (with respect to a clip
boundary), follow the fill-area polygon edges.

i i

i i
i i

“Chapter-7” — 2015/9/15 — 9:37 — page 142 — #13


i i

142 Computer Graphics

2. From an intersection point due to an inside-to-outside fill-area edge (with respect to a clip
boundary), follow the window boundaries.
In both these cases, the traversal direction remains the same. At the end of the process-
ing, when we encounter a previously processed intersection point, we output the vertex
list representing a clipped area. However, if the whole fill-area polygon has not yet been covered
at this point, we resume our traversal along the polygon edges in the same direction
from the last recorded exit-intersection point (i.e., of an inside-to-outside polygon edge). The pseudocode of the
Weiler–Atherton algorithm is shown in Algorithm 7.4.

Algorithm 7.4 Weiler–Atherton fill-area clipping algorithm

1: Start from a vertex inside the window.


2: Process the edges of the polygon fill-area in any particular order (clockwise or anti-clockwise). Con-
tinue the processing till an edge of the fill-area is found that crosses a window boundary from inside
to outside. The intersection point of the edge with the window boundary is the exit-intersection point.
Record the intersection point.
3: From the exit-intersection point, process the window boundaries in the same direction (clockwise or
anti-clockwise). Continue processing till another intersection point (of a fill-area edge with a window
boundary) is found.
4: if the intersection point is a new point not yet processed then
5: Record the intersection point
6: Continue processing the fill-area edges till a previously processed vertex is encountered
7: end if
8: Form the output vertex list Vout for this section of the clipped fill-area
9: if all the polygon fill-area edges have been processed then
10: Output Vout
11: else
12: Return to the exit-intersection point
13: Continue processing the fill-area edges in the same order (clockwise or anti-clockwise) till another
intersection point (of a fill-area edge with a window boundary) is found
14: Go to the line 4
15: end if

Example 7.6
Let us consider the fill-area shown in Fig. 7.8. The vertices of the polygon are named anti-
clockwise. Note that this is a concave polygon, which is to be clipped against the rectangular
window. Let us try to perform clipping following the steps of the Algorithm 7.4. In the figure, the
traversal of the fill-area edges and window boundaries is shown with arrows.
We first start with the edge {1,2}. Both the vertices are inside the window. Since there is no inter-
section, we add these vertices to the output vertex list Vout and continue processing the fill-area
edges anti-clockwise.
Next we process the edge {2,3}. As we can observe, the edge goes from inside of the
clip window to outside. We record the intersection point 2′ and add it to Vout . 2′ is the


Fig. 7.8 The concave fill area with vertices {1, 2, 3, 4, 5, 6} (named anti-clockwise), the rectangular clipping window, and the intersection points 1′, 2′, 3′, and 6′ with the window boundary
exit-intersection point. At this point, we make a detour and proceed along the window boundary
in the anti-clockwise direction.
While traversing along the boundary, we encounter the intersection point 1′ of the fill-area edge
{6,1} with the boundary. This is a new intersection point not yet processed. We add this to Vout .
Then, we start processing the fill-area edges again.
The fill-area edge processing takes us to the vertex 1. We have already processed this vertex.
Thus, we have completed determining one clipped segment of the fill area, represented by the ver-
tex list Vout = {1, 2, 2′ , 1′ , 1}. We output this list. Since some edges of the fill-area are not yet
processed, we return to the exit-intersection point 2′ and continue processing the fill-area edges.
We first check the edge segment {2′ ,3}. Since the vertex 3 is on the left side of the left window
boundary (i.e., outside the clip window), we continue processing the fill-area anti-clockwise.
The next edge processed is {3,4}. Note that the edge intersects the window boundary at 3′ .
Since the vertex 4 is inside the window and 3′ is a new intersection point, we add the intersection
point 3′ and the vertex 4 to the output vertex list Vout . We continue processing the fill area edges.
Both the vertices of the next edge {4,5} are inside the clipping window. So, we simply add them
to Vout and continue processing the fill-area edges.
The next edge {5,6} intersects the window boundary at the intersection point 6′ . This is an
exit-intersection point. We record the intersection point and add it to the output vertex list Vout .
From 6′ , we start processing the window boundaries again (anti-clockwise). During this pro-
cessing, we encounter the intersection point 3′ . This is already processed before. So, we output
Vout = {3′ , 4, 5, 6′ , 3′ } that represents the other clipped region of the fill area. Since now all the
edges of the fill-area are processed, the algorithm stops at this stage.

7.2 3D CLIPPING
So far, we have discussed clipping algorithms for 2D. The same algorithms, with a few mod-
ifications, are used to perform clipping in 3D. A point to be noted here is that clipping is
performed against the normalized view volume (usually the symmetric cube with each coor-
dinate in the range [−1,1] in the three directions). In the following, we will discuss only the
extension of the 2D algorithms without further elaborations, as the core algorithms remain
the same.
Point clipping in 3D is done in a way similar to 2D. Given a point with coordinate (x,y,z),
we simply check if the coordinate lies within the view volume. In other words, if −1 ≤ x ≤ 1
AND −1 ≤ y ≤ 1 AND −1 ≤ z ≤ 1, we keep the point; otherwise, we clip it out.


Region codes behind the far plane:
101001   101000   101010
100001   100000   100010
100101   100100   100110

Region codes between the near and far planes:
001001   001000   001010
000001   000000   000010
000101   000100   000110

Region codes in front of the near plane:
011001   011000   011010
010001   010000   010010
010101   010100   010110

Bit 6: Far    Bit 5: Near    Bit 4: Top    Bit 3: Bottom    Bit 2: Right    Bit 1: Left
(The 6-bit code and significance of each bit)

Fig. 7.9 The 27 regions of the Cohen–Sutherland algorithm for 3D. The 6-bit code is
explained above. Note that only the view volume interior gets the code 000000.

7.2.1 3D Line Clipping


The Cohen–Sutherland algorithm can be extended for 3D line clipping. The core idea of the 3D
Cohen–Sutherland algorithm remains the same: we divide the view coordinate space into
regions. However, there are the following major differences between the two versions of the
algorithm.
1. Unlike in 2D, we have 27 regions to consider.
2. Regions are encoded with 6 bits corresponding to the 6 planes (far, near, top, bottom,
right, left) of the cube, unlike the 4 bits in 2D. The 6 bits (from left to right) stand for far, near, top, bottom, right, and left (a small code sketch of this encoding is given after this paragraph).
The 27 regions and the 6-bit encoding scheme are depicted in Fig. 7.9. It is left as an
exercise for the reader to modify Algorithm 7.1 taking into account these considerations.
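
As an illustration, the 6-bit region code against the canonical view volume [−1,1]³ could be computed as in the sketch below. This is our own sketch: in particular, the choice of which side of the volume counts as near and which as far depends on the viewing convention, and here we assume the viewer looks along the negative z direction so that z > 1 is in front of the near plane and z < −1 is behind the far plane.

FAR, NEAR, TOP, BOTTOM, RIGHT, LEFT = 32, 16, 8, 4, 2, 1   # bits 6..1

def region_code_3d(x, y, z):
    # 6-bit region code with respect to the canonical view volume [-1,1]^3
    # (near/far sign convention assumed as described above)
    code = 0
    if z < -1: code |= FAR
    if z >  1: code |= NEAR
    if y >  1: code |= TOP
    if y < -1: code |= BOTTOM
    if x >  1: code |= RIGHT
    if x < -1: code |= LEFT
    return code

print(format(region_code_3d(0, 0, 0), '06b'))    # 000000 -> inside the view volume
print(format(region_code_3d(2, 2, -3), '06b'))   # 101010 -> above right, behind the far plane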

7.2.2 3D Fill-area Clipping


In 3D fill-area (polyhedron) clipping, we first check if the bounding volume of the polyhedron
is outside the view volume (by comparing their maximum and minimum coordinates
in each of the x, y, and z directions); if it is, the whole polyhedron is discarded. Otherwise, we can apply the 3D extension of the
Sutherland–Hodgeman algorithm to perform the clipping.
The core idea of the 3D Sutherland–Hodgeman algorithm remains the same. The main
differences are outlined below.
1. A polyhedron is made up of polygonal surfaces. We take one surface at a time to per-
form clipping. Usually, the polygon is further divided into triangular meshes. Then, each


triangle in the mesh is processed one at a time. Thus, there are two more outer loops in
Algorithm 7.3. The outermost loop is for checking one surface at a time. Inside this
loop, the next level loop processes each triangle in the mesh (of that surface). Then the
two loops of Algorithm 7.3 are executed in sequence.
2. Instead of four clippers, we now have six clippers corresponding to the six bounding sur-
faces of the normalized view volume. Hence, the for loop in Algorithm 7.3 (lines 2–17)
is executed six times.
Algorithm 7.5 shows a quick and easy way of creating a triangle mesh from a convex
polygon.

Algorithm 7.5 Algorithm to create triangle mesh from a convex polygon

1: Input: Set of vertices V = {v1 , v2 , · · · , vn }, the triangle set VT = NULL


2: Output: Set of triangles VT
3: repeat
4: Take first three vertices from V to form the vertex set vt representing a triangle
5: Add vt to VT
6: Reset V by removing from it the middle vertex of vt
7: until V contains only three vertices
8: Add V to VT and Return VT

Let us consider an example to understand the idea of Algorithm 7.5. Suppose we want to
create a triangle mesh from the polygon shown in Fig. 7.10.
Fig. 7.10 A convex polygon with vertices {1, 2, 3, 4, 5} (named anti-clockwise) for creating the triangular mesh

The input vertex list is V = {1, 2, 3, 4, 5} (we followed an anti-clockwise vertex naming convention).
In the first iteration of the loop, we create vt = {1, 2, 3} and reset V = {1, 3, 4, 5}
after removing vertex 2 (the middle vertex of vt). Then we set VT = {{1, 2, 3}}. In the next
iteration, V = {1, 3, 4, 5}. We create vt = {1, 3, 4} and reset V = {1, 4, 5}. Also, we set
VT = {{1, 2, 3}, {1, 3, 4}}. Since V now contains three vertices, the iteration stops. We set
VT = {{1, 2, 3}, {1, 3, 4}, {1, 4, 5}} and return VT as the set of three triangles.
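
A direct Python rendering of Algorithm 7.5 might look like the following sketch (the function name is ours).

def triangulate_convex(vertices):
    # Split a convex polygon (vertex list, named anti-clockwise) into triangles
    # by repeatedly taking the first three vertices and dropping the middle one
    v = list(vertices)
    triangles = []
    while len(v) > 3:
        triangles.append([v[0], v[1], v[2]])   # the first three vertices form a triangle
        del v[1]                               # remove the middle vertex of that triangle
    triangles.append(v)                        # the remaining three vertices
    return triangles

print(triangulate_convex([1, 2, 3, 4, 5]))
# [[1, 2, 3], [1, 3, 4], [1, 4, 5]] -- the three triangles obtained for Fig. 7.10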
Note that Algorithm 7.5 works only when the input polygon is convex. In the case of a concave
polygon, we first split it into a set of convex polygons and then apply Algorithm 7.5 to each
member of the set. There are many efficient methods for splitting a concave polygon into a
set of convex polygons, such as the vector method, the rotation method, and so on. However,


we shall not go into the details of these methods any further, as they are not necessary to
understand the basic idea of 3D clipping.

SUMMARY
In this chapter, we learnt the basic idea behind the clipping stage of the graphics pipeline. For
ease of understanding, we started with the 2D clipping process. We covered three basic clipping
algorithms for line- and fill-area (polygon) clipping, namely the Cohen–Sutherland line clipping
algorithm, the Liang–Barsky line clipping algorithm, and the Sutherland–Hodgeman polygon
clipping algorithm.
The core idea of the Cohen–Sutherland algorithm is the division of world space into regions,
with each region having its own and unique region code. Based on a comparison of region
codes of the end points of a line, we decide if the line needs to be clipped or not. On the
other hand, the Liang–Barsky algorithm makes use of a parametric line equation to perform
clipping. Clipping is done based on the line parameters determined from the end points of the
line. The algorithm reduces line-window boundary intersection calculation to a great extent. In
the Sutherland–Hodgeman algorithm, polygons are clipped against window boundaries, on the
basis of the inside-outside test.
The same 2D algorithms are applicable in 3D with some minor modifications. The first thing
to note is that the clipping algorithms are designed keeping the normalized view volume in mind.
There are 27 regions and a 6-bit region code to be considered for using the Cohen–Sutherland
algorithm in 3D. In order to use the Sutherland–Hodgeman algorithm, we need to consider six
clippers as against four in 2D.
In clipping, we discard the portion of the object that lies outside the window/view vol-
ume. However, depending on the position of the viewer, some portion of an inside object
also sometimes needs to be discarded. When we want to discard parts of objects that are
inside window/view volume, we apply another set of algorithms, which are known as hid-
den surface removal (or visible surface detection) methods. Those algorithms are discussed
in Chapter 8.

BIBLIOGRAPHIC NOTE
Two-dimensional line clipping algorithms are discussed in Sproull and Sutherland [1968],
Cyrus and Beck [1978], Liang and Barsky [1984], and Nicholl et al. [1987]. In Sutherland
and Hodgman [1974] and Liang and Barsky [1983], basic polygon-clipping methods are pre-
sented. Weiler and Atherton [1977] and Weiler [1980] contain discussions on clipping arbitrarily
shaped polygons with respect to arbitrarily shaped polygonal clipping windows. Weiler and
Atherton [1977], Weiler [1980], Cyrus and Beck [1978], and Liang and Barsky [1984] also
describe 3D viewing and clipping algorithms. Blinn and Newell [1978] presents homogeneous-
coordinate clipping. The Graphics Gems book series (Glassner [1990], Arvo [1991], Kirk
[1992], Heckbert [1994], and Paeth [1995]) contain various programming techniques for
3D viewing.

KEY TERMS
Clipping – the process of eliminating objects fully or partially that lie outside a predefined region
Clipping window – the predefined region with respect to which clipping is performed
Cohen–Sutherland line clipping – an algorithm used to perform line clipping


Fill area clipping – clipping procedure for fill area


Fill-area – an enclosed region in space, usually of polygonal shape
Liang–Barsky line clipping – a parametric algorithm used to perform line clipping
Region codes – a coding scheme used to identify spatial regions in the Cohen–Sutherland
algorithm
Sutherland–Hodgeman algorithm – a fill area clipping procedure
Triangular mesh – representation of a polygon in terms of a set of interconnected triangles
Weiler–Atherton algorithm – a fill area clipping procedure

EXERCISES
7.1 Briefly explain the basic idea of clipping in the context of 3D graphics pipeline. In which
coordinate system does this stage work?
7.2 Write an algorithm to clip lines against window boundaries using brute force method (i.e.,
intuitively what you do). What are the maximum and minimum number of operations (both
integer and floating point) required? Give one suitable example for each case.
7.3 Consider the clipping window with vertices A(2,1), B(4,1), C(4,3), and D(2,3). Use the
Cohen–Sutherland algorithm to clip the line A(−4,−5) B(5,4) against this window (show
all intermediate steps).
7.4 Determine the maximum and minimum number of operations (both integer and floating
point) required to clip lines using the Cohen–Sutherland clipping algorithms. Give one
suitable example for each case.
7.5 Consider the clipping window and the line segment in Exercise 7.3. Use the Liang–Barsky
algorithm to clip the line (show all intermediate steps).
7.6 Answer Exercise 7.4 using the Liang–Barsky algorithm.
7.7 In light of your answers to Exercises 7.2, 7.4, and 7.6, which is the best method (among
brute force, Cohen–Sutherland, and Liang–Barsky)?
7.8 Discuss the role of the clippers in the Sutherland–Hodgman algorithm.
7.9 Write a procedure (in pseudocode) to perform inside-outside test with respect to a clipper.
Modify Algorithm 7.3 by invoking the procedure as a sub-routine.
7.10 Consider a clipping window with corner points (1,1), (5,1), (5,5), and (1,5). A square with
vertices (3,3), (7,3), (7,7), and (3,7) needs to be clipped against the window. Apply Algorithm
7.3 to perform the clipping (show all intermediate stages).
7.11 Modify Algorithm 7.1 for 3D clipping of a line with respect to the symmetric normalized view
volume.
7.12 Modify Algorithm 7.3 for 3D clipping of a polyhedron (with convex polygonal surfaces) with
respect to the symmetric normalized view volume.


CHAPTER 8
Hidden Surface Removal
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the concept of hidden surface removal in computer graphics
• Understand the two broad categories of hidden surface removal techniques—object
space method and image space method
• Get an idea about the object space method known as back face elimination
• Learn about two well-known image space methods—Z-buffer algorithm and A-buffer
algorithm
• Learn about the Painter’s algorithm, a popular hidden surface removal algorithm con-
taining elements of both the object space and image space techniques
• Get an overview of the Warnock’s algorithm, which belongs to a group of techniques
known as the area subdivision methods
• Learn about the octree method for hidden surface removal, which is another object space
method based on the octree representation

INTRODUCTION
In Chapter 7, we learnt to remove objects that are fully or partially outside the view vol-
ume. To recap, this is done using the clipping algorithms. However, sometimes, we need
to remove, either fully or partially, objects that are inside the view volume. An example is
shown in Fig. 8.1. In the figure, object B is partially blocked from the viewer by object A. For
realistic image generation, the blocked portion of B should be eliminated before the scene
is rendered. As we know, clipping algorithms cannot be used for this purpose. Instead, we
make use of another set of algorithms to do it. These algorithms are collectively known as the
hidden surface removal methods (often also called the visible surface detection methods). In
this chapter, we shall learn about these methods.
In all the methods we discuss, we shall assume a right-handed coordinate system with
the viewer looking at the scene along the negative Z direction. One important thing should
be noted: when we talk about hidden surface, we assume a specific viewing direction. This
is so since a surface hidden from a particular viewing position may not be so from another
position. Moreover, for simplicity, we shall assume only objects with polygonal surfaces.



Fig. 8.1 The figure illustrates the idea of hidden surface. One surface of object A is blocked
by object B and the back surfaces of A are hidden from view. So, during rendering, these
surfaces should be removed for realistic image generation.

This is not an unrealistic assumption after all, since curved surfaces are converted to
polygonal meshes anyway.

8.1 TYPES OF METHODS


The hidden surface removal methods are broadly of the following two types—object space
and image space methods.

Object Space Methods


In these algorithms, objects and parts of objects are compared to each other to determine
which surfaces are visible. The general approach can be summarized as follows:
For each object in the scene, do,
1. Determine those parts of the object whose view is unobstructed by other parts of it or any
other object with respect to the viewing specification.
2. Render those parts with the object color.
As you can see, such methods work before projection and have both advantages and dis-
advantages. On the plus side, they are device-independent and work for any resolution.
However, step 1 is computation-intensive. Depending on the complexity of the scene and
the hardware resources available, the methods can even become infeasible. Usually, such
methods are suitable for simple scenes with small number of objects.

Image Space Methods


In such an approach, visibility is decided point-by-point at each pixel position on the
projection plane. All such methods perform the following steps.
For each pixel on the screen, do,
1. Determine objects closest to the viewer that are pierced by the projector through the pixel.
2. Draw the pixel with the object color.


Clearly, such methods work after the surfaces are projected and rasterized (i.e.,
mapped to pixels). The computations involved are usually less, although the methods
depend on the display resolution. A change in resolution requires recomputation of
pixel colors.
In this chapter, we shall have a closer look at some of the algorithms belonging to both
these classes.

8.2 APPLICATION OF COHERENCE


As the general algorithms of the two classes of methods show, hidden surface elim-
ination is a computationally intensive process. Therefore, it is natural that we try to
reduce computations. One way to do this is to use the coherence properties. Here,
we exploit the local similarities, that is, making use of the results calculated for one
part of the scene or image for the other nearby parts. There are several types of
coherences.
In object coherence, we can check for visibility of an object with respect to another by
comparing its circumscribing solids (which are usually of simple forms such as sphere or
cube). Only if the solids overlap, we go for further processing.
In face coherence, surface properties computed for one part of a surface can be
applied to adjacent parts of the same surface. For example, if the surface is small,
we can sometimes assume that the surface is invisible to a viewer if one part of it is
invisible.
Such coherence checking can be performed for edges also. The edge coherence indicates
that the visibility of an edge changes only when it crosses another edge. Therefore, if one
segment of a non-intersecting edge is visible, we determine without further calculation that
the entire edge is also visible.
The scan line coherence states that a line or surface segment visible in one scan line is
also likely to be visible in the adjacent scan lines. Therefore, we need not perform visibility
computations for every scan line.
In many cases, a group of adjacent pixels in an image is often found to be covered by the
same visible object. This is known as the area and span coherence, which is based on the
assumption that a small enough region of pixels will most likely lie within a single poly-
gon. This reduces computation effort in searching for those polygons which contain a given
screen area (region of pixels).
The depth coherence tells us that the depth of the adjacent parts of the same surface are
similar, which is very useful in determining the visibility of adjacent parts.
Finally, we have the frame coherence which implies that pictures of the same scene at
successive points in time are likely to be similar, despite small changes in objects and view-
point, except near the edges of moving objects. Consequently, visibility computations need
not be performed for every scene rendered on the screen.
Such properties are used in one form or the other in most of the hidden surface removal
methods. As we mentioned, they reduce computations to a good extent. In addition, there
is another method called back face elimination, which is very simple and eliminates a large
number of hidden surfaces.


8.3 BACK FACE ELIMINATION


For a scene consisting of polyhedrons, back face elimination is the simplest way of removing
a large number of hidden surfaces. As the name suggests, the objective of this method is to
detect and eliminate surfaces that are on the back side of objects with respect to the viewer.
The process is simple and consists of the following steps.
1. From the surface vertices, determine the normal for each surface (for calculation of
normal vectors, see Appendix A). Let Ni = (a, b, c) be the normal vector of the ith
surface.
2. If c < 0, the surface cannot be seen (i.e., it is a back face and should be eliminated). Oth-
erwise, if c = 0, then the viewing vector grazes the surface. In other words, the surface
cannot be seen and should be eliminated. Otherwise, retain the surface (i.e., if c > 0).
3. Perform steps 1 and 2 for all surfaces.
Let us understand the idea with an example. Consider the polyhedron of Fig. 8.2, with the
vertex coordinates given. As we can see, there are four surfaces, which we can represent as a
list S = [ACB, ADB, DCB, ADC]. For each of these surfaces, we calculate the z component
of the surface normal. Let us start with ACB. The z component of the normal is −12 (for
details of calculation, see Appendix A). Since this is less than 0, the surface ACB is not vis-
ible. Similarly, for the surfaces ADB, DCB, and ADC, we calculate the z components of the
surface normals to be −4, 4, and 2, respectively. Note that the two surfaces DCB and ADC
have the z component of the surface normal greater than 0. Hence, these two are the visible
surfaces.
Fig. 8.2 Illustration of the back face elimination method: a tetrahedron with vertices A(4, 5, 4), B(2, 3, 6), C(6, 1, 4), and D(3, 2, 2), viewed along the −Z direction
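
A minimal Python sketch of this test is shown below, using the tetrahedron of Fig. 8.2. The normal here is computed as the cross product of two edge vectors taken from the listed vertices (our own convention); the resulting magnitudes may differ from those quoted in the text, but the signs, and hence the visibility decisions, are the same.

def normal_z(p0, p1, p2):
    # z component of the (unnormalized) surface normal, i.e., of (p1 - p0) x (p2 - p0)
    ux, uy = p1[0] - p0[0], p1[1] - p0[1]
    vx, vy = p2[0] - p0[0], p2[1] - p0[1]
    return ux * vy - uy * vx

def back_face_eliminate(surfaces):
    # Retain only the surfaces whose normal has a positive z component
    # (viewer looking along the -Z direction); back and grazing faces are dropped
    return [name for name, verts in surfaces if normal_z(*verts) > 0]

A, B, C, D = (4, 5, 4), (2, 3, 6), (6, 1, 4), (3, 2, 2)
S = [('ACB', (A, C, B)), ('ADB', (A, D, B)), ('DCB', (D, C, B)), ('ADC', (A, D, C))]
print(back_face_eliminate(S))   # ['DCB', 'ADC'] -- the two visible surfaces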

Since the method works on surfaces, this is an object space method. Using this simple
method, about half of the surfaces in a scene can be eliminated. However, note that the
method does not consider obscuring of a surface by other objects in the scene. For such situ-
ations, we need to apply other algorithms (in conjunction with this method), as we shall see
next.

8.4 DEPTH (Z) BUFFER ALGORITHM


The depth (Z) buffer algorithm is an image space method, in which the comparisons are done
at the pixel level. We assume the presence of an extra storage, called the depth-buffer (also


called z-buffer). Its size is the same as that of the frame buffer (i.e., one storage for each
pixel). As we assume canonical volumes, we know that the depth of any point within the
surface cannot exceed the normalized range; hence, we can fix the size of the depth-buffer
(number of bits per pixel).
The idea of the method is simple (we assume that 0 ≤ depth ≤ 1): at the begin-
ning, we initialize the depth-buffer locations with 1.0 (the maximum depth value) and
the frame buffer locations with the value corresponding to the background color. Then,
we process each surface of the scene at a time. For each projected pixel position (i, j)
of the surface s, we calculate the depth of the point in 3D dijs . Then, we compare dijs
value with the corresponding entry in the depth-buffer (i.e., (i, j)th depth-buffer value
DBij ). If dijs < DBij , we set DBij = dijs and the surface color value is set to the
corresponding location in the frame buffer. The process is repeated for all projected
points of the surface and for all surfaces. The pseudocode of the method is shown
in Algorithm 8.1.

Algorithm 8.1 Depth-buffer algorithm

1: Input: Depth-buffer DB[][] initialized to 1.0, frame buffer FB[][] initialized to background color
value, list of surfaces S, list of projected points for each surface.
2: Output: DB[][] and FB[][] with appropriate values.
3: for each surface in S do
4: for each projected pixel position of the surface i, j, starting from the top-leftmost projected pixel
position do
5: Calculate depth d of the projected point on the surface.
6: if d <DB[i][j] then
7: Set DB[i][j]=d
8: Set FB[i][j]=surface color
9: end if
10: end for
11: end for
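
To make the buffer bookkeeping concrete, a small Python sketch of the depth-buffer update is shown below. It assumes that each surface can already report, for every projected pixel, the depth of the corresponding 3D point (for instance via the iterative scheme discussed next); the data structures and names are our own simplification of Algorithm 8.1.

def depth_buffer_render(width, height, surfaces, background):
    # surfaces: list of (color, samples), where samples is a list of (i, j, depth)
    # tuples for the projected pixels of that surface, with 0 <= depth <= 1
    DB = [[1.0] * width for _ in range(height)]          # depth-buffer, initialized to 1.0
    FB = [[background] * width for _ in range(height)]   # frame buffer, initialized to background
    for color, samples in surfaces:
        for i, j, d in samples:
            if d < DB[j][i]:                             # closer than what is stored so far
                DB[j][i] = d
                FB[j][i] = color
    return FB

# Two surfaces competing for pixel (1, 1): the smaller depth wins
s1 = ('cl1', [(1, 1, 0.4), (2, 1, 0.4)])
s2 = ('cl2', [(1, 1, 0.7)])
FB = depth_buffer_render(4, 3, [s1, s2], 'bg')
print(FB[1][1], FB[1][2])                                # cl1 cl1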

We can follow an iterative procedure to calculate the depth of a surface point. We can represent
a planar surface with its equation ax + by + cz + d = 0, where a, b, c, and d are surface
constants. Thus, the depth of any point on the surface is z = (−ax − by − d)/c.
Since we are assuming a canonical view volume, all projections are parallel. Thus, a point
(x, y, z) is projected to the point (x, y) on the view plane. Now, consider a projected pixel
(i, j) of the surface. Then, the depth of the original surface point is z = (−ai − bj − d)/c. As we
progress along the same scan line, the next pixel is at (i + 1, j). Thus, the depth of the
corresponding surface point is z′ = (−a(i + 1) − bj − d)/c = (−ai − bj − d)/c − a/c = z − a/c.
Hence, along the same scan line, we can calculate the depth of consecutive surface pixels
by adding a constant term (−a/c) to the current depth value. We can perform similar iterations
across scan lines also. Assume a point (x, y) on an edge of the projected surface. If


we go down to the next scan line, the x value of the edge point will be x′ = x − 1/m, where
m ≠ 0 is the slope of the edge. The y value also becomes y − 1. Hence, the depth of that point is
z′ = (−a(x − 1/m) − b(y − 1) − d)/c. Rearranging, we get z′ = z + (a/m + b)/c. In other words, the
depth of the starting x position of the projected points on a scan line can be found by adding
a constant term to the depth of the starting x position of the previous line (m ≠ 0). The idea
is illustrated in Fig. 8.3 with the pseudocode shown in Algorithm 8.2.


Fig. 8.3 Illustration of the iterative depth calculation procedure

Algorithm 8.2 Iterative depth calculation

1: Input: Plane constants a, b, c, d


2: Output: Depth values of the projected surface points stored in DV[][]
3: Initialize x = leftmost projected point, y = topmost scan line, z value of the leftmost edge point z′ = 0, constants c1 = −a/c, c2 = (a/m + b)/c
4: Compute depth of the projected surface point at (x, y): z = (−ax − by − d)/c
5: Set z′ = z
6: for y = topmost scan line to the lowermost scan line do
7: Set DV[leftmost edge pixel][y] = z
8: for x = (leftmost edge pixel + 1) to rightmost edge pixel do
9: z = z + c1
10: Set DV[x][y] = z
11: end for
12: Set z = z′ + c2 , z′ = z
13: end for
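
The along-scan-line increment can be checked against direct evaluation of the plane equation, as in the small sketch below (our own illustration, using the plane constants of surface s1 from Example 8.1, which follows).

# Plane x + y + z - 6 = 0, i.e., a, b, c, d = 1, 1, 1, -6 (surface s1 of Example 8.1)
a, b, c, d = 1, 1, 1, -6

def depth_direct(x, y):
    # Depth obtained directly from the plane equation
    return (-a * x - b * y - d) / c

c1 = -a / c                                  # constant added when moving one pixel right
y = 1
z = depth_direct(0, y)                       # depth at the leftmost pixel of scan line y = 1
for x in range(1, 5):
    z = z + c1                               # incremental update along the scan line
    assert z == depth_direct(x, y)           # agrees with direct evaluation
    print((x, y), z)                         # (1,1) 4.0  (2,1) 3.0  (3,1) 2.0  (4,1) 1.0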

Let us try to understand Algorithm 8.1 in terms of a numerical example. For simplicity,
we shall assume an arbitrary (that means, the coordinate extents are not normalized) parallel
view volume. In that case, we shall initialize depth-buffer with a very large value (let us
denote that by the symbol ∞).


Example 8.1
Assume there are two triangular surfaces s1 and s2 in the view volume. The vertices of s1 are
[(0,0,6), (6,0,0), (0,6,0)] and that of s2 are [(2,0,6), (6,0,6), (4,4,0)]. Since we are assuming parallel
projection, the projected vertices of s1 on the view plane are [(0,0), (6,0), (0,6)] (simply drop the z
coordinate value). Similarly, for s2 , the projected vertices are [(2,0), (6,0), (4,4)]. The situation is
depicted in Fig. 8.4.

Fig. 8.4 The two surfaces after their projection

Solution Let us follow the steps of the algorithm to determine the color of the pixel (3,1). Assume
that cl1 and cl2 are the colors of the surfaces s1 and s2 , respectively and the background color is bg.
After initialization, the depth-buffer value DB[3][1]= ∞ and the frame buffer value FB[3][1]=bg.
We will process the surfaces one at a time in the order s1 followed by s2 .
From the vertices, we can determine the surface equation of s1 as x + y + z − 6 = 0 (see
Appendix A for details). Using the surface equation, we first determine the depth of the leftmost
projected surface pixel on the topmost scan line. In our case, the pixel is (0,6) with a depth of
z = (−1(0) − 1(6) − (−6))/1 = 0. Since this is the only point on the topmost scan line, we move
to the next scan line below (y = 5). Using the iterative method, we determine the depth of the
leftmost projected pixel on this scan line (0,5) to be z′ = z + (a/m + b)/c. However, note that the
slope of the left edge is m = ∞. Hence, we set a/m = 0. Therefore, z′ = 0 + 1/1 = 1.
The algorithm next computes depth and determines the color values of the pixel along the scan
line y = 5 till it reaches the right edge. At that point, it goes to the next scan line down
(y = 4). For brevity, we will skip these steps and go to the scan line y = 1, as our point of
interest is (3,1).
Following the iterative procedure across scan lines, we compute the depth of the leftmost pro-
jected surface point (0,1) as z = 5 (check for yourself). We now move along the scan line to the
next projected pixel (1,1). Its depth can be iteratively computed as z = z + (−a/c) = 5 − 1 = 4.
Similarly, the depth of the next pixel (2,1) is z = 4 − 1 = 3. In this way, we calculate the depth of s1
at (3,1) as z = 3 − 1 = 2. As you can see, this depth value at (3,1) is less than DB[3][1], which is
∞. Hence, we set DB[3][1]=2 and FB[3][1]= cl1 .


Afterwards, the other projected points of s1 are computed in a likewise manner, till the rightmost
edge point on the lowermost scan line. However, we shall skip those steps for brevity and move to
the processing of the next surface.
From the vertices of s2 , we derive the surface equation as 3y + 2z − 12 = 0 (see Appendix A
for details). The projected point on the topmost scan line is (4,4). Therefore, depth at this point is
z = (−3(4) − (−12))/2 = 0. Going down the scan lines (skipping the pixel processing along the scan
lines for brevity, as before), we reach y = 1. Note that the slope of the left edge of the projected
surface is m = 2. We can calculate the left-most projected point on y = 1 iteratively based on the
fact that the x-coordinate of the intersection point of the line with slope m and the (y − 1)th scan
line is x − 1/m, if the x-coordinate of the intersection point of the same line with the yth scan line is
x. In this way, we compute x = 2.5 for y = 1. In other words, the leftmost projected point of s2 on
y = 1 is (2.5,1). Using the iterative procedure for depth calculation across scan line (with z = 0
at y = 4), we compute depth at this point to be z = 4.5 (check for yourself). Next, we apply the
iterative depth calculation along the scan line to determine depth of the projected point (3,1) (the
very next projected pixel) to be z = 4.5.
Note that z = 4.5 > DB[3][1], which has the value 2 after the processing of s1 . There-
fore, the DB value and the corresponding FB value (which is cl1 ) are not changed. The algorithm
processes all the other pixels along y = 1 till the right edge in a likewise manner and the pixels for
y = 0. However, as before we skip those calculations for brevity. Thus, at the end of processing
the two surfaces, we shall have DB[3][1]=2 and FB[3][1]= cl1 .

8.5 A-BUFFER ALGORITHM


In the depth-buffer algorithm, a pixel can have only one surface color. In other words, from
any given viewing position, only one surface is visible. This is alright if we are dealing with
opaque surfaces only. However, as we know, if the scene contains transparent surfaces vis-
ible from a pixel position, the pixel color is a combination of the surface color as well as
contributions from the surfaces behind (see Chapter 4). Since in the depth-buffer method,
we have only one location to store depth value for each pixel (the depth buffer), we cannot
store all the surfaces contributing to the color value for a transparent surface simultaneously.
The A-buffer method is an attempt to overcome this limitation of the depth-buffer
algorithm. The ‘A’ in the A-buffer method stands for accumulation. The algorithm works
in the following way: as before, we have access to a depth-buffer although we now call
it A-buffer. Each location of the A-buffer corresponds to a pixel position just like a depth
buffer. However, unlike before, each position in the buffer is not a single field of depth value.
Instead, it can reference a linked list of surfaces. There are the following two fields at each
A-buffer location.
1. The depth field as in the depth-buffer. This field can store a real-number value (positive,
negative, or zero).
2. The surface data field. This is the modification to the depth-buffer. It contains various
surface data or a pointer to the next node in a linked list data structure.


The surface data includes the following:


1. RGB intensity components
2. Opacity factor value (indicating percent of transparency)
3. Depth
4. Surface identities
Depending on the value stored at the depth field of the A-buffer, we will be able to deter-
mine if the pixel color comes from a single surface (i.e., opaque surface) or a combination
of multiple surface colors (i.e., a transparent surface is visible from the pixel position). The
convention usually followed is this: if the depth field stores a non-negative value, the number
indicates the depth of an opaque surface. The surface data field contains various informa-
tion related to that surface. However, if the depth field contains a negative value, the visible
surface is transparent. In that case, the surface data field contains a pointer to a linked list
of data of all surfaces that contribute to the surface color. The two situations are shown
schematically in Fig. 8.5 for illustration. The scenario when depth of the visible surface is
non-negative is shown in Fig. 8.5(a). The case for a transparent visible surface is shown in
Fig. 8.5(b).
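
The organization of an A-buffer location can be sketched in C as follows. The field names and the exact layout of the surface data record are our assumptions for illustration; only the two-field structure (a depth field plus surface data or a pointer) follows the description above.

/* One node of the surface list referenced from an A-buffer location
   (field names are illustrative). */
struct SurfaceData {
    float rgb[3];               /* RGB intensity components              */
    float opacity;              /* opacity factor (percent transparency) */
    float depth;                /* depth of this surface at the pixel    */
    int   surfaceId;            /* surface identity                      */
    struct SurfaceData *next;   /* next contributing surface, or NULL    */
};

/* One A-buffer location (one per pixel): if depth >= 0, the pixel shows a
   single opaque surface whose data is in *surf; if depth < 0, surf points
   to the linked list of all surfaces contributing to the pixel color. */
struct ABufferCell {
    float depth;
    struct SurfaceData *surf;
};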


Fig. 8.5 The figure illustrates the organization of an A-buffer location for the two possible
cases. In (a), organization for an opaque visible surface is shown. The case for a transparent
visible surface is shown in (b).

8.6 DEPTH SORTING (PAINTER’S) ALGORITHM


As we noted before, back face elimination is an object space method, whereas the depth-
buffer algorithm is performed in image space. In this section, we shall learn about the depth
sorting algorithm that works at both the image and object spaces. The algorithm is often
called the painter’s algorithm, as it tries to simulate the way a painter draws a scene.
There are two basic steps in the algorithm: first, we sort the surfaces present in the
scene on the basis of their depth (from the view plane). In order to do that, we deter-
mine the maximum and minimum depths of each surface. The list is created based on
the maximum depth. Let us denote the list by S = {s1 , s2 , · · · sn } where si denotes the
ith surface and depth(si ) < depth(si+1 ). Next, we render the surfaces on the screen one at a
time, starting with the surface having the maximum depth (i.e., sn in S) and ending with the
surface having the lowest depth.
During the rendering of each surface si , we compare it with all the other surfaces of S to
see if there is any depth overlap, that is, whether the depth ranges of the two surfaces overlap
(see Fig. 8.6).
If no overlap is found, we render the
surface and remove it from S. Otherwise, we perform the following checks.

Fig. 8.6 The idea of depth overlap. There is no depth overlap between the two surfaces in the
left figure, whereas in the right figure the surfaces overlap.

1. The bounding rectangles of the two surfaces do not overlap.


2. Surface si is completely behind the overlapping surface relative to the viewing position.
3. The overlapping surface is completely in front of si relative to the viewing position.
4. The boundary edge projections of the two surfaces onto the view plane do not overlap.
The first check is true if there is no overlap in either the x or the y coordinate extents of the two
surfaces; if both extents overlap, the first condition fails. Figure 8.7 illustrates the case where
the bounding rectangles overlap along the
x direction. In order to check for the second condition in this list, we need to determine
the plane equation of the overlapping surface (the plane normal should point towards the
viewer). Next, we check all the vertices of si with the equation. If for all the vertices of si ,
the plane equation returns a value < 0, si is behind the overlapping surface. Otherwise, the
second condition fails. The idea is illustrated in Fig. 8.8. The third condition can be checked
similarly with the plane equation of si and the vertices of the overlapping surface (for all


Fig. 8.7 The figure illustrates the idea of bounding rectangle overlap of two surfaces along
the x axis


Fig. 8.8 An example showing one surface (surface 1) completely behind the other surface,
viewed along the -z direction

vertices, the equation should return a positive value). Figure 8.9 depicts the situation for two
surfaces. In order to check the final condition, we need to have the set of projected pixels
for each surface and then check if there are any common pixels in the two sets (see Fig. 8.10
for illustration). As you can see, the first and the last checks are performed at the pixel level,
whereas the other two checks are performed at the object level. Hence, the depth sorting
algorithm incorporates elements of both the object space and image space methods.
The tests are performed in the order given in the preceding discussion. As soon
as one of the checks is true, we move on to check for overlap with the next surface of the list.


Fig. 8.9 Illustration of one surface (surface 2) completely in front of surface 1, although
surface 1 is not completely behind surface 2


Fig. 8.10 An example where the projected surfaces do not overlap although their bounding
rectangles do


If all tests fail, we swap the order of the surfaces in the list (called reordering) and stop. Then,
we restart the whole process again. The steps of the depth sorting method, in pseudocode,
are shown in Algorithm 8.3.

Algorithm 8.3 Painter’s Algorithm

1: Input: List of surfaces S = {s1 , s2 , · · · sn }, in sorted order (of increasing maximum depth value).
2: Output: Final frame buffer values.
3: Set a flag Reorder=OFF
4: repeat
5: Set s = sn (i.e., the last element of S)
6: for each surface si in S where 1 ≤ i < n do
7: if zmin (s) < zmax (si ) (that means, there is depth overlap) then
8: if the bounding rectangles of the two surfaces on the view plane do not overlap then
9: Set i = i + 1 and continue loop.
10: else if s is completely behind si then
11: Set i = i + 1 and continue loop
12: else if si is completely in front of s then
13: Set i = i + 1 and continue loop
14: else if projections of s and si do not overlap then
15: Set i = i + 1 and continue loop
16: else
17: Swap the positions of s and si in S
18: Set Reorder = ON
19: Exit inner loop
20: end if
21: end if
22: end for
23: if Reorder = OFF then
24: Invoke rendering routine for s
25: Set S = S − s
26: else
27: Set Reorder = OFF
28: end if
29: until S = NULL
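
As a small illustration, the depth-overlap test of line 7 and the first (bounding rectangle) check can be written in C as shown below; the Surface record with precomputed projected extents is an assumption made here, not part of the pseudocode.

/* Assumed surface record with precomputed depth extents and projected
   bounding rectangle. */
struct Surface {
    double zmin, zmax;          /* depth extents                      */
    double xmin, xmax;          /* x extent of the bounding rectangle */
    double ymin, ymax;          /* y extent of the bounding rectangle */
};

/* Line 7 of Algorithm 8.3: s is the surface about to be rendered. */
int depth_overlap(const struct Surface *s, const struct Surface *si)
{
    return s->zmin < si->zmax;
}

/* Check 1: true if the bounding rectangles on the view plane do NOT overlap,
   i.e., the x extents or the y extents are disjoint. */
int bounding_rects_disjoint(const struct Surface *s, const struct Surface *si)
{
    int xDisjoint = (s->xmax < si->xmin) || (si->xmax < s->xmin);
    int yDisjoint = (s->ymax < si->ymin) || (si->ymax < s->ymin);
    return xDisjoint || yDisjoint;
}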

Sometimes, there are surfaces that intersect each other. As an example, consider Fig. 8.11,
in which the two surfaces intersect. As a result, one part of surface 1 is at a depth larger than
surface 2, although the other part has lesser depth. Therefore, we may initially keep surface 1
after surface 2 in the sorted list. However, since the conditions fail (check for yourself), we
have to reorder them (bring surface 1 in front of surface 2 in the list). As you can check, the
conditions shall fail again and we have to reorder again. This will go on in an infinite loop
and Algorithm 8.3 will loop forever.
In order to avoid such situations, we can use an extra flag (a Boolean variable) for each
surface. If a surface is reordered, the corresponding flag will be set on. If the surface needs
to be reordered next time, we shall do the following.


1. Divide the surface along the intersection line of the two surfaces.
2. Add the two new surfaces in the sorted list, at appropriate positions.


Fig. 8.11 Two surfaces that intersect each other. Algorithm 8.3 should be modified to take
care of such cases

8.7 WARNOCK’S ALGORITHM


There is a group of hidden surface removal techniques collectively known as the area sub-
division methods. All these methods work on the same general idea: we first consider an
area of the projected image. If we can determine which (polygonal) surfaces are visible
in the area, we assign those surface colors to the area. Otherwise, we recursively subdi-
vide the area into smaller regions and apply the same decision logic on the subregions. The
Warnock’s algorithm is one of the earliest among all the subdivision methods developed for
hidden surface removal. The algorithm works as follows.
First, we subdivide a screen area into four equal squares. Then, we check for visibility
in each square to determine the color of the pixels contained in the (square) region. The
algorithm proceeds as per the following cases.
1. The current square region being checked does not contain any surface. Thus, we do not
subdivide the region any further and assign background color to the pixels contained in it.
2. The nearest surface completely overlaps the region under consideration. In that case, the
square is not subdivided further. We assign the surface color to the region.
3. None of these. We recursively divide the region into four subregions and repeat the afore-
mentioned checks. The recursion stops if either of the cases is met or the region size
becomes equal to the pixel size.
The steps of the Warnock’s algorithm are shown in Algorithm 8.4 (in pseudocode).
Let us consider Fig. 8.12 to understand the steps of Algorithm 8.4. In the figure, there
is a surface occupying a region of the screen (the shaded region). One vertex of the polyg-
onal surface is on the center of the screen. We shall execute the steps of Algorithm 8.4 to
determine the color of the screen where the surface is.
We make the first call to the algorithm with the whole screen region as input. The
algorithm then creates four subregions of equal size (denoted by P1 , P2 , P3 , and P4 in the
figure) and checks for visibility of the surface for each subregion inside the loop. We first
check the region P1 . Since it does not contain any surface, background color is assigned to


Algorithm 8.4 Warnock’s Algorithm

1: Input: The screen region


2: function Warnock (Projected region P)
3: Divide the input region P into four equal sized subregions P1 , P2 , P3 and P4
4: for each subregion Pi do
5: if there is no surface in Pi then
6: Assign background color to Pi
7: else if the nearest surface completely overlaps Pi or Pi equals the pixel size then
8: Assign the color of the nearest surface to Pi
9: else
10: Warnock (Pi )
11: end if
12: end for
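
A skeletal C version of this recursion is sketched below. The Region type and the helper routines (no_surface_in, nearest_surface_covers, nearest_surface_color, is_pixel_sized, fill_region) are assumed to be supplied by the host program; they stand in for the visibility checks described above.

struct Region { int x, y, w, h; };      /* a square screen region (assumed) */

/* Helpers assumed to be supplied by the host program. */
int  no_surface_in(struct Region r);
int  nearest_surface_covers(struct Region r);   /* nearest surface covers r fully */
int  nearest_surface_color(struct Region r);
int  is_pixel_sized(struct Region r);
void fill_region(struct Region r, int color);
extern int BACKGROUND_COLOR;

void warnock(struct Region r)
{
    struct Region sub[4] = {
        { r.x,           r.y,           r.w / 2, r.h / 2 },
        { r.x + r.w / 2, r.y,           r.w / 2, r.h / 2 },
        { r.x,           r.y + r.h / 2, r.w / 2, r.h / 2 },
        { r.x + r.w / 2, r.y + r.h / 2, r.w / 2, r.h / 2 }
    };
    for (int i = 0; i < 4; i++) {
        if (no_surface_in(sub[i]))
            fill_region(sub[i], BACKGROUND_COLOR);
        else if (nearest_surface_covers(sub[i]) || is_pixel_sized(sub[i]))
            fill_region(sub[i], nearest_surface_color(sub[i]));
        else
            warnock(sub[i]);                    /* recursive subdivision */
    }
}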

(In the figure, the screen is divided into four subregions P1 , P2 , P3 , and P4 ; P3 is further divided
into P31 , P32 , P33 , and P34 ; and P31 into P311 , P312 , P313 , and P314 . Subregion P311 is
completely overlapped by the surface.)

Fig. 8.12 Example illustrating Warnock’s algorithm

this region. Next, we check the region P2 . Again, no surface is contained within this region.
So, we assign background color to the region and proceed to the next region P3 .
We determine that P3 contains the surface. However, it is not completely overlapped by
the surface. Therefore, we go for dividing the region into four subregions of equal size (the
recursive call in the Algorithm 8.4, line 10). The four subregions are denoted by P31 , P32 ,
P33 , and P34 in the figure. For each of these subregions, we perform the checks again.
We find that the subregion P31 contains the surface. However, the surface does not com-
pletely overlap P31 . Therefore, we go for subdividing the region. The four subregions of
P31 are denoted as P311 , P312 , P313 , and P314 . We then check each of these subregions for
surface visibility.
Since the surface lies in the subregion P311 and is completely overlapping it, we assign
the surface color to the subregion P311 . The other three subregions of P31 do not contain


any surface. Therefore, all these subregions are assigned background color. This completes
our processing of the subregion P31 .
We then retrace the recursive step and go for checking the other three subregions of the
region P3 , namely P32 , P33 , and P34 . Since none of them contains any surface, we assign
background color to them and complete our processing of the subregion P3 .
We then return from the recursive step to check the remaining subregion P4 of the screen.
We find that the region contains no surface. Therefore, background color is assigned to it.
Since all the regions have been checked, the algorithm stops.

8.8 OCTREE METHODS


The depth-buffer algorithm is an image space method while the depth sorting algorithm is
a combination of both image and object space methods. There are several other algorithms
available for hidden surface removal, which belong to either of the methods. In the follow-
ing, we discuss another hidden surface removal technique based on the octree representation
of objects. As may be clear to you, the method depends on a particular object representation
technique, namely the octree representation (see Chapter 2). Therefore, we categorize it as
an object space method. Note that the method assumes a volumetric (or interior) represen-
tation of objects, while in the back face elimination or depth sorting methods, we assumed
that objects are represented by their bounding surfaces (exterior representation).
As you may recall, in octree representation, we first divide a 3D space into eight regions
(octants). Each region is then subdivided into eight subregions. The process continues
in a recursion till we reach a predefined region size (usually unit cubes, sometimes called
voxels, see Chapter 2). After the division, we get an octree (a tree with each node having
eight children). For simplicity, we shall assume a cubical region to start with and that the
divisions are performed till we reach voxels. In such a case, each leaf node (voxel) of the
octree stores the attributes of the object associated with it (we shall assume only color as the
object attribute for simplicity). Thus, the voxels with attributes define the objects in the scene.
In terms of this simplistic representation, let us try to understand the octree methods for
hidden surface removal.
During the creation of the octree representation, we label each of the eight subregions of
a region according to its position with respect to the viewer. Refer to Fig. 8.13. With respect
to the viewer position, we may number the eight subregions of a region as {1,2,3,4}, denot-
ing the four front octants, and {5,6,7,8}, denoting the four back octants. We follow this
convention in each step of the recursive subdivision of a region. When we reach the leaf

Fig. 8.13 The naming of regions with respect to a viewer in an octree method


nodes (i.e., voxels), each voxel shall have the information about its position with respect to
the viewer, along with the color of the object associated with it.
In order to render the scene represented by the octree, we project the voxels on the view
plane in a front-to-back manner. In other words, we start with the voxels nearest to the
viewer, then move to the next nearest voxels, and so on. It is easy to see that a term
such as voxels nearest to the viewer indicates a voxel grid (having voxels at the same dis-
tance from the viewer) with a one-to-one correspondence to the pixel grid on the view plane
(since both grids are of the same size). When a color is encountered in a voxel (of a particular
grid), the corresponding pixel in the frame buffer is painted only if no previous color has
been loaded into the same pixel position. We can achieve this by initially assuming that the
frame buffer locations contain 0 (no color). A pseudocode of the method (for our simplest
octree representation) is shown in Algorithm 8.5.

Algorithm 8.5 Octree method of hidden surface removal

1: Input: The set of octree leaf nodes (voxels) OCT with each voxel having two attributes (distance from
the viewer, color). We assume that the nearest voxel to viewer has a distance 0.
2: Output: Frame buffer with appropriate color in its locations
3: Set all frame buffer locations to 0 (to denote no color)
4: Set distance = 0
5: repeat
6: for each element of OCT, from the leftmost to the rightmost do
7: Take the element (denoted by v)
8: if distance of v from the viewer = distance then
9: if the corresponding frame buffer location has the value 0 then
10: Set the voxel color as the corresponding frame buffer value
11: end if
12: Set OCT = OCT − v
13: end if
14: end for
15: Set distance = distance + 1
16: until OCT = NULL
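
The following is a C sketch of Algorithm 8.5 for our simplified representation; the Voxel record, the grid size, and the NO_COLOR constant are assumptions made for illustration.

#define GRID 4                  /* assumed: a GRID x GRID x GRID scene  */
#define NO_COLOR 0

struct Voxel {
    int distance;               /* distance of the voxel's grid from the viewer */
    int i, j;                   /* position in the grid = pixel it maps to      */
    int color;                  /* NO_COLOR means the voxel is empty            */
};

/* Paint voxels front to back; a pixel keeps the first (nearest) color found. */
void octree_render(const struct Voxel voxels[], int count,
                   int frame[GRID][GRID])
{
    for (int i = 0; i < GRID; i++)
        for (int j = 0; j < GRID; j++)
            frame[i][j] = NO_COLOR;

    for (int d = 0; d < GRID; d++)              /* nearest voxel grid first */
        for (int k = 0; k < count; k++) {
            const struct Voxel *v = &voxels[k];
            if (v->distance == d && v->color != NO_COLOR &&
                frame[v->i][v->j] == NO_COLOR)
                frame[v->i][v->j] = v->color;
        }
}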

Example 8.2
Let us consider an example to illustrate the idea of our simplistic octree method for hidden surface
removal. Assume that we have a display device with a 4 × 4 pixel grid. On this display, we wish
to project a scene enclosed in a cubical region with each side = 4 unit. Note that, we shall have
two levels of recursion to create the octree representation of the scene till we reach the voxel level.
In the first recursion, we create eight regions from the original volume given, each region having
a side length = 2 units. In the next level of recursion, we divide each of the subregions further,
so that we reach the voxel level. The divisions are illustrated in Fig. 8.14, showing two levels of
recursion for one octant. Other octants are divided similarly.



Fig. 8.14 Octree of Example 8.2

Note how the naming convention is used. In the first level of recursion, we named the octants
as we discussed before ({1,2,3,4} are the front regions while {5,6,7,8} are the back regions with
respect to the viewer). In the second level of recursion, we have done the same thing. Thus, after
the recursion ends, each voxel is associated with two numbers in the form {first level number, sec-
ond level number}. For example, the voxels shown in Fig. 8.14 will have numbers as {1,1}, {1,2},
{1,3}, {1,4}, {1,5}, {1,6}, {1,7} and {1,8}.
As you can see, from these numbers, we can make out the relative position of the voxels with
respect to the viewer. Hence, we can easily determine the voxel grids (i.e., voxels at the same
distance from the viewer). There are four such grids as follows:
Grid 1: {1,1}, {1,2}, {1,3}, {1,4}, {2,1}, {2,2}, {2,3}, {2,4}, {3,1}, {3,2}, {3,3}, {3,4}, {4,1},
{4,2}, {4,3}, and {4,4} [distance = 0 from the viewer]
Grid 2: {1,5}, {1,6}, {1,7}, {1,8}, {2,5}, {2,6}, {2,7}, {2,8}, {3,5}, {3,6}, {3,7}, {3,8}, {4,5},
{4,6}, {4,7}, and {4,8} [distance = 1 from the viewer]
Grid 3: {5,1}, {5,2}, {5,3}, {5,4}, {6,1}, {6,2}, {6,3}, {6,4}, {7,1}, {7,2}, {7,3}, {7,4}, {8,1},
{8,2}, {8,3}, and {8,4} [distance = 2 from the viewer]
Grid 4: {5,5}, {5,6}, {5,7}, {5,8}, {6,5}, {6,6}, {6,7}, {6,8}, {7,5}, {7,6}, {7,7}, {7,8}, {8,5},
{8,6}, {8,7}, and {8,8} [distance = 3 from the viewer]
It is also easy to define a mapping between voxel and pixel grids. For example, we may define
that a voxel with location (i,j) in a grid maps to the pixel (i,j) in the pixel grid.
Now, let us try to execute the steps of Algorithm 8.5. Initially, all pixels have a color value 0
and OCT contains all the voxels of the four grids. Since the distance of the voxels in grid 1 is 0,
all these voxels will be processed in the inner loop first. During the processing of each voxel, its
color (if any) will be set as the color of the corresponding frame buffer location. Afterwards, the
voxel will be removed from the list of voxels in OCT. Thus, after the first round of processing of
the inner loop, OCT shall contain voxels of grids 2, 3 and 4.
In a similar manner, voxels of grid 2 will be processed during the second round of inner loop
execution and the frame buffer colors modified appropriately (i.e., if a frame buffer location already
contains a non-zero color value, it remains unchanged; otherwise, if the corresponding voxel in grid 2
has a color, that color is written to the frame buffer location). After the second round of inner loop
execution, OCT shall contain the grid 3 and 4 voxels and the inner loop executes a third time. The
process continues in this way till the fourth round of execution of the inner loop is over, after which
we will have OCT = NULL and the execution of the algorithm stops.


SUMMARY
In this chapter, we learnt about the idea of hidden surface removal in a scene. The objective is
to eliminate surfaces that are invisible to a viewer with respect to a viewing position. We learnt
about the two broad types of methods—image space and object space methods. The former
works at the pixel level while the latter works at the level of object representations.
In order to reduce computations, coherence properties are used in conjunction with the
algorithms. We mentioned seven such properties, namely (a) object coherence, (b) face coher-
ence, (c) edge coherence, (d) scan line coherence, (e) area and span coherence, (f) depth
coherence, and (g) frame coherence. We discussed the ideas in brief. In addition to these, we
also saw how the back face elimination method provides a simple and efficient way for removal
of a large number of hidden surfaces.
Among the many hidden surface removal algorithms available, we discussed three in detail
along with illustrative examples. The first algorithm, namely the depth-buffer algorithm, is one of
the most popular algorithms which works in the image space. The depth sorting algorithm that
we discussed next works at both the image and object space. As we saw, it is more complex and
computation-intensive compared to the depth-buffer algorithm. The third algorithm, namely the
octree method, is an object space method that is based on the octree representation of objects.
We illustrated the idea of octree methods considering a few simplistic assumptions.
In the next chapter, we will discuss the final stage of a 3D graphics pipeline, namely rendering.

BIBLIOGRAPHIC NOTE
There are a large number of hidden surface removal techniques. We have discussed a few
of those. More techniques can be found in Elber and Cohen [1990], Franklin and Kankanhalli
[1990], Segal [1990], and Naylor et al. [1990]. A well-known hidden surface removal technique
is the A-buffer method. Works on this are presented in Cook et al. [1987], Haeberli and Akeley
[1990], and Shilling and Strasser [1993]. Hidden surface removal is also important in three-
dimensional line drawings. For curved surfaces, contour plots are displayed. Such contouring
techniques are summarized in Earnshaw [1985]. For various programming techniques for hid-
den surface detection and removal, the graphics gems book series can be referred (Glassner
[1990], Arvo [1991], Kirk [1992], Heckbert [1994], and Paeth [1995]).

KEY TERMS
(A)ccumulation buffer – a data structure to store depth and associated information for each
surface to which a pixel belongs
A-buffer method – an image space technique for hidden surface removal that works with
transparent surfaces also
Area subdivision – recursive subdivision of the projected area of a surface
Back face elimination – an object space method for hidden surface removal
Coherence – the property by which we can apply some results calculated for one part of a scene
or image to the other parts
Depth (Z) buffer algorithm – an image space method for hidden surface removal
Depth-buffer – a data structure to store the depth information of each pixel
Depth coherence – the depth of nearby parts of a surface is similar
Depth overlap – the depth ranges of two surfaces overlap, that is, the minimum depth of one
surface is less than the maximum depth of the other


Depth sorting (Painter’s) algorithm – a hidden surface removal technique that combines
elements of both object space and image space methods
Face coherence – the property by which we can check visibility of one part of a surface by
checking its properties at other parts
Frame coherence – pictures of the successive frames are likely to be similar
Hidden surfaces – object surfaces that are hidden with respect to a particular viewing position
Image space method – hidden surface removal techniques that work with the pixel level
projections of object surfaces
Object coherence – determining visibility of an object surface with respect to nearby object
surfaces by comparing their bounding volumes
Object space methods – hidden surface removal techniques that work with the objects, rather
than their projections on the screen.
Octree method – an object space method for hidden surface removal
Scan line coherence – a line or surface segment visible in one scan line is also likely to be visible
in the adjacent scan line
Visible surfaces – surfaces that are visible with respect to the viewing position
Warnock’s algorithm – the earliest area subdivision method for hidden surface removal

EXERCISES
8.1 Discuss the importance of hidden surface removal in a 3D graphics pipeline. How is it
different from clipping?
8.2 What are the broad classes of hidden surface removal methods? Describe each class in
brief along with its pros and cons.
8.3 Briefly explain the idea of coherence. Why is it useful in hidden surface removal?
8.4 In which category of methods does the depth-buffer algorithm belong to? Justify.
8.5 We have discussed the depth-buffer algorithm (Algorithm 8.1) and the iterative depth
calculation (Algorithm 8.2) separately. Write the pseudocode of an algorithm combining
the two.
8.6 The depth sorting method is said to be a hybrid of the object space and image space
methods. Why?
8.7 Why does Algorithm 8.3 fail for intersecting surfaces? Explain with a suitable example.
8.8 Modify Algorithm 8.3 to take into account intersecting surfaces.
8.9 Consider the objects mentioned in Example 8.1. Can Algorithm 8.3 be applied to these
objects, or is the modified algorithm you wrote to answer Exercise 8.8 needed? Show the
execution of the steps of the appropriate algorithm for the objects.
8.10 Both the back face elimination and octree methods belong to the object space methods.
What is the major difference between them?
8.11 Assume that during octree creation, we named each region as illustrated in Example 8.2.
Write the pseudocode of an algorithm to determine the distance of a voxel from the viewer.
Integrate this with Algorithm 8.5.
8.12 Algorithm 8.5 assumed a simplistic octree representation. Discuss ways to improve it.

CHAPTER 9
Rendering
Learning Objectives
After going through this chapter, the students will be able to
• Understand the concept of scan conversion
• Get an overview of the issues involved in line scan conversion
• Learn about the digital differential analyser (DDA) line drawing algorithm and its
advantage over the intuitive approach
• Understand the Bresenham’s line drawing algorithm and its advantage over the DDA
algorithm
• Get an overview of the issues involved in circle scan conversion
• Learn about the mid-point algorithm for circle scan conversion
• Understand the issues and approaches for fill area scan conversion
• Learn about the seed fill, flood fill, and scan line polygon fill algorithms for fill area scan
conversion
• Get an overview of character rendering methods
• Understand the problem of aliasing in scan conversion
• Learn about the Gupta-Sproull algorithm for anti-aliasing lines
• Learn about the area sampling and supersampling approaches towards anti-aliasing

INTRODUCTION
Let us review what we have learnt so far. In a 3D graphics pipeline, we start with the def-
inition of objects that make up the scene. We learnt different object definition techniques.
We also learnt the various geometric transformations to put the objects in their appropriate
place in a scene. Then, we learnt about the lighting and shading models that are used to
assign colors to the objects. Subsequently, we discussed the viewing pipeline comprising
the three stages: (a) view coordinate formation, (b) projection, and (c) window-to-viewport
transformations. We also learnt various algorithms for clipping and hidden surface removal.
Thus, we now know the stages involved in transforming a 3D scene to a 2D viewport
in the device coordinate system. Note that a device coordinate system is continuous in
nature (i.e., coordinates can be any real number). However, we must use the pixel grid
to render a scene on a physical display. Clearly, pixel grids represent a discrete coordi-
nate system, where any point must have integer coordinates. Thus, we need to map the


scene defined in the viewport (continuous coordinates) to the pixel grid (discrete coor-
dinates). The algorithms and techniques used for performing this mapping are the subject
matter of this chapter. These techniques are collectively known as rendering (often called
scan conversion or rasterization). In the rest of this chapter, these terms will be used
synonymously.
The most basic problem in scan conversion is to map a point from the viewport to the
pixel grid. The approach is very simple: just round off the point coordinates to their nearest
integer value. For example, consider the point P(2.6,5.1). This viewport point is mapped to
the pixel grid point P′ (3,5) after rounding off the individual coordinates to their nearest inte-
gers. However, scan conversion of more complex primitives such as line and circle are not
so simple and we need more complex (and efficient) algorithms. Let us learn about those
algorithms.
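
For instance, the point-to-pixel mapping can be written as a one-line C helper (the function name is ours):

#include <math.h>

/* Map a continuous viewport coordinate to the nearest pixel coordinate,
   e.g., 2.6 -> 3 and 5.1 -> 5. */
int to_pixel(double coord)
{
    return (int)floor(coord + 0.5);
}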

9.1 SCAN CONVERSION OF A LINE SEGMENT


We know that a line segment is defined by its end points. In order to scan convert the
line segment, we need to map points on the line to the appropriate pixels. We can follow
a simple approach: first, we shall map the end points to the appropriate pixels following
the point scan conversion method (i.e., round off to nearest integer). This will give us the
starting and ending pixels for the line segment. We will then take one end point having
the lower x and y coordinate values. Since two pixels are separated by a unit distance
along the x-axis, we then work out the y-coordinates for successive x-coordinates
differing by one. The computed y-coordinate is then mapped to its nearest integer, giv-
ing us the pixel coordinates of the point. Let us try to understand this idea in terms of
an example.
Assume we have a line segment defined by the end points A(2.1,2.3) and B(6.7,5.2). We
first convert the end points to pixels. Thus, the two end point pixels of the line segment are:
A′ (2,2) and B′ (7,5) (after rounding off each coordinate value to the nearest integer). Since
the coordinate values of A′ are less than those of B′ , we take A′ as our starting pixel and
compute the points on the line.
First, from the end points, we compute the slope m = (5 − 2)/(7 − 2) = 3/5 [we know that
m = (y2 − y1)/(x2 − x1)] and the y-intercept b = 2 − (3/5)(2) = 4/5 [from the equation
y1 = m.x1 + b]. Then, for each x-coordinate separated by unit distance (starting from 2), we
compute the y-coordinate using the line equation y = (3/5)x + 4/5 (till we reach the other end
point B′ ). The computation yields
the following four values of y.

y(x = 3) = (3/5)(3) + 4/5 = 2.6
y(x = 4) = (3/5)(4) + 4/5 = 3.2
y(x = 5) = (3/5)(5) + 4/5 = 3.8
y(x = 6) = (3/5)(6) + 4/5 = 4.4



Fig. 9.1 Simple line scan conversion—Note that the actual points on the line are scan
converted to the nearest pixels after rounding off

Thus, between A′ and B′ , we obtain the four points on the line as (3,2.6), (4,3.2), (5,3.8),
and (6,4.4). Following the point scan conversion technique, we determine the pixels for these points
as (3,3), (4,3), (5,4), and (6,4), respectively. The idea is illustrated in Fig. 9.1.
The approach is very simple. However, there are mainly two problems with this approach.
First, we need to perform the multiplication m.x. Second, we need to round off the y-
coordinates. Both of these may involve floating point operations, which are computationally
expensive. In typical graphics applications, we need to scan convert a very large number of
lines within a very small time. In such cases, floating point operations make the process slow
and flickering may occur. Thus, we need a better solution.

Role of slopes in line scan conversion

In the simple line scan conversion we discussed, we calculated the y-coordinate for each
x-coordinate. We could have done the other way round also. Let us see the result for the same
example (Fig. 9.1). As before, we start with the end point A′ . This time, we calculate the
x-coordinates of successive points by increasing y by 1 (moving from one scan line to the next)
based on the equation x = (y − b)/m. Therefore, we get two x values between y = 2 and y = 5
(the two end pixel y-coordinates), unlike the four y values we got before. The two x values are,

x(y = 3) = (3 − 4/5)/(3/5) = 3.7
x(y = 4) = (4 − 4/5)/(3/5) = 5.3

Thus, the two computed points between A′ and B′ are (3.7,3) and (5.3,4). These two are scan
converted to the pixels (4,3) and (5,4). Recall that when we computed y, we got the pixels
(including end points) (2,2), (3,3), (4,3), (5,4), (6,4), and (7,5). However, if we calculate x for
this line segment, we get the pixels (2,2), (4,3), (5,4), and (7,5). As you can see, the line segment
rendered with the first set of pixels is much better compared to the second set of pixels. Thus,
we have to decide which coordinate to calculate when.

The decision is taken based on the slope of the line. If we have 0 ≤ m ≤ 1 or −1 ≤ m ≤ 0, we
work out y-coordinates based on x-coordinates (as we did in the example). Otherwise, we compute
x-coordinates based on the y-coordinates.


9.1.1 DDA Algorithm


DDA stands for digital differential analyser. In the DDA algorithm, we take an incremental
approach to speed up line scan conversion. In order to illustrate the working of the algorithm,
let us consider the previous example again.
Recall that we calculated four points between the two end pixels (2,2) and (7,5). These
points are (3,2.6), (4,3.2), (5,3.8), and (6,4.4). We know that the slope of the line is
m = 3/5 = 0.6. Note that the successive y-coordinates are obtained by adding 0.6 to the
current value (i.e., (3,2.6), (4,2.6 + 0.6), (5,(2.6 + 0.6) + 0.6), and (6,((2.6 + 0.6) + 0.6)
+ 0.6)). Thus, instead of computing the y-coordinate by the line equation y = m.x + b
every time, we can simply add m to the current y value (i.e., yk+1 = yk + m). In this
way, we eliminate the floating point multiplication m.x from the calculation. When m > 1 or
m < −1, we compute successive x-coordinates as xk+1 = xk + 1/m (derivation is left as
an exercise), again eliminating the multiplication. The algorithm is shown
in Algorithm 9.1, where RoundOff(a) is a function to round off the real number a to its
nearest integer.
Although the DDA algorithm is able to reduce some floating point operations (mul-
tiplication), it still requires floating point addition and rounding off operations. More-
over, for long line segments, the accumulated rounding error may result in pixels that
drift away from the actual line. So, it is preferable to have a more efficient algorithm
for line scan conversion.

Algorithm 9.1 DDA algorithm

1: Input: The two line end points (x1 , y1 ) and (x2 , y2 )
2: Output: Set of pixels P to render the line segment
3: Compute m = (y2 − y1)/(x2 − x1)
4: if 0 ≤ m ≤ 1 or −1 ≤ m ≤ 0 then
5: Set x = x1 + 1, y = y1
6: repeat
7: Set y = y + m
8: Set yr = RoundOff(y)
9: Add (x, yr ) to P
10: Set x = x + 1
11: until x ≥ x2
12: else
13: Set x = x1 , y = y1 + 1
14: repeat
15: Set x = x + 1/m
16: Set xr = RoundOff(x)
17: Add (xr , y) to P
18: Set y = y + 1
19: until y ≥ y2
20: end if
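
A C sketch of the DDA idea for the |m| ≤ 1 case is given below; plot() is an assumed routine that writes one pixel to the frame buffer, and the end points are rounded first, as in the example above.

#include <math.h>

void plot(int x, int y);        /* assumed: writes one pixel to the frame buffer */

/* DDA line scan conversion for slopes with |m| <= 1. */
void dda_line(double x1, double y1, double x2, double y2)
{
    int xs = (int)floor(x1 + 0.5), ys = (int)floor(y1 + 0.5);
    int xe = (int)floor(x2 + 0.5), ye = (int)floor(y2 + 0.5);
    double m = (double)(ye - ys) / (xe - xs);
    double y = ys;

    plot(xs, ys);
    for (int x = xs + 1; x <= xe; x++) {
        y += m;                              /* incremental step, no multiplication */
        plot(x, (int)floor(y + 0.5));        /* round only when plotting            */
    }
}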


9.1.2 Bresenham’s Algorithm


The Bresenham’s algorithm is a very efficient way to scan convert a line segment. Let us
first understand the idea of the algorithm. We shall discuss the idea for line segments with
0 ≤ m ≤ 1 or −1 ≤ m ≤ 0.
Consider Fig. 9.2. As we know, in order to determine pixels, we move along the x-direction
in unit steps from the current position (xk , yk ). At each of these steps, we have to choose
between the two y values yk and yk + 1. Clearly, we would like to choose a pixel that is closer
to the original line. Let us denote the distances of (xk + 1, yk + 1) and (xk + 1, yk ) from the
actual line by dupper and dlower , respectively (see Fig. 9.2). At xk + 1, the y-coordinate on the
line is y = m(xk + 1) + b, where m and b are the slope and y-intercept of the line segment,
respectively. Therefore, we can determine dupper and dlower as,

dupper = (yk + 1) − y = yk + 1 − m(xk + 1) − b


dlower = y − yk = m(xk + 1) + b − yk

Based on these two quantities, we can take a simple decision about the pixel closer to the
actual line, by taking the difference of the two.

dlower − dupper = 2m(xk + 1) − 2yk + 2b − 1

If the difference is less than 0, the lower pixel is closer to the line and we choose it. Otherwise,
we choose the upper pixel. Now, we substitute m in this expression with △y/△x, where △y and
△x are the differences between the end points. Rearranging, we get,

△x(dlower − dupper ) = △x[2(△y/△x)(xk + 1) − 2yk + 2b − 1]
= 2△y.xk − 2△x.yk + 2△y + △x(2b − 1)
= 2△y.xk − 2△x.yk + c

where the constant c = 2 △ y + △x(2b − 1)



Fig. 9.2 The key idea of Bresenham’s line scan conversion algorithm. The algorithm
chooses one of the two candidate pixels based on the distance of the pixels from the actual
line. The decision is taken entirely based on integer calculations.


Note that △x(dlower − dupper ) can also be used to make the decision about the closeness
of the pixels to the actual line. Let us denote this by pk , a decision parameter for the kth
step. Clearly, the sign of the decision parameter will be the same as that of (dlower − dupper ).
Hence, if pk < 0, the lower pixel is closer to the line and we choose it. Otherwise, we choose
the upper pixel.
At step k + 1, the decision parameter is,

pk+1 = 2 △ y.xk+1 − 2 △ x.yk+1 + c

Subtracting from pk , we get,

pk+1 − pk = 2 △ y(xk+1 − xk ) − 2 △ x(yk+1 − yk )

We know that xk+1 = xk + 1. Substituting and rearranging, we get Eq. 9.1.

pk+1 = pk + 2 △ y − 2 △ x(yk+1 − yk ) (9.1)

Note that in Eq. 9.1, if pk < 0, we set yk+1 = yk , otherwise we set yk+1 = yk + 1. Thus,
depending on the sign of pk , the difference (yk+1 − yk ) in this expression becomes either 0
or 1. The first decision parameter at the starting point is given by p0 = 2 △ y − △x.
What is the implication of this? We are choosing pixels at each step, depending on the
sign of the decision parameter. The decision parameter is computed entirely with inte-
ger operations only. All floating point operations are eliminated. Thus, the approach is a
huge improvement, in terms of speed of computation, over the previous approaches we dis-
cussed. The pseudocode of the Bresenham’s algorithm for line scan conversion is given in
Algorithm 9.2.

Algorithm 9.2 Bresenham’s line drawing algorithm

1: Input: The two line end points (x1 , y1 ) and (x2 , y2 )


2: Output: Set of pixels P to render the line segment
3: Compute △x = x2 − x1 , △y = y2 − y1 , p = 2 △ y − △x
4: Set x = x1 , y = y1
5: Add (x1 , y1 ) to P
6: repeat
7: if p < 0 then
8: Set x = x + 1
9: Set p = p + 2 △ y
10: else
11: Set x = x + 1, y = y + 1
12: Set p = p + 2 △ y − 2 △ x
13: end if
14: Add (x, y) to P
15: until x ≥ x2 − 1
16: Add (x2 , y2 ) to P
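
For integer end points with 0 ≤ m ≤ 1, Algorithm 9.2 translates into C roughly as follows; plot() is again an assumed pixel-writing routine. For the line (2,2)–(7,5), this produces the same pixel set derived in Example 9.1 below.

void plot(int x, int y);        /* assumed: writes one pixel to the frame buffer */

/* Bresenham line scan conversion for integer end points with 0 <= m <= 1. */
void bresenham_line(int x1, int y1, int x2, int y2)
{
    int dx = x2 - x1, dy = y2 - y1;
    int p = 2 * dy - dx;                     /* p0 = 2*dy - dx */
    int x = x1, y = y1;

    plot(x, y);
    while (x < x2) {
        x++;
        if (p < 0) {
            p += 2 * dy;                     /* keep the lower candidate pixel   */
        } else {
            y++;
            p += 2 * dy - 2 * dx;            /* choose the upper candidate pixel */
        }
        plot(x, y);                          /* the end pixel is plotted by the loop */
    }
}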


Example 9.1
In order to understand Algorithm 9.2, let us execute its steps for the line segment defined by
the end points A(2,2) and B(7,5) in our previous example. Following line 3 of Algorithm 9.2,
we compute △x = 5, △y = 3, and p = 1. The two variables x and y are set to the
end point A′ as x = 2, y = 2 (line 4). Also, the end point A′ (2,2) is added to P
(line 5).
Note that p = 1 ≥ 0. Therefore, we execute the ELSE part of the loop (lines 10–12) and we get
x = 3, y = 3, p = −3. The pixel (3,3) is added to the output list P (line 14). Since x = 3 < 6,
the termination condition (x ≥ 6) is not yet met, so the loop is executed again.
In the second execution of the loop, we have p = −3 < 0. Thus, the IF part (lines 7–9) is
executed and we get x = 4, y = 3 (no change), and p = 3. The pixel (4,3) is added to the output
list P. Since x = 4 < 6, the loop is executed again.
Now we have p = 3 ≥ 0. Therefore, in the third execution of the loop, the statements in the
ELSE part are executed. We get x = 5, y = 4, p = −1. The pixel (5,4) is added to the output pixel
list P. Since x = 5 < 6, the loop is executed again.
In the fourth loop execution, p = −1 < 0. Hence, the IF part is executed with the result x = 6,
y = 4 (no change), and p = 5. The pixel (6,4) is added to the output list P. As x = 6, the
termination condition (x ≥ 6) is now met and the loop stops.
Finally, we add the other end point B′ (7,5) to the output list P and the algorithm
stops.
Thus, we get the output list P = {(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)} after the termination
of the algorithm.

Algorithm 9.2 works for line segments with 0 ≤ m ≤ 1 or −1 ≤ m ≤ 0. For other line
segments, minor modification to the algorithm is needed, which is left as an exercise for the
reader.

9.2 CIRCLE SCAN CONVERSION


Similar to lines, we can also use a very simple approach to scan convert circles. Let us
consider a circle centered at the origin. Assuming a radius r, the equation of the circle is
x² + y² = r². Using this equation, we can solve for y after every unit increment of x
as y = ±√(r² − x²). Obviously, this solution is not good, as it involves inefficient compu-
tations such as square roots and multiplications (remember, r need not be an integer). Also,
we may need to round off the computed values. Moreover, the pixels obtained may not gen-
erate a smooth circle, just as in the case of lines (the gap between actual points on the
circle and the chosen pixels may be large). In the following section, we discuss a much
more efficient algorithm, known as the midpoint algorithm, to scan convert circles. For
simplicity, we shall restrict our discussion to circles about origin. We shall follow an
approach similar to the previous section: first, we shall discuss the derivation of the steps
of the algorithm; then, the pseudocode of the algorithm will be presented, followed by an
illustrative example.


9.2.1 Midpoint Algorithm


A circle about origin has an important property that we shall exploit in the algorithm—
the eight-way symmetry. We can divide a circle into eight quadrants, as shown in
Fig. 9.3. If we can determine one point on any of these quadrants, seven other points
on the circle belonging to the seven quadrants can be trivially derived, as illustrated in
Fig. 9.3. In the algorithm, we shall compute pixels for the top right eighth quadrant
(the marked quadrant of Fig. 9.3) and determine the pixels for other quadrants following
the property.
Now assume that we have just determined the pixel (xk , yk ), as shown in Fig. 9.4. The
next pixel is a choice between the two pixels (xk + 1, yk ) and (xk + 1, yk − 1) [note that here
we are going down the scan lines for the top right eighth quadrant, unlike the line scan con-
version where we were going up]. Clearly, the pixel that is closer to the actual circle should
be chosen. How do we decide that?
We perform some mathematical tricks here as we did for Bresenham’s line draw-
ing algorithm. We first restate the circle equation as f (x, y) = x2 + y2 − r2 .


Fig. 9.3 The eight way symmetry and its use to determine points on a circle centered at
origin. We compute pixels for the marked quadrant. The other pixels are determined based
on the property.


Fig. 9.4 Idea of candidate pixels and midpoint decision variable


The equation evaluates as follows.

f(x, y) < 0 if the point (x, y) is inside the circle
f(x, y) = 0 if the point (x, y) is on the circle
f(x, y) > 0 if the point (x, y) is outside the circle

We evaluate this function at the midpoint of the two candidate pixels to make our decision
(see Fig. 9.4). In other words, we compute the value f(xk + 1, yk − 1/2). Let us call this the
decision variable pk after the kth step. Thus, we have

pk = f(xk + 1, yk − 1/2) = (xk + 1)² + (yk − 1/2)² − r²
Note that if pk < 0, the midpoint is inside the circle. Thus, yk is closer to the circle
boundary and we choose the pixel (xk + 1, yk ). Otherwise, we choose (xk + 1, yk − 1) as the
midpoint is outside the circle and yk − 1 is closer to the circle boundary.
To come up with an efficient algorithm, we perform some more tricks. First, we consider
the decision variable for the (k + 1)th step pk+1 as,

pk+1 = f(xk+1 + 1, yk+1 − 1/2) = [(xk + 1) + 1]² + (yk+1 − 1/2)² − r²
After expanding the terms and rearranging, we get Eq. 9.2.
pk+1 = pk + 2(xk + 1) + (y2k+1 − y2k ) − (yk+1 − yk ) + 1 (9.2)
In Eq. 9.2, yk+1 is yk if pk < 0. In such a case, we have pk+1 = pk + 2xk + 3. If pk > 0,
we have yk+1 = yk − 1 and pk+1 = pk + 2(xk − yk ) + 5. Thus, we can choose pixels based on
an incremental approach (computing the next decision parameter from the current value).
One thing remains, that is, the first decision variable p0 . This is the decision variable at
(0, r). Using the definition of p0 , we can compute it as follows.

p0 = f(0 + 1, r − 1/2) = 1 + (r − 1/2)² − r² = 5/4 − r
The pseudocode of the algorithm is shown in Algorithm 9.3, where RoundOff(a) rounds
off the number a to its nearest integer.
With very simple modifications to Algorithm 9.3, we can determine pixels for circles
about any arbitrary center. The modifications are left as an exercise for the reader.


Algorithm 9.3 Midpoint circle drawing algorithm

1: Input: The radius of the circle r


2: Output: Set of pixels P to render the circle
3: Compute p = 5/4 − r
4: Set x = 0, y = RoundOff (r)
5: Add the four axis points (0, y), (y, 0), (0, −y) and (−y, 0) to P
6: repeat
7: if p < 0 then
8: Set p = p + 2x + 3
9: Set x = x + 1
10: else
11: Set p = p + 2(x − y) + 5
12: Set x = x + 1, y = y − 1
13: end if
14: Add (x, y) and the seven symmetric points {(y, x), (y, −x), (x, −y), (−x, −y), (−y, −x), (−y, x),
(−x, y)} to P
15: until x ≥ y
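
A C sketch of the midpoint algorithm for a circle about the origin follows; plot() is an assumed pixel-writing routine, and duplicate pixels near the octant boundaries are not filtered out (Example 9.2 notes the same issue).

#include <math.h>

void plot(int x, int y);        /* assumed: writes one pixel to the frame buffer */

static void plot8(int x, int y) /* the eight symmetric points of (x, y) */
{
    plot( x,  y); plot( y,  x); plot( y, -x); plot( x, -y);
    plot(-x, -y); plot(-y, -x); plot(-y,  x); plot(-x,  y);
}

/* Midpoint scan conversion of a circle of radius r centered at the origin. */
void midpoint_circle(double r)
{
    int x = 0, y = (int)floor(r + 0.5);
    double p = 5.0 / 4.0 - r;                /* p0 = 5/4 - r */

    plot(0, y); plot(y, 0); plot(0, -y); plot(-y, 0);
    while (x < y) {
        if (p < 0) {
            p += 2 * x + 3;                  /* keep y        */
        } else {
            p += 2 * (x - y) + 5;            /* move to y - 1 */
            y--;
        }
        x++;
        plot8(x, y);
    }
}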

Example 9.2
Let us consider a circle with radius r = 2.7. We will execute the steps of Algorithm 9.3 to see how
the pixels are determined.
First, we compute p = 5/4 − 2.7 = −1.45 and set x = 0, y = 3 (lines 3–4). Also, we add the axis pixels
{(0,3), (3,0), (0,−3), (−3,0)} to the output pixel list P (line 5). Then, we enter the loop.
Note that p = −1.45 < 0. Hence, the IF part (lines 7–9) is executed and we get p = 1.55 and
x = 1 (y remains unchanged). So, the pixels added to P are (line 14): {(1,3), (3,1), (3,−1), (1,−3),
(−1,−3), (−3,−1), (−3,1), (−1,3)}. Since x = 1 < y = 3, the loop is executed again.
In the second run of the loop, we have p = 1.55 > 0. Hence, the ELSE part is now executed
(lines 10–12) and we get p = 2.55, x = 2, and y = 2. Therefore, the pixels added to P are {(2,2),
(2,2), (2,−2), (2,−2), (−2,−2), (−2,−2), (−2,2), (−2,2)}. Since now x = 2 = y, the algorithm
terminates.
Thus, at the end, P consists of the 20 pixels {(0,3), (3,0), (0,−3), (−3,0), (1,3), (3,1), (3,−1),
(1,−3), (−1,−3), (−3,−1), (−3,1), (−1,3), (2,2), (2,2), (2,−2), (2,−2), (−2,−2), (−2,−2), (−2,2),
(−2,2)}.
Note that the output set P contains some duplicate entries. Before rendering, we perform further
checks on P to remove such entries.

9.3 FILL AREA SCAN CONVERSION


What we discussed so far is concerned with the determination of pixels that define a line or
a circle boundary. Sometimes, however, we may know the pixels that are part of a region
and we want to apply a specific color to that region (i.e., color the pixels that are part


of the region). In other words, we want to fill the region with a specified color. For exam-
ple, consider an interactive painting system. You draw an arbitrary shape and color it (both
boundary and interior). Now, you want to change the color of the shape interactively (e.g.,
select a color from a menu and click in the interior of the shape to indicate that the new
color be applied to the shape). There are many ways to perform such region filling. The
techniques depend on how the regions are defined. There are broadly the following two
types of definitions of a region.
Pixel level definition A region is defined in terms of its boundary pixels (known as
boundary-defined) or the pixels within its boundary (called interior defined). Such defi-
nitions are used for regions having complex boundaries or in interactive painting systems.
Geometric definition A region is defined in terms of geometric primitives such as edges
and vertices. Primarily meant for defining polygonal regions, such definitions are commonly
used in general graphics packages.
In the following section, we shall discuss algorithms used to fill regions defined in either
of the ways.

9.3.1 Seed Fill Algorithm


In the seed fill algorithm, we start with one interior pixel and color the region progres-
sively. The algorithm works based on the boundary definition of a region (i.e., pixel level
definition with the boundary pixels specified). It further assumes that we know at least one
interior pixel called the seed (which is easy to obtain from the boundary pixels). More-
over, it is assumed that an interior pixel is connected to either four (four-connected) or
eight (eight-connected) of its neighbouring pixels. In the former case, the neighbours are:
top, bottom, left, and right pixels. In the latter case, the neighbours are: top, top left,
top right, left, right, bottom, bottom left, and bottom right pixels. The algorithm works
as follows.
We maintain a stack for the algorithm. The seed is first pushed to the stack. While the
stack is not empty, we pop the stack top pixel and color it. Then, for the four-connected con-
vention, we check each of the four pixels connected to the current pixel. If the connected
pixel is a boundary pixel (i.e., having the boundary color) or already has the specified color,
we ignore it. Otherwise, we push it into the stack. A similar step is done for eight-connected
convention also, with the only difference that the eight neighbouring pixels are checked
instead of four. The steps are shown in Algorithm 9.4.

9.3.2 Flood Fill Algorithm


In the flood fill algorithm, we assume an interior definition (i.e., interior pixels of a
region are known). We want to recolor the region with a specified color. The algorithm
is similar to the seed fill algorithm, with the difference that now we take decisions
based on the original interior color of the current pixel instead of the boundary
pixel color. Other things remain the same. The pseudocode of the procedure is shown
in Algorithm 9.5.


Algorithm 9.4 Seed fill algorithm

1: Input: Boundary pixel color, specified color, and the seed (interior pixel) p
2: Output: Interior pixels with specified color
3: Push(p) to Stack
4: repeat
5: Set current pixel = Pop(Stack)
6: Apply specified color to the current pixel
7: for Each of the four connected pixels (four-connected) or eight connected pixels (eight-connected)
of current pixel do
8: if (connected pixel color ≠ boundary color) AND (connected pixel color ≠ specified color) then
9: Push(connected pixel)
10: end if
11: end for
12: until Stack is empty
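
A minimal Python sketch of the four-connected seed fill is given below; the dictionary-based grid, the colour labels, and the function name are illustrative assumptions rather than part of Algorithm 9.4. Here the neighbour test is performed when a pixel is popped rather than before it is pushed, which has the same effect.

# A sketch of Algorithm 9.4 (four-connected boundary fill) on a small pixel
# grid held as a dictionary {(x, y): color}; grid layout and colours assumed.
def seed_fill(grid, seed, boundary_color, fill_color):
    stack = [seed]
    while stack:
        x, y = stack.pop()
        if grid.get((x, y)) in (boundary_color, fill_color):
            continue                       # boundary pixel or already filled: ignore it
        grid[(x, y)] = fill_color          # apply the specified colour to the current pixel
        # push the four-connected neighbours (right, left, top, bottom)
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])

# 5 x 5 grid: 'B' marks the boundary pixels, '.' the rest; seed is an interior pixel
grid = {(x, y): 'B' if x in (0, 4) or y in (0, 4) else '.'
        for x in range(5) for y in range(5)}
seed_fill(grid, seed=(2, 2), boundary_color='B', fill_color='R')
print(sorted(p for p, c in grid.items() if c == 'R'))   # the 9 interior pixels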

Algorithm 9.5 Flood fill algorithm

1: Input: Interior pixel color, specified color, and the seed (interior pixel) p
2: Output: Interior pixels with specified color
3: Push(p) to Stack
4: repeat
5: Set current pixel = Pop(Stack)
6: Apply specified color to the current pixel
7: for Each of the four connected pixels (four-connected) or eight connected pixels (eight-connected)
of current pixel do
8: if (Color(connected pixel) = interior color) then
9: Push(connected pixel)
10: end if
11: end for
12: until Stack is empty
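
The corresponding sketch for the flood fill differs only in the test applied to a popped pixel, as shown below (again an illustrative sketch with assumed names, not the book's code).

# The one-line change that turns the seed fill sketch above into Algorithm 9.5:
# the decision is based on the original interior colour, not the boundary colour.
def flood_fill(grid, seed, interior_color, fill_color):
    stack = [seed]
    while stack:
        x, y = stack.pop()
        if grid.get((x, y)) != interior_color:   # recolour only original interior pixels
            continue
        grid[(x, y)] = fill_color
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])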

9.3.3 Scan Line Polygon Fill Algorithm


Unlike the seed fill or flood fill algorithms, we assume that a polygonal region is defined
in terms of its vertices and edges (i.e., a geometric definition) in the scan line polygon fill
algorithm. We shall further assume that the vertices are rounded off to the nearest pixels.
The pseudocode is shown in Algorithm 9.6.
Let us illustrate the idea with an example. Consider Fig. 9.5. The polygon is specified
with the four vertices A, B, C, and D (note that we are following an anti-clockwise vertex
naming convention). Therefore, the edges are AB, BC, CD, and DA. Let us execute the steps
of the algorithm.

Algorithm 9.6 Scan line polygon fill algorithm

1: Input: Set of vertices of the polygon


2: Output: Interior pixels with specified color
3: From the vertices, determine the maximum and minimum scan lines (i.e., maximum and minimum y
values) for the polygon.
4: Set scanline = minimum
5: repeat
6: for Each edge (pair of vertices (x1 , y1 ) and (x2 , y2 )) of the polygon do
7: if (y1 ≤ scanline ≤ y2 ) OR (y2 ≤ scanline ≤ y1 ) then
8: Determine edge–scanline intersection point
9: end if
10: end for
11: Sort the intersection points in increasing order of x coordinate
12: Apply specified color to the pixels that are within the intersection points
13: Set scanline = scanline + 1
14: until scanline > maximum

Fig. 9.5 Illustrative example of the scan line polygon fill algorithm (a polygon with vertices
A(5, 1), B(6, 4), C(3, 6), and D(1, 3))

First, we determine the maximum and minimum scanlines (line 3) from the coordinate of
the vertices as: maximum = max{1,4,6,3} (i.e., maximum of the vertex y-coordinates) = 6,
minimum = min{1,4,6,3} (i.e., minimum of the vertex y-coordinates) = 1.
In the first iteration of the outer loop, we first determine the intersection points of the scan
line y = 1 with all the four edges in the inner loop (lines 6–10). For the edge AB, the IF
condition is satisfied and we determine the intersection point as the vertex A (lines 7–8). For
BC and CD, the condition is not satisfied. However, for DA, again the condition is satisfied
and we get the vertex A again. Thus, the two intersection points determined by the algorithm
are the same vertex A. Since the span between these coincident points contains only that single
pixel, we apply the specified color to it (lines 11–12). Then we set scanline = 2 (line 13). Since
2 ≤ maximum = 6, we reenter the outer loop.
In the second iteration of the outer loop, we check for the intersection points between the
edges and the scanline y = 2. For the edge AB, the IF condition is satisfied. So there is an
intersection point, which is (5 1/3, 2). The edges BC and CD do not satisfy the condition, hence

there are no edge–scanline intersections. The condition is satisfied by the edge DA and the
intersection point is (3,2). After sorting (line 11), we have the two intersection points (3,2)
and (5 1/3, 2). The pixels in between them are (3,2), (4,2), and (5,2). We apply the specified
color to these pixels (line 12) and set scanline = 3 (line 13). Since 3 ≤ maximum = 6, we reenter the
outer loop.
The algorithm works in a similar way for the remaining scanlines y = 3, y = 4,
y = 5, and y = 6 (the execution is left as an exercise for the reader). There are two
things in the algorithm that require some elaboration. First, how do we determine the edge–
scanline intersection point? Second, how do we determine pixels within two intersection
points?
We can use a simple method to determine the edge–scanline intersection point. First,
from the vertices, determine the line equation for an edge. For example, for the edge AB in
Fig. 9.5, we compute m = (4 − 1)/(6 − 5) = 3. Thus, the line equation is y = 3x + b. Now, evaluating
the equation at the end point A (i.e., x = 5, y = 1), we get b = 1 − 3 × 5 = −14. Therefore
the equation for AB is y = 3x − 14. Now, to determine the edge–scanline intersection point,
simply replace the scanline (y) value in the equation and compute the x-coordinate. Thus, to
get the x-coordinate of the intersection point between the scanline y = 2 and AB, we evaluate
2 = 3x − 14, or x = 5 1/3.
Given two intersection points (x1 , y1 ) and (x2 , y2 ) where x1 < x2 , determination of the
pixels between them is easy. Increment x1 by one to get the next pixel and continue till
the current x value is less than x2 . If either or both the intersection points are pixels them-
selves, they are also included. As an illustration, consider the two intersection points (3,2)
and (5 1/3, 2) of the polygon edges (AB and DA, respectively) with the scanline y = 2 in
Fig. 9.5. Here x1 = 3, x2 = 5 1/3. The first pixel is the intersection point (3,2) itself. The next
pixel is (3 + 1, 2) or (4,2). We continue to get the next pixel as (4 + 1, 2) or (5,2). Since
5 + 1 = 6 > x2 = 5 1/3, we stop.
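
A rough Python sketch of the whole procedure, applied to the convex polygon of Fig. 9.5, is given below; the function name is an assumption and the handling of horizontal edges and shared vertices is deliberately simplified.

import math

# A sketch of Algorithm 9.6 for the polygon of Fig. 9.5, vertices A(5,1), B(6,4),
# C(3,6), D(1,3). Assumes a convex polygon, as discussed in the text.
def scanline_fill(vertices):
    filled = []
    ys = [y for _, y in vertices]
    for scanline in range(min(ys), max(ys) + 1):
        xs = []
        for i in range(len(vertices)):
            (x1, y1) = vertices[i]
            (x2, y2) = vertices[(i + 1) % len(vertices)]
            if (y1 <= scanline <= y2 or y2 <= scanline <= y1) and y1 != y2:
                # x-coordinate of the edge-scanline intersection point
                xs.append(x1 + (scanline - y1) * (x2 - x1) / (y2 - y1))
        if xs:
            left, right = min(xs), max(xs)      # convex polygon: one span per scanline
            for x in range(math.ceil(left), math.floor(right) + 1):
                filled.append((x, scanline))
    return filled

print(scanline_fill([(5, 1), (6, 4), (3, 6), (1, 3)]))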
An important point to note here is that Algorithm 9.6 works for convex polygons only.
For concave polygons, an additional problem needs to be solved. As we discussed before,
we determine pixels between the pair of edge–scanline intersection points. However, all
these pixels may not be inside the polygon in case of a concave polygon, as illustrated in
Fig. 9.6. Therefore, in addition to determining pixels, we also need to determine which
pixels are inside.
In order to make Algorithm 9.6 work for concave polygons, we have to perform an inside–
outside test for each pixel between a pair of edge–scanline intersection points (an additional
overhead). The following are the steps for a simple inside–outside test for a pixel p.
1. Determine the bounding box (maximum and minimum x and y coordinates) for the
polygon.
2. Choose an arbitrary pixel po outside the bounding box (This is easy. Simply choose a
point whose x and y coordinates are outside the minimum and maximum range of the
polygon coordinates).
3. Create a line by joining p and po (i.e., determine the line equation).
4. If the line intersects the polygon edges an even number of times, p is an outside pixel.
Otherwise, p is inside the polygon.
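
A small sketch of this test follows. It uses a common variant in which the test line is a horizontal ray cast from p towards increasing x (rather than a line joining p to an explicit point outside the bounding box); the concave polygon used for the demonstration is an assumed shape, not the one in Fig. 9.6.

# A minimal sketch of the even-odd inside-outside test: count how many polygon
# edges a horizontal ray from p crosses; odd means inside, even means outside.
def is_inside(p, vertices):
    px, py = p
    crossings = 0
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        # does this edge straddle the ray's y value, with the crossing to the right of p?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                crossings += 1
    return crossings % 2 == 1          # odd number of crossings => p is inside

# an assumed concave polygon: a square with a notch cut into its top edge
polygon = [(0, 0), (6, 0), (6, 6), (3, 3), (0, 6)]
print(is_inside((1, 1), polygon), is_inside((3, 5), polygon))   # True False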

Fig. 9.6 The problem with concave polygons—two pixels (3,2) and (4,2), which are inside
the pair of intersection points B and C, are not inside the polygon

9.4 CHARACTER RENDERING


An important issue in scan conversion is the way alphanumeric and non-alphanumeric char-
acters are rendered. These are the building blocks of any textual content displayed on the
screen. Efficient rendering of characters is necessary since we often need to display a large
amount of text in a short time span. For example, consider scrolling up/down a text docu-
ment. With each scroll action, a whole new set of characters needs to be displayed on the
screen quickly.
When we deal with characters, the term font or typeface is used to denote the overall
design style of the characters. For example, we have fonts such as Times New Roman,
Courier, Arial, and so on. Each of these fonts can be rendered with varying appearance such
as bold, italic, or both bold and italic. The size of a character on screen is denoted by point
(e.g., 10-point, 12-point), which is a measure of the character height in inches. Although the
term is borrowed from typography (like the other terms), we do not use the original point
measure as used in typography. Instead, we use the definition that ‘a point equals 1/72 of an
inch, or ≈ 0.0139 inch’. This is also known as the DTP (desktop publishing) or PostScript
point.
There are broadly two ways to render characters—bitmapped fonts and outlined fonts. In
bitmapped font, a pixel grid definition is maintained for each character. In the grid, those
pixels that are part of the character are marked as on pixels and the others are marked as
off pixels. The idea is illustrated in Fig. 9.7. In contrast, a character is defined in terms of
geometric primitives such as points and lines in the outlined definition of font. Before ren-
dering, the characters are scan converted (i.e., pixels are determined using scan conversion
methods such as those we discussed for points, lines, and circles). The idea is illustrated in
Fig. 9.8.
Clearly, bitmapped fonts are simple to define and fast to render (since we do not need
to compute pixels). However, they require large storage and do not look good after resizing
or reshaping. Also, the size of the bitmapped font depends on screen resolution. For exam-
ple, a 12-pixel high bitmap will produce a 12-point character in a 72 pixels/inch resolution.
However, the same bitmap will produce a 9-point character in a 96 pixels/inch resolution.
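
The relation between bitmap height, resolution, and point size used in this example can be captured in a small helper (the function name is an illustrative assumption, not from the text): height in inches = pixels / (pixels per inch), and one point = 1/72 inch.

# A small illustrative helper relating a bitmapped character's pixel height to
# its DTP point size at a given screen resolution.
def bitmap_points(height_in_pixels, pixels_per_inch):
    return height_in_pixels / pixels_per_inch * 72

print(bitmap_points(12, 72))   # 12.0 points on a 72 pixels/inch display
print(bitmap_points(12, 96))   # 9.0 points on a 96 pixels/inch display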

Fig. 9.7 Bitmap font definition for the character B on an 8 × 8 pixel grid (pixels that are part
of the character are marked as on pixels; the rest are off pixels)

Fig. 9.8 The same character B of Fig. 9.7 defined in terms of vertices and edges, as an outline
definition. The intermediate line pixels are computed using the scan conversion procedure during
rendering.

Outline fonts, on the other hand, require less storage and we can perform geometric trans-
formations with satisfactory effect to reshape or resize such fonts. Moreover, they are not
resolution-dependent. However, rendering is slower, since the scan conversion procedures have to
be performed before the characters can be displayed.

9.5 ANTI-ALIASING
Let us consider Fig. 9.9. This is basically a modified form of Fig. 9.1, where we have
seen the pixels computed to render the line. As shown in Fig. 9.9, the scan converted line
(shown as a continuous line) does not look exactly like the original (shown as a dotted
line). Instead, we see a stair-step like pattern, often called the jaggies. This implies that,
after scan conversion, some distortion may occur in the original shape. Such distortions are
called aliasing (we shall discuss in the next section why it is called so). Some additional
operations are performed to remove such distortions, which are known as anti-aliasing
techniques.


Fig. 9.9 Problem of aliasing—note the difference between the actual line (dotted) and the
scan converted line

9.5.1 Aliasing and Signal Processing


Aliasing is usually explained in terms of concepts borrowed from the field of signal pro-
cessing. We know that in computer graphics, we are concerned about synthesizing images.
Think of it as a problem of rendering a true image (on window in view coordinate system)
to device. The image is defined in terms of intensity values, which can be any real num-
ber. Therefore, the intensity of a true image represents a distribution of continuous values.
In other words, a true image can be viewed as a continuous signal. The rendering process
then can be viewed as a two stage process: sampling of the continuous signal (i.e., comput-
ing pixel intensities) and then reconstructing the original signal as the set of colored pixels
on the display. When we reconstruct an original signal, the reconstructed signal is clearly
a false representation of the original. In English, when a person uses a false name, that is
known as an alias, and so it was adapted in signal analysis to apply to falsely represented
signals. As we have already seen, aliasing usually results in visually distracting artifacts.
Additional efforts go into trying to reduce or eliminate its effect, through techniques that we
call anti-aliasing.
A continuous intensity signal can be viewed as a composition of various frequency
components (i.e., primary signal of varied frequencies). The uniform regions of constant
intensity values correspond to the low frequency components, whereas values that change
abruptly and correspond to sharp edges are at the high end of the frequency spectrum.
Clearly, such abrupt changes in the intensity signal result in aliasing effects, which we need
to smoothen out. In other words, we need to remove (filter) high frequency components from
the (reconstructed) intensity signal. Consequently, we have the following two broad groups
of anti-aliasing techniques.

Pre-filtering It works on the true signal in the continuous space to derive proper val-
ues for individual pixels. In other words, it is filtering before sampling. There are various
pre-filtering techniques, often known as area sampling.

Post-filtering In post-filtering, we try to filter high frequency components of the signal
from the sampled data (i.e., modify computed pixel values). In other words, it

is filtering after sampling. The post-filtering techniques are often known as super
sampling.
Let us now get some idea about the working of these two types of filtering techniques.

9.5.2 Pre-filtering or Area Sampling


In pre-filtering techniques, we assume that a pixel has an area (usually square or circular
with unit radius), rather than being a dimensionless point. Lines passing through those pix-
els have some finite width. In other words, each line has some area. In order to compute pixel
intensity, we first determine the percentage p of pixel area occupied by the line. Assume the
original line color (either preset or computed from the stages of the graphics pipeline) is cl
and the background color is cb. Then, we set the pixel intensity I = p · cl + (1 − p) · cb. The idea
is illustrated in Fig. 9.10.
Fig. 9.10 Illustration of the area sampling technique. Each square represents a pixel area.
Depending on the area overlap of the (finite-width) line with the pixels, pixel intensities are set;
for example, pixel (0, 1) has an area overlap of 50% with the line.
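
A minimal sketch of this blending rule, applied per RGB component, is shown below; the colour values used in the call are assumptions made only for illustration.

# Unweighted area sampling: I = p*cl + (1 - p)*cb for each colour component,
# where p is the fraction of the pixel area covered by the line.
def area_sample(overlap_fraction, line_color, background_color):
    return tuple(overlap_fraction * cl + (1 - overlap_fraction) * cb
                 for cl, cb in zip(line_color, background_color))

# pixel (0, 1): half covered by a red line on a white background (assumed colours)
print(area_sample(0.5, (1.0, 0.0, 0.0), (1.0, 1.0, 1.0)))   # (1.0, 0.5, 0.5)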

9.5.3 Gupta–Sproull Algorithm


The Gupta–Sproull algorithm is a pre-filtering technique for anti-aliased line drawing. In
this algorithm, the intensity of a pixel is set based on the distance of a line center from the
pixel center. The algorithm is based on the midpoint line drawing algorithm. Let us first
understand this algorithm.
Consider Fig. 9.11. Let us assume that we have just determined the pixel (xk , yk ). There are
two candidates for the next pixel: E(xk + 1, yk ) and NE (xk + 1, yk + 1). This is similar to the
Bresenham’s algorithm. However, instead of the decision parameter based on the distance of
the line from the candidate pixels, here we consider the midpoint M(xk + 1, yk + 1/2) between
the candidate pixels. We know that a line can be represented as F(x, y) = ax + by + c,
where a, b, and c are integer constants. We can restate the same as F(x, y) = 2(ax + by + c)
without changing the nature of the equation. We require this mathematical manipulation to
avoid some floating point operations. We set our decision variable as,

dk = F(M)
   = F(xk + 1, yk + 1/2)
   = 2(a(xk + 1) + b(yk + 1/2) + c)


Fig. 9.11 Midpoint line drawing algorithm. The decision variable is based on the midpoint
M(xk + 1, yk + 1/2) between the upper candidate pixel NE(xk + 1, yk + 1) and the lower candidate
pixel E(xk + 1, yk).

If d > 0, the midpoint is below the line. Thus, the pixel NE is closer to the line and we
choose it. In such a case, the next decision variable is,
 
dk+1 = F((xk + 1) + 1, (yk + 1) + 1/2)
     = 2(a((xk + 1) + 1) + b((yk + 1) + 1/2) + c)
     = 2(a(xk + 1) + b(yk + 1/2) + c) + 2(a + b)
     = dk + 2(a + b)

Similarly, if dk ≤ 0, we choose the next pixel as E. In that case, we have


  
dk+1 = F((xk + 1) + 1, yk + 1/2)
Expanding and rearranging as before, we get dk+1 = dk + 2a (check for your-
self). The initial decision variable is defined as d0 = F(x0 + 1, y0 + 1/2). Expanding,
we get
   
d0 = 2(a(x0 + 1) + b(y0 + 1/2) + c)
   = 2(ax0 + by0 + c) + 2a + b
   = F(x0, y0) + 2a + b
   = 2a + b, since F(x0, y0) = 0.

Algorithm 9.7 shows the pseudocode of this algorithm.


In the Gupta–Sproull algorithm, the basic midpoint algorithm is modified a little. Con-
sider Fig. 9.12. Suppose the current pixel is (xk , yk ). Based on the midpoint, we have chosen
pixel E in the next step. D is the perpendicular distance of E from the line. Using analytical


Algorithm 9.7 Midpoint line drawing algorithm

1: Input: The two line end points (x1 , y1 ) and (x2 , y2 )


2: Output: Set of pixels P to render the line segment
3: Determine the line constants a, b, and c from the end points
4: Determine the initial decision value d = 2a + b
5: Set x = x1 , y = y1
6: Add (x1 , y1 ) to P
7: repeat
8: if d > 0 then
9: Set x = x + 1, y = y + 1
10: Set d = d + 2(a + b)
11: else
12: Set x = x + 1
13: Set d = d + 2a
14: end if
15: Add (x, y) to P
16: until x ≥ x2 − 1
17: Add (x2 , y2 ) to P
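
A Python sketch of Algorithm 9.7 for slopes between 0 and 1 is given below; the constants follow a = △y, b = −△x, and the loop runs while x < x2 − 1 (the same bound as line 16). The function name is an assumption for illustration.

# A sketch of the midpoint line drawing algorithm using the doubled implicit
# form F(x, y) = 2(ax + by + c), valid for slopes 0 <= m <= 1.
def midpoint_line(x1, y1, x2, y2):
    a, b = y2 - y1, -(x2 - x1)        # line constants of ax + by + c = 0
    d = 2 * a + b                     # initial decision value d0 = 2a + b
    x, y = x1, y1
    pixels = [(x, y)]
    while x < x2 - 1:
        if d > 0:                     # midpoint below the line: choose NE
            x, y = x + 1, y + 1
            d += 2 * (a + b)
        else:                         # midpoint on/above the line: choose E
            x = x + 1
            d += 2 * a
        pixels.append((x, y))
    pixels.append((x2, y2))
    return pixels

print(midpoint_line(1, 1, 4, 3))      # [(1, 1), (2, 2), (3, 2), (4, 3)] as in Example 9.3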

Fig. 9.12 Illustration of distance calculation in the Gupta–Sproull algorithm: D is the
perpendicular distance of the chosen pixel E(xk + 1, yk) from the line, Dupper and Dlower are the
distances of its vertical neighbours, and v, 1 − v, and 1 + v are the corresponding vertical
separations from the line

geometry, we can determine D as (left as an exercise for the reader),

D = (d + △x) / (2√(△x² + △y²))    (9.3)

In the expression, d is the midpoint decision variable and △x and △y are the differences
in x and y coordinate values of the line endpoints, respectively. Note that the denominator is
a constant.
The intensity of E will be a fraction of the original line color. The fraction is determined
based on D. This is unlike Algorithm 9.7 where the line color is simply assigned to E. In
order to determine the fraction, a cone filter function is used. In other words, the greater the
distance of the line from the chosen pixel center, the lower the intensity. The function


is implemented in the form of a table. In the table, each entry represents the fraction with
respect to a given D.
In order to increase the line smoothness, the intensity of the two vertical neighbours of E,
namely the points (xk + 1, yk + 1) and (xk + 1, yk − 1) are also set in a similar way according
to their distances Dupper and Dlower respectively from the line. We can analytically derive the
two distances as (derivation is left as an exercise),

Dupper = 2(1 − v)△x / (2√(△x² + △y²))    (9.4a)
Dlower = 2(1 + v)△x / (2√(△x² + △y²))    (9.4b)

If instead of E, we have chosen the NE pixel, the corresponding expressions would be


(derivation is left as an exercise for the reader),

D = (d − △x) / (2√(△x² + △y²))    (9.5a)
Dupper = 2(1 − v)△x / (2√(△x² + △y²))    (9.5b)
Dlower = 2(1 + v)△x / (2√(△x² + △y²))    (9.5c)

Note that in Eq. 9.5(b), Dupper is the perpendicular distance of the pixel (xk + 1, yk + 2)
and Dlower denotes the distance of the pixel E (xk + 1, yk ) from the line.
Thus in the Gupta–Sproull algorithm (Algorithm 9.8), we perform the following addi-
tional steps in each iteration of the midpoint line drawing algorithm.

Algorithm 9.8 Gupta–Sproull algorithm

1: if The lower candidate pixel E is chosen then


2: Compute DE = (d + △x) / (2√(△x² + △y²))
3: else
4: Compute DNE = (d − △x) / (2√(△x² + △y²))
5: end if
6: Update d as in the regular midpoint algorithm
7: Set intensity of the current pixel (E or NE) according to D, determined from the table (the higher the
value, the lower the intensity)
8: Compute Dupper = 2(1 − v)△x / (2√(△x² + △y²))
9: Compute Dlower = 2(1 + v)△x / (2√(△x² + △y²))
10: Set the intensity of the two vertical neighbours of the current pixel according to Dupper and Dlower,
determined from the table (the higher the value, the lower the intensity)
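
The following sketch puts these steps together for one chosen pixel; the values in the call are those of Example 9.3 below, and the linear fall-off used in place of the cone filter table is an assumption made only for illustration.

import math

# Distances of the chosen pixel and its two vertical neighbours from the line,
# following the expressions of Eqs 9.3-9.5; the filter table is replaced by an
# assumed linear fall-off.
def gupta_sproull_distances(d, v, dx, dy, chose_upper):
    denom = 2 * math.sqrt(dx * dx + dy * dy)
    D = (d - dx) / denom if chose_upper else (d + dx) / denom   # Eq. 9.5(a) or 9.3
    D_upper = 2 * (1 - v) * dx / denom                          # Eq. 9.4(a) / 9.5(b)
    D_lower = 2 * (1 + v) * dx / denom                          # Eq. 9.4(b) / 9.5(c)
    return D, D_upper, D_lower

def filter_fraction(distance):
    return max(0.0, 1.0 - abs(distance))    # assumed stand-in for the table lookup

# first step of Example 9.3: NE(2, 2) chosen, d = 1, v = 2/3, dx = 3, dy = 2
distances = gupta_sproull_distances(d=1, v=2/3, dx=3, dy=2, chose_upper=True)
print([round(x, 3) for x in distances])                  # approx [-0.277, 0.277, 1.387]
print([round(filter_fraction(x), 3) for x in distances]) # fractions of the line colour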


Example 9.3
Let us understand the working of the Gupta–Sproull algorithm in terms of an example. Consider
the line segment shown in Fig. 9.13 between the two end points A(1,1) and B(4,3). Our objective
is to determine the following two things.
1. The pixels that should be colored to render the line.
2. The intensity values to be applied to the chosen pixels (and its vertical neighbours) to reduce
aliasing effect.

Fig. 9.13 Gupta–Sproull algorithm example: the line between A(1, 1) and B(4, 3), with the
chosen intermediate pixels NE(2, 2) and E′(3, 2)

Let us first determine the pixels to be chosen to render the line following the midpoint
algorithm (Algorithm 9.7). From the line end points, we can derive the line equation as 2x −
3y + 1 = 0 (see Appendix A for derivation of line equation from two end points). Thus, we have
a = 2, b = −3, and c = 1. Hence the initial decision value is: d = 1 (lines 3–4, Algorithm 9.7).
In the first iteration of the algorithm, we need to choose between the two pixels: the upper
candidate pixel NE (2,2) and the lower candidate pixel E (2,1) (see Fig. 9.11). Since d > 0, we
choose the NE pixel (2,2) and reset d = −1 (lines 8–10, Algorithm 9.7). In the next iteration,
the two possibilities are: the upper candidate pixel NE′ (3,3) and the lower candidate pixel E′
(3,2). Since now d < 0, we choose E′ (3,2) as the next pixel to render and reset d = 3 (lines
11–13, Algorithm 9.7). However, since now x = 3 = x2 − 1, the loop terminates (line 16,
Algorithm 9.7). The algorithm stops and returns the set of pixels {(1,1), (2,2), (3,2), (4,3)}. These
are the pixels to be rendered.
Next, we determine the intensity values for the chosen pixels and their two vertical neighbours
according to the Gupta–Sproull algorithm (Algorithm 9.8). We know that △x = 4 − 1 = 3 and
△y = 3 − 1 = 2. Let us start with the first intermediate pixel. Note that the first intermediate pixel
chosen is the upper candidate pixel NE (2,2). For this pixel, we have d = 1. Therefore, we compute
the perpendicular distance D from the line as,
DNE = −1/√13 (line 4, Algorithm 9.8)
Next, we have to compute the distances Dupper and Dlower of the vertical neighbours of the chosen
pixel. The line equation is 2x − 3y + 1 = 0. At the chosen pixel position, x = 2. Putting this
value in the line equation, we get y = 5/3. Therefore, v = 5/3 − 1 = 2/3 (see Fig. 9.12). Hence,


Dupper = 1/√13 (line 8, Algorithm 9.8)
Dlower = 5/√13 (line 9, Algorithm 9.8)
We perform a table lookup to determine the fraction of the original line color to be applied to
the three pixels based on the three computed distances.
The next chosen point is the lower candidate pixel E′ (3,2). For this pixel, we have d = −1.
Hence, the perpendicular distance of the pixel from the line is computed as,
DE = 1/√13
As in the previous case, we compute Dupper and Dlower for the two vertical neighbours of E′ .
For E′, x = 3. We put this value in the line equation 2x − 3y + 1 = 0 to obtain y = 7/3. Therefore,
v = 7/3 − 2 = 1/3 (see Fig. 9.12). Hence,
Dupper = 2/√13
Dlower = 4/√13
As before, we perform the table lookup to determine the fraction of the line color to be assigned
to these three pixels based on the three distances.

9.5.4 Super Sampling


In super sampling, each pixel is assumed to consist of a grid of sub-pixels (i.e., we effec-
tively increase the display resolution). In order to draw an anti-aliased line, we count the
number of sub-pixels through which the line passes in each pixel. This number is then used
to determine pixel intensity. The idea is illustrated in Fig. 9.14.
Another approach is to use a finite line width. In that case, we determine which sub-pixels
are inside the line (a simple check for this is to consider sub-pixels, whose lower left corners
are inside the line, to be inside). Then, pixel intensity is determined as the weighted average
of the sub-pixel intensities, where the weights are the fraction of sub-pixels that are inside or

Fig. 9.14 The idea of super sampling with a 2 × 2 sub-pixel grid for each pixel. The pixel
intensity is determined based on the number of sub-pixels through which the line passes. For
example, in the pixel (0,0), the line passes through 3 sub-pixels. However, in (1,0), only one
sub-pixel is part of the line. Thus, the intensity of (0,0) will be more than that of (1,0).


Fig. 9.15 The idea of super sampling for lines with finite width: in pixel (1, 1), three of the
four sub-pixels are inside the line

outside of the line. For example, consider Fig. 9.15 where each pixel is divided into a 2 × 2
sub-pixel grid.
Assume that the original line color is red (R = 1, G = 0, B = 0) and background is
light yellow (R = 0.5, G = 0.5, B = 0). Note that three sub-pixels (top right, bottom left,
and bottom right) are inside the line in pixel (1,1). Therefore, the fraction of sub-pixels that
are inside is 3/4 and the outside sub-pixel fraction is 1/4. The weighted average for individual
intensity components (i.e., R, G, and B) for the pixel (1,1) therefore are,

AverageR = 1 × 3/4 + 0.5 × 1/4 = 7/8
AverageG = 0 × 3/4 + 0.5 × 1/4 = 1/8
AverageB = 0 × 3/4 + 0 × 1/4 = 0

Thus, the intensity of (1,1) will be set as (R = 7/8, G = 1/8, B = 0).
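
A short sketch of this weighted-average computation, using the colours of the example, is given below; the function name is an illustrative assumption.

# Super sampling with sub-pixel counts: the fraction of covered sub-pixels
# weights the line colour, the remainder weights the background colour.
def supersample(inside_subpixels, total_subpixels, line_color, background_color):
    p = inside_subpixels / total_subpixels
    return tuple(p * cl + (1 - p) * cb
                 for cl, cb in zip(line_color, background_color))

# pixel (1,1) of Fig. 9.15: 3 of 4 sub-pixels inside a red line on a light-yellow background
print(supersample(3, 4, (1.0, 0.0, 0.0), (0.5, 0.5, 0.0)))   # (0.875, 0.125, 0.0)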


Sometimes, we use weighting masks to control the amount of contribution of var-
ious sub-pixels to the overall intensity of the pixel. The size of the mask depends
on the sub-pixel grid size. For example, for a 3 × 3 sub-pixel grid, we shall have a
3 × 3 mask. How do we determine pixel intensity from a given mask? Let us consider
an example.
Assume we have the following 3 × 3 mask.
 
1 2 1
2 4 2
1 2 1

Note that the intensity contribution of a sub-pixel is its corresponding mask value divided
by 16 (the sum of all the values). For example, the contribution of the center sub-pixel is 4/16.
Now suppose a line passes through (or encloses) the sub-pixels top, center, bottom left, and
bottom of a pixel (x, y). Thus, if the line intensity is cl (rl , gl , bl ) and the background color


is cb (rb , gb , bb ), then the pixel intensity can be computed as: Intensity = (total contribution
of sub-pixels) × line color + (1 − total contribution of sub-pixels) × background color for
each of the R, G, and B color components as we have done before.
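
A sketch of this mask-based computation is shown below; the coverage pattern follows the example in the text (top, centre, bottom-left, and bottom sub-pixels covered), while the colour values and function name are assumptions for illustration.

# Weighted super sampling with the 3x3 mask above; covered[i][j] is True where
# the line passes through (or encloses) that sub-pixel.
MASK = [[1, 2, 1],
        [2, 4, 2],
        [1, 2, 1]]

def masked_intensity(covered, line_color, background_color):
    total = sum(sum(row) for row in MASK)                     # 16 for this mask
    weight = sum(MASK[i][j] for i in range(3) for j in range(3) if covered[i][j])
    p = weight / total                                        # total sub-pixel contribution
    return tuple(p * cl + (1 - p) * cb
                 for cl, cb in zip(line_color, background_color))

covered = [[False, True, False],   # top sub-pixel
           [False, True, False],   # centre sub-pixel
           [True,  True, False]]   # bottom-left and bottom sub-pixels
print(masked_intensity(covered, (1.0, 0.0, 0.0), (0.5, 0.5, 0.0)))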

SUMMARY
In this chapter, we learnt about the last stage of the 3D graphics pipeline, namely the rendering
of objects on the screen (also known as scan conversion or rasterization). We discussed ren-
dering for geometric primitives such as lines and circles. In line rendering, we started with the
simple and intuitive algorithm and saw its inefficiency in terms of the floating point operations
it requires. Only some, and not all, of these operations can be eliminated in the DDA algorithm,
which thus offers only a modest improvement. Bresenham’s algorithm is the most efficient as it renders a
line using integer operations only. The midpoint circle rendering algorithm similarly increases the
efficiency by performing mostly integer operations. However, unlike the Bresenham’s line drawing,
some floating point operations are still required in midpoint circle drawing.
An issue in interactive graphics is to render a fill area (i.e. an enclosed region). We discussed
the two ways to define a fill area, namely the pixel-level definition and geometric definition.
Depending on the definition, we discussed various fill area rendering algorithms such as seed
fill, flood fill, and scanline polygon fill. The first two rely on pixel-level definitions while the third
algorithm assumes a geometric definition of fill area.
A frequent activity in computer graphics is to display characters. We discussed both the
bitmap and outlined character rendering techniques along with their pros and cons.
Finally, the problem of distortion in original shapes (known as aliasing) that arises during
the rendering process is discussed, along with the various techniques (called anti-aliasing) to
overcome it. Following an explanation of the origin of the term aliasing with the signal pro-
cessing concepts, we mentioned and briefly discussed the two broad groups of anti-aliasing
techniques: pre-filtering and post-filtering. The pre-filtering or area sampling is discussed includ-
ing the Gupta–Sproull algorithm. We also learnt about the idea of various post-filtering or super
sampling techniques with illustrative examples.

BIBLIOGRAPHIC NOTE
Bresenham [1965] and Bresenham [1977] contain the original idea on the Bresenham’s
algorithm. More on the midpoint methods can be found in Kappel [1985]. Fill area scan con-
version techniques are discussed in Fishkin and Barsky [1984]. Crow [1981], Turkowski [1982],
Fujimoto and Iwata [1983], Korien and Badler [1983], Kirk and Arvo [1991], and Wu [1991]
can be referenced for further study on anti-aliasing techniques. The Graphics Gems book
series (Glassner [1990], Arvo [1991], Kirk [1992], Heckbert [1994], and Paeth [1995]) contains
additional discussion on all these topics.

KEY TERMS
Aliasing – the distortions that may occur to an object due to scan conversion
Anti-aliasing – techniques to eliminate/reduce the aliasing effects
Bitmapped fonts – a character representation scheme in which each character is represented in
terms of on and off pixels in a pixel grid
Boundary defined – defining a fill area in terms of its boundary pixels


Bresenham’s algorithm – a more efficient line scan conversion algorithm that works based on
integer operations only
DDA algorithm – a line scan conversion algorithm
Decision parameter – a parameter used in the Bresenham’s algorithm
Eight-connected – an interior pixel is connected to eight of its neighbours
Flood fill algorithm – a fill area scan conversion algorithm that works with the interior defined
regions
Font/Typeface – overall design style of a character
Four-connected – an interior pixel is connected to four of its neighbours
Geometric definition – defining a fill area in terms of geometric primitives such as edges and
vertices
Gupta-Sproull algorithm – a pre-filtering anti-aliasing technique for lines
Interior defined – defining a fill area in terms of its interior pixels
Midpoint algorithm – an algorithm for circle scan conversion
Outlined font – a character representation scheme in which each character is represented in
terms of some geometric primitives such as points and lines
Pixel level definition – defining a fill area in terms of constituent pixels
Point (of font) – size of a character
Post-filtering/Super sampling – anti-aliasing techniques that work on the pixels to modify their
intensities
Pre-filtering/Area sampling – anti-aliasing techniques that work on the actual signal and derive
appropriate pixel intensities
Scan conversion/Rasterization/Rendering – the process of mapping points from continuous
device space to discrete pixel grid
Scan line polygon fill algorithm – a fill area scan conversion algorithm that works with geometric
definition of fill regions
Seed – an interior pixel inside a fill area
Seed fill algorithm – a fill area scan conversion algorithm that works with the boundary defined
regions
Sub pixel – a unit of (conceptual) division of a pixel for super sampling techniques

EXERCISES
9.1 Discuss the role played by the rendering techniques in the context of the 3D graphics
pipeline.
9.2 Derive the incremental approach of the Bresenham’s line drawing algorithm. Algorithm 9.2
works for lines with slope 0 ≤ m ≤ 1 or −1 ≤ m ≤ 0. Modify the algorithm for slopes that
are outside this range.
9.3 The midpoint line drawing algorithm is shown in Algorithm 9.7. Is there any difference
between Algorithms 9.2 and 9.7? Discuss with respect to a suitable example.
9.4 Derive the incremental computation on which the midpoint circle algorithm is based. Explain
the importance of the eight-way symmetry in circle drawing algorithms.
9.5 Algorithm 9.3 works for circles having the origin as center. Modify the algorithm so as to
make it work for circles with any arbitrary center.
9.6 Explain the different definitions of an enclosed region with illustrative examples. Calculate
the pixels for the scanlines y = 3, y = 4, y = 5, and y = 6 in the example mentioned in
Section 9.3.3.
9.7 Algorithm 9.6 works for convex polygons only. Modify the algorithm, by incorporating the
steps for the simple inside-outside test, so that the algorithm works for concave polygons
also.


9.8 Discuss the advantanges and disadvantages of the bitmapped and the outlined font ren-
dering methods. Suppose we have a 20′′ × 10′′ display with resolution 720 × 360. What
would be the bitmap size (in pixels) to produce a 12-point font on this display?
9.9 Explain the term aliasing. Why it is called so? How is the concept of filtering related to
anti-aliasing?
9.10 Discuss the basic idea of area sampling with an illustrative example.
9.11 Derive the expressions of Eqs. 9.3, 9.4, and 9.5 using analytical geometry. Modify
Algorithm 9.7 to include the Gupta–Sproull anti-aliasing algorithm.
9.12 Explain the basic idea of super sampling. Discuss, with illustrative examples other than the
ones mentioned in the text, the three super sampling techniques we learnt, namely (a) super
sampling for lines without width, (b) super sampling for lines with finite width, and (c) super
sampling with weighting masks. Do you think the use of masks offers any advantage over
the non-mask-based methods? Discuss.

CHAPTER 10
Graphics Hardware and Software
Learning Objectives
After going through this chapter, the students will be able to
• Review the generic architecture of a graphics system
• Get an overview of the input and output devices of a graphics system
• Understand the basics of the flat panel displays including the plasma panels, thin-
film electroluminescent displays, light-emitting diode (LED) displays, and liquid crystal
displays (LCDs)
• Get an overview of the common hardcopy output devices, namely printers and plotters
• Know about the widely used input devices including keyboards, mouse, trackballs,
spaceballs, joysticks, data gloves, and touch screen devices
• Learn the fundamentals of the graphics processing unit (GPU)
• Get an overview of shaders and shader programming
• Know about graphics software and software standards
• Learn the basics of OpenGL, a widely used open source graphics library

INTRODUCTION
We are now in a position to understand the fundamental process involved in depicting an
image on a computer screen. In very simple terms, the process is as follows: we start with
the abstract representation of the objects in the image, using points (vertices), lines (edges),
and other such geometric primitives in the 3D world coordinate; the pipeline stages are
then applied to convert the abstract representation to a sequence of bits (i.e., a sequence of
0’s and 1’s); the sequence is stored in the frame buffer and used by the video controller to
activate appropriate pixels on the screen, so that we perceive the image. So far, we have
discussed only the theoretical aspects of this process; how it works conceptually without
elaborating on the implementation issues. Although we touched upon the topic of displaying
images on a CRT screen in Chapter 1, it was very brief. In this chapter, we shall learn in more
detail the implementation aspect of the fundamental process. More specifically, we shall
learn about the overall architecture of a graphics system, the technology of few important
display devices, introductory concepts on the graphics processing unit (GPU), and how the
rendering process (the 3D pipeline) is actually implemented on the hardware. The chapter


will also introduce the basics of OpenGL, an open source graphics library widely used to
write computer graphics programs.

10.1 GENERIC ARCHITECTURE


Let us begin by reviewing the generic architecture of a graphics system we learnt in Chap-
ter 1 (Refer to Fig. 1.7). Recall that there are five major hardware components apart from the
computer (which basically represents the system memory, CPU, and the system bus together),
which are specific to any graphics system: the display controller, the video controller, the
frame buffer (part of the video memory), the input devices, and the display screen.
Earlier, we understood the working of the system in terms of broad concepts. However,
after learning the pipeline stages, let us now try to understand the relationship between these
hardware components and the pipeline stages. Assume that we have written a program to
display two objects (a ball and a cube) on the screen. Once the CPU detects that the pro-
cess involves graphics operations, it transfers control to the display controller (thus, freeing
itself up for other activities). The controller contains its own processing unit (GPU). The
pipeline stages (geometric transformations, illumination, projection, clipping, hidden sur-
face removal, and scan conversion operations) are then performed on the object definitions

Fig. 10.1 Different ways to integrate memory in the generic graphics system (a) No separate
graphics memory is present and the system memory is used in shared mode by both the CPU and
the graphics controller (b) The controller has its own dedicated memory (display memory and
frame buffer)


(as specialized instructions executed in the GPUs) to convert them to the sequence of bits.
The bit sequence gets stored in the frame buffer. In the case of interactive systems, the frame
buffer content may be changed depending on the input coming from the input devices such
as a mouse.
The video controller acts based on the frame buffer content (the bit sequence). The job of
the video controller is to map the value represented by the bit(s) in each frame buffer location
to the activation of the corresponding pixel on the display screen. For example, in the case
of CRT devices, such activation refers to the excitation (by an appropriate amount) of the
corresponding phosphor dots on the screen. Note that the amount of excitation is determined
by the intensity of the electron beam, which in turn is determined by the voltage applied on
the electron gun, which in turn is determined by the frame buffer value.
The frame buffer is only a part of the video memory required to perform graphics oper-
ations. Along with the frame buffer, we also require memory to store object definitions and
instructions for graphics operations (i.e., to store code and data as in any other program).
The memory can be integrated in the generic architecture either as shared system mem-
ory (shared by CPU and GPU) or dedicated graphics memory (part of graphics controller
organization). The two possibilities are illustrated in Fig. 10.1. Note that when the memory
is shared, the execution will be slower since the data transmission takes place through the
common system bus.

10.2 INPUT AND OUTPUT OF GRAPHICS SYSTEM


Whenever we talk of a graphics system, the primary output device that comes to our mind
is a video monitor. Sometimes, instead of monitors, outputs are projected using projectors.
As we know, both can be present together in a graphics system. In addition, the system may
have a third mode of output: hardcopy output through printers or plotters. Head-mounted
displays are also another way of displaying output to a viewer.

10.2.1 Video Monitors


Till very recently, the ubiquitous video monitor was the cathode ray tube or CRT monitor.
We have already seen in Chapter 1 the basic idea behind the working of a CRT. Of late, the
flat panels have started replacing the CRTs. Such screens are much thinner and lighter than
CRTs. As a result, they are useful for both non-portable and portable systems. Consequently,
we can see them almost everywhere around us, in desktops, laptops, palmtops, calculators,
advertising boards, pocket video-game console, wrist-watch, and so on.

Flat Panel Displays


Let us have a look at the technology behind flat panel displays. The very first thing we
should remember is that flat panel is a generic term indicating a display monitor having a
(much) reduced volume, weight, and power consumption compared to a CRT. Technology-
wise, they are of two types—emissive displays and non-emissive displays. Emissive displays
(or emitters) are those that convert electrical energy into light on the screen. Prominent
examples of such devices are the plasma panels, thin-film electroluminescent displays, and
light-emitting diodes (LEDs). In the case of non-emissive displays (or non-emitters), no


Fig. 10.2 Schematic illustration of the basic plasma panel design: two glass plates, with
horizontal conductors on the inside of one plate and vertical conductors on the inside of the other,
a gaseous mixture in between, and pixels formed at the intersection regions between the opposite
conducting pairs

electrical-to-optical energy conversion takes place. Instead, such devices convert light (either
natural light or light from some other sources) to a graphics pattern on the screen through
some optical effects. A widely used non-emissive display is the liquid crystal display (LCD).

Plasma Panels
The schematic of a plasma panel is shown in Fig. 10.2. As the figure illustrates, there are two
glass plates placed parallel to each other. The region between the plates is filled with a mixture of
gases (xenon, neon, and helium). The inner walls of each glass plate contain a set of parallel con-
ductors (very thin and shaped like ribbons). One plate has a set of vertical conductors while
the other contains a set of horizontal conductors. The region between each corresponding
pair of conductors on the two plates (e.g., two consecutive horizontal and opposite vertical
conductors) form a pixel. The screen side wall of the pixel is coated with phosphors like
in CRT (three phosphors corresponding to RGB for color displays). With the application
of appropriate firing voltage, the gas in the pixel cell breaks down into electrons and ions.
The ions rush towards the electrodes and collide with the phosphor coating, emitting light.
Separation between pixels is achieved by the electric fields of the conductors.

LED Displays
Thin-film electroluminescent displays are similar in construction to plasma panels; they also
have mutually perpendicular sets of conducting ribbons on the inside walls of two parallel
glass plates. However, instead of gases, the region between the glass plates is filled with a
phosphor, such as zinc sulphide doped with manganese. The phosphor becomes a conductor
at the point of intersection when a sufficiently high voltage is applied to a pair of crossing
electrodes. The manganese atoms absorb the electrical energy and release photons,
generating the perception of a glowing spot or pixel on the screen. Such displays, however,
require more power than plasma panels. Also, good color displays are difficult to achieve
with this technology.
Light emitting diode or LED displays are another type of emissive devices, which are
becoming popular nowadays. In such devices, each pixel position is represented by an LED.
Thus, the whole display is a grid of LEDs corresponding to the pixel grid. Based on the frame
buffer information, suitable voltage is applied to each diode to make it emit appropriate
amount of light.


Liquid Crystal Displays


The most well-known non-emissive display is the liquid crystal display or LCD. In an LCD,
there are two parallel glass plates as before. Each contains a light polarizer that is aligned in
a perpendicular way to the other (i.e., one plate contains the vertical polarizer while the other
contains the horizontal polarizer). Rows of horizontal, transparent conductors are placed on
the (inside) surface of one plate (having the vertical polarizer) while columns of vertical,
transparent conductors are placed on the other. A liquid crystal material is put in between
the plates. The term liquid crystal refers to materials having crystalline molecular arrange-
ment, though they flow like liquids. The material used in flat-panel LCDs typically contains
nematic, or thread-like, crystalline molecules. The rod-shaped molecules tend to align along
Fig. 10.3 The working of a transmissive LCD. A light source at the back sends light through
the polarizer. The polarized light gets twisted and passes through the opposite polarizer to
the viewer in the active pixel state, shown in (a). (b) shows the molecular arrangement of
the liquid crystal at a pixel position, after voltage is applied to the perpendicular pair of
conductors on the opposite glass plates. The arrangement prevents light from passing between
the polarizers, indicating deactivation of the pixel.


their long axes. The intersection points of each pair of mutually perpendicular conductors
define pixel positions. When a pixel position is active, the molecules are aligned as shown
in Fig. 10.3(a). In a reflective display, external light enters through one polarizer and gets
polarized. The molecular arrangement of the liquid crystal ensures that the polarized light
gets twisted so that it can pass through the opposite polarizer. Behind the polarizer, a reflec-
tive surface is present and the light is reflected back to the viewer. In a transmissive display,
a light source is present on the back side of the screen. Light from the source gets polar-
ized after passing through the back side polarizer, twisted by the liquid crystal molecules
and passes through the screen-side polarizer to the viewer. In order to de-activate the pixel,
a voltage is applied to the intersecting pair of conductors. This leads to the molecules in
the pixel region (between the conductors) getting arranged as shown in Fig. 10.3(b). This
new arrangement prevents the polarized light from getting twisted and passing through the
opposite polarizer. The technology described here, both reflective and transmissive, is known as
passive-matrix LCD. Another method for constructing LCDs is to place thin-film transistors
at each pixel location to have more control on the voltage at those locations. The transistors
also help prevent charge from gradually leaking out of the liquid crystal cells. Such types
of LCDs are called active matrix LCDs.

10.2.2 Printers and Plotters


The primary means of hardcopy output, that is the printers, can be of two types—impact and
non-impact printers. In impact printers, pre-formed character faces are present. These are
pressed against an inked ribbon on the paper. An example of impact printers is a line printer,
in which the typefaces are mounted on bands, chains, drums, or wheels. In line printers, the
whole line gets printed at a time. Character printers, on the other hand, print one character
at a time. One widely used character print device, till very recent times, was the dot-matrix
printer. As the name suggests, the print head contained a rectangular array (i.e., matrix) of
protruding wire pins (the dots). The number of pins in the matrix determines the print quality:
the higher the number, the better the quality. Each of these pins can be retracted inwards. During
printing of a character or graphics pattern, some pins are retracted. The remaining pins then
press against the ribbon on the paper, printing the character or pattern.
Non-impact printers and plotters use laser techniques, ink-jet sprays, electrostatic, and
electrothermal methods to get images onto paper. In a laser device, a laser beam is applied
on a rotating drum. The drum is coated with photo-electric material such as selenium. Con-
sequently, a charge distribution is created on the drum. The toner is then applied to the drum,
which gets transferred to the paper. In ink-jet printers, an electrically charged ink stream is
sprayed in horizontal rows across a paper wrapped around a drum. Using electrical fields that
deflect the charged ink stream, dot-matrix patterns of ink are created on the paper. An elec-
trostatic device places a negative charge on the paper (at selected dot positions determined
by the frame buffer values), one row at a time. The paper is then exposed to a positively
charged toner, which gets attracted to the negatively charged areas, producing the desired
output. Another printing technique is the electrothermal method. In it, heat is applied to a
dot-matrix print head (on selected pins). The print head is then used to put patterns on a
heat-sensitive paper.


Fig. 10.4 Flatbed pen plotter device, with a pen mounted on a moving carriage that travels
along a moving arm, and spare pens in different colours

In order to get colored printouts, impact printers use different-colored ribbons. However,
the range of colors produced and the quality are usually limited. Non-impact printers, on
the other hand, are good at producing color images. In such devices, color is produced by
combining the three color pigments (cyan, magenta, and yellow) (see the discussion on
the CMY model in Chapter 5). Laser and electrostatic devices deposit the three pigments
on separate passes; the three colors are shot together on a single pass along each line in
ink-jet printers.
Plotters are another class of hardcopy graphics output devices, which are typically used to
generate drafting layouts and other drawings. In a pen plotter (see Fig. 10.4), one/more pens
are mounted on a carriage, or crossbar, that spans a sheet of paper. The paper can lie flat or
rolled onto a drum or belt and held in place with clamps, a vacuum, or an electrostatic charge.
In order to generate different shading and line styles, pens with varying colors and widths
are used. Pen-holding crossbars can either move or remain stationary. In the latter case, the
pens themselves move back and forth along the bar. Instead of pen, ink-jet technology is also
used to design plotters.

10.2.3 Input Devices


Graphics systems usually provide data input facilities for the user, through which users can
manipulate the screen image. The ubiquitous keyboards and mouse, found with any com-
puter, are two examples of such graphics input devices. There are a variety of other devices
also, some of which are designed specifically for interactive computer graphics systems.
Most popular of these devices are discussed in this section.
Keyboards We are all familiar with the alphanumeric keyboards (physical or virtual) that
come with most graphics systems. The alphanumeric keys in a keyboard are used for text
and command inputs. In addition, keyboards typically contain some function and cursor-
control keys also. With the function keys, frequently performed operations can be selected
with a single key. Cursor-control keys allow the user to set the screen cursor position or
select menu items.


Mouse A mouse is primarily used to position the cursor on the screen, select on-screen
items, and perform a host of menu operations. Wheels or rollers at the bottom of the mouse
record the amount and direction of mouse movement which is converted to screen-cursor
movement. Sometimes, instead of a roller, optical sensing techniques are used to deter-
mine the amount and direction of mouse movement. There are between one and three buttons
present on a mouse, although the two-button mouse is more common. Along with the but-
tons, it may be equipped with a wheel to perform positioning operations more conveniently.
A mouse may be attached to the main computer with wires or it may be a wireless mouse.

Trackballs and spaceballs Similar to a mouse, trackball devices (Fig. 10.5) are used to
position screen-cursors. A trackball device contains a ball. When the ball is rotated by a fin-
ger/palm/hand, a screen-cursor movement takes place. A potentiometer is connected to the
ball and measures the amount and direction of the ball rotation, which is then mapped to the
screen-cursor movement. While the term trackball typically denotes devices used to con-
trol cursor in a 2D-space (screen), cursor-control in 3D-space is done through spaceballs. A
spaceball provides six degrees of freedom. However, in a spaceball, there are no actual ball
movements. Instead, the ball is pushed and pulled in various directions, which is mapped to
cursor positioning in 3D space.

Joysticks Another positioning device is the joystick (Fig. 10.6). It contains a small, ver-
tical lever (called stick) attached to a base. The stick can be moved in various directions to
move the screen-cursor. The amount and direction of cursor movement is determined by the

Fig. 10.5 Trackball device

Fig. 10.6 Illustrative example of a joystick device


amount and direction of stick movement from its center position (measured with a poten-
tiometer mounted at the base). In isometric joysticks, however, no movable sticks are present.
Instead, the sticks are pushed or pulled to move the on-screen cursor.
Data gloves Commonly used in virtual reality systems, data gloves are devices that allow
a user to position and manipulate virtual objects in a more natural way—through hand or
finger gestures. It is a glove-like device, containing sensors. These sensors can detect the
finger and hand movements of the glove-wearer and map the movement to actions such as
grasping a virtual object, moving a virtual object, rotating a virtual object, and so on. The
position and orientation information of the finger or hand are obtained from electromagnetic
coupling of transmitter and receiver antennas. Each of the transmitting and receiving anten-
nas is constructed as a set of mutually perpendicular coils, thereby creating a 3D Cartesian
reference frame.
Touch screens Touch input systems are the preferred mode of input in most consumer-
grade graphics systems nowadays. As the name suggests, on-screen elements are selected
and manipulated through the touch of a finger or a stylus (a special pen-like device) in touch-
screen systems. Touch input can be recorded using electrical, optical, or acoustic methods.
In optical touch screens, an array of infrared LEDs are placed along one vertical and one hor-
izontal edge. Light detectors are placed along the opposite horizontal and vertical edges. If
a position on the screen is touched, lights coming from the vertical and horizontal LEDs get
interrupted and recorded by the detectors, giving the touch location. In an electrical touch
screen, there are two transparent plates separated by a small distance. One plate is coated
with conducting material while the other is coated with resistive material. When the outer
plate is touched, it comes into contact with the inner plate. This creates a voltage drop across
the resistive plate, which is converted to the coordinate value. Less common are acoustic
touch-screen devices, in which high-frequency sound waves are generated in the horizontal and
vertical directions across a glass plate. Touching the screen results in reflecting part of the
waves (from vertical and horizontal directions) back to the emitters. The touch position is
computed from the time interval between transmission of each wave and its reflection back
to the emitter.
Apart from these, many more techniques are used to input data to a graphics system.
These include image scanners (to store drawings or photographs in a computer), digitiz-
ers (used primarily for drawing, painting, or selecting positions), light pens (pencil-shaped
devices used primarily for selecting screen positions), and voice-based input systems (using
speech-recognition technology).

10.3 GPU AND SHADER PROGRAMMING


A characteristic of graphics operations is that they are highly parallel in nature. For exam-
ple, consider the modeling transformation stage of the pipeline. In this stage, we need to
apply transformations (e.g., rotation) to the vertices. A transformation, as we have seen, is
nothing but the multiplication of the transformation matrix with the vertex vector. The same
vector-matrix multiplication is required to be performed for all the vertices which are part
of the scene that we want to transform. Instead of performing serial multiplication of one
matrix-vector pair at a time, if we can apply the operation on all vectors at the same time,
there will be significant gain in performance. The gain becomes critical in real-time render-
ing, where millions of vertices need to be processed per second. CPUs, owing to their design,
cannot take advantage of this inherent parallelism in graphics operations. As a result, almost
all graphics systems nowadays come with a separate graphics card containing its own pro-
cessing unit and memory elements. The processing unit is known as the graphics processing
unit or GPU.

10.3.1 Graphics Processing Unit


A GPU is a multicore system: it contains a large number of cores or unit processing elements.
Each of these cores is a stream processor—it works on data streams. The cores are capable
of performing simple integer and floating point arithmetic operations only. Multiple cores
are grouped together to form streaming multiprocessors (SM). To understand the working
of SMs, let us revisit our previous example of geometric transformation of vertices. Note
that the instruction (multiplication) is the same; the data (vertex vectors) vary. Consequently,
what we have is known as single instruction multiple data (SIMD). The idea is illustrated
in Fig. 10.7. Each SM is designed to perform SIMD operations. Organization of GPU cores
and their interconnection to various storage elements (local and global) are schematically
depicted in Fig. 10.8.
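To make the SIMD idea concrete, the following is a small sketch in C (the function name transform_vertices and the data layout are our own, not from the text) of the kind of work assigned to a streaming multiprocessor: the same 4 x 4 matrix is applied to every vertex of a stream. Written for a CPU it is a serial loop, but since every iteration is independent, a GPU can execute many iterations at once, one per core.

#include <stdio.h>

/* A vertex in homogeneous coordinates (x, y, z, w). */
typedef struct { float v[4]; } Vertex4;

/* Apply the same 4 x 4 transformation matrix m to every vertex in the
   stream. Each iteration of the outer loop is independent of the others,
   which is what allows a GPU to run them in parallel (SIMD). */
void transform_vertices(float m[4][4], const Vertex4 *in,
                        Vertex4 *out, int count)
{
    for (int i = 0; i < count; i++) {          /* conceptually: one core per vertex */
        for (int r = 0; r < 4; r++) {
            float sum = 0.0f;
            for (int c = 0; c < 4; c++)
                sum += m[r][c] * in[i].v[c];   /* matrix-vector multiplication */
            out[i].v[r] = sum;
        }
    }
}

int main(void)
{
    /* Translation by (2, 3, 0) expressed as a 4 x 4 homogeneous matrix. */
    float t[4][4] = { {1, 0, 0, 2}, {0, 1, 0, 3}, {0, 0, 1, 0}, {0, 0, 0, 1} };
    Vertex4 in[2] = { { {1, 1, 0, 1} }, { {4, 5, 0, 1} } };
    Vertex4 out[2];

    transform_vertices(t, in, out, 2);
    for (int i = 0; i < 2; i++)
        printf("(%g, %g, %g)\n", out[i].v[0], out[i].v[1], out[i].v[2]);
    return 0;
}

Running the sketch prints the translated vertices (3, 4, 0) and (6, 8, 0); on a GPU, the two (or two million) vertices would be handled by different cores executing the same instruction stream.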
Let us try to understand how the 3D graphics pipeline is implemented with the GPU.
Most real-time graphics systems assume that everything in a scene is made of triangles.

Fig. 10.7 Single instruction multiple data (SIMD) (a) The idea (b) Serial additions performed
on inputs (c) Same output obtained with a single addition applied on data streams

Fig. 10.8 An illustrative GPU organization. Each core is a very simple processing unit
capable of performing simple floating point and integer arithmetic operations only.

Surfaces that are not expressed in terms of triangles, such as quadrilaterals or curved surface
patches (see Chapter 2), are converted to triangular meshes. Through the APIs supported in
a computer graphics library, such as OpenGL or Direct3D, the triangles are sent to the GPU
one vertex at a time. The GPU assembles vertices into triangles as needed.
The vertices are expressed with homogeneous coordinates (see Chapter 3). The objects
they define are represented in local or modeling coordinate system. After the vertices have
been sent to the GPU, it performs modeling transformations on these vertices. The transfor-
mation (single or composite), as you may recall, is achieved with a single matrix-vector
multiplication: the matrix represents the transformation while the vector represents the
vertex. The multicore GPU architecture can be used to perform multiple such operations
simultaneously. In other words, multiple vertices can be simultaneously transformed. The
output of this stage is a stream of triangles, all represented in a common (world) coordinate
system in which the viewer is located at the origin and the direction of view is aligned with
the z-axis.
In the third stage, the GPU computes the color of each vertex based on the light defined
for the scene. Recall the structure of the simple lighting equation we discussed in Chapter 4.
The color of any vertex can be computed by evaluating vector dot products and a series of

add and multiply operations. In a GPU, we can perform these operations simultaneously for
multiple vertices.
In the next stage, each colored 3D vertex is projected onto the view plane. Similar to the
modeling transformations, the GPU does this using matrix-vector multiplication (see Chap-
ter 6 for the maths involved), again leveraging efficient vector operations in hardware. The
output after this stage is a stream of triangles in screen or device coordinates, ready to be
converted to pixels.
Each device space triangle, obtained in the previous stage, overlaps some pixels on the
screen. In the rasterization stage, these pixels are determined. GPU designers over the years
have incorporated many rasterization algorithms, such as those we discussed in Chapter 9.
All these algorithms exploit one crucial observation: each pixel can be treated indepen-
dently from all other pixels. This leads to the possibility of handling all pixels in parallel.
Thus, given the device space triangles, we can determine the color of the pixels for all pixels
simultaneously.
During the pixel processing stage, two more activities take place—surface texturing and
hidden surface removal. In the simplest surface texturing method, texture images are draped
over the geometry to give the illusion of detail (see Chapter 5). In other words, the pixel color
is replaced or modified by the texture color. GPUs store the textures in high-speed memory,
which each pixel calculation must access. Since this access is very regular in nature (nearby
pixels tend to access nearby texture image locations), specialized memory caches are used
to reduce memory access time. For hidden surface removal, GPUs implement the depth(Z)-
buffer algorithm. All modern-day GPUs contain a depth-buffer as a dedicated region of
their memory, which stores the distance of the viewer from each pixel. Before writing to the
display, the GPU compares a pixel’s distance with the distance of the pixel that is already
present. The display memory is updated only if the new pixel is closer (see the depth-buffer
algorithm in Chapter 8 for more details).
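The depth-buffer test described above can be summarized in a few lines of C. This is only an illustrative sketch (the names depth_test_and_write and clear_depth_buffer are our own, and real GPUs implement the test in dedicated hardware):

#include <stdbool.h>
#include <stdio.h>

#define WIDTH  800
#define HEIGHT 600

static float        depth_buffer[HEIGHT][WIDTH];   /* distance of the viewer from each pixel */
static unsigned int color_buffer[HEIGHT][WIDTH];   /* pixel colors to be displayed */

/* Set every depth entry to a very large value ("infinitely far"). */
void clear_depth_buffer(void)
{
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            depth_buffer[y][x] = 1e30f;
}

/* Write (color, depth) at pixel (x, y) only if the new fragment is closer
   to the viewer than the one already stored. Returns true if the pixel
   was updated. */
bool depth_test_and_write(int x, int y, float depth, unsigned int color)
{
    if (depth < depth_buffer[y][x]) {
        depth_buffer[y][x] = depth;
        color_buffer[y][x] = color;
        return true;
    }
    return false;
}

int main(void)
{
    clear_depth_buffer();
    depth_test_and_write(10, 20, 5.0f, 0x00FF00u);              /* nearer: accepted  */
    bool drawn = depth_test_and_write(10, 20, 9.0f, 0xFF0000u); /* farther: rejected */
    printf("second fragment drawn? %s\n", drawn ? "yes" : "no");
    return 0;
}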

10.3.2 Shaders and Shader Programming


Note that in the previous discussion, we covered all the pipeline stages in two broad groups
of activities—vertex (or geometry) processing and pixel processing. During the early years
of their evolution, GPUs came with a fixed-function hardware pipeline, that is, all the stages
were pre-programmed and embedded into the hardware. In other words, a GPU contained
dedicated components for specific tasks. The idea is illustrated in Fig. 10.9. However, in
order to leverage the power of GPUs better, modern-day GPUs are designed to be
programmable. Fixed-function units for transforming vertices and texturing pixels have been
replaced by a unified grid of processors, known as shaders. All these processing units can be
used to do the calculations for any pipeline stage, as illustrated in Fig. 10.10.

Fig. 10.9 Schematic of a fixed-function GPU stages—the user has no control on how it
should work and what processing unit performs which stage of the pipeline


Fig. 10.10 The idea of programmable GPU. The GPU elements (processing units and
memory) can be reused through user programs.

With a programmable GPU, it is possible for programmers to modify how the hardware
processes the vertices and shades the pixels. They can do so by writing vertex shaders and frag-
ment shaders (also known as vertex programs and fragment programs). This is known as
shader programming (also known by many other names that include GPU programming and
graphics hardware programming). As the names suggest, vertex shaders are used to pro-
cess vertices (i.e., geometry)—modeling transformations, lighting, and projection to screen
coordinates. Fragment shaders are programs that perform the computations in the pixel pro-
cessing stage and determine how each pixel is shaded (rendering), how texture is applied
(texture mapping), and if a pixel should be drawn or not (hidden surface removal). The term
fragment shader is used to denote the fact that a GPU at any instant can process a subset
(or fragment) of all the screen pixel positions. These shader programs are small pieces of
code that are sent to the graphics hardware from the user program, and they are executed
on the graphics hardware. The ability to program GPUs gave rise to the idea of a general
purpose GPU or GPGPU; we can use the GPU to perform tasks that are not related to
graphics at all.
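To give a flavour of what such shader programs look like, below is a minimal vertex/fragment shader pair written in GLSL (the OpenGL Shading Language) and stored as C string literals, the form in which an application typically hands them to the driver. This is a sketch only: it relies on the legacy built-in variables gl_Vertex, gl_ModelViewProjectionMatrix, gl_Position, and gl_FragColor of older GLSL versions, and the code that compiles and links the shaders (glCreateShader, glShaderSource, and so on) is omitted.

/* Vertex shader: transforms each incoming vertex by the combined
   modelview-projection matrix (geometry processing). */
static const char *vertex_shader_src =
    "void main(void)                                             \n"
    "{                                                           \n"
    "    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex; \n"
    "}                                                           \n";

/* Fragment shader: colors every covered pixel green (pixel processing). */
static const char *fragment_shader_src =
    "void main(void)                              \n"
    "{                                            \n"
    "    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0); \n"
    "}                                            \n";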

10.4 GRAPHICS SOFTWARE AND OPENGL


The software used for computer graphics is broadly of two types—special-purpose pack-
ages and general programming packages. Special-purpose packages are designed for the
non-programmers. These are complete software systems with their own graphical user inter-
face. An example of such a package is a painting system, where an artist can select objects
of various shapes, color them, place them at the desired screen position, change their size,
shape, and orientation, and much more, just by interacting with the interface. No knowledge
of the graphics pipeline is required. Other examples include various CAD (computer-aided
design) packages used in architectural, medical, business, and engineering domains. A gen-
eral programming package, in contrast, provides a library of graphics functions that are to
be used in a programming language such as C, C++, or Java.

Graphics software standards

When programs are written with graphics functions, they may be moved from one hardware
platform to another. Without some standards (i.e., a commonly agreed syntax), this is not
possible and we need to rewrite the whole program. In order to avoid such problems and
ensure portability, efforts were made to standardize computer graphics software. The first
graphics software standard was developed in 1984, known as the graphics kernel system or
GKS. It was adopted by the ISO (International Standards Organization) and many other
national standards bodies. A second standard was developed by extending GKS, known as
the PHIGS (programmer's hierarchical interactive graphics standard), and adopted by
standards organizations worldwide.
While GKS and PHIGS were being developed, the Silicon Graphics Inc. (SGI) started to
ship their graphics workstations with a set of routines called graphics library (GL). The GL
became very popular and eventually evolved as the OpenGL (in the early 1990s), a de-facto
graphics standard. It is now maintained by the OpenGL Architecture Review Board, a
consortium of representatives from many graphics companies and organizations.

The graphics functions are
designed to perform various tasks that are part of the graphics pipeline such as object defini-
tion, modeling transformation, color assignment, projection, and display. Examples of such
libraries include OpenGL (Open Graphics Library), VRML (Virtual-Reality Mod-
eling Language), and Java 3D. The functions in a graphics library are also known as the
computer graphics application programming interface (CG API) since the library provides
a software interface between a programming language and the hardware. So when we write
an application program in C, the graphics library functions allow us to construct and display
a picture on an output device.
Graphics functions in any package are typically defined independent of any program-
ming language. A language binding is then defined for a particular high-level programming
language. This binding gives the syntax for accessing various graphics functions from that
language. Each language binding is designed to make the best use of the capabilities of
the particular language and to handle various syntax issues such as data types, parameter
passing, and errors. The specifications for language bindings are set by the International
Standards Organization. In the following, we learn the basic idea of a graphics library with
an introduction to OpenGL, a widely-used open source graphics library, with its C/C++
binding.

10.4.1 OpenGL: An Introduction


Consider Fig. 10.11, showing the program for displaying a straight line on the screen written
using the OpenGL library functions in C. As you can see, there are several functions in the
program. Let us try to understand each line of the code.

GLUT Library
The first thing we do is to include the header file containing the graphics library functions.
Thus, the very first line in our program is
#include<GL/glut.h>

#include<GL/glut.h>
void init (void){
glClearColor (1.0, 1.0, 1.0, 0.0);
glMatrixMode (GL_PROJECTION);
gluOrtho2D (0.0, 800.0, 0.0, 600.0);
}
void createLine (void){
glClear (GL_COLOR_BUFFER_BIT);
glColor3f (0.0, 1.0, 0.0);
glBegin (GL_LINES);
glVertex2i (200, 100);
glVertex2i (20, 50);
glEnd ();
glFlush ();
}
void main (int argc, char** argv){
glutInit (& argc, argv);
glutInitDisplayMode (GLUT_SINGLE | GLUT_RGB);
glutInitWindowPosition (0, 0);
glutInitWindowSize (800, 600);
glutCreateWindow ("The OpenGL example");
init ();
glutDisplayFunc (createLine);
glutMainLoop ();
}

Fig. 10.11 Example OpenGL program

The OpenGL core library does not provide support for input and output, as the library
functions are designed to be device-independent. However, we have to show the line on the
display screen. Thus, auxiliary libraries are required for the output, on top of the core library.
This support is provided by GLUT, the OpenGL Utility Toolkit. GLUT provides a library
of functions for interacting with any screen-windowing system. In other words, the functions
in the GLUT library allow us to set up a display window on our video screen (a rectangu-
lar area on the screen showing the picture, in this case the line). The library functions are
prefixed with glut. Since GLUT functions provide interface to other device-specific window
systems, we can use GLUT to write device independent programs. Note that GLUT is suit-
able for graphics operations only. We may need to include other C/C++ header files such
as <stdio.h> or <stdlib.h> along with GLUT.

Managing the Display Window: The main() Function


Inclusion of GLUT allows us to create and manage the display window, that is, the region
on the screen where we see the line. The first thing required is to initialize GLUT. This we
do with the following statement.
glutInit (& argc, argv);
After initialization, we can set various options for the display window, using the
glutInitDisplayMode function. This function takes symbolic GLUT constants as arguments.

For example, the following line of code written for the example program specifies that
a single refresh buffer is to be used for the display window and the RGB color mode is
to be used for selecting color values. Note the syntax used for symbolic GLUT constants:
each constant name begins with the prefix GLUT followed by an underscore (‘_’), and the
name is written in capital letters. The two constants are combined using a logical
OR operation.
glutInitDisplayMode (GLUT_SINGLE | GLUT_RGB);
Although GLUT provides for some default position and size of the display window, we
can change those. The following two lines in the example are used for the purpose. As the
names suggest, the glutInitWindowPosition function allows us to specify the window loca-
tion. This is done by specifying the top-left corner position of the window (supplied as
argument to the function). The position is specified in integer screen coordinates (the X and
Y pixel coordinates, in that order), assuming that the origin is in the top-left corner of the
screen. The glutInitWindowSize function is used to set the window size. The first argument
specifies the width of the window. The window height is specified with the second argument.
Both are specified in pixels.
glutInitWindowPosition (0, 0);
glutInitWindowSize (800, 600);
Next, we create the window and set a caption (optional) with the following function. The
argument of the function, that is, the string within the quotation, is the caption.
glutCreateWindow ("The OpenGL example");
Once the window is created, we need to specify the picture to be displayed in the
window. In our example, the picture is simply the line. We create this picture in a
separate function createLine, which contains OpenGL functions. The createLine func-
tion is passed as an argument to the glutDisplayFunc indicating that the line is to be
displayed on the window. However, before the picture is generated, certain initializa-
tions are required. We perform these initializations in the init function (to make our
code look nice and clean). Hence, the following sequence of lines is added to our
main program.
init ();
glutDisplayFunc (createLine);
The display window, however, is not yet on the screen. What we need is to activate it,
once the window content is decided. Thus, we add the following statement. This statement
activates all display windows we have created along with their graphic contents.
glutMainLoop ();
This function must be the last one in our program. Along with displaying the initial graph-
ics, it puts the program into an infinite loop. In this loop, the program waits for inputs from
devices such as mouse or keyboard. Even if no input is available (like in our example), the
loop ensures that the picture is displayed till we close the window.


Basic OpenGL Syntax


In the main body of the example program, we have mostly the GLUT library functions.
However, in the two functions init and createLine, core OpenGL library functions are used.
The syntax followed by these functions is different from those of the GLUT functions, and
is described as follows.
Each OpenGL function is prefixed with gl. Also, each component word within the func-
tion name has its first letter capitalized. The naming convention is illustrated in the following
examples.
glClear glPolygonMode
Sometimes, some functions require that one or more of their arguments be assigned sym-
bolic constants. Examples of such constants include a parameter name, a parameter value, or
a particular mode. All such constants begin with the uppercase letters GL. Each component
of the name is written in capital letters, and components are separated by the underscore (‘_’) symbol. A
few examples are shown here for illustration.
GL_RGB GL_AMBIENT_AND_DIFFUSE
The OpenGL functions also expect specific data types, for example, a 32-bit integer as a
parameter value. For this purpose, OpenGL uses built-in data type names.
Each name begins with the capital letters GL. This is followed by the data type
name (the standard designations for various data types) written in lower-case letters,
such as
GLbyte GLdouble

Initialization: The init() Function


In the init function, initializations and one-time parameter settings are done. There are three
OpenGL library routines called in this function. The first routine is,
glClearColor (1.0, 1.0, 1.0, 0.0);
This OpenGL routine is used to set a background color to our display window. The
color is specified with the red, green, and blue (RGB) components. The RGB component
values are supplied through the first three arguments, in that order. Thus, in our example,
we are setting the window background to white, with R = 1.0, G = 1.0, and B = 1.0. If
we set all three components to 0.0, we get the color black, and if they are all set to the same
value between 0.0 and 1.0, we get some shade of gray. The
fourth parameter in the function is called the alpha value for the specified color. It is used
as a blending parameter: specifying the way to color two overlapping objects. A value of
0.0 implies the objects are totally transparent and a value of 1.0 indicates totally opaque
objects.
Although we are displaying a line, which is a 2D object, OpenGL does not treat 2D pic-
ture generation separately. It treats 2D pictures as a special case of 3D viewing. So the
entire set of 3D pipeline stages has to be performed. In our example, therefore, we need to specify
the projection type and other viewing parameters. These are done with the following two
functions.

glMatrixMode (GL_PROJECTION);
gluOrtho2D (0.0, 800.0, 0.0, 600.0);
Note that although the first function is an OpenGL routine (prefixed with gl), the second
function is prefixed with glu. This indicates that the second function is not part of the
core OpenGL library. Instead, it belongs to the GLU or the OpenGL Utility, an auxiliary
library that provides routines for complex tasks including setting up of viewing and projec-
tion matrices, describing complex objects with line and polygon approximations, processing
the surface-rendering operations and displaying splines with linear approximations. Together
the two functions specify that an orthogonal projection is to be used to map the line from the
view plane to the screen. The view plane window is specified in terms of its lower-left (0.0,
0.0) and top-right corners (800.0, 600.0). Anything outside this boundary will be clipped out.

Creating the Picture: The createLine() Function


This is the function that actually creates the line. The first line of the function is,
glClear (GL_COLOR_BUFFER_BIT);
With this OpenGL function, the display window with the specified background color is
put on the screen. The argument, as you can see, is an OpenGL symbolic constant. It indi-
cates that the bit values in the color (refresh) buffer are to be set to the background color
values specified in the glClearColor function.
While we can set the background color, OpenGL also allows us to set the object color.
This we do with the following function.
glColor3f (0.0, 1.0, 0.0);
The three arguments are used to specify the R, G, and B components of the color, in that
order. The suffix 3f in the function name indicates that the three components are specified
using floating-point (f ) values. These values can range between 0.0 and 1.0. The three val-
ues we used in the example denote the green color (as the other two components have values
0.0 each).
Finally, we need to call the appropriate OpenGL routines to create the line segment. The
following piece of code performs just that: it specifies a line segment between the end points
(200, 100) and (20, 50).
glBegin (GL_LINES);
glVertex2i (200, 100);
glVertex2i (20, 50);
glEnd ();

The two line end points (vertices) are specified using the OpenGL function glVertex2i.
The suffix 2i indicates that the vertices are specified by two integer (i) values denoting their
X and Y coordinates. The first and second end points are determined depending on their
ordering in the code. Thus, in the example, the vertex (200, 100) is the first end point while
the vertex (20, 50) acts as the second line end point. The function glBegin with its symbolic
OpenGL constant GL_LINES along with the function glEnd indicate that the vertices are
line end points.

With all these functions, the basic line creation program is ready. However, the functions
we call may simply be buffered by the OpenGL implementation rather than executed
immediately. We need to force the system to process all these functions. This
we do with the following OpenGL function, which should be the last line of our picture
generation procedure.
glFlush ();
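As a small extension of the example in Fig. 10.11 (our own sketch, not part of the original program), the same structure can be used to draw a filled triangle with a different color at each vertex. Since OpenGL smoothly interpolates vertex colors across a primitive by default, the triangle appears with a gradual color transition. Only the picture-generation function changes; a hypothetical createTriangle would be passed to glutDisplayFunc in place of createLine, with init and main as before.

void createTriangle (void){
    glClear (GL_COLOR_BUFFER_BIT);
    glBegin (GL_TRIANGLES);
    glColor3f (1.0, 0.0, 0.0);    /* red vertex */
    glVertex2i (100, 100);
    glColor3f (0.0, 1.0, 0.0);    /* green vertex */
    glVertex2i (700, 100);
    glColor3f (0.0, 0.0, 1.0);    /* blue vertex */
    glVertex2i (400, 500);
    glEnd ();
    glFlush ();
}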

SUMMARY
In this chapter, we learnt about the underlying hardware in a graphics system. There are three
components of the hardware—the input devices, the output devices, and the display controller.
The most common graphics output devices are the video monitors. Various technologies are
used to design monitors. The earliest of those are the CRT, which we have already discussed in
Chapter 1. In this chapter, we learnt about flat panel displays. Broadly, they are of two types. In
the emissive displays, electrical energy is converted to light energy, similar to a CRT. Examples
include plasma panels, LEDs, and thin-film electroluminescent displays. In non-emissive dis-
plays, external light energy is used to draw pictures on the screen. The most popular example
of such displays is the LCD. Hardcopy devices are another mode of producing graphics output.
Such devices are of two types: printers are used to produce any image including alphanumeric
characters on paper, while plotters are used for specific drawing purpose. Input devices are
mainly used for interactive graphics. Many such devices exist. Most common are the mouse
and keyboards. Other input devices include joystick, trackball, data gloves, and touch screens.
The display controller or the graphics card contains a special-purpose processor and a mem-
ory tailor-made for graphics. The processor is called the graphics processing unit or GPU. It
consists of a large number of simple processing units or cores, organized in the form of stream-
ing multiprocessors. The organization allows a GPU to perform parallel processing in SIMD
(single instruction multiple data) mode. Modern-day GPUs allow general-purpose programming
of their elements. This is done through shader programming. The vertex shaders are programs
that allow us to process vertices while the fragment shaders allow us to process pixels the way
we want.
We also got introduced to graphics software. We learnt about the role played by graphics
libraries in the development of a graphics program and learnt the basics of a popular graphics
library, OpenGL, through an example line-drawing program.

BIBLIOGRAPHIC NOTE
Sherr [1993] contains more discussions on electronic displays. Tannas [1985] can be used for
further reading on flat-panel displays and CRTs. Raster graphics architecture is explained in
Foley et al. [1995]. Grotch [1983] presents the idea behind the 3D and stereoscopic displays.
Chung et al. [1989] contains work on head-mounted displays and virtual reality environments.
More on GPU along with examples on programming the vertex and fragment processors can
be found in the GPU Gems series of books (Fernando [2004], Pharr and Fernando [2005]).
For additional details on writing program using a shading language, refer to the OpenGLTM
Shading Language (Rost [2004]). The website www.gpgpu.org is a good source for more infor-
mation on GPGPU. A good starting point for learning OpenGL is the OpenGL Programming
Guide (Shreiner et al. [2004]).

i i

i i
i i

“Chapter-10” — 2015/9/15 — 13:14 — page 213 — #20


i i

Graphics Hardware and Software 213

KEY TERMS
Computer graphics application programming interface (CG API) – a set of library functions
to perform various graphics operations
Data glove – an input device, typically used with virtual reality systems, for positioning and
manipulation
Dot-matrix printer – a type of impact printer
Emissive display – a type of display that works based on the conversion of electric energy into
light on screen
Fixed-function hardware pipeline – all the pipeline stages are pre-programmed and embedded
into the hardware
Flat panel – a class of graphics display units
Fragment shaders (programs) – hardware programs to assign colors to pixels
GPU – the graphics processing unit, which is typically employed for graphics-related operations
Graphical kernel system (GKS) – an early standard for graphics software
Impact printer – a printer that works by pressing character faces against inked ribbons on a paper
Joystick – an input device for positioning
Keyboard – an input device for characters
LCD – a type of flat panel non-emissive display
LED display – a type of flat panel emissive display
Mouse – an input device for pointing and selecting
Multicore – multiple processing units connected together
Non-emissive display – a type of display that works based on the conversion of light energy to
some onscreen graphical patterns
Non-impact printer – a printing device that uses non-impact methods such as lasers, ink sprays,
electrostatic, or electrothermal methods for printing
OpenGL – an open source graphics library, which has become the de-facto standard for graphics
software
PHIGS (Programmer’s Hierarchical Interactive Graphics Standard) – a standard for graphics
software
Plasma panel – a type of flat panel emissive display
Plotter – a type of hardcopy output device
Printer – a type of hardcopy output device.
Programmable GPU – a GPU where pipeline stages are not fixed and can be controlled
programmatically
Shader – a grid of GPU processors to perform specific stages of graphics pipeline
Shader programming/GPU programming/Graphics hardware programming – programs to
manipulate shaders
Spaceball – an input device for positioning
Stream processor – a processor that works on data streams
Streaming multiprocessor – a group of stream processors
Thin-film electroluminescent display – a type of flat panel emissive display
Touch screen – gestural input systems
Trackball – an input device for positioning
Vertex shaders (programs) – hardware programs to process vertices

EXERCISES
10.1 What are the major components of a graphics system?
10.2 Discuss the difference between emissive and non-emissive displays.

10.3 Explain, with illustrative diagrams, the working of the plasma, LED, and thin-film
electroluminescent displays.
10.4 How do LCDs work? Explain with schematic diagrams.
10.5 Mention any five input devices.
10.6 Why is a GPU better suited for graphics operations than a CPU? Discuss with a suitable
illustration.
10.7 Explain the implementation of 3D graphics pipeline on GPU.
10.8 Explain the concept of shader programming. Why is it useful?
10.9 Why do we need graphics standards? Mention any two standards used in computer
graphics.
10.10 In Fig. 10.11, the code for drawing a line segment on the screen is shown. Assume that
we have a square surface with four vertices. The line displayed on the screen is a specific
orthogonal view of the surface (may be top view). Modify the code to define the surface
and perform the specific projection. Modify the code further so that the Gouraud shading
is used to color the surface.

APPENDIX A
Mathematical Background

Various mathematical concepts are involved in understanding the theories and principles
of computer graphics. They include the idea of vectors and vector algebra, matrices and
matrix algebra, tensors, complex numbers and quaternions, parametric and non-parametric
representations and manipulations of curves, differential calculus, numerical methods, and
so on. In order to explain the fundamental concepts of graphics in this book, we used some
of those. The mathematics used in this book mostly involved vectors and matrices and how
those are manipulated. In addition, we also used concepts such as reference frames and line
equations for calculating intersection points between two lines. The backgrounds for these
mathematical concepts are discussed in this appendix.

A.1 COORDINATE REFERENCE FRAMES


In the discussion on the pipeline stages, we mentioned different (coordinate) reference
frames. We primarily used the concepts of the Cartesian reference frames in our discussion.
The Cartesian frames are characterized by the mutually perpendicular (orthogonal) axes,
which are straight lines. Both 2D and 3D Cartesian frames are used to represent points at
the different stages of the graphics pipeline.
Usually, the device-independent commands within a graphics package assume that a
screen region represents the first quadrant of a 2D Cartesian reference frame in standard
position (see Fig. A.1(a)). The lower-left corner of the screen is the coordinate origin. How-
ever, scan lines are numbered from top to bottom with the topmost scan line numbered 0.
This means that the screen positions are represented internally assuming the upper-left cor-
ner of the screen to be the origin and the positive Y -axis points downwards, as shown in
Fig. A.1(b). In other words, we make use of an inverted Cartesian frame. Note that the hori-
zontal (X ) coordinate values in both the systems are the same. The relationship between the
vertical (Y) values of the two systems is shown in Eq. A.1, where y is the Y-coordinate
value in the standard frame and y_{inv} is the Y-coordinate value in the inverted frame.

y = y_{max} - y_{inv} \quad (A.1)

In order to represent points in the three-dimensional space, usually the right-handed
Cartesian frame is used (Fig. A.2). We can imagine the system as if we are trying to grasp
the Z-axis with our right hand in a way such that the thumb points towards the positive


Fig. A.1 Standard and inverted 2D Cartesian reference frames used in computer graphics
(a) Points are represented with respect to the origin in the lower-left corner (b) Points are
represented with respect to the origin at the upper-left corner


Fig. A.2 Right-handed 3D Cartesian coordinate system

Z direction. Hence the fingers are curling from the positive X direction to the positive
Y direction (through 90◦ ).
When a view of a 3D scene is displayed on a 2D screen, the 3D point for each 2D screen
position is sometimes represented with the left-handed reference frame shown in Fig. A.3.
Unlike the right-handed system, here we imagine grasping the Z axis with our left hand. Other
things remain the same. That is, the left-hand thumb points towards the positive Z direc-
tion and the left-hand fingers curl from the positive X direction to the positive Y direction.


Fig. A.3 Left-handed 3D Cartesian coordinate system

The XY plane represents the screen. Positive Z values indicate positions behind the screen.
Thus, the larger positive Z values indicate points further from the viewer.

A.2 VECTORS AND VECTOR OPERATIONS


A vector is a mathematical entity with two fundamental properties—magnitude and direc-
tion. In a chosen coordinate reference frame, we can use two points to define a vector. For
example, consider Fig. A.4. It shows a vector in two dimensions in terms of the two points P_1
and P_2. Let us denote the coordinates of the two points as (x_1, y_1) and (x_2, y_2), respectively.
Then, we can define the vector \vec{V} as

\vec{V} = P_2 - P_1 = (x_2 - x_1, y_2 - y_1) = (V_x, V_y)

The quantities V_x and V_y are the projections of the vector \vec{V} onto the X- and Y-axis,
respectively. They are called the Cartesian components (or Cartesian elements). The magnitude
of the vector (denoted by |\vec{V}|) is computed in terms of these two components as

|\vec{V}| = \sqrt{V_x^2 + V_y^2}

The direction can be specified in terms of the angular displacement \alpha from the horizontal as

\alpha = \tan^{-1}\left(\frac{V_y}{V_x}\right)
The idea of a 3D vector is similar. Suppose we have two points P_1(x_1, y_1, z_1) and
P_2(x_2, y_2, z_2). We now have three Cartesian components instead of two, one for each of
the three axes: V_x = (x_2 - x_1), V_y = (y_2 - y_1), and V_z = (z_2 - z_1) for the X-, Y-, and
Z-axis, respectively. Then, the magnitude of the vector can be computed as

|\vec{V}| = \sqrt{V_x^2 + V_y^2 + V_z^2}

The vector direction can be given in terms of the direction angles, that is, the angles α, β,
and γ the vector makes with each of the three axes (see Fig. A.5). More precisely, direction
Fig. A.4 A 2D vector V defined in a Cartesian frame as the difference of two points


Fig. A.5 Three direction angles for a 3D vector

angles are the positive angles the vector makes with each of the positive coordinate axes.
The three direction angles can be computed as

\cos\alpha = \frac{V_x}{|\vec{V}|}, \quad \cos\beta = \frac{V_y}{|\vec{V}|}, \quad \cos\gamma = \frac{V_z}{|\vec{V}|}

The values \cos\alpha, \cos\beta, and \cos\gamma are known as the direction cosines of the vector. In fact,
we need to specify any two of the three cosines to find the direction of the vector. The third
cosine can be determined from the two since

\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1

In many situations, we deal with unit vectors. A unit vector is a vector with magnitude 1. While we
usually denote a vector with an arrow on top, such as \vec{V}, a unit vector is denoted by putting
a hat on top of the vector symbol, such as \hat{V}. However, the most common notation for a unit
vector is \hat{u}. Calculation of the unit vector along the direction of a vector is easy. Let
\vec{V} = (V_x, V_y, V_z) be the given vector. Then the unit vector \hat{V} along the direction of \vec{V} is
given by Eq. (A.2).

\hat{V} = \left(\frac{V_x}{|\vec{V}|}, \frac{V_y}{|\vec{V}|}, \frac{V_z}{|\vec{V}|}\right) \quad (A.2)

where |\vec{V}| = \sqrt{V_x^2 + V_y^2 + V_z^2} is the magnitude of the vector.
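For readers who prefer code, the following short C sketch (the type and function names are our own) computes the magnitude of a 3D vector and the unit vector along its direction, exactly as in Eq. A.2.

#include <stdio.h>
#include <math.h>

typedef struct { double x, y, z; } Vec3;

/* Magnitude: |V| = sqrt(Vx^2 + Vy^2 + Vz^2) */
double magnitude(Vec3 v)
{
    return sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

/* Unit vector along V: each component divided by |V| (Eq. A.2). */
Vec3 normalize(Vec3 v)
{
    double m = magnitude(v);
    Vec3 u = { v.x / m, v.y / m, v.z / m };
    return u;
}

int main(void)
{
    Vec3 v = { 1.0, 2.0, 2.0 };      /* magnitude is 3 */
    Vec3 u = normalize(v);
    printf("|v| = %g, unit vector = (%g, %g, %g)\n",
           magnitude(v), u.x, u.y, u.z);
    return 0;
}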

A.2.1 Vector Addition and Scalar Multiplication


The addition of two vectors is defined as the addition of corresponding components. Thus,
we can represent vector addition as follows:

\vec{V}_1 + \vec{V}_2 = (V_{1x} + V_{2x}, V_{1y} + V_{2y}, V_{1z} + V_{2z})


Fig. A.6 Illustration of vector addition (a) Original vectors (b) \vec{V}_2 repositioned to start where
\vec{V}_1 ends

The direction and magnitude of the new vector are determined from its components as
before. The idea is illustrated in Fig. A.6 for 2D vector addition. Note that the second vector
starts at the tip of the first vector. The resulting vector starts at the start of the first vector and
ends at the tip of the second vector.
Addition of a vector with a scalar quantity is not defined, since a scalar quantity has only
magnitude without any direction. However, we can multiply a vector with a scalar value. We
do this by simply multiplying the scalar value with each of the components as follows:

s\vec{V} = (sV_x, sV_y, sV_z)

A.2.2 Multiplication of Two Vectors


We can multiply two vectors in two different ways—scalar or dot product and vector or
cross product. In the case of a dot product of two vectors, we obtain a scalar value, whereas
we get a vector from the cross product of two vectors.
The dot product of two vectors \vec{V}_1 and \vec{V}_2 is calculated as

\vec{V}_1 \cdot \vec{V}_2 = |\vec{V}_1||\vec{V}_2|\cos\theta, \quad 0 \le \theta \le \pi

where \theta is the smaller of the two angles between the vector directions. We can also determine
the dot product in terms of the Cartesian components as

\vec{V}_1 \cdot \vec{V}_2 = V_{1x}V_{2x} + V_{1y}V_{2y} + V_{1z}V_{2z}

Note that the dot product of a vector with itself produces the square of the vector magnitude.
There are two important properties satisfied by the dot product. It is commutative, that is,
\vec{V}_1 \cdot \vec{V}_2 = \vec{V}_2 \cdot \vec{V}_1. Also, the dot product is distributive with respect to vector
addition, that is, \vec{V}_1 \cdot (\vec{V}_2 + \vec{V}_3) = \vec{V}_1 \cdot \vec{V}_2 + \vec{V}_1 \cdot \vec{V}_3.
The cross product of two vectors is defined as

\vec{V}_1 \times \vec{V}_2 = \hat{u}\,|\vec{V}_1||\vec{V}_2|\sin\theta, \quad 0 \le \theta \le \pi

In this expression, \hat{u} is a unit vector (magnitude 1) that is perpendicular to both \vec{V}_1 and
\vec{V}_2 (Fig. A.7). The direction of \hat{u} is determined by the right-hand rule: we grasp with our


Fig. A.7 Illustration of the cross product of two vectors

right hand an axis that is perpendicular to the plane containing \vec{V}_1 and \vec{V}_2 such that the fingers
curl from \vec{V}_1 to \vec{V}_2. The right thumb then denotes the direction of \hat{u}. The cross product can
also be expressed in terms of the Cartesian components of the constituent vectors as

\vec{V}_1 \times \vec{V}_2 = (V_{1y}V_{2z} - V_{1z}V_{2y},\; V_{1z}V_{2x} - V_{1x}V_{2z},\; V_{1x}V_{2y} - V_{1y}V_{2x})

The cross product of two parallel vectors is zero. Therefore, the cross product of a vector
with itself is zero. The cross product is not commutative, since \vec{V}_1 \times \vec{V}_2 = -(\vec{V}_2 \times \vec{V}_1). However,
the cross product of two vectors is distributive with respect to vector addition, similar to the dot
product, that is, \vec{V}_1 \times (\vec{V}_2 + \vec{V}_3) = \vec{V}_1 \times \vec{V}_2 + \vec{V}_1 \times \vec{V}_3.
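The component formulas for the dot and cross products translate directly into C. The sketch below (our own names again) also verifies, for the vectors (2, 3, 4) and (6, 6, 6) used later in Section A.5, that the cross product is perpendicular to both inputs (both dot products evaluate to zero).

#include <stdio.h>

typedef struct { double x, y, z; } Vec3;   /* as in the earlier sketch */

/* Dot product: V1 . V2 = V1x*V2x + V1y*V2y + V1z*V2z */
double dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Cross product, using the component formula given above. */
Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 c = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return c;
}

int main(void)
{
    Vec3 v1 = { 2, 3, 4 }, v2 = { 6, 6, 6 };
    Vec3 n = cross(v1, v2);            /* perpendicular to both v1 and v2 */
    printf("cross = (%g, %g, %g), n.v1 = %g, n.v2 = %g\n",
           n.x, n.y, n.z, dot(n, v1), dot(n, v2));
    return 0;
}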

A.3 MATRICES AND MATRIX OPERATIONS


A matrix is a rectangular array of elements, which can be numerical values, expressions, or
even functions. We have already encountered several examples of matrices in the book. In
general, we can represent a matrix M with r rows and c columns as

M = \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1c} \\ m_{21} & m_{22} & \cdots & m_{2c} \\ \vdots & \vdots & & \vdots \\ m_{r1} & m_{r2} & \cdots & m_{rc} \end{pmatrix}

where m_{ij} represents the elements of M. As per the convention, the first subscript of any
element denotes the row number and the column number is given by the second subscript.
A matrix with a single row or single column represents a vector (the elements represent
the coordinate components of the vector). When a vector is represented as a single-row
matrix, it is called a row vector. A single column matrix is called a column vector. Thus, a
matrix can also be considered as a collection of row or column vectors.

A.3.1 Scalar Multiplication and Matrix Addition


In order to multiply a matrix M with a scalar value s, we multiply each element m_{ij} with s.
For example, if

M = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{pmatrix}

then

2M = \begin{pmatrix} 6 & 4 & 2 \\ 4 & 2 & 6 \\ 2 & 6 & 4 \end{pmatrix}

Two matrices can be added only if they both have the same number of rows and columns.
In order to add two matrices, we simply add their corresponding elements. Thus,

\begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{pmatrix} + \begin{pmatrix} 6 & 4 & 2 \\ 4 & 2 & 6 \\ 2 & 6 & 4 \end{pmatrix} = \begin{pmatrix} 9 & 6 & 3 \\ 6 & 3 & 9 \\ 3 & 9 & 6 \end{pmatrix}

A.3.2 Matrix Multiplication


Given two matrices A (dimension m × n) and B (dimension p × q), we can multiply them
if and only if n = p. In other words, the number of columns in A should be the same as the
number of rows in B. The resultant matrix C = AB will have the dimension m × q. We can
obtain the elements of C (c_{ij}) from the elements of A (a_{ij}) and B (b_{ij}) as

c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

In the following example, we multiply a 2 × 3 matrix with a 3 × 2 matrix to obtain a
2 × 2 matrix.

\begin{pmatrix} 3 & 2 & 1 \\ 1 & 3 & 2 \end{pmatrix} \begin{pmatrix} 6 & 4 \\ 4 & 2 \\ 2 & 6 \end{pmatrix} = \begin{pmatrix} 28 & 22 \\ 22 & 22 \end{pmatrix}

Note that matrix multiplication is not commutative: AB \ne BA. However, it is distributive
with respect to matrix addition. That is, A(B + C) = AB + AC.
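The multiplication rule above can be written as a straightforward triple loop in C. The following sketch (the function name matmul and the fixed maximum size are our own simplifications) reproduces the 2 × 3 by 3 × 2 example.

#include <stdio.h>

#define MAXDIM 10

/* C = A x B, where A is m x n and B is n x q:
   c[i][j] is the sum over k of a[i][k] * b[k][j]. */
void matmul(int m, int n, int q,
            double a[MAXDIM][MAXDIM], double b[MAXDIM][MAXDIM],
            double c[MAXDIM][MAXDIM])
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < q; j++) {
            c[i][j] = 0.0;
            for (int k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

int main(void)
{
    double a[MAXDIM][MAXDIM] = { {3, 2, 1}, {1, 3, 2} };    /* 2 x 3 */
    double b[MAXDIM][MAXDIM] = { {6, 4}, {4, 2}, {2, 6} };  /* 3 x 2 */
    double c[MAXDIM][MAXDIM];

    matmul(2, 3, 2, a, b, c);                               /* 2 x 2 result */
    printf("%g %g\n%g %g\n", c[0][0], c[0][1], c[1][0], c[1][1]);
    return 0;
}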

A.3.3 Matrix Transpose


Given a matrix M, its transpose M^T is obtained by interchanging rows and columns. For
example,

\begin{pmatrix} 3 & 2 & 1 \\ 1 & 3 & 2 \end{pmatrix}^T = \begin{pmatrix} 3 & 1 \\ 2 & 3 \\ 1 & 2 \end{pmatrix}

We can also define the transpose of a matrix product as

(M_1 M_2)^T = M_2^T M_1^T

A.3.4 Determinant of a Matrix


A useful concept in operations involving matrices is the determinant of a matrix. The determinant
is defined only for square matrices (i.e., those matrices having the same number of rows
and columns). The second-order determinant for a 2 × 2 square matrix M is defined as

\det M = \begin{vmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{vmatrix} = m_{11}m_{22} - m_{12}m_{21}

Higher-order determinants are obtained recursively from the lower-order determinant values.
In order to calculate a determinant of order 2 or greater of an n × n matrix M, we select
any column k and compute the determinant as

\det M = \sum_{j=1}^{n} (-1)^{j+k} m_{jk} \det M_{jk}

where \det M_{jk} is the (n − 1) by (n − 1) determinant of the submatrix obtained from M after
removing the jth row and kth column. Alternatively, we can select any row j and calculate
the determinant as

\det M = \sum_{k=1}^{n} (-1)^{j+k} m_{jk} \det M_{jk}

Efficient numerical methods exist to compute determinants for large matrices (n > 4).
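As an illustration of the expansion rule, the following C sketch (our own function names) computes a 3 × 3 determinant by expanding along the first row, each cofactor being a second-order determinant.

#include <stdio.h>

/* Second-order determinant of a 2 x 2 matrix with entries
   m11 m12
   m21 m22  */
double det2(double m11, double m12, double m21, double m22)
{
    return m11 * m22 - m12 * m21;
}

/* Third-order determinant, expanded along the first row:
   det M = m11*det M11 - m12*det M12 + m13*det M13,
   where each Mjk is the 2 x 2 submatrix left after removing
   the corresponding row and column. */
double det3(double m[3][3])
{
    return  m[0][0] * det2(m[1][1], m[1][2], m[2][1], m[2][2])
          - m[0][1] * det2(m[1][0], m[1][2], m[2][0], m[2][2])
          + m[0][2] * det2(m[1][0], m[1][1], m[2][0], m[2][1]);
}

int main(void)
{
    double m[3][3] = { {3, 2, 1}, {2, 1, 3}, {1, 3, 2} };
    printf("det = %g\n", det3(m));
    return 0;
}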

A.3.5 Matrix Inverse


Another useful matrix operation is the inverse of a matrix. This is again defined for only
square matrices. Moreover, a square matrix can have an inverse if and only if the determinant
of that matrix is non-zero. If an inverse exists, we call the matrix non-singular. Otherwise, it
is a singular matrix.
For an n × n matrix M, its inverse is usually denoted by M^{-1}. The matrix and its inverse
also satisfy the relation

M M^{-1} = M^{-1} M = I

where I is the identity matrix. Only the diagonal elements of I are 1 and all other elements
are zero.
We can calculate the elements of M^{-1} from the elements of M as

m^{-1}_{jk} = \frac{(-1)^{j+k} \det M_{kj}}{\det M}

where m^{-1}_{jk} is the element in the jth row and kth column of the inverse matrix, and M_{kj} is the (n − 1)
by (n − 1) submatrix obtained by deleting the kth row and jth column of M. Usually, more
efficient numerical methods are employed to compute the inverse of large matrices.
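For the 2 × 2 case, the formula above reduces to a simple closed form. The following C sketch (our own function name; it reports failure for a singular matrix) illustrates it.

#include <stdio.h>

/* Inverse of a 2 x 2 matrix m, written into inv.
   Returns 0 (failure) if m is singular (determinant is zero). */
int inverse2x2(double m[2][2], double inv[2][2])
{
    double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    if (det == 0.0)
        return 0;
    inv[0][0] =  m[1][1] / det;
    inv[0][1] = -m[0][1] / det;
    inv[1][0] = -m[1][0] / det;
    inv[1][1] =  m[0][0] / det;
    return 1;
}

int main(void)
{
    double m[2][2] = { {4, 7}, {2, 6} }, inv[2][2];
    if (inverse2x2(m, inv))
        printf("%g %g\n%g %g\n", inv[0][0], inv[0][1], inv[1][0], inv[1][1]);
    return 0;
}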

A.4 LINE EQUATION AND INTERSECTION CALCULATION


In clipping algorithms (Chapter 7), we saw the need for determining line-boundary intersec-
tion points. How do we do that? Suppose we are given a line segment specified by two end
points P(x_1, y_1) and Q(x_2, y_2). We can use the point-slope form (Eq. A.3) to derive the line
equation.

y - y_1 = m(x - x_1) \quad (A.3)

In Eq. A.3, m is the slope (or gradient) of the line; it indicates how steep the line is. We
can derive m in terms of the two end points as in Eq. A.4.

m = \frac{y_2 - y_1}{x_2 - x_1} \quad (A.4)

For example, suppose P(2, 3) and Q(4, 5). Then m = \frac{5 - 3}{4 - 2} = 1. Therefore, we have
the line equation y - 3 = 1(x - 2).
Another way to represent a line is the slope-intercept form: y = mx + c. In the equation,
c is known as the intercept (of the line with the Y-axis). We can recast the point-slope form
to the standard form by rearranging the terms in Eq. A.3, as follows:

y = y_1 + mx - mx_1 = mx + (y_1 - mx_1) = mx + c

Let us illustrate this with the previous example. We have the point-slope form y - 3 = 1(x - 2).
After expanding, we get y - 3 = x - 2. We rearrange the terms to get y = x - 2 + 3,
or y = x + 1. This is the standard form.
Now suppose we are given two line segments: L_1 (y = m_1 x + c_1) and L_2 (y = m_2 x + c_2).
If the lines intersect, there must be a common point (x, y), which lies on both the lines.
Therefore, the following relation must hold:

m_1 x + c_1 = m_2 x + c_2

From this, we derive the common x-coordinate as

x = \frac{c_2 - c_1}{m_1 - m_2}

We can substitute this value of x in any of the line equations to get the common
y-coordinate as

y = m_1 \frac{c_2 - c_1}{m_1 - m_2} + c_1 \; \text{(using } L_1\text{)} \quad \text{or} \quad y = m_2 \frac{c_2 - c_1}{m_1 - m_2} + c_2 \; \text{(using } L_2\text{)}
When we are considering a vertical line, the line equation is given as x = b where b
is some constant. Now, suppose we are trying to calculate the intersection point of a line
y = mx + c with a vertical line x = b. We substitute b for x in the line equation to get the

y-coordinate as y = mb + c. Thus, the intersection point is (b, mb + c). In the case of a
horizontal line, we have its equation as y = b, where b is a constant. We substitute this value
for y in the line equation to compute the x-coordinate as b = mx + c, or x = \frac{b - c}{m}. Hence, the
intersection point is \left(\frac{b - c}{m}, b\right).
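The intersection formulas above translate directly into code. The small C sketch below (our own function name, assuming the two lines are given in slope-intercept form and are not parallel) computes the common point.

#include <stdio.h>

/* Intersection of the lines y = m1*x + c1 and y = m2*x + c2.
   Assumes m1 != m2, i.e., the lines are not parallel. */
void intersect(double m1, double c1, double m2, double c2,
               double *x, double *y)
{
    *x = (c2 - c1) / (m1 - m2);
    *y = m1 * (*x) + c1;          /* substitute x back into L1 */
}

int main(void)
{
    double x, y;
    intersect(1.0, 1.0, -1.0, 5.0, &x, &y);   /* y = x + 1 and y = -x + 5 */
    printf("intersection at (%g, %g)\n", x, y);
    return 0;
}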

A.5 PLANE EQUATION AND PLANE NORMAL


At many places of the 3D graphics pipeline (such as lighting and hidden surface removal),
we have to deal with 3D surfaces and surface normals. The implicit form of a surface in
general is given by Eq. A.5.

f (x, y, z) = 0 (A.5)

For any point P(x, y, z) on the surface, Eq. A.5 should evaluate to zero, that is, f (p) = 0.
For points that are not on the surface, Eq. A.5 returns some non-zero value. However, for
the purpose of this book, we shall restrict ourselves to the discussion of plane surfaces only,
rather than any arbitrary curved surface. This is so since we mostly considered objects with
polygonal surfaces. We also know that any surface can be represented as a mesh of polygonal
surfaces.
The most familiar way to represent a planar surface is the point-normal form, shown in
Eq. A.6.

Ax + By + Cz + D = 0 (A.6)

Equation A.6 is also known as the general form of a plane equation. Let \vec{n} be the normal
to the plane, that is, a vector perpendicular to the planar surface. Then, the constants A, B,
and C in the plane equation (Eq. A.6) denote the corresponding Cartesian components of \vec{n}. In
other words, \vec{n} = (A, B, C).
Sometimes, we want to derive a plane equation given three points on the plane (say a, b,
and c). Each of these points can be represented as a point vector, that is, a vector from the
origin to the point. Thus, we can form three point vectors \vec{a}, \vec{b}, and \vec{c}. From the three point
vectors, we can derive two vectors that lie on the plane. For example, the vectors (\vec{b} - \vec{a}) and
(\vec{c} - \vec{a}) are two vectors on the plane.
Since the two vectors lie on the plane, we can take their cross product to obtain a vector
that is perpendicular to both of them, that is, perpendicular to the plane itself. This vector is
therefore the normal vector. Thus, \vec{n} = (\vec{b} - \vec{a}) \times (\vec{c} - \vec{a}). Once we know \vec{n},
we have the three constants A, B, and C. We then use Eq. A.6, putting any one of the three
points into the equation, to obtain the value of D.
Let us illustrate the idea with an example. Suppose we are given the three points a(1, 1, 1),
b(3, 4, 5), and c(7, 7, 7). Therefore, the three point vectors are

\vec{a} = (1, 1, 1), \quad \vec{b} = (3, 4, 5), \quad \vec{c} = (7, 7, 7)

i i

i i
i i

“App-A” — 2015/9/15 — 9:51 — page 261 — #11


i i

Mathematical Background 261

From these three point vectors, we form two vectors that lie on the plane, as follows.

bE − aE = (3, 4, 5) − (1, 1, 1) = (2, 3, 4)


Ec − aE = (7, 7, 7) − (1, 1, 1) = (6, 6, 6)

The cross-product of these two vectors yield (see Section A.2 for details),

(bE − aE) × (Ec − aE) = (2, 3, 4) × (6, 6, 6) = (−6, 12, −6)

Therefore, the normal to the plane is nE = (−6, 12, −6). Thus, the three plane constants
are A = −6, B = 12, C = −6. Replacing these values in Eq. A.6, we get,

−6x + 12y − 6z + D = 0

Now let us take any one of the three points, say b(3, 4, 5). Since the point is on the plane,
we should have

−6.3 + 12.4 − 6.5 + D = 0

From this equation, we get D = 0


Hence, the plane equation is −6x + 12y − 6z = 0 or −x + 2y − z = 0.
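The whole procedure of this section can be written compactly in C. The sketch below (type and function names are our own) reproduces the worked example: the normal is obtained as the cross product of two in-plane vectors, and D follows from substituting one of the given points.

#include <stdio.h>

typedef struct { double x, y, z; } Vec3;

Vec3 sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Plane Ax + By + Cz + D = 0 through the points a, b, and c:
   (A, B, C) is the normal (b - a) x (c - a); D follows from point a. */
void plane_from_points(Vec3 a, Vec3 b, Vec3 c,
                       double *A, double *B, double *C, double *D)
{
    Vec3 n = cross(sub(b, a), sub(c, a));
    *A = n.x;
    *B = n.y;
    *C = n.z;
    *D = 0.0 - (n.x * a.x + n.y * a.y + n.z * a.z);
}

int main(void)
{
    Vec3 a = { 1, 1, 1 }, b = { 3, 4, 5 }, c = { 7, 7, 7 };
    double A, B, C, D;
    plane_from_points(a, b, c, &A, &B, &C, &D);
    printf("A = %g, B = %g, C = %g, D = %g\n", A, B, C, D);  /* example plane: -6, 12, -6, 0 */
    return 0;
}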

APPENDIX C
Ray-tracing Method for Surface Rendering

In Chapter 4, we learnt the simple lighting model for computing color at surface points. As
we discussed, the model is based on many simplistic assumptions, for example, that all
surfaces are ideal reflectors, that the light path does not shift during refraction, or that the
ambient lighting effect can be modeled by a single number. Consequently, the model is inca-
pable of producing realistic effects. What we require is a global illumination model, that is,
a model of illumination that takes into account all the reflections and refractions that affect
the color at any particular surface point. In this appendix, we shall learn about one such model
known as ray tracing. Obviously, the computational cost for implementing this model is
much higher.

C.1 RAY-TRACING: BASIC IDEA


Generating or synthesizing an image on a computer screen is equivalent to computing pixel
color values. We can view this process in a slightly different way. In a scene, there are light
rays from the surfaces emanating in all directions. Some of these rays are passing through
the screen pixels towards the viewer. Thus, a pixel color is determined by the color of the
light ray passing through it. In the ray-tracing method, we trace these rays backward—from
the viewer towards the source. The idea is illustrated in Fig. C.1.


Fig. C.1 The basic idea of ray tracing. Each original ray passing through a pixel towards the
viewer is traced backward (from viewer to source).


C.1.1 Steps Involved


The first step in the ray-tracing method is the generation of rays (from the viewer to the light
source) passing through the pixels. These are called pixel rays. There is one pixel ray for
each pixel. Depending on the type of projection we want, the rays differ. If we are interested
in parallel projection, the rays are generated perpendicular to the view plane (screen). If per-
spective projection is required, we create rays starting from a common projection point (i.e.,
center of projection, see Chapter 6). In both the cases, rays are assumed to pass through the
pixel centers.
Next, for each pixel ray, we check if it intersects any of the surfaces present in the scene.
For this purpose, we maintain a list of all surfaces. For each member of the list, we check for
ray-surface intersection. If for a surface, an intersection is found, we calculate the distance of
the surface from the pixel. Among all those intersecting surfaces, we choose the one having
the least distance. This is the visible surface. The pixel ray used for detecting visible surface
is called the primary ray.
Once we detect the visible surface with the primary ray, we perform reflection and refrac-
tion of the primary ray. We treat the primary ray as the incident ray. At the (primary
ray–visible surface) intersection point, the primary ray is reflected along the specular reflec-
tion direction (or the ideal reflection direction; see Chapter 4). In addition, if the surface
is transparent, the ray is transmitted through the surface along the refraction direction. As
we can see, two more rays are generated off the intersection point due to the reflection and
refraction. These are called the secondary rays. We then treat each secondary ray as primary
ray and repeat the procedure recursively (i.e., determine the visible surface and generate new
secondary rays from the intersection point).
We maintain a binary ray-tracing tree to keep track of the primary and secondary rays.
Surfaces serve as nodes of the tree. A left edge denotes a reflection path between two
surfaces, while a right edge denotes a refraction path. The idea is illustrated in Fig. C.2.


Fig. C.2 The construction of a binary ray-tracing tree. In (a), we show the backward tracing
of the ray path through a scene having five surfaces. The corresponding ray-tracing tree is
shown in (b).


As the figure shows, at each step of the recursive process, at most two new nodes and edges are
added to the tree. We can set the maximum depth of the tree as a user option, depending
on the available storage. The recursive process stops if any of the following conditions is
satisfied:
1. The current primary ray intersects no surface in the list.
2. The current primary ray intersects a light source, which is not a reflecting
surface.
3. The ray-tracing tree has reached its maximum depth.
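A possible in-memory representation of such a tree is sketched below; this structure (class and attribute names included) is only illustrative of the description above, not a prescribed implementation.

class RayTraceNode:
    """One node of the binary ray-tracing tree."""
    def __init__(self, surface, local_intensity=0.0):
        self.surface = surface                   # surface hit by the incoming ray
        self.local_intensity = local_intensity   # intensity computed at the hit point
        self.reflected = None                    # left child: node reached along the reflection path
        self.refracted = None                    # right child: node reached along the refraction path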
At each ray–surface intersection point, we compute the intensity using a lighting model.
The intensity has the following three components.

Local contribution The intensity contribution due to the light source

Reflected contribution The intensity contribution due to the light that comes after
reflection from other surfaces

Transmitted contribution The intensity contribution due to the light that arrives after
being transmitted through the surface from the background
Thus, the total light intensity I at any surface point can be computed as

I = I_l + I_r + I_t     (C.1)

In the equation, I_l is the local contribution. We can use the simple lighting model discussed
in Chapter 4 to compute it. For this calculation, we require the three vectors N (the surface
normal at the point), V (the vector from the surface point to the viewer, i.e., along the
direction opposite to the primary ray), and L (the vector from the point to the light source).
In order to determine L, we send a ray from the intersection point towards the light source.
This ray is called the shadow ray. We check if the shadow ray intersects any surface in its
path. If it does, the intersection point is in shadow with respect to the light source. Hence,
we do not need to calculate the actual intensity due to the light source; instead, we can apply
some technique (e.g., ambient light/texture pattern, see Chapter 4) to create the shadow
effect. The other two components in Eq. (C.1), I_r and I_t, are calculated recursively using
the steps mentioned before. The intensity value at each intersection point is stored at the
corresponding surface node of the ray-tracing tree.
Once the tree is complete for a pixel, we accumulate all the intensity contributions starting
at the leaf nodes. The surface intensity at each node is attenuated (see Chapter 4) by the
distance from the parent node and then added to the intensity of the parent surface. This
bottom-up procedure continues till we reach the root node. The root node intensity is set
as the pixel intensity. For some pixel rays, the ray may not intersect any surface; in that
case, we assign the background color to the pixel. It is also possible that, instead of a
surface, the ray intersects a (non-reflecting) light source. The light source intensity is then
assigned to the pixel.
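The overall recursion can be summarized by the Python sketch below, which reuses the visible_surface sketch given earlier. The surface interface (local_intensity, reflect, refract, is_transparent) and the constants are assumptions for illustration; the shadow-ray test is assumed to happen inside local_intensity, and distance attenuation of the child contributions is omitted for brevity.

MAX_DEPTH = 3        # maximum depth of the ray-tracing tree (a user option)
BACKGROUND = 0.0     # intensity assigned when a ray hits nothing

def trace(origin, direction, surfaces, depth=0):
    """Return the intensity carried back along one ray (I = I_l + I_r + I_t)."""
    if depth > MAX_DEPTH:                                  # tree reached its maximum depth
        return 0.0
    surface, t = visible_surface(origin, direction, surfaces)
    if surface is None:                                    # ray intersects no surface
        return BACKGROUND
    point = tuple(o + t * di for o, di in zip(origin, direction))   # intersection point
    total = surface.local_intensity(point, direction)      # I_l (shadow ray handled inside)
    # I_r: secondary ray along the specular reflection direction
    total += trace(point, surface.reflect(point, direction), surfaces, depth + 1)
    # I_t: secondary ray along the refraction direction, for transparent surfaces only
    if surface.is_transparent():
        total += trace(point, surface.refract(point, direction), surfaces, depth + 1)
    return total

Termination condition 2 (the ray hits a non-reflecting light source) is not shown; it would simply return the light source intensity instead of recursing.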


C.1.2 Ray–Surface Intersection Calculation


The prerequisite for the ray–surface intersection calculation is the representation of pixel
rays. How can we represent a ray? Note that a ray is not a vector: a vector is defined by a
direction and a magnitude, whereas a ray has a starting point and a direction. Thus, we
cannot use a single vector to represent a ray. Instead, we can use the sum of the following
two vectors to represent it.
1. The ray origin vector s, which is the vector from the coordinate origin to the ray origin. The
ray origin can be chosen as either the pixel position or the point of projection (for
perspective projection).
2. The ray direction vector d, which is along the ray direction. Usually, we use the unit
direction vector d̂.
In terms of these two vectors, we can represent a ray in parametric form as shown in
Eq. (C.2).

r(t) = s + t d̂,   t ≥ 0     (C.2)
How do we determine d̂? In the case of parallel projection, it is simply the unit normal vector
of the XY plane (the view plane). For perspective projection, we require a little more
computation. With respect to the coordinate origin, let us denote the point of projection by
the vector Pr and the pixel point by the vector Px. Then, the unit vector d̂ should be along the
direction from Pr to Px. Hence, we can compute d̂ as

d̂ = (Px − Pr) / |Px − Pr|     (C.3)

The various vectors for perspective projection are shown in Fig. C.3 for illustration. In
order to determine the ray–surface intersection point, we simultaneously solve the ray
equation and the surface equation. This gives us a value of the parameter t, from which the
intersection coordinates are determined. At each intersection point, we update s and d̂ for
each of the secondary rays. The new s is the vector from the origin to the intersection point.
For the reflected ray, the unit vector d̂ is calculated along the specular reflection direction;
for the ray due to refraction, d̂ is determined along the refraction path.
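The two new unit directions can be computed as in the following Python sketch, which assumes that d is the incoming unit direction, n is the unit surface normal at the intersection point, and eta is the ratio of the refractive indices of the incident and transmitting media; the function names are illustrative.

import math

def reflect(d, n):
    """Specular reflection direction: d' = d - 2 (d . n) n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))

def refract(d, n, eta):
    """Refraction direction from Snell's law; returns None on total internal reflection."""
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    k = 1.0 - eta * eta * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None                        # total internal reflection: no refracted ray
    return tuple(eta * di + (eta * cos_i - math.sqrt(k)) * ni for di, ni in zip(d, n))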
Let us illustrate the ray–surface intersection calculation with an example: the intersection
of a ray with a spherical surface. (A worked sketch of this calculation follows the case
analysis below.) Assume that the sphere center is at the point p_c and the length of its
radius is r.

Fig. C.3 Representing a ray with the two vectors s and d̂ (perspective projection); the figure
shows the point of projection and a pixel on the view plane


Then, for any point p on the surface, the surface can be represented with the equation

|p − p_c| = r     (C.4)
If the ray intersects the surface, there should be a common point on the ray and the surface.
Thus, we can replace the point in the surface equation with the corresponding ray equation.

|s + t d̂ − p_c| = r     (C.5)

Next, we square both sides of Eq. (C.5) to get

|s + t d̂ − p_c|^2 = r^2     (C.6)

After expanding and rearranging Eq. (C.6), we get

t = (−B ± √(B^2 − A·C)) / A     (C.7)

where A = |d̂|^2 = 1, B = (s − p_c) · d̂, and C = |s − p_c|^2 − r^2. Depending on the values of
A, B, and C, we have the following scenarios:
1. If B^2 − A·C < 0, there is no intersection between the surface and the ray.
2. If B^2 − A·C = 0, the ray touches the surface. Clearly, there will be no secondary rays
generated in this case.
3. If B^2 − A·C > 0, the ray intersects the surface. There are two possible parameter values
as per Eq. (C.7).
(a) If both values are negative, there is no ray–surface intersection.
(b) If one of the values is zero and the other positive, the ray originates on the sphere and
intersects it. Usually in graphics, we are not interested in modelling such cases.
(c) If the two values differ in sign, the ray originates inside the sphere and intersects it.
This is again a mathematical possibility, which is usually not considered in graphics.
(d) If both are positive, the ray intersects the sphere twice (enter and exit). The smaller
value corresponds to the intersection point that is closer to the starting point of the ray.
Thus, we take the smaller value to determine the intersection coordinates.
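A worked Python sketch of this test is given below; it follows Eq. (C.7) with A = 1 (since d̂ is a unit vector) and returns the smaller positive value of t, or None when no usable intersection exists. The function name and the tuple-based vectors are illustrative.

import math

def ray_sphere_t(s, d_hat, pc, r):
    """Smallest positive t at which the ray s + t*d_hat meets the sphere |p - pc| = r."""
    sp = tuple(si - ci for si, ci in zip(s, pc))        # s - pc
    B = sum(di * spi for di, spi in zip(d_hat, sp))     # B = (s - pc) . d_hat
    C = sum(spi * spi for spi in sp) - r * r            # C = |s - pc|^2 - r^2
    disc = B * B - C                                    # B^2 - A.C, with A = |d_hat|^2 = 1
    if disc < 0.0:
        return None                                     # case 1: ray misses the sphere
    root = math.sqrt(disc)
    for t in (-B - root, -B + root):                    # the two solutions, smaller first
        if t > 0.0:
            return t
    return None                                         # both values negative: no intersection

# Example: a ray from the origin along +z, sphere centred at (0, 0, 5) with radius 1
print(ray_sphere_t((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0))   # prints 4.0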

C.2 REDUCING INTERSECTION CALCULATION


For commonly occurring shapes such as spheres, cubes, and splines, efficient ray–surface
intersection algorithms have been developed. However, as we can see, it takes a lot of
computation to check for intersections of all the pixel rays with all the surfaces present in the
scene. In fact, intersection calculations take up about 95% of the time required to render a
scene using the ray-tracing method. It is therefore important to reduce the number of
intersection calculations as much as possible.


Various techniques are used in the ray-tracing method to speed up the surface rendering
process. We can broadly divide these techniques into the following two groups:
1. Bounding volume techniques
2. Spatial subdivision methods

C.2.1 Bounding Volume Techniques


In the bounding volume technique, a group of closely placed objects in the scene is
assumed to be enclosed by a bounding volume. Usually, regular shapes such as spheres
or cubes are used as bounding volumes. Before checking for ray–surface intersections, we
first check if the ray intersects the bounding volume. Only if the bounding volume is
intersected by the ray do we go for ray–surface intersection checks for all the surfaces in the
bounding volume. Otherwise, we remove all the enclosed surfaces from further intersection
checks.
The basic bounding volume approach can be extended to hierarchical bounding volumes.
In that case, we create a hierarchy of bounding volumes. The ray is first checked for
intersection with the top-level bounding volume. If an intersection is found, the volumes
in the next level are checked for intersection. The process goes on till the lowest level of the
hierarchy.
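The basic (non-hierarchical) test can be sketched as follows, using a bounding sphere and reusing the ray_sphere_t and visible_surface sketches from earlier; the group object and its attributes (centre, radius, surfaces) are assumed for illustration.

def intersect_group(origin, direction, group):
    """Check the bounding sphere first; only if it is hit are the enclosed
    surfaces tested individually."""
    if ray_sphere_t(origin, direction, group.centre, group.radius) is None:
        return None, None                    # the whole group is rejected at once
    return visible_surface(origin, direction, group.surfaces)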

C.2.2 Spatial Subdivision Methods


In the spatial subdivision approach, we enclose the entire scene within a cube. Then, we
recursively divide the cube into cells. The subdivision can proceed in one of the following two ways.

Uniform subdivision At each subdivision step, we divide the current cell into eight
equal-sized octants.

Adaptive subdivision At each subdivision step, we divide the current cell only if it contains
surfaces.
The recursion continues till each cell contains no more than a predefined number of
surfaces. The process is similar to the space subdivision method we discussed in Chapter 2.
We can use octrees to store the subdivision. Along with the subdivision information,
information about the surfaces contained in each cell is also maintained.
First, we check for intersection of the ray with the outermost cube. Once an intersection
is detected, we check for intersection of the ray with the inner cubes (next level of
subdivision) and continue in this way till we reach the final level of subdivision. We perform
this check for only those cells that contain surfaces. In the final level of subdivision, we check
for intersection of the ray with the surfaces. The first surface intersected by the ray is the
visible surface.
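An adaptive octree subdivision could be coded roughly as below. The cell is an axis-aligned box given by its minimum and maximum corners, in_cell(surface, cell_min, cell_max) is an assumed surface-cell overlap test, and the constants and dictionary layout are illustrative only.

MAX_SURFACES_PER_CELL = 4

def subdivide(cell_min, cell_max, surfaces, in_cell, depth=0, max_depth=5):
    """Adaptive subdivision: split a cell into eight equal octants only while
    it still holds more than MAX_SURFACES_PER_CELL surfaces."""
    contained = [s for s in surfaces if in_cell(s, cell_min, cell_max)]
    if len(contained) <= MAX_SURFACES_PER_CELL or depth == max_depth:
        return {"min": cell_min, "max": cell_max, "surfaces": contained, "children": []}
    mid = tuple((lo + hi) / 2.0 for lo, hi in zip(cell_min, cell_max))
    children = []
    for octant in ((x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)):
        lo = tuple(m if o else c for o, c, m in zip(octant, cell_min, mid))
        hi = tuple(h if o else m for o, h, m in zip(octant, cell_max, mid))
        children.append(subdivide(lo, hi, contained, in_cell, depth + 1, max_depth))
    return {"min": cell_min, "max": cell_max, "surfaces": contained, "children": children}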

C.3 ANTI-ALIASED RAY TRACING


In the ray-tracing method, we take discrete samples (pixels) to depict a continuous scene.
Clearly, the phenomenon of aliasing (see Chapter 9) is an important issue in ray tracing also.


We need some means (anti-aliasing techniques) to eliminate or reduce the effect of aliasing.
There are broadly two ways of anti-aliasing in ray tracing:
1. Supersampling
2. Adaptive sampling
In supersampling, each pixel is assumed to represent a finite region. The pixel region is
divided into subregions (subpixels). Instead of a single pixel ray, we now generate pixel rays
for each of these subregions and perform ray tracing. The pixel color is computed as the
average of the color values returned by all the subpixel rays.
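For a single pixel, supersampling could be sketched as below, reusing the make_perspective_ray and trace sketches from earlier; the pixel is assumed to be a unit square with its lower-left corner at (px, py) on the view plane, and n sets the n × n subpixel grid.

def supersample_pixel(px, py, surfaces, n=2):
    """Shoot one ray through the centre of each of the n x n subpixels and
    return the average of the traced intensities."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            sx = px + (i + 0.5) / n            # subpixel centre, x
            sy = py + (j + 0.5) / n            # subpixel centre, y
            origin, direction = make_perspective_ray(sx, sy)
            total += trace(origin, direction, surfaces)
    return total / (n * n)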
The basic idea behind adaptive sampling is as follows. We start with five pixel rays for
each pixel, instead of one as before. Among these five, one ray is sent through the pixel
center and the remaining four rays are sent through the four corners (assuming each pixel is
represented by a square/rectangular region) of the pixel. Then we perform color computation
using the ray-tracing method for each of these five rays. If the color values returned by them
are similar, we do not divide the pixel further. However, if the five rays return dissimilar
color values, we divide the pixel into a 2 × 2 subpixel grid. We then repeat the process
for each subpixel in the grid. The process terminates when a preset level of subdivision is
reached.
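A rough sketch of this adaptive scheme is given below. Here shade(x, y) stands for a full ray trace through the view-plane point (x, y) and is passed in as a callable; the tolerance, the maximum depth, and the square-pixel assumption are all illustrative.

def adaptive_sample(x, y, size, shade, depth=0, max_depth=2, tol=0.05):
    """Sample a square pixel region (lower-left corner (x, y), side length size)
    at its four corners and centre; subdivide into a 2 x 2 grid only when the
    five values disagree by more than tol."""
    samples = [shade(x, y), shade(x + size, y), shade(x, y + size),
               shade(x + size, y + size), shade(x + size / 2.0, y + size / 2.0)]
    if depth == max_depth or max(samples) - min(samples) <= tol:
        return sum(samples) / len(samples)         # values are similar: stop here
    half = size / 2.0
    return 0.25 * (adaptive_sample(x, y, half, shade, depth + 1, max_depth, tol)
                   + adaptive_sample(x + half, y, half, shade, depth + 1, max_depth, tol)
                   + adaptive_sample(x, y + half, half, shade, depth + 1, max_depth, tol)
                   + adaptive_sample(x + half, y + half, half, shade, depth + 1, max_depth, tol))

In practice, corner samples shared between neighbouring subpixels would be cached to avoid tracing the same ray more than once.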
