COMPUTER GRAPHICS
Samit Bhattacharya
Assistant Professor
Department of Computer Science Engineering
IIT Guwahati
Preface
The term computer graphics roughly refers to the field of study that deals with the display
mechanism (the hardware and software) of a computer. In the early days, for most of us, a
computer meant what we got to see on the monitor. Then came the laptops, in which the
display and the CPU tower (along with the peripheral keyboard unit) were combined into
a single and compact unit for easy portability. The cathode ray tube (CRT) displays were
replaced by the liquid crystal display (LCD) technology. However, the idea of computer was
still restricted to the display screen for a majority of the users.
For the younger generation, the personal computer (PC) is no longer a ‘computer’. It is
replaced by a plethora of devices, although the laptops have managed to retain their charm
and appeal due to portability. These devices come in various shapes and sizes with varying
degrees of functionality. The most popular of these is the ubiquitous smartphone. Although much smaller in size than a PC, smartphones are comparable to very powerful PCs of yesteryear, with powerful multicore processing units, high-resolution displays, and large memory. Then we have the tablets (or tabs), which are slightly larger in size (although still much smaller than a PC), and the phablets, having features of both a phone and a tab. Such devices also include wearable computers such as the smartwatch or Google Glass. Even the televisions nowadays have many computing elements that have led to the concept of smart
TVs. This is made possible with a rapid change in technology, including display technology.
Instead of the CRT, we now have devices that use LCD, plasma panel, light-emitting diode
(LED), organic light-emitting diode (OLED), thin-film transistor (TFT), and so on, for the
design of display units.
However, regardless of what the current state of the art in computing technology is, the idea of
a ‘computer’ is shaped primarily by what we get to ‘see’ on the display unit of a computing
system. Since perception matters the most in the popularity of any system, it is important
for us to know the components of a computing system that give rise to this perception—the
display hardware and the associated software and algorithms. Therefore, it is very important
to learn the various aspects of computer graphics to understand the driving force behind the
massive change in consumer electronics that is sweeping the world at present.
CHAPTER 1
Overview of Computer Graphics
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the field of computer graphics and its application areas
• Trace the historical development of the field
• Understand the various components of a computer graphics system
• Have a basic understanding of the display hardware in terms of the cathode ray tube
display technology
• Identify the stages of the image synthesis process
INTRODUCTION
With a computer, we can do, and usually do, a lot of things. We create documents and presentations. For example, consider the screenshot in Fig. 1.1, which was taken during the preparation of this book with MS Word™.
Notice the components present in the image. The primary components, of course, are the
(alphanumeric) characters. These characters were entered using a keyboard. While the alphanumeric characters are the most important components of a document, there are other equally important components that are part of any word processing software. In this figure, these components
are the menu options and the editing tool icons on top. Some of these options are shown
as text while the others are shown as images (icons). Thus, we see a mix of characters and
images that constitute the interface of a typical word processing system.
Next, consider Fig. 1.2, which is an interface of a Computer-aided Design (CAD) tool. It
shows the design of some machinery parts on a computer screen, along with some control
buttons on the right-hand side. The part itself is constructed from individual components,
with specified properties (dimension, etc.). An engineer can specify the properties of those
individual components and try to assemble them virtually on the computer screen, to check
if there is any problem in the specifications. This saves time, effort, and cost, as the engineer
does not need to actually develop a physical prototype and perform the specification checks.
This is the advantage of using a CAD tool.
Figure 1.3 shows two instances of visualization, another useful activity done with com-
puters. Figure 1.3(a) shows the visualization of a DNA molecule. It shows that, with the
Fig. 1.1 Screen capture of a page during document preparation using MS Word
Fig. 1.2 A CAD system interface—the right-hand side contains buttons to perform various
engineering tasks
aid of computers, we could see something that is not possible with the naked eye. Such a
type of visualization is called scientific visualization, where we try to visualize things that
occur in nature and that we cannot otherwise see. Figure 1.3(b), on the other hand, shows
an instance of traffic in a computer network. Basically, it shows the status of the network
at that instant, such as the active nodes, the active links, the data flow path, and so on.
As you can see, this figure shows something that is not natural (i.e., it shows information about some man-made entity). Visualization of this latter kind is known as information visualization.
Each of the aforementioned points is basically an example of the usage of computer
graphics. The spectrum of such applications is very wide. In fact, it is difficult to list
Fig. 1.3 Two examples of visualization (a) Visualization of a DNA molecule (b) Network
visualization
all applications as virtually everything that we see around us involving computers con-
tains some applications of computer graphics. Apart from the examples we saw and the
typical desktop/laptop/tablet/palmtop applications that we traditionally refer to as com-
puters, computer graphics techniques are used in the mobile phones we use, information
kiosks at popular spots such as airports, ATMs, large displays at open air music con-
certs, air traffic control panels, the latest movies to hit the halls, and so on. The appli-
cations are so diverse and widespread that it is no exaggeration to say that for a lay-
man in this digital age, the term computer graphics has become synonymous with the
computer.
In all these examples, we see instances of images displayed on a computer screen. These
images are constructed with objects, which are basically geometric shapes (characters and
icons) with colors assigned to them. When we write a document, we are dealing with letters,
numbers, punctuations, and symbols. Each of these is an object, which is rendered on the
screen with a different style and size. In case of drawing, we deal with basic shapes such
as circles, rectangles, curves, and so on. For animation videos or computer games, we are
dealing with virtual characters, which may or may not be human-like. The images or parts
thereof can be manipulated (interacted with) by a user with input devices such as mouse,
keyboard, joystick, and so on.
The question is, how can a computer do all these things? We know that computers understand only the binary language, that is, the language of 0s and 1s. Letters of an alphabet, numbers, symbols, or characters are definitely not strings of 0s and 1s, or are they? How can we represent such objects in a language understood by computers so that they can be processed by the computer? How can we map from the computer's language to something that we can perceive (with physical properties such as shape, size, and color)? In other words, how can we create or represent, synthesize, and render imagery on a computer display? This is the fundamental question that is studied in the field of computer graphics.
This fundamental question can further be broken down into a set of four basic questions.
1. Imagery is constructed from its constituent parts. How to represent those parts?
2. How to synthesize the constituent parts to form a complete realistic imagery?
3. How to allow the users to manipulate the imagery constituents on-screen?
4. How to create the impression of motion?
Computer graphics seeks the answer to these questions. A couple of things are impor-
tant here. First, the term computer screen here is used in a very broad sense and encom-
passes all sorts of displays including small displays on the mobile devices such as smart
phones, tablets, etc., interactive white boards, interactive table tops, as well as large dis-
plays such as display walls. Obviously, these variations in displays indicate corresponding
variations in the underlying computing platforms. The second issue is that computer graphics seeks efficient solutions to these questions. As computing platforms vary, the term efficiency refers to ways that make, or try to make, optimal use of the resources of a
given platform. For example, displaying something on mobile phone screens requires tech-
niques that are different from displaying something on a desktop. This is because of the
differences in CPU speed, memory capacity, and power consumption issues in the two
platforms.
Let us delve a little deeper. In the early era of computers, displays constituted a terminal unit capable of showing only characters. In subsequent developments, the ability to show complex 2D images was introduced. However, with advances in technology, the memory
capacity and processor speeds of computing systems have increased greatly. Along with that,
the display technology has also improved significantly. Consequently, our ability to display
complex processes such as 3D animation in a realistic way has improved to a great extent.
There are two aspects of a 3D animation. One is to synthesize frames, the other is to com-
bine them and render in a way to generate the effects of motion. Both these are complex and
resource-intensive tasks, which are the main areas of activities in the present-day computer
graphics.
Thus, computer graphics can be described in brief as the process of rendering static
images or animation (sequence of images) on computer screens in an efficient way. In the
subsequent chapter, we shall look into the details of this process.
to create engineering drawings directly on the CRT screen (see Fig. 1.5). Precise drawings
could be created, manipulated, duplicated, and stored. The Sketchpad was the first GUI long
before the term was coined and pioneered several concepts of graphical computing, includ-
ing memory structures to store objects, rubber-banding of lines, the ability to zoom in and
out on the display, and the ability to make perfect lines, corners, and joints. This achievement led many to acknowledge Sutherland as the grandfather of interactive computer graphics.
In addition to the SAGE and the Sketchpad systems, the gestational period saw devel-
opment of many other influential systems such as the first computer game (Spacewar, developed by Steve Russell and team in 1961 on a PDP-1 platform) and the first CAD sys-
tem (DAC-1 by IBM, formally demonstrated in 1964 though the work started in 1959, see
Fig. 1.6).
Adolescence In 1971, Intel released the first commercial microprocessor (the 4004). This brought in a paradigm shift in the way computers were made, which had a profound impact on the growth of the field of computer graphics. In addition, the adolescence period (1970–1981) saw both the development of important techniques for realistic and 3D graphics and several applications of the nascent field, particularly in the field of entertainment
Fig. 1.5 The use of the sketchpad software with a light pen to create precise drawings
Source: https://design.osu.edu/carlson/history/lesson3.html
Fig. 1.6 The first CAD system by IBM (called DAC-1)—DAC stands for Design Augmented
by Computer
and movie making, which helped to popularize the field. Notable developments during this
period include the works on lighting model, texture and bump mapping, and ray tracing.
This period also saw the making of movies such as the Westworld (1973, first movie to
make use of computer graphics) and Star Wars (1977). The worldwide success of Star Wars
demonstrated the potential of computer graphics.
Adulthood The field entered its adulthood period (1981 onwards) standing on the platform created by these pioneering works and early successes. That year saw the release of the IBM PC, which helped computers proliferate among the masses. In order to cater to this new and emerging market, the importance of computer graphics was felt more intensely. The focus shifted from graphics for experts to graphics for laymen. This shift in focus accelerated work on new interfaces and interaction techniques and eventually gave rise to a new field of study: human–computer interaction, or HCI in short.
The development of software and hardware related to computer graphics has become
a self-sustaining cycle now. As more and more user-friendly systems emerge, they create
more and more interest among people. This in turn brings in new enthusiasm and invest-
ments on innovative systems. The cycle is certainly helped by the huge advancements in
processor technology (from CPU to GPU), storage (from MB to TB), and display (CRT to
touch screen and situated walls) technology. The technological advancements have brought
in a paradigm shift in the field. It is now possible to develop algorithms to generate photo-
realistic 3D graphics in real time. Consequently the appeal and application of computer
graphics have increased manifold. The presence of all these factors implies that the field is
growing rapidly and will continue to grow in the foreseeable future. Table 1.1 highlights the
major developments in computer graphics.
Table 1.1 Major developments in computer graphics
Fig. 1.7 Generic architecture of a graphics system—the host computer, the video memory, and the video controller driving the display screen
The video controller converts the digital image to analog voltages that drive electro-mechanical arrangements, which ultimately render the image on the screen.
Let us delve a little deeper to understand the working of the graphics system. The
process of generating an image for rendering is a multi-stage process, involving lots of
computation (we shall discuss these computations in subsequent parts of the book). If
all these computations are to be carried out by the CPU, then the CPU may get very little time for other computational tasks. As a result, the system cannot do much
except graphics. In order to avoid such situations and increase system efficiency, the
task of rendering is usually carried out by a dedicated component of the system (the
graphics card in our computers) having its own processing unit (called GPU or graphics
processing unit). The CPU, when encountering the task of displaying something, simply
assigns the task to this separate graphics unit, which is termed as the display controller
in Fig. 1.7.
Thus the display controller generates the image to be displayed on the screen. The gen-
erated image is in digital format (strings of 0s and 1s). The place where it is stored is the
video memory, which in modern systems is part of the separate graphics unit (the VRAM
in the graphic card). The display screen, however, contains picture elements or pixels (such
as phosphor dots and gas-filled cells). The pixels are arranged in the form of a grid. When
these pixels are excited by electrical means, they emit lights with specific intensities, which
give us the sensation of the colored image on the screen. The mechanism for exciting pixels
is the responsibility of the video controller, which takes as input the digital image stored in
the memory and activates suitable electro-mechanical mechanism such that the pixels can
emit light.
Graphic Devices
Graphic devices can be divided into two broad groups, based on the method used for
rendering (or excitation of pixels): (a) vector scan devices and (b) raster scan devices.
Vector scan devices In vector scan (also known as random-scan, stroke-writing, or calligraphic) devices, an image is viewed as composed of continuous geometric primitives such as
lines and curves. Clearly, this is what most of us intuitively think about images. From
the system’s perspective, the image is rendered by rendering these primitives. In other
words, a vector scan device excites only those pixels of the grid that are part of these
primitives.
Raster scan devices In contrast, in raster scan devices, the image is viewed as repre-
sented by the whole pixel grid. In order to render a raster image, it is therefore necessary
to consider all the pixels. This is achieved by considering the pixels in sequence (typically
left to right, top to bottom). In other words, the video controller starts with the top-left
pixel. It checks if the pixel needs to be excited. Accordingly, it excites the pixel or leaves
it unchanged. It then moves to the next pixel on the right and repeats the steps. It con-
tinues till it reaches the last pixel in the row. Afterward, the controller considers the first
pixel in the next (below the current) row and repeats the steps. This continues till the
right-bottom pixel of the grid. The process of such sequential consideration of the pixel
grid is known as scanning. Each row in the pixel grid in a scanning system is called a
scan line.
This difference between the two rendering methods is illustrated in Fig. 1.8.
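The scanning order described above can be pictured with a short sketch. The following Python snippet is illustrative only: it assumes a tiny frame buffer of 0/1 intensity values (a hypothetical example, not an actual video controller) and simply walks the grid left to right, top to bottom, "exciting" the pixels whose stored value is 1.

```python
# Illustrative sketch of the raster scanning order (left to right, top to bottom).
# The frame buffer below is a hypothetical 4 x 6 grid of on/off values.
frame_buffer = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

def raster_scan(buffer):
    """Visit every pixel in scan-line order and report which ones get excited."""
    for row, scan_line in enumerate(buffer):        # one scan line at a time
        for col, value in enumerate(scan_line):     # left to right within the line
            if value:                               # excite only if the stored bit is on
                print(f"excite pixel ({row}, {col})")
            # otherwise the pixel is left unchanged and the beam moves on

raster_scan(frame_buffer)
```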
Refreshing An important related concept is refreshing. The light emitted from the pixel elements, after excitation, starts decaying over time. As a result, the scene looks faded on the screen. Also, since the pixels in a scene get excited at different points of time, they do not fade in sync. Consequently, the scene looks distorted. In order to avoid such undesirable effects, the pixels are excited periodically. Such periodic excitation of the same pixels is known as refreshing. The number of times a scene
is refreshed per second is called the refresh rate, expressed in Hz (Hertz, the frequency
unit). Typical refresh rate required to give a user the perception of static image is at
least 60 Hz.
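As a rough back-of-the-envelope sketch, the refresh rate fixes the time budget available for re-exciting every pixel. The resolution below is an assumed example, not one taken from the text.

```python
# Time budget per refresh cycle and per pixel at a given refresh rate.
refresh_rate_hz = 60            # minimum rate for the perception of a static image
width, height = 1024, 768       # assumed example resolution

time_per_refresh = 1.0 / refresh_rate_hz               # seconds available per full scan
time_per_pixel = time_per_refresh / (width * height)

print(f"{time_per_refresh * 1e3:.2f} ms per refresh")  # ~16.67 ms
print(f"{time_per_pixel * 1e9:.1f} ns per pixel")      # ~21.2 ns
```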
So far, we have discussed the broad concepts of the display system, namely the video
controller, the pixel grid, and the raster and vector displays. The discussion was very generic
and applies to any graphic system. Let us understand these generic concepts in terms of an
actual system, namely the cathode ray tube (CRT) displays. Although CRTs are no longer in
wide use, discussion on CRT serves pedagogical purpose as it enables us to discuss all the
relevant concepts.
Fig. 1.8 Difference between vector and raster scan devices (a) Image to be rendered on the
pixel grid (b) Vector scan method—only those pixels through which the line passes are excited
(black circles) (c) Raster scan method—all pixels are considered during scanning—white cir-
cles show the pixels not excited; black circles show excited pixels; arrows show scanning
direction
Differentiate between refresh rate and frame rate. Does a higher frame rate ensure better image quality?
The frame rate of a computer is how often a video processing device can calculate the intensity values for the next frame to be displayed, that is, the rate at which the frame buffer is filled with new intensity values. Refresh rate refers to how often the display device can actually render the image.
No, frame rate is not a measure of how good the display is. Too high a frame rate is not beneficial, as any frames sent for rendering above the display's refresh rate are not rendered and simply lost. Thus it is important to have a display capable of high refresh rates in order to synchronize the two.
Fig. 1.9 (a) Typical CRT (b) Schematic representation of the inner working of a CRT—the heater, cathode, control grid, first and second anodes, vertical and horizontal deflecting plates, graphite coating, and the screen with phosphor coating inside
In summary, therefore, in a CRT, the electron gun generates cathode rays, which hit the phosphor dots and make them emit light, eventually giving us the sensation of an image. Now, let us
go back to the terms we introduced in the previous section and try to understand those
concepts in light of the CRT displays. The video controller of Fig. 1.7 is the system
responsible for generating requisite voltage/fields to generate cathode rays of appropriate
intensity and guide the rays to hit specific phosphor dots (pixels) on the screen. In vec-
tor displays, the video controller guides the electron gun to only the pixels of interest. In
case of raster systems, the electron gun actually moves in a raster sequence (left to right,
top to bottom).
What we have discussed so far assumes that each pixel is a single phosphor dot. By vary-
ing the intensity of the cathode ray, we can generate light of different intensities. This gives
us images having different shades of gray at the most. In the case of color displays, the
arrangement is a little different. We now have three electron guns instead of one. Similarly,
each pixel position is composed of three phosphor dots, each corresponding to the red (R),
green (G), and blue (B) colors. As you know, these three are called the primary colors.
Any color can be generated by mixing these three in appropriate quantities. The same pro-
cess is simulated here. Each electron gun corresponds to one of the R, G, and B dots.
By controlling electron gun intensities separately, we can generate different shades of R,
G, and B for each pixel, resulting in a new color. Figure 1.10 shows the schematic of
the process.
There are two ways computer graphics with color displays are implemented. In the first
method, the individual color information for each of the R, G, and B components of a pixel
are stored directly in the corresponding location of the frame buffer. This method is called
direct coding. Although the video controller gets the necessary information from the frame
buffer directly to drive the electron guns, the method requires a large frame buffer to store
all possible color values (that means all possible combinations of the RGB values. This set is
also known as color gamut). An alternative scheme, used primarily during the early periods
of computers when the memory was not so cheap, makes use of a color look-up table (CLT).
In this scheme, a separate look-up table (a portion of memory) is used. Each entry (row) of
the table contains a specific RGB combination. There are N such combinations (entries) in
the table. The frame buffer location in this scheme does not contain the color itself. Instead,
a pointer to the appropriate entry in the table that contains the required color is stored. The
scheme is illustrated in Fig. 1.11.
Note that the scheme is based on a premise: we only require a small fraction of the whole
color gamut in practice and we, as designers, know those colors. If this assumption is not
valid (i.e., we want to generate high quality images with large number of colors), then the
CLT method will not be of much use.
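A minimal sketch of the look-up idea, in Python, assuming a hypothetical 8-entry table and a tiny frame buffer of indices (the specific RGB values are arbitrary illustrations):

```python
# Color look-up table (CLT): the frame buffer stores small indices,
# and the table maps each index to a full (R, G, B) triple.
color_table = [
    (0, 0, 0),        # index 0: black
    (102, 255, 53),   # index 1
    (255, 255, 204),  # index 2
    (255, 102, 153),  # index 3
    (102, 0, 51),     # index 4
    (255, 0, 0),      # index 5
    (0, 255, 0),      # index 6
    (0, 0, 255),      # index 7
]

# A 2 x 3 "screen": each cell holds only a table index, not a color.
frame_buffer = [
    [0, 1, 2],
    [3, 0, 7],
]

def resolve_colors(buffer, table):
    """Replace every stored index by the RGB triple it points to."""
    return [[table[index] for index in row] for row in buffer]

for row in resolve_colors(frame_buffer, color_table):
    print(row)
```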
Fig. 1.10 Schematic of a color CRT. The three beams generated by the three electron guns
are passed through a shadow mask—a metallic plate with microscopic holes that direct the
beams to the three phosphor dots of a pixel.
Fig. 1.11 The color look-up table scheme—each frame buffer location stores an index into a separate table, and each table entry holds an R, G, B combination
Highlight the advantage of the color look-up scheme over direct coding with an example.
Suppose each of R, G, and B is represented with 8 bits. That means we can have between 0 and 255 different shades for each of these primary colors. Consequently, the total number of colors (the color gamut) that we can generate is 256 × 256 × 256 = 16 M.
Direct coding: The size of each frame buffer location is 8 × 3 = 24 bits. Thus, the size of the frame buffer for a system with resolution 100 × 100 will be 24 × 100 × 100 = 234 Kb.
Color look-up scheme: Assume that out of the 16 M possible colors, we know that our image shall use only 256 colors (combinations of R, G, and B). We shall keep these 256 colors in the look-up table. Thus the size of the table is 256, with each row holding a 24-bit color value. Each table location requires 8 bits to address, so each frame buffer location needs only 8 bits. What will be our storage requirement for the 100 × 100 screen? It is the frame buffer size + the table size, that is, (8 × 100 × 100) + (256 × 24) = 84 Kb, much less than the 234 Kb required for direct coding.
Scan conversion The device coordinate is a continuous space. However, the display, as
we have seen before, contains a pixel grid, which is a discrete space. Therefore, we need
to transfer the viewport on the (continuous) device coordinate to the (discrete) screen coor-
dinate system. This process is called scan conversion (also called rasterization). An added
concern here is how to minimize distortions (called aliasing effect) that result from the
transformation from continuous to discrete spaces. Anti-aliasing techniques are used during
the scan conversion stage, to minimize such distortions. Scan conversion with anti-aliasing
together forms the last and final stage of the 3D graphics pipeline.
The stages of the pipeline are shown in Fig. 1.12. We shall discuss each of these stages
in subsequent chapters of the book. However, the sequence of the stages mentioned here is
purely theoretical. In practice, the sequence may not be followed strictly. For example, the
Fig. 1.12 Stages of the 3D graphics pipeline—the stages shown include modeling transformation (second stage, local to world coordinate), lighting (third stage, world coordinate), viewing transformation (world to view coordinate), clipping, and projection transformation (3D view to 2D view coordinates). Boxes in the second column show the substages of the fourth stage; the third column shows the coordinate systems in which each stage operates.
assigning of colors (third stage) may be performed after projection to reduce computation.
Similarly, the hidden surface removal may be performed after projection.
Examples of graphic libraries include OpenGL (which stands for Open Graphics Library) and DirectX (by Microsoft).
These APIs are essentially predefined sets of functions, which, when invoked with the
appropriate arguments, perform the specific tasks. Thus, these functions eliminate the need
for the programmer to know every detail of the underlying system (the processor, mem-
ory, and OS) to build a graphics application. For example, the function ‘xyz’ in OpenGL
assigns the color abc to a 3D point. Note that the color assignment does not require the pro-
grammer to know details such as how color is defined in the system, how such information
is stored (which portion of the memory) and accessed, how the operating system manages
the call, which processor (CPU/GPU) handles the task, and so on. Graphics applications
such as painting systems, CAD tools, video games, or animations are developed using these
functions.
SUMMARY
In this chapter, we have touched upon the background required to understand topics discussed
in later chapters. We have briefly seen some of the applications and discussed the history of
the field, along with the current issues. We were also introduced to the generic architecture of a graphics system and given a brief overview of important concepts such as the display controller, video controller, frame buffer, pixels, vector and raster devices, CRT displays, and color coding methods. We shall make use of this information in the rest of the book. The other important concept
we learnt is the graphics pipeline. The rest of the book shall cover the stages of the pipeline. In
the following chapter, we introduce the first stage: the object representation techniques.
BIBLIOGRAPHIC NOTE
There are many online sources that give a good introductory idea of computer graphics. The
website https://design.osu.edu/carlson/history/lessons.html includes in-depth discussion, with
illustrative images, of the historical evolution and application areas of computer graphics. The
textbooks on computer graphics by Hearn and Baker [2004] and Foley et al. [1995] also con-
tain comprehensive introduction to the field. There is a rich literature on application of computer
graphics techniques to various domains as diverse as aircraft design Bouquet [1978], energy
exploration Gardner and Nelson [1983], scientific data visualization Hearn and Baker [1991]
and visualization of music Mitroo et al. [1979]. A good source to learn about the past works,
current trends, and research directions in the field are the issues of the well-known journals and conference proceedings in the field. Links to various bibliographic resources can be found at
http://www.cs.rit.edu/~ncs/graphics.html. Also see the Bibliographic Note section of Chapter 11
for more references on graphics hardware.
KEY TERMS
Color look-up scheme – color management scheme used in computer graphics in which the
color information is stored in a separate table.
CRT – stands for cathode ray tube, a display technology that was very popular until recently for designing display screens
Direct coding – color management scheme used in computer graphics in which the color
information is stored in the frame buffer itself
Display controller – generic name given to the component of a graphics system that converts
abstract object definitions to bit strings
Electron gun – hardware to excite pixels on a CRT screen
Frame buffer – video memory of raster scan systems
Frame rate – rate (frames/second) at which the frame buffer is filled up
Graphics pipeline – set of stages in sequence that are used to synthesize an image from object
definitions
Input device – hardware for interacting with screen
Pixel – a picture element or each point on the grid used to design display screens
Refresh rate – rate (times/second) at which the screen is redrawn
Refreshing – process by which the screen is redrawn periodically
SAGE system – the first computer graphics system
Sketchpad – the first interactive computer graphics system
Vector and raster graphics – the two types of image representation techniques used in computer
graphics
Vector and raster scan – the two types of techniques used to render images on the screen
Video controller – hardware used to render the actual on-screen image from the bit strings stored
in video memory
Video memory – memory unit used to store the bit strings that represent a (synthesized) image
Visualization – the techniques to visualize real or abstract objects or events
EXERCISES
1.1 At the beginning of this chapter, we learnt a few areas of application of computer graphics.
As we mentioned, such applications are numerous. Find out at least 10 more applications
of computer graphics (excluding those mentioned at the beginning).
1.2 Suppose you are trying to develop a computer game for your iPhone. Make a list of all the
issues that are involved (Hint: combine the discussions of Sections 1.2 and 1.4).
1.3 In Figure 1.7, the generic architecture of a typical graphics system is shown. As we know,
almost all the electronic devices we see around us contain some amount of graphics.
Therefore, they can be dubbed as graphics systems. In light of the understanding of the
generic architecture, identify the corresponding components (i.e., the display, the two con-
trollers, video memory, I/O mechanism) for the following types of devices (you can select
any one commercial product belonging to each of the categories and try to find out the
name of the underlying technologies).
(a) Digital watch
(b) Smartphone
(c) Tablet
(d) ATM
(e) HD TV
1.4 Explain double buffering. Why do we need it?
1.5 While trying to display a scene on the screen, you observe too much flickering. What may
be the cause for this? How can the flickering be overcome?
1.6 In some early raster displays, refreshing was performed at 30 Hz, which was half that of
the minimum refresh rate required to avoid flicker. This was done due to technological lim-
itations as well as to reduce system cost. In order to avoid flickers in such systems, the
scan lines were divided into odd set (1, 3, 5 · · · ) and even set (2, 4, 6 · · · ). In each cycle,
only one set of lines was scanned. For example, first scan odd set, then even set, then odd
set, and so on. This method is called interlacing. List with proper explanation all the factors
that determined if the method would work.
1.7 Assume you are playing an online multiplayer game on a system having a 100 × 100 color
display with 24 bits for each color. A double buffering technique is used for rendering. Your
system is receiving 5 MBps of data from the server over the network. Assuming no data
loss, will you experience flicker?
1.8 What is the main assumption behind the working of the color look-up table? Suppose you
have a graphic system with a display resolution of 64 × 64. The color look-up table has
16 entries. Calculate the percentage saving of space due to the use of the table. Assume
colors are represented by 24 bits.
1.9 Discuss why vector graphics is good for drawing wire frames but not filled objects. Suppose
that you are a gaming enthusiast. Will you prefer a vector-based system or a raster-based
system?
1.10 In a raster scan system, the scanning process starts (from top-left pixel) with the application
of a vertical sync pulse (Vp). The time between the excitation of the last (right-most) pixel
in the current scan line and that of the first (left-most) pixel of the next scan line is known
as the horizontal retrace time (HT). Scanning for the next line starts with the application of
a horizontal sync pulse (Hp). The time it takes to reset the scanning process for the next
frame (i.e. the time gap between the excitation of the bottom-right pixel of the current frame
and the top-left pixel of the next frame) is known as the vertical retrace time (VT). Calculate
the desirable value of M for a CRT raster device having resolution M × 100 with HT = 5 μs, VT = 500 μs, and 1 μs electron gun movement time between two pixels along a scan line.
1.11 Assume you have a raster device with screen resolution = 100 × 100. It is a color display
with 9 bits/pixel. What should be the access rate (i.e., time required to access each bit) of
the video memory to avoid flicker in this system?
1.12 Consider the two objects shown in Fig. 1.13(a).
We want to render a scene in which the bar (the right object in (a)) is inside the hole of
the cube (left object in (a)), as shown in (b). Discuss the tasks performed in each pipeline
stage for rendering the scene.
CHAPTER 3
Modeling Transformations
Learning Objectives
After going through this chapter, the students will be able to
• Get an idea of modeling transformations
• Learn about the four basic modeling transformations—translation, rotation, shearing, and
scaling—in both two and three dimensions
• Understand the homogeneous coordinate system used for representing modeling trans-
formations
• Have a basic understanding of the matrix representations of modeling transformations
• Learn and derive composite transformations from the basic transformations through
matrix multiplications, both in two and three dimensions
INTRODUCTION
We have come across different methods and techniques to represent objects in Chapter 2.
These techniques, however, allow us to represent objects individually, in what is called
local/object coordinates. In order to compose a scene, the objects need to be assembled
together in the so-called scene/world coordinate system. That means, at the time of defining the objects, the shape, size, and position of the objects are not important. However, when individual objects are assembled in a scene, these factors become very important. Consequently, we have to perform operations to transform objects (from their local coordinates to the scene/world coordinates). The stage of the graphics pipeline in which this transformation
takes place is known as the modeling transformation. Figure 3.1 illustrates the idea.
Thus, modeling transformation effectively implies applying some operations on the object definition (in local coordinates) to transform it into a component of the world-coordinate scene. There are several such operations possible. However, all these operations can be derived from the following basic operations (the letter in parentheses beside each name shows the common notation for the transformation).
Translation (T) Translates the object from one position to another position
Rotation (R) Rotates the object by some angle in either the clockwise or anticlockwise direction around an axis
Scaling (S) Changes the size of the object along the axial directions
Shearing (Sh) Changes the shape of the object
Fig. 3.1 In (a), two objects are defined in their local coordinates. In (b), these objects are
assembled (in world scene coordinate) to compose a complex scene.
Note: In a scene, objects are used multiple times, at different places, and in different sizes. The operations required to transform objects from their local coordinate to world coordinate are collectively called modeling transformations.
Translation moves a point from one position to another. If a point (x, y) is displaced by tx and ty units along the X and Y directions, respectively, the new position (x′, y′) is given by Eq. 3.1.
x′ = x + tx    (3.1a)
y′ = y + ty    (3.1b)
Unlike translation where linear displacement takes place, rotation involves angular dis-
placement. In other words, the point moves from one position to another on a circular track
about some axis. For simplicity, let us assume that we want to rotate a point around the
Z-axis counterclockwise by an angle φ. The scenario is shown in Fig. 3.3.
Fig. 3.2 Translation of the point p(x, y) to the new position p′(x′, y′) by displacements tx and ty
As the figure shows, the new coordinates can be expressed in terms of the old coordinates as in Eq. 3.2, where r is the radius of the circular trajectory.
x′ = x cos φ − y sin φ    (3.2a)
y′ = x sin φ + y cos φ    (3.2b)
Scaling changes the size of an object. The new coordinates are obtained by multiplying the current coordinates by the scaling factors sx and sy along the X and Y directions, respectively, as in Eq. 3.3.
x′ = sx x    (3.3a)
y′ = sy y    (3.3b)
Fig. 3.3 Anticlockwise rotation of the point (x, y) by an angle φ about the origin, along a circular track of radius r (θ is the initial angular position of the point)
When the scaling factor is the same along both the X and Y directions, the scaling is called uniform. Otherwise, it is differential scaling. Thus, in order to scale up/down any object,
we need to apply Eq. 3.3 on its surface points, with scaling factor greater/less than one, as
illustrated in Fig. 3.4. Note in the figure that the application of scaling repositions the object
also (see the change in position of vertices in the figure).
We have so far seen transformations that can change the position and size of an object.
With shearing transformation, we can change the shape of an object. The general form
of the transformation to determine the new point (x′ , y′ ) from the current point (x, y) is
shown in Eq. 3.4, where shx and shy are the shearing factors along the X and Y directions,
respectively.
x′ = x + shx y (3.4a)
′
y = y + shy x (3.4b)
Similar to the scaling operation, we can apply Eq. 3.4 to all the surface points of the object
to shear it, as illustrated in Fig. 3.5. Note in the figure that shearing may also reposition the
object like scaling (see the change in position of vertices in the figure).
Fig. 3.4 Illustration of the scaling operation with sx = 1/2 and sy = 1/3. The object in (a) is shrunk to the shape shown in (b). Note that the new vertices are obtained by applying Eq. 3.3 to the vertices of the object.
Fig. 3.5 Shearing of the object along the horizontal direction (characterized by a positive shear factor shx = 1/2 along the X-direction and a shear factor shy = 0 along the Y-direction). The object in (a) is distorted to the shape shown in (b). The new vertices are obtained by applying Eq. 3.4 to the vertices of the object.
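The four basic transformations of Eqs 3.1–3.4 are easy to mirror in code. The short Python sketch below applies each equation to a single point; the sample values are only illustrative.

```python
import math

# Eq. 3.1: translation by (tx, ty)
def translate(x, y, tx, ty):
    return x + tx, y + ty

# Eq. 3.2: anticlockwise rotation by angle phi (radians) about the origin
def rotate(x, y, phi):
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi))

# Eq. 3.3: scaling by (sx, sy) with respect to the origin
def scale(x, y, sx, sy):
    return sx * x, sy * y

# Eq. 3.4: shearing with factors (shx, shy)
def shear(x, y, shx, shy):
    return x + shx * y, y + shy * x

print(translate(3, 2, 5, 5))        # (8, 7)
print(rotate(1, 0, math.pi / 2))    # approximately (0, 1)
print(scale(3, 9, 0.5, 1 / 3))      # (1.5, 3.0)
print(shear(3, 4, 0.5, 0))          # (5.0, 4)
```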
Table 3.1 Four basic types of geometric transformations along with their characteristics
Translation [T(tx, ty)]: x′ = x + tx, y′ = y + ty
Rotation [R(φ)]: x′ = x cos φ − y sin φ, y′ = x sin φ + y cos φ
Scaling [S(sx, sy)]: x′ = sx x, y′ = sy y
Shearing [Sh(shx, shy)]: x′ = x + shx y, y′ = y + shy x
Any point of the form (xh , yh , 0) is assumed to be at infinity and the point (0, 0, 0) is not
allowed in this coordinate system.
When we represent geometric transformations in the homogeneous coordinate system, our earlier 2 × 2 matrices will transform to 3 × 3 matrices (in general, any N × N transformation matrix is converted to an (N + 1) × (N + 1) matrix). Also, for geometric transformations, we consider h = 1 (in later chapters, we shall see other transformations with h ≠ 1). With these changes, the matrices for the four basic transformations are given in Table 3.2.
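Table 3.2 itself is not reproduced in this extract. As a reference sketch, the standard 3 × 3 homogeneous forms consistent with Eqs 3.1–3.4 (with h = 1) are the following; they are reconstructed from the equations rather than copied from the table.

```latex
% Standard 2D homogeneous transformation matrices (reconstruction)
T(t_x, t_y) = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
\qquad
R(\phi) = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}

S(s_x, s_y) = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
Sh(sh_x, sh_y) = \begin{bmatrix} 1 & sh_x & 0 \\ sh_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
```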
Fig. 3.6 Example of composite transformation. The object ABCD in (a) is transformed to the object A′B′C′D′ in (b) after application of a series of transformations.
Clearly, the transformation of the object ABCD (in local coordinate) to A′ B′ C′ D′ (in
world coordinate) is not possible with a single basic transformation. In fact, we need two
transformations: scaling and translation.
How can we calculate the new object vertices? We shall follow the same procedure as before, namely multiplying the current vertices with the transformation matrix. Only, here the transformation matrix is the composition of two matrices, namely the scaling matrix and
the translation matrix. The composite matrix is obtained by multiplying the two matrices in
sequence, as shown in the following steps.
Step 1: Determine the basic matrices
Note that the object is halved in length while the height remains the same. Thus, the scaling matrix is
$S = \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
The current vertex D(0, 0) is now positioned at D′(5, 5). Thus, there are 5 unit displacements along both the horizontal and vertical directions. Therefore, the translation matrix is
$T = \begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix}$
Step 2: Obtain the composite matrix
The composite matrix is obtained by multiplying the basic matrices in sequence. We follow the right-to-left rule in forming the multiplication sequence. The first transformation applied to the object is the rightmost in the sequence. The next transformation is placed on its left, and we continue in this way till the last transformation. Thus, our composite matrix for the example is obtained as follows.
$M = T \cdot S = \begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0.5 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix}$
Step 3: Obtain new coordinate positions
Next, multiply the surface points with the composite matrix as before, to obtain
the new surface points. In this case, we simply multiply the current vertices with the
composite matrix to obtain the new vertices.
$A' = MA = \begin{bmatrix} 0.5 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 5 \\ 1 \end{bmatrix}$
$B' = MB = \begin{bmatrix} 0.5 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 1 \end{bmatrix}$
$C' = MC = \begin{bmatrix} 0.5 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 6 \\ 1 \end{bmatrix}$
$D' = MD = \begin{bmatrix} 0.5 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 5 \\ 1 \end{bmatrix}$
Note that the results obtained are in homogeneous coordinates. In order to obtain the
Cartesian coordinates, we divide the homogeneous coordinate values with the homogeneous
factor, which is 1 for geometric transformations. Thus, the Cartesian coordinates of the final
vertices are as follows:
A′ = (6/1, 5/1) = (6, 5),
B′ = (6/1, 6/1) = (6, 6),
C′ = (5/1, 6/1) = (5, 6), and
D′ = (5/1, 5/1) = (5, 5)
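The worked example above can be checked mechanically. The NumPy sketch below (illustrative, not taken from the book) builds S and T, composes them right to left, and applies the result to the original vertices A(2, 0), B(2, 1), C(0, 1), and D(0, 0) in homogeneous form.

```python
import numpy as np

S = np.array([[0.5, 0, 0],
              [0,   1, 0],
              [0,   0, 1]])           # scale by 1/2 along X

T = np.array([[1, 0, 5],
              [0, 1, 5],
              [0, 0, 1]])             # translate by (5, 5)

M = T @ S                             # right-to-left: scaling first, then translation

vertices = {"A": (2, 0), "B": (2, 1), "C": (0, 1), "D": (0, 0)}
for name, (x, y) in vertices.items():
    xh, yh, h = M @ np.array([x, y, 1])    # homogeneous column vector
    print(name, "->", (xh / h, yh / h))    # divide by h to get Cartesian coordinates
# Expected: A -> (6, 5), B -> (6, 6), C -> (5, 6), D -> (5, 5)
```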
In composite transformations, we multiply basic matrices. We know that matrix multi-
plication is not commutative. Therefore, the sequence is very important. If we form the
sequence wrongly, then we will not get the correct result. In the previous example, if we
form the composite matrix M as M = S.T, then we will get the wrong vertices (do the
calculations and check for yourself).
How did we decide on the sequence in the previous example, namely first scaling and then translation? Remember that scaling changes positions. Therefore, if we translate first to the final position and then scale, the vertex positions would have changed. Instead, if we first scale with respect to the fixed point D (the origin) and then translate the object (by applying the same displacement to all the vertices), then the problem of repositioning of the
vertices will not occur. That is precisely what we did.
The example before is a special case where the fixed point was the origin itself. In general,
the fixed point can be anywhere in the coordinate space. In such cases, we shall apply the
aforementioned approach with slight modification.
Suppose we want to scale with respect to the fixed point F(x, y). In order to determine the
composite matrix, we assume the following sequence of steps.
1. The fixed point is translated to origin (−x and −y units of displacements in the horizontal
and vertical directions, respectively).
2. Scaling is performed with respect to origin.
3. The fixed point is translated back to its original place.
Fig. 3.7 Example of scaling with respect to an arbitrary fixed point (a) Object definition (b) Object position in world coordinates
Thus, the composite matrix M = T(tx = x, ty = y).S(sx , sy ).T(tx = −x, ty = −y). Figure
3.7 illustrates the concept. This is a modification of Fig. 3.6. In this, the object is now defined
(Fig. 3.7(a)) with the vertices A(7, 5), B(7, 6), C(5, 6), and D(5, 5). Its world coordinate posi-
tion is shown in Fig. 3.7(b). Note that the scaling is done keeping D(5, 5) fixed. Hence, the
composite matrix M = T(tx = 5, ty = 5)S(sx = 0.5, sy = 1)T(tx = −5, ty = −5).
A similar situation arises in the case of rotation and shearing. In rotation, so far we
assumed that the object is rotated around the Z-axis. In other words, we assumed rotation
with respect to the origin through which the Z-axis passes. Similar to scaling, we can derive
the rotation matrix with respect to any fixed point in the XY coordinate space through which
the rotation axis (parallel to the Z-axis) passes. We first translate the fixed point to origin
(aligning the axis of rotation with the Z-axis), rotate the object, and then translate the point
back to its original place. Thus, the composite matrix is M = T(tx = x, ty = y).R(φ).T(tx =
−x, ty = −y), where (x, y) is the fixed point coordinate. Composite matrix for shearing
with respect to any arbitrary (other than origin) fixed point is derived in a similar way:
M = T(tx = x, ty = y).Sh(shx , shy ).T(tx = −x, ty = −y).
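For the fixed-point case, the same right-to-left composition applies. A small NumPy sketch (again only illustrative) reproduces the Fig. 3.7 situation—scaling by (0.5, 1) about the fixed point D(5, 5):

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Scale about the fixed point (5, 5): translate it to the origin,
# scale, then translate it back (right-to-left order).
M = translation(5, 5) @ scaling(0.5, 1) @ translation(-5, -5)

for name, (x, y) in {"A": (7, 5), "B": (7, 6), "C": (5, 6), "D": (5, 5)}.items():
    xh, yh, h = M @ np.array([x, y, 1.0])
    print(name, "->", (xh / h, yh / h))
# Expected: A -> (6, 5), B -> (6, 6), C -> (5, 6), D -> (5, 5); the fixed point D stays put.
```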
What happens when more than one basic transformation is applied to an object with respect to any arbitrary fixed point? We apply the same process to obtain the composite transformation matrix. We first translate the fixed point to the origin, perform the basic transformations in sequence, and then translate the fixed point back to its original place. An example is shown in Fig. 3.8.
Rotation, scaling, and shearing with respect to any arbitrary fixed point (other than origin)
The transformation matrix is obtained as a composition of basic transformations. All follow the same
procedure.
1. Translate the fixed point to origin.
2. Perform the transformation (rotation/scaling/shearing).
3. Translate the fixed point back to its original place.
The transformation matrix at the fixed point (x, y) is
M = T(tx = x, ty = y) · R(φ) · T(tx = −x, ty = −y)    (Rotation)
M = T(tx = x, ty = y) · S(sx, sy) · T(tx = −x, ty = −y)    (Scaling)
M = T(tx = x, ty = y) · Sh(shx, shy) · T(tx = −x, ty = −y)    (Shearing)
Fig. 3.8 Example of composite transformations with respect to the arbitrary fixed point (5, 5). The object in (a) is transformed as shown in (b). Note that two basic transformations are involved: scaling and rotation.
Figure 3.8(a) shows the object (cylinder) with length 2 units and diameter 1 unit, defined
in its own (local) coordinate. The cylinder is placed on the roof of the house in Fig. 3.8(b)
(world coordinate), after scaling it horizontally by half and rotating it 90◦ anticlockwise with
respect to the fixed point (5,5). How to compute the new (world coordinate) position of the
object? We apply the approach outlined before.
Step 1: Obtain the composite matrix.
(a) Translate the fixed point (5,5) to origin.
(b) Scale by 1/2 in X-direction.
(c) Rotate anticlockwise by 90◦ .
(d) Translate the fixed point back to (5,5).
Composite matrix M = T(tx = 5, ty = 5) R(90°) S(sx = 0.5, sy = 1) T(tx = −5, ty = −5)
$= \begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -5 \\ 0 & 1 & -5 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 10 \\ 0.5 & 0 & 2.5 \\ 0 & 0 & 1 \end{bmatrix}$
Step 2: Multiply surface point (column vectors) with composite matrix to obtain new
position (left as an exercise).
3.4 TRANSFORMATIONS IN 3D
Three-dimensional transformations are similar to 2D transformations, with some minor
differences.
1. We now have 4 × 4 transformation matrices (in homogeneous coordinate system) instead
of 3 × 3. However, the homogeneous factor remains the same (h = 1).
2. In 2D, all rotations are defined about the Z-axis (or an axis parallel to it). However, in 3D, we have three basic rotations, one with respect to each of the principal axes X, Y, and Z. Also,
the transformation matrix for rotation about any arbitrary axis (any axis of rotation other than the principal axes) is more complicated than in 2D.
3. The general form of the shearing matrix is more complicated than in 2D.
In shearing, we can now define distortion along one or two directions keeping one direc-
tion fixed. For example, we can shear along X and Y directions, keeping Z direction fixed.
Therefore, the general form looks different from the one in 2D.
The transformation matrices for translation, rotation, and scaling in 3D are shown in
Table 3.3, which are similar to their 2D counterparts, except that there are three rotation
matrices in 3D.
The composite matrix for scaling and shearing with respect to any arbitrary fixed point is
determined in a similar way as in 2D, namely translate the fixed point to origin, perform the
operation and then translate the point back to its original place. Rotation about any arbitrary
axis, however, is more complicated as discussed next.
Table 3.3 The matrix representation (in homogeneous coordinates) of the three basic geometric transformations in 3D
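The contents of Table 3.3 are not reproduced in this extract. As a reference sketch (reconstructed from the standard forms rather than copied from the table), the 4 × 4 homogeneous matrices, with h = 1, are:

```latex
% Standard 3D homogeneous transformation matrices (reconstruction)
T(t_x, t_y, t_z) = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\qquad
S(s_x, s_y, s_z) = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

R_X(\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\qquad
R_Y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

R_Z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
```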
For rotation by an angle θ about an arbitrary axis, the axis is first translated and rotated so that it aligns with the Z-axis, the rotation RZ(θ) is applied, and the aligning transformations are then reversed. The composite matrix is therefore
M = T⁻¹ · RX⁻¹(α) · RY⁻¹(β) · RZ(θ) · RY(β) · RX(α) · T
where T translates a point on the axis to the origin, and RX(α) and RY(β) align the axis with the Z-axis.
Note that the inverse rotations essentially change the sign of the angle. For example, if
α is defined counterclockwise, then the inverse rotation will be clockwise. In other words,
RX⁻¹(α) = RX(−α).
[Figure: Rotation about an arbitrary axis defined by the points P1 and P2—Initial position; Step 1: Translate the line to the origin; Step 2: Align the line with the Z-axis (rotate about the X- and Y-axes); Step 3: Rotate the object (about the Z-axis); Step 4: Rotate the line back to its original orientation; Step 5: Translate the line back to its original position]
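A sketch of this five-step construction in NumPy follows. It is illustrative only: the helper that aligns the axis with the Z-axis uses the standard direction-cosine formulas, and the example axis and angle at the end are arbitrary choices, not taken from the text.

```python
import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = (tx, ty, tz)
    return M

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

def align_with_z(u):
    """4x4 rotation A that maps the unit direction u onto the +Z axis."""
    a, b, c = u
    d = np.hypot(b, c)
    Rx = np.eye(4)
    if d > 1e-9:                       # rotate about X to bring u into the XZ plane
        Rx[1:3, 1:3] = [[c / d, -b / d], [b / d, c / d]]
    Ry = np.eye(4)                     # rotate about Y to land on the Z axis
    Ry[0, 0], Ry[0, 2] = d, -a
    Ry[2, 0], Ry[2, 2] = a, d
    return Ry @ Rx

def rotate_about_axis(p1, direction, theta):
    """Compose: translate to origin, align, rotate about Z, un-align, translate back."""
    u = np.asarray(direction, dtype=float)
    u /= np.linalg.norm(u)
    A = align_with_z(u)
    T = translation(*(-np.asarray(p1, dtype=float)))
    T_back = translation(*p1)
    return T_back @ A.T @ rotation_z(theta) @ A @ T   # A is orthogonal, so A^-1 = A^T

# Example: rotate the point (1, 0, 0) by 180 degrees about the axis through the
# origin with direction (1, 1, 0); the result should be (0, 1, 0).
M = rotate_about_axis(p1=(0, 0, 0), direction=(1, 1, 0), theta=np.pi)
print(np.round(M @ np.array([1, 0, 0, 1.0]), 6))   # approximately [0, 1, 0, 1]
```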
Example 3.1
An object ABCD is defined in its own coordinate as A(1, 1, 0), B(3, 1, 0), C(3, 3, 0), and D(1, 3, 0).
The object is required to construct a partition wall A′ B′ C ′ D′ in a world-coordinate scene (A′ cor-
responds to A and so on). The new vertices are A′ (0, 0, 0), B′ (0, 4, 0), C ′ (0, 4, 4), and D′ (0, 0, 4).
Calculate the composite transformation matrix to perform the task.
[Figure: The square ABCD on the XY plane (initial position) and the wall A′B′C′D′ on the YZ plane (final position)]
Initially the square is in the XY plane with each side equal to 2 units and center at (2, 2, 0). The final
square is on the YZ plane with side equal to 4 units and the center at (0, 2, 2). The transformations
required are as follows:
1 It can be done in multiple ways. The solution presented here is one of those.
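The detailed solution does not appear in this extract, and, as the footnote says, more than one composition works. One possibility, shown below as a NumPy sketch and checked numerically, is: translate A(1, 1, 0) to the origin, scale by 2 along X and Y, then rotate 90° about the Y-axis followed by 90° about the X-axis (which maps the X-axis onto Y and the Y-axis onto Z).

```python
import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = (tx, ty, tz)
    return M

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rotation_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[1:3, 1:3] = [[c, -s], [s, c]]
    return M

def rotation_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[0, 0], M[0, 2], M[2, 0], M[2, 2] = c, s, -s, c
    return M

# One possible composition (right to left): translate A to the origin,
# scale by 2 in X and Y, then rotate so that +X maps to +Y and +Y maps to +Z.
M = rotation_x(np.pi / 2) @ rotation_y(np.pi / 2) @ scaling(2, 2, 1) @ translation(-1, -1, 0)

square = {"A": (1, 1, 0), "B": (3, 1, 0), "C": (3, 3, 0), "D": (1, 3, 0)}
for name, p in square.items():
    print(name, "->", np.round(M @ np.array([*p, 1.0]), 6)[:3])
# Expected: A -> (0, 0, 0), B -> (0, 4, 0), C -> (0, 4, 4), D -> (0, 0, 4)
```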
Example 3.2
Consider a circular track (assume negligible width, radius = 4 units, centered at origin). The track
is placed on the XZ plane. A sphere (of unit radius) is initially resting on the track with center on
the +Z axis, with a black spot on it at (0, 0, 6). When pushed, the sphere rotates around its own
axis (parallel to Y axis) at a speed of 1◦ /min as well as along the track (complete rotation around
the track requires 6 hrs). Assume all rotations are anticlockwise. Suppose an observer is present at
(3, 0, 7). Will the black spot be visible to the observer after the sphere rotates and slides down for
an hour and half?
[Figure: Initial position—the sphere resting on the circular track with the black spot at (0, 0, 6); the observer is at (3, 0, 7)]
In the problem, we are required to determine the position of the black spot after one and half hours.
We need to determine the transformation matrix M, which, when multiplied to the initial position
of the black spot, gives its transformed location. Clearly, M is a composition of two rotations, one
for rotation of the sphere around its own axis (Raxis ) and the other for the rotation of the sphere
around the circular track (Rtrack ).
Since Raxis is performed around an axis parallel to Y-axis, we can formulate Raxis as
a composition of translation (of the axis to the Y-axis), the actual rotation with respect to
the Y axis and reverse translation (of the axis to its original place). Therefore, Raxis =
T(0, 0, 5).RY (θ).T(0, 0, −5). Since the sphere can rotate around its axis at a speed of 1◦ /min,
in one and half hours, it can rotate 90◦ . Therefore, θ = 90◦ .
At the same time, the sphere is rotating along the circular track with a speed of 360◦ /6 = 60◦ /hour.
Therefore, in one and half hours, the sphere can move 90◦ along the track.
Therefore,
M = Rtrack(90°) Raxis(90°) = RY(90°) T(0, 0, 5) RY(90°) T(0, 0, −5)
$= \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -5 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 5 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
Thus, the position of the point after one and half hours is
$P' = MP = \begin{bmatrix} -1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 5 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 6 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \\ -1 \\ 1 \end{bmatrix}$
The transformed position of the black spot (5, 0, −1) is clearly not visible from the observer’s
position.
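The composite matrix and the final position above can be cross-checked numerically. The following NumPy sketch (illustrative only) rebuilds M from the same factors and applies it to the spot's initial position.

```python
import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = (tx, ty, tz)
    return M

def rotation_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[0, 0], M[0, 2], M[2, 0], M[2, 2] = c, s, -s, c
    return M

# M = R_track(90 deg) . R_axis(90 deg)
#   = R_Y(90) . [ T(0,0,5) . R_Y(90) . T(0,0,-5) ]
M = rotation_y(np.pi / 2) @ translation(0, 0, 5) @ rotation_y(np.pi / 2) @ translation(0, 0, -5)
print(np.round(M, 6))              # rows: (-1,0,0,5), (0,1,0,0), (0,0,-1,5), (0,0,0,1)

spot = np.array([0, 0, 6, 1.0])    # initial position of the black spot
print(np.round(M @ spot, 6))       # -> [5, 0, -1, 1]
```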
SUMMARY
In this chapter, we have learnt to construct a scene from basic object definitions. The
scene is constructed by applying transformations on the objects. In the process, we learnt
the concepts of local coordinate (the coordinate system in which the object is defined)
and world coordinate (the coordinate in which the objects are assembled to construct
the scene).
There are four basic transformations, which are used either individually or in
sequence to perform various transformations on an object. The four are translation,
rotation, scaling, and shearing, which can change the position, shape, and size of an
object.
While we can represent transformations in analytical form, it is more convenient to represent
them as matrices for implementing graphics systems. In order to be able to represent all trans-
formations in matrix form, we use the homogeneous coordinate system, which is essentially an
abstraction and a mathematical trick. 2D transformations are represented as 3 × 3 matrices in
homogeneous coordinate system.
When we compose two or more basic transformations, we follow the right-to-left rule,
meaning that the first transformation is placed as the rightmost, the second transformation is
placed on the left, and so on till the last transformation. Then we multiply the transformation
matrices together to obtain the composite transformation. However, while composing the basic
transformations, it is very important to arrange them in proper sequence. Otherwise, the
composite matrix will be wrong.
The transformations in 3D are performed in almost a similar manner as 2D transformations.
The only notable differences are (a) the matrices are 4 × 4, (b) there are three basic rotation matrices, one for each of the X, Y, and Z axes (as opposed to one in 2D), (c) the way the shearing matrix looks, and (d) the way rotation about any arbitrary axis takes place.
After the first two stages (object definition and geometric transformations), we now know
how to construct a scene. The next task is to assign colors to it, so that it looks realistic. Color
assignment is the next stage of the graphics pipeline, which we shall discuss in the next chapter.
BIBLIOGRAPHIC NOTE
The Graphics Gems series of books (Glassner [1990], Arvo [1991], Kirk [1992], Heckbert [1994]
and Paeth [1995]) contains useful additional information on geometric transformation. Blinn and
Newell [1978] contains discussion on homogeneous coordinates in computer graphics. More
discussion on the use of homogeneous coordinates in computer graphics can be found in Blinn
[1993].
KEY TERMS
Composite transformation – composition (by matrix multiplication) of two/more basic modeling
transformations to obtain a new transformation
Differential scaling – when the amounts of change to an object size along the axial directions
(X , Y , and Z) are not the same
Homogeneous coordinate system – an abstract mathematical technique to represent any
n-dimensional point with an (n + 1)-component vector
Local/Object coordinates – the Cartesian coordinate reference frame used to represent individ-
ual objects
Modeling transformation – transforming objects from local coordinates to world coordinates
through some transformation operation
Rotation – the basic modeling transformation that changes the angular position of an object
Scaling factor – the amount of change of object size along a particular axial direction
Scaling – the basic modeling transformation that changes the size of an object
Scene/World coordinates – the Cartesian coordinate reference frame used to represent a scene
comprising multiple objects
Shearing factor – the amount of change in shape of an object along a particular axial direction
Shearing – the basic modeling transformation that changes the shape of an object
Translation – the basic modeling transformation that changes the position of an object
Uniform scaling – when the amounts of change to an object size along the axial directions (X , Y ,
and Z) are the same
EXERCISES
3.1 What is the primary objective of the modeling transformation stage? In which coordinate
system(s) does it work?
3.2 Discuss how matrix representation helps in implementing modeling transformation in
computer graphics.
3.3 Suppose you want to animate the movement of a pendulum, fixed at the point (0,5). Ini-
tially, the pendulum was on the Y-axis (along −Y direction) with its tip touching the origin.
The movement is gradual with a rate of 10◦ /s. It first moves counterclockwise for 9 s, then
returns (gradually) to its original position, then moves clockwise for 9 s, then again returns
(gradually) to its original position and continues in this manner. Determine the transforma-
tion matrix for the pendulum in terms of t. Use the matrix to determine the position of the
pendulum tip at t = 15s.
3.4 Derive the matrices for the following 2D transformations.
(a) Reflecting a point about origin
(b) Reflecting a point about the X-axis
(c) Reflecting a point about the Y-axis
(d) Reflecting a point about any arbitrary point
3.5 Explain homogeneous coordinate system. Why do we need it in modeling
transformation?
3.6 Although we have treated the shearing transformation as basic, it can be considered as a
composite transformation of rotation and scaling. Derive the shearing transformation matrix
from rotation and scaling matrices.
3.7 Consider a line with end points A(0, 0) and B(1, 1). After applying some transformation on
it, the new positions of the end points have become A′ (0, −1) and B′ (−1, 0). Identify the
transformation matrix.
3.8 A triangle has its vertices at A(1, 1), B(3, 1), and C(2, 2). Modeling transformations are
applied on this triangle which resulted in new vertex positions A′ (−3, 1), B′ (3, 1), and
C′ (2, 0). Obtain the transformation matrix.
3.9 A square plate has vertices at A(1, 1), B(−1, 1), C(−1, 3), and D(1, 3). It is translated by
5 units along the +X-direction and then rotated by 45◦ about P(0, 2). Determine the final
coordinates of the plate vertices.
3.10 Consider an object made up of a triangle ABC with A(1, 2), B(3, 2), and C(2, 3),
on top of a rectangle DEBA with D(1, 1) and E(3, 1). Calculate the new posi-
tion of the vertices after applying the following series of transformations on the
object.
(a) Scaling by half along the X-direction, with respect to the point (2, 2)
(b) Rotation by 90◦ counterclockwise, with respect to the point (3, 1)
3.11 Consider Fig. 3.12. In this figure, the thin circular ring (with radius = 1 unit) is rolling down
the inclined path with a speed of 90◦ /s. Assuming the ring rolls down along a straight
[Fig. 3.12: The thin ring with a surface point p marked, at the top of an inclined path; the points (0,0,20), (0,0,0), and (20,0,0) and the X axis are shown]
line without any deviation, determine the transformation matrix to obtain the coordinate
of any surface point on the ring at time t. Use the matrix to determine the position of
p at t = 10s.
3.12 Consider a sphere of diameter 5 units, initially (time t = 0) centered at the point
(10,0,0). The sphere rotates around the Z-axis counterclockwise with a speed of 1◦ /min.
An ant can move along the vertical great circular track (parallel to the XZ plane) on
the sphere counterclockwise. It is initially (t = 0) located at the point (10,0,5) and
can cover 1 unit distance along the track in 1 sec. (assume π ≈ 3). Determine the
composite matrix for the ant’s movement and use it to compute the ant’s position
at t = 10 min.
CHAPTER 5
Color Models and Texture Synthesis
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the physiological process behind the perception of color
• Learn about the idea of color representation through the use of color models
• Understand additive color models and learn about RGB and XYZ models
• Understand subtractive color models and learn about the CMY model
• Learn about the HSV color model, which is popularly used in interactive painting/drawing
software applications
• Get an overview of the three texture synthesis techniques: projected texture, texture
mapping, and solid texturing
INTRODUCTION
The fourth stage of the graphics pipeline, namely the coloring of 3D points, has many
aspects. One aspect is the use of the lighting models to compute color values. This has
been discussed in Chapter 4. In the related discussions, we mentioned the basic principle
governing color computation, namely that color of a point is a psychological phenomenon
resulting from the intensities of the reflected light incident on our eyes. In this chapter, we
shall see in some detail what happens inside our eye that gives us a sensation of color. Along
with that, we shall also have a look at different color models, that are essentially alternative
representations of color aimed at simplifying color manipulation. Color models are the sec-
ond aspect of understanding the fourth stage. In addition, we shall have some discussion on
texture synthesis, which acts as an improvement of the simple lighting models to generate
more realistic effects.
[Figure: cross-section of the human eye, showing the cornea, aqueous humor, pupil, iris, lens, ciliary muscles, vitreous humor, retina, central fovea, sclera, and optic nerve]
behind it (how it is perceived in our eye). What we can do is to come up with a set of
basic (or primary) colors. We then mix these colors in appropriate amounts to synthesize
the desired color. This gives rise to the notion of color models (i.e., ways to represent and
manipulate colors).
[Fig. 5.2(a): the color-matching amounts fR, fG, and fB of the three primaries plotted against wavelength (400–700 nm); (b): the RGB color cube with Black at (0,0,0), Red at (1,0,0), Green at (0,1,0), Blue at (0,0,1), Yellow at (1,1,0), Cyan at (0,1,1), Magenta at (1,0,1), and White at (1,1,1)]
Fig. 5.2 The RGB model. Part (a) illustrates the basic idea with the three primary light waves and the
amounts in which they must be mixed to generate colors. The 3D cube due to the RGB model is shown in (b).
A color is represented as a point in the cube, corresponding to specific amounts of red, green, and blue.
Since there are three primaries, we can think of a color as a point in a three dimensional
color space. The three axes of the space correspond to the three primary colors. If we are
using the normalized values of the colors (within the range [0, 1]), then we can visualize the
RGB model as a 3D color cube, as shown in Fig. 5.2(b). The cube is the color gamut (i.e., set
of all possible colors) that can be generated by the RGB model. The origin, or the absence of
the primaries, represents the black color, whereas we get the white color when all the primaries
are present in equal amounts. The diagonal connecting the black and white colors represents
the shades of gray. The yellow color is produced when only the red and green colors are
added in equal amounts in the absence of blue. Addition of only blue and green in equal
amounts with no red produces cyan, while the addition of red and blue in equal amounts
without green produces magenta.
[Fig. 5.3: the color-matching amounts of the X, Y, and Z primaries plotted against wavelength (390–710 nm)]
Fig. 5.3 The figure illustrates the three hypothetical light waves designed for the XYZ model
and the amounts in which they should be mixed to produce a color in the visible range.
Compare this figure with Fig. 5.2(a) and notice the difference.
C = X X̂ + Y Ŷ + Z Ẑ (5.1)
For convenience, the amounts X , Y , and Z of the primary colors used to generate C
are represented in normalized forms. Calculations of the normalized forms are shown
in Eq. 5.2.
\[
x = \frac{X}{X + Y + Z} \qquad (5.2a)
\]
\[
y = \frac{Y}{X + Y + Z} \qquad (5.2b)
\]
\[
z = \frac{Z}{X + Y + Z} \qquad (5.2c)
\]
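The normalization of Eq. 5.2 is a one-liner in code; a small illustrative sketch (function name ours):

def chromaticity(X, Y, Z):
    # Normalized chromaticity coordinates (Eq. 5.2); note that x + y + z = 1
    s = X + Y + Z
    return X / s, Y / s, Z / s

print(chromaticity(0.3, 0.4, 0.3))   # approximately (0.3, 0.4, 0.3)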
[Figure: the horseshoe-shaped spectral locus labelled with wavelengths from about 400 nm (violet) through the blue, cyan, green, yellow, orange, and red regions up to 700 nm, the line of purples joining its two ends, and the achromatic point E at (x = 1/3, y = 1/3)]
Fig. 5.4 CIE chromaticity diagram for the visible color spectrum
[Figure: the RGB color gamut shown as a triangle with vertices R, G, and B inside the chromaticity diagram, together with the white point and the line of purples]
The straight line joining the two ends of the spectral curve, known as the purple line, is not part of the spectrum. Interior points in the diagram represent all possi-
ble visible colors. Therefore, the diagram is a 2D representation of the XYZ color gamut.
The point E represents the white-light position (a standard approximation for average
daylight).
reflected light from the points containing the pigments comes to our eyes, giving us the per-
ception of color. This process is, however, not additive. The color perception results from
the subtraction of primaries.
We can form a subtractive color model with the primaries cyan, magenta, and yellow.
Consequently, the model is called the CMY model. The primary cyan is a combination of
blue and green (see Fig. 5.2b). Thus, when white light is reflected from a cyan pigment on a
paper, the reflected light contains these two colors only and the red color is absorbed (sub-
tracted). Similarly, green component is subtracted by the magenta pigment and the primary
yellow subtracts the blue component of the incident light. Therefore, we get the color due to
the subtraction of the red, green, and blue components from the white (reflected) light by the
primaries.
We can depict the CMY model as a color cube in 3D in the same way we did it for the
RGB model. The cube is shown in Fig. 5.5. Note how we can find the location of the cor-
ner points. Clearly, the origin (absence of the primaries) represents white light. When all the
primaries are present in equal amounts, we get black since all the three components of
the white light (red, green, and blue) are absorbed. Thus, the points on the diagonal joining
white and black represent different shades of gray. When only cyan and magenta are present
in equal amount without any yellow color, we get blue because red and green are absorbed.
Similarly, presence of only cyan and yellow in equal amount without any magenta color
results in green and the red color results from the presence of only the yellow and magenta
in equal amounts without cyan.
The CMY model for hardcopy devices is implemented using an arrangement of three ink
dots, much in the same way three phosphor dots are used to implement the RGB model
on a CRT. However, sometimes four ink dots are used instead of three, with the fourth
dot representing black. In such cases, the color model is called the CMYK model with
K being the parameter for black. For black and white or gray-scale images, the black dot
is used.
[Figure: the CMY color cube, with White at the origin (0,0,0), Cyan at (1,0,0), Yellow at (0,0,1), Green at (1,0,1), Red at (0,1,1), and Black at (1,1,1)]
Fig. 5.5 CMY color cube—Any color within the cube can be described by subtracting the
corresponding primary values from white light
We can convert the CMY color representation to the RGB representation and vice versa,
through simple subtraction of column vectors. In order to convert CMY representation to
RGB, we use the following subtraction.
\[
\begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
The opposite conversion (from RGB to CMY representation) is done in a likewise manner,
as given here.
\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} C \\ M \\ Y \end{bmatrix}
\]
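In code, the conversion is equally direct. A minimal sketch (function names ours), assuming all components are normalized to [0, 1]:

def rgb_to_cmy(r, g, b):
    # CMY = (1, 1, 1) - RGB
    return 1.0 - r, 1.0 - g, 1.0 - b

def cmy_to_rgb(c, m, y):
    # RGB = (1, 1, 1) - CMY
    return 1.0 - c, 1.0 - m, 1.0 - y

print(rgb_to_cmy(1.0, 0.0, 0.0))   # red light -> (0.0, 1.0, 1.0), i.e., magenta and yellow ink
print(cmy_to_rgb(1.0, 0.0, 0.0))   # pure cyan ink -> (0.0, 1.0, 1.0), i.e., cyan light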
[Figure: the HSV hexcone. The hue H is the angle around the central V axis (Red at 0°, Green at 120°, Blue at 240°, with Yellow, Cyan, and Magenta in between); the saturation S runs from 0 at the axis to 1 at the boundary; the value V runs from 0.0 at the Black apex to 1.0 at the top hexagon, whose center is White.]
Fig. 5.6 HSV color space. Movement from the boundary towards the center on the same plane
represents tinting; movement across planes parallel to the V axis represents shading; and
movement across planes from the boundary towards the center represents toning.
plane, we are reducing S. Reduction in S value is equivalent to the tinting process. Alterna-
tively, when we move from a point on any cross-sectional plane (same or parallel to the top
hexagon) towards the hexcone apex, we are only changing V. This is equivalent to the shad-
ing process. Any other movement (from boundary to the center across planes) in the hexcone
represents the toning process.
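Many standard libraries already provide RGB–HSV conversion; the Python standard library's colorsys module is one example (it reports hue in the range [0, 1] rather than in degrees). The small sketch below is illustrative only:

import colorsys

print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))   # pure red -> (0.0, 1.0, 1.0): hue 0, full S, full V
print(colorsys.rgb_to_hsv(1.0, 0.5, 0.5))   # tinting: saturation drops to 0.5, hue and value unchanged
print(colorsys.rgb_to_hsv(0.5, 0.0, 0.0))   # shading: value drops to 0.5, hue and saturation unchanged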
Fig. 5.7 A block made from wood. Notice the patterns (texture) on the surface. It is not
possible to generate such patterns with only the lighting model.
MIPMAP
MIPMAPs are special types of projected texturing method. MIP stands for Multum In Parvo or many
things in a small space. In this technique, a series of texture maps with decreasing resolutions, for the
same texture image, are stored, as illustrated in the following figure.
[Figure: the original texture followed by successively smaller versions at 1/4, 1/16, 1/64, and so on of its size, down to a single pixel]
We can broadly categorize the various texture synthesis techniques into the following
three types1 .
1. Projected texture
2. Texture mapping
3. Solid texture
1 The names of the categories, however, are not standard. You may find different names used to denote the
same concepts.
Example 5.1
Texture mapping method
Consider the situation shown in Fig. 5.8. On the left side is a (normalized) texture map defined in
the (u, w) space. This map is to be ‘pasted’ on a 50 × 50 square area in the middle of the object
surface as shown in the right side figure. What are the linear mappings we should use?
[Fig. 5.8: On the left, the normalized texture map defined in the (u, w) space with u, w ranging from 0 to 1.0; on the right, the cube of side 100 units whose face at z = 100 (parallel to the XY plane) contains the target square area in its middle.]
Solution As we can see, the target surface is a square of side 100 units, on a plane parallel to the
XY plane. Therefore, the parametric representation of the target area (middle of the square) is,
x = θ with 25 ≤ θ ≤ 75
y = φ with 25 ≤ φ ≤ 75
z = 100
Now let us consider the relationships between the parameters in the two spaces with respect to the
corner points.
The point u = 0, w = 0 in the texture space is mapped to the point θ = 25, φ = 25.
The point u = 1, w = 0 in the texture space is mapped to the point θ = 75, φ = 25.
The point u = 0, w = 1 in the texture space is mapped to the point θ = 25, φ = 75.
The point u = 1, w = 1 in the texture space is mapped to the point θ = 75, φ = 75.
We can substitute these values into the linear mappings θ = Au + B, φ = Cw + D to determine
the constant values. The values thus determined are: A = 50, B = 25, C = 50, and D = 25 (left as
an exercise for the reader). Therefore, the mappings we can use to synthesize the particular texture
on the specified target area on the cube surface are θ = 50u + 25, φ = 50w + 25.
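The mapping of Example 5.1 can be written as a tiny function (names ours); running it over the texture-space corners reproduces the corner correspondences listed above:

def texel_to_surface(u, w):
    # Linear mappings of Example 5.1: theta = 50u + 25, phi = 50w + 25, on the plane z = 100
    theta = 50 * u + 25
    phi = 50 * w + 25
    return theta, phi, 100

for u, w in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((u, w), '->', texel_to_surface(u, w))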
This idea is implemented in the projected texture method. We create a texture image, also
known as texture map from synthesized or scanned images. The map is a 2D array of color
values. Each value is called a texel. There is a one-to-one correspondence between the texel
array and the pixel array. We now replace the pixel color values with the corresponding texel
values to mimic the ‘pasting of the texture on the surface’. The replacement can be done in
one of the following three ways.
1. We can replace the pixel color value on a surface point with the corresponding texel value.
This is the simplest of all.
2. Another way is to blend the pixel and texel values. Let C be the color after blending the
pixel value Cpixel and the texel value Ctexel. Then, we can use the following equation for
smooth blending of the two: C = (1 − k)Cpixel + kCtexel, where 0 ≤ k ≤ 1 (a code sketch of this blending follows the list).
3. Sometimes a third approach is used in which we perform a logical operation (AND, OR)
between the two values (pixel and texel) represented as bit strings. The outcome of the
logical operation is the color of the surface point.
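A minimal sketch of the blending rule in item 2, assuming colors are (R, G, B) triples in [0, 1] (function name ours):

def blend(c_pixel, c_texel, k):
    # Smooth blend C = (1 - k) * C_pixel + k * C_texel, with 0 <= k <= 1
    return tuple((1 - k) * p + k * t for p, t in zip(c_pixel, c_texel))

# k = 0 keeps the lit surface color; k = 1 replaces it entirely with the texel
print(blend((0.8, 0.2, 0.2), (0.4, 0.4, 0.4), 0.5))   # approximately (0.6, 0.3, 0.3)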
Projected texture method is suitable when the target surfaces are relatively flat and facing
the reference plane (roughly related to the screen, as we shall see later). However, for curved
surfaces, it is not very useful and we require some other method, as discussed in the next
section.
Fig. 5.9 An example situation where solid texturing method is required. The white lines
show the surface boundaries. Note the continuation of the texture patterns across adjacent
surfaces.
SUMMARY
In this chapter, we have discussed the fundamental idea behind the sensation of color, through a
brief discussion on the physiology of vision. We learnt that the three cone type photoreceptors
are primarily responsible for our perception of color, which gives rise to the Tristimulus theory
of color. We also learnt that the existence of metamers, which are the various spectra related
to the generation of a color, makes it possible to synthesize colors artificially, without mimicking
the exact natural process.
Next, we discussed the idea of representing colors through the color models. These
models typically use a combination of three primary colors to represent any arbitrary color. Two
types of combinations are used in practice: additive and subtractive. We discussed two additive
color models, namely the RGB and the XYZ models. The CMY model is discussed to illustrate
the idea of the subtractive model. Finally, we discussed the HSV model, which is primarily used
to design user interfaces for interactive painting/drawing applications.
The third topic we learnt about in this chapter is the synthesis of textures/patterns on a surface, to
make the surfaces look realistic. Three types of texture synthesis techniques are introduced. In
the projected texture method, a texture pattern (obtained from a synthesized or scanned image)
is imposed on a surface through the use of blending functions. In the texture mapping technique,
a texture map/pattern defined in a two-dimensional space is mapped to the object space. The
solid texturing method extends the texture mapping idea to three dimensions, in which a 3D
texture pattern is mapped to object surfaces in three dimensions.
Once the coloring is done, we transform the scene defined in the world coordinate system
to the eye/camera coordinate system. This transformation, known as the view transformation,
is the fourth stage of the 3D graphics pipeline, which we shall learn in Chapter 6.
BIBLIOGRAPHIC NOTE
More discussion on human visual system and our perception of light and color can be found
in Glassner [1995]. Wyszecki and Stiles [1982] contains further details on the science of color.
Color models and its application to computer graphics are described in Durrett [1987], Hall
[1989] and Travis [1991]. Algorithms for various color applications are presented in the
Graphics Gems series of books (Glassner [1990], Arvo [1991], Kirk [1992], Heckbert [1994] and
Paeth [1995]). More on texture-mapping can be found in Demers [2002].
KEY TERMS
Additive model – a color model that represents arbitrary colors as a sum (addition) of primary
colors
CIE chromaticity diagram – the range of all possible colors represented in two dimensions
CMY color model – a subtractive color model in which the cyan, magenta, and yellow are the
primary colors
Color gamut – the set of all possible colors that can be represented by a color model
Color models – the ways to represent and manipulate colors
Cones – one type of photoreceptors that help in image resolution or acuity
HSV color model – a color model typically used to design the user interfaces in painting
applications
Metamerism – the phenomenon that color perception can result from different spectra
Metamers – different spectra that give rise to the perception of the same color
MIPMAP – Multum In Parvo Mapping, which is a special type of projected texturing technique
Photoreceptors – the parts of the eye that are sensitive to light
Primary colors – the basic set of colors in any color model, which are combined together to
represent arbitrary colors
Projected texture – the technique to synthesize texture on a surface by blending the surface color
with the texture color
RGB color model – an additive color model in which the red, green, and blue colors are the three
primary colors
Rods – one type of photoreceptors that are sensitive to lower levels of light
Solid texturing – the texture mapping technique applied in three dimensions
Subtractive model – a color model that represents arbitrary colors as a difference (subtraction)
of primary colors
Texel – the color of each pixel in the texture map grid
Texture image/texture map – a grid of color values obtained from a synthesized/scanned image
Texture mapping – the technique to synthesize texture on a surface by mapping a texture defined
in the texture space to the object space
Tristimulus theory of vision – the theory that color perception results from the activation of the
three cone types together
Visible light – a spectrum of frequencies of the electromagnetic light wave (between 400 nm and
700 nm wavelength)
XYZ color model – an additive standardized color model in which there are three hypothetical
primary colors denoted by the letters X, Y, and Z.
EXERCISES
5.1 Explain the process by which we sense colors.
5.2 It is not possible to exactly mimic the lighting process that occurs in nature. If so, how does
computer graphics work?
5.3 What is the basis for developing models with three primary colors such as RGB?
5.4 Discuss the limitation of the RGB model that is overcome with the XYZ model.
5.5 Briefly discuss the relationship between the RGB and the CMY models. When is the CMY
model useful?
5.6 Explain the significance of using the HSV model instead of the RGB or CMY models.
5.7 Mention the three broad texture synthesis techniques.
5.8 Explain the key idea of the projected texture method. How are the texels and pixels
combined?
5.9 Explain the concept of MIPMAP. How is it different from projected texture methods?
5.10 Discuss how texture mapping works. In what respect is it different from projected texture
methods? When do we need it?
5.11 Discuss the basic idea behind solid texturing. When do we use it?
5.12 Consider a cube ABCDEFGH with side length 8 units; E is at origin, EFGH on the XY plane,
AEHD on the YZ plane and BFEA on the XZ plane (defined in a right-handed coordinate
system). The scene is illuminated with an ambient light with an intensity of 0.25 units and
a light source at location (0,0,10) with an intensity of 0.25 unit. An observer is present at
(5,5,10). Assume ka = kd = ks = 0.25 and ns = 10 for the surfaces. We want to render
a top view of the cube on the XY plane. A texture map is defined in the uw space as a
circle with center at (1.0,1.0) and radius 1.0 unit. Any point p(u,w) within this circle has
intensity u/(u + w). We need to map this texture on a circular region of radius 3 units in the mid-
dle of the top of the cube. What would be the color of the points P1(4,2,8) and P2(3,1,8),
assuming Gouraud shading (ignore attenuation)? [Hint: See Example 4.1 in Chapter 4 and
Example 5.1].
CHAPTER 6
3D Viewing
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the 3D viewing transformation stage and its importance in computer
graphics
• Set-up the 3D viewing coordinate reference frame
• Understand the mapping from the world coordinate frame to the viewing coordinate
frame
• Get an overview of the parallel and perspective projections with subcategories
• Learn to perform parallel projection of a 3D scene in the view coordinate frame to the
view plane
• Learn to perform perspective projection of a 3D scene in the view coordinate frame to
the view plane
• Understand the concept of canonical view volumes
• Learn to map objects from clipping window to viewport
INTRODUCTION
Let us recollect what we have learnt so far. First, we saw how to represent objects of varying
complexities in a scene (Chapter 2). Then, we saw how to put those objects together through
modeling transformations (Chapter 3). Once the objects are put together to synthesize the
scene in the world coordinate system, we learnt how to apply colors to make the scene real-
istic (Chapters 4 and 5). All these discussions up to this point, therefore, equipped us to
synthesize a realistic 3D scene in the world coordinate system. When we show an image on
a screen, however, we are basically showing a projection of a portion of the 3D scene.
The process is similar to that of taking a photograph. The photo that we see is basically a
projected image of a portion of the 3D world we live in. In computer graphics, this process
is simulated with a set of stages. The very first of these stages is to transform the 3D world
coordinate scene to a 3D view coordinate system (also known as the eye or camera coordi-
nate system). This process is generally known as the 3D viewing transformation. Once this
transformation is done, we then project the transformed scene onto the view plane. From the
view plane, the objects are projected onto a viewport in the device coordinate system. In this
chapter, we shall discuss the 3D viewing, projection, and viewport transformation
stages.
[Figure: a camera and an object placed in the world coordinate system (x_world, y_world, z_world), with the view coordinate axes (x_view, y_view, z_view) attached to the camera]
Fig. 6.1 Visualization of the view coordinate system, defined by the mutually orthogonal
xview , yview , and zview axes. The xworld , yworld , and zworld axes define the world coordinate
system.
[Figure: the camera position, the look-at point, the view-up point and view-up vector, the view plane, and the vector n, shown with respect to the world coordinate origin]
Fig. 6.2 Illustration of the determination of the three basis vectors for the view coordinate
system
How do we set up the view coordinate system? The first thing is to determine the origin,
where the three axes meet. This is easy. We assume that the camera is represented
as a point in the 3D world coordinate system. We simply choose this point as our origin
(denoted by o). However, determining the vectors u, v, and n that meet at this origin is
tricky.
When we try to bring something into focus with our camera, the first thing we do is to
choose a point (in the world coordinate system). This is the center of interest or the look-at point
(denoted by p). Then, using simple vector algebra, we can see that n = o − p, as depicted in
Fig. 6.2. Finally, we normalize n as n̂ = n/|n| to get the unit basis vector.
Next, we specify an arbitrary point (denoted by p_up) along the direction of our head while
looking through the camera. This is the view-up direction. With this point, we determine the
view-up vector V_up = p_up − o (see Fig. 6.2). Then, we get the unit basis vector v̂ = V_up/|V_up|.
We know that the unit basis vector û is perpendicular to the plane spanned by n̂ and
v̂ (see Fig. 6.2). Hence, û = v̂ × n̂ (i.e., the vector cross-product assuming a right-
handed coordinate system). Since both n̂ and v̂ are unit vectors, we do not need any further
normalization.
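A compact sketch of this construction in code (NumPy assumed; the function name and the sample inputs are ours). Note that np.cross follows the right-hand rule, matching the formula û = v̂ × n̂ above; the sketch also assumes, as in the text, that the supplied up direction is already perpendicular to n.

import numpy as np

def view_basis(camera, look_at, up_point):
    # Unit basis vectors (u, v, n) of the view coordinate system
    o = np.asarray(camera, dtype=float)
    n = o - np.asarray(look_at, dtype=float)      # n points from the look-at point towards the camera
    n = n / np.linalg.norm(n)
    v_up = np.asarray(up_point, dtype=float) - o  # view-up vector
    v = v_up / np.linalg.norm(v_up)
    u = np.cross(v, n)                            # u = v x n
    return u, v, n

# Camera at (0, 0, 5) looking at the origin, head pointing towards +Y
print(view_basis((0, 0, 5), (0, 0, 0), (0, 1, 5)))   # approximately (1,0,0), (0,1,0), (0,0,1)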
Example 6.1
Consider Fig. 6.3. We are looking at the square object with vertices A(2,1,0), B(2,3,0), C(2,3,3)
and D(2,1,3). The camera is located at the point (1,2,2) and the look-at point is the cen-
ter of the object (2,2,2). The up direction is parallel to the positive z direction. What is
the coordinate of the center of the object, after its transformation to the viewing coordinate
system?
Solution First, we determine the three unit basis vectors for the viewing coordinate system.
[Fig. 6.3: The square object with vertices A(2,1,0), B(2,3,0), C(2,3,3), and D(2,1,3); its center (2,2,2) and the camera position (1,2,2) are marked in the world coordinate frame]
The camera position o is (1,2,2) and the look-at point p is (2,2,2). Therefore n = o − p
= (−1, 0, 0) = n̂.
Since it is already mentioned that the up direction is parallel to the positive z direction, we can
directly determine that v̂ = (0, 0, 1) without any further calculations. Note that this is another way
of specifying the up vector (instead of specifying a point in the up direction and computing the
vector).
Finally, the cross product of the two vectors (i.e., v̂ × n̂) gives us û = (0, 1, 0).
After the three basis vectors are determined, we compute the transformation matrix Mw2v, which
is a composition of translation and rotation (i.e., Mw2v = R.T).
Since the camera position is (1,2,2), we have
\[
T = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
From the unit basis vectors that we have already derived [i.e., n̂(−1, 0, 0), û(0, 1, 0), v̂(0, 0, 1)],
we have
\[
R = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
Therefore,
\[
M_{w2v} = R.T = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & -2 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
The object center is at (2,2,2). Therefore,
\[
P' = M_{w2v} P = \begin{bmatrix} 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & -2 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 2 \\ 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}
\]
In other words, the object center gets transformed to the point (0,0,−1) in the view coordinate
system.
[Figure: an arbitrary point P in the world coordinate system, together with the camera position, the view plane with its u and v axes, and the vector n, shown relative to the world coordinate origin]
Fig. 6.4 Visualization of the transformation process. P is any arbitrary point, which has to
be transformed to the view coordinate system. This requires translation and rotation.
Our objective is to determine the matrix Mw2v which, when multiplied with P, gives the transformed point P′ in the view coordinate
system (i.e., P′ = Mw2v.P).
In order to do so, we need to find out the sequence of transformations required to align
the two coordinate systems. We can achieve this by two operations: translation and rotation.
We first translate the view coordinate origin to the world coordinate origin. The necessary
translation matrix T is (in homogeneous form),
\[
T = \begin{bmatrix} 1 & 0 & 0 & -o_{vx} \\ 0 & 1 & 0 & -o_{vy} \\ 0 & 0 & 1 & -o_{vz} \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
Then, we rotate the view coordinate frame to align it with the world coordinate frame.
The rotation matrix R is (in homogeneous form),
\[
R = \begin{bmatrix} u_x & u_y & u_z & 0 \\ v_x & v_y & v_z & 0 \\ n_x & n_y & n_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
This sequence is then applied in reverse order on P. Thus, we get Mw2v = R.T. Hence,
P′ = (R.T)P.
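For the numbers of Example 6.1, the assembly of Mw2v = R.T can be sketched as follows (NumPy assumed; variable names ours):

import numpy as np

u = np.array([0, 1, 0]); v = np.array([0, 0, 1]); n = np.array([-1, 0, 0])
o = np.array([1, 2, 2])                  # camera position (view coordinate origin)

T = np.eye(4)
T[:3, 3] = -o                            # translate the view origin to the world origin

R = np.eye(4)
R[0, :3], R[1, :3], R[2, :3] = u, v, n   # rows of R are the view basis vectors

M_w2v = R @ T
P = np.array([2, 2, 2, 1])               # the object center in world coordinates
print(M_w2v @ P)                         # approximately [0, 0, -1, 1], as in Example 6.1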
6.2 PROJECTION
When we see an image on a screen, it is two-dimensional (2D). The scene in the view coor-
dinate system, on the other hand, is three-dimensional (3D). Therefore, we need a way to
transform a 3D scene to a 2D image. The technique to do that is projection. In general, pro-
jection allows us to transform objects from n dimensions to (n − 1) dimensions. However,
we shall restrict our discussion to projections from 3D to 2D.
In computer graphics, we project the 3D objects onto the 2D view plane (see Fig. 6.2). In
order to do that, we define an area (usually rectangular) on the view plane that contains the
projected objects. This area is known as the clipping window. We also define a 3D volume
in the scene, known as the view volume. Objects that lie inside this volume are projected on
the clipping window. Other objects are discarded (through the clipping process that we shall
discuss in Chapter 7). Note that the entire scene is not projected; instead, only a portion of
it enclosed by the view volume is projected. This approach gives us flexibility to synthesize
images as we want. The trick lies in choosing an appropriate view volume, for which we
require an understanding of the different types of projections.
[Figure: the two basic projection types. In (a), the projectors are parallel and share a common direction of projection onto the view plane; in (b), the projectors converge at a point of projection.]
Fig. 6.6 The different types of anomalies associated with perspective projection. Foreshortening is
depicted in (a). Note the projected points (A′ ,B′ ) for object AB and (C′ ,D′ ) for object CD, although both
are of the same size. In (b), the concept of vanishing points is illustrated. View confusion is illustrated
in (c).
Since the projectors converge at a point, perspective projection gives rise to several
anomalies (i.e., the appearance of the object in terms of shape and size gets changed).
Perspective foreshortening If two objects of the same size are placed at different distances
from the view plane, the distant object appears smaller than the near objects (see Fig. 6.6(a)).
Vanishing points Lines that are not parallel to the view plane appear to meet at some point
on the view plane after projection. The point is called vanishing point (see Fig. 6.6(b)).
View confusion If the view plane is behind the center of projection, objects in front of the
center of projection appear upside down on the view plane after projection (see Fig. 6.6(c)).
As you can see, the anomalies actually help in generating realistic images since this is the
way we perceive objects in the real world. In contrast, the shape and size of objects are
preserved in parallel projection. Consequently, such projections are not used to generate
realistic scenarios (such as in computer games or animations). Instead, they are more useful
for graphics systems that deal with engineering drawings (such as CAD packages).
Although we have mentioned two broad categories of projection, there are many sub-
categories under parallel and perspective projection. The complete taxonomy of projections
is shown in Fig. 6.7.
[Fig. 6.7: Taxonomy of projections. Parallel projection is subdivided into orthographic (multi-view and axonometric) and oblique (cavalier and cabinet); perspective projection is subdivided into one-point, two-point, and three-point.]
When projectors are perpendicular to the view plane, the resulting projection is called
orthographic. There are broadly two types of orthographic projections. In multiview ortho-
graphic projection, principal object surfaces are parallel to the view plane. Three types of
principal surfaces are defined: top (resulting in the top view), front (resulting in the front view), and
side (resulting in the side view). The three views are illustrated in Fig. 6.8.
In contrast, no principal surface (top, front, or side) is parallel to the view plane in
axonometric orthographic projection. Instead, they are at certain angles with the view plane
(see Fig. 6.9). Note that the principal faces can make three angles with the view plane as
Fig. 6.8 Three types (top, side, and front view) of multiview orthographic projections
[Figure: principal surfaces making angles α, β, and θ with the view plane in an axonometric view]
Fig. 6.9 Axonometric orthographic projection where principal surfaces are at certain angles
with the view plane
illustrated in Fig. 6.9. Depending on how many of these angles are equal to each other, three
types of axonometric projections are defined: isometric, when all the three angles are equal
to each other; dimetric, when two of the three angles are equal to each other; and trimetric,
when none of the angles is equal to the other.
When the projectors are not perpendicular to the view plane (but parallel to each other),
we get oblique parallel projection, as illustrated in Fig. 6.10. In oblique projection, if lines
Fig. 6.10 Oblique parallel projection where projectors are not perpendicular to the view
plane
Fig. 6.11 Three types of perspective projections, depending on the number of vanishing
points (a) One-point (b) Two-point (c) Three-point
perpendicular to the view plane are foreshortened by half after projection, it is known as
cabinet projection. When there is no change in the perpendicular lines after projection, it is
called cavalier projection.
Recall that in perspective projection, we have the idea of vanishing points. These are
basically perceived points of convergence of lines that are not parallel to the view plane.
Depending on the orientation of the object with respect to the view plane, we can have
between one to three vanishing points in the projected figure. Depending on the number of
vanishing points, we define perspective projections as one-point (see Fig. 6.11a), two-point
(see Fig. 6.11b) or three-point (see Fig. 6.11c).
In the next section, we shall outline the fundamental concepts involved in computing
projected points on the view plane. However, we shall restrict our discussion to the two
broad classes of projections. Details regarding the individual projections under these broad
classes will not be discussed, as such details are not necessary for our basic understanding
of projection. In all subsequent discussions, the term parallel projection will be used to refer
to parallel orthographic projections only.
For parallel projection onto the view plane z = −d, the x and y coordinates of a point are unchanged
while z′ = −d. In homogeneous coordinate form,
\[
T_{par} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -d \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
Fig. 6.12 The shape of the view volumes for the two basic projection types. The parallelepiped
in (a) is used for parallel projection and the frustum in (b) is used for perspective
projection.
[Figure: the near plane (clipping window) at a distance d from the view coordinate origin along the −Z direction]
Fig. 6.13 Illustration for derivation of the transformation matrix for parallel projection
[Figure: a point P(X, Y, Z) projected to P′(X′, Y′, Z′) on the near plane (clipping window), which lies at a distance d from the center of projection]
Fig. 6.14 Illustration for the derivation of the transformation matrix for perspective projection
Using similar triangles (see Fig. 6.14), we can express the projected x-coordinate x′ as x′ = x(d/z);
the y-coordinate is obtained likewise as y′ = y(d/z). It is obvious that z′ = −d. We can use these expressions to construct the
transformation matrix in homogeneous coordinate form as,
\[
T_{psp} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & \frac{1}{d} & 0 \end{bmatrix}
\]
\[
P'' = \begin{bmatrix} x'' \\ y'' \\ z'' \\ w \end{bmatrix} = T_{psp} P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & \frac{1}{d} & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]
As in the case of parallel projection, we need to compute the coordinates of the projected
point as x′ = x′′/w, y′ = y′′/w, and z′ = z′′/w, since the transformation matrix is in homogeneous
form.
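Both projection transforms can be sketched compactly in code (NumPy assumed; function names ours). The final division by the homogeneous factor w is made explicit:

import numpy as np

def parallel_proj(d):
    # Parallel (orthographic) projection onto the view plane z = -d
    return np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, -d],
                     [0, 0, 0, 1]], dtype=float)

def perspective_proj(d):
    # Perspective projection onto z = -d, center of projection at the origin
    return np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, -1, 0],
                     [0, 0, 1 / d, 0]], dtype=float)

def project(T, point):
    # Apply the projection matrix and divide by the homogeneous factor w
    x, y, z, w = T @ np.append(np.asarray(point, dtype=float), 1.0)
    return np.array([x, y, z]) / w

print(project(parallel_proj(0.5), (0, 0, -1)))      # Example 6.2: approximately [0, 0, -0.5]
print(project(perspective_proj(0.5), (0, 0, -1)))   # Example 6.3: approximately [0, 0, -0.5]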
Example 6.2
What would be the coordinates of the object center in Example 6.1 on a view plane z = −0.5, if
we want to synthesize images of the scene with parallel projection? Assume that the view volume is
sufficiently large to encompass the whole transformed object.
Solution The coordinates of the point in the view coordinate system are (0, 0, −1). We also know
that the transformation matrix for parallel projection onto the plane z = −0.5 is
\[
T_{par} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -0.5 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
\[
T_{par} P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -0.5 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ -0.5 \\ 1 \end{bmatrix}
\]
In other words, the point will be projected to (0,0,−0.5) on the view plane.
Example 6.3
What would happen to the point if we perform a perspective projection on the view plane z = −0.5,
with the view coordinate origin as the center of projection? Assume that the view volume is
sufficiently large to encompass the whole transformed object.
Solution We proceed as before. The transformation matrix for perspective projection is
\[
T_{psp} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & \frac{1}{0.5} & 0 \end{bmatrix}
\]
The point before projection is at (0,0,−1). The new coordinate of the point after projection will be,
\[
T_{psp} P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & \frac{1}{0.5} & 0 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 1 \\ -2 \end{bmatrix}
\]
The derived point is in homogeneous form as before. However, we have −2 as the homogeneous
factor. Thus, the projected point is (0/−2, 0/−2, 1/−2). In other words, the point will be projected to
(0, 0, −0.5) on the view plane.
respect to a standard view volume. This standard volume is known as the canonical view
volume (CVV).
Figure 6.15 shows the CVV for parallel projection. Note that the CVV is a cube within
the range [−1,1] along the x, y, and z directions1 . As you can see, any arbitrary view vol-
ume can be transformed to the CVV using the scaling operations along the three axial
directions.
Canonical view volume for perspective projection is a little tricky. For ease of clipping
computations, perspective view frustums are also transformed to parallel CVV (i.e., the
cube within the range [−1,1] along the x, y, and z directions). Clearly, this transforma-
tion is not as straightforward as in the case of parallel projection and involves composition
of two types of modeling transformations: shearing and scaling. The idea is illustrated
in Fig. 6.16.
With the idea of CVV, let us now try to understand the sequence of transformations that
take place when a point P in the world coordinate is projected on the view plane. First, it
gets transformed to the view coordinate system. Next, the view volume in which the point
lies is transformed to CVV. Finally, the point in the CVV is projected. In matrix notation,
we can write this series of steps as: P′ = Tproj .Tcvv .TVC P, where Tproj is the projection
transformation matrix, Tcvv is the matrix for transformation to the canonical view volume
and TVC is the matrix for transformation to the view coordinate.
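To make the chain concrete, the following sketch (NumPy assumed) strings together the world-to-view matrix of Example 6.1 and the parallel projection of Example 6.2, taking Tcvv to be the identity since those examples assume the view volume needs no further normalization:

import numpy as np

M_w2v = np.array([[ 0, 1, 0, -2],
                  [ 0, 0, 1, -2],
                  [-1, 0, 0,  1],
                  [ 0, 0, 0,  1]], dtype=float)   # world -> view (Example 6.1)
T_cvv = np.eye(4)                                 # assumed identity for this illustration
T_par = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, -0.5],
                  [0, 0, 0, 1]], dtype=float)     # parallel projection onto z = -0.5

P = np.array([2, 2, 2, 1], dtype=float)           # object center in world coordinates
print(T_par @ T_cvv @ M_w2v @ P)                  # approximately [0, 0, -0.5, 1]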
[Figure: a cube with corners ranging from (−1, −1, 1) to (1, 1, −1) along the X, Y, and Z axes]
Fig. 6.16 The canonical view volume for perspective projection. Note that the frustum should
be sheared along x and y directions and scaled along z direction to obtain the CVV.
1 Another variation is also used for parallel projection in which the CVV is a unit cube that lies within the
range [0,1] along each of the three axial directions.
Window This is the same as (normalized) clipping window. The world coordinate objects
that we want to display are projected on this window.
Viewport The objects projected on the window may be displayed on the whole screen or
a portion of it. The rectangular region on the screen, on which the content of the window is
rendered, is known as the viewport.
Fig. 6.17 The difference between window and viewport. Window contains the objects to be
displayed (left figure). Viewport is the region on the screen, where the window contents are
displayed (right figure).
Fig. 6.18 Window-to-viewport mapping. Note that the window coordinates are generalized
in the illustration instead of the normalized coordinates.
Note that the viewport is defined in the device space. In other words, it is defined with
respect to the screen origin and dimensions. So, one more transformation is required to
transfer points from the window (in the view coordinate system) to the viewport (in the
device coordinate system). Let us try to understand the derivation of the transformation
matrix.
Consider Fig. 6.18. The point (Wx, Wy) in the window is transformed to the viewport
point (Vx, Vy). The window lies within [Wx1, Wx2] along the X axis and [Wy1, Wy2]
along the Y axis. The viewport ranges between [Vx1, Vx2] and [Vy1, Vy2] along the X
and Y directions, respectively. In order to maintain the relative position of the point in the
viewport, we must have,
\[
\frac{Wx - Wx1}{Wx2 - Wx1} = \frac{Vx - Vx1}{Vx2 - Vx1}
\]
This relation can be rewritten as,
\[
Vx = sx.Wx + tx
\]
where
\[
sx = \frac{Vx2 - Vx1}{Wx2 - Wx1} \quad \text{and} \quad tx = sx.(-Wx1) + Vx1.
\]
Maintenance of the relative position of the point in the viewport also implies that,
\[
\frac{Wy - Wy1}{Wy2 - Wy1} = \frac{Vy - Vy1}{Vy2 - Vy1}
\]
From this relation, we can derive the y-coordinate of the point in the viewport, Vy, in a way
similar to that of Vx as,
\[
Vy = sy.Wy + ty
\]
where
\[
sy = \frac{Vy2 - Vy1}{Wy2 - Wy1} \quad \text{and} \quad ty = sy.(-Wy1) + Vy1.
\]
From the expressions for Vx and Vy as derived here, we can form the viewport
transformation matrix Tvp as,
\[
T_{vp} = \begin{bmatrix} sx & 0 & tx \\ 0 & sy & ty \\ 0 & 0 & 1 \end{bmatrix}
\]
Thus, to get the point on the viewport Pvp(x′, y′) from the window point Pw(x, y), we need
to perform the following matrix multiplication.
\[
\begin{bmatrix} x'' \\ y'' \\ w \end{bmatrix} = T_{vp} P_w = \begin{bmatrix} sx & 0 & tx \\ 0 & sy & ty \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
The coordinates of Pvp are computed as x′ = x′′/w and y′ = y′′/w, since the matrices are in
homogeneous form.
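The whole mapping fits in a small function (names ours); the corner values of the window and viewport are passed in as tuples:

def window_to_viewport(wx, wy, window, viewport):
    # window = (Wx1, Wx2, Wy1, Wy2), viewport = (Vx1, Vx2, Vy1, Vy2)
    wx1, wx2, wy1, wy2 = window
    vx1, vx2, vy1, vy2 = viewport
    sx = (vx2 - vx1) / (wx2 - wx1)
    sy = (vy2 - vy1) / (wy2 - wy1)
    tx = sx * (-wx1) + vx1
    ty = sy * (-wy1) + vy1
    return sx * wx + tx, sy * wy + ty

# Example 6.4: the center of the normalized window maps to (5, 6) in the viewport
print(window_to_viewport(0, 0, (-1, 1, -1, 1), (4, 6, 4, 8)))   # -> (5.0, 6.0)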
Example 6.4
Let us assume that the point is projected on a normalized clipping window (as you can see, the
projected point in either parallel or perspective projection is (0,0,−0.5), which lies at the cen-
ter of the normalized window). We want to show the scene on a viewport having lower left and
top right corners at (4,4) and (6,8) respectively. What would the position of the point be in the
viewport?
Solution Since the clipping window is normalized, we have Wx1 = −1, Wx2 = 1, Wy1 = −1
and Wy2 = 1. Also, from the viewport specification, we have Vx1 = 4, Vx2 = 6, Vy1 = 4 and
Vy2 = 8. Therefore, sx = (6 − 4)/(1 − (−1)) = 1, sy = (8 − 4)/(1 − (−1)) = 2, tx = 1 · 1 + 4 = 5 and
ty = 2 · 1 + 4 = 6. Thus, the viewport transformation matrix is
\[
T_{vp} = \begin{bmatrix} 1 & 0 & 5 \\ 0 & 2 & 6 \\ 0 & 0 & 1 \end{bmatrix}
\]
The new coordinate of the point after viewport transformation will be,
coordinate of the point after viewport transformation will be,
105 0 −2.5
0 2 6 0 = −3
001 −0.5 −0.5
Since the derivedpoint is in homogeneous
form with −0.5 as the homogeneous factor, the trans-
−2.5 −3
formed point is , . In other words, the viewport coordinates of the projected point
−0.5 −0.5
will be (5,6).
SUMMARY
In this chapter, we learnt about the process of transforming objects from world coordinate to
viewport, which is an important and essential part of the image synthesis process. The trans-
formation takes place in distinct stages, which are analogous to the process of capturing a
picture with your camera. The very first step is to set-up the view coordinate system. In this
stage, the three orthogonal unit basis vectors of the view coordinate system are determined
from three input parameters: the camera position or the view coordinate origin, the look-at point,
and the view-up point (or view-up vector). After the formation of the view coordinate system, we
transform the object to the view coordinate system.
The transformed objects are still in 3D. We transform them to 2D view plane through pro-
jection. We learnt about the two basic types of projections: parallel and perspective. In parallel
projection, the projectors are parallel to each other. The projectors converge to a center of pro-
jection in perspective projection.Before projection,we first define a view volume,a 3D region that
encloses the objects we want to be part of the image. For parallel projection, the view volume
takes the shape of a rectangular parallelepipe. It takes the shape of a frustum for perspective
projection. For computational efficiency in subsequent stages of the graphics pipeline, the view
volumes are transformed to canonical view volumes which is a cube with all its points lying within
the range [−1,−1,−1] to [1,1,1]. Transformation to canonical view volume requires scaling for
parallel view volume and a combination of shear and scale for perspective view volume.
The objects in the canonical volume is projected on its near or view plane, which acts as the
normalized clipping window. The window is in view coordinate system. A final transformation is
applied on the points in the window to transform it to the points in the viewport, which is defined
in the device coordinate system.
As we have seen, four transformations in total take place in this stage: world to view coordinate
transformation, view volume to canonical view volume transformation, projection transformation,
and window-to-viewport transformation. When we define a view volume, the objects that lie out-
side need to be removed. This is done in the clipping stage, which we shall discuss in Chapter 7.
Moreover, to generate realistic images, we need to perform hidden surface removal, which we
shall learn in Chapter 8.
BIBLIOGRAPHIC NOTE
Please refer to the bibliographic note of Chapter 7 for further reading.
KEY TERMS
Axonometric projection – a type of parallel projection in which the principal object surfaces are
not parallel to the view plane
Canonical view volume – a standardized view volume
Center of interest/Look-at point – the point in the world coordinate frame with respect to which
we focus our camera while taking a photograph
Center of projection – the point where the projectors meet in perspective projection
Clipping window – a region (usually rectangular) of the view plane
Oblique projection – a type of parallel projection in which the projectors are not perpendicular to
the view plane
Orthographic projection – a type of parallel projection in which projectors are perpendicular to
the view plane.
Parallel projection – a type of projection in which the projectors are parallel to each other
Perspective foreshortening – an effect due to perspective projection in which the closer objects
appear larger
Perspective projection – a type of projection in which the projectors meet at a point
Projection – the process of mapping an object from an n-dimensional space to an n − 1
dimensional space
Projectors – lines that originate from the object points to be projected and intersect the view plane
Vanishing points – an effect due to perspective projection in which lines that are not parallel
appear to meet at a point on the view plane
View confusion – an effect due to perspective projection in which the objects appear upside down
after projection
View coordinate – the coordinate reference frame used to represent a scene with respect to
camera parameters
View plane – the plane on which a 3D object is projected
View volume – a 3D region in space (in the view coordinate system) that is projected on the view
plane
View-up vector – a vector towards the direction of our head while taking a photograph with a
camera
Viewing transformation – the process of mapping a world coordinate object description to the
view coordinate frame
Viewport – a rectangular region on the display screen where the content of the window is rendered
Window – a term used to denote the clipping window on the view plane
EXERCISES
6.1 Discuss the similarities between taking a photograph with a camera and transforming a
world coordinate object to the view plane. Is there any difference?
6.2 Mention the inputs we need to construct a view coordinate system.
6.3 Explain the process of setting-up of the view coordinate system.
6.4 How do we transform objects from world coordinate to view coordinate? Explain.
6.5 What are the broad categories of projections? Discuss their difference(s).
6.6 Why are perspective projections preferable over parallel projections to generate realistic
effects? When do we need parallel projection?
6.7 Illustrate with diagrams the view volumes associated with each of the two broad projection
types.
6.8 Why do we need canonical view volumes? Discuss how we can transform arbitrary volumes
to canonical forms.
6.9 Derive the transformation matrices for parallel and perspective projections.
6.10 How is viewport different from window? Derive the window-to-viewport transformation
matrix.
6.11 Does the projection transformation truly transform objects from 3D to 2D? Discuss.
6.12 Consider a spherical object centered at (1,1,1) with a radius of 1 unit. A camera is located
at the point (1,1,4) and the look-at point is (1,1,2). The up direction is along the negative Y
direction. Answer the following.
(a) Determine the coordinates of the point Pv in the view coordinate system to which the
point P(1,1,2) is transformed.
(b) Assume a sufficiently large view volume that encloses the entire sphere after transfor-
mation. Its near plane is z = −1 and the clipping window is defined between [−10,−10]
to [10,10]. What would the position Pw of the point Pv on the clipping window be after
we perform (i) parallel and (ii) perspective projection? Ignore canonical transformations.
(c) Assume a view port defined between [2,3] (lower left corner) and [5,6] (top right corner)
in the device coordinate system. Determine the position of Pw on this viewport after
window-to-viewport transformation.
CHAPTER 7
Clipping
Learning Objectives
After going through this chapter, the students will be able to
• Understand the idea of clipping in two and three dimensions
• Learn about point clipping in two and three dimensions
• Learn the Cohen–Sutherland line clipping algorithm in two and three dimensions
• Understand the working of the parametric line clipping algorithm with the Liang–Barsky
algorithm
• Know about the fill area clipping issues
• Learn about the Sutherland–Hodgeman fill area clipping algorithm for two and three
dimensions
• Understand the steps of the Weiler–Atherton fill area clipping algorithm
• Learn the algorithm to convert a convex polygon into polygonal meshes, which is required
for three-dimensional fill area clipping
INTRODUCTION
In Chapter 6, we discussed the concept of view volume. As you may recall, before projec-
tion on the view plane, we define a 3D region (in the view coordinate system) that we call the
view volume. Objects within this region are projected on the view plane while objects that
lie outside the volume boundary are discarded. An example is shown in Fig. 7.1. The process
of discarding objects that lie outside the volume boundary is known as clipping. How does
the computer discard (or clip) objects? We employ some programs or algorithms for this
purpose, which are collectively known as the clipping algorithms. In this chapter, we shall
learn about these algorithms.
Two things are to be noted here. Recall the important concept we learnt in Chapter 6,
namely the canonical view volume (the cube). The algorithms we discuss in this chapter
shall assume that the clipping is done against canonical view volumes only. Moreover, for
the ease of understanding the algorithms, we shall first discuss the clipping algorithms
in 2D. Clipping in 3D is performed by extending the 2D algorithms, which we shall
discuss next.
[Figure: the view volume positioned relative to the view coordinate origin, with one object fully inside the volume boundary and other objects partially or fully outside]
Fig. 7.1 Concept of clipping. The objects that lie outside the boundary, either fully or partially,
are to be discarded or clipped. This is done with the clipping algorithms.
7.1 CLIPPING IN 2D
Unlike the view volume which is a 3D concept, we shall assume a view window, which is a
square-shaped region on the view plane, to discuss 2D clipping algorithms. This is equiva-
lent to assuming that the view volume and all the objects are already projected on the view
plane. The view volume is projected to form the window. Other objects are projected to form
points, lines, and fill-areas (e.g., an enclosed region such as a polygon). Thus, our objective
is to clip points, lines, and fill-areas with respect to the window.
The simplest is point clipping. Given a point with coordinate (x,y), we simply check if
the coordinate lies within the window boundary. In other words, if wxmin ≤ x ≤ wxmax
AND wymin ≤ y ≤ wymax , we keep the point; otherwise, we clip it out. (wxmin , wxmax ) and
(wymin , wymax ) are the minimum and maximum x and y coordinate values of the window,
respectively.
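Point clipping amounts to two range checks; a minimal sketch (function name ours):

def clip_point(x, y, wxmin, wxmax, wymin, wymax):
    # True if (x, y) lies inside (or on) the window boundary; False means clip it out
    return wxmin <= x <= wxmax and wymin <= y <= wymax

print(clip_point(3, 3, 2, 4, 2, 4))   # True  -> keep the point
print(clip_point(5, 3, 2, 4, 2, 4))   # False -> clip it out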
Line clipping is not so easy, however. We can represent any line segment with its end
points. For clipping, we can check the position of these end points to decide whether to clip
the line or not. If we follow this approach, one of the following three scenarios can occur,
as illustrated in Fig. 7.2.
1. Both the end points are within the window boundary. In such cases, we don’t clip the line.
2. One end point is inside and the other point is outside. Such lines must be clipped.
3. Both the end points are outside. We cannot say for sure if the whole line is outside the
window or part of it lies inside (see Fig. 7.2). Thus, we have to check for line–boundary
intersections to decide if the line needs to be clipped or not.
As you can see in Fig. 7.2, when both the line end points are outside of the window, the
line may be fully or partially outside. We cannot determine this from just the position of
the end points. What we can do is to determine if there are intersection points of the line
and the window boundaries (see Appendix for calculation of intersection points between
two lines). Thus, given a line with both end points outside, we have to check for line–
window intersection for all the four window boundary line segments. Clearly, the process
Fig. 7.2 Three scenarios for line clipping. For the line L1 , both the line endpoints are inside
the window, so we don’t clip the line. For L2 , one end point is inside and the other one is
outside, so we clip it. In case of L3 and L4 , both the end points are completely outside the
window. However, L3 is partially inside and needs further checking to determine the portion
to be clipped.
Fig. 7.3 The nine regions of the Cohen–Sutherland algorithm. The left figure on top shows
the regions while the right figure shows the region codes. The four bit code is explained in
the bottom figure. Note that the window gets a code 0000.
Given the two end points of a line, the algorithm first assigns region codes to the
end points. Let an end point be denoted by P(x, y) and the window be specified by
(xmin , xmax , ymin , ymax ) (i.e., the x and y extents of its boundary). Then, we can determine
region code of P through the following simple check.
1: Input: A line segment with end points PQ and the window parameters (xmin , xmax , ymin , ymax )
2: Output: Clipped line segment (NULL if the line is completely outside)
3: for each end point with coordinate (x,y), where sign(a) = 1 if a is positive, 0 otherwise do
4: Bit 3 = sign (y − ymax )
5: Bit 2 = sign (ymin − y)
6: Bit 1 = sign (x − xmax )
7: Bit 0 = sign (xmin − x)
8: end for
9: if both the end point region codes are 0000 then
10: RETURN PQ.
11: else if logical AND (i.e., bitwise AND) of the end point region codes ≠ 0000 then
12: RETURN NULL
13: else
14: for each boundary bi where bi = above, below, right, left, do
15: Check corresponding bit values of the two end point region codes
16: if the bit values are same, then
17: Check next boundary
18: else
19: Determine bi -line intersection point using line equation
20: Assign region code to the intersection point
21: Discard the line from the end point outside bi to the intersection point (as it is outside the
window)
22: if the region codes of both the intersection point and the remaining end point are 0000 then
23: Reset PQ with the new end points
24: end if
25: end if
26: end for
27: RETURN modified PQ
28: end if
1. If both the end point region codes are 0000, the line is completely inside the window.
Retain the line.
2. If logical AND (i.e., bitwise AND) of the end point region codes is not equal to 0000, the
line is completely outside the window. Discard the entire line.
However, when neither of the above cases occurs, the line is partially inside the window
and we need to clip it. For clipping, we need to calculate the line intersection point with
window boundaries. This is done by taking one end point and following some order for
checking, e.g., above, below, right, and left. For each boundary, we compare the correspond-
ing bit values of the two end point region codes. If they are not the same, the line intersects
that particular boundary. Using the line equation, we determine the intersection point and
assign the region code to the intersection point as before. In the process, we discard the line
segment outside the window. Next, we compare the two new end points to see if they are
completely inside the window. If not, we take the other end point and repeat the process. The
pseudo-code of the algorithm is shown in Algorithm 7.1.
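As a concrete illustration, the region-code computation and the two trivial tests of Algorithm 7.1 can be sketched in Python as follows (the function names are ours; the bit order follows lines 4–7 of the pseudocode):

def region_code(x, y, xmin, ymin, xmax, ymax):
    # Bit 3: above, Bit 2: below, Bit 1: right, Bit 0: left (as in Algorithm 7.1)
    code = 0
    if y > ymax: code |= 8    # sign(y - ymax) = 1
    if y < ymin: code |= 4    # sign(ymin - y) = 1
    if x > xmax: code |= 2    # sign(x - xmax) = 1
    if x < xmin: code |= 1    # sign(xmin - x) = 1
    return code

def trivial_tests(P, Q, xmin, ymin, xmax, ymax):
    cP = region_code(*P, xmin, ymin, xmax, ymax)
    cQ = region_code(*Q, xmin, ymin, xmax, ymax)
    if cP == 0 and cQ == 0:
        return 'accept'       # both end points inside: keep PQ as it is
    if cP & cQ != 0:
        return 'reject'       # both end points outside the same boundary: discard PQ
    return 'clip'             # boundary-intersection processing is needed

# Example 7.1 below: A(5,3), B(6,2) against the window (2,2)-(4,4)
print(trivial_tests((5, 3), (6, 2), 2, 2, 4, 4))   # 'reject' (both codes are 0010)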
Example 7.1
Consider the line segment AB in Fig. 7.4.
Fig. 7.4
From the figure, we see that xmin = 2, xmax = 4, ymin = 2, and ymax = 4. Also, A(5,3) and
B(6,2). The first step is to determine the region codes of A and B (lines 3–8 of Algorithm 7.1).
Let us consider A first. We can see that for A, Bit 3 = sign(3 − 4) = 0, Bit 2 = sign(2 − 3) = 0, Bit 1 = sign(5 − 4) = 1, and Bit 0 = sign(2 − 5) = 0, since sign(a) = 0 if a ≤ 0. Thus, the region code of A is 0010. Similarly, the region code of B is derived as 0010. The next step (lines 9–27 of Algorithm 7.1) is the series of checks. The first check fails since the two end point region codes are not both 0000. However, the second check succeeds since the logical AND of the codes of A and B is 0010 (i.e., ≠ 0000). Hence, we do not need to go any further. The line is totally outside the window boundary. We do not need to clip it; we discard it as a whole.
Example 7.2
Consider the line segment PQ in Fig. 7.5.
Fig. 7.5
We have xmin = 2, xmax = 4, ymin = 2, and ymax = 4. Also, P(3,3) and Q(5,2). We first determine
the region codes of P and Q (lines 3–8 of Algorithm 7.1). Let us consider P first. We can see that for P, Bit 3 = sign(3 − 4) = 0, Bit 2 = sign(2 − 3) = 0, Bit 1 = sign(3 − 4) = 0, and Bit 0 = sign(2 − 3) = 0. Thus, the region code of P is 0000. Similarly, the region code of Q is derived as 0010. The next step (lines 9–27 of Algorithm 7.1) is the series of checks. The first check fails since the two end point region codes are not both 0000. The second check also fails as the logical AND of the codes of P and Q is 0000. Hence, we need to determine the line–boundary intersection.
From the end points, we can derive the line equation as y = −(1/2)x + 9/2 (see Appendix for the
derivation of line equation from end points). Now, we have to check for the intersection of this
line with the boundaries following the order: above, below, right, left. The aforementioned bit val-
ues (bit 3) of P and Q are the same. Hence, the line does not cross above boundary (lines 16–19
of Algorithm 7.1). Similarly, we see that it does not cross the below boundary. However, for the
right boundary, the two corresponding bits (bit 1) are different. Hence, the line crosses the right
boundary.
The equation of the right boundary is x = 4. Putting this value in the line equation, we get
the intersection point as Q′(4, 5/2). We discard the line segment Q′Q since Q is outside the right
boundary (line 21 of Algorithm 7.1). Thus, the new line segment becomes PQ′ . We determine
the region code of Q′ as 0000. Since both P and Q′ have region code 0000, the algorithm resets
PQ by changing Q to Q′ (lines 22–23 of Algorithm 7.1). Finally, we check the left boundary.
Since there is no intersection (bit 0 is same for both end points), the algorithm returns PQ′
and stops.
Example 7.3
Consider the line segment MN in Fig. 7.6.
Fig. 7.6
Here we have xmin = 2, xmax = 4, ymin = 2, and ymax = 4 and the two end points M(1,3) and
N(5,2). We determine the region code for M first: Bit 3 = sign(3 − 4) = 0, Bit 2 = sign(2 − 3) = 0, Bit 1 = sign(1 − 4) = 0, and Bit 0 = sign(2 − 1) = 1. Thus, the region code of M is 0001. Similarly, the region code of N is derived as 0010. The next step (lines 9–27 of Algorithm 7.1) is the series of checks. The first check fails since the two end point region codes are not both 0000. The second check also fails as the logical AND of the codes of M and N is 0000. Hence, we
need to determine line–boundary intersection points.
From the end points, we can derive the line equation as y = −(1/4)x + 13/4 (see Appendix for the
derivation). Next, we check for the intersection of this line with the boundaries following the order:
above, below, right, left. These bit values (bit 3) of M and N are the same. Hence, the line does not
cross above boundary (lines 16–19 of Algorithm 7.1). Similarly, we see that it does not cross the
below boundary (bit 2 is same for both). However, for the right boundary, the two corresponding
bits (bit 1) are different. Hence, the line crosses the right boundary.
The equation of the right boundary is x = 4. Putting this value in the line equation, we get
the intersection point as N′(4, 9/4). We discard the line segment N′N since N is outside the right
boundary (line 21 of Algorithm 7.1). Thus, the new line segment becomes MN ′ . We determine the
region code of N ′ as 0000.
We now have two new end points M and N ′ with the region codes 0001 and 0000, respectively.
The boundary check is now performed for the left boundary. Since the bit values are not the same,
we check for intersection of the line segment MN ′ with the left boundary. The equation of the left
boundary is x = 2. Putting this value in the line equation, we get the intersection point as M′(2, 11/4).
We discard the line segment MM ′ since M is outside the left boundary (line 21 of Algorithm 7.1).
Thus, the new line segment becomes M ′ N ′ . We determine the region code of M ′ as 0000.
Since both M ′ and N ′ have the region code 0000, the algorithm resets the line segment to M ′ N ′
(lines 22–23 of Algorithm 7.1). As no more boundary remains to be checked, the algorithm returns
M ′ N ′ and stops.
x = x1 + uΔx, where Δx = x2 − x1
y = y1 + uΔy, where Δy = y2 − y1
where 0 ≤ u ≤ 1 is the parameter. Given the window parameters (xmin, xmax, ymin, ymax), we define, for each window boundary k, the quantities
p1 = −Δx, q1 = x1 − xmin
p2 = Δx, q2 = xmax − x1
p3 = −Δy, q3 = y1 − ymin
p4 = Δy, q4 = ymax − y1
where k = 1, 2, 3, 4 corresponds to the left, right, below, and above window boundaries, in that order. If for any k for a given line, pk = 0 AND qk < 0, we discard the line as it is completely outside the window. Otherwise, we calculate two parameters u1 and u2 that define the line segment within the window. In order to calculate u1, we first calculate the ratio rk = qk/pk for all those boundaries for which pk < 0. Then, we set u1 = max{0, rk}. Similarly, for u2, we calculate the ratio rk = qk/pk for all those boundaries for which pk > 0 and then set u2 = min{1, rk}. If u1 > u2, the line is completely outside, so we discard it. Otherwise, the end points of the clipped line are calculated as follows.
1. If u1 = 0, there is one intersection point, which is calculated as x2 = x1 + u2Δx, y2 = y1 + u2Δy (note that the other end point remains the same).
2. Otherwise, there are two intersection points (i.e., both the end points need to be changed). The two new end points are calculated as x′1 = x1 + u1Δx, y′1 = y1 + u1Δy and x2 = x1 + u2Δx, y2 = y1 + u2Δy.
The pseudocode of the algorithm is shown in Algorithm 7.2.
1: Input: A line segment with end points P(x1 , y1 ) and Q(x2 , y2 ), the window parameters
(xmin , xmax , ymin , ymax ). A window boundary is denoted by k where k can take the values 1, 2, 3,
or 4 corresponding to the left, right, below, and above boundary, respectively.
2: Output: Clipped line segment
3: Calculate Δx = x2 − x1 and Δy = y2 − y1
4: Calculate p1 = −Δx, q1 = x1 − xmin
5: Calculate p2 = Δx, q2 = xmax − x1
6: Calculate p3 = −Δy, q3 = y1 − ymin
7: Calculate p4 = Δy, q4 = ymax − y1
8: if pk = 0 and qk < 0 for any k = 1, 2, 3, 4 then
9: Discard the line as it is completely outside the window
10: else
11: Compute rk = qk/pk for all those boundaries k for which pk < 0. Determine parameter u1 = max{0, rk}.
12: Compute rk = qk/pk for all those boundaries k for which pk > 0. Determine parameter u2 = min{1, rk}.
13: if u1 > u2 then
14: Eliminate the line as it is completely outside the window
15: else if u1 = 0 then
16: There is one intersection point, calculated as x2 = x1 + u2Δx, y2 = y1 + u2Δy
17: Return the two end points (x1 , y1 ) and (x2 , y2 )
18: else
19: There are two intersection points, calculated as: x′1 = x1 + u1Δx, y′1 = y1 + u1Δy and x2 = x1 + u2Δx, y2 = y1 + u2Δy
20: Return the two end points (x′1 , y′1 ) and (x2 , y2 )
21: end if
22: end if
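A compact Python rendering of the same logic is sketched below (the function name is ours; it returns None for a rejected line and the clipped end points otherwise):

def liang_barsky(x1, y1, x2, y2, xmin, ymin, xmax, ymax):
    dx, dy = x2 - x1, y2 - y1
    # p and q for the left, right, below, and above boundaries (k = 1, 2, 3, 4)
    p = [-dx, dx, -dy, dy]
    q = [x1 - xmin, xmax - x1, y1 - ymin, ymax - y1]
    u1, u2 = 0.0, 1.0
    for pk, qk in zip(p, q):
        if pk == 0:
            if qk < 0:
                return None            # parallel to a boundary and outside it
        elif pk < 0:
            u1 = max(u1, qk / pk)      # candidate entry parameter
        else:
            u2 = min(u2, qk / pk)      # candidate exit parameter
    if u1 > u2:
        return None                    # completely outside the window
    return ((x1 + u1 * dx, y1 + u1 * dy), (x1 + u2 * dx, y1 + u2 * dy))

# Example 7.4 below: P(3,3), Q(5,2) against the window (2,2)-(4,4)
print(liang_barsky(3, 3, 5, 2, 2, 2, 4, 4))   # ((3.0, 3.0), (4.0, 2.5))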
Example 7.4
We will show here the running of the algorithm for one of the three examples (Example 7.2) used
to illustrate the Cohen–Sutherland algorithm. The working of the Liang–Barsky algorithm for the
other two examples can be understood in a similar manner and is left as an exercise for the reader.
Now let us reconsider the line segment PQ in Example 7.2. The figure is reproduced here for
convenience.
Fig. 7.5 (reproduced)
From the figure, we see that xmin = 2, xmax = 4, ymin = 2, and ymax = 4. Also, P(3,3) and Q(5,2).
We first calculate Δx = 2, Δy = −1, p1 = −2, q1 = 1, p2 = 2, q2 = 1, p3 = 1, q3 = 1, p4 = −1, and q4 = 1 (lines 3–7 of Algorithm 7.2). Since the condition pk = 0 and qk < 0 is not true for any k, the first condition (lines 8–9 of Algorithm 7.2) fails. Hence, we calculate u1 and u2.
Note that p1, p4 < 0. Hence we calculate r1 = −1/2 and r4 = −1. Thus, u1 = max{0, −1/2, −1} = 0 (line 11 of Algorithm 7.2). Also, p2, p3 > 0. Hence we calculate r2 = 1/2 and r3 = 1. Thus, u2 = min{1, 1/2, 1} = 1/2 (line 12 of Algorithm 7.2). Since u1 < u2, the line is not eliminated (lines 13–14 of Algorithm 7.2). However, u1 = 0. Thus, the next condition is satisfied (line 15 of Algorithm 7.2). So, we calculate the single intersection point: x2 = 4, y2 = 5/2. Thus, the end points of the clipped line segment returned are (3,3) and (4, 5/2) (lines 16–17 of Algorithm 7.2).
1: Input: Four clippers: cl = xmin , cr = xmax , ct = ymax , cb = ymin corresponding to the left, right,
top, and bottom window boundaries, respectively. The polygon is specified in terms of its vertex list
Vin = {v1 , v2 , · · · , vn }, where the vertices are named anti-clockwise.
2: for each clipper in the order cl , cr , ct , cb do
3: Set output vertex list Vout = NULL, i = 1, j = 2
4: repeat
5: Consider the vertex pair vi and vj in Vin
6: if vi is inside and vj outside of the clipper then
7: ADD the intersection point of the clipper with the edge (vi , vj ) to Vout
8: else if both the vertices are inside the clipper then
9: ADD vj to Vout
10: else if vi is outside and vj inside of the clipper then
11: ADD the intersection point of the clipper with the edge (vi , vj ) and vj to Vout
12: else
13: ADD NULL to Vout
14: end if
15: until all edges (i.e., consecutive vertex pairs) in Vin are checked
16: Set Vin = Vout
17: end for
18: Return Vout
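To make the clipper-by-clipper processing concrete, here is a minimal Python sketch for an axis-aligned window (the inside tests and the intersection helpers are assumptions of this sketch; the clippers are applied in the left, right, top, bottom order used above):

def clip_polygon(vertices, xmin, ymin, xmax, ymax):
    # One inside test and one intersection routine per clipper.
    clippers = [
        (lambda p: p[0] >= xmin, lambda a, b: cut_x(a, b, xmin)),   # left
        (lambda p: p[0] <= xmax, lambda a, b: cut_x(a, b, xmax)),   # right
        (lambda p: p[1] <= ymax, lambda a, b: cut_y(a, b, ymax)),   # top
        (lambda p: p[1] >= ymin, lambda a, b: cut_y(a, b, ymin)),   # bottom
    ]
    out = list(vertices)
    for inside, intersect in clippers:
        vin, out = out, []
        for k in range(len(vin)):
            vi, vj = vin[k], vin[(k + 1) % len(vin)]   # edge (vi, vj)
            if inside(vi) and not inside(vj):
                out.append(intersect(vi, vj))          # edge leaving the window
            elif inside(vi) and inside(vj):
                out.append(vj)                         # edge fully inside
            elif not inside(vi) and inside(vj):
                out.append(intersect(vi, vj))          # edge entering the window
                out.append(vj)
            # both vertices outside: nothing is added
    return out

def cut_x(a, b, x):
    # Intersection of edge a-b with the vertical clipper x = const
    t = (x - a[0]) / (b[0] - a[0])
    return (x, a[1] + t * (b[1] - a[1]))

def cut_y(a, b, y):
    # Intersection of edge a-b with the horizontal clipper y = const
    t = (y - a[1]) / (b[1] - a[1])
    return (a[0] + t * (b[0] - a[0]), y)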
Example 7.5
Consider the polygon with vertices {1,2,3} (named anti-clockwise) shown in Fig. 7.7. We wish
to determine the clipped polygon (i.e., the polygon with vertices {2′ ,3′ ,3′′ ,1′ ,2}) following the
Sutherland–Hodgeman algorithm.
Fig. 7.7
We check the vertex list against each clipper in the order left, right, top, bottom (the outer for
loop, line 2 of Algorithm 7.3). For the left clipper, the input vertex list Vin = {1, 2, 3}. The pair
of vertices to be checked for the left clipper are {1,2}, {2,3}, and {3,1} (the inner loop, line 4 of
Algorithm 7.3). For each of these pairs, we perform the checks (lines 6–13 of Algorithm 7.3) to
determine Vout for the left clipper. We start with {1,2}. Since both the vertices are on the right side
of the left clipper (i.e., both are inside), we set Vout = {2}. Similarly, after checking {2,3}, we set
Vout = {2, 3} and after checking {3,1}, the final output list becomes Vout = {2, 3, 1}.
In the next iteration of the outer loop (check against right clipper), we set Vin = Vout = {1, 2, 3}
and Vout = NULL. Thus the three pairs of vertices to be checked are {1,2}, {2,3}, and {3,1}.
In {1,2}, both the vertices are inside (i.e., they are on the left side of the right clipper); hence
Vout = {2}. For the next pair {2,3}, we notice that vertex 2 is inside while vertex 3 is out-
side. Thus, we compute the intersection point 2′ of the right clipper with the edge {2,3} and set
Vout = {2, 2′ }. For the remaining pair {3,1}, vertex 3 is outside (on the right side) and vertex 1
inside (on the left side). Thus, we calculate the intersection point 3′ of the edge with the clipper
and set Vout = {2, 2′ , 3′ , 1}. The inner loop stops as all the edges are checked.
Next, we consider the top clipper. We set Vin = Vout = {2, 2′, 3′, 1} and Vout = NULL. The pairs of vertices to be checked are {2,2′}, {2′,3′}, {3′,1}, and {1,2}. Since both the vertices of {2,2′} are
inside (i.e., below the clipper), Vout = {2′ }. Similarly, after checking {2′ ,3′ }, we set Vout = {2′ , 3′ }
as both are inside. In the pair {3′ ,1}, the vertex 3′ is inside whereas the vertex 1 is outside (i.e.,
above the clipper). Hence, we calculate the intersection point 3′′ between the clipper and the edge
and set Vout = {2′ , 3′ , 3′′ }. In the final edge {1,2}, the first vertex is outside while the second ver-
tex is inside. Thus, we calculate the intersection point 1′ between the edge and the clipper and set
Vout = {2′ , 3′ , 3′′ , 1′ , 2}. After this, the inner loop stops.
Finally, we check against the bottom clipper. Before the checking starts, we set Vin = Vout =
{2′ , 3′ , 3′′ , 1′ , 2} and Vout = NULL. As all the vertices are inside (i.e., above the clipper), after the
inner loop completes, the output list becomes Vout = {2′ , 3′ , 3′′ , 1′ , 2} (i.e., same as the input list,
check for yourself).
Thus, after the input polygon is checked against all the four clippers, the algorithm returns the
vertex list {2′ ,3′ ,3′′ ,1′ ,2} as the clipped polygon.
2. From an intersection point due to an inside-to-outside fill-area edge (with respect to a clip
boundary), follow the window boundaries.
In both these cases, the traversal direction remains the same. At the end of the process-
ing, when we encounter a previously processed intersection point, we output the vertex
list representing a clipped area. However, if the whole fill-area polygon is not fully cov-
ered at this point, we resume our traversal along the polygon edges in the same direction
from the last intersection point of an inside-outside polygon edge. The pseudocode of the
Weiler–Atherton algorithm is shown in Algorithm 7.4.
Example 7.6
Let us consider the fill-area shown in Fig. 7.8. The vertices of the polygon are named anti-
clockwise. Note that this is a concave polygon, which is to be clipped against the rectangular
window. Let us try to perform clipping following the steps of the Algorithm 7.4. In the figure, the
traversal of the fill-area edges and window boundaries is shown with arrows.
We first start with the edge {1,2}. Both the vertices are inside the window. Since there is no inter-
section, we add these vertices to the output vertex list Vout and continue processing the fill-area
edges anti-clockwise.
Next we process the edge {2,3}. As we can observe, the edge goes from inside of the
clip window to outside. We record the intersection point 2′ and add it to Vout . 2′ is the
Fig. 7.8
exit-intersection point. At this point, we make a detour and proceed along the window boundary
in the anti-clockwise direction.
While traversing along the boundary, we encounter the intersection point 1′ of the fill-area edge
{6,1} with the boundary. This is a new intersection point not yet processed. We add this to Vout .
Then, we start processing the fill-area edges again.
The fill-area edge processing takes us to the vertex 1. We have already processed this vertex.
Thus, we have completed determining one clipped segment of the fill area, represented by the ver-
tex list Vout = {1, 2, 2′ , 1′ , 1}. We output this list. Since some edges of the fill-area are not yet
processed, we return to the exit-intersection point 2′ and continue processing the fill-area edges.
We first check the edge segment {2′ ,3}. Since the vertex 3 is on the left side of the left window
boundary (i.e., outside the clip window), we continue processing the fill-area anti-clockwise.
The next edge processed is {3,4}. Note that the edge intersects the window boundary at 3′ .
Since the vertex 4 is inside the window and 3′ is a new intersection point, we add the intersection
point 3′ and the vertex 4 to the output vertex list Vout . We continue processing the fill area edges.
Both the vertices of the next edge {4,5} are inside the clipping window. So, we simply add them
to Vout and continue processing the fill-area edges.
The next edge {5,6} intersects the window boundary at the intersection point 6′ . This is an
exit-intersection point. We record the intersection point and add it to the output vertex list Vout .
From 6′ , we start processing the window boundaries again (anti-clockwise). During this pro-
cessing, we encounter the intersection point 3′ . This is already processed before. So, we output
Vout = {3′ , 4, 5, 6′ , 3′ } that represents the other clipped region of the fill area. Since now all the
edges of the fill-area are processed, the algorithm stops at this stage.
7.2 3D CLIPPING
So far, we have discussed clipping algorithms for 2D. The same algorithms, with a few mod-
ifications, are used to perform clipping in 3D. A point to be noted here is that clipping is
performed against the normalized view volume (usually the symmetric cube with each coordinate in the range [−1,1] in the three directions). In the following, we will discuss only the
extension of the 2D algorithms without further elaborations, as the core algorithms remain
the same.
Point clipping in 3D is done in a way similar to 2D. Given a point with coordinate (x,y,z),
we simply check if the coordinate lies within the view volume. In other words, if −1 ≤ x ≤ 1
AND −1 ≤ y ≤ 1 AND −1 ≤ z ≤ 1, we keep the point; otherwise, we clip it out.
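A minimal Python sketch of this test, together with one possible 6-bit region code for the 3D extension of the Cohen–Sutherland algorithm (the bit ordering here is an illustrative assumption, not fixed by the text):

def clip_point_3d(x, y, z):
    # Keep the point only if it lies inside the canonical view volume [-1, 1]^3
    return -1 <= x <= 1 and -1 <= y <= 1 and -1 <= z <= 1

def region_code_3d(x, y, z):
    # One bit per bounding plane of the canonical view volume (assumed bit order).
    code = 0
    if z > 1:  code |= 32
    if z < -1: code |= 16
    if y > 1:  code |= 8
    if y < -1: code |= 4
    if x > 1:  code |= 2
    if x < -1: code |= 1
    return code          # 000000 only for points inside the view volume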
Fig. 7.9 The 27 regions of the Cohen–Sutherland algorithm for 3D. The 6-bit code is
explained in the bottom figure. Note that only the view volume interior gets the code 000000.
triangle in the mesh is processed at a time. Thus, there are two more outer loops in
Algorithm 7.3. The outermost loop is for checking one surface at a time. Inside this
loop, the next level loop processes each triangle in the mesh (of that surface). Then the
two loops of Algorithm 7.3 are executed in sequence.
2. Instead of four clippers, we now have six clippers corresponding to the six bounding sur-
faces of the normalized view volume. Hence, the for loop in Algorithm 7.3 (lines 2–17)
is executed six times.
Algorithm 7.5 shows a quick and easy way of creating a triangle mesh from a convex
polygon.
Let us consider an example to understand the idea of Algorithm 7.5. Suppose we want to
create a triangle mesh from the polygon shown in Fig. 7.10.
Fig. 7.10
The input vertex list is V = {1, 2, 3, 4, 5} (we followed an anti-clockwise vertex naming con-
vension). In the first iteration of the loop, we create vt = {1, 2, 3} and reset V = {1, 3, 4, 5}
after removing vertex 2 (the middle vertex of vt ). Then we set VT = {{1, 2, 3}}. In the next
iteration, V = {1, 3, 4, 5}. We create vt = {1, 3, 4} and reset V = {1, 4, 5}. Also, we set
VT = {{1, 2, 3}, {1, 3, 4}}. Since V now contains three vertices, the iteration stops. We set
VT = {{1, 2, 3}, {1, 3, 4}, {1, 4, 5}} and return VT as the set of three triangles.
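The iteration just described builds a triangle fan around the first vertex of the polygon. A minimal Python sketch (the function name is ours):

def triangulate_convex(V):
    # V: vertex list of a convex polygon, named anti-clockwise.
    V = list(V)
    VT = []
    while len(V) > 3:
        VT.append(V[:3])   # the first three vertices of V form the next triangle vt
        V.pop(1)           # remove the middle vertex of vt; V keeps its first vertex
    VT.append(V)           # the remaining three vertices form the last triangle
    return VT

print(triangulate_convex([1, 2, 3, 4, 5]))   # [[1, 2, 3], [1, 3, 4], [1, 4, 5]]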
Note that Algorithm 7.5 works only when the input polygon is convex. In the case of a concave polygon, we first split it into a set of convex polygons and then apply Algorithm 7.5 to each member of the set. There are many efficient methods for splitting a concave polygon into a set of convex polygons, such as the vector method, the rotation method, and so on. However,
we shall not go into the details of these methods any further, as they are not necessary to
understand the basic idea of 3D clipping.
SUMMARY
In this chapter, we learnt the basic idea behind the clipping stage of the graphics pipeline. For
ease of understanding, we started with the 2D clipping process. We covered three basic clipping
algorithms for line- and fill-area (polygon) clipping, namely the Cohen–Sutherland line clipping
algorithm, the Liang–Barsky line clipping algorithm, and the Sutherland–Hodgeman polygon
clipping algorithm.
The core idea of the Cohen–Sutherland algorithm is the division of world space into regions,
with each region having its own and unique region code. Based on a comparison of region
codes of the end points of a line, we decide if the line needs to be clipped or not. On the
other hand, the Liang–Barsky algorithm makes use of a parametric line equation to perform
clipping. Clipping is done based on the line parameters determined from the end points of the
line. The algorithm reduces line-window boundary intersection calculation to a great extent. In
the Sutherland–Hodgeman algorithm, polygons are clipped against window boundaries, on the
basis of the inside-outside test.
The same 2D algorithms are applicable in 3D with some minor modifications. The first thing
to note is that the clipping algorithms are designed keeping the normalized view volume in mind.
There are 27 regions and a 6-bit region code to be considered for using the Cohen–Sutherland
algorithm in 3D. In order to use the Sutherland–Hodgeman algorithm, we need to consider six
clippers as against four in 2D.
In clipping, we discard the portion of the object that lies outside the window/view vol-
ume. However, depending on the position of the viewer, some portion of an inside object
also sometimes needs to be discarded. When we want to discard parts of objects that are
inside window/view volume, we apply another set of algorithms, which are known as hid-
den surface removal (or visible surface detection) methods. Those algorithms are discussed
in Chapter 8.
BIBLIOGRAPHIC NOTE
Two-dimensional line clipping algorithms are discussed in Sproull and Sutherland [1968],
Cyrus and Beck [1978], Liang and Barsky [1984], and Nicholl et al. [1987]. In Sutherland
and Hodgman [1974] and Liang and Barsky [1983], basic polygon-clipping methods are pre-
sented. Weiler and Atherton [1977] and Weiler [1980] contain discussions on clipping arbitrarily
shaped polygons with respect to arbitrarily shaped polygonal clipping windows. Weiler and
Atherton [1977], Weiler [1980], Cyrus and Beck [1978], and Liang and Barsky [1984] also
describe 3D viewing and clipping algorithms. Blinn and Newell [1978] presents homogeneous-
coordinate clipping. The Graphics Gems book series (Glassner [1990], Arvo [1991], Kirk
[1992], Heckbert [1994], and Paeth [1995]) contain various programming techniques for
3D viewing.
KEY TERMS
Clipping – the process of eliminating objects fully or partially that lie outside a predefined region
Clipping window – the predefined region with respect to which clipping is performed
Cohen–Sutherland line clipping – an algorithm used to perform line clipping
EXERCISES
7.1 Briefly explain the basic idea of clipping in the context of 3D graphics pipeline. In which
coordinate system does this stage work?
7.2 Write an algorithm to clip lines against window boundaries using brute force method (i.e.,
intuitively what you do). What are the maximum and minimum number of operations (both
integer and floating point) required? Give one suitable example for each case.
7.3 Consider the clipping window with vertices A(2,1), B(4,1), C(4,3), and D(2,3). Use the
Cohen–Sutherland algorithm to clip the line A(−4,−5) B(5,4) against this window (show
all intermediate steps).
7.4 Determine the maximum and minimum number of operations (both integer and floating
point) required to clip lines using the Cohen–Sutherland clipping algorithms. Give one
suitable example for each case.
7.5 Consider the clipping window and the line segment in Exercise 7.3. Use the Liang–Barsky
algorithm to clip the line (show all intermediate steps).
7.6 Answer Exercise 7.4 using the Liang–Barsky algorithm.
7.7 In light of your answers to Exercises 7.2, 7.4, and 7.6, which is the best method (among
brute force, Cohen–Sutherland, and Liang–Barsky)?
7.8 Discuss the role of the clippers in the Sutherland–Hodgman algorithm.
7.9 Write a procedure (in pseudocode) to perform inside-outside test with respect to a clipper.
Modify Algorithm 7.3 by invoking the procedure as a sub-routine.
7.10 Consider a clipping window with corner points (1,1), (5,1), (5,5), and (1,5). A square with
vertices (3,3), (7,3), (7,7), and (3,7) needs to be clipped against the window. Apply Algorithm
7.3 to perform the clipping (show all intermediate stages).
7.11 Modify Algorithm 7.1 for 3D clipping of a line with respect to the symmetric normalized view
volume.
7.12 Modify Algorithm 7.3 for 3D clipping of a polyhedron (with convex polygonal surfaces) with
respect to the symmetric normalized view volume.
CHAPTER 8
Hidden Surface Removal
Learning Objectives
After going through this chapter, the students will be able to
• Get an overview of the concept of hidden surface removal in computer graphics
• Understand the two broad categories of hidden surface removal techniques—object
space method and image space method
• Get an idea about the object space method known as back face elimination
• Learn about two well-known image space methods—Z-buffer algorithm and A-buffer
algorithm
• Learn about the Painter’s algorithm, a popular hidden surface removal algorithm con-
taining elements of both the object space and image space techniques
• Get an overview of the Warnock’s algorithm, which belongs to a group of techniques
known as the area subdivision methods
• Learn about the octree method for hidden surface removal, which is another object space
method based on the octree representation
INTRODUCTION
In Chapter 7, we learnt to remove objects that are fully or partially outside the view vol-
ume. To recap, this is done using the clipping algorithms. However, sometimes, we need
to remove, either fully or partially, objects that are inside the view volume. An example is
shown in Fig. 8.1. In the figure, object B is partially blocked from the viewer by object A. For
realistic image generation, the blocked portion of B should be eliminated before the scene
is rendered. As we know, clipping algorithms cannot be used for this purpose. Instead, we
make use of another set of algorithms to do it. These algorithms are collectively known as the
hidden surface removal methods (often also called the visible surface detection methods). In
this chapter, we shall learn about these methods.
In all the methods we discuss, we shall assume a right-handed coordinate system with
the viewer looking at the scene along the negative Z direction. One important thing should
be noted: when we talk about hidden surface, we assume a specific viewing direction. This
is so since a surface hidden from a particular viewing position may not be so from another
position. Moreover, for simplicity, we shall assume only objects with polygonal surfaces.
Fig. 8.1 The figure illustrates the idea of hidden surface. One surface of object A is blocked
by object B and the back surfaces of A are hidden from view. So, during rendering, these
surfaces should be removed for realistic image generation.
This is not an unrealistic assumption after all, since curved surfaces are converted to
polygonal meshes anyway.
Clearly, such methods work after the surfaces are projected and rasterized (i.e.,
mapped to pixels). The computations involved are usually less, although the methods
depend on the display resolution. A change in resolution requires recomputation of
pixel colors.
In this chapter, we shall have a closer look at some of the algorithms belonging to both
these classes.
Since the method works on surfaces, this is an object space method. Using this simple
method, about half of the surfaces in a scene can be eliminated. However, note that the
method does not consider obscuring of a surface by other objects in the scene. For such situ-
ations, we need to apply other algorithms (in conjunction with this method), as we shall see
next.
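For reference, the surface test behind back face elimination can be sketched in Python as follows. The sketch assumes the conventions used in this chapter — the viewer looks along the negative Z direction and the polygon vertices are named anti-clockwise as seen from outside the surface; the function name is ours:

def is_back_face(v1, v2, v3):
    # v1, v2, v3: three (x, y, z) vertices of a polygonal surface.
    # z component of the surface normal (v2 - v1) x (v3 - v1):
    ax, ay = v2[0] - v1[0], v2[1] - v1[1]
    bx, by = v3[0] - v1[0], v3[1] - v1[1]
    nz = ax * by - ay * bx
    # With the viewer looking along -Z, a surface whose normal points away
    # from the viewer (nz < 0) cannot be seen and can be eliminated.
    return nz < 0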
called z-buffer). Its size is the same as that of the frame buffer (i.e., one storage for each
pixel). As we assume canonical volumes, we know that the depth of any point within the
surface cannot exceed the normalized range; hence, we can fix the size of the depth-buffer
(number of bits per pixel).
The idea of the method is simple (we assume that 0 ≤ depth ≤ 1): at the begin-
ning, we initialize the depth-buffer locations with 1.0 (the maximum depth value) and
the frame buffer locations with the value corresponding to the background color. Then,
we process the surfaces of the scene one at a time. For each projected pixel position (i, j) of a surface s, we calculate the depth d of the corresponding surface point in 3D. Then, we compare d with the corresponding entry in the depth-buffer (i.e., the (i, j)th depth-buffer value DB[i][j]). If d < DB[i][j], we set DB[i][j] = d and write the surface color to the corresponding location in the frame buffer. The process is repeated for all projected
points of the surface and for all surfaces. The pseudocode of the method is shown
in Algorithm 8.1.
1: Input: Depth-buffer DB[][] initialized to 1.0, frame buffer FB[][] initialized to background color
value, list of surfaces S, list of projected points for each surface.
2: Output: DB[][] and FB[][] with appropriate values.
3: for each surface in S do
4: for each projected pixel position of the surface i, j, starting from the top-leftmost projected pixel
position do
5: Calculate depth d of the projected point on the surface.
6: if d <DB[i][j] then
7: Set DB[i][j]=d
8: Set FB[i][j]=surface color
9: end if
10: end for
11: end for
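As an illustration, the following minimal Python sketch mirrors Algorithm 8.1 for one surface at a time; the list of projected pixel positions and the plane coefficients (a, b, c, d) of each surface (see the surface equation discussed next) are assumed to be available, and the buffer size used here is arbitrary:

SIZE = 8                                          # arbitrary resolution for the sketch
DB = [[1.0] * SIZE for _ in range(SIZE)]          # depth-buffer, initialized to the maximum depth
FB = [['bg'] * SIZE for _ in range(SIZE)]         # frame buffer, initialized to the background color

def process_surface(pixels, plane, color):
    # pixels: projected pixel positions (i, j) of the surface
    # plane:  coefficients (a, b, c, d) of the surface equation ax + by + cz + d = 0
    a, b, c, d = plane
    for i, j in pixels:
        z = (-a * i - b * j - d) / c              # depth of the surface point projecting to (i, j)
        if z < DB[i][j]:                          # nearer than anything processed so far
            DB[i][j] = z
            FB[i][j] = color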
We can follow an iterative procedure to calculate the depth of a surface point. We can represent a planar surface with its equation ax + by + cz + d = 0, where a, b, c, and d are surface constants. Thus, depth of any point on the surface is represented as z = (−ax − by − d)/c.
Since we are assuming a canonical view volume, all projections are parallel. Thus, a point (x, y, z) is projected to the point (x, y) on the view plane. Now, consider a projected pixel (i, j) of the surface. Then, depth of the original surface point is z = (−ai − bj − d)/c. As we progress along the same scan line, the next pixel is at (i + 1, j). Thus, the depth of the corresponding surface point is z′ = (−a(i + 1) − bj − d)/c = (−ai − bj − d)/c − a/c = z − a/c.
Hence, along the same scan line, we can calculate the depth of consecutive surface pixels by adding a constant term (−a/c) to the current depth value. We can perform similar iterations across scan lines also. Assume a point (x, y) on an edge of the projected surface. If we go down to the next scan line, the x value of the edge point will be x′ = x − 1/m, where m ≠ 0 is the slope of the edge. The y value also becomes y − 1. Hence, depth of that point is z′ = (−a(x − 1/m) − b(y − 1) − d)/c. Rearranging, we get z′ = z + (a/m + b)/c. In other words, the depth of the starting x position of the projected points on a scan line can be found by adding a constant term to the depth of the starting x position of the previous line (m ≠ 0). The idea is illustrated in Fig. 8.3 with the pseudocode shown in Algorithm 8.2.
Fig. 8.3 Iterative depth calculation along a scan line: moving from pixel x to the next pixel x + 1, the depth changes as z″ = z′ − a/c
Let us try to understand Algorithm 8.1 in terms of a numerical example. For simplicity,
we shall assume an arbitrary (that means, the coordinate extents are not normalized) parallel
view volume. In that case, we shall initialize depth-buffer with a very large value (let us
denote that by the symbol ∞).
Example 8.1
Assume there are two triangular surfaces s1 and s2 in the view volume. The vertices of s1 are
[(0,0,6), (6,0,0), (0,6,0)] and that of s2 are [(2,0,6), (6,0,6), (4,4,0)]. Since we are assuming parallel
projection, the projected vertices of s1 on the view plane are [(0,0), (6,0), (0,6)] (simply drop the z
coordinate value). Similarly, for s2 , the projected vertices are [(2,0), (6,0), (4,4)]. The situation is
depicted in Fig. 8.4.
Fig. 8.4
Solution Let us follow the steps of the algorithm to determine the color of the pixel (3,1). Assume
that cl1 and cl2 are the colors of the surfaces s1 and s2 , respectively and the background color is bg.
After initialization, the depth-buffer value DB[3][1]= ∞ and the frame buffer value FB[3][1]=bg.
We will process the surfaces one at a time in the order s1 followed by s2 .
From the vertices, we can determine the surface equation of s1 as x + y + z − 6 = 0 (see
Appendix A for details). Using the surface equation, we first determine the depth of the leftmost
projected surface pixel on the topmost scan line. In our case, the pixel is (0,6) with a depth of
z = (−1·0 − 1·6 − (−6))/1 = 0. Since this is the only point on the topmost scan line, we move to the next scan line below (y = 5). Using the iterative method, we determine the depth of the leftmost projected pixel on this scan line (0,5) to be z′ = z + (a/m + b)/c. However, note that we have the slope of the left edge m = ∞. Hence, we set a/m = 0. Therefore, z′ = 0 + 1/1 = 1.
The algorithm next computes depth and determines the color values of the pixel along the scan
line y = 5 till it reaches the right edge. At that point, it goes to the next scan line down
(y = 4). For brevity, we will skip these steps and go to the scan line y = 1, as our point of
interest is (3,1).
Following the iterative procedure across scan lines, we compute the depth of the leftmost projected surface point (0,1) as z = 5 (check for yourself). We now move along the scan line to the next projected pixel (1,1). Its depth can be iteratively computed as z = z + (−a/c) = 5 − 1 = 4. Similarly, the depth of the next pixel (2,1) is z = 4 − 1 = 3. In this way, we calculate the depth of s1 at (3,1) as z = 3 − 1 = 2. As you can see, this depth value at (3,1) is less than DB[3][1], which is ∞. Hence, we set DB[3][1]=2 and FB[3][1]= cl1.
Afterwards, the other projected points of s1 are computed in a likewise manner, till the rightmost
edge point on the lowermost scan line. However, we shall skip those steps for brevity and move to
the processing of the next surface.
From the vertices of s2 , we derive the surface equation as 3y + 2z − 12 = 0 (see Appendix A
for details). The projected point on the topmost scan line is (4,4). Therefore, depth at this point is
z = (−3·4 − (−12))/2 = 0. Going down the scan lines (skipping the pixel processing along the scan lines for brevity, as before), we reach y = 1. Note that the slope of the left edge of the projected surface is m = 2. We can calculate the leftmost projected point on y = 1 iteratively based on the fact that the x-coordinate of the intersection point of the line with slope m and the (y − 1)th scan line is x − 1/m, if the x-coordinate of the intersection point of the same line with the yth scan line is
x. In this way, we compute x = 2.5 for y = 1. In other words, the leftmost projected point of s2 on
y = 1 is (2.5,1). Using the iterative procedure for depth calculation across scan line (with z = 0
at y = 4), we compute depth at this point to be z = 4.5 (check for yourself). Next, we apply the
iterative depth calculation along the scan line to determine depth of the projected point (3,1) (the
very next projected pixel) to be z = 4.5.
Note that z = 4.5 > DB[3][1], which has the value 2 after the processing of s1. Therefore, the DB value and the corresponding FB value (which is cl1) are not changed. The algorithm
processes all the other pixels along y = 1 till the right edge in a likewise manner and the pixels for
y = 0. However, as before we skip those calculations for brevity. Thus, at the end of processing
the two surfaces, we shall have DB[3][1]=2 and FB[3][1]= cl1 .
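The numbers above can be checked with a few lines of Python that evaluate the two plane equations at the pixel (3,1) and apply the depth-buffer comparison (the depth-buffer entry is initialized to ∞, as assumed for this non-normalized example):

def depth(i, j, a, b, c, d):
    # Depth of the surface point projecting to pixel (i, j): z = (-a*i - b*j - d) / c
    return (-a * i - b * j - d) / c

db, fb = float('inf'), 'bg'                      # depth-buffer and frame buffer entries for (3,1)
for plane, color in [((1, 1, 1, -6), 'cl1'),     # s1: x + y + z - 6 = 0
                     ((0, 3, 2, -12), 'cl2')]:   # s2: 3y + 2z - 12 = 0
    z = depth(3, 1, *plane)
    if z < db:
        db, fb = z, color

print(db, fb)                                    # 2.0 cl1 (the depth 4.5 of s2 at (3,1) does not win)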
Fig. 8.5 The figure illustrates the organization of an A-buffer location for the two possible
cases. In (a), organization for an opaque visible surface is shown. The case for a transparent
visible surface is shown in (b).
maximum depth of the other surface, see Fig. 8.6). If no overlap is found, we render the
surface and remove it from S. Otherwise, we perform the following checks.
Fig. 8.6 The figure illustrates the idea of depth overlap. There is no depth overlap between the two surfaces in the left figure, whereas in the right figure the surfaces overlap.
Fig. 8.7 The figure illustrates the idea of bounding rectangle overlap of two surfaces along
the x axis
Fig. 8.8 An example showing one surface (surface 1) completely behind the other surface,
viewed along the -z direction
vertices, the equation should return positive value). Figure 8.9 depicts the situation for two
surfaces. In order to check the final condition, we need to have the set of projected pixels
for each surface and then check if there are any common pixels in the two sets (see Fig. 8.10
for illustration). As you can see, the first and the last checks are performed at the pixel level,
whereas the other two checks are performed at the object level. Hence, the depth sorting
algorithm incorporates elements of both the object space and image space methods.
The tests are performed following the order of our preceding discussion. As soon
as one of the checks is true, we move to check for overlap with the next surface of the list.
Fig. 8.9 Illustration of one surface (surface 2) completely in front of surface 1, although
surface 1 is not completely behind surface 2
Fig. 8.10 An example where the projected surfaces do not overlap although their bounding
rectangles do
If all tests fail, we swap the order of the two surfaces in the list (called reordering) and stop the current pass. Then, we restart the whole process. The steps of the depth sorting method, in pseudocode,
are shown in Algorithm 8.3.
1: Input: List of surfaces S = {s1 , s2 , · · · sn }, in sorted order (of increasing maximum depth value).
2: Output: Final frame buffer values.
3: Set a flag Reorder=OFF
4: repeat
5: Set s = sn (i.e., the last element of S)
6: for each surface si in S where 1 ≤ i < n do
7: if zmin (s) < zmax (si ) (that means, there is depth overlap) then
8: if bounding rectangles of the two surfaces on the view plane do not overlap then
9: Set i = i + 1 and continue loop.
10: else if s is completely behind si then
11: Set i = i + 1 and continue loop
12: else if si is completely in front of s then
13: Set i = i + 1 and continue loop
14: else if projections of s and si do not overlap then
15: Set i = i + 1 and continue loop
16: else
17: Swap the positions of s and si in S
18: Set Reorder = ON
19: Exit inner loop
20: end if
21: end if
22: end for
23: if Reorder = OFF then
24: Invoke rendering routine for s
25: Set S = S − s
26: else
27: Set Reorder = OFF
28: end if
29: until S = NULL
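Two of the checks used in Algorithm 8.3 reduce to very small routines. The following Python helpers are a sketch only; the coordinate extents of each surface are assumed to be precomputed and stored under the dictionary keys shown:

def depth_overlap(s, si):
    # Depth overlap exists when the minimum depth of s is smaller than the maximum
    # depth of si (s is the last surface of the list sorted by increasing maximum depth).
    return s['zmin'] < si['zmax']

def bounding_rectangles_overlap(s, si):
    # Overlap of the bounding rectangles of the two projected surfaces on the view plane.
    x_apart = s['xmax'] < si['xmin'] or si['xmax'] < s['xmin']
    y_apart = s['ymax'] < si['ymin'] or si['ymax'] < s['ymin']
    return not (x_apart or y_apart)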
Sometimes, there are surfaces that intersect each other. As an example, consider Fig. 8.11,
in which the two surfaces intersect. As a result, one part of surface 1 is at a depth larger than
surface 2, although the other part has lesser depth. Therefore, we may initially keep surface 1
after surface 2 in the sorted list. However, since the conditions fail (check for yourself), we
have to reorder them (bring surface 1 in front of surface 2 in the list). As you can check, the
conditions shall fail again and we have to reorder again. This will go on in an infinite loop
and Algorithm 8.3 will loop forever.
In order to avoid such situations, we can use an extra flag (a Boolean variable) for each
surface. If a surface is reordered, the corresponding flag will be set on. If the surface needs
to be reordered next time, we shall do the following.
1. Divide the surface along the intersection line of the two surfaces.
2. Add the two new surfaces in the sorted list, at appropriate positions.
Fig. 8.11 Two surfaces that intersect each other. Algorithm 8.3 should be modified to take
care of such cases
[Figure: the screen divided into regions P1, P2, P3, and P4; P3 subdivided into P31, P32, P33, and P34; P31 further subdivided into P311, P312, P313, and P314]
this region. Next, we check the region P2 . Again, no surface is contained within this region.
So, we assign background color to the region and proceed to the next region P3 .
We determine that P3 contains the surface. However, it is not completely overlapped by
the surface. Therefore, we go for dividing the region into four subregions of equal size (the
recursive call in the Algorithm 8.4, line 10). The four subregions are denoted by P31 , P32 ,
P33 , and P34 in the figure. For each of these subregions, we perform the checks again.
We find that the subregion P31 contains the surface. However, the surface does not com-
pletely overlap P31 . Therefore, we go for subdividing the region. The four subregions of
P31 are denoted as P311 , P312 , P313 , and P314 . We then check each of these subregions for
surface visibility.
Since the surface lies in the subregion P311 and is completely overlapping it, we assign
the surface color to the subregion P311 . The other three subregions of P31 do not contain
any surface. Therefore, all these subregions are assigned background color. This completes
our processing of the subregion P31 .
We then retrace the recursive step and go for checking the other three subregions of the
region P3 , namely P32 , P33 , and P34 . Since none of them contains any surface, we assign
background color to them and complete our processing of the subregion P3 .
We then return from the recursive step to check the remaining subregion P4 of the screen.
We find that the region contains no surface. Therefore, background color is assigned to it.
Since all the regions have been checked, the algorithm stops.
Fig. 8.13 The naming of regions with respect to a viewer in an octree method
nodes (i.e., voxels), each voxel shall have the information about its position with respect to
the viewer, along with the color of the object associated with it.
In order to render the scene represented by the octree, we project the voxels on the view
plane in a front-to-back manner. In other words, we start with the voxels nearest to the
viewer, then move to the next nearest voxels, and so on. It is easy to see that a term such as voxels nearest to the viewer indicates a voxel grid (having voxels at the same distance from the viewer) with a one-to-one correspondence to the pixel grid on the view plane (since both grids have the same size). When a color is encountered in a voxel (of a particular grid), the corresponding pixel in the frame buffer is painted only if no previous color has been loaded into the same pixel position. We can achieve this by initially assuming that the frame buffer locations contain 0 (no color).
octree representation) is shown in Algorithm 8.5.
1: Input: The set of octree leaf nodes (voxels) OCT with each voxel having two attributes (distance from
the viewer, color). We assume that the nearest voxel to viewer has a distance 0.
2: Output: Frame buffer with appropriate color in its locations
3: Set all frame buffer locations to 0 (to denote no color)
4: Set distance = 0
5: repeat
6: for each element of OCT, from the leftmost to the rightmost, do
7: Take an element of OCT (denoted by v)
8: if distance of v from viewer = distance then
9: if corresponding frame buffer location has a value 0 then
10: Set the voxel color as the corresponding frame buffer value
11: end if
12: end if
13: Set OCT = OCT − v
14: end for
15: Set distance = distance + 1
16: until OCT = NULL
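A compact Python sketch of this procedure follows. It assumes the simplistic representation used here: each leaf voxel carries its distance from the viewer, its color (0 for empty), and the pixel it maps to (the voxel-to-pixel mapping itself is an assumption of the sketch, as in Example 8.2 below):

def render_octree(voxels, pixel_grid, max_distance):
    # voxels: list of (distance, color, pixel) tuples for the octree leaf nodes
    # pixel_grid: all pixel positions of the display
    fb = {p: 0 for p in pixel_grid}               # 0 denotes 'no color yet'
    for distance in range(max_distance + 1):      # process the voxel grids front to back
        for d, color, pixel in voxels:
            if d == distance and color != 0 and fb[pixel] == 0:
                fb[pixel] = color                 # the nearest colored voxel wins; farther ones are ignored
    return fb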
Example 8.2
Let us consider an example to illustrate the idea of our simplistic octree method for hidden surface
removal. Assume that we have a display device with a 4 × 4 pixel grid. On this display, we wish
to project a scene enclosed in a cubical region with each side = 4 units. Note that we shall have
two levels of recursion to create the octree representation of the scene till we reach the voxel level.
In the first recursion, we create eight regions from the original volume given, each region having
a side length = 2 units. In the next level of recursion, we divide each of the subregions further,
so that we reach the voxel level. The divisions are illustrated in Fig. 8.14, showing two levels of recursion for one octant. Other octants are divided similarly.
Fig. 8.14
Note how the naming convention is used. In the first level of recursion, we named the octants
as we discussed before ({1,2,3,4} are the front regions while {5,6,7,8} are the back regions with
respect to the viewer). In the second level of recursion, we have done the same thing. Thus, after
the recursion ends, each voxel is associated with two numbers in the form {first level number, sec-
ond level number}. For example, the voxels shown in Fig. 8.14 will have numbers as {1,1}, {1,2},
{1,3}, {1,4}, {1,5}, {1,6}, {1,7} and {1,8}.
As you can see, from these numbers, we can make out the relative position of the voxels with
respect to the viewer. Hence, we can easily determine the voxel grids (i.e., voxels at the same
distance from the viewer). There are four such grids as follows:
Grid 1: {1,1}, {1,2}, {1,3}, {1,4}, {2,1}, {2,2}, {2,3}, {2,4}, {3,1}, {3,2}, {3,3}, {3,4}, {4,1},
{4,2}, {4,3}, and {4,4} [distance = 0 from the viewer]
Grid 2: {1,5}, {1,6}, {1,7}, {1,8}, {2,5}, {2,6}, {2,7}, {2,8}, {3,5}, {3,6}, {3,7}, {3,8}, {4,5},
{4,6}, {4,7}, and {4,8} [distance = 1 from the viewer]
Grid 3: {5,1}, {5,2}, {5,3}, {5,4}, {6,1}, {6,2}, {6,3}, {6,4}, {7,1}, {7,2}, {7,3}, {7,4}, {8,1},
{8,2}, {8,3}, and {8,4} [distance = 2 from the viewer]
Grid 4: {5,5}, {5,6}, {5,7}, {5,8}, {6,5}, {6,6}, {6,7}, {6,8}, {7,5}, {7,6}, {7,7}, {7,8}, {8,5},
{8,6}, {8,7}, and {8,8} [distance = 3 from the viewer]
It is also easy to define a mapping between voxel and pixel grids. For example, we may define
that a voxel with location (i,j) in a grid maps to the pixel (i,j) in the pixel grid.
Now, let us try to execute the steps of Algorithm 8.5. Initially, all pixels have a color value 0
and OCT contains all the voxels of the four grids. Since the distance of the voxels in grid 1 is 0,
all these voxels will be processed in the inner loop first. During the processing of each voxel, its
color (if any) will be set as the color of the corresponding frame buffer location. Afterwards, the
voxel will be removed from the list of voxels in OCT. Thus, after the first round of processing of
the inner loop, OCT shall contain voxels of grids 2, 3 and 4.
In a similar manner, voxels of grid 2 will be processed during the second round of inner loop
execution and the frame buffer colors modified appropriately (i.e., if a frame buffer location already
contains a non zero color value and the corresponding voxel in grid 2 has a color, the frame buffer
color value remains unchanged. Otherwise, the current voxel color will replace the frame buffer
color). After the second round of inner loop execution, OCT shall contain grid 3 and grid 4 voxels and the inner loop executes a third time. The process continues in this way till the fourth round of execution of the inner loop is over, after which we will have OCT = NULL and the execution of the algorithm stops.
SUMMARY
In this chapter, we learnt about the idea of hidden surface removal in a scene. The objective is
to eliminate surfaces that are invisible to a viewer with respect to a viewing position. We learnt about the two broad types of methods—image space and object space methods. The former works at the pixel level while the latter works at the level of object representations.
In order to reduce computations, coherence properties are used in conjunction with the
algorithms. We mentioned seven such properties, namely (a) object coherence, (b) face coher-
ence, (c) edge coherence, (d) scan line coherence, (e) area and span coherence, (f) depth
coherence, and (g) frame coherence. We discussed the ideas in brief. In addition to these, we
also saw how the back face elimination method provides a simple and efficient way for removal
of a large number of hidden surfaces.
Among the many hidden surface removal algorithms available, we discussed three in detail
along with illustrative examples. The first algorithm, namely the depth-buffer algorithm, is one of
the most popular algorithms which works in the image space. The depth sorting algorithm that
we discussed next works at both the image and object space. As we saw, it is more complex and
computation-intensive compared to the depth-buffer algorithm. The third algorithm, namely the
octree method, is an object space method that is based on the octree representation of objects.
We illustrated the idea of octree methods considering a few simplistic assumptions.
In the next chapter, we will discuss the final stage of a 3D graphics pipeline, namely rendering.
BIBLIOGRAPHIC NOTE
There are a large number of hidden surface removal techniques. We have discussed a few
of those. More techniques can be found in Elber and Cohen [1990], Franklin and Kankanhalli
[1990], Segal [1990], and Naylor et al. [1990]. A well-known hidden surface removal technique
is the A-buffer method. Works on this are presented in Cook et al. [1987], Haeberli and Akeley
[1990], and Shilling and Strasser [1993]. Hidden surface removal is also important in three-
dimensional line drawings. For curved surfaces, contour plots are displayed. Such contouring
techniques are summarized in Earnshaw [1985]. For various programming techniques for hid-
den surface detection and removal, the graphics gems book series can be referred (Glassner
[1990], Arvo [1991], Kirk [1992], Heckbert [1994], and Paeth [1995]).
KEY TERMS
(A)ccumulation buffer – a data structure to store depth and associated information for each
surface to which a pixel belongs
A-buffer method – an image space technique for hidden surface removal that works with
transparent surfaces also
Area subdivision – recursive subdivision of the projected area of a surface
Back face elimination – an object space method for hidden surface removal
Coherence – the property by which we can apply some results calculated for one part of a scene
or image to the other parts
Depth (Z) buffer algorithm – an image space method for hidden surface removal
Depth-buffer – a data structure to store the depth information of each pixel
Depth coherence – the depth of nearby parts of a surface is similar
Depth overlap – the minimum depth of one surface is greater than the maximum depth of another
surface
Depth sorting (Painter’s) algorithm – a hidden surface removal technique that combines
elements of both object space and image space methods
Face coherence – the property by which we can check visibility of one part of a surface by
checking its properties at other parts
Frame coherence – pictures of the successive frames are likely to be similar
Hidden surfaces – object surfaces that are hidden with respect to a particular viewing position
Image space method – hidden surface removal techniques that work with the pixel level
projections of object surfaces
Object coherence – determining visibility of an object surface with respect to nearby object
surfaces by comparing their bounding volumes
Object space methods – hidden surface removal techniques that work with the objects, rather
than their projections on the screen.
Octree method – an object space method for hidden surface removal
Scan line coherence – a line or surface segment visible in one scan line is also likely to be visible
in the adjacent scan line
Visible surfaces – surfaces that are visible with respect to the viewing position
Warnock’s algorithm – the earliest area subdivision method for hidden surface removal
EXERCISES
8.1 Discuss the importance of hidden surface removal in a 3D graphics pipeline. How is it
different from clipping?
8.2 What are the broad classes of hidden surface removal methods? Describe each class in
brief along with its pros and cons.
8.3 Briefly explain the idea of coherence. Why is it useful in hidden surface removal?
8.4 To which category of methods does the depth-buffer algorithm belong? Justify.
8.5 We have discussed the depth-buffer algorithm (Algorithm 8.1) and the iterative depth
calculation (Algorithm 8.2) separately. Write the pseudocode of an algorithm combining
the two.
8.6 The depth sorting method is said to be a hybrid of the object space and image space
methods. Why?
8.7 Why does Algorithm 8.3 fail for intersecting surfaces? Explain with a suitable example.
8.8 Modify Algorithm 8.3 to take into account intersecting surfaces.
8.9 Consider the objects mentioned in Example 8.1. Can Algorithm 8.3 be applied to these
objects or the modified algorithm you wrote to answer Exercise 8.8? Show the execution
of the steps of the appropriate algorithm for the objects.
8.10 Both the back face elimination and octree methods belong to the object space methods.
What is the major difference between them?
8.11 Assume that during octree creation, we named each region as illustrated in Example 8.2.
Write the pseudocode of an algorithm to determine the distance of a voxel from the viewer.
Integrate this with Algorithm 8.5.
8.12 Algorithm 8.5 assumed a simplistic octree representation. Discuss ways to improve it.
CHAPTER 9
Rendering
Learning Objectives
After going through this chapter, the students will be able to
• Understand the concept of scan conversion
• Get an overview of the issues involved in line scan conversion
• Learn about the digital differential analyser (DDA) line drawing algorithm and its
advantage over the intuitive approach
• Understand the Bresenham’s line drawing algorithm and its advantage over the DDA
algorithm
• Get an overview of the issues involved in circle scan conversion
• Learn about the mid-point algorithm for circle scan conversion
• Understand the issues and approaches for fill area scan conversion
• Learn about the seed fill, flood fill, and scan line polygon fill algorithms for fill area scan
conversion
• Get an overview of character rendering methods
• Understand the problem of aliasing in scan conversion
• Learn about the Gupta-Sproull algorithm for anti-aliasing lines
• Learn about the area sampling and supersampling approaches towards anti-aliasing
INTRODUCTION
Let us review what we have learnt so far. In a 3D graphics pipeline, we start with the def-
inition of objects that make up the scene. We learnt different object definition techniques.
We also learnt the various geometric transformations to put the objects in their appropriate
place in a scene. Then, we learnt about the lighting and shading models that are used to
assign colors to the objects. Subsequently, we discussed the viewing pipeline comprising
the three stages: (a) view coordinate formation, (b) projection, and (c) window-to-viewport
transformations. We also learnt various algorithms for clipping and hidden surface removal.
Thus, we now know the stages involved in transforming a 3D scene to a 2D viewport
in the device coordinate system. Note that a device coordinate system is continuous in
nature (i.e., coordinates can be any real number). However, we must use the pixel grid
to render a scene on a physical display. Clearly, pixel grids represent a discrete coordi-
nate system, where any point must have integer coordinates. Thus, we need to map the
scene defined in the viewport (continuous coordinates) to the pixel grid (discrete coor-
dinates). The algorithms and techniques used for performing this mapping are the subject
matter of this chapter. These techniques are collectively known as rendering (often called
scan conversion or rasterization). In the rest of this chapter, these terms will be used
synonymously.
The most basic problem in scan conversion is to map a point from the viewport to the
pixel grid. The approach is very simple: just round off the point coordinates to their nearest
integer value. For example, consider the point P(2.6,5.1). This viewport point is mapped to
the pixel grid point P′ (3,5) after rounding off the individual coordinates to their nearest inte-
gers. However, scan conversion of more complex primitives such as line and circle are not
so simple and we need more complex (and efficient) algorithms. Let us learn about those
algorithms.
For the line segment between A(2,2) and B(7,5), for instance, the slope is m = 3/5 and the y-intercept is b = 4/5, so evaluating y = mx + b at the intermediate integer x values gives

y(x = 3) = (3/5) × 3 + 4/5 = 2.6
y(x = 4) = (3/5) × 4 + 4/5 = 3.2
y(x = 5) = (3/5) × 5 + 4/5 = 3.8
y(x = 6) = (3/5) × 6 + 4/5 = 4.4
Fig. 9.1 Simple line scan conversion—Note that the actual points on the line are scan
converted to the nearest pixels after rounding off
Thus, between A′ and B′, we obtain the four points on the line as (3,2.6), (4,3.2), (5,3.8), and (6,4.4). Following the point conversion technique, we determine the pixels for these points as (3,3), (4,3), (5,4), and (6,4), respectively. The idea is illustrated in Fig. 9.1.
The approach is very simple. However, there are mainly two problems with it. First, we need to perform the multiplication m·x. Second, we need to round off the y-coordinates. Both of these may involve floating point operations, which are computationally expensive. In typical graphics applications, we need to scan convert a very large number of lines within a very short time. In such cases, floating point operations make the process slow and flicker may occur. Thus, we need a better solution.
Based on these two quantities, we can take a simple decision about the pixel closer to the
actual line, by taking the difference of the two.
If the difference is less than 0, the lower pixel is closer to the line and we choose it. Otherwise, we choose the upper pixel. Now, we substitute m in this expression with △y/△x, where △y and △x are the differences between the end points. Rearranging, we get,

△x(dlower − dupper) = △x[2(△y/△x)(xk + 1) − 2yk + 2b − 1]
                    = 2△y·xk − 2△x·yk + 2△y + △x(2b − 1)
                    = 2△y·xk − 2△x·yk + c
Fig. 9.2 The key idea of Bresenham’s line scan conversion algorithm. The algorithm
chooses one of the two candidate pixels based on the distance of the pixels from the actual
line. The decision is taken entirely based on integer calculations.
Note that △x(dlower − dupper ) can also be used to make the decision about the closeness
of the pixels to the actual line. Let us denote this by pk , a decision parameter for the kth
step. Clearly, the sign of the decision parameter will be the same as that of (dlower − dupper ).
Hence, if pk < 0, the lower pixel is closer to the line and we choose it. Otherwise, we choose
the upper pixel.
At step k + 1, the decision parameter is,

pk+1 = pk + 2△y − 2△x(yk+1 − yk)          (9.1)

Note that in Eq. 9.1, if pk < 0, we set yk+1 = yk, otherwise we set yk+1 = yk + 1. Thus, depending on the sign of pk, the difference (yk+1 − yk) in this expression becomes either 0 or 1. The first decision parameter at the starting point is given by p0 = 2△y − △x.
What is the implication of this? We are choosing pixels at each step, depending on the
sign of the decision parameter. The decision parameter is computed entirely with inte-
ger operations only. All floating point operations are eliminated. Thus, the approach is a
huge improvement, in terms of speed of computation, over the previous approaches we dis-
cussed. The pseudocode of the Bresenham’s algorithm for line scan conversion is given in
Algorithm 9.2.
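A minimal C sketch of the same idea is given below for slopes 0 ≤ m ≤ 1 (set_pixel is a hypothetical frame-buffer write); it follows the derivation above rather than reproducing Algorithm 9.2 line by line:

extern void set_pixel(int x, int y);   /* hypothetical frame-buffer write */

/* Bresenham line scan conversion for 0 <= slope <= 1, with x0 < x1.
   Only integer additions and comparisons are used inside the loop. */
void bresenham_line(int x0, int y0, int x1, int y1)
{
    int dx = x1 - x0, dy = y1 - y0;
    int p  = 2 * dy - dx;              /* p0 = 2*dy - dx              */
    int x = x0, y = y0;
    set_pixel(x, y);                   /* first end point             */
    while (x < x1) {
        x++;
        if (p < 0) {                   /* lower candidate pixel chosen */
            p += 2 * dy;
        } else {                       /* upper candidate pixel chosen */
            y++;
            p += 2 * (dy - dx);
        }
        set_pixel(x, y);
    }
}

For the segment of Example 9.1, A(2,2) to B(7,5), this sketch produces the same pixel list {(2,2), (3,3), (4,3), (5,4), (6,4), (7,5)}.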
Example 9.1
In order to understand Algorithm 9.2, let us execute its steps for the line segment defined by
the end points A(2,2) and B(7,5) in our previous example. Following line 3 of Algorithm 9.2,
we compute △x = 5, △y = 3, and p = 1. The two variables x and y are set to the
end point A′ as x = 2, y = 2 (line 4). Also, the end point A′ (2,2) is added to P
(line 5).
Note that p = 1 ≥ 0. Therefore, we execute the ELSE part of the loop (lines 10–12) and we get
x = 3, y = 3, p = −3. The pixel (3,3) is added to the output list P (line 14). Since x = 3 < 6, the loop termination condition is not yet met and the loop is executed again.
In the second execution of the loop, we have p = −3 < 0. Thus, the IF part (lines 7–9) is
executed and we get x = 4, y = 3 (no change), and p = 3. The pixel (4,3) is added to the output
list P. Since x = 4 < 6, the loop is executed again.
Now we have p = 3 ≥ 0. Therefore, in the third execution of the loop, the statements in the
ELSE part are executed. We get x = 5, y = 4, p = −1. The pixel (5,4) is added to the output pixel
list P. Since x = 5 < 6, the loop is executed again.
In the fourth loop execution, p = −1 < 0. Hence, the IF part is executed with the result x = 6,
y = 4 (no change), and p = 5. The pixel (6,4) is added to the output list P. As the loop termination
condition (x < 6) is no longer true (x = 6), the loop stops.
Finally, we add the other end point B′ (7,5) to the output list P and the algorithm
stops.
Thus, we get the output list P = {(2, 2), (3, 3), (4, 3), (5, 4), (6, 4), (7, 5)} after the termination
of the algorithm.
Algorithm 9.2 works for line segments with 0 ≤ m ≤ 1 or −1 ≤ m ≤ 0. For other line
segments, minor modification to the algorithm is needed, which is left as an exercise for the
reader.
Fig. 9.3 The eight-way symmetry and its use to determine points on a circle centered at the origin. We compute pixels for the top-right eighth of the circle; the other pixels are determined based on the symmetry property.
[Fig. 9.4: the current pixel (xk, yk), the midpoint between the two candidate pixels, and the actual point on the circle boundary]
We evaluate this function at the midpoint of the two candidate pixels to make our decision (see Fig. 9.4). In other words, we compute the value f(xk + 1, yk − 1/2). Let us call this the decision variable pk after the kth step. Thus, we have

pk = f(xk + 1, yk − 1/2)
   = (xk + 1)² + (yk − 1/2)² − r²
Note that if pk < 0, the midpoint is inside the circle. Thus, yk is closer to the circle
boundary and we choose the pixel (xk + 1, yk ). Otherwise, we choose (xk + 1, yk − 1) as the
midpoint is outside the circle and yk − 1 is closer to the circle boundary.
To come up with an efficient algorithm, we perform some more tricks. First, we consider
the decision variable for the (k + 1)th step pk+1 as,
pk+1 = f(xk+1 + 1, yk+1 − 1/2)
     = [(xk + 1) + 1]² + (yk+1 − 1/2)² − r²
After expanding the terms and rearranging, we get Eq. 9.2.
pk+1 = pk + 2(xk + 1) + (yk+1² − yk²) − (yk+1 − yk) + 1          (9.2)
In Eq. 9.2, yk+1 is yk if pk < 0. In such a case, we have pk+1 = pk + 2xk + 3. If pk > 0,
we have yk+1 = yk − 1 and pk+1 = pk + 2(xk − yk ) + 5. Thus, we can choose pixels based on
an incremental approach (computing the next decision parameter from the current value).
One thing remains: the first decision variable p0. This is the decision variable at (0, r). Using the definition, we can compute it as follows.

p0 = f(0 + 1, r − 1/2)
   = 1 + (r − 1/2)² − r²
   = 5/4 − r
The pseudocode of the algorithm is shown in Algorithm 9.3, where RoundOff(a) rounds
off the number a to its nearest integer.
With very simple modifications to Algorithm 9.3, we can determine pixels for circles
about any arbitrary center. The modifications are left as an exercise for the reader.
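In the same spirit, a C sketch of midpoint circle scan conversion for a circle centred at the origin is shown below (set_pixel is a hypothetical frame-buffer write; duplicate pixels, as seen in Example 9.2, are not filtered out here). It is a sketch following the derivation above, not a transcription of Algorithm 9.3.

#include <math.h>

extern void set_pixel(int x, int y);   /* hypothetical frame-buffer write */

/* Plot the eight symmetric pixels for a point (x, y) in the top-right octant. */
static void plot_octants(int x, int y)
{
    set_pixel( x,  y);  set_pixel( y,  x);
    set_pixel( y, -x);  set_pixel( x, -y);
    set_pixel(-x, -y);  set_pixel(-y, -x);
    set_pixel(-y,  x);  set_pixel(-x,  y);
}

/* Midpoint circle scan conversion for a circle of radius r centred at the origin. */
void midpoint_circle(double r)
{
    int    x = 0;
    int    y = (int)floor(r + 0.5);    /* RoundOff(r)                         */
    double p = 1.25 - r;               /* p0 = 5/4 - r                        */

    plot_octants(x, y);                /* pixels on the axes                  */
    while (x < y) {
        x++;
        if (p < 0) {
            p += 2 * x + 1;            /* same as pk + 2*xk + 3               */
        } else {
            y--;
            p += 2 * (x - y) + 1;      /* same as pk + 2*(xk - yk) + 5        */
        }
        plot_octants(x, y);
    }
}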
Example 9.2
Let us consider a circle with radius r = 2.7. We will execute the steps of Algorithm 9.3 to see how
the pixels are determined.
First, we compute p = −1.45 and set x = 0, y = 3 (lines 3–4). Also, we add the axis pixels
{(0,3), (3,0), (0,−3), (−3,0)} to the output pixel list P (line 5). Then, we enter the loop.
Note that p = −1.45 < 0. Hence, the IF part (lines 7–9) is executed and we get p = 1.55 and
x = 1 (y remains unchanged). So, the pixels added to P are (line 14): {(1,3), (3,1), (3,−1), (1,−3),
(−1,−3), (−3,−1), (−3,1), (−1,3)}. Since x = 1 < y = 3, the loop is executed again.
In the second run of the loop, we have p = 1.55 > 0. Hence, the ELSE part is now executed
(lines 10–12) and we get p = 2.55, x = 2, and y = 2. Therefore, the pixels added to P are {(2,2),
(2,2), (2,−2), (2,−2), (−2,−2), (−2,−2), (−2,2), (−2,2)}. Since now x = 2 = y, the algorithm
terminates.
Thus, at the end, P consists of the 20 pixels {(0,3), (3,0), (0,−3), (−3,0), (1,3), (3,1), (3,−1),
(1,−3), (−1,−3), (−3,−1), (−3,1), (−1,3), (2,2), (2,2), (2,−2), (2,−2), (−2,−2), (−2,−2), (−2,2),
(−2,2)}.
Note that the output set P contains some duplicate entries. Before rendering, we perform further
checks on P to remove such entries.
Frequently, we need to apply a color to all the pixels that lie within an enclosed region (i.e., to the interior of the region). In other words, we want to fill the region with a specified color. For exam-
ple, consider an interactive painting system. You draw an arbitrary shape and color it (both
boundary and interior). Now, you want to change the color of the shape interactively (e.g.,
select a color from a menu and click in the interior of the shape to indicate that the new
color be applied to the shape). There are many ways to perform such region filling. The
techniques depend on how the regions are defined. There are broadly the following two
types of definitions of a region.
Pixel level definition A region is defined in terms of its boundary pixels (known as
boundary-defined) or the pixels within its boundary (called interior defined). Such defi-
nitions are used for regions having complex boundaries or in interactive painting systems.
Geometric definition A region is defined in terms of geometric primitives such as edges
and vertices. Primarily meant for defining polygonal regions, such definitions are commonly
used in general graphics packages.
In the following section, we shall discuss algorithms used to fill regions defined in either
of the ways.
1: Input: Boundary pixel color, specified color, and the seed (interior pixel) p
2: Output: Interior pixels with specified color
3: Push(p) to Stack
4: repeat
5: Set current pixel = Pop(Stack)
6: Apply specified color to the current pixel
7: for Each of the four connected pixels (four-connected) or eight connected pixels (eight-connected)
of current pixel do
8: if (connected pixel color ≠ boundary color) AND (connected pixel color ≠ specified color) then
9: Push(connected pixel)
10: end if
11: end for
12: until Stack is empty
1: Input: Interior pixel color, specified color, and the seed (interior pixel) p
2: Output: Interior pixels with specified color
3: Push(p) to Stack
4: repeat
5: Set current pixel = Pop(Stack)
6: Apply specified color to the current pixel
7: for Each of the four connected pixels (four-connected) or eight connected pixels (eight-connected)
of current pixel do
8: if (Color(connected pixel) = interior color) then
9: Push(connected pixel)
10: end if
11: end for
12: until Stack is empty
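A rough C sketch of the flood fill variant above, using an explicit stack and four-connectivity, is given below; get_pixel, put_pixel, and the fixed-size stack are assumptions made for this illustration, and the neighbour test is done when a pixel is popped, which has the same effect as testing before pushing.

extern int  get_pixel(int x, int y);              /* hypothetical: read a pixel's color  */
extern void put_pixel(int x, int y, int color);   /* hypothetical: write a pixel's color */

#define STACK_MAX 100000

typedef struct { int x, y; } Point;

/* Flood fill with four-connectivity: starting from the seed, recolor every
   pixel that currently holds the interior color.  Screen-boundary handling
   is assumed to be done inside get_pixel/put_pixel. */
void flood_fill(int seed_x, int seed_y, int interior_color, int new_color)
{
    static Point stack[STACK_MAX];
    int top = 0;
    stack[top++] = (Point){ seed_x, seed_y };

    while (top > 0) {
        Point p = stack[--top];                    /* pop the current pixel              */
        if (get_pixel(p.x, p.y) != interior_color)
            continue;                              /* boundary, already recolored, or outside */
        put_pixel(p.x, p.y, new_color);            /* apply the specified color          */

        if (top + 4 <= STACK_MAX) {                /* push the four connected pixels     */
            stack[top++] = (Point){ p.x + 1, p.y };
            stack[top++] = (Point){ p.x - 1, p.y };
            stack[top++] = (Point){ p.x, p.y + 1 };
            stack[top++] = (Point){ p.x, p.y - 1 };
        }
    }
}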
Fig. 9.5 Illustrative example of the scan line polygon fill algorithm—a quadrilateral with vertices A(5,1), B(6,4), C(3,6), and D(1,3)
First, we determine the maximum and minimum scanlines (line 3) from the coordinate of
the vertices as: maximum = max{1,4,6,3} (i.e., maximum of the vertex y-coordinates) = 6,
minimum = min{1,4,6,3} (i.e., minimum of the vertex y-coordinates) = 1.
In the first iteration of the outer loop, we first determine the intersection points of the scan
line y = 1 with all the four edges in the inner loop (lines 6–10). For the edge AB, the IF
condition is satisfied and we determine the intersection point as the vertex A (lines 7–8). For
BC and CD, the condition is not satisfied. However, for DA, again the condition is satisfied
and we get the vertex A again. Thus, the two intersection points determined by the algorithm
are the same vertex A. Since this is the only pixel between itself, we apply specified color to
it (lines 11–12). Then we set scanline = 2 (line 11). Since 2 6= maximum = 6, we reenter
the outer loop.
In the second iteration of the outer loop, we check for the intersection points between the
edges and the scanline y = 2. For the edge AB, the IF condition is satisfied. So there is an
intersection point, which is (5 1/3, 2). The edges BC and CD do not satisfy the condition, hence
there are no edge–scanline intersections. The condition is satisfied by the edge DA and the
intersection point is (3,2). After sorting (line 11), we have the two intersection points (3,2)
and (5 1/3, 2). The pixels in between them are (3,2), (4,2), and (5,2). We apply the specified color to these pixels (line 12) and set scanline = 3. Since 3 ≠ maximum = 6, we reenter the outer loop.
The algorithm works in a similar way for the remaining scanlines y = 3, y = 4,
y = 5, and y = 6 (the execution is left as an exercise for the reader). There are two
things in the algorithm that require some elaboration. First, how do we determine the edge–
scanline intersection point? Second, how do we determine pixels within two intersection
points?
We can use a simple method to determine the edge–scanline intersection point. First, from the vertices, determine the line equation for an edge. For example, for the edge AB in Fig. 9.5, we compute m = (4 − 1)/(6 − 5) = 3. Thus, the line equation is y = 3x + b. Now, evaluating the equation at the end point A (i.e., x = 5, y = 1), we get b = 1 − 3 × 5 = −14. Therefore, the equation for AB is y = 3x − 14. Now, to determine the edge–scanline intersection point, simply substitute the scanline (y) value in the equation and compute the x-coordinate. Thus, to get the x-coordinate of the intersection point between the scanline y = 2 and AB, we evaluate 2 = 3x − 14, which gives x = 16/3 = 5 1/3.
Given two intersection points (x1 , y1 ) and (x2 , y2 ) where x1 < x2 , determination of the
pixels between them is easy. Increment x1 by one to get the next pixel and continue till
the current x value is less than x2 . If either or both the intersection points are pixels them-
selves, they are also included. As an illustration, consider the two intersection points (3,2)
and (5 1/3, 2) of the polygon edges (AB and DA, respectively) with the scanline y = 2 in Fig. 9.5. Here x1 = 3, x2 = 5 1/3. The first pixel is the intersection point (3,2) itself. The next pixel is (3 + 1, 2) or (4,2). We continue to get the next pixel as (4 + 1, 2) or (5,2). Since 5 + 1 = 6 > x2 = 5 1/3, we stop.
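A small C sketch of these two sub-steps is given below (set_pixel is a hypothetical frame-buffer write; horizontal edges, which do not intersect a scanline in a single point, are assumed to be skipped by the caller):

#include <math.h>

extern void set_pixel(int x, int y);   /* hypothetical frame-buffer write */

/* x coordinate where the non-horizontal edge (x1,y1)-(x2,y2) crosses the
   scanline y = yscan, using the edge's line equation y = m*x + b. */
double edge_scanline_x(double x1, double y1, double x2, double y2, double yscan)
{
    if (x1 == x2)                          /* vertical edge: x is constant */
        return x1;
    double m = (y2 - y1) / (x2 - x1);
    double b = y1 - m * x1;
    return (yscan - b) / m;
}

/* Color every pixel on scanline y lying between the intersection points
   xa <= xb; intersection points that are themselves pixels are included. */
void fill_span(double xa, double xb, int y)
{
    for (int x = (int)ceil(xa); x <= (int)floor(xb); x++)
        set_pixel(x, y);
}

For the edge AB and scanline y = 2 of the example, edge_scanline_x returns 16/3, and fill_span(3, 16.0/3, 2) colors exactly the pixels (3,2), (4,2), and (5,2).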
An important point to note here is that Algorithm 9.6 works for convex polygons only.
For concave polygons, an additional problem needs to be solved. As we discussed before,
we determine pixels between the pair of edge–scanline intersection points. However, all
these pixels may not be inside the polygon in case of a concave polygon, as illustrated in
Fig. 9.6. Therefore, in addition to determining pixels, we also need to determine which
pixels are inside.
In order to make Algorithm 9.6 work for concave polygons, we have to perform an inside–
outside test for each pixel between a pair of edge–scanline intersection points (an additional
overhead). The following are the steps for a simple inside–outside test for a pixel p.
1. Determine the bounding box (maximum and minimum x and y coordinates) for the
polygon.
2. Choose an arbitrary pixel po outside the bounding box (This is easy. Simply choose a
point whose x and y coordinates are outside the minimum and maximum range of the
polygon coordinates).
3. Create a line by joining p and po (i.e., determine the line equation).
4. If the line intersects the polygon edges an even number of times, p is an outside pixel.
Otherwise, p is inside the polygon.
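A C sketch of such a test is shown below; it uses the common even–odd (crossing count) variant in which the ray from p is simply taken horizontally to the right instead of towards an explicitly chosen outside point po, but the principle is the same.

/* Even-odd inside-outside test: count how many polygon edges a horizontal
   ray from (px, py) towards +x crosses; an odd count means p is inside.
   vx[] and vy[] hold the n polygon vertices in order. */
int inside_polygon(const double *vx, const double *vy, int n,
                   double px, double py)
{
    int inside = 0;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        /* consider only edges (j -> i) that straddle the line y = py */
        if ((vy[i] > py) != (vy[j] > py)) {
            /* x coordinate where the edge crosses y = py */
            double xcross = vx[j] + (py - vy[j]) * (vx[i] - vx[j]) / (vy[i] - vy[j]);
            if (px < xcross)          /* crossing lies to the right of p */
                inside = !inside;     /* toggle the parity               */
        }
    }
    return inside;
}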
Fig. 9.6 The problem with concave polygons—two pixels (3,2) and (4,2), which are inside
the pair of intersection points B and C, are not inside the polygon
Fig. 9.8 The same character B of Fig. 9.7 is defined in terms of vertices and edges, as an outline definition. The intermediate pixels are computed using a scan conversion procedure during rendering.
Outline fonts, on the other hand, require less storage and we can perform geometric trans-
formations with satisfactory effect to reshape or resize such fonts. Moreover, they are not
resolution-dependent. However, rendering is slow since we have to perform scan conversion
procedures before rendering.
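To make the contrast concrete, rendering a bitmapped character is just a matter of copying its bit pattern to the frame buffer, as in the C sketch below (an 8 × 8 glyph stored one byte per row and a hypothetical set_pixel routine are assumptions of the sketch), whereas an outline character must first pass through the scan conversion procedures discussed earlier.

extern void set_pixel(int x, int y);   /* hypothetical frame-buffer write */

/* Render an 8x8 bitmapped glyph whose top-left corner is at (x0, y0).
   Each byte of 'glyph' encodes one row; the most significant bit is the
   leftmost pixel, and a 1-bit means an "on" pixel. */
void draw_bitmap_char(const unsigned char glyph[8], int x0, int y0)
{
    for (int row = 0; row < 8; row++)
        for (int col = 0; col < 8; col++)
            if (glyph[row] & (0x80 >> col))    /* is this pixel "on"? */
                set_pixel(x0 + col, y0 + row);
}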
9.5 ANTI-ALIASING
Let us consider Fig. 9.9. This is basically a modified form of Fig. 9.1, where we have
seen the pixels computed to render the line. As shown in Fig. 9.9, the scan converted line
(shown as a continuous line) does not look exactly like the original (shown as a dotted
line). Instead, we see a stair-step like pattern, often called the jaggies. This implies that,
after scan conversion, some distortion may occur in the original shape. Such distortions are
called aliasing (we shall discuss in the next section why it is called so). Some additional
operations are performed to remove such distortions, which are known as anti-aliasing
techniques.
Fig. 9.9 Problem of aliasing—Note the difference between actual line (dotted) and scan
converted line
Pre-filtering It works on the true signal in the continuous space to derive proper val-
ues for individual pixels. In other words, it is filtering before sampling. There are various
pre-filtering techniques, often known as area sampling.
Post-filtering It works on the sampled pixel values to modify their intensities; in other words, it is filtering after sampling. The post-filtering techniques are often known as super sampling.
Let us now get some idea about the working of these two types of filtering techniques.
Fig. 9.10 Illustration of area sampling technique. Each square represents a pixel area.
Depending on the area overlap of the line with the pixels, pixel intensities are set.
dk = F(M)
   = F(xk + 1, yk + 1/2)
   = 2(a(xk + 1) + b(yk + 1/2) + c)
Fig. 9.11 Midpoint line drawing algorithm. The decision variable is based on the midpoint
between the candidate pixels.
If d > 0, the midpoint is below the line. Thus, the pixel NE is closer to the line and we
choose it. In such a case, the next decision variable is,
dk+1 = F((xk + 1) + 1, (yk + 1) + 1/2)
     = 2[a((xk + 1) + 1) + b((yk + 1) + 1/2) + c]
     = 2(a(xk + 1) + b(yk + 1/2) + c) + 2(a + b)
     = dk + 2(a + b)
[Fig. 9.12: the lower candidate pixel E(xk + 1, yk), the upper candidate pixel NE(xk + 1, yk + 1), the midpoint, the vertical offsets v, 1 − v, and 1 + v, and the perpendicular distances D, Dupper, and Dlower from the line]
D = (d + △x) / (2√(△x² + △y²))          (9.3)
In the expression, d is the midpoint decision variable and △x and △y are the differences
in x and y coordinate values of the line endpoints, respectively. Note that the denominator is
a constant.
The intensity of E will be a fraction of the original line color. The fraction is determined
based on D. This is unlike Algorithm 9.7 where the line color is simply assigned to E. In
order to determine the fraction, a cone filter function is used. In other words, the more the
distance of the line from the chosen pixel center, the lesser will be the intensity. The function
is implemented in the form of a table. In the table, each entry represents the fraction with
respect to a given D.
In order to increase the line smoothness, the intensity of the two vertical neighbours of E,
namely the points (xk + 1, yk + 1) and (xk + 1, yk − 1) are also set in a similar way according
to their distances Dupper and Dlower respectively from the line. We can analytically derive the
two distances as (derivation is left as an exercise),
Dupper = 2(1 − v)△x / (2√(△x² + △y²))          (9.4a)

Dlower = 2(1 + v)△x / (2√(△x² + △y²))          (9.4b)

When the upper candidate pixel NE is chosen instead, the corresponding distances are,

D = (d − △x) / (2√(△x² + △y²))          (9.5a)

Dupper = 2(1 − v)△x / (2√(△x² + △y²))          (9.5b)

Dlower = 2(1 + v)△x / (2√(△x² + △y²))          (9.5c)
Note that in Eq. 9.5(b), Dupper is the perpendicular distance of the pixel (xk + 1, yk + 2)
and Dlower denotes the distance of the pixel E (xk + 1, yk ) from the line.
Thus in the Gupta–Sproull algorithm (Algorithm 9.8), we perform the following addi-
tional steps in each iteration of the midpoint line drawing algorithm.
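Those steps are listed in Algorithm 9.8 (not reproduced here). The following C sketch is a rough, simplified picture of what they amount to in each iteration; the plot_intensity routine, the stand-in for the cone-filter table, and the packaging into a single function are all assumptions made for this illustration.

#include <math.h>

extern void plot_intensity(int x, int y, double fraction);  /* hypothetical: scale the line color by 'fraction' */

/* Crude stand-in for the precomputed cone-filter table: the intensity
   fraction falls off with the magnitude of the distance D. */
static double cone_filter(double D)
{
    double f = 1.0 - fabs(D);
    return (f > 0.0) ? f : 0.0;
}

/* Extra Gupta-Sproull work after the midpoint test has chosen the pixel
   (x, y) with decision variable d; dx and dy are the end point differences,
   v is as in Fig. 9.12, and chose_E is 1 if the lower pixel E was chosen
   (Eqs 9.3-9.4) and 0 if NE was chosen (Eq. 9.5). */
void shade_chosen_pixel(int x, int y, double d, double dx, double dy,
                        double v, int chose_E)
{
    double denom  = 2.0 * sqrt(dx * dx + dy * dy);        /* constant for the whole line */
    double D      = (chose_E ? d + dx : d - dx) / denom;  /* Eq. 9.3 or Eq. 9.5(a)       */
    double Dupper = 2.0 * (1.0 - v) * dx / denom;         /* Eq. 9.4(a) / 9.5(b)         */
    double Dlower = 2.0 * (1.0 + v) * dx / denom;         /* Eq. 9.4(b) / 9.5(c)         */

    plot_intensity(x, y,     cone_filter(D));       /* chosen pixel             */
    plot_intensity(x, y + 1, cone_filter(Dupper));  /* upper vertical neighbour */
    plot_intensity(x, y - 1, cone_filter(Dlower));  /* lower vertical neighbour */
}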
Example 9.3
Let us understand the working of the Gupta–Sproull algorithm in terms of an example. Consider
the line segment shown in Fig. 9.13 between the two end points A(1,1) and B(4,3). Our objective
is to determine the following two things.
1. The pixels that should be colored to render the line.
2. The intensity values to be applied to the chosen pixels (and its vertical neighbours) to reduce
aliasing effect.
[Fig. 9.13: the actual line from A(1,1) to B(4,3) on a pixel grid, with the chosen pixels NE(2,2) and E′(3,2) marked]
Let us first determine the pixels to be chosen to render the line following the midpoint
algorithm (Algorithm 9.7). From the line end points, we can derive the line equation as 2x −
3y + 1 = 0 (see Appendix A for derivation of line equation from two end points). Thus, we have
a = 2, b = −3, and c = 1. Hence the initial decision value is: d = 1 (lines 3–4, Algorithm 9.7).
In the first iteration of the algorithm, we need to choose between the two pixels: the upper
candidate pixel NE (2,2) and the lower candidate pixel E (2,1) (see Fig. 9.11). Since d > 0, we
choose the NE pixel (2,2) and reset d = −1 (lines 8–10, Algorithm 9.7). In the next iteration,
the two possibilities are: the upper candidate pixel NE′ (3,3) and the lower candidate pixel E′
(3,2). Since now d < 0, we choose E′ (3,2) as the next pixel to render and reset d = 3 (lines
11–13, Algorithm 9.7). However, since now x = 3, the looping condition check fails (line 16,
Algorithm 9.7). The algorithm stops and returns the set of pixels {(1,1), (2,2), (3,2), (4,3)}. These
are the pixels to be rendered.
Next, we determine the intensity values for the chosen pixels and their two vertical neighbours
according to the Gupta–Sproull algorithm (Algorithm 9.8). We know that △x = 4 − 1 = 3 and
△y = 3 − 1 = 2. Let us start with the first intermediate pixel. Note that the first intermediate pixel
chosen is the upper candidate pixel NE (2,2). For this pixel, we have d = 1. Therefore, we compute
the perpendicular distance D from the line as,
DNE = −1/√13 (line 4, Algorithm 9.8)
Next, we have to compute the distances of the vertical neighbours of the chosen pixel, Dupper and Dlower. The line equation is 2x − 3y + 1 = 0. At the chosen pixel position, x = 2. Putting this value in the line equation, we get y = 5/3. Therefore, v = 5/3 − 1 = 2/3 (see Fig. 9.12). Hence,
Fig. 9.14 The idea of super sampling with a 2 × 2 sub-pixel grid for each pixel. The pixel
intensity is determined based on the number of sub-pixels through which the line passes. For
example, in the pixel (0,0), the line passes through 3 sub-pixels. However, in (1,0), only one
sub-pixel is part of the line. Thus, intensity of (0,0) will be more than (1,0)
Fig. 9.15 The idea of super sampling for lines with finite width
For a line with finite width, the pixel intensity can be set according to the fractions of its sub-pixels that fall inside and outside of the line. For example, consider Fig. 9.15 where each pixel is divided into a 2 × 2 sub-pixel grid.
Assume that the original line color is red (R = 1, G = 0, B = 0) and the background is light yellow (R = 0.5, G = 0.5, B = 0). Note that three sub-pixels (top right, bottom left, and bottom right) are inside the line in pixel (1,1). Therefore, the fraction of sub-pixels that are inside is 3/4 and the outside sub-pixel fraction is 1/4. The weighted averages for the individual intensity components (i.e., R, G, and B) for the pixel (1,1) therefore are,

AverageR = 1 × 3/4 + 0.5 × 1/4 = 7/8
AverageG = 0 × 3/4 + 0.5 × 1/4 = 1/8
AverageB = 0 × 3/4 + 0 × 1/4 = 0
Note that the intensity contribution of a sub-pixel is its corresponding mask value divided by 16 (the sum of all the values). For example, the contribution of the center sub-pixel is 4/16.
Now suppose a line passes through (or encloses) the sub-pixels top, center, bottom left, and
bottom of a pixel (x, y). Thus, if the line intensity is cl (rl , gl , bl ) and the background color
is cb (rb , gb , bb ), then the pixel intensity can be computed as: Intensity = (total contribution
of sub-pixels) × line color + (1 − total contribution of sub-pixels) × background color for
each of the R, G, and B color components as we have done before.
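The computation just described can be sketched in C as follows, assuming the 3 × 3 weighting mask implied above (centre value 4, all values summing to 16) and a coverage flag per sub-pixel supplied by the caller:

typedef struct { float r, g, b; } Color;

/* Weighted super sampling: each of the 3x3 sub-pixels contributes
   mask[i][j]/16 of the pixel intensity.  covered[i][j] is 1 if the line
   passes through (or encloses) that sub-pixel, and 0 otherwise. */
Color weighted_pixel_color(const int covered[3][3],
                           Color line_color, Color background)
{
    static const int mask[3][3] = { { 1, 2, 1 },
                                    { 2, 4, 2 },
                                    { 1, 2, 1 } };   /* values sum to 16 */
    float w = 0.0f;                                  /* total contribution of covered sub-pixels */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            if (covered[i][j])
                w += mask[i][j] / 16.0f;

    Color c;
    c.r = w * line_color.r + (1.0f - w) * background.r;
    c.g = w * line_color.g + (1.0f - w) * background.g;
    c.b = w * line_color.b + (1.0f - w) * background.b;
    return c;
}

With the sub-pixels mentioned in the text (top, centre, bottom left, and bottom) covered, the total contribution under this assumed mask would be (2 + 4 + 1 + 2)/16 = 9/16.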
SUMMARY
In this chapter, we learnt about the last stage of the 3D graphics pipeline, namely the rendering
of objects on the screen (also known as scan conversion or rasterization). We discussed ren-
dering for geometric primitives such as lines and circles. In line rendering, we started with the
simple and intuitive algorithm and saw its inefficiency in terms of the floating point operations
it requires. Only some, and not all, of these operations can be eliminated in the DDA algorithm, which thus offers only a limited improvement. Bresenham's algorithm is the most efficient as it renders a line using integer operations only. The midpoint circle rendering algorithm similarly increases the efficiency by performing mostly integer operations. However, unlike Bresenham's line drawing, some floating point operations are still required in midpoint circle drawing.
An issue in interactive graphics is to render a fill area (i.e. an enclosed region). We discussed
the two ways to define a fill area, namely the pixel-level definition and geometric definition.
Depending on the definition, we discussed various fill area rendering algorithms such as seed
fill, flood fill, and scanline polygon fill. The first two rely on pixel-level definitions while the third
algorithm assumes a geometric definition of fill area.
A frequent activity in computer graphics is to display characters. We discussed both the
bitmap and outlined character rendering techniques along with their pros and cons.
Finally, we discussed the problem of distortion in original shapes (known as aliasing) that arises during the rendering process, along with the various techniques (called anti-aliasing) to overcome it. Following an explanation of the origin of the term aliasing using signal processing concepts, we briefly discussed the two broad groups of anti-aliasing techniques: pre-filtering and post-filtering. Pre-filtering, or area sampling, was discussed including the Gupta–Sproull algorithm. We also learnt about the idea of various post-filtering or super
sampling techniques with illustrative examples.
BIBLIOGRAPHIC NOTE
Bresenham [1965] and Bresenham [1977] contain the original idea on the Bresenham’s
algorithm. More on the midpoint methods can be found in Kappel [1985]. Fill area scan con-
version techniques are discussed in Fishkin and Barsky [1984]. Crow [1981], Turkowski [1982],
Fujimoto and Iwata [1983], Korien and Badler [1983], Kirk and Arvo [1991], and Wu [1991]
can be referenced for further study on anti-aliasing techniques. The Graphics Gems book series (Glassner [1990], Arvo [1991], Kirk [1992], Heckbert [1994], and Paeth [1995]) contains
additional discussion on all these topics.
KEY TERMS
Aliasing – the distortions that may occur to an object due to scan conversion
Anti-aliasing – techniques to eliminate/reduce the aliasing effects
Bitmapped fonts – a character representation scheme in which each character is represented in
terms of on and off pixels in a pixel grid
Boundary defined – defining a fill area in terms of its boundary pixels
Bresenham’s algorithm – a more efficient line scan conversion algorithm that works based on
integer operations only
DDA algorithm – a line scan conversion algorithm
Decision parameter – a parameter used in the Bresenham’s algorithm
Eight-connected – an interior pixel is connected to eight of its neighbours
Flood fill algorithm – a fill area scan conversion algorithm that works with the interior defined
regions
Font/Typeface – overall design style of a character
Four-connected – an interior pixel is connected to four of its neighbours
Geometric definition – defining a fill area in terms of geometric primitives such as edges and
vertices
Gupta-Sproull algorithm – a pre-filtering anti-aliasing technique for lines
Interior defined – defining a fill area in terms of its interior pixels
Midpoint algorithm – an algorithm for circle scan conversion
Outlined font – a character representation scheme in which each character is represented in
terms of some geometric primitives such as points and lines
Pixel level definition – defining a fill area in terms of constituent pixels
Point (of font) – size of a character
Post-filtering/Super sampling – anti-aliasing techniques that work on the pixels to modify their
intensities
Pre-filtering/Area sampling – anti-aliasing techniques that work on the actual signal and derive
appropriate pixel intensities
Scan conversion/Rasterization/Rendering – the process of mapping points from continuous
device space to discrete pixel grid
Scan line polygon fill algorithm – a fill area scan conversion algorithm that works with geometric
definition of fill regions
Seed – an interior pixel inside a fill area
Seed fill algorithm – a fill area scan conversion algorithm that works with the boundary defined
regions
Sub pixel – a unit of (conceptual) division of a pixel for super sampling techniques
EXERCISES
9.1 Discuss the role played by the rendering techniques in the context of the 3D graphics
pipeline.
9.2 Derive the incremental approach of the Bresenham’s line drawing algorithm. Algorithm 9.2
works for lines with slope 0 ≤ m ≤ 1 or −1 ≤ m ≤ 0. Modify the algorithm for slopes that
are outside this range.
9.3 The midpoint line drawing algorithm is shown in Algorithm 9.7. Is there any difference
between Algorithms 9.2 and 9.7? Discuss with respect to a suitable example.
9.4 Derive the incremental computation on which the midpoint circle algorithm is based. Explain
the importance of the eight-way symmetry in circle drawing algorithms.
9.5 Algorithm 9.3 works for circles having the origin as center. Modify the algorithm so as to
make it work for circles with any arbitrary center.
9.6 Explain the different definitions of an enclosed region with illustrative examples. Calculate
the pixels for the scanlines y = 3, y = 4, y = 5, and y = 6 in the example mentioned in
Section 9.3.3.
9.7 Algorithm 9.6 works for convex polygons only. Modify the algorithm, by incorporating the
steps for the simple inside-outside test, so that the algorithm works for concave polygons
also.
9.8 Discuss the advantages and disadvantages of the bitmapped and the outlined font ren-
dering methods. Suppose we have a 20′′ × 10′′ display with resolution 720 × 360. What
would be the bitmap size (in pixels) to produce a 12-point font on this display?
9.9 Explain the term aliasing. Why is it called so? How is the concept of filtering related to
anti-aliasing?
9.10 Discuss the basic idea of area sampling with an illustrative example.
9.11 Derive the expressions of Eqs. 9.3, 9.4, and 9.5 using analytical geometry. Modify
Algorithm 9.7 to include the Gupta–Sproull anti-aliasing algorithm.
9.12 Explain the basic idea of super sampling. Discuss, with illustrative examples other than the
ones mentioned in the text, the three super sampling techniques we learnt, namely (a) super
sampling for lines without width, (b) super sampling for lines with finite width, and (c) super
sampling with weighting masks. Do you think the use of masks offers any advantage over
the non-mask-based methods? Discuss.
CHAPTER 10
Graphics Hardware and Software
Learning Objectives
After going through this chapter, the students will be able to
• Review the generic architecture of a graphics system
• Get an overview of the input and output devices of a graphics system
• Understand the basics of the flat panel displays including the plasma panels, thin-
film electroluminescent displays, light-emitting diode (LED) displays, and liquid crystal
displays (LCDs)
• Get an overview of the common hardcopy output devices: printers and plotters
• Know about the widely used input devices including keyboards, mouse, trackballs, spaceballs, joysticks, data gloves, and touch screen devices
• Learn the fundamentals of the graphics processing unit (GPU)
• Get an overview of shaders and shader programming
• Know about graphics software and software standards
• Learn the basics of OpenGL, a widely used open source graphics library
INTRODUCTION
We are now in a position to understand the fundamental process involved in depicting an
image on a computer screen. In very simple terms, the process is as follows: we start with
the abstract representation of the objects in the image, using points (vertices), lines (edges),
and other such geometric primitives in the 3D world coordinate; the pipeline stages are
then applied to convert the abstract representation to a sequence of bits (i.e., a sequence of
0’s and 1’s); the sequence is stored in the frame buffer and used by the video controller to
activate appropriate pixels on the screen, so that we perceive the image. So far, we have
discussed only the theoretical aspects of this process, that is, how it works conceptually, without elaborating on the implementation issues. Although we touched upon the topic of displaying images on a CRT screen in Chapter 1, it was very brief. In this chapter, we shall learn in more detail the implementation aspects of the fundamental process. More specifically, we shall learn about the overall architecture of a graphics system, the technology of a few important display devices, introductory concepts on the graphics processing unit (GPU), and how the
rendering process (the 3D pipeline) is actually implemented on the hardware. The chapter
will also introduce the basics of OpenGL, an open source graphics library widely used to
write computer graphics programs.
Fig. 10.1 Different ways to integrate memory in the generic graphics system (a) No sepa-
rate graphics memory is present and the system memory is used in shared mode by both
the CPU and the graphics controller (b) Controller has its own dedicated memory
(as specialized instructions executed in the GPUs) to convert them to the sequence of bits.
The bit sequence gets stored in the frame buffer. In the case of interactive systems, the frame
buffer content may be changed depending on the input coming from the input devices such
as a mouse.
The video controller acts based on the frame buffer content (the bit sequence). The job of
the video controller is to map the value represented by the bit(s) in each frame buffer location
to the activation of the corresponding pixel on the display screen. For example, in the case
of CRT devices, such activation refers to the excitation (by an appropriate amount) of the
corresponding phosphor dots on the screen. Note that the amount of excitation is determined
by the intensity of the electron beam, which in turn is determined by the voltage applied on
the electron gun, which in turn is determined by the frame buffer value.
The frame buffer is only a part of the video memory required to perform graphics oper-
ations. Along with the frame buffer, we also require memory to store object definitions and
instructions for graphics operations (i.e., to store code and data as in any other program).
The memory can be integrated in the generic architecture either as shared system mem-
ory (shared by CPU and GPU) or dedicated graphics memory (part of graphics controller
organization). The two possibilities are illustrated in Fig. 10.1. Note that when the memory
is shared, the execution will be slower since the data transmission takes place through the
common system bus.
[Fig. 10.2: schematic of a plasma panel, with horizontal conductors on the inside of one glass plate and vertical conductors on the inside of the other]
In non-emissive devices, on the other hand, no electrical-to-optical energy conversion takes place. Instead, such devices convert light (either natural light or light from some other sources) to a graphics pattern on the screen through some optical effects. A widely used non-emissive display is the liquid crystal display (LCD).
Plasma Panels
The schematic of a plasma panel is shown in Fig. 10.2. As the figure illustrates, there are two glass plates placed parallel to each other. The region between the plates is filled with a mixture of gases (xenon, neon, and helium). The inner wall of each glass plate contains a set of parallel conductors (very thin and shaped like ribbons). One plate has a set of vertical conductors while the other contains a set of horizontal conductors. The region between each corresponding pair of conductors on the two plates (e.g., two consecutive horizontal and opposite vertical conductors) forms a pixel. The screen-side wall of the pixel is coated with phosphors as in a CRT (three phosphors corresponding to RGB for color displays). With the application of an appropriate firing voltage, the gas in the pixel cell breaks down into electrons and ions. The ions rush towards the electrodes and collide with the phosphor coating, emitting light. Separation between pixels is achieved by the electric fields of the conductors.
Thin-film Electroluminescent and LED Displays
Thin-film electroluminescent displays are similar in construction to plasma panels; they also
have mutually perpendicular sets of conducting ribbons on the inside walls of two parallel
glass plates. However, instead of gases, the region between the glass plates is filled with a
phosphor, such as zinc sulphide doped with manganese. The phosphor becomes a conductor
at the point of intersection when a sufficiently high voltage is applied to a pair of crossing electrodes. The manganese atoms absorb the electrical energy and release photons, generating the perception of a glowing spot or pixel on the screen. Such displays, however, require more power than plasma panels. Also, good color displays are difficult to achieve with this technology.
Light emitting diode or LED displays are another type of emissive devices, which are
becoming popular nowadays. In such devices, each pixel position is represented by an LED.
Thus, the whole display is a grid of LEDs corresponding to the pixel grid. Based on the frame
buffer information, suitable voltage is applied to each diode to make it emit appropriate
amount of light.
Fig. 10.3 The working of a transmissive LCD. A light source at the back sends light through
the polarizer. The polarized light gets twisted and passes through the opposite polarizer to
the viewer in the active pixel state, shown in (a). (b) shows the molecular arrangement of
the liquid crystal at a pixel position, after voltage is applied to the perpendicular pair of con-
ductors on the opposite glass plates. The arrangement prevents light from passing between
polarizers, indicating deactivation of the pixel.
their long axes. The intersection points of each pair of mutually perpendicular conductors
define pixel positions. When a pixel position is active, the molecules are aligned as shown
in Fig. 10.3(a). In a reflective display, external light enters through one polarizer and gets
polarized. The molecular arrangement of the liquid crystal ensures that the polarized light
gets twisted so that it can pass through the opposite polarizer. Behind the polarizer, a reflec-
tive surface is present and the light is reflected back to the viewer. In a transmissive display,
a light source is present on the back side of the screen. Light from the source gets polar-
ized after passing through the back side polarizer, twisted by the liquid crystal molecules
and passes through the screen-side polarizer to the viewer. In order to de-activate the pixel,
a voltage is applied to the intersecting pair of conductors. This leads to the molecules in
the pixel region (between the conductors) getting arranged as shown in Fig. 10.3(b). This
new arrangement prevents the polarized light from getting twisted and passing through the opposite polarizer. The technology described here, both reflective and transmissive, is known as passive-matrix LCD. Another method for constructing LCDs is to place thin-film transistors at each pixel location to have more control over the voltage at those locations. The transistors also help prevent the charges from gradually leaking out of the liquid crystal cells. Such types of LCDs are called active-matrix LCDs.
[Fig. 10.4: a pen plotter, showing the moving arm, the moving pen carriage, the pen, and spare pens in different colours]
In order to get colored printouts, impact printers use different-colored ribbons. However, the range of colors produced and the quality are usually limited. Non-impact printers, on
the other hand, are good at producing color images. In such devices, color is produced by
combining the three color pigments (cyan, magenta, and yellow) (see the discussion on
the CMY model in Chapter 5). Laser and electrostatic devices deposit the three pigments
on separate passes; the three colors are shot together on a single pass along each line in
ink-jet printers.
Plotters are another class of hardcopy graphics output devices, which are typically used to
generate drafting layouts and other drawings. In a pen plotter (see Fig. 10.4), one/more pens
are mounted on a carriage, or crossbar, that spans a sheet of paper. The paper can lie flat or be rolled onto a drum or belt, and is held in place with clamps, a vacuum, or an electrostatic charge. In order to generate different shading and line styles, pens with varying colors and widths are used. Pen-holding crossbars can either move or remain stationary. In the latter case, the pens themselves move back and forth along the bar. Instead of pens, ink-jet technology is also used to design plotters.
Mouse A mouse is primarily used to position the cursor on the screen, select on-screen
items, and perform a host of menu operations. Wheels or rollers at the bottom of the mouse
record the amount and direction of mouse movement which is converted to screen-cursor
movement. Sometimes, instead of a roller, optical sensing techniques are used to deter-
mine the amount and direction of mouse movement. There are between one and three buttons
present on a mouse, although the two-button mouse is more common. Along with the but-
tons, it may be equipped with a wheel to perform positioning operations more conveniently.
A mouse may be attached to the main computer with wires or it may be a wireless mouse.
Trackballs and spaceballs Similar to a mouse, trackball devices (Fig. 10.5) are used to
position screen-cursors. A trackball device contains a ball. When the ball is rotated by a fin-
ger/palm/hand, a screen-cursor movement takes place. A potentiometer is connected to the
ball and measures the amount and direction of the ball rotation, which is then mapped to the
screen-cursor movement. While the term trackball typically denotes devices used to con-
trol cursor in a 2D-space (screen), cursor-control in 3D-space is done through spaceballs. A
spaceball provides six degrees of freedom. However, in a spaceball, there are no actual ball
movements. Instead, the ball is pushed and pulled in various directions, which is mapped to
cursor positioning in 3D space.
Joysticks Another positioning device is the joystick (Fig. 10.6). It contains a small, ver-
tical lever (called stick) attached to a base. The stick can be moved in various directions to
move the screen-cursor. The amount and direction of cursor movement is determined by the
amount and direction of stick movement from its center position (measured with a poten-
tiometer mounted at the base). In isometric joysticks, however, no movable sticks are present.
Instead, the sticks are pushed or pulled to move on-screen cursor.
Data gloves Commonly used in virtual reality systems, data gloves are devices that allow
a user to position and manipulate virtual objects in a more natural way—through hand or
finger gestures. It is a glove-like device, containing sensors. These sensors can detect the
finger and hand movements of the glove-wearer and map the movement to actions such as
grasping a virtual object, moving a virtual object, rotating a virtual object, and so on. The
position and orientation information of the finger or hand are obtained from electromagnetic
coupling of transmitter and receiver antennas. Each of the transmitting and receiving antennas is constructed as a set of mutually perpendicular coils, thereby creating a 3D Cartesian
reference frame.
Touch screens Touch input systems are the preferred mode of input in most consumer-
grade graphics systems nowadays. As the name suggests, on-screen elements are selected
and manipulated through the touch of fingers or stylus (a special pen-like device) in touch
screen systems. Touch input can be recorded using electrical, optical, or acoustic methods. In optical touch screens, an array of infrared LEDs is placed along one vertical and one horizontal edge. Light detectors are placed along the opposite horizontal and vertical edges. If a position on the screen is touched, the light beams coming from the vertical and horizontal LEDs are interrupted, and the interruptions recorded by the detectors give the touch location. In an electrical touch screen, there are two transparent plates separated by a small distance. One plate is coated with conducting material while the other is coated with resistive material. When the outer plate is touched, it comes into contact with the inner plate. This creates a voltage drop across the resistive plate, which is converted to the coordinate value. Less common are the acoustic touch screen devices, in which high-frequency sound waves are generated in horizontal and
vertical directions across a glass plate. Touching the screen results in reflecting part of the
waves (from vertical and horizontal directions) back to the emitters. The touch position is
computed from the time interval between transmission of each wave and its reflection back
to the emitter.
Apart from these, many more techniques are used to input data to a graphics system.
These include image scanners (to store drawings or photographs in a computer), digitiz-
ers (used primarily for drawing, painting, or selecting positions), light pens (pencil shaped
device used primarily for selecting screen positions), and voice-based input system (using
speech-recognition technology).
matrix-vector pair at a time, if we can apply the operation on all vectors at the same time,
there will be significant gain in performance. The gain becomes critical in real-time render-
ing, where millions of vertices need to be processed per second. CPUs, owing to their design,
cannot take advantage of this inherent parallelism in graphics operations. As a result, almost
all graphics systems nowadays come with a separate graphics card containing its own pro-
cessing unit and memory elements. The processing unit is known as the graphics processing
unit or GPU.
Instruction
Data
(a)
A0 + B0 = C0
A0 B0 C0
+ =
A1 + B1 = C1 A1 B1 C1
(b) (c)
Fig. 10.7 Single instruction multiple data (SIMD) (a) The idea (b) Serial additions performed
on inputs (c) Same output obtained with a single addition applied on data streams
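Written out in C, the operation in Fig. 10.7 is simply an element-wise addition over whole streams; on SIMD hardware a single add instruction is applied to many of these elements at once, whereas the scalar loop below (only a model of that data-parallel operation) processes them one at a time:

/* Element-wise addition C[i] = A[i] + B[i] over entire data streams.
   A SIMD unit applies one add instruction to many elements simultaneously;
   this scalar loop is only a serial model of that operation. */
void add_streams(const float *A, const float *B, float *C, int n)
{
    for (int i = 0; i < n; i++)
        C[i] = A[i] + B[i];
}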
Fig. 10.8 An illustrative GPU organization. Each core is a very simple processing unit
capable of performing simple floating point and integer arithmetic operations only.
Surfaces that are not expressed in terms of triangles, such as quadrilaterals or curved surface
patches (see Chapter 2), are converted to triangular meshes. Through the APIs supported in
a computer graphics library, such as OpenGL or Direct3D, the triangles are sent to the GPU
one vertex at a time. The GPU assembles vertices into triangles as needed.
The vertices are expressed with homogeneous coordinates (see Chapter 3). The objects
they define are represented in local or modeling coordinate system. After the vertices have
been sent to the GPU, it performs modeling transformations on these vertices. The transfor-
mation (single or composite), as you may recall, is achieved with a single matrix-vector
multiplication: the matrix represents the transformation while the vector represents the
vertex. The multicore GPU architecture can be used to perform multiple such operations
simultaneously. In other words, multiple vertices can be simultaneously transformed. The
output of this stage is a stream of triangles, all represented in a common (world) coordinate
system in which the viewer is located at the origin and the direction of view is aligned with
the z-axis.
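A plain C sketch of this stage is shown below: every homogeneous vertex is multiplied by the same 4 × 4 transformation matrix, and it is exactly this repeated, independent matrix-vector product that the GPU cores carry out in parallel (the Vertex type and the in-place update are assumptions of the sketch).

typedef struct { float x, y, z, w; } Vertex;

/* Apply the same 4x4 transformation matrix M to an array of homogeneous
   vertices.  Each iteration is independent of the others, which is what
   allows a GPU to transform many vertices simultaneously. */
void transform_vertices(const float M[4][4], Vertex *v, int count)
{
    for (int i = 0; i < count; i++) {
        Vertex p = v[i];
        v[i].x = M[0][0]*p.x + M[0][1]*p.y + M[0][2]*p.z + M[0][3]*p.w;
        v[i].y = M[1][0]*p.x + M[1][1]*p.y + M[1][2]*p.z + M[1][3]*p.w;
        v[i].z = M[2][0]*p.x + M[2][1]*p.y + M[2][2]*p.z + M[2][3]*p.w;
        v[i].w = M[3][0]*p.x + M[3][1]*p.y + M[3][2]*p.z + M[3][3]*p.w;
    }
}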
In the third stage, the GPU computes the color of each vertex based on the light defined
for the scene. Recall the structure of the simple lighting equation we discussed in Chapter 4.
The color of any vertex can be computed by evaluating vector dot products and a series of
add and multiply operations. In a GPU, we can perform these operations simultaneously for
multiple vertices.
In the next stage, each colored 3D vertex is projected onto the view plane. Similar to the
modeling transformations, the GPU does this using matrix-vector multiplication (see Chap-
ter 6 for the maths involved), again leveraging efficient vector operations in hardware. The
output after this stage is a stream of triangles in screen or device coordinates, ready to be
converted to pixels.
Each device space triangle, obtained in the previous stage, overlaps some pixels on the
screen. In the rasterization stage, these pixels are determined. GPU designers over the years
have incorporated many rasterization algorithms, such as those we discussed in Chapter 9.
All these algorithms exploit one crucial observation: each pixel can be treated indepen-
dently from all other pixels. This leads to the possibility of handling all pixels in parallel.
Thus, given the device space triangles, we can determine the color of the pixels for all pixels
simultaneously.
During the pixel processing stage, two more activities take place—surface texturing and
hidden surface removal. In the simplest surface texturing method, texture images are draped
over the geometry to give the illusion of detail (see Chapter 5). In other words, the pixel color
is replaced or modified by the texture color. GPUs store the textures in high-speed memory,
which each pixel calculation must access. Since this access is very regular in nature (nearby
pixels tend to access nearby texture image locations), specialized memory caches are used
to reduce memory access time. For hidden surface removal, GPUs implement the depth(Z)-
buffer algorithm. All modern-day GPUs contain a depth buffer as a dedicated region of their memory, which stores the distance of the viewer from each pixel. Before writing to the
display, the GPU compares a pixel’s distance with the distance of the pixel that is already
present. The display memory is updated only if the new pixel is closer (see the depth-buffer
algorithm in Chapter 8 for more details).
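The per-pixel comparison itself is tiny, as the C sketch below shows; the buffer layout and the convention that a smaller depth value means closer to the viewer are assumptions of the sketch.

typedef struct { float r, g, b; } Pixel;

/* Depth-buffer test for one incoming fragment at pixel (x, y): the display
   memory is updated only if the fragment is closer than what is stored. */
void depth_test_write(float *depth, Pixel *color, int width, int x, int y,
                      float frag_depth, Pixel frag_color)
{
    int i = y * width + x;
    if (frag_depth < depth[i]) {   /* new fragment is closer to the viewer */
        depth[i] = frag_depth;     /* remember the new closest depth       */
        color[i] = frag_color;     /* update the display memory            */
    }
}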
Fig. 10.9 Schematic of the fixed-function GPU pipeline stages—the user has no control over how it should work or over which processing unit performs which stage of the pipeline
Fig. 10.10 The idea of programmable GPU. The GPU elements (processing units and
memory) can be reused through user programs.
With a programmable GPU, it is possible for the programmers to modify how the hardware
processes the vertices and shades pixels. They can do so by writing vertex shaders and frag-
ment shaders (also known as vertex programs and fragment programs). This is known as
shader programming (also known by many other names that include GPU programming and
graphics hardware programming). As the names suggest, vertex shaders are used to pro-
cess vertices (i.e., geometry)—modeling transformations, lighting, and projection to screen
coordinates. Fragment shaders are programs that perform the computations in the pixel pro-
cessing stage and determine how each pixel is shaded (rendering), how texture is applied
(texture mapping), and if a pixel should be drawn or not (hidden surface removal). The term
fragment shader is used to denote the fact that a GPU at any instant can process a subset
(or fragment) of all the screen pixel positions. These shader programs are small pieces of
codes that are sent to the graphics hardware from the user programs, but they are executed
on the graphics hardware. The ability to program GPUs gave rise to the idea of a general
purpose GPU or GPGPU; we can use the GPU to perform tasks that are not related to
graphics at all.
designed to perform various tasks that are part of the graphics pipeline such as object defini-
tion, modeling transformation, color assignment, projection, and display. Examples of such
libraries include OpenGL (Open Graphics Library), VRML (Virtual Reality Mod-
eling Language), and Java 3D. The functions in a graphics library are also known as the
computer graphics application programming interface (CG API) since the library provides
a software interface between a programming language and the hardware. So when we write
an application program in C, the graphics library functions allow us to construct and display
a picture on an output device.
Graphics functions in any package are typically defined independent of any program-
ming language. A language binding is then defined for a particular high-level programming
language. This binding gives the syntax for accessing various graphics functions from that
language. Each language binding is designed to make the best use of the capabilities of
the particular language and to handle various syntax issues such as data types, parameter
passing, and errors. The specifications for language bindings are set by the International
Standards Organization. In the following, we learn the basic idea of a graphics library with
an introduction to OpenGL, a widely-used open source graphics library, with its C/C++
binding.
GLUT Library
The first thing we do is to include the header file containing the graphics library functions.
Thus, the very first line in our program is
#include<GL/glut.h>
#include<GL/glut.h>
void init (void){
    glClearColor (1.0, 1.0, 1.0, 0.0);     /* white display-window background  */
    glMatrixMode (GL_PROJECTION);          /* the projection parameters follow */
    gluOrtho2D (0.0, 800.0, 0.0, 600.0);   /* 2D world-coordinate extent       */
}
void createLine (void){
    glClear (GL_COLOR_BUFFER_BIT);    /* clear the display window            */
    glColor3f (0.0, 1.0, 0.0);        /* set the drawing color (green)       */
    glBegin (GL_LINES);
    glVertex2i (200, 100);            /* first end point of the line         */
    glVertex2i (20, 50);              /* second end point of the line        */
    glEnd ();
    glFlush ();                       /* force execution of the OpenGL calls */
}
int main (int argc, char** argv){
    glutInit (&argc, argv);                          /* initialize GLUT                       */
    glutInitDisplayMode (GLUT_SINGLE | GLUT_RGB);    /* single refresh buffer, RGB color mode */
    glutInitWindowPosition (0, 0);                   /* top-left corner of the display window */
    glutInitWindowSize (800, 600);                   /* window width and height in pixels     */
    glutCreateWindow ("The OpenGL example");         /* create the window with a caption      */
    init ();                                         /* perform the initializations           */
    glutDisplayFunc (createLine);                    /* picture to be shown in the window     */
    glutMainLoop ();                                 /* activate the window and its content   */
    return 0;
}
The OpenGL core library does not provide support for input and output, as the library functions are designed to be device-independent. However, we have to show the line on the display screen. Thus, auxiliary libraries are required for output, on top of the core library. This support is provided by GLUT, the OpenGL Utility Toolkit. GLUT provides a library of functions for interacting with any screen-windowing system. In other words, the functions in the GLUT library allow us to set up a display window on our video screen (a rectangular area of the screen showing the picture, in this case the line). The library functions are prefixed with glut. Since GLUT functions provide an interface to other device-specific window systems, we can use GLUT to write device-independent programs. Note that GLUT is suitable for graphics operations only. We may need to include other C/C++ header files such as <stdio.h> or <stdlib.h> along with GLUT.
For example, the following line of code from the example program specifies that a single refresh buffer is to be used for the display window and that the RGB color mode is to be used for selecting color values. Note the syntax used for symbolic GLUT constants: each constant name is written in capital letters and carries the prefix GLUT followed by an underscore ('_'). The two constants are combined using a logical OR operation.
glutInitDisplayMode (GLUT_SINGLE | GLUT_RGB);
Although GLUT provides a default position and size for the display window, we can change those. The following two lines in the example are used for this purpose. As the name suggests, the glutInitWindowPosition function allows us to specify the window location. This is done by specifying the top-left corner position of the window (supplied as arguments to the function). The position is specified in integer screen coordinates (the X and Y pixel coordinates, in that order), assuming that the origin is at the top-left corner of the screen. The glutInitWindowSize function is used to set the window size. The first argument specifies the width of the window; the window height is specified with the second argument. Both are specified in pixels.
glutInitWindowPosition (0, 0);
glutInitWindowSize (800, 600);
Next, we create the window and set a caption (optional) with the following function. The argument of the function, that is, the string within the quotation marks, is the caption.
glutCreateWindow ("The OpenGL example");
Once the window is created, we need to specify the picture to be displayed in the window. In our example, the picture is simply the line. We create this picture in a separate function, createLine, which contains OpenGL functions. The createLine function is passed as an argument to glutDisplayFunc, indicating that the line is to be displayed in the window. However, before the picture is generated, certain initializations are required. We perform these initializations in the init function (to keep our code clean). Hence, the following sequence of lines is added to our main program.
init ();
glutDisplayFunc (createLine);
The display window, however, is not yet on the screen. We need to activate it once the window content is decided. Thus, we add the following statement, which activates all the display windows we have created along with their graphic contents.
glutMainLoop ();
This function must be the last one in our program. Along with displaying the initial graphics, it puts the program into an infinite loop. In this loop, the program waits for inputs from devices such as the mouse or keyboard. Even if no input is available (as in our example), the loop ensures that the picture is displayed until we close the window.
glMatrixMode (GL_PROJECTION);
gluOrtho2D (0.0, 800.0, 0.0, 600.0);
Note that although the first function is an OpenGL routine (prefixed with gl), the second function is prefixed with glu. This indicates that the second function is not part of the core OpenGL library. Instead, it belongs to GLU, the OpenGL Utility library, an auxiliary library that provides routines for complex tasks such as setting up viewing and projection matrices, describing complex objects with line and polygon approximations, processing surface-rendering operations, and displaying splines with linear approximations. Together, the two functions specify that an orthogonal projection is to be used to map the line from the view plane to the screen. The view-plane window is specified in terms of its lower-left (0.0, 0.0) and top-right (800.0, 600.0) corners. Anything outside this boundary will be clipped out.
The two line end points (vertices) are specified using the OpenGL function glVertex2i. The suffix 2i indicates that each vertex is specified by two integer (i) values denoting its X and Y coordinates. The first and second end points are determined by their ordering in the code. Thus, in the example, the vertex (200, 100) is the first end point while the vertex (20, 50) is the second. The function glBegin with its symbolic OpenGL constant GL_LINES, along with the function glEnd, indicates that the vertices are line end points.
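The same structure can be used to draw other primitives by changing the symbolic constant and the vertex list. As an illustration only (the vertex values below are not from the example program), the body of the display function could be rewritten to draw a filled triangle instead of a line:

glClear (GL_COLOR_BUFFER_BIT);
glColor3f (0.0, 1.0, 0.0);
glBegin (GL_TRIANGLES);      /* every three consecutive vertices form one triangle */
    glVertex2i (100, 100);
    glVertex2i (400, 100);
    glVertex2i (250, 300);
glEnd ();
glFlush ();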
With all these functions, the basic line creation program is ready. However, the functions we used may be stored at different locations in the computer memory, depending on the implementation of OpenGL. We need to force the system to process all of them. We do this with the following OpenGL function, which should be the last line of our picture generation procedure.
glFlush ();
SUMMARY
In this chapter, we learnt about the underlying hardware in a graphics system. There are three
components of the hardware—the input devices, the output devices, and the display controller.
The most common graphics output devices are video monitors. Various technologies are used to design monitors. The earliest of these is the CRT, which we have already discussed in Chapter 1. In this chapter, we learnt about flat panel displays. Broadly, they are of two types. In emissive displays, electrical energy is converted to light energy, similar to a CRT. Examples include plasma panels, LEDs, and thin-film electroluminescent displays. In non-emissive displays, external light energy is used to draw pictures on the screen. The most popular example of such displays is the LCD. Hardcopy devices are another mode of producing graphics output. Such devices are of two types: printers are used to produce any image, including alphanumeric characters, on paper, while plotters are used for specific drawing purposes. Input devices are
mainly used for interactive graphics. Many such devices exist. Most common are the mouse
and keyboards. Other input devices include joystick, trackball, data gloves, and touch screens.
The display controller or the graphics card contains a special-purpose processor and a mem-
ory tailor-made for graphics. The processor is called the graphics processing unit or GPU. It
consists of a large number of simple processing units or cores, organized in the form of stream-
ing multiprocessors. The organization allows a GPU to perform parallel processing in SIMD
(single instruction, multiple data) mode. Modern-day GPUs allow general-purpose programming of their elements. This is done through shader programming. Vertex shaders are programs that allow us to process vertices, while fragment shaders allow us to process pixels the way we want.
We were also introduced to graphics software. We learnt about the role played by graphics libraries in the development of a graphics program, and learnt the basics of a popular graphics library, OpenGL, through an example line-drawing program.
BIBLIOGRAPHIC NOTE
Sherr [1993] contains more discussions on electronic displays. Tannas [1985] can be used for
further reading on flat-panel displays and CRTs. Raster graphics architecture is explained in
Foley et al. [1995]. Grotch [1983] presents the idea behind the 3D and stereoscopic displays.
Chung et al. [1989] contains work on head-mounted displays and virtual reality environments.
More on GPU along with examples on programming the vertex and fragment processors can
be found in the GPU Gems series of books (Fernando [2004], Pharr and Fernando [2005]).
For additional details on writing programs using a shading language, refer to the OpenGL™
Shading Language (Rost [2004]). The website www.gpgpu.org is a good source for more infor-
mation on GPGPU. A good starting point for learning OpenGL is the OpenGL Programming
Guide (Shreiner et al. [2004]).
KEY TERMS
Computer graphics application programming interface (CG API) – a set of library functions
to perform various graphics operations
Data glove – an input device, typically used with virtual reality systems, for positioning and
manipulation
Dot-matrix printer – a type of impact printer
Emissive display – a type of display that works based on the conversion of electric energy into
light on screen
Fixed-function hardware pipeline – all the pipeline stages are pre-programmed and embedded
into the hardware
Flat panel – a class of graphics display units
Fragment shaders (programs) – hardware programs to assign colors to pixels
GPU – the graphics processing unit, which is typically employed for graphics-related operations
Graphical Kernel System (GKS) – an early standard for graphics software
Impact printer – a printer that works by pressing character faces against inked ribbons on a paper
Joystick – an input device for positioning
Keyboard – an input device for characters
LCD – a type of flat panel non-emissive display
LED display – a type of flat panel emissive display
Mouse – an input device for pointing and selecting
Multicore – multiple processing units connected together
Non-emissive display – a type of display that works based on the conversion of light energy to
some onscreen graphical patterns
Non-impact printer – a printing device that uses non-impact methods such as lasers, ink sprays,
electrostatic, or electrothermal methods for printing
OpenGL – an open graphics library that has become the de facto standard for graphics software
PHIGS (Programmer’s Hierarchical Interactive Graphics Standard) – a standard for graphics
software
Plasma panel – a type of flat panel emissive display
Plotter – a type of hardcopy output device
Printer – a type of hardcopy output device.
Programmable GPU – a GPU where pipeline stages are not fixed and can be controlled
programmatically
Shader – a grid of GPU processors that performs specific stages of the graphics pipeline
Shader programming/GPU programming/Graphics hardware programming – the writing of programs to control the shaders
Spaceball – an input device for positioning
Stream processor – a processor that works on data streams
Streaming multiprocessor – a group of stream processors
Thin-film electroluminescent display – a type of flat panel emissive display
Touch screen – a gestural input device
Trackball – an input device for positioning
Vertex shaders (programs) – hardware programs to process vertices
EXERCISES
10.1 What are the major components of a graphics system?
10.2 Discuss the difference between emissive and non-emissive displays.
10.3 Explain, with illustrative diagrams, the working of plasma, LED, and thin-film electroluminescent displays.
10.4 How do LCDs work? Explain with schematic diagrams.
10.5 Mention any five input devices.
10.6 Why is a GPU better suited for graphics operations than a CPU? Discuss with a suitable illustration.
10.7 Explain the implementation of 3D graphics pipeline on GPU.
10.8 Explain the concept of shader programming. Why is it useful?
10.9 Why do we need graphics standards? Mention any two standards used in computer
graphics.
10.10 In Fig. 10.11, the code for drawing a line segment on the screen is shown. Assume that
we have a square surface with four vertices. The line displayed on the screen is a specific
orthogonal view of the surface (may be top view). Modify the code to define the surface
and perform the specific projection. Modify the code further so that the Gouraud shading
is used to color the surface.
APPENDIX A
Mathematical Background
Various mathematical concepts are involved in understanding the theories and principles
of computer graphics. They include the idea of vectors and vector algebra, matrices and
matrix algebra, tensors, complex numbers and quaternions, parametric and non-parametric
representations and manipulations of curves, differential calculus, numerical methods, and
so on. In order to explain the fundamental concepts of graphics in this book, we used some
of those. The mathematics used in this book mostly involved vectors and matrices and how
those are manipulated. In addition, we also used concepts such as reference frames and line
equations for calculating intersection points between two lines. The backgrounds for these
mathematical concepts are discussed in this appendix.
Fig. A.1 Standard and inverted 2D Cartesian reference frames used in computer graphics (a) Points are represented with respect to the origin at the lower-left corner (b) Points are represented with respect to the origin at the upper-left corner
In a right-handed frame, we assume that we grasp the Z axis with our right hand such that the thumb points towards the positive Z direction. The fingers then curl from the positive X direction to the positive Y direction (through 90°).
When a view of a 3D scene is displayed on a 2D screen, the 3D point for each 2D screen
position is sometimes represented with the left-handed reference frame shown in Fig. A.3.
Unlike the right-handed system, here we assume to grasp the Z axis with our left hand. Other
things remain the same. That is, the left-hand thumb points towards the positive Z direc-
tion and the left-hand fingers curl from the positive X direction to the positive Y direction.
Fig. A.3 The left-handed 3D Cartesian reference frame (+X, +Y, and +Z axes)
The XY plane represents the screen. Positive Z values indicate positions behind the screen.
Thus, the larger positive Z values indicate points further from the viewer.
A 2D vector $\vec{V}$ can be defined as the difference of two points $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ (see Fig. A.4):
$$\vec{V} = P_2 - P_1 = (x_2 - x_1, y_2 - y_1) = (V_x, V_y)$$
The quantities $V_x$ and $V_y$ are the projections of the vector $\vec{V}$ onto the X and Y axes, respectively. They are called the Cartesian components (or Cartesian elements). The magnitude of the vector, denoted by $|\vec{V}|$, is computed in terms of these two components as
$$|\vec{V}| = \sqrt{V_x^2 + V_y^2}$$
The direction can be specified in terms of the angular displacement $\alpha$ from the horizontal as
$$\alpha = \tan^{-1}\left(\frac{V_y}{V_x}\right)$$
The idea of a 3D vector is similar. Suppose we have two points $P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$. We now have three Cartesian components instead of two: $V_x = (x_2 - x_1)$, $V_y = (y_2 - y_1)$, and $V_z = (z_2 - z_1)$ for the X, Y, and Z axes, respectively. Then, the magnitude of the vector can be computed as
$$|\vec{V}| = \sqrt{V_x^2 + V_y^2 + V_z^2}$$
The vector direction can be given in terms of the direction angles, that is, the angles α, β,
and γ the vector makes with each of the three axes (see Fig. A.5). More precisely, direction
Fig. A.4 A 2D vector $\vec{V}$ defined in a Cartesian frame as the difference of two points $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$
Fig. A.5 The direction angles $\alpha$, $\beta$, and $\gamma$ that a vector makes with the +X, +Y, and +Z axes
angles are the positive angles the vector makes with each of the positive coordinate axes.
The three direction angles can be computed as
$$\cos\alpha = \frac{V_x}{|V|}, \quad \cos\beta = \frac{V_y}{|V|}, \quad \cos\gamma = \frac{V_z}{|V|}$$
The values $\cos\alpha$, $\cos\beta$, and $\cos\gamma$ are known as the direction cosines of the vector. In fact, we need to specify only two of the three cosines to find the direction of the vector. The third cosine can be determined from the other two since
$$\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1$$
In many situations, we deal with unit vectors. A unit vector is a vector with magnitude 1. While we usually denote a vector with an arrow on top, such as $\vec{V}$, a unit vector is denoted by putting a hat on top of the vector symbol, such as $\hat{V}$. However, the most common notation for a unit vector is $\hat{u}$. Calculating the unit vector along the direction of a given vector is easy. Suppose $\vec{V} = (V_x, V_y, V_z)$ is the given vector. Then the unit vector $\hat{V}$ along the direction of $\vec{V}$ is given by Eq. (A.2):
$$\hat{V} = \left(\frac{V_x}{|V|}, \frac{V_y}{|V|}, \frac{V_z}{|V|}\right) \qquad (A.2)$$
where $|V| = \sqrt{V_x^2 + V_y^2 + V_z^2}$ is the magnitude of the vector.
Fig. A.6 Illustration of vector addition (a) Original vectors (b) $\vec{V}_2$ repositioned to start where $\vec{V}_1$ ends
Two vectors are added component-wise: $\vec{V}_1 + \vec{V}_2 = (V_{1x} + V_{2x}, V_{1y} + V_{2y}, V_{1z} + V_{2z})$. The direction and magnitude of the new vector are determined from its components as before. The idea is illustrated in Fig. A.6 for 2D vector addition. Note that the second vector starts at the tip of the first vector. The resulting vector starts at the start of the first vector and ends at the tip of the second vector.
Addition of a vector and a scalar quantity is not defined, since a scalar quantity has only magnitude without any direction. However, we can multiply a vector by a scalar value. We do this by simply multiplying each of the components by the scalar value:
$$k\vec{V} = (kV_x, kV_y, kV_z)$$
The dot product of two vectors is defined as
$$\vec{V}_1 \cdot \vec{V}_2 = |\vec{V}_1||\vec{V}_2|\cos\theta$$
where $\theta$ is the smaller of the two angles between the vector directions. We can also determine the vector dot product in terms of their Cartesian components as
$$\vec{V}_1 \cdot \vec{V}_2 = V_{1x}V_{2x} + V_{1y}V_{2y} + V_{1z}V_{2z}$$
Note that the dot product of a vector with itself produces the square of the vector magnitude. There are two important properties satisfied by the dot product. Dot products are commutative, that is, $\vec{V}_1 \cdot \vec{V}_2 = \vec{V}_2 \cdot \vec{V}_1$. They are also distributive with respect to vector addition, that is, $\vec{V}_1 \cdot (\vec{V}_2 + \vec{V}_3) = \vec{V}_1 \cdot \vec{V}_2 + \vec{V}_1 \cdot \vec{V}_3$.
The cross product of two vectors is defined as
$$\vec{V}_1 \times \vec{V}_2 = |\vec{V}_1||\vec{V}_2|\sin\theta\,\hat{u}$$
In this expression, $\hat{u}$ is a unit vector (magnitude 1) perpendicular to both $\vec{V}_1$ and $\vec{V}_2$ (Fig. A.7). The direction of $\hat{u}$ is determined by the right-hand rule: we grasp with our
Fig. A.7 The cross product $\vec{V}_1 \times \vec{V}_2$: the unit vector $\hat{u}$ is perpendicular to both $\vec{V}_1$ and $\vec{V}_2$, which are separated by the angle $\theta$
right hand an axis that is perpendicular to the plane containing $\vec{V}_1$ and $\vec{V}_2$ such that the fingers curl from $\vec{V}_1$ to $\vec{V}_2$; the right thumb then points in the direction of $\hat{u}$. The cross product can also be expressed in terms of the Cartesian components of the constituent vectors as
$$\vec{V}_1 \times \vec{V}_2 = (V_{1y}V_{2z} - V_{1z}V_{2y},\; V_{1z}V_{2x} - V_{1x}V_{2z},\; V_{1x}V_{2y} - V_{1y}V_{2x})$$
The cross product of two parallel vectors is the zero vector. Therefore, the cross product of a vector with itself is also the zero vector. The cross product is not commutative, since $\vec{V}_1 \times \vec{V}_2 = -(\vec{V}_2 \times \vec{V}_1)$. However, like the dot product, the cross product is distributive with respect to vector addition, that is, $\vec{V}_1 \times (\vec{V}_2 + \vec{V}_3) = \vec{V}_1 \times \vec{V}_2 + \vec{V}_1 \times \vec{V}_3$.
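As a small illustration of these operations, the following C sketch implements the dot product, cross product, magnitude, and unit-vector calculations described above; the Vec3 type and the function names are ours, introduced only for this example.

#include <math.h>

/* Basic 3D vector operations; the Vec3 type and function names are illustrative. */
typedef struct { double x, y, z; } Vec3;

double dot (Vec3 a, Vec3 b)          /* a.b = ax bx + ay by + az bz */
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 cross (Vec3 a, Vec3 b)          /* components as given above */
{
    Vec3 c = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return c;
}

double magnitude (Vec3 a)            /* |a| = sqrt(a.a) */
{
    return sqrt (dot (a, a));
}

Vec3 normalize (Vec3 a)              /* unit vector along a (Eq. A.2) */
{
    double m = magnitude (a);
    Vec3 u = { a.x / m, a.y / m, a.z / m };
    return u;
}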
then
$$2M = \begin{pmatrix} 6 & 4 & 2 \\ 4 & 2 & 6 \\ 2 & 6 & 4 \end{pmatrix}$$
Two matrices can be added only if they both have the same number of rows and columns.
In order to add two matrices, we simply add their corresponding elements. Thus,
$$\begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{pmatrix} + \begin{pmatrix} 6 & 4 & 2 \\ 4 & 2 & 6 \\ 2 & 6 & 4 \end{pmatrix} = \begin{pmatrix} 9 & 6 & 3 \\ 6 & 3 & 9 \\ 3 & 9 & 6 \end{pmatrix}$$
The product $C = AB$ of an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$ is an $m \times p$ matrix whose elements are given by
$$c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}$$
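For instance, a direct C implementation of this formula for two square matrices might look like the following sketch; the fixed size N and the function name are ours.

#define N 3

/* C = A * B for two N x N matrices, following c_ij = sum over k of a_ik b_kj;
   the fixed size N and the function name are illustrative. */
void matMultiply (const double A[N][N], const double B[N][N], double C[N][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}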
Higher-order determinants are obtained recursively from the lower-order determinant values. In order to calculate a determinant of order 2 or greater of an $n \times n$ matrix $M$, we select any column $k$ and compute the determinant as
$$\det M = \sum_{j=1}^{n} (-1)^{j+k}\, m_{jk} \det M_{jk}$$
where $\det M_{jk}$ is the $(n-1)$ by $(n-1)$ determinant of the submatrix obtained from $M$ after removing the $j$th row and $k$th column. Alternatively, we can select any row $j$ and calculate the determinant as
$$\det M = \sum_{k=1}^{n} (-1)^{j+k}\, m_{jk} \det M_{jk}$$
Efficient numerical methods exist to compute determinants of large matrices ($n > 4$).
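To make the recursion concrete, here is a small C sketch that expands the determinant along the first row. The names and the fixed maximum size are ours, and, as noted above, more efficient numerical methods should be preferred for large matrices.

#define NMAX 4

/* Determinant of an n x n matrix (n <= NMAX) by cofactor expansion along the
   first row, directly following the recursive formula above. */
double determinant (double m[NMAX][NMAX], int n)
{
    if (n == 1)
        return m[0][0];

    double det = 0.0;
    int j, k;
    for (k = 0; k < n; k++) {
        /* build the (n-1) x (n-1) submatrix with row 0 and column k removed */
        double sub[NMAX][NMAX];
        for (j = 1; j < n; j++) {
            int c, col;
            for (c = 0, col = 0; c < n; c++)
                if (c != k)
                    sub[j - 1][col++] = m[j][c];
        }
        /* (-1)^(0+k) * m[0][k] * det(submatrix) */
        det += ((k % 2 == 0) ? 1.0 : -1.0) * m[0][k] * determinant (sub, n - 1);
    }
    return det;
}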
The inverse $M^{-1}$ of a square matrix $M$ satisfies
$$MM^{-1} = M^{-1}M = I$$
where $I$ is the identity matrix: only the diagonal elements of $I$ are 1 and all other elements are zero. We can calculate the elements of $M^{-1}$ from the elements of $M$ as
$$m^{-1}_{jk} = \frac{(-1)^{j+k} \det M_{kj}}{\det M}$$
where $m^{-1}_{jk}$ is the element of the $j$th row, $k$th column of the inverse matrix, and $M_{kj}$ is the $(n-1)$ by $(n-1)$ submatrix obtained by deleting the $k$th row and $j$th column of $M$. Usually, more efficient numerical methods are employed to compute the inverse of large matrices.
Let us illustrate this with the previous example. We have the point-slope form $y - 3 = 1(x - 2)$. After expanding, we get $y - 3 = x - 2$. We rearrange the terms to get $y = x - 2 + 3$, or $y = x + 1$. This is the standard form.
Now suppose we are given two line segments: $L_1$ ($y = m_1x + c_1$) and $L_2$ ($y = m_2x + c_2$). If the lines intersect, there must be a common point $(x, y)$ that lies on both the lines. Therefore, the following relation must hold:
$$m_1x + c_1 = m_2x + c_2$$
Solving for $x$ gives $x = \dfrac{c_2 - c_1}{m_1 - m_2}$ (provided $m_1 \neq m_2$, that is, the lines are not parallel), and the corresponding $y$ is obtained by substituting this value of $x$ into either line equation.
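A direct C translation of this calculation might look like the following sketch. The function name and parameters are ours; a full implementation for line segments would additionally check that the computed point lies within both segments.

/* Intersection of the lines y = m1 x + c1 and y = m2 x + c2.  Returns 0 if the
   lines are parallel, otherwise stores the common point in (*x, *y). */
int lineIntersection (double m1, double c1, double m2, double c2,
                      double *x, double *y)
{
    if (m1 == m2)
        return 0;                    /* parallel (or coincident) lines */
    *x = (c2 - c1) / (m1 - m2);      /* from m1 x + c1 = m2 x + c2 */
    *y = m1 * (*x) + c1;
    return 1;
}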
A surface can be represented implicitly by an equation of the form
$$f(x, y, z) = 0 \qquad (A.5)$$
For any point $P(x, y, z)$ on the surface, Eq. (A.5) evaluates to zero, that is, $f(P) = 0$. For points that are not on the surface, Eq. (A.5) returns some non-zero value. However, for
the purpose of this book, we shall restrict ourselves to the discussion of plane surfaces only,
rather than any arbitrary curved surface. This is so since we mostly considered objects with
polygonal surfaces. We also know that any surface can be represented as a mesh of polygonal
surfaces.
The most familiar way to represent a planar surface is the point-normal form, shown in Eq. (A.6):
$$Ax + By + Cz + D = 0 \qquad (A.6)$$
Equation (A.6) is also known as the general form of the plane equation. Let $\vec{n}$ be the normal to the plane, that is, a vector perpendicular to the planar surface. Then the constants $A$, $B$, and $C$ in the plane equation (Eq. A.6) are the corresponding Cartesian components of $\vec{n}$. In other words, $\vec{n} = (A, B, C)$.
Sometimes, we want to derive a plane equation given three points on the plane (say $a$, $b$, and $c$). Each of these points can be represented as a point vector, that is, a vector from the origin to the point. Thus, we can form three point vectors $\vec{a}$, $\vec{b}$, and $\vec{c}$. From the three point vectors, we can derive two vectors that lie on the plane. For example, the vectors $(\vec{b} - \vec{a})$ and $(\vec{c} - \vec{a})$ are two vectors on the plane.
Since the two vectors lie on the plane, we can take their cross product to obtain a vector that is perpendicular to both of them, that is, perpendicular to the plane itself. Since this vector is perpendicular to the plane, it is the normal vector. Thus, $\vec{n} = (\vec{b} - \vec{a}) \times (\vec{c} - \vec{a})$. Once we know $\vec{n}$, we have the three constants $A$, $B$, and $C$. We then use Eq. (A.6), putting any one of the three points into the equation, to obtain the value of $D$.
Let us illustrate the idea with an example. Suppose we are given the three points $a(1, 1, 1)$, $b(3, 4, 5)$, and $c(7, 7, 7)$. The three point vectors are
$$\vec{a} = (1, 1, 1), \quad \vec{b} = (3, 4, 5), \quad \vec{c} = (7, 7, 7)$$
From these three point vectors, we form two vectors that lie on the plane:
$$\vec{b} - \vec{a} = (2, 3, 4), \quad \vec{c} - \vec{a} = (6, 6, 6)$$
The cross product of these two vectors yields (see Section A.2 for details)
$$(\vec{b} - \vec{a}) \times (\vec{c} - \vec{a}) = (3 \cdot 6 - 4 \cdot 6,\; 4 \cdot 6 - 2 \cdot 6,\; 2 \cdot 6 - 3 \cdot 6) = (-6, 12, -6)$$
Therefore, the normal to the plane is $\vec{n} = (-6, 12, -6)$, and the three plane constants are $A = -6$, $B = 12$, $C = -6$. Replacing these values in Eq. (A.6), we get
$$-6x + 12y - 6z + D = 0$$
Now let us take any one of the three points, say $b(3, 4, 5)$. Since the point is on the plane, we should have
$$-6(3) + 12(4) - 6(5) + D = 0$$
that is, $-18 + 48 - 30 + D = 0$, which gives $D = 0$. The plane equation is therefore $-6x + 12y - 6z = 0$.
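The same computation can be expressed as a short C function; the array-based representation and the function name below are ours.

/* Plane coefficients (A, B, C, D) of Ax + By + Cz + D = 0 through three points
   a, b, c given as (x, y, z) triples. */
void planeFromPoints (const double a[3], const double b[3], const double c[3],
                      double plane[4])
{
    double u[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };   /* b - a */
    double v[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };   /* c - a */

    plane[0] = u[1] * v[2] - u[2] * v[1];    /* A */
    plane[1] = u[2] * v[0] - u[0] * v[2];    /* B   normal = (b - a) x (c - a) */
    plane[2] = u[0] * v[1] - u[1] * v[0];    /* C */

    /* D from substituting point a into the plane equation */
    plane[3] = -(plane[0] * a[0] + plane[1] * a[1] + plane[2] * a[2]);
}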
APPENDIX C
Ray-tracing Method for Surface Rendering
In Chapter 4, we learnt the simple lighting model for computing color at surface points. As we discussed, the model is based on many simplistic assumptions: for example, that all surfaces are ideal reflectors, that the light path does not shift during refraction, and that the ambient lighting effect can be modeled by a single number. Consequently, the model is incapable of producing realistic effects. What we require is a global illumination model, that is, a model of illumination that takes into account all the reflections and refractions that affect the color at any particular surface point. In this appendix, we shall learn about one such model, known as ray tracing. Obviously, the computational cost of implementing this model is much higher.
Fig. C.1 The basic idea of ray tracing. Each original ray passing through a pixel towards the viewer is traced backward (from viewer to source).
Fig. C.2 The construction of a binary ray-tracing tree. In (a), we show the backward tracing of the ray path through a scene having five surfaces. The corresponding ray-tracing tree is shown in (b).
As the figure shows, at each step of the recursive process, at most two new nodes and edges are added to the tree. We can set the maximum depth of the tree as a user option, depending on the available storage. The recursive process stops if any of the following conditions is satisfied:
1. The current primary ray intersects no surface in the list.
2. The current primary ray intersects a light source, which is not a reflecting
surface.
3. The ray-tracing tree has reached its maximum depth.
At each ray–surface intersection point, we compute the intensity using a lighting model. The intensity has the following three components.
Local contribution The intensity contribution due to the light sources that illuminate the surface point directly
Reflected contribution The intensity contribution due to the light that comes after reflection from other surfaces
Transmitted contribution The intensity contribution due to the light that comes after transmitting through the surface from the background
Thus, the total light intensity $I$ at any surface point can be computed as
$$I = I_l + I_r + I_t \qquad (C.1)$$
In the equation, $I_l$ is the local contribution. We can use the simple lighting model discussed in Chapter 4 to compute it. For this calculation, we require three vectors: $\vec{N}$ (the surface normal at the point), $\vec{V}$ (the vector from the surface point to the viewer, i.e., along the opposite direction of the primary ray), and $\vec{L}$ (the vector from the point to the light source). In order to determine $\vec{L}$, we send a ray from the intersection point towards
the light source. This ray is called the shadow ray. We check if the shadow ray inter-
sects any surface in its path. If it does, the intersection point is in shadow with respect
to the light source. Hence, we do not need to calculate the actual intensity due to that light source. Instead, we can apply some technique (e.g., ambient light/texture pattern, see Chapter 4) to create the shadow effect. The other two components in Eq. C.1 ($I_r$ and $I_t$) are calculated recursively using the steps mentioned before. The intensity value
at each intersection point is stored at the corresponding surface node position of the
ray-tracing tree.
Once the tree is complete for a pixel, we accumulate all the intensity contributions start-
ing at the leaf nodes. Surface intensity from each node is attenuated (see Chapter 4) by the
distance from the parent node and then added to the intensity of the parent surface. This
bottom-up procedure continues till we reach the root node. The root node intensity is set
as the pixel intensity. For some pixel rays, it is possible that the ray does not intersect any
surface. In that case, we assign background color to the pixel. It is also possible that, instead
of any surface, the ray intersects a (non-reflecting) light source. The light source intensity is
then assigned to the pixel.
For a perspective projection, the unit direction vector of the ray through a pixel is obtained from the pixel position $\vec{P}_x$ and the point of projection $\vec{P}_r$ as
$$\hat{d} = \frac{\vec{P}_x - \vec{P}_r}{|\vec{P}_x - \vec{P}_r|} \qquad (C.3)$$
The various vectors for perspective projection are shown in Fig. C.3 for illustration. In order to determine the ray–surface intersection point, we simultaneously solve the ray equation and the surface equation. This gives us a value of the parameter $t$ from which the intersection coordinates are determined. At each intersection point, we update $\vec{s}$ and $\hat{d}$ for each of the secondary rays. The new $\vec{s}$ is the vector from the origin to the intersection point. For the reflected ray, the unit vector $\hat{d}$ is calculated along the specular reflection direction. For the ray due to refraction, $\hat{d}$ is determined along the refraction path.
Let us illustrate the ray–surface intersection calculation with an example: the intersection of a ray with a spherical surface. Assume that the sphere center is at the point $\vec{p}_c$
Fig. C.3 The vectors for a pixel ray under perspective projection: the point of projection, the pixel, the vector $\vec{s}$, and the direction $\hat{d}$
and the length of its radius is $r$. Then, for any point $\vec{p}$ on the surface, the surface can be represented with the equation
$$|\vec{p} - \vec{p}_c| = r \qquad (C.4)$$
If the ray intersects the surface, there should be a common surface point. Thus, we can replace the point in the surface equation with the corresponding ray equation.
Substituting the ray equation into Eq. (C.4) and squaring both sides gives a quadratic equation in the ray parameter $t$, whose solutions are
$$t = \frac{-B \pm \sqrt{B^2 - AC}}{A} \qquad (C.7)$$
where $A = |\hat{d}|^2 = 1$, $B = (\vec{s} - \vec{p}_c) \cdot \hat{d}$, and $C = |\vec{s} - \vec{p}_c|^2 - r^2$. Depending on the values of $A$, $B$, and $C$, we have the following scenarios:
1. If $B^2 - AC < 0$, there is no intersection between the surface and the ray.
2. If $B^2 - AC = 0$, the ray touches the surface. Clearly, no secondary rays are generated in this case.
3. If $B^2 - AC > 0$, the ray intersects the surface. There are two possible parameter values as per Eq. (C.7).
(a) If both values are negative, there is no ray–surface intersection.
(b) If one of the values is zero and the other positive, the ray originates on the sphere and
intersects it. Usually in graphics, we are not interested in modelling such cases.
(c) If the two values differ in sign, the ray originates inside the sphere and inter-
sects it. This is again a mathematical possibility, which is usually not considered in
graphics.
(d) If both are positive, the ray intersects the sphere twice (enter and exit). The
smaller value corresponds to the intersection point that is closer to the starting
point of the ray. Thus, we take the smaller value to determine the intersection
coordinates.
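The calculation above can be turned into a short C routine. The sketch below assumes a unit direction vector and returns the smaller positive parameter value, as in case (d); all the names are ours.

#include <math.h>

/* Ray-sphere intersection following the quadratic above.  The ray is s + t*d
   with d a unit vector; the sphere has centre pc and radius r.  Returns 1 and
   the smallest positive t if the ray hits the sphere in front of its starting
   point, 0 otherwise. */
int raySphereIntersect (const double s[3], const double d[3],
                        const double pc[3], double r, double *t)
{
    double oc[3] = { s[0] - pc[0], s[1] - pc[1], s[2] - pc[2] };   /* s - pc */
    double B = oc[0] * d[0] + oc[1] * d[1] + oc[2] * d[2];         /* (s - pc).d */
    double C = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - r * r;
    double disc = B * B - C;              /* B^2 - A.C with A = |d|^2 = 1 */

    if (disc < 0.0)
        return 0;                         /* scenario 1: no intersection */
    double root = sqrt (disc);
    double t1 = -B - root;                /* the two parameter values (Eq. C.7) */
    double t2 = -B + root;
    if (t2 <= 0.0)
        return 0;                         /* both values negative or zero */
    *t = (t1 > 0.0) ? t1 : t2;            /* nearest intersection in front of the ray */
    return 1;
}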
Various techniques are used in the ray-tracing method to speed up the surface rendering
process. We can broadly divide these techniques into the following two groups:
1. Bounding volume techniques
2. Spatial subdivision methods
In the spatial subdivision methods, the scene is enclosed within a cube that is recursively subdivided into cells in one of the following two ways.
Uniform subdivision At each subdivision step, we divide the current cell into eight equal-sized octants.
Adaptive subdivision At each subdivision step, we divide the current cell only if it contains surfaces.
The recursion continues till each cell contains no more than a predefined number of surfaces. The process is similar to the space subdivision method we discussed in Chapter 2. We can use octrees to store the subdivision. Along with the subdivision information, information about the surfaces contained in each cell is also maintained.
First, we check for intersection of the ray with the outermost cube. Once an intersection
is detected, we check for intersection of the ray with the inner cubes (next level of subdivi-
sion) and continue in this way till we reach the final level of subdivision. We perform this
check for only those cells that contain surfaces. In the final level of subdivision, we check
for intersection of the ray with the surfaces. The first surface intersected by the ray is the
visible surface.
We need some means (anti-aliasing techniques) to eliminate or reduce the effect of aliasing.
There are broadly two ways of antialiasing in ray tracing.
1. Supersampling
2. Adaptive sampling
In supersampling, each pixel is assumed to represent a finite region. The pixel region is
divided into subregions (subpixels). Instead of a single pixel ray, we now generate pixel rays
for each of these subregions and perform ray tracing. The pixel color is computed as the
average of the color values returned by all the subpixel rays.
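As an illustration of the idea, the following C sketch averages an SS × SS grid of subpixel samples for one pixel. The samplePoint function stands in for tracing a full ray through the subpixel position and is purely illustrative.

#define SS 4

double samplePoint (double x, double y)   /* stand-in for tracing a subpixel ray */
{
    return (x + y > 1.0) ? 1.0 : 0.0;     /* an arbitrary two-tone "scene" */
}

double supersamplePixel (int px, int py)
{
    double sum = 0.0;
    int i, j;
    for (i = 0; i < SS; i++)
        for (j = 0; j < SS; j++) {
            /* sample at the centre of subpixel (i, j) inside pixel (px, py) */
            double x = px + (i + 0.5) / SS;
            double y = py + (j + 0.5) / SS;
            sum += samplePoint (x, y);
        }
    return sum / (SS * SS);               /* pixel value = average of the samples */
}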
The basic idea behind the adaptive sampling is as follows: we start with five pixel rays
for each pixel instead of one as before. Among these five, one ray is sent through the pixel
center and the remaining four rays are sent through the four corners (assuming each pixel is
represented by a square/rectangular region) of the pixel. Then we perform color computation
using the ray-tracing method for each of these five rays. If the color values returned by them
are similar, we do not divide the pixel further. However, in case the five rays return dissim-
ilar color values, we divide the pixel into a 2 × 2 subpixel grid. We then repeat the process
for each subpixel in the grid. The process terminates when a preset level of subdivision is
reached.