Graphics Processing Unit: NUST College of Electrical and Mechanical Engineering
Computer Organization
Assignment 2
GRAPHICS PROCESSING UNIT
Submitted by:
Warda Ahmed
NS 7800
D-CE-37
Syndicate B
Contents
List of Acronyms
Abstract
Processors
Graphics Processing Unit (GPU)
I. What is a GPU?
II. Uses
III. GPU Manufacturers
IV. Evolution of GPUs
V. Features
1. Memory Features
VII. Types
Conclusion
References
List of Acronyms
1. CPU Central Processing Unit
2. GPU Graphics Processing Unit
3. VPU Visual Processing Unit
4. RAM Random Access Memory
5. GIS Geographic Information System
6. CAD Computer Aided Design
7. AGP Accelerated Graphics Port
8. PCI Peripheral Component Interconnect
9. TIGA Texas Instruments Graphics Architecture
10. PGA Professional Graphics Controller
11. SGI Silicon Graphics Inc.
12. API Application Programming Interface
13. MDA/CDA Monochrome and Color Display Adapter
14. iDCT Inverse discrete cosine transform
15. iMDCT Inverse modified discrete cosine transform
16. IQ Inverse quantization
17. VLD Variable-length decoding
18. IGP Integrated graphics processors
Abstract
Processors are the central component of computers and carry out all of the
computer's work. One essential type of processor in the majority of modern computers
and electronic devices is the graphics processing unit (GPU). It is mainly used for graphics
processing and 3D design. The leading GPU manufacturers are NVIDIA and AMD.
Modern GPUs are highly parallel and fully programmable. NVIDIA developed
the first commercial GPU, the GeForce 256, in 1999. GPU architecture shifted from a fixed
graphics pipeline to a programmable pipeline and then to a unified shader model. A GPU has
many more cores than a CPU. The different types of GPU are dedicated, integrated, hybrid,
general purpose and external. A GPU accelerates computing and
is a necessity for optimum computer performance.
Processors
The word processor usually refers to the microprocessor or central processing unit (CPU). The
processor in a personal computer, or one embedded in a small device, is often called a
microprocessor.
A processor is a small chip made of silicon and is a central
component of computers and other electronic devices. It is the logic
circuitry that responds to and processes the basic instructions that
drive a computer. The main job of a processor is to receive input
through input devices, analyze and process the input commands,
and produce an appropriate output. Modern processors can perform
billions of calculations per second, and the fastest reach trillions.
Figure 1: GeForce 6600GT (NV43) GPU
II. Uses
GPUs are found in a wide range of systems, including embedded systems, cell
phones, personal computers, workstations, game consoles, and supercomputers. In desktop
computers the GPU is placed on a video card; in mobile devices it is integrated into the
motherboard.
Most GPUs use their transistors for 3-D computer graphics. However, some have
accelerated memory for mapping vertices, which is useful in applications such as
geographic information systems (GIS). Some of the more modern GPU technology supports
programmable shaders implementing textures, mathematical vertices and accurate color
formats. GPUs built for applications such as computer-aided design (CAD) can process over
200 billion operations per second and deliver up to 17 million polygons per second. Many
scientists and engineers use GPUs for more in-depth computational studies utilizing vector
and matrix features. [1]
The transistor count trends for some GPUs are shown in the following figure:
1.
1970s
3D graphics started with early display controllers, known as video shifters and video
address generators. They acted as a pass-through between the main processor and the
display. The incoming data stream was converted into serial bitmapped video output
carrying luminance, color, and vertical and horizontal composite sync. The sync signals
kept each line of pixels aligned as the display was generated and synchronized each
successive line along with the
blanking interval (the time between ending one scan line and starting the next). Arcade
system boards have been using specialized graphics chips since the 1970s. In early video
game hardware, the RAM for frame buffers was too expensive, so video chips composited
data together as the display was being scanned out on the monitor.
A flurry of designs arrived in the latter half of the 1970s, laying the foundation for 3D
graphics as we know them. [4]
Fujitsu's MB14241 video shifter was used to accelerate the drawing of sprite graphics for
various 1970s arcade games from Taito and Midway, such as Gun Fight (1975), Sea
Wolf (1976) and Space Invaders (1978). [5] The Namco Galaxian arcade system in 1979
used specialized graphics hardware supporting RGB color, multi-colored sprites
and tilemap backgrounds. The Galaxian hardware was widely used during the golden age
of arcade video games by game companies such
as Namco, Centuri, Gremlin, Irem, Konami, Midway, Nichibutsu, Sega and Taito.
RCA's Pixie video chip (CDP1861) in 1976 could output an NTSC-compatible video signal
at 62x128 resolution, or 64x32 for the RCA Studio II console.
2.
1980s
In the early 1980s, "GPUs" were integrated frame buffers. They were boards of TTL logic
chips that relied on the CPU and could only draw wire-frame shapes to raster displays
[10].
The Williams Electronics arcade games Robotron: 2084, Joust, Sinistar, and Bubbles, all
released in 1982, contain custom blitter chips for operating on 16-color bitmaps. [11]
In 1985, the Commodore Amiga featured a custom graphics chip supporting line draw and
area fill, with a blitter unit that accelerated the manipulation of bitmaps.
In 1986, Texas Instruments released the TMS34010, the first microprocessor with on-chip
graphics capabilities. It could run general-purpose code, but it had a very
graphics-oriented instruction set. In 1990-1991, this chip became the basis of the Texas
Instruments Graphics Architecture ("TIGA") Windows accelerator cards.
One of the very first 2D/3D video cards for the PC was the IBM Professional Graphics
Controller (PGA). The PGA used an on-board Intel 8088 microprocessor to take over
all video-related tasks (such as drawing and coloring filled polygons), freeing the CPU
from video processing. Though it was released in 1984, 10 years before
hardware 2D/3D acceleration was standardized, its high cost and incompatibility with
many programs and non-IBM systems kept it from achieving mass-market success.
The PGA's separate on-board processor marked an important step in GPU evolution,
furthering the paradigm of using a separate processor for graphics computations [12].
By 1987, more features were being added to early GPUs, such as shaded solids, vertex
lighting, rasterization of filled polygons, a pixel depth buffer, and color blending. There
was still much reliance on sharing computation with the CPU [10]. In the late 1980s,
Silicon Graphics Inc. (SGI) emerged as a high-performance computer graphics hardware
and software company. With the introduction of OpenGL in 1992, SGI created and released
the graphics industry's most widely used and supported platform-independent 2D/3D
application programming interface (API). OpenGL support has also become an integral
part of the design of modern graphics hardware. SGI also pioneered the concept of the
graphics pipeline early on [12].
3.
1990s
Launched in November 1996, 3dfx's Voodoo Graphics consisted of a 3D-only card that
required a VGA cable pass-through from a separate 2D card to the Voodoo, which then
connected to the display.
In March of 1996, 15 titles with Voodoo support debuted at E3 with wholly new levels of
visual quality. The difference in video quality can be seen in a demo of the popular game
Quake, as shown in the following figure. The left side shows ordinary low-resolution
graphics, while the right side shows the improved resolution of the 3dfx card on OpenGL.
3dfx planned to build a high-end 3D gaming board capable of delivering smooth gameplay at
640x480 resolution with bilinearly filtered textures.
Voodoo was used in arcade machines and, through Quantum's multichip boards, had
professional promise as well. It focused on raw power in fundamental 3D operations. 3dfx
cut the right corners in the pipeline, reducing gate count without much impact on image
quality. Voodoo was easy to program and hard to slow down.
3dfx's technology became the forerunner of many image quality enhancements seen
today, like soft shadows and reflections, motion blur, and depth-of-field blurring.
4.
NVIDIA Inc. developed the world's first commercial GPU, the GeForce 256, in 1999. The
GeForce 256 was capable of billions of calculations per second, could process a
minimum of 10 million polygons per second, and had over 22 million transistors,
compared to the 9 million found on the Pentium III. Its workstation version, called the
Quadro and designed for CAD applications, could process over 200 billion operations a second
and deliver up to 17 million triangles per second. [13] It was a single-chip processor with
integrated transform, drawing and BitBLT support, lighting effects, triangle setup/clipping
and rendering engines. NVIDIA's rival company ATI Technologies came up with the name
VPU, or visual processing unit, when it released the Radeon 9700 in 2002.
Fairly early on in the GPU market, there was a severe narrowing of competition. When
GPUs were a new concept, the leading companies were Silicon Graphics Inc. (SGI), 3dfx,
NVIDIA, ATI and Matrox. Now only AMD and NVIDIA remain as GPU manufacturing
giants.
Since their inception, GPUs have gradually become more powerful, programmable, and
general purpose, with programmable geometry, vertex and pixel processors, the unified
shader model, an expanding instruction set, and CUDA and OpenCL. [14] OpenCL is an open
standard defined by the Khronos Group which allows for the development of code for both
GPUs and CPUs with an emphasis on portability. OpenCL solutions are supported by Intel,
AMD, Nvidia, and ARM, and according to a recent report by Evans Data, OpenCL is the
GPGPU development platform most widely used by developers in both the US and Asia
Pacific.
Nvidia Kepler:
A graphics processing unit marketed as the first GPU designed for
the cloud. Graphics cards powered by Nvidia Kepler processors are tuned to efficiently
serve virtualized desktops, providing auto-scaling to the necessary performance level.
V. Features
GPU features include
Texture mapping
Rendering polygons
Hardware overlays
MPEG decoding
More recent graphics cards decode high-definition video on the card, offloading the
central processing unit. The video decoding processes that can be accelerated by today's
modern GPU hardware are:
Intra-frame prediction
1.
Memory Features
The only two types of memory that actually reside on the GPU chip are register and
shared memory. Local, global, constant, and texture memory all reside off chip.
Local, constant, and texture memory are all cached.
While it would seem that the fastest memory is always the best choice, two other
characteristics dictate how each type of memory should be
used: its scope and its lifetime.
Data stored in register memory is visible only to the thread that wrote it and lasts
only for the lifetime of that thread.
Local memory has the same scope rules as register memory, but is slower.
Data stored in shared memory is visible to all threads within that block and lasts
for the duration of the block. This is invaluable because it allows
threads to communicate and share data with one another.
Data stored in global memory is visible to all threads within the application
(including the host), and lasts for the duration of the host allocation.
Constant and texture memory are beneficial for
only very specific types of applications. Constant memory is used for data that
will not change over the course of a kernel execution and is read-only. Using
constant rather than global memory can reduce the required memory bandwidth;
however, this performance gain can only be realized when a warp of threads reads
the same location. Similar to constant memory, texture memory is another variety
of read-only memory on the device. When all reads in a warp are physically
adjacent, using texture memory can reduce memory traffic and increase
performance compared to global memory.
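The location, scope and lifetime rules above can be summarised as a small lookup table. This is only a sketch in Python, not GPU code: the names MemorySpace, MEMORY_SPACES and visible_to_other_thread are hypothetical and introduced for illustration. The scope and lifetime of constant and texture memory are not stated above; they are assumed here to match global memory, as in the CUDA programming model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySpace:
    on_chip: bool      # True if the memory resides on the GPU chip
    cached: bool       # True if accesses go through a cache
    scope: str         # "thread", "block" or "application"
    lifetime: str      # "thread", "block" or "host allocation"
    read_only: bool = False


# Values follow the text; constant/texture scope and lifetime are assumptions.
MEMORY_SPACES = {
    "register": MemorySpace(True, False, "thread", "thread"),
    "local": MemorySpace(False, True, "thread", "thread"),
    "shared": MemorySpace(True, False, "block", "block"),
    "global": MemorySpace(False, False, "application", "host allocation"),
    "constant": MemorySpace(False, True, "application", "host allocation", True),
    "texture": MemorySpace(False, True, "application", "host allocation", True),
}


def visible_to_other_thread(space: str, same_block: bool) -> bool:
    """Can a second thread see data that a first thread placed in `space`?"""
    scope = MEMORY_SPACES[space].scope
    if scope == "thread":
        return False          # private to the thread that wrote it
    if scope == "block":
        return same_block     # shared only inside one thread block
    return True               # visible across the whole application


if __name__ == "__main__":
    for name in MEMORY_SPACES:
        print(name, visible_to_other_thread(name, same_block=True))
```

The table makes the trade-off explicit: shared memory is the only on-chip space that lets threads exchange data, and only within one block.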
The GPU clock, or engine clock, is the graphics processing unit's clock speed, measured in
megahertz (MHz).
1.
Graphics pipeline
The various stages in the typical pipeline of a modern GPU (also seen in figure 4.5) are:
Bus interface/Front End
Interface to the system to send and receive data and commands.
Vertex Processing
Converts each vertex into a 2D screen position, and lighting may be applied to determine
its color. A programmable vertex shader enables the application to perform custom
transformations for effects such as warping or deformations of a shape.
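As an illustration of converting a vertex into a 2D screen position, the following Python sketch applies a simple perspective projection. It assumes the vertex is already in camera space with z pointing away from the viewer; the function name project_vertex, the focal_length parameter and the 640x480 screen size are assumptions chosen for the example, not details from the text.

```python
def project_vertex(x, y, z, focal_length=1.0, width=640, height=480):
    """Project a camera-space vertex (z > 0) to pixel coordinates."""
    if z <= 0:
        raise ValueError("vertex must be in front of the camera (z > 0)")
    # Perspective divide: distant points move toward the screen centre.
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Map normalized coordinates in [-1, 1] to pixels; screen y grows downward.
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return screen_x, screen_y


if __name__ == "__main__":
    print(project_vertex(0.0, 0.0, 5.0))   # a point on the view axis
    print(project_vertex(2.0, 0.0, 4.0))   # a point to the right of centre
```

A programmable vertex shader replaces this fixed formula with application-supplied code, which is what makes warping and deformation effects possible.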
Clipping
This removes the parts of the image that are not visible in the 2D screen view such as the
backsides of objects or areas that the application or window system covers.
Primitive Assembly, Triangle Setup
Vertices are collected and converted into triangles. Information is generated that will
allow later stages to accurately generate the attributes of every pixel associated with the
triangle.
Rasterization
The triangles are filled with pixels known as "fragments." A fragment may not end up
in the frame buffer if it makes no change to that pixel or if it ends up being hidden.
Occlusion Culling
Removes pixels that are hidden (occluded) by other objects in the scene.
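One common way to discard hidden fragments is a depth (Z) buffer, which keeps only the nearest fragment at each pixel. The text does not say which method a given GPU uses, so the Python sketch below is an illustrative assumption; the function name depth_test and the (x, y, z, color) fragment layout are hypothetical.

```python
def depth_test(fragments, width, height):
    """Return a color buffer holding, per pixel, the fragment nearest the viewer.

    `fragments` is an iterable of (x, y, z, color) tuples; smaller z is nearer.
    """
    far = float("inf")
    depth = [[far] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:        # nearer than what is stored: keep it
            depth[y][x] = z
            color[y][x] = c
        # otherwise the fragment is occluded and is dropped
    return color


if __name__ == "__main__":
    frags = [(0, 0, 5.0, "red"), (0, 0, 2.0, "blue"), (1, 0, 3.0, "green")]
    print(depth_test(frags, width=2, height=1))
```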
Parameter Interpolation
The values for each rasterized pixel are computed, based on color, fog, texture,
etc.
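Per-pixel values are typically derived from the values at the three triangle vertices. The Python sketch below uses barycentric weights to interpolate one scalar attribute in screen space; it is a simplified assumption (real hardware also applies perspective correction), and the names edge and interpolate are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of the triangle a-b-p."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def interpolate(p, v0, v1, v2, a0, a1, a2):
    """Interpolate attribute values a0..a2 (at vertices v0..v2) at point p."""
    area = edge(*v0, *v1, *v2)
    if area == 0:
        raise ValueError("degenerate triangle")
    # Each weight is the sub-triangle area opposite the vertex, normalized.
    w0 = edge(*v1, *v2, *p) / area
    w1 = edge(*v2, *v0, *p) / area
    w2 = edge(*v0, *v1, *p) / area
    return w0 * a0 + w1 * a1 + w2 * a2


if __name__ == "__main__":
    tri = ((0, 0), (4, 0), (0, 4))
    print(interpolate((1, 1), *tri, 100.0, 200.0, 40.0))
```

The same weights can be reused for every attribute of the pixel (color channels, fog factor, texture coordinates), which is why this stage is well suited to hardware.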
Pixel Shader
This stage adds textures and final colors to the fragments. Also called a "fragment
shader," a programmable pixel shader enables the application to combine a pixel's
attributes, such as color, depth and position on screen, with textures in a user-defined
way to generate custom shading effects.
Pixel Engines
Mathematically combine the final fragment color, its coverage and its degree of transparency
with the existing data stored at the associated 2D location in the frame buffer to produce
the final color for the pixel to be stored at that location. A depth (Z) value for the
pixel is also output.
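The combination of fragment color, coverage and transparency with the stored pixel can be sketched as a simple "source over destination" blend. This Python example is an assumption about one common blend mode, not the only one a pixel engine supports; the function name blend and its parameters are hypothetical.

```python
def blend(src_rgb, src_alpha, coverage, dst_rgb):
    """Blend a fragment over the color already stored in the frame buffer.

    src_alpha is the fragment's opacity and coverage the fraction of the
    pixel it covers, both in the range 0.0 to 1.0.
    """
    a = src_alpha * coverage
    return tuple(a * s + (1.0 - a) * d for s, d in zip(src_rgb, dst_rgb))


if __name__ == "__main__":
    # Half-transparent red fragment over a blue pixel.
    print(blend((1.0, 0.0, 0.0), 0.5, 1.0, (0.0, 0.0, 1.0)))
```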
Frame Buffer Controller
The frame buffer controller interfaces to the physical memory used to hold the actual
pixel values displayed on screen. The frame buffer memory is also often used to store
graphics commands, textures as well as other attributes associated with each pixel.
2.
Until recently, the process of generating computer graphics was referred to as the
graphics pipeline. But that just wasn't cutting it for sophisticated effects like water and
smoke. Over time, the process has been taken over by more flexible shaders, and it now
uses universal shaders able to perform any of these tasks. The graphics rendering mechanisms
shifted from fixed graphics pipelines to programmable graphics pipelines, and then to unified
shader models.
Fixed-function meant that the developer could not configure the functions that the
fixed-function pipeline (FFP) performed. Parameters such as the colours of objects could be
changed, but the functions themselves remained the same. The game logic, textures and
triangles were sent to the GPU, which
would take care of all the heavy processing. The processing is visualized step by step in
the following figure. Each step is explained in the previous topic (Graphics pipeline).
PROS:
The hardware was wired and narrowly specialized to perform standard operations
on data. This made it much faster than the processor performing the same tasks.
It had new features like multiple blending modes, per-vertex Gouraud shading, fog
effects, stencil buffers (for shadow volumes), etc.
CONS:
The FFP was limited by the number of functions it could perform. There was no variation
or flexibility, and thus truly realistic graphics could not be produced.
It was impossible to go back through the stages of the pipeline to make changes as
required. For example, transparent objects like water or smoke tended to look solid,
or flicker in and out. To counter this opacity, the opaque surface was animated to
flicker.
If the graphics pipeline's hardware wasn't matched perfectly to the processing
needs of the task, some of it sat idle. And since the images that need to be
displayed are very different, the match was never perfect.
2.
PROGRAMMABLE SHADERS (Separated Shader Architecture)
In order to provide more sophisticated graphics to users, manufacturers started making
the fixed-function hardware at each stage of the pipeline more flexible. Some of these
stages became known as shaders, and they eventually became flexible enough to overcome
most of the difficulties caused by a linear pipeline. The process flow of programmable
shaders can be seen in the following diagram:
PROBLEMS:
While one problem of the FFP was solved, the other remained (i.e. part of the
pipeline doing nothing and sitting idle). The shaders were of three types. Vertex
shaders would construct the 3D model and light the vertices making it up.
Geometry shaders would make the lines into surfaces. Pixel shaders would
apply the textures and other effects. But each shader could do only its own type of task,
so whenever one type of work dominated, the other two kinds of shader sat idle.
3.
UNIFIED SHADERS
The specialized logic, such as vertex shaders, pixel shaders and hardwired algorithms, was
replaced with many copies of one unified processor design. Shaders are now made so that they
are no longer confined to a certain task. There are no more vertex, geometry, and pixel
shaders: just shaders. A unified shader can do any of the three kinds of work, so it can do
whatever needs doing instead of waiting for work it can do to come in.
Figure 4.8 shows a non-unified architecture versus a unified shader architecture. The
advantage of the unified approach is that one can have several shader cores and use them
for any type of shader (I-IV in this example). This gives better load balancing. IB and OB
are input and output buffers.
Unified Shader Architecture allows more flexible use of the graphics rendering hardware
[15]. For example, in a situation with a heavy geometry workload the system could
allocate most computing units to run vertex and geometry shaders. In cases with less
vertex workload and heavy pixel load, more computing units could be allocated to run
pixel shaders.
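The load-balancing benefit can be shown with an idealized model in Python. It assumes work is measured in abstract unit-cycles, that work divides freely among units, and that stages do not wait on each other; these simplifications and the function names frame_time_separate and frame_time_unified are assumptions made for illustration only.

```python
import math


def frame_time_separate(work, units):
    """Cycles per frame when each shader type has its own fixed pool.

    The slowest pool determines the frame time while the others sit idle.
    """
    return max(math.ceil(work[kind] / units[kind]) for kind in work)


def frame_time_unified(work, total_units):
    """Cycles per frame when any unit can run any shader type."""
    return math.ceil(sum(work.values()) / total_units)


if __name__ == "__main__":
    # A pixel-heavy frame on 12 shader units in total.
    work = {"vertex": 10, "geometry": 10, "pixel": 100}
    separate = {"vertex": 4, "geometry": 4, "pixel": 4}
    print(frame_time_separate(work, separate))
    print(frame_time_unified(work, total_units=12))
```

With the same total number of units, the unified model finishes the pixel-heavy frame sooner because the units that would have idled as vertex and geometry shaders take on pixel work instead.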
Most graphics hardware currently uses DirectX to communicate with the applications
being run. It is an Application Programming Interface, or API, that programmers use to get
their software to use hardware effectively. Microsoft revised it over time, and DirectX 10
implemented a unified shader instruction set.
That means that software for different kinds of shaders could be written in a more similar
manner, making the programmer's job easier. In an uncommon case of hardware and
software changing at the same time, each benefiting from the changes in the other, ATI and
Nvidia both started making GPUs with unified shaders. [16]
The unified shading architecture was introduced with the Nvidia GeForce 8
series, ATI Radeon HD 2000, S3 Chrome 400, Intel GMA X3000 series, the Xbox 360's
GPU, the Qualcomm Adreno 200 series and PowerVR SGX GPUs, and is used in all subsequent
series. OpenGL 3.3 (which offers a unified shader model) can still be implemented on
hardware that does not have a unified shading architecture.
3.
A simple way to understand the difference between a CPU and GPU is to compare how
they process tasks. A CPU consists of a few cores optimized for sequential serial
processing while a GPU has a massively parallel architecture consisting of thousands of
smaller, more efficient cores designed for handling multiple tasks simultaneously. Figure 5
shows the difference between their cores.
The number of cores that a GPU has depends on the manufacturer. NVIDIA graphics
solutions tend to pack more power into fewer cores, while AMD solutions pack in more
cores to increase processing power. A typical high-end graphics card has 68 cores if it is
from NVIDIA, and about 1,500 cores if it is from AMD.
VII. Types
GPUs come in different shapes and forms, from dedicated cards which you can plug
into your desktop's PCI-Express slot to integrated graphics chips,
which are built directly into the motherboard, the backbone component of your system.
2.
core. This bandwidth is what is referred to as the memory bus and can be performance
limiting. Older integrated graphics chipsets lacked hardware transform and lighting, but
newer ones include it [18].
3.
Hybrid solutions
This newer class of GPUs competes with integrated graphics in the low-end desktop and
notebook markets. The most common implementations of this are ATI's HyperMemory and
Nvidia's TurboCache.
Hybrid graphics cards are somewhat more expensive than integrated graphics, but much
less expensive than dedicated graphics cards. These share memory with the system and
have a small dedicated memory cache, to make up for the high latency of the system
RAM. Technologies within PCI Express can make this possible. While these solutions are
sometimes advertised as having as much as 768MB of RAM, this refers to how much can
be shared with the system memory.
4.
Stream Processing and General Purpose GPUs (GPGPU)
It is becoming increasingly common to use a general purpose graphics processing unit
(GPGPU) as a modified form of stream processor. This concept turns the massive
computational power of a modern graphics accelerator's shader pipeline into
general-purpose computing power, as opposed to being hard-wired solely to do graphical
operations. In certain applications requiring massive vector operations, this can yield
several orders of magnitude higher performance than a conventional CPU. The two largest
discrete GPU designers, ATI and Nvidia, are beginning to pursue this approach with an
array of applications.
GPGPUs can be used for many types of parallel tasks, including ray tracing. They are
generally suited to high-throughput computations that exhibit data parallelism and can
exploit the wide vector width SIMD architecture of the GPU.
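Data parallelism means applying the same operation to many elements at once. The Python sketch below imitates this by processing a fixed number of elements ("lanes") per step of a vector operation a*x + y; it only models the step count and does not run in parallel. The function name simd_saxpy and the lanes parameter are hypothetical.

```python
def simd_saxpy(a, xs, ys, lanes=4):
    """Compute a*x + y element-wise, handling `lanes` elements per step.

    Returns the result list and the number of steps taken.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    out = []
    steps = 0
    for start in range(0, len(xs), lanes):
        chunk_x = xs[start:start + lanes]
        chunk_y = ys[start:start + lanes]
        # One "instruction" is applied to the whole group of elements.
        out.extend(a * x + y for x, y in zip(chunk_x, chunk_y))
        steps += 1
    return out, steps


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [1] * 8
    print(simd_saxpy(2, xs, ys, lanes=4))   # wide SIMD unit
    print(simd_saxpy(2, xs, ys, lanes=1))   # one element at a time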
5.
An external GPU is a graphics processor located outside of the housing of the computer.
External graphics processors are often used with laptop computers. Laptops might have a
substantial amount of RAM and a sufficiently powerful central processing unit (CPU), but
often lack a powerful graphics processor (and instead have a less powerful but more
energy-efficient on-board graphics chip). On-board graphics chips are often not powerful
enough for playing the latest games, or for other tasks.
VIII.
GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a
CPU to accelerate scientific, analytics, engineering, consumer, and enterprise
applications. Pioneered in 2007 by NVIDIA, GPU accelerators now power energy-efficient
datacenters in government labs, universities, enterprises, and small-and-medium
businesses around the world. GPUs are accelerating applications in platforms ranging
from cars, to mobile phones and tablets, to drones and robots.
GPU-accelerated computing offers unprecedented application performance by offloading compute-intensive
portions of the application to the GPU, while the remainder of the code still runs on the CPU. From a user's
perspective, applications simply run significantly faster. This basic process can be seen in figure 6:
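How much faster an application runs depends on how much of it is offloaded. A standard way to estimate this is Amdahl's law, which is not mentioned in the text and is brought in here as an assumption; the function name overall_speedup and its parameters are hypothetical.

```python
def overall_speedup(offloaded_fraction, gpu_speedup):
    """Estimate whole-application speedup using Amdahl's law.

    offloaded_fraction: share of run time moved to the GPU (0.0 to 1.0).
    gpu_speedup: how many times faster that share runs on the GPU.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    cpu_part = 1.0 - offloaded_fraction           # still runs on the CPU
    gpu_part = offloaded_fraction / gpu_speedup   # accelerated portion
    return 1.0 / (cpu_part + gpu_part)


if __name__ == "__main__":
    print(overall_speedup(0.5, 2.0))
    print(overall_speedup(0.9, 10.0))
```

The estimate shows why the code left on the CPU matters: even a very fast GPU cannot speed up the application beyond the limit set by the portion that is not offloaded.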
IX. Advantages
A multi-GPU system provides more than just performance gains. It also gives you
the freedom to run your applications with full features and effects enabled. Figure 7
shows the increased performance of applications with the increased GPU usage.
Modern GPUs can use programmable shading to achieve near-cinematic realism, as figure
9 shows, featuring actress Adrianne Curry on an NVIDIA GeForce 8800 GTX.
Conclusion
GPUs became more popular as the demand for graphics applications increased. Eventually,
they became not just an enhancement but a necessity for optimum performance of a PC.
Specialized logic chips now allow fast graphics and video implementations. Generally, the
GPU is connected to the CPU and is completely separate from the motherboard. The
random access memory (RAM) is connected through the accelerated graphics port (AGP)
or the peripheral component interconnect express (PCI-Express) bus. Some GPUs are
integrated into the northbridge on the motherboard and use the main memory as a digital
storage area, but these GPUs are slower and have poorer performance.
GPUs are a central component of today's devices; without them it would be impractical
to perform graphically intensive tasks like video encoding and decoding, graphics editing,
and gaming.
References
[1] Techopedia.
[2] "GPU sales strong as AMD gains market share," techreport.com.
[3] C. McClanahan, "History and Evolution of GPU Architecture," Georgia Tech.
[4] G. Singer, "The History of the modern graphics processor."
[5] "Arcade/SpaceInvaders," Computer Archeology.
[6] A. Springmann, "Atari 2600 Teardown: What's Inside Your Old Console?," The
Washington Post.
[7] "What are the 6502, ANTIC, CTIA/GTIA, POKEY, and FREDDIE chips?," Atari8.com.
[8] K. E. Wiegers, "Atari Display List Interrupts," COMPUTE! (47): 161, April 1984.
[9] K. E. Wiegers, "Atari Fine Scrolling," COMPUTE! (67): 110, December 1985.
[10] I. Buck, "The Evolution of GPUs for General Purpose Computing," GTC 2010.
[11] S. Riddle, "Blitter Information."
[12] T. Crow, "Evolution of the Graphical Processing Unit," 2004.