
arXiv:2112.06436v1 [cs.AR] 13 Dec 2021

Software-Hardware Evolution and Birth of Multicore Processors

K.R. Chowdhary, Professor
Dept. of Computer Science and Engineering,
JNV University, Jodhpur.
Email: [email protected]

Abstract

This paper presents a brief journey through the evolution of computer hardware and software, and argues that the shift to multicore technology is a natural part of that evolution. It highlights the various laws governing the advancement of the computer industry. In light of these, it appears that the HW-SW industry trend can be represented by a mathematical model from which future developments are predictable. Finally, the paper establishes that the future of the computer industry lies in a greater thrust in software to exploit the parallelism available in applications and to utilize the heterogeneity of multicore processors.

Keywords: Multicore, Moore's law, Amdahl's law, Myhrvold's law, SW-HW heterogeneity.

1 Introduction

Computer performance has been driven largely by decreasing the size of chips while increasing the number of transistors they contain. In accordance with Moore's law, this has caused chip speeds to rise and prices to drop. This ongoing trend has driven the computer industry for years [1].

However, transistors cannot shrink forever. As transistor components grow thinner, chip manufacturers find it difficult to cap power usage and heat generation. As a result, manufacturers have started building chips with multiple cores, each separately cooled, instead of one increasingly powerful core. Multicore chips do not necessarily run as fast as the highest-performing single-core models, but they improve overall performance by handling more work in parallel.

Past progress was inefficient in terms of transistors and power: techniques such as multiple instruction issue, deep pipelines, out-of-order execution, speculative execution, and prefetching increased performance while preserving the sequential programming model.

Multicore chips are the biggest change in PC programming since the 32-bit 386 architecture was introduced in the 1990s. Multicore is a way to extend Moore's law so that the user gets more performance out of a piece of silicon. Multicore chips have been introduced by AMD, IBM, Intel, and Sun Microsystems in servers, desktops, and laptops.

1.1 Driving multicore

Current transistor technology limits the ability to continue making single processor cores more powerful. For example, as a transistor gets smaller, the gate, which switches the electric current carried by electrons off and on, gets thinner and less able to block the flow of electrons (figure 1). Small transistors thus tend to draw electricity all the time, even when they are not switching, which wastes power and dissipates heat. Increasing clock speeds also causes transistors to switch faster and thus consume more power and generate more heat.

Figure 1: P-channel FET with gate, drain, and source connections.

As per Moore's law, clock rates would have exceeded 10 GHz in 2008 and 15 GHz in 2010 had chip manufacturers not switched over to multicore in 2005. Due to this shift to multicore, the clock rate was restrained below 3 GHz in 2010, less than that of single-core processors in 2004, helping substantially in the reduction of power dissipation and heating.
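The underlying physics can be summarized by the standard CMOS dynamic power relation (a textbook formula, not stated in the paper):

$$P_{dyn} \approx \alpha \, C \, V^2 f,$$

where $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. Since a higher $f$ generally also demands a higher $V$, power grows faster than linearly with clock rate, which is why restraining clocks near 3 GHz curbs dissipation so effectively.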
Table 1: Performance and clock rate increase in Intel processors

Year   Processor          Transistors    Clock freq.
1978   8086               29 × 10^3      5.00 MHz
2006   Intel Core 2 Duo   291 × 10^6     2.93 GHz

1.2 Inside multicore

Consider a dual-core chip running multiple applications, against a single core doing the same. Each core in a multicore chip includes everything a multiprocessor has, except the level-2 cache memory hierarchy, which lies outside the cores.

In a multicore system, the compiler handles the scheduling of instructions in a program, while the operating system (OS) controls the overall assignment of tasks. The OS or a multithreaded application parcels out the work to the cores. Generally, when a multicore processor has completed a task, one core takes the completed data from the other cores and assembles the final result, as in the sketch below.
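As a minimal sketch of this parceling of work (illustrative only, not from the paper; it assumes POSIX threads, and the names NCORES, worker, and part_sum are invented for the example), the following C program divides a computation into roughly equal slices, runs one slice per core, and lets the main thread assemble the final result:

#include <pthread.h>
#include <stdio.h>

#define NCORES 4
#define N 1000000

static double part_sum[NCORES];   /* one slot per worker, no write conflicts */

/* Each worker processes one roughly equal slice of the iteration space. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    int lo = id * (N / NCORES), hi = lo + N / NCORES;
    double s = 0.0;
    for (int i = lo; i < hi; i++)
        s += 1.0 / (i + 1);       /* stand-in for real work */
    part_sum[id] = s;
    return NULL;
}

int main(void)
{
    pthread_t t[NCORES];
    int id[NCORES];
    for (int i = 0; i < NCORES; i++) {
        id[i] = i;
        pthread_create(&t[i], NULL, worker, &id[i]);
    }
    double total = 0.0;           /* the main thread assembles the result */
    for (int i = 0; i < NCORES; i++) {
        pthread_join(t[i], NULL);
        total += part_sum[i];
    }
    printf("total = %f\n", total);
    return 0;
}

Compile with -pthread; the OS is then free to schedule the four workers onto separate cores.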
1.3 Applications

To take advantage of multicore chips, vendors must redesign applications so that the processor can run them as multiple threads. Programmers must find good places to break up the applications, dividing the work into roughly equal pieces that can run at the same time.

Vendors must also redesign applications so that they can recognize each core's speed and memory access capabilities, as well as how fast the cores can communicate with one another.

Typical multicore chips include those from AMD; IBM (POWER5, POWER6, POWER7); Intel (Xeon); and Sun (the 8-core Niagara processor, which has a shared level-2 cache).

Multicore processors will find a natural home in servers, but will not be very useful in desktops until vendors develop considerably more multithreaded applications [2].

If Moore's law continues to apply, the number of cores in a chip will keep on increasing. Server applications primarily focus on throughput per unit cost and power, so multicore chips targeted at these applications use a large number of small, low-power cores. However, desktop users are interested in the performance of a single application at a time, so multicore chips for desktop users are more likely to have a smaller number of larger, higher-power cores with better single-thread performance. Thus, the general solution is a heterogeneous chip multiprocessor with both high- and low-complexity cores [3].

1.4 Advantages

There is a speed gain when multiple tasks run on a multicore chip, compared to a single-core CPU. Because the cores are on the same die, they can share architectural components such as memory elements and memory management. They thus have fewer components and lower costs than systems running multiple chips. Also, signaling between cores can be faster and use less electricity than in multichip systems.

2 Hardware Innovations and Moore's Law

In 1965, Gordon Moore, a co-founder of Intel, postulated that the number of transistors that could be fabricated on a semiconductor chip would double every year. Amazingly, this forecast still holds. Each next generation of transistors is smaller and switches at a faster speed, allowing clock speed and computer performance to increase at a similar rate (figure 2, table 1). The figure shows that the number of transistors increases at the rate predicted by Moore's law, but clock frequency grew at a slightly different rate [5].
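As a rough check (an illustration derived from table 1, not a formula given in the paper), a doubling period of two years corresponds to $N(t) = N(t_0)\,2^{(t-t_0)/2}$. Starting from the 8086's $29 \times 10^3$ transistors in 1978, this predicts $29 \times 10^3 \times 2^{14} \approx 4.8 \times 10^8$ transistors by 2006, the same order of magnitude as the Core 2 Duo's $291 \times 10^6$.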
Figure 2: Improvement in Intel x86 architecture over the years [5].

Figure 3: Cycle of HW-SW innovation process in the computer industry [5].

Faster processors, governed by Moore's law, enabled software vendors to add new features and functionality to software, which in turn demanded larger developer teams. The challenges of constructing increasingly complex software increased the demand for high-level languages and program libraries. Their higher level of abstraction contributed to slower code and, in conjunction with larger and more complex programs, drove demand for faster processors, closing the cycle indicated in figure 3.

Now the era of steady growth of single-processor performance is over, and there is a transition from sequential to parallel computation (multicore). The sequential computing era ended when practical limits on power dissipation stopped the continual increase in clock speed, and the lack of exploitable instruction-level parallelism diminished the value of complex architectures.

However, Moore's law is still applicable: semiconductor technology still doubles the number of transistors on a chip every two years [6]. This doubling of transistors is now used to increase the number of independent processors on a chip, instead of attempting to enhance the capability of an individual processor. The challenge now is how to design software to exploit the capability of multicore architectures. In other words, will parallelism continue the cycle of software innovation?

3 Software Innovations and Myhrvold's Law

A common belief among software developers is that software grows at least at the same rate as the platform on which it runs. Nathan Myhrvold, a former chief technology officer at Microsoft, memorably captured this wisdom in his four laws of software [7]:

1. Software is like a gas: it tends to expand to fill the capacity of any computer.

   • Windows NT: LOC (lines of code) doubling time of 866 days (33.90 percent growth per year).

   • Browser: LOC doubling time of 216 days (221 percent growth per year).

2. Software grows until it is limited by Moore's law. The initial growth is quick, like gas expanding (as in the browser), but it is eventually limited by hardware (as in NT), which ultimately brings any processor to its knees, just before the new model comes out.

3. Software growth makes Moore's law possible. That is why chips get faster at the same price, rather than cheaper. This will continue as long as there is opportunity for new software.

4. It is impossible to have enough new algorithms, new applications, and new notions of what is cool.

In fact, the increase in the size of software has not only overloaded processors and memory; it has had many positive effects as well. Some of these are the following:
1. Increased functionality

   • Improved security.

   • Printer and I/O drivers for graphics.

   • Printer resolution and color depth (24 bits).

   • Improved software engineering practices, such as layering of the software architecture and modularizing the system to improve development.

   • The data manipulated by computers has also evolved, from simple ASCII to larger structured objects (Word and Excel documents) to compressed documents like JPEG and XML (saving computation and space).

   • Growing use of videos.

2. Programming changes

   Over the last 30 years, programming languages have evolved from assembly language and C code to high-level languages. A major step was C++, which brought the object-oriented mechanism. C++ also introduced abstractions, like classes and templates, and made rich libraries possible. These features bring expensive run-time implementations, modularity in development, and information hiding. These practices enabled the creation of ever larger and more complex software.

   Safe and managed languages such as C♯ and Java further raised the level of programming by introducing garbage collection, richer class libraries (.NET, Java classes), just-in-time compilation, and so on. All these features provide powerful abstractions for developing software, but consume a lot of processor resources and memory.

   The rich features of a language require a run-time system to maintain a large amount of metadata on every method and class at run time, even if the required features are not invoked. Table 2 shows the compiled code size for "Hello World" in Windows Vista, coded using Visual Studio. Table 3 shows the execution time of "Hello World".

Table 2: Compiled code size for "Hello World"

Language   Size in bytes
C          5874
C++        8762

Table 3: Execution time for "Hello World"

Mechanism      Timer (280 nsec.)
C++, console   1760
C++, Windows   36375
C♯, console    2628
C♯, Windows    80348

3. Decreased programming focus

   Abundant machine resources have allowed programmers to become easy-going about performance and less aware of the resource consumption of their code. More important is the change in the mindset of the developer. Consequently, most code runs at near machine capacity, and any further increase in code size or complexity will retard performance.

4 Mitigating Amdahl's Law

Amdahl's law states that the speedup of a particular application is limited by the fraction of the application that is serial (cannot be parallelized). During the serial portion of execution, the chip's power budget is applied towards using a single large core to allow the serial portion to execute as quickly as possible. During the parallel portions, the chip's power budget is used more efficiently by running the parallel portions on a large number of small-area, power-efficient cores. Thus, executing serial portions of an application on a fast but relatively inefficient core and executing parallel portions on many small cores can maximize the ratio of performance to power dissipation.
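In its standard form (consistent with, though not written out in, the paper), Amdahl's law for a program with parallelizable fraction $p$ on $n$ cores gives the speedup

$$S(n) = \frac{1}{(1-p) + p/n},$$

so that $S \to 1/(1-p)$ as $n \to \infty$. If the serial part instead runs on a fat core that is $k$ times faster, and the parallel part runs on $m$ thin cores, the speedup becomes $S = 1/\left((1-p)/k + p/m\right)$, the form used in the illustrative example below.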
Illustrative Example: Suppose that for 10 percent of its time a program gets no speedup on a 100-core computer. To run this sequential piece twice as fast, assume that a single fat core would need 10 times as many resources as a thin core, due to larger caches, a vector unit, and other features. Applying Amdahl's law, the speedups can be computed as follows [8]:

i   True serial: 1 sec.

ii  For 100 thin cores:
    Required time = 0.1 + 0.9/100 = 0.109 sec.
    Relative speedup = 1/0.109 = 9.17 times.

iii For 90 thin cores and one thick core:
    Required time = 0.1/2 + 0.9/90 = 0.06 sec.
    Relative speedup = 1/0.06 = 16.66 times.

Since the implementation cost of one thick core is taken as equal to that of 10 thin cores, the total cost in (ii) and (iii) is the same.
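The arithmetic above can be checked with a few lines of C (an illustrative sketch, not part of the paper; the helper speedup is a hypothetical name):

#include <stdio.h>

/* Heterogeneous Amdahl speedup: serial fraction s runs k times faster
   than on one thin core; the parallel part runs on m thin cores. */
static double speedup(double s, double k, int m)
{
    return 1.0 / (s / k + (1.0 - s) / m);
}

int main(void)
{
    /* (ii) 100 thin cores; the serial part runs on a thin core (k = 1) */
    printf("100 thin cores:         %.2f\n", speedup(0.1, 1.0, 100));
    /* (iii) 90 thin cores plus one thick core, serial part 2x faster */
    printf("90 thin + 1 thick core: %.2f\n", speedup(0.1, 2.0, 90));
    return 0;
}

This prints 9.17 and 16.67, matching the computation above up to rounding.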
Requirement of heterogeneity in software: To take full advantage of heterogeneous chip multiprocessors (CMPs), the system software must use the execution characteristics of each application to predict its future processing needs, and then schedule it to a core that matches those needs if one is available. The predictions can minimize the loss to the whole system rather than to a single application.

Effective schedulers can be implemented even with current commercial operating systems, and with open OSes like Linux.

Achieving the best performance will require compiling programs for heterogeneous CMPs slightly differently, that is, for statically schedulable and for dynamically schedulable cores. Programming or compiling parallel applications might require more awareness of heterogeneity: application developers assume that cores provide equal performance, but heterogeneity breaks this assumption. Indeed, many future areas of research remain for heterogeneous CMPs [4].

5 Multicore and its future

With clock speeds stalling and computational power increasing through the doubling of the number of cores per processor, serial computing is now dead, and the parallel computing revolution is upon us. However, writing parallel applications is a significant challenge. For parallelism to succeed, it must ultimately produce better performance in terms of speed, efficiency, and reliability. Yet most programmers are not only ill-equipped to produce proper parallel programs, they also lack the tools and environments for producing such programs [9].

Dealing with these issues requires a suite of tools and environments that provide users and developers with convenient mechanisms for managing the different resources in multicore environments: the memory, cache, and computing units; compilers that allow sequential programs to automatically take advantage of multicore systems; strategies that allow users to analyse performance issues in multicore systems; and environments to control and eliminate bugs in parallel threaded programs.

Developing parallel algorithms is a considerable challenge, but many computationally intensive problems, such as video processing, natural language interaction, speech recognition, linear and nonlinear optimization, and machine learning, are areas for which multicore is the solution. However, Amdahl's law limits the scope until the sequential component itself is restructured.

An alternative use for a multicore processor is to redesign a sequential application into a loosely coupled or asynchronous system in which computations run on separate processors. For example, it is natural to separate monitoring features from the program logic, as in the sketch below.
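A minimal sketch of this separation (illustrative only, assuming POSIX threads; the "program logic" here is a dummy loop, the names monitor and progress are invented, and production code would use C11 atomics rather than volatile):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static volatile long progress;   /* written by program logic, read by monitor */
static volatile int  done;

/* The monitor runs asynchronously, possibly on its own core. */
static void *monitor(void *arg)
{
    (void)arg;
    while (!done) {
        printf("progress: %ld\n", progress);
        sleep(1);                /* sample once per second */
    }
    return NULL;
}

int main(void)
{
    pthread_t m;
    pthread_create(&m, NULL, monitor, NULL);
    for (long i = 0; i < 500000000L; i++)   /* stand-in for program logic */
        progress = i;
    done = 1;
    pthread_join(m, NULL);
    return 0;
}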
It is a fact that, for many applications, most functionality is likely to remain sequential. For software developers to find the resources to add or change features, it may be necessary to eliminate old features or reduce their resource consumption. An important consequence of multicore is that sequential performance tuning and code-restructuring tools are likely to become increasingly important.

Another challenge is the modification and updating of parallel code. Parallelism will also force major changes in software development. Moore's dividend enabled a shift to higher-level languages and libraries, and the pressure driving this trend will not change, because increasing abstraction improves security, reliability, and programmer productivity. On multicore, parallel garbage collection does not increase the pause time of threads, so applications with real-time constraints become easier to implement.

Another approach, which sacrifices performance for development productivity, is to hide the underlying parallel implementation.
6 Tools for Multicore programming

Cetus [10, 11] is an open-source, source-to-source C compiler written in Java and maintained at Purdue University. Cetus provides automatic parallelization of C code. The following shows the transformation of C code by Cetus.

# Input to Cetus
int temp;
int main(void) {
    int i, j, c, a[100];
    c = 2;
    for (i=0; i<100; i++) {
        a[i] = c*a[i] + (a[i] - 1);
    }
}

# Output from Cetus
int temp;
int main(void)
{
    int i, j, c, a[100];
    c = 2;
    for (i=0; i<100; i++)
    {
        a[i] = ((c*a[i]) + (a[i] - 1));
    }
}

Cetus performs symbolic transformations to generate efficient source code in forms that can be parallelized easily. Some examples are:

1 + 2*a + 4 - a  ⇒  5 + a        (folding)
a*(b + c)        ⇒  a*b + a*c    (distribution)
(a*2)/(8*c)      ⇒  a/(4*c)      (division)
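Since the loop in the example carries no cross-iteration dependence, an automatic parallelizer can mark it for parallel execution. The following is a sketch of what OpenMP-annotated output could look like (an assumption for illustration; the exact pragmas and form of real Cetus output may differ):

/* Compile with an OpenMP-capable compiler, e.g. cc -fopenmp file.c.
   The array a[] is left uninitialized, mirroring the paper's example. */
int main(void)
{
    int i, c, a[100];
    c = 2;
    #pragma omp parallel for   /* iterations are independent; i is private */
    for (i = 0; i < 100; i++)
    {
        a[i] = ((c * a[i]) + (a[i] - 1));
    }
    return 0;
}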
7 Multicore programming languages

A parallel programming language must have three key properties. First, expressing parallelism should be simple. Second, it should be possible to combine a number of parallel programming models. Third, the language must exploit parallel resources efficiently.

Cilk, an extension of C, provides two keywords for expressing parallelism. A spawn transforms a sequential (blocking) function call into an asynchronous (non-blocking) call. A sync blocks a function's execution until all of its spawned children have completed.

cilk int fib (int n) {
    if (n < 2) return n;
    else {
        int x, y;
        x = spawn fib (n - 1);
        y = spawn fib (n - 2);
        sync;
        return (x + y);
    }
}

Because the recursive calls to fib are spawned, Cilk's runtime system can execute them in parallel. Because the expression (x + y) depends on the results of both calls, the sync ensures that both have completed before the addition begins. Cilk's runtime system must efficiently map the logically independent calls onto computational cores.

8 Conclusion

Multicore processors will change software as profoundly as previous hardware evolutions, such as the shift from vacuum tubes to transistors, or from transistors to ICs. Parallelism will drive software in a new, computationally intensive direction.

References

[1] David Geer, "Chip Makers Turn to Multicore Processors", Computer, May 2005, pp. 11-14.

[2] Jerraya A., Tenhunen H., and Wolf W., "Multiprocessor Systems-on-Chips", Computer, June 2005, pp. 36-40.

[3] Rakesh Kumar et al., "Heterogeneous Chip Multiprocessors", Computer, Nov. 2005, pp. 32-38.
[4] Strozek L. and Brooks D., "Energy- and Area-Efficient Architectures through Application Clustering and Architectural Heterogeneity", ACM Transactions on Architecture and Code Optimization, Vol. 6, No. 1, March 2009, pp. 4:1-4:31.

[5] James Larus, "Spending Moore's Dividend", Communications of the ACM, Vol. 52, No. 5, May 2009, pp. 62-69.

[6] Hachman M., "Intel's Gelsinger Predicts 'Intel Inside Everything'", PC Magazine, July 3, 2008.

[7] research.microsoft.com/acm97/nm/tsld026.htm

[8] Krste Asanovic et al., "A View of the Parallel Computing Landscape", Communications of the ACM, Vol. 52, No. 10, Oct. 2009, pp. 56-67.

[9] Wu-chun Feng and Pavan Balaji, "Tools and Environments for Multicore and Many-Core Architectures", IEEE Computer, Dec. 2009, pp. 26-27.

[10] Chirag Dave et al., "Cetus: A Source-to-Source Compiler Infrastructure for Multicores", IEEE Computer, Dec. 2009, pp. 36-42.

[11] cetus.ecn.purdue.edu
