Von Neumann Architecture vs. Parallel Processing
The computer architecture Von Neumann devised in the 1940s remains the basis for mainstream computing today. Its core components were the input/output devices, memory, the arithmetic/logic unit, and the control unit. Not much has changed over the years except our ability to make these units run faster. The technology behind these devices has changed, but the theoretical concepts have not. For instance, vacuum tubes have given way to transistors, and manufacturing methods have been refined to make those transistors smaller and smaller. In the last few years, however, the exponential gains we had been achieving by refining our manufacturing methods have tapered off. Improvements of this kind are still being made, but they are modest, nowhere near the rate seen from the 1940s to the 1970s, or even from the 1970s to the 1990s. Because these gains are leveling off quickly, a change was needed. Parallel processing, while not a radical departure from the Von Neumann architecture, allows for additional speedup via multiple processor cores that execute instructions in parallel.
This change in the approach to achieving speedup is exemplified by an excellent quote from the reading: "If you cannot build something to work twice as fast, do two things at once. The results will be identical." (Schneider & Gersting, 2007, p. 226). The results are indeed identical if done correctly, although the added complexity comes at a price. Not only are multiple processors needed, but each typically requires its own cache memory, along with an intricate inter-processor communication system to keep the duplicated cached memory up to date and consistent and to keep instructions appropriately allocated among the processors. Because instructions are no longer executed in a strictly sequential fashion, we must ensure that the instructions we execute out of order can be reordered without changing the result. Instruction order is limited mainly by data dependencies: for example, if instruction A needs to read a memory address that a later instruction B writes to, then B depends on A and cannot be executed before A or in parallel with it. Depending on the code being executed, this can be a major limitation on parallel processing. Techniques such as loop unrolling and register renaming exist for increasing the parallelism available among instructions, but they too come at a cost in complexity. The end result, once all the added complexity has been dealt with, is indeed a faster machine. So the ends here justify the means, but the overall complexity of parallel processor systems may eventually slow or stop the speedup growth we can achieve with them.
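To make the dependence and loop-unrolling ideas concrete, here is a small sketch in C (an illustrative example, not taken from either text; the arrays and loop bounds are invented). The first loop has independent iterations, so unrolling it simply exposes more work that the hardware is free to overlap; the second loop, as written, carries a dependence from one iteration to the next, so its additions must proceed one after another.

    /* Illustrative sketch: independent iterations vs. a loop-carried dependence. */
    #include <stdio.h>

    #define N 1000

    int main(void) {
        double a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

        /* Independent iterations, unrolled by four: the four additions in the
           body have no data dependencies on one another, so an out-of-order
           or superscalar core may execute them in parallel. */
        for (int i = 0; i < N; i += 4) {
            a[i]     = b[i]     + c[i];
            a[i + 1] = b[i + 1] + c[i + 1];
            a[i + 2] = b[i + 2] + c[i + 2];
            a[i + 3] = b[i + 3] + c[i + 3];
        }

        /* Loop-carried dependence: each iteration reads the sum produced by
           the previous one, so these additions cannot overlap as written. */
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }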
The Von Neumann architecture has, for the last 20 years or so, extended its usable lifetime by incorporating execution methods similar in spirit to those used in multi-core machines today. A major limitation of purely sequential processing was the unnecessary idling of system resources. Even if a Von Neumann machine does not have multiple cores and thread-level parallelism, some degree of instruction-level parallelism is achievable through a pipelined execution path: "All processors since about 1985 use pipelining to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction level parallelism (ILP), since the instructions can be evaluated in parallel." (Hennessy & Patterson, 2007, p. 66). A key difference between a Von Neumann machine with instruction-level parallelism and a multi-processor machine with thread-level parallelism is that with a single core, instructions are not truly executed simultaneously; rather, their stages are overlapped, since there is only one core performing the execution. Ultimately, the goal of instruction-level parallelism is to reduce the clock cycles per instruction (CPI), and this is done by utilizing system resources more effectively. While the stages of a single instruction must be performed sequentially, the stages of different instructions can overlap one another, and since the different stages require different resources, they can proceed at the same time.
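As a rough, back-of-the-envelope illustration of what reducing CPI buys (an added illustration using the standard performance identity, not a formula quoted from the texts):

    \text{CPU time} = \text{instruction count} \times \text{CPI} \times \text{clock cycle time}

With a k-stage pipeline and no stalls, a new instruction can complete every cycle, so the effective CPI approaches 1 rather than roughly k, and execution time shrinks accordingly for the same instruction count and clock rate.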
The text (Schneider & Gersting, 2007, p. 219) describes these basic phases as the fetch phase, the decode phase, and the execute phase, but when implementing a basic instruction pipeline we generally use five stages, the extra two being memory access and register write-back. Having five stages means that up to five instructions can be in flight at once, each occupying a different stage of the pipeline. A visual aid (from Wikipedia's Parallel Computing article) can help to conceptualize how this works:
[Figure: "A canonical five-stage pipeline in a RISC machine (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back)."]
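The following small C sketch (an illustrative example with hypothetical, independent instructions and no stalls modeled) prints a cycle-by-cycle chart of five instructions flowing through these five stages, conveying the same overlap the figure depicts:

    /* Prints which stage each of five independent instructions occupies on
       each clock cycle of a classic five-stage pipeline (no hazards/stalls). */
    #include <stdio.h>

    int main(void) {
        const char *stages[] = {"IF", "ID", "EX", "MEM", "WB"};
        const int num_stages = 5;
        const int num_instrs = 5;
        const int total_cycles = num_instrs + num_stages - 1;  /* 9 cycles */

        printf("cycle:");
        for (int c = 1; c <= total_cycles; c++)
            printf("%5d", c);
        printf("\n");

        for (int i = 0; i < num_instrs; i++) {
            printf("  i%d  ", i + 1);
            for (int c = 1; c <= total_cycles; c++) {
                int stage = (c - 1) - i;      /* stage index in this cycle */
                if (stage >= 0 && stage < num_stages)
                    printf("%5s", stages[stage]);
                else
                    printf("%5s", ".");
            }
            printf("\n");
        }
        return 0;
    }

Run as written, the chart shows all five instructions finishing by cycle 9, rather than the 25 cycles that executing all five stages of each instruction strictly one after another would take.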
While executing instructions in this overlapped, pipelined fashion speeds up execution and helps utilize our system resources, it will not achieve a fivefold decrease in CPI. Since a chain is only as strong as its weakest link, the stage that requires the most time sets the pace: the clock period must be long enough for the slowest stage, and instructions in the other stages must wait for it before advancing.
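A rough, hedged calculation (an added illustration, not taken from the texts) shows why the improvement falls short of the stage count even in the best case. If n instructions flow through a k-stage pipeline with no stalls, the last one finishes after k + (n - 1) cycles, whereas strictly sequential execution would take n times k cycles at the same clock:

    \text{Speedup} = \frac{n \times k}{k + (n - 1)} \rightarrow k \quad \text{as } n \rightarrow \infty

For k = 5 and n = 5 this is 25/9, or roughly 2.8, not 5; and if, say, the memory-access stage takes twice as long as the others, the clock period is set by that stage and the realized speedup drops further. The numbers here are illustrative only.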
In conclusion, the days of the single-processor Von Neumann architecture are coming to an end because the concepts it is based on cannot be refined much further, whether in terms of transistor size, clock frequency, or even clever logic-level tweaks such as pipelining. Short of a breakthrough in one of those areas, the most practical way to achieve significant speedup is through thread-level, multi-core parallel processing. Parallel processing machines have been gaining steadily in popularity over the last 10 years and now dominate the sales racks at Best Buy. The overall architecture behind these machines is nothing revolutionary: they are based on the Von Neumann architecture and vary only in that they have multiple processing cores and require all the circuitry needed for those cores to play nicely with one another. While we wait for quantum computing to come of age, or for some other more drastic change in how computing is done, hopefully we will be able to exploit parallel processing to achieve speedup for years to come.
References:

Hennessy, J. L., & Patterson, D. A. (2007). Computer Architecture: A Quantitative Approach (4th ed.).

Schneider, G. M., & Gersting, J. L. (2007). Invitation to Computer Science (3rd ed., Java version).

Wikipedia: Parallel computing. http://en.wikipedia.org/wiki/Parallel_computing