
Submitted to the Seventh IEEE Symposium on Parallel and Distributed Processing

An Operating System Framework for Large Parallel Computers


Isaac D. Scherson
Piotr Chrzastowski-Wachtel†
Dinesh Ramanathan
Raghu Subramanian
Vara Ramakrishnan
Verônica L. M. Reis‡
{isaac,pch,dinesh,raghu,vara,[email protected]
Department of Information and Computer Science
University of California, Irvine
Irvine, California 92717-3425

Abstract
Little work has been done on operating systems for massively parallel computing. This paper pro-
poses a framework for such an operating system. It is assumed that there are multiple jobs executing
on a large MIMD computer. Each job is assumed to be data parallel, using as many virtual processors
as necessary to exploit its inherent parallelism. We view the notion of virtual processors as playing
a unifying role in the conceptual design of the operating system. Our main thesis is that the vari-
ous functions performed by the operating system may be viewed as operations on the set of virtual
processors.
In the context of the above framework, several open theoretical problems are identified, and in
particular, the twin problems of spatial and temporal scheduling are addressed. Preliminary analysis
indicates the viability of horizontal spatial schedules and periodic temporal schedules.

This research was supported in part by the Air Force Office of Scientific Research under grant numbers F49620-92-J-0126
and AFOSR-90-0144, by NASA under grant number NAG-5-1897, and by the NSF under grant numbers MIP-9106949 and
MIP-9205737.
† On leave from the Institute of Informatics, Warsaw University.
‡ Supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), Brazilian Government, under
grant number 200358-92.8.

1 Introduction
When programming a uniprocessor machine, the user has to know very little about the machine to write
efficient programs. With current massively parallel processors (MPPs), however, the user needs to know
machine dependent details such as the number of processors available and the amount of memory at
each node. In addition, users often analyze characteristics of the machine's communication delays and
the application's communication patterns in order to speed up programs [1]. Current parallel programs
are, therefore, highly machine specific and difficult to port across machines. One of the reasons for this
situation is the lack of operating systems that can efficiently bridge the gap between parallel machines
and high level parallel programming models.
Moreover, the high cost of current MPPs demands that they be utilized efficiently and to the fullest. (In
the past, the same economic considerations applied to mainframes.) Typically, this requires the
accommodation of multiple users, and an operating system to manage the sharing of machine resources
among these users.
There are, in fact, several MPP operating systems available in the market, for example, CM-5's CMost [25]
and T3D's UNICOS [17]. Unfortunately, the convenience afforded by these commercial operating systems
is nowhere near what we have grown accustomed to with uniprocessor operating systems. Duties that
properly belong to the operating system, such as processor virtualization and virtual memory management,
are often ignored and dubbed "programmer responsibility." The inability of these operating systems
to offer programming convenience without compromising performance may stem from the fact that they
are largely extensions of uniprocessor operating systems.
This paper attempts to rethink, ab initio, the role of operating systems in massively parallel computing.
We envisage several jobs executing on a large MIMD machine; each job is assumed to be data parallel,
using as many virtual processors as necessary to exploit its inherent parallelism. We believe that the notion
of virtual processors unifies the conceptual design of the operating system, in much the same way as the
filesystem does in UNIX™. Our main thesis is that most activities of the operating system may be viewed
as operations or manipulations on the set of virtual processors. Viewing the activities of the operating
system in this context facilitates the examination of the merits and demerits of various operating system
policies and reveals the underlying basic theoretical problems.
Section 2 presents a framework to view operating system issues in terms of virtual processors. In the
context of this framework, the sections that follow take a closer look at two complementary problems:
spatial and temporal scheduling. In Section 4, we consider spatial schedules, and make a qualitative
comparison between two simple spatial scheduling policies. In Section 5, a simplified, discrete model for
temporal scheduling is proposed, and metrics for evaluating temporal schedules are described. Section 6
finds the optimal temporal scheduling policy in terms of these metrics.

2 Operating System Framework


The operating system is viewed as a layer between the programming model and the physical machine
model. This is schematically depicted in Figure 1. Since a program is nothing but a description of a
virtual machine, the desired function of an operating system can be succinctly stated as the simultaneous
emulation of several virtual machines (corresponding to multiple programs) on a single physical machine
in an efficient manner.

Figure 1. The function of the operating system is to emulate several virtual machines (corresponding to multiple
programs) on a single physical machine. Each virtual machine (VM) is a user job; the operating system sits between
the programming model and the physical machine.

2.1 Programming and Machine Models


Since the operating system bridges the programming model and the physical machine, an understanding
of the role of the operating system requires a clearer definition of both the programming model and the
physical machine.

- The programming model


Several parallel programming models have been proposed in the past. Of late, at least in the realm
of massively parallel computing, there seems to be a convergence on the data parallel programming
model [2, 7], in the form of HPF [12] and Fortran 90 [3, 16]. Sabot [20] mentions that, in a survey of
120 parallel algorithms from three ACM Symposia on Theory of Computing (STOC), all were found
to be data parallel. This preponderance of the data parallel programming model is perhaps because
it allows one to express a large degree of parallelism, while retaining the single-thread-of-control
philosophy of sequential programming.
- The physical machine
A natural way of executing a data parallel program is on a SIMD machine. However, it would
be a mistake to equate data parallelism with SIMD. A data parallel program may just as well be
executed on an asynchronous MIMD machine, and in fact, there are several advantages to doing so.
The chief advantage is that it is possible to run several jobs on a MIMD machine simultaneously.
Moreover, a MIMD machine does not force unnecessary synchronization after every instruction, or
unnecessary sequentialization of non-interfering branches, as a SIMD machine does. These factors,
among others, explain the recent market trend towards MIMD computers, exemplified by Thinking
Machines' CM-5 and Cray Research's T3D.

The combination of the data parallel programming model and a MIMD machine model is called a SPMD
execution model [18, page 606]. SPMD stands for Single Program Multiple Data, indicating that all
processors execute the same program, but may be at different instructions at a given time, owing to
asynchronous execution. Throughout this paper, our discussion of operating systems is predicated on the
SPMD execution model.

2.2 Virtual Processors as a Basis for Operating Systems
As pointed out before, a program is nothing more than a description of a virtual machine. In the case
of a data parallel program, the virtual machine consists of a (typically large) number of identical virtual
processors (VPs), communicating through an interconnection network. For instance, the standard data
parallel program to multiply two N × N matrices [10] might be viewed as a virtual machine consisting of
N² VPs communicating in a mesh.
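To make the VP view concrete, the following Python sketch (ours, not the paper's program; all names are illustrative) treats each output element of a 4 × 4 product as the work of one virtual processor. It deliberately ignores the mesh communication and the mapping of VPs onto physical processors, which are exactly the concerns an operating system would handle.

```python
# A minimal sketch (not from the paper) of the "one VP per output element" view
# of data-parallel N x N matrix multiplication: VP (i, j) reads row i of X and
# column j of Y and produces C[i][j].  A real system would map these N*N
# virtual processors onto far fewer physical processors.
N = 4
X = [[i + j for j in range(N)] for i in range(N)]
Y = [[i * j + 1 for j in range(N)] for i in range(N)]

def vp_body(i, j):
    """Work performed by virtual processor (i, j)."""
    return sum(X[i][k] * Y[k][j] for k in range(N))

# The set of N^2 virtual processors; the OS is free to place and schedule them.
C = [[vp_body(i, j) for j in range(N)] for i in range(N)]
print(C)
```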
For many years now, the concept of virtual processors has been relegated to the status of a mere logical
aid to programmers [24, 8]. In our view, the notion of virtual processors should form the fundamental
basis of an MPP operating system. We claim two advantages to this approach.

- It is our thesis that most of the functions of an MPP operating system can be viewed as operations
on the set of VPs. Thus, the notion of virtual processors provides a unified framework within which
several operating system issues may be considered and evaluated. We briefly illustrate our thesis
below, by phrasing various well-known operating system issues in terms of VP manipulations (a
small code sketch of this view follows the list):
  - Spatial scheduling: When a job enters the system, the spatial scheduling (or space sharing)
    policy defines which processor each VP is allocated to.
  - Temporal scheduling: The temporal scheduling (or time sharing) policy dictates how each
    processor switches between the execution of the VPs allocated to it.
  - Load balancing: Once a VP is allocated to a processor, it usually does not get reallocated, since
    moving VPs between processors is quite expensive. However, there are situations when VPs do
    indeed move between processors. For example, if a job spawns and kills VPs dynamically in an
    unpredictable fashion, it is periodically necessary to load balance the VPs among processors.
  - Memory and I/O problems: Memory limitations and I/O bottlenecks may also be phrased in
    terms of VPs. Memory limitations occur when a processor does not have enough local memory
    to hold all the VPs allocated to it. I/O bottlenecks are most pronounced while loading (roll-in)
    the VPs of a job from the disk into the local memories of various processors.¹
The above problems are not independent of each other: any policy on one issue has subtle repercussions
on the other issues. The VP model enables us to understand and tackle these complex interactions.
For example, suppose each processor schedules the VPs assigned to it in FIFO order. A VP that is at
the end of a processor's queue may work its way towards the head, only to be bumped off to the end of
another processor's queue because of an ill-timed load balancing. If this continues, the VP in question
might never get a chance to be executed.
- An operating system based on virtual processors alleviates several constraints imposed by current
commercial MPP operating systems, such as those of Thinking Machines' CM-5 and Cray's T3D.
Since these operating systems do not support the virtual processor abstraction, it is the programmer's
(or compiler's) responsibility to manage the virtual processors. This involves grouping together
logical VPs into coarse grain processes, forcing the number of such processes to fit a legal partition
size. If all partitions of that size happen to be in use, then the programmer must either wait,
or re-group the VPs (perhaps compile the code with different options) to fit another partition size.
¹ For example, it takes the Cray T3D (128 processors with 64 MB per processor, and 2 I/O gateways at 50 MB/sec) about
1.3 minutes to load the whole machine.

Furthermore, if some jobs terminate, leaving the machine lightly loaded, then current MPP operating
systems are unable to load balance the VPs of still-running jobs over a larger number of processors.²
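The sketch below is a hypothetical illustration of this thesis: a set of VPs held in per-processor queues, with spatial scheduling, temporal scheduling, and load balancing all expressed as operations on that set. The class and method names (VP, Machine, spatial_schedule, temporal_schedule, load_balance) are ours and do not correspond to any existing system's API.

```python
# A hypothetical sketch of the thesis that OS functions are operations on the
# set of virtual processors.  All names are ours, not an existing system's API.
from collections import deque, namedtuple

VP = namedtuple("VP", ["job", "index"])

class Machine:
    def __init__(self, num_pes):
        self.queues = [deque() for _ in range(num_pes)]   # per-PE run queues of VPs

    def spatial_schedule(self, job, num_vps):
        """Horizontal placement: spread a new job's VPs over all PEs."""
        for i in range(num_vps):
            self.queues[i % len(self.queues)].append(VP(job, i))

    def temporal_schedule(self, pe):
        """Pick the next VP to run on one PE (round robin over its queue)."""
        vp = self.queues[pe].popleft()
        self.queues[pe].append(vp)
        return vp

    def load_balance(self):
        """Move VPs from the most loaded PE to the least loaded one."""
        longest = max(self.queues, key=len)
        shortest = min(self.queues, key=len)
        while len(longest) > len(shortest) + 1:
            shortest.append(longest.pop())

m = Machine(4)
m.spatial_schedule("A", 6)
m.spatial_schedule("B", 3)
print([m.temporal_schedule(pe).job for pe in range(4)])   # one VP chosen per PE
```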

On the flip side, there are, conceivably, performance overheads in designing an operating system based
on virtual processors. Whether the convenience won is worth the performance lost cannot be settled by
rhetoric: extensive experimentation and analysis is required before anything can be said on the matter.
As an application of the virtual processor framework, in the following sections, we take a preliminary look
at two of the problems mentioned above, namely spatial and temporal scheduling. As the names suggest,
these two issues are complementary to each other: the first determines where VPs must be executed, while
the second determines when VPs must be executed.

3 Prior Work on Spatial and Temporal Scheduling


There is a plethora of work on scheduling in the literature. However, almost all of the work is in the
context of bus-based multiprocessor systems and a fork-and-join programming model [22, 4, 5, 13, 14,
15, 23, 6, 21]. Perhaps the only commonality between the existing literature and the problem at hand is
the word "scheduling". The conclusions of the scheduling papers and books are primarily artifacts of the
dynamically varying number of threads, and the disk access latencies associated with switching between
threads. We believe that the issues involved in the context of a SPMD execution model are completely
different and merit separate study.
The scheduling support provided by commercial parallel computers is limited at best [9, 19]. For example,
MasPar's MP-2, being a SIMD machine, does not allow space sharing at all: at any given time, only one
job may run on the machine. However, it does allow time sharing: the whole machine switches between
the various jobs at regular time intervals (this is often called gang scheduling). In contrast, Cray's T3D
allows space sharing but disallows time sharing. When a job arrives, a set of processors is carved out of
the pool of free processors (if possible) and placed at the job's disposal (this is called partitioning). The
partition runs the job to completion, never ever switching to another job. Thinking Machines' CM-5 falls
somewhere in between, providing limited forms of both space sharing and time sharing. The machine is
divided into partitions of predefined sizes at boot time. Several jobs may share a partition. Each partition,
en masse, switches between the various jobs allocated to it at regular time intervals.

4 Spatial Scheduling
Whenever a job enters the system, the spatial scheduling policy must specify the processor that each VP
of the job is allocated to. For simplicity, we restrict ourselves to the static case, wherein a set of jobs
present themselves to be allocated initially, and no jobs arrive or leave the system thereafter. Moreover,
we assume that jobs do not spawn and kill VPs dynamically, which would otherwise necessitate load balancing.
In the static case, a spatial schedule is simply a mapping from the set of VPs to the set of processors.
Figure 2 shows an example of such a spatial schedule.
Spatial scheduling can be done in many ways. Two policies suggest themselves immediately. A vertical
spatial schedule is one in which the VPs of every job are granted exclusive access to a subset of the
processors. This scheme is also called partitioning. At the other extreme, a horizontal spatial schedule is
one in which the VPs of every job are spread evenly over all the processors in the system (or as many
processors as possible, if the number of VPs is smaller than the number of processors).

² This information was obtained from discussions with engineers from MasPar, Cray Research, and Thinking Machines
[1, 9, 19].

Figure 2. A spatial schedule, or allocation. In this toy example, 5 jobs need to be allocated on a machine with 5
processors. The jobs have 1, 4, 1, 5, and 1 VPs each.

Processor utilization.
  Vertical: Wasted if the number of idle PEs is not enough to satisfy the minimum requirements of any queued job.
  If jobs leave the system, freeing many PEs, then the VPs of running jobs cannot exploit the free PEs unless load
  balancing is performed (with large OS overhead).
  Horizontal: Fully utilized. When a job terminates and leaves the system, the resources are automatically shared
  among the remaining jobs in the system: no load balancing required.

Memory utilization.
  Vertical: As in the processor utilization case, memory resources may not be fully utilized. On the other hand,
  since many VPs of the same job are on each PE, the number of copies of the program can be minimized by keeping
  just one per PE. Memory fragmentation does not occur.
  Horizontal: Since each job uses all PEs, a copy of each program has to be on each PE, increasing code memory.
  Also, since VPs of different jobs exist on the same PE, memory fragmentation may occur.

Interprocessor communication.
  Vertical: Reduced, as VPs on the same PE can communicate by local memory access; however, these
  communications will be sequential.
  Horizontal: Many communications are performed concurrently. Although the network delay may be significant,
  horizontal allocation makes the communication patterns random, which makes networks behave well [11].

Roll-in/roll-out time.
  Vertical: When only one job is allocated on a processor, the time lost in roll-in/roll-out is unavoidable, and
  usually significant.
  Horizontal: Can be fully masked: while the PEs are executing the jobs allocated to them, the new job is loaded
  into the system by a DMA controller. Once it has been loaded, the local schedule is augmented with the VPs of
  the new job. A similar procedure is applied to mask the roll-out time.

Table 1. A qualitative comparison of vertical and horizontal spatial schedules
Table 1 gives a qualitative comparison of vertical and horizontal spatial schedules in terms of system
performance metrics such as processor utilization, memory utilization, interprocessor communication, and
roll-in/roll-out time. These preliminary considerations indicate several advantages of horizontal allocations
over the conventionally adopted vertical/partitioning schemes.
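As an illustration, the following sketch (our code, with a deliberately simplified vertical policy of one dedicated PE per job) computes the VP-to-PE mapping produced by the two policies for the toy workload of Figure 2.

```python
# A small sketch (ours) contrasting the two policies on the toy workload of
# Figure 2: five jobs with 1, 4, 1, 5 and 1 VPs on a 5-PE machine.
def vertical(jobs, num_pes):
    """Partitioning: each job's VPs are confined to its own PE(s);
    here, for simplicity, every job gets exactly one dedicated PE."""
    return {job: [pe] * vps for pe, (job, vps) in enumerate(jobs.items())}

def horizontal(jobs, num_pes):
    """Each job's VPs are spread as evenly as possible over all PEs."""
    return {job: [i % num_pes for i in range(vps)] for job, vps in jobs.items()}

jobs = {"J1": 1, "J2": 4, "J3": 1, "J4": 5, "J5": 1}
print(vertical(jobs, 5))     # e.g. all five VPs of J4 end up on PE 3
print(horizontal(jobs, 5))   # every job touches PEs 0..4, one VP per PE where possible
```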

5 Temporal Scheduling
Once spatial scheduling is done, the problem is more local in nature. Several VPs, possibly belonging to
different jobs, may have been allocated to the same processor. The temporal scheduling policy specifies
how each processor multiplexes the execution of the VPs that are allocated to it.

Figure 3. An example of a trace diagram. (The figure shows six processors, PE 1 through PE 6, over six time slices
grouped into two periods of three slices each; each square is labelled with the job, A, B, or C, whose VP executed in
that slice, or IDLE.)

5.1 A simple model of temporal schedules


In its most general form, a temporal schedule may be described as follows. Each processor loops through
the following cycle: according to some policy, it picks one of the VPs currently allocated to it and
runs it, until one of the following two conditions is satisfied:
- The VP hits a barrier synchronization³ (when the data it needs to read is not ready).
- A quantum of time, called the time slice, elapses.
If all VPs allocated to a processor are stuck at barriers, then the processor idles.
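A minimal sketch of this per-processor loop is given below; run_one_slice is a stand-in for actually executing a VP for one slice, and the FIFO-like pick of the first runnable VP is just one possible policy, not the paper's.

```python
# A minimal sketch (our names) of the per-processor scheduling loop described above.
import random

def run_one_slice(vp):
    """Stand-in for executing vp for at most one time slice.
    Returns 'barrier' if the VP blocked at a barrier, 'slice' otherwise."""
    return random.choice(["barrier", "slice"])

def processor_loop(local_vps, barrier_released, num_slices):
    trace, blocked = [], set()                 # trace = one column of the trace diagram
    for _ in range(num_slices):
        blocked = {vp for vp in blocked if not barrier_released(vp)}
        runnable = [vp for vp in local_vps if vp not in blocked]
        if not runnable:
            trace.append("IDLE")               # unavoidable idling: all local VPs wait at barriers
            continue
        vp = runnable[0]                       # one possible policy: first runnable VP
        local_vps.remove(vp)
        local_vps.append(vp)                   # rotate so other VPs get a turn
        if run_one_slice(vp) == "barrier":
            blocked.add(vp)
        trace.append(vp)
    return trace

# Toy use: three VPs on one PE; a blocked VP is released with probability 1/2 per slice.
print(processor_loop(["A.0", "B.3", "C.1"], lambda vp: random.random() < 0.5, 12))
```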
Modeling a temporal schedule analytically to the last detail, or even simulating it, is a hairy task. Even
if it were possible to do so, the results would probably be quite inaccurate, owing to our ignorance of the
characteristics of real-world parallel programs (such as: how often does an "average" data parallel program
hit a barrier synchronization?). In order to gain engineering insights without getting bogged down with
inessential details, we propose the following simplified, discretized model of temporal schedules.
The execution of a parallel computer is represented by a two dimensional diagram, such as the one shown
in Figure 3. The vertical (time) axis is discretized into time steps of length equal to the time slice. The
horizontal (processor) axis has one spot for every processor. While the processor axis of the diagram
is finite, the diagram extends infinitely along the time axis. This is a simplified representation of the
fact that the time for execution of a job (minutes) is much larger than a time slice (milliseconds). Each
square of this diagram is labelled with either the name of a VP, or the word IDLE. When a square (which
corresponds to a processor and a time step) is labelled with the name of a VP, it means that the VP
was executed on that processor during that time step. The IDLE squares of the trace diagram require more
careful interpretation. An IDLE square is not intended to indicate deliberate lazing by the processor, but
rather an unavoidable idling caused by synchronization statements in the programs.
The above two dimensional diagram is called a trace diagram. Not all trace diagrams can occur. In a legal
execution of jobs, no VP can be ahead of any other VP of the same job by a barrier synchronization. To
model this fact in the trace diagram, it is postulated that a trace diagram is legal if all VPs belonging to
the same job receive the same number of time slices "on average". More precisely: over any period of
time, the numbers of time slices devoted to the various VPs of a job differ by no more than a constant.

³ Since all jobs are data parallel, it is assumed that the only synchronization statements are barriers.
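The legality condition can be checked mechanically on a finite prefix of a trace diagram; the sketch below (our code, with the constant C defaulting to one time slice) does exactly that.

```python
# A sketch (ours) of the legality condition on a finite prefix of a trace
# diagram: for every job, the slice counts of its VPs may differ by at most C.
from collections import Counter

def is_legal_prefix(trace, vp_to_job, C=1):
    """trace: list of time steps, each a list of labels (one per processor),
    a label being either a VP name or 'IDLE'."""
    counts = Counter(label for step in trace for label in step if label != "IDLE")
    by_job = {}
    for vp, job in vp_to_job.items():
        by_job.setdefault(job, []).append(counts[vp])
    return all(max(c) - min(c) <= C for c in by_job.values())

vp_to_job = {"A.0": "A", "A.1": "A", "B.0": "B"}
trace = [["A.0", "B.0"], ["A.1", "B.0"], ["A.0", "IDLE"]]
print(is_legal_prefix(trace, vp_to_job))   # True: A.0 has 2 slices, A.1 has 1
```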

5.2 Metrics for Temporal Scheduling


What qualifies a temporal schedule as good? There are two angles to this question. From the system
manager's point of view, a temporal schedule is good if all processors are kept busy (the imagery here is
that users are charged by CPU time; so if all the CPUs are always busy, then the management makes a lot
of money). From a single user's point of view, a temporal schedule is good if the user's job is guaranteed
a certain rate of progress in its execution. As usual in any working system, these two objectives may
sometimes conflict, in which case they have to be traded off against each other.
The system administrator's point of view is formalized as a metric called idling ratio. In terms of the
trace diagram, the idling ratio is the fraction of the squares of the trace diagram that are marked IDLE.
The system administrator wishes the idling ratio to be as small as possible.
Similarly, the user's point of view is formalized as a metric called job happiness. A job's happiness is the
fraction of squares in the trace diagram marked with the name of a VP belonging to that job. The user
wishes the job happiness to be as large as possible.
Some care is required in the precise mathematical definition of idling ratio and job happiness. Since a
trace diagram contains an infinite number of squares, and the number of squares marked X (here X is
either the name of a VP, or the word IDLE) may also be infinite, it is not immediately clear what is meant
by the fraction of squares marked X. We adopt the following definition: the fraction of squares marked
X is the infimum of all values α such that there exists a sufficiently large time interval t such that, in any
window of the trace diagram of length t, the fraction of squares marked X is greater than or equal to α.
If X is a job, then t may loosely be interpreted as an upper bound on the response time (for example,
the time that X's screen is frozen) seen by X.
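As a rough illustration, the sketch below (ours) measures both metrics over a single finite window of a trace diagram; the infimum over all sufficiently long windows in the definition above is only approximated by choosing one long window.

```python
# A sketch (ours) of the idling ratio and job happiness over one finite window.
def window_fractions(trace, vp_to_job):
    squares = [label for step in trace for label in step]
    total = len(squares)
    counts = {}
    for label in squares:
        if label != "IDLE":
            job = vp_to_job[label]
            counts[job] = counts.get(job, 0) + 1
    idling_ratio = squares.count("IDLE") / total
    happiness = {job: c / total for job, c in counts.items()}
    return idling_ratio, happiness

trace = [["A.0", "B.0"], ["A.1", "B.0"], ["A.0", "IDLE"]]
print(window_fractions(trace, {"A.0": "A", "A.1": "A", "B.0": "B"}))
# idling ratio 1/6; job A occupies 1/2 of the squares, job B occupies 1/3
```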
Traditionally, in distributed systems, a much weaker definition for job happiness, called fairness, is used.
A temporal schedule is fair if each VP occurs infinitely many times in the trace diagram (that is, no VP
is consistently ignored by a processor). The above definition of job happiness is much stronger, since it
demands not only that the VP be serviced eventually, but also within a certain hard time bound.

6 Finding Good Temporal Schedules


Having defined what constitutes a good temporal schedule, it is natural to try to construct one. This is
done in two stages. First, a simple class of schedules called periodic schedules is defined, and it is shown
that there are periodic schedules that are as good as any other temporal schedules, enabling us to restrict
the search for a good schedule to the class of periodic schedules. Next, it is shown that the problem of
finding the best among periodic schedules can be formulated as a linear program.

6.1 Periodic Schedules Suffice


A periodic schedule is a trace diagram that repeats at regular intervals in the vertical (time) direction.
Figure 3 depicts a periodic schedule with a period of three time slices. A periodic schedule is a slight
extension of a round robin schedule, wherein a processor may return to a VP many times within the same
round.
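The sketch below (ours) expands a per-VP slice budget into one period of such an extended round robin, spreading repeated visits to the same VP across the period.

```python
# A small sketch (ours) of one period of an extended round robin: within one
# period a processor may return to the same VP several times.
def one_period(slices_per_vp):
    """Interleave VPs so repeats of the same VP are spread through the period."""
    remaining = dict(slices_per_vp)
    period = []
    while any(remaining.values()):
        for vp, left in remaining.items():
            if left:
                period.append(vp)
                remaining[vp] -= 1
    return period

print(one_period({"A.0": 2, "B.3": 1, "C.1": 3}))
# ['A.0', 'B.3', 'C.1', 'A.0', 'C.1', 'C.1'] : a period of 6 slices, repeated forever
```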
An optimal temporal schedule is defined as one with the least idling ratio that satisfies the happiness
requirements of all the users. The following theorem proves that for any temporal schedule, there exists
a periodic schedule with lower idling ratio, in which each job is at least as happy. This implies that the
search for an optimal temporal schedule may be restricted to the class of periodic schedules.
Theorem 1. For every temporal schedule S, there exists a periodic schedule S_p, such that the idling
ratio of S_p is at most that of S, and every job's happiness in S_p is at least as much as in S.
Proof: Define the progress of a job at a particular time as the number of time slices granted to each of
its VPs up to that time. Thus, if a job has V VPs, its progress at time slice t may be represented by a
progress vector of V components, where each component is an integer less than or equal to t.
By the rules of legal execution, no VP may lag behind another VP of the same job by more than a constant
number C of time slices. Therefore, no two elements in the progress vector can differ by more than C.
Define the differential progress of a job at a particular time as the number of time slices by which each
VP leads the slowest VP of the job. (For example, a progress vector (5, 3, 4) yields the differential
progress vector (2, 0, 1).) Thus, the differential progress vector at time t is also a vector of V
components, where each component is an integer less than or equal to C. The differential progress vector
is obtained by subtracting the minimum component of the progress vector from each component of
the progress vector.
The system's differential progress vector (SDPV) at time t is the concatenation of all jobs' differential
progress vectors at time t. The key is to note that the SDPV can only assume a finite number of values.
Therefore, there exists an infinite sequence of times t_{i_1}, t_{i_2}, ... such that the SDPVs at these times
are identical.
Consider any time interval [t_{i_k}, t_{i_{k'}}]. One may construct a periodic schedule by cutting out the
portion of the trace diagram between t_{i_k} and t_{i_{k'}}, and replicating it infinitely in the vertical direction.
First of all, we claim that such a periodic schedule is legal. From the equality of the SDPVs at t_{i_k} and
t_{i_{k'}}, it follows that all VPs belonging to the same job receive the same number of time slices during each
period. In other words, at the end of each period, all the VPs belonging to the same job have made equal
progress. Therefore, no VP lags behind another VP of the same job by more than a constant number
of time slices.
Secondly, observe that it is possible to choose a time interval [t_{i_k}, t_{i_{k'}}] such that the happiness of
each job during this interval is at least as much as in the complete trace diagram. This implies that the
happiness of each job in the constructed periodic schedule is greater than or equal to its happiness
in the original temporal schedule.
Finally, the idling ratio of the constructed periodic schedule must be less than or equal to the idling ratio
of the original temporal schedule. Since the fraction of area in the trace diagram covered by each job
increases, the fraction covered by the holes must necessarily decrease. This concludes the proof. □
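The bookkeeping in the proof can be made concrete: the sketch below (our code) computes the SDPV after every step of a finite trace prefix, so that two time steps with identical SDPVs, between which the trace can be cut and replicated, are easy to spot.

```python
# A sketch (ours) of the proof's bookkeeping: the system's differential
# progress vector (SDPV) after each time step of a finite trace prefix.
def sdpv_after_each_step(trace, vp_to_job):
    jobs = sorted(set(vp_to_job.values()))
    vps_by_job = {j: sorted(v for v in vp_to_job if vp_to_job[v] == j) for j in jobs}
    progress = {vp: 0 for vp in vp_to_job}
    sdpvs = []
    for step in trace:
        for label in step:
            if label != "IDLE":
                progress[label] += 1
        # differential progress: each VP's lead over the slowest VP of its job
        sdpv = tuple(progress[vp] - min(progress[v] for v in vps_by_job[j])
                     for j in jobs for vp in vps_by_job[j])
        sdpvs.append(sdpv)
    return sdpvs

trace = [["A.0", "B.0"], ["A.1", "B.0"], ["A.0", "A.1"]]
print(sdpv_after_each_step(trace, {"A.0": "A", "A.1": "A", "B.0": "B"}))
# [(1, 0, 0), (0, 0, 0), (0, 0, 0)] : the SDPV repeats after steps 2 and 3
```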

6.2 The Optimal Periodic Schedule


Let π_1, π_2, ..., π_n denote the n processors of the machine, and let J_1, J_2, ..., J_m denote the m jobs which
need to be run. Each job J_j has an associated happiness requirement η_j. (The happiness requirement
may be a function of the priority level of the job, for example.) The problem is to find a periodic schedule,
if possible, that satisfies all jobs' happiness requirements and that minimizes the idling ratio.
       π_1   π_2   π_3   π_4   π_5
J_1     0     0     0     0     1
J_2     1     1     1     0     1
J_3     1     0     0     0     0
J_4     1     1     0     3     0
J_5     0     0     1     0     0
Figure 4. The allocation matrix corresponding to the spatial schedule depicted in Figure 2.

We assume that the spatial scheduling is done, and VPs have been allocated to processors somehow. The
spatial schedule can be summarized in the form of an m × n matrix A, called the allocation matrix, where
A_{j,p} gives the number of VPs of job J_j on processor π_p. For example, Figure 4 gives the allocation matrix
corresponding to the spatial schedule shown in Figure 2.
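For illustration, the allocation matrix can be computed directly from a spatial schedule; the sketch below (ours) rebuilds the matrix of Figure 4 from the placement shown in Figure 2, with PEs numbered from 0.

```python
# A small sketch (ours): deriving the allocation matrix A[j][p] from a spatial
# schedule given as a mapping from each job to the PE of each of its VPs.
def allocation_matrix(spatial, jobs, num_pes):
    A = [[0] * num_pes for _ in jobs]
    for j, job in enumerate(jobs):
        for pe in spatial[job]:
            A[j][pe] += 1
    return A

# The spatial schedule of Figure 2 (PEs numbered 0..4 here):
spatial = {"J1": [4], "J2": [0, 1, 2, 4], "J3": [0], "J4": [0, 1, 3, 3, 3], "J5": [2]}
for row in allocation_matrix(spatial, ["J1", "J2", "J3", "J4", "J5"], 5):
    print(row)
# Reproduces the matrix of Figure 4, row by row.
```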


Let T be the period of the sought periodic schedule, and let R_j be the number of time slices received by
any VP of job J_j in one period of the periodic schedule. (Recall that within each period of a periodic
schedule, all VPs belonging to the same job must receive exactly the same number of time slices.) Of
course, T and the R_j's are not yet known.
To summarize, the input data available to us are the η_j's and the A_{j,p}'s, and the output data to be
computed are T and the R_j's. Without loss of generality, let us normalize the period T to 1, and redefine
the R_j's to be the old R_j's divided by T. (Thus, while the old R_j's were integer variables, the new R_j's
are rational numbers.)
The minimization of the idling ratio may be written algebraically as:

    min  1 - (Σ_p Σ_j A_{j,p} R_j) / n                                    (1)

The constraints imposed by the happiness requirements are:

    (Σ_p A_{j,p} R_j) / n  ≥  η_j        for all j = 1, ..., m            (2)
Finally, we need a sanity constraint, restricting the normalized period to 1:

    Σ_j A_{j,p} R_j  ≤  1                for all p = 1, ..., n            (3)

The objective function (1), along with the constraints (2) and (3), forms a linear program, which may be
solved using standard techniques. An apparent complication is that we seek not just any R_j's, but R_j's
that are rational. This is really not a problem: as long as the η_j's are rational, the optimal R_j's will
automatically turn out to be rational.
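As a sanity check of the formulation, the sketch below (ours) sets up and solves this linear program with scipy.optimize.linprog for the allocation matrix of Figure 4. The happiness requirements η_j are illustrative values we chose, since the paper does not fix any; all variable names are ours.

```python
# A sketch (ours) solving the linear program (1)-(3) with scipy for Figure 4's
# allocation matrix.  The eta values below are assumed, illustrative numbers.
import numpy as np
from scipy.optimize import linprog

A = np.array([[0, 0, 0, 0, 1],     # A[j, p] = number of VPs of job J_{j+1} on processor p+1
              [1, 1, 1, 0, 1],
              [1, 0, 0, 0, 0],
              [1, 1, 0, 3, 0],
              [0, 0, 1, 0, 0]])
m, n = A.shape
eta = np.array([0.05, 0.2, 0.05, 0.15, 0.05])   # assumed happiness requirements

# Objective (1): minimize 1 - (sum_{p,j} A[j,p] R[j]) / n.  Dropping the
# constant 1 and the 1/n factor leaves an equivalent objective in the R[j].
c = -A.sum(axis=1)

# Constraint (2), rewritten as <= for linprog:  -(sum_p A[j,p]) R[j] / n <= -eta[j]
A_happy, b_happy = -np.diag(A.sum(axis=1)) / n, -eta

# Constraint (3): sum_j A[j,p] R[j] <= 1 for every processor p
A_period, b_period = A.T, np.ones(n)

res = linprog(c, A_ub=np.vstack([A_happy, A_period]),
              b_ub=np.concatenate([b_happy, b_period]),
              bounds=[(0, None)] * m)
if res.success:
    R = res.x
    idling_ratio = 1 - (A.sum(axis=1) * R).sum() / n
    print("R =", R.round(3), " idling ratio =", round(idling_ratio, 3))
```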

7 Conclusions
This paper presents the notion of virtual processors as the unifying concept in the design of operating sys-
tems for massively parallel computing. We propose that several well-recognized activities of the operating
system can be viewed as operations or manipulations on the set of virtual processors. To illustrate the
applicability of the virtual processor framework, we present preliminary analyses of spatial and temporal
scheduling.
We believe that the many conceptual benefits of founding an MPP operating system on virtual processors
will outweigh the overheads in terms of performance. However, a definitive answer will require extensive
experimentation and analysis.

References
[1] Tom Blank. Personal communications, 1993. MasPar Computer Corporation, Sunnyvale, CA.
[2] Guy E. Blelloch. Vector Models for Data-parallel Computing. MIT Press, Cambridge, MA, 1990.
[3] Walter S. Brainerd, Charles H. Goldberg, and Jeanne C. Adams. Programmer's guide to Fortran 90.
McGraw-Hill Book Co., 1990.
[4] M. Crovella et al. Multiprogramming on multiprocessors. In Proceedings of the Third IEEE Sympo-
sium on Parallel and Distributed Processing, pages 590-597, December 1991.
[5] R. Cytron, J. Lipkis, and E. Schonberg. A computer-assisted approach to SPMD execution. In
Proceedings of Supercomputing '90, pages 398-406, November 1990.
[6] Hesham El-Rewini, Theodore G. Lewis, and Hesham H. Ali. Task Scheduling in Parallel and Distributed
Systems. Prentice Hall, Englewood Cliffs, New Jersey, 1994.
[7] Philip J. Hatcher and Michael J. Quinn. Data-parallel programming on MIMD computers. MIT Press,
Cambridge, MA, 1991.
[8] W. D. Hillis. The Connection Machine. MIT Press, Cambridge, Mass., 1985.
[9] Kent K. Koeninger. Personal communications, 1994. Cray Research Corp., Minneapolis, MN.
[10] T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Mor-
gan Kaufmann Publishers, Inc., San Mateo, CA 94403, 1992.
[11] Tom Leighton. Methods for message routing in parallel machines. In 24th Annual ACM Symposium
on Theory of Computing, pages 77-95, 1992.
[12] David B. Loveman. High Performance Fortran. IEEE Parallel and Distributed Technology, 1(1):25-42,
1993.
[13] S. T. Leutenegger and M. K. Vernon. The performance of multiprogrammed multiprocessor scheduling
policies. In Performance Evaluation Review, pages 226-236, May 1990.
[14] C. McCann, R. Vaswani, and J. Zahorjan. A dynamic processor allocation policy for multiprogrammed
shared-memory multiprocessors. ACM Transactions on Computer Systems, 11(2):146-178, 1993.
[15] C. McCann and J. Zahorjan. Processor allocation policies for message-passing parallel computers. In
Performance Evaluation Review, pages 19-32, May 1994.
[16] Michael Metcalf and John Reid. Fortran 90 Explained. Oxford University Press, New York, 1990.

[17] Wilfried Oed. The Cray Research Massively Parallel Processor System CRAY T3D. Available by
anonymous ftp from ftp.cray.com, November 1993.
[18] David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hard-
ware/Software Interface. Morgan Kaufmann Publishers, San Mateo, CA, 1994.
[19] David M. Ray. Personal communications, 1994. Thinking Machines Corporation, Cambridge, MA.
[20] Gary Sabot. The Paralation Model: Architecture-Independent Parallel Programming. MIT Press,
Cambridge, MA, 1988.
[21] Vivek Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. Pitman Publishing,
128, Long Acre, London WC2E 9AN, 1989.
[22] S. K. Setia, M. S. Squillante, and S. K. Tripathi. Analysis of processor allocation in multiprogrammed,
distributed-memory parallel processing systems. IEEE Transactions on Parallel and Distributed Systems,
5(4):401-430, April 1994.
[23] K. C. Sevcik. Characterization of parallelism in applications and their use in scheduling. In Perfor-
mance Evaluation Review, pages 171-180, May 1989.
[24] Thinking Machines Corporation, Cambridge, MA. *Lisp release notes, 1987.
[25] Thinking Machines Corporation, Cambridge, MA. The Connection Machine CM-5 Technical Sum-
mary, October 1991.

