Chapter 3

Uploaded by

Legesse Samuel
Copyright
© © All Rights Reserved
CHAPTER 3-PROCESSES

INTRODUCTION
• The concept of a process originates from the field of operating systems (OS).
• A process is a program in execution.
• From the OS perspective, the management and scheduling of processes are the most important issues.
• Other important issues arise in distributed systems:
  - Multithreading:
    * to organize client-server systems efficiently;
    * to enhance performance by overlapping communication and local processing.
  - Virtualization:
    * allows an application and its environment to run concurrently with other applications, independent of the underlying hardware and platforms, leading to a high degree of portability.
  - Process or code migration:
    * moving processes between different machines (in wide-area distributed systems);
    * can help in achieving scalability;
    * can also help to dynamically configure clients and servers.

3.1. THREADS
• To execute programs, an OS creates a number of virtual processors, each one for running a different program.
• To keep track of these virtual processors, the OS has a process table, containing entries to store CPU register values, memory maps, open files, accounting information, privileges, etc.
• A process is a program that is currently being executed on one of the OS's virtual processors.
• There are usually many processes executing concurrently.
• Processes must not interfere with each other; the fact that they share the same CPU and other resources is made transparent (concurrency transparency).
• This concurrency transparency has a high price: allocating resources for a new process and context switching both take time.
CONT…
• A thread is a single sequential flow of control within a program.
• A thread also executes independently from other threads, but no attempt is made to achieve a high degree of concurrency transparency, which results in better performance.
• A Web browser is an example of a multithreaded application. Within a typical browser, you can:
  - scroll a page while it is downloading an applet or an image;
  - play animation and sound concurrently;
  - print a page in the background while you download a new page.
• A main contribution of threads in distributed systems is that they allow clients and servers to be constructed so that communication and local processing can overlap.
• Threads can be used in both:
  - non-distributed systems;
  - distributed systems.
3.1.1. THREADS IN NON-DISTRIBUTED SYSTEMS
• A process has an address space (containing program text and data) and a single thread of control, as well as other resources such as open files, child processes, accounting information, etc.

Fig. 3.1. (a) Three processes, each with one thread. (b) One process with three threads.

CONT…
• Each thread has its own program counter, registers, stack, and state; but all threads of a process share the address space, global variables, and other resources such as open files.

Fig. 3.2. Threads
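
The sharing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `counter`, `worker`, and `run` are hypothetical, and Python's `threading` module stands in for whatever thread package a real system uses. Each thread gets its own stack (the loop variable), while the global `counter` lives in the one address space that all threads share.

```python
import threading

counter = 0                 # global variable: shared by every thread of the process
lock = threading.Lock()     # the application itself must guard shared data


def worker(increments):
    """Each thread has its own stack (loop variable) but updates shared state."""
    global counter
    for _ in range(increments):
        with lock:          # without the lock, concurrent updates could be lost
            counter += 1


def run(num_threads=3, increments=10_000):
    """Start several threads in one process and return the shared counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling `run(3, 1000)` returns 3000 because all three threads increment the same variable.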
CONT…
• Threads allow multiple executions to take place in the same process environment; this is called multithreading.
• Multithreading provides concurrency with less overhead, but also with less transparency: the threads of a process are not protected from each other by the OS, so the application must guard shared memory itself.
• Thread usage: why do we need threads?
  - They simplify the programming model, since many activities are going on at once, more or less independently.
  - They are easier to create and destroy than processes, since they do not have any resources attached to them.
  - Performance improves by overlapping activities when there is substantial I/O; e.g., a spreadsheet can keep accepting input in one thread while another thread recalculates, instead of blocking.
  - Real parallelism is possible in a multiprocessor system.
CONT…
• In non-distributed systems, threads with shared data can be used instead of processes to avoid the context switching overhead of interprocess communication (IPC).

Fig. 3.3. Context switching as the result of IPC

THREAD IMPLEMENTATION
• Threads are often provided in the form of a thread package.
• Such a package contains operations to create and destroy threads, as well as operations on synchronization variables such as mutexes and condition variables.
• The two approaches to implementing a thread package are user-level threads and kernel-level threads.
User-level threads:
• Construct a thread library that is executed entirely in user mode.
• The OS is not aware of the threads.
• Advantages:
  - it is cheap to create and destroy threads; just allocate and free memory;
  - context switching can be done in just a few instructions; only CPU register values are stored and reloaded.
CONT…
• Drawback:
  - invocation of a blocking system call blocks the entire process to which the thread belongs, and therefore all the other threads in that process.
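
A user-level thread package can be imitated with Python generators, as a rough sketch only. The names `task` and `run_round_robin` are hypothetical; the "kernel" here (the Python interpreter) sees one flow of control, and a "context switch" is nothing more than resuming a saved generator.

```python
from collections import deque


def task(name, steps, log):
    """A user-level 'thread': runs one step, then voluntarily yields the CPU."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run_round_robin(tasks):
    """User-mode scheduler: switching threads is just resuming a generator."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)           # resume the thread where it left off
            ready.append(current)   # still runnable: back of the ready queue
        except StopIteration:
            pass                    # thread finished; drop it


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
```

After the call, `log` holds `["A:0", "B:0", "A:1", "B:1", "B:2"]`. The drawback above shows up directly: if any task made a blocking call (for example `time.sleep`), the scheduler loop itself would stall and no other task could run.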
Kernel-level threads:
• Implement threads in the OS kernel: the kernel is aware of threads and schedules them.
• Drawback:
  - thread operations such as creation, deletion, and synchronization are expensive, since each requires a system call.
• Solution: use a hybrid form of user-level and kernel-level threads, called lightweight processes (LWP).

LIGHTWEIGHT PROCESSES (LWP)
• An LWP runs in the context of a single (heavy-weight) process, and there can be several LWPs per process.
• The system also offers a user-level thread package:
  - for creating and destroying threads; and
  - to provide facilities for thread synchronization, such as mutexes and condition variables.
• The important issue is that the thread package is implemented entirely in user space.
• In other words, all operations on threads are carried out without intervention of the kernel.

CONT…
• The thread package can be shared by multiple LWPs, as shown in Fig. 3.4.
• This means that each LWP can be running its own (user-level) thread.

Fig. 3.4. Combining kernel-level lightweight processes and user-level threads
CONT…
• Advantages of using LWPs in combination with a user-level thread package:
  - creating, destroying, and synchronizing threads is relatively cheap and involves no kernel intervention at all;
  - a blocking system call will not suspend the entire process, provided the process has enough LWPs;
  - the application does not need to know about the LWPs; all it sees are user-level threads;
  - LWPs can easily be used in multiprocessing environments by executing different LWPs on different CPUs; this multiprocessing can be hidden entirely from the application.
• Drawback:
  - LWPs still have to be created and destroyed, which is just as expensive as with kernel-level threads.
3.1.2. THREADS IN DISTRIBUTED SYSTEMS
• Threads allow blocking system calls without blocking the entire process;
  - this means multiple logical connections (communications) can be maintained at the same time.
• Threads gain much of their power by sharing an address space;
  - but there is no shared address space across the machines of a distributed system.
• Individual processes, e.g., a client or a server, can still be multithreaded to improve performance.

CONT…
Multithreaded Clients:
• The main advantage is hiding communication latency.
  - This addresses delays in downloading documents from web servers in a WAN.
• The usual way to hide communication latencies is to initiate communication and immediately proceed with something else.
• Example: consider web browsers:
  - fetching each part of a page can be implemented as a separate thread;
  - each thread opens its own TCP connection to the server;
  - each thread can display its results as soon as it gets its part of the page.

CONT…
• Hide latency by starting several threads:
  - one to download the text (displayed as it arrives);
  - others to download photographs, figures, etc.
• Parallelism can also be achieved with replicated servers, since each thread's request can be forwarded to a separate replica.
  - If servers are replicated, the requests issued by the different threads may be sent to separate sites.
  - As a result, data can be downloaded in several parallel streams, improving performance.

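
A minimal sketch of latency hiding in a multithreaded client, assuming a hypothetical `fetch` function in which `time.sleep` stands in for network delay (no real connections are opened). Because the three downloads wait concurrently, the total time is close to the slowest download rather than the sum.

```python
import threading
import time


def fetch(part, delay, results):
    """Stand-in for downloading one part of a page over its own connection."""
    time.sleep(delay)                 # simulated network latency
    results[part] = f"<{part} data>"  # each thread stores its own part


def load_page(parts):
    """Start one thread per part and wait until every part has arrived."""
    results = {}
    threads = [threading.Thread(target=fetch, args=(name, delay, results))
               for name, delay in parts.items()]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.monotonic() - start


results, elapsed = load_page({"text": 0.1, "photo": 0.1, "figure": 0.1})
```

With three parts of 0.1 s each, `elapsed` should stay well below the 0.3 s a sequential client would need.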
CONT…
Multithreaded Servers:
• Servers can be constructed in three ways:

A. Single-threaded process
• It processes one request at a time.
• It gets a request, examines it, and carries it out to completion before getting the next request.
• While waiting for the disk, the server is idle and does not process any other requests;
  - consequently, requests from other clients cannot be handled.

CONT…
B. Threads (multithreaded)
• Threads are even more important for implementing servers than for clients.
• Example: a file server.
  - The dispatcher thread reads incoming requests for file operations from clients and passes each one to an idle worker thread.
  - The worker thread performs a blocking disk read; while it is blocked, another thread may continue, say the dispatcher or another worker thread.

Fig. 3.5. A multithreaded server organized in a dispatcher/worker model

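
The dispatcher/worker model can be sketched with a shared queue. This is an illustration under assumptions: `worker`, `serve`, and the string "contents of ..." are hypothetical, the disk read is simulated, and the dispatcher hands requests to workers through a `queue.Queue` rather than picking a specific idle worker.

```python
import queue
import threading


def worker(requests, responses):
    """Worker thread: take a request, do the (simulated) blocking read, reply."""
    while True:
        name = requests.get()          # blocks only this worker
        if name is None:               # sentinel: no more work
            break
        responses.put((name, f"contents of {name}"))


def serve(incoming, num_workers=3):
    """Dispatcher: hand every incoming request to the pool of workers."""
    requests, responses = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(requests, responses))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for name in incoming:
        requests.put(name)
    for _ in workers:
        requests.put(None)             # one shutdown sentinel per worker
    for w in workers:
        w.join()
    results = {}
    while not responses.empty():
        name, data = responses.get()
        results[name] = data
    return results
```

For example, `serve(["a.txt", "b.txt"])` returns a dictionary with one entry per requested file.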
CONT…
C. Finite-state machine
• Used if threads are not available.
• The server gets a request, examines it, and tries to fulfill it from the cache; otherwise it sends a request to the file system.
• Instead of blocking, it records the state of the current request and proceeds to the next request.
• This gives parallelism without threads, but it is hard to program.
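Summary: three ways to construct a server

Model                      Characteristics
Threads                    Parallelism, blocking system calls
Single-threaded process    No parallelism, blocking system calls
Finite-state machine       Parallelism, nonblocking system calls
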
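
The finite-state machine organization can be sketched as a single-threaded event loop. All names here (`fsm_server`, `pending`, the `"disk_done"` event) are hypothetical, and the nonblocking disk read is simulated by queuing a completion event; the point is only that the server records the state of a request instead of waiting for it.

```python
from collections import deque


def fsm_server(requests, cache, disk):
    """Single thread, no blocking: state of unfinished requests is kept in a table."""
    events = deque(("request", r) for r in requests)
    pending = {}       # state table: file name -> clients waiting for it
    replies = []
    while events:
        kind, payload = events.popleft()
        if kind == "request":
            client, name = payload
            if name in cache:
                replies.append((client, cache[name]))       # served from cache
            else:
                first = name not in pending
                pending.setdefault(name, []).append(client)  # record state, move on
                if first:
                    events.append(("disk_done", name))       # simulated async read
        else:  # "disk_done": the file system answered
            cache[payload] = disk[payload]
            for client in pending.pop(payload):
                replies.append((client, cache[payload]))
    return replies


replies = fsm_server([(1, "a"), (2, "b"), (3, "a")], {"a": "A"}, {"b": "B"})
```

Here client 3 is answered from the cache while the read for client 2 is still outstanding, so `replies` is `[(1, "A"), (3, "A"), (2, "B")]`.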
3.2. ANATOMY OF CLIENT
A. (Networked) User Interfaces:
• A major task of client machines is to provide the means for users to interact with remote servers.
• There are two ways to support this interaction:
  - First, for each remote service the client machine has a separate counterpart that can contact the service over the network.
  - Second, the client machine provides direct access to remote services by offering only a convenient user interface.
    * This means that the client machine is used only as a terminal, with no need for local storage.
• In the case of networked user interfaces, everything is processed and stored at the server.
CONT…

Fig. 3.6. (a) A networked application with its own protocol. (b) A general solution to allow access to remote applications.

Fat client (a):
• each remote application has two parts: one on the client, one on the server;
• communication is application specific.

Thin client (b):
• the client is basically a terminal and does little more than provide a GUI to remote services.
CONT…
B. Client-Side Software:
• In addition to the user interface, parts of the processing and data level in a client-server application are executed at the client side.
• Examples are embedded client software for ATMs, cash registers, etc.
• Moreover, client software can also include components to achieve distribution transparency:
  - Access transparency: client-side stubs hide communication and hardware details.
  - Location, migration, and relocation transparency rely on naming systems (e.g., when a server changes location, the client software can be informed without the user knowing).
  - Failure transparency (e.g., client middleware can make multiple attempts to connect to a server).
CONT…
  - Replication transparency:
    * e.g., assume a distributed system with replicated servers;
    * the client proxy sends the request to each replica, and client-side software transparently collects all responses and passes a single return value to the client application.

Fig. 3.7. Transparent replication of a server using a client-side solution
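
A client-side proxy of this kind might look as follows. This is a sketch, not the method of any particular middleware: `ReplicaProxy` is a hypothetical name, replicas are plain callables standing in for remote servers, and picking the most common response is an assumption, since the text only says that a single return value is passed on.

```python
from collections import Counter


class ReplicaProxy:
    """Client-side proxy: the application sees one server, not the replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)    # callables standing in for remote servers

    def call(self, request):
        responses = []
        for replica in self.replicas:
            try:
                responses.append(replica(request))
            except ConnectionError:
                continue                   # a failed replica stays hidden
        if not responses:
            raise ConnectionError("no replica answered")
        # Assumption: return the most common answer as the single return value.
        value, _count = Counter(responses).most_common(1)[0]
        return value


def healthy(request):
    return request.upper()


def crashed(request):
    raise ConnectionError("replica down")


proxy = ReplicaProxy([healthy, crashed, healthy])
answer = proxy.call("hello")
```

The application calls `proxy.call("hello")` once and receives `"HELLO"`, unaware that three replicas were contacted and one of them failed.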
3.3. ANATOMY OF SERVERS
3.3.1. General Design Issues
• A server is a process implementing a specific service on behalf of a collection of clients.
• Most servers are organized in the same way: the server waits for an incoming request, makes sure it is handled, and then waits for the next request.
A. How to organize servers?
• Iterative server:
  - the server itself handles the request and, if necessary, returns a response to the requesting client.
• Concurrent server:
  - a concurrent server does not handle the request itself, but passes it to a separate thread or another process, after which it immediately waits for the next incoming request;
  - a multithreaded server is an example of a concurrent server.
CONT…
B. Where do clients contact a server?
• Clients send requests to an end point, also called a port, at the machine where the server is running.
• Each server listens to a specific end point.
• How do clients know the end point of a service?
  - End points are globally assigned for well-known services;
    * e.g., FTP is on TCP port 21, HTTP is on TCP port 80;
    * these end points have been assigned by the Internet Assigned Numbers Authority (IANA).
  - With assigned end points, the client only needs to find the network address of the machine where the server is running.
  - For services that do not require a pre-assigned end point, the end point can be dynamically assigned by the local OS;
    * a client will then first have to look up the end point.
CONT…
• IANA ranges: IANA divides the port numbers into three ranges.
  - Well-known ports (0 to 1023):
    * assigned and controlled by IANA for standard services;
    * e.g., DNS uses port 53.
  - Registered ports (1024 to 49151):
    * not controlled by IANA, but can be registered with IANA to prevent duplication;
    * e.g., MySQL uses port 3306.
  - Dynamic ports (49152 to 65535):
    * neither controlled nor registered by IANA.

CONT…
• How can the client know end points that are not well known? There are two approaches:
i. Have a daemon running (on each machine that runs servers) and listening to a well-known end point;
  - it keeps track of the current end point of each service implemented by a co-located server;
  - the client first contacts the daemon, which provides it with the end point, and then the client contacts the specific server.

Fig. 3.8. Client-to-server binding using a daemon
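
The two-step binding can be sketched on the loopback interface. Assumptions: `EndpointDaemon` and `start_service` are hypothetical names, and the daemon is modelled as an in-process table rather than a real process on a well-known port. The server side uses port 0 so that the local OS assigns a free end point dynamically.

```python
import socket


class EndpointDaemon:
    """Stands in for the daemon that listens on a well-known end point."""

    def __init__(self):
        self._table = {}                 # service name -> current port

    def register(self, service, port):
        self._table[service] = port

    def lookup(self, service):
        return self._table.get(service)


def start_service(daemon, name):
    """Server side: let the OS pick an end point, then register it."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))           # port 0: dynamically assigned end point
    srv.listen()
    daemon.register(name, srv.getsockname()[1])
    return srv


daemon = EndpointDaemon()
server = start_service(daemon, "greeting")

port = daemon.lookup("greeting")                         # 1. ask the daemon
client = socket.create_connection(("127.0.0.1", port))   # 2. contact the server
conn, _addr = server.accept()
conn.sendall(b"hello")
data = client.recv(5)
for s in (conn, client, server):
    s.close()
```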


CONT…
ii. Use a superserver (such as inetd in UNIX) that listens to all end points and then forks a process to take care of each request;
  - this avoids having a lot of servers running simultaneously with most of them idle.

Fig. 3.9. Client-to-server binding using a superserver
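
A superserver can be sketched with Python's `selectors` module. This is an illustration under assumptions: `make_listener`, `handle`, and `superserver` are hypothetical names, services are plain functions from bytes to bytes, and a thread is started per request where a UNIX superserver would fork a process.

```python
import selectors
import socket
import threading


def make_listener():
    """Create a listening socket on an OS-assigned loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen()
    return s


def handle(conn, service):
    """Started on demand: run the requested service for one client."""
    with conn:
        conn.sendall(service(conn.recv(1024)))


def superserver(listeners, max_requests):
    """Wait on ALL end points at once; start a handler only when a request arrives."""
    sel = selectors.DefaultSelector()
    for sock, service in listeners.items():
        sel.register(sock, selectors.EVENT_READ, service)
    handled = 0
    while handled < max_requests:
        for key, _events in sel.select():
            conn, _addr = key.fileobj.accept()
            # A UNIX superserver would fork here; a thread keeps the sketch portable.
            threading.Thread(target=handle, args=(conn, key.data)).start()
            handled += 1
    sel.close()


echo, upper = make_listener(), make_listener()
services = {echo: lambda data: data, upper: bytes.upper}
t = threading.Thread(target=superserver, args=(services, 1))
t.start()
with socket.create_connection(upper.getsockname()) as c:
    c.sendall(b"ping")
    reply = c.recv(1024)
t.join()
echo.close()
upper.close()
```

Only one thread waits on both end points; the "upper" service code runs only after a client has actually connected to its end point.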


CONT…
C. Whether and how can a server be interrupted?
• For instance, a user may want to interrupt a file transfer; maybe it was the wrong file.
  - Let the user exit the client application;
    * this will break the connection to the server;
    * the server will tear down the connection, assuming that the client has crashed.
  OR
  - Let the client send out-of-band data;
    * this is data to be processed by the server before any other data from the client;
    * the server may listen on a separate control end point, or the client may send it on the same connection as urgent data, as is possible in TCP.

CONT…
D. Whether or not the server is stateless
• A stateless server does not keep information on the state of its clients, and can change its own state without having to inform any client.
  - Example: a web server that honors HTTP requests does not need to remember which clients have contacted it.
• A stateful server maintains information on its clients.
  - The information needs to be explicitly deleted by the server.
  - Example: a file server that allows a client to keep a local copy of a file and to perform update operations on it;
    * such a server maintains a table containing (client, file) entries;
    * this improves the performance of read and write operations;
    * but it requires a recovery procedure in case of a server crash;
    * a stateful server needs to recover its entire state as it was just before the crash.
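
The difference can be sketched with two toy file servers. Everything here is hypothetical (`StatelessFileServer`, `StatefulFileServer`, and the in-memory `files` dictionary); the sketch only shows where per-client state lives and what is lost on a crash.

```python
class StatelessFileServer:
    """Keeps nothing per client: every request carries all it needs."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]


class StatefulFileServer:
    """Keeps a table of (client, file) entries for clients holding local copies."""

    def __init__(self, files):
        self.files = files
        self.table = set()

    def open(self, client, name):
        self.table.add((client, name))
        return self.files[name]              # the client keeps a local copy

    def update(self, client, name, data):
        if (client, name) not in self.table:
            raise PermissionError("file not opened by this client")
        self.files[name] = data

    def close(self, client, name):
        self.table.discard((client, name))   # state is deleted explicitly

    def crash(self):
        self.table.clear()                   # lost state must be recovered somehow


stateless = StatelessFileServer({"f": "abcdef"})
stateful = StatefulFileServer({"f": "abcdef"})
stateful.open("c1", "f")
stateful.update("c1", "f", "xyz")
stateful.crash()
```

After `crash()`, a further `update("c1", "f", ...)` is rejected until the table is rebuilt, whereas the stateless server answers `read("f", 2, 3)` the same way before and after a restart.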
CONT…
3.3.2. Server Clusters
• A server cluster is a collection of machines connected through a network, where each machine runs one or more servers.
• The machines are connected through a LAN, with high bandwidth and low latency.
• A cluster is logically organized into three tiers:
  - the first tier consists of a (logical) switch through which client requests are routed;
  - the second tier consists of (application/compute) servers, which process the data;
  - the third tier consists of data-processing servers, e.g., file servers and database servers;
    * for some applications, the major part of the workload is handled here.
CONT…

Fig. 3.10. The general organization of a three-tiered server cluster
CONT…
• Distributed servers
  - The problem with a server cluster is that when the logical switch (the single access point) fails, the whole cluster becomes unavailable.
  - To eliminate this potential problem, several access points can be provided whose addresses are publicly available, leading to a distributed server.
  - For example, the Domain Name System (DNS) can return several addresses, all belonging to the same host name.

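
On the client side, several access points can be used as sketched below. `resolve_all` and `connect_any` are hypothetical helper names; `socket.getaddrinfo` is the standard call that may return more than one address for a host name, and trying the addresses in order is just one possible policy.

```python
import socket


def resolve_all(host, port):
    """Return every distinct address that name resolution yields for host."""
    addresses = []
    for *_rest, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_any(addresses, port, connect=socket.create_connection):
    """Try each access point in turn; one failed access point is not fatal."""
    last_error = None
    for address in addresses:
        try:
            return connect((address, port))
        except OSError as exc:
            last_error = exc
    raise ConnectionError("no access point reachable") from last_error
```

A typical use would be `connect_any(resolve_all("example.org", 80), 80)`, where the host name is only a placeholder. The `connect` parameter exists so the policy can be tried out without a network.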
3.4. CODE MIGRATION
• So far, we have been mainly concerned with distributed systems in which communication is limited to passing data.
• However, there are situations in which passing programs, sometimes even while they are being executed and also across heterogeneous systems, simplifies the design of a distributed system.
• Traditionally, code migration in distributed systems took place in the form of process migration, in which an entire process was moved from one machine to another.
• Code migration also involves moving data:
  - when a program migrates while running, its status, pending signals, and other parts of its environment, such as the stack and the program counter, also have to be moved.
CONT…
• Reasons for migrating code:
  - To improve performance:
    * move processes from heavily loaded to lightly loaded machines (load balancing).
  - To reduce communication:
    * move a client application that performs many database operations to the server on which the database resides, and send only the results to the client.
  - To exploit parallelism (without writing parallel programs):
    * e.g., copies of a mobile program (called a mobile agent or a crawler) move from site to site, searching the web.
  - To gain flexibility:
    * distributed systems can be configured dynamically, instead of deciding in advance which parts of a multi-tiered client-server application are to be run where.
CONT…

Fig. 3.11. The principle of dynamically configuring a client to communicate with a server; the client first fetches the necessary software, and then invokes the server
CLIENT-SERVER EXAMPLES
• Example 1: Send client code to the server
  - The server manages a huge database.
  - If a client application needs to perform many database operations, it may be better to ship part of the client application to the server and send only the results across the network.
• Example 2: Send server code to the client
  - In many interactive DB applications, clients need to fill in forms that are subsequently translated into a series of DB operations, and the input has to be validated by server code.
  - Shipping that part of the server to the client lets the form be processed locally, so that only the completed form crosses the network.

3.4.1. MODELS FOR CODE MIGRATION
• Communication in distributed systems is concerned with exchanging data between processes.
• Code migration deals with moving programs between machines, with the intention of having those programs executed at the target.
• In some cases, as in process migration, the execution status of a program, pending signals, and other parts of the environment must be moved as well.
• To get a better understanding of the different models for code migration, we use a framework described in Fuggetta et al. (1998).
CONT…
• In this framework, a process consists of three segments: the code segment, the resource segment, and the execution segment.
• The code segment is the part that contains the set of instructions that make up the program that is being executed.
• The resource segment contains references to external resources needed by the process, such as files, printers, devices, other processes, and so on.
• The execution segment is used to store the current execution state of a process, consisting of private data, the stack, and the program counter.
CONT…
1. Weak Mobility
• Transfer only the code segment, and perhaps some initialization data.
• A process can only migrate before it begins to run, or perhaps at a few intermediate points.
• The characteristic feature of weak mobility is that a transferred program is always started from its initial state.
  - e.g., Java applets (which always start execution from the beginning).
• The benefit of this approach is its simplicity;
  - it requires only that the target machine can execute the code.

CONT…
• In the case of weak mobility, the migrated code is executed either by the target process (in its own address space) or by a separate process.
• For example:
  - Java applets are simply downloaded by a web browser and are executed in the browser's address space.
• Advantage:
  - there is no need to start a separate process, thereby avoiding interprocess communication at the target machine.
• Drawback:
  - the target process needs to be protected against malicious or inadvertent code executions;
  - a simple solution is to let the operating system take care of that by creating a separate process to execute the migrated code.
CONT…
2. Strong Mobility
• Transfer the code segment and the execution segment.
• Processes can migrate after they have already started to execute.
• Its characteristic feature is that a running process can be stopped, subsequently moved to another machine, and then resume execution where it left off.
• It is much harder to implement.
• It can also be supported by remote cloning: creating an exact copy of the original process and running it on a different machine.
  - The cloned process is executed in parallel to the original process.
  - UNIX does this by forking a child process and letting that child continue on a remote machine.
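
The difference between the two models can be sketched with the three segments of the framework. This is a toy under stated assumptions: `Process`, `PROGRAM`, `run`, `migrate_weak`, and `migrate_strong` are hypothetical names, the "program counter" is an ordinary dictionary entry, and JSON encoding stands in for sending segments over the network.

```python
import json


class Process:
    """Toy process with the three segments of the framework."""

    def __init__(self, code, resources=None, execution=None):
        self.code = code                    # code segment: the program text
        self.resources = resources or {}    # resource segment: external references
        self.execution = execution or {}    # execution segment: private data, "pc"


PROGRAM = """
def step(state):
    state["pc"] = state.get("pc", 0) + 1
    state["total"] = state.get("total", 0) + state["pc"]
"""


def run(process, steps):
    namespace = {}
    exec(process.code, namespace)           # load the code segment
    for _ in range(steps):
        namespace["step"](process.execution)


def migrate_weak(process):
    """Only the code segment travels; execution restarts from the initial state."""
    wire = json.dumps({"code": process.code})
    return Process(code=json.loads(wire)["code"])


def migrate_strong(process):
    """Code and execution segments travel; execution resumes where it stopped."""
    wire = json.dumps({"code": process.code, "execution": process.execution})
    message = json.loads(wire)
    return Process(code=message["code"], execution=message["execution"])


original = Process(PROGRAM)
run(original, 3)                            # pc = 3, total = 6
resumed = migrate_strong(original)
restarted = migrate_weak(original)
run(resumed, 1)
run(restarted, 1)
```

After one more step, `resumed.execution` is `{"pc": 4, "total": 10}` while `restarted.execution` is `{"pc": 1, "total": 1}`. Real strong mobility is much harder because the stack and program counter of a running process are not a plain dictionary.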
CONT…
• Migration can be sender-initiated or receiver-initiated.
• Sender-initiated:
  - migration is initiated at the machine where the code currently resides or is being executed.
  - Examples:
    * uploading programs to a server; this requires that the client has previously been registered and authenticated at that server;
    * sending a search program across the Internet to a web database server to perform the queries at that server.

CONT…
• Receiver-initiated:
  - the initiative for code migration is taken by the target machine.
  - Example: Java applets.
  - Code migration occurs between a client and a server, where the client takes the initiative for migration.
  - The server is generally not interested in the client's resources; instead, code migration to the client is done only to improve client-side performance.

• Summary of the models for code migration

Fig. 3.12. Alternatives for code migration
3.4.2. MIGRATION AND LOCAL RESOURCES
• So far, only the migration of the code and execution segments has been taken into account.
• What often makes code migration so difficult is that the resource segment cannot always be simply transferred along with the other segments without being changed.
• For example:
  - suppose a process holds a reference to a specific TCP port through which it was communicating with other (remote) processes;
  - such a reference is held in its resource segment;
  - when the process moves to another location, it will have to give up the port and request a new one at the destination.
CONT…
• To understand the implications that code migration has on the resource segment, we distinguish three types of process-to-resource bindings.
1. Binding by identifier: the strongest binding
  - a process refers to a resource by its identifier;
  - the process requires precisely the referenced resource;
  - e.g., when a process uses a URL to refer to a specific web site, or an IP address to refer to an FTP server.
2. Binding by value: a weaker binding
  - only the value of a resource is needed;
  - in this case, another resource can provide the same value without affecting the execution of the process;
  - e.g., when a program relies on the standard libraries of a programming language such as C or Java, which are normally locally available, but whose location in the file system may vary from site to site.
3. Binding by type: the weakest binding
  - a process needs only a resource of a specific type, and refers to the resource by its type;
  - e.g., local devices such as a printer or a monitor.
CONT…
• In migrating code, we need to change the references to resources. How a reference should be changed depends on whether the resource can be moved along with the code, i.e., on the resource-to-machine binding.
Types of resource-to-machine bindings:
1. Unattached resources: can be easily moved between different machines with the migrating program (such as data files associated with the program).
2. Fastened resources: moving or copying may be possible, but is more expensive (such as local databases and complete web sites).
3. Fixed resources: intimately bound to a specific machine or environment and cannot be moved (such as local devices).
CONT…
• When migrating code, there are nine combinations of process-to-resource and resource-to-machine bindings to consider.

Fig. 3.13. Actions to be taken with respect to the references to local resources when migrating code to another machine
CONT…
• When a process is bound to a resource by identifier:
  - when the resource is unattached:
    * it is best to move it along with the migrating code;
    * but when the resource is shared by other processes, an alternative is to establish a global reference, that is, a reference that can cross machine boundaries;
    * an example of such a reference is a URL.
  - when the resource is fastened or fixed:
    * the best solution is to create a global reference.

CONT…
• When a process is bound to a resource by value:
  - when the resource is fixed:
    * this occurs, for example, when a process assumes that memory can be shared between processes;
    * establishing a global reference would mean implementing a distributed form of shared memory;
    * this is often not an efficient solution.
  - when the resource is fastened:
    * such resources are typically runtime libraries;
    * copies of them are normally available on the target machine;
    * establishing a global reference is a better alternative when huge amounts of data would have to be copied.
  - when the resource is unattached:
    * the best solution is to copy (or move) the resource to the new destination;
    * establishing a global reference is the other option.
CONT…
• When a process is bound to a resource by type:
  - irrespective of the resource-to-machine binding, the solution is to rebind the process to a locally available resource of the same type;
  - when such a resource is not available, copy or move the original one to the new destination, or establish a global reference.

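
The nine combinations can be written down as a lookup table. The table below is an assumption in the sense that the original figure did not survive extraction; it follows the rules stated on the preceding slides, with the preferred action listed first. The codes and the names `ACTIONS` and `actions` are hypothetical: MV = move the resource, CP = copy its value, GR = establish a global reference, RB = rebind to a locally available resource.

```python
# (process-to-resource binding, resource-to-machine binding) -> actions,
# preferred action first. Reconstructed from the rules in the text (assumption).
ACTIONS = {
    ("identifier", "unattached"): ["MV", "GR"],
    ("identifier", "fastened"):   ["GR", "MV"],
    ("identifier", "fixed"):      ["GR"],
    ("value", "unattached"):      ["CP", "MV", "GR"],
    ("value", "fastened"):        ["GR", "CP"],
    ("value", "fixed"):           ["GR"],
    ("type", "unattached"):       ["RB", "MV", "CP"],
    ("type", "fastened"):         ["RB", "GR", "CP"],
    ("type", "fixed"):            ["RB", "GR"],
}


def actions(process_binding, machine_binding):
    """Return the candidate actions for one reference, preferred action first."""
    return ACTIONS[(process_binding, machine_binding)]


preferred = {key: options[0] for key, options in ACTIONS.items()}
```

For instance, `actions("type", "fixed")` gives `["RB", "GR"]`: rebind to a local printer if there is one, otherwise fall back to a global reference to the original.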
3.4.3. MIGRATION IN HETEROGENEOUS SYSTEMS
• So far, we have assumed that the migrated code can be easily executed at the target machine, which holds for homogeneous systems.
• However, distributed systems are generally constructed on a heterogeneous collection of platforms, each with its own OS and machine architecture.
• Migration in such systems requires that:
  - each platform is supported, i.e., the code segment can be executed on each platform;
  - the execution segment can be properly represented at each platform.
• Heterogeneity problems are similar to those of portability.

CONT…
• Heterogeneity can be addressed by providing process virtual machines:
  - for scripting languages, the virtual machine directly interprets the migrated source code at the host site;
  - for Java, it interprets intermediate code generated by a compiler.
• A virtual machine encapsulates an entire computing environment.
• If properly implemented, migrating the virtual machine provides strong mobility, since local resources may be part of the migrated environment.
• One reason for wanting to migrate entire environments is that it allows operation to continue while a machine needs to be shut down.
CONT…
• For example:
  - in a server cluster, the systems administrator may decide to shut down or replace a machine, but will not have to stop all its running processes;
  - instead, the administrator can temporarily freeze an environment, move it to another machine (where it sits next to other, existing environments), and simply unfreeze it again;
  - this is an extremely powerful way to manage long-running compute environments and their processes.

END
