Final

- Slides in our Google classroom: https://classroom.google.com/u/1/c/NTQzOTMxNzU5OTU4
- Exercises in the classroom
- Textbook: Patterson, D. A., & Hennessy, J. L. (2014). Computer Organization and Design: The Hardware/Software Interface (5th ed.).

Topic 1:
- Classroom: https://classroom.google.com/c/NTQzOTMxNzU5OTU4/p/NDk3ODk1ODE5Mjcz/details
- primary function of the CPU
A processor (CPU) is the logic circuitry that responds to and processes the basic instructions that drive a computer. The CPU is seen as the main and most crucial integrated circuit (IC) chip in a computer, as it is responsible for interpreting most of the computer's commands. CPUs perform most basic arithmetic, logic, and I/O operations, and also allocate commands to other chips and components running in a computer.
The basic elements of a processor include: the ALU (arithmetic logic unit), FPU (floating-point unit), registers, cache, RAM, CU (control unit), IC (integrated circuit), etc.
The four primary functions of a processor are: fetch, decode, execute, and write back.
The processor in a personal computer or embedded in small devices is often called a microprocessor, meaning that the processor's elements are contained on a single IC chip.
A multi-core processor is an IC chip containing more than one CPU: the IC contains two or more processors for enhanced performance, reduced power consumption, and more efficient concurrent processing of multiple tasks.
Topic 2:
- Flynn’s taxonomy: SISD, SIMD, MISD, MIMD
Parallel computing is a form of computing in which jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions that execute concurrently on different CPUs. Parallel systems deal with the concurrent use of multiple computer resources, which can include a single computer with multiple processors, a number of computers connected by a network, or a combination of both.
Based on the number of instruction and data streams that can be processed simultaneously, computing systems are classified into four major categories:
Single-instruction, single-data (SISD) systems:

A SISD system is a uniprocessor machine capable of executing a single instruction operating on a single data stream. Machine instructions are processed in a sequential manner; computers adopting this model are called sequential computers. Most conventional computers have SISD architecture. All the instructions and data to be processed have to be stored in primary memory.
Single-instruction, multiple-data (SIMD) systems:

A SIMD system is a multiprocessor machine capable of executing the same instruction on all of its CPUs, each operating on a different data stream. Machines based on the SIMD model are well suited to scientific computing, since it involves lots of vector and matrix operations: the data elements of a vector can be divided into multiple sets (N sets for an N-PE system) so that each processing element (PE) processes one data set. A dominant representative of SIMD systems is Cray's vector processing machine.
Multiple-instruction, single-data (MISD) systems:

A MISD system is a multiprocessor machine capable of executing different instructions on different PEs, with all of them operating on the same data set: the system performs different operations on the same data. Machines built using the MISD model are not useful in most applications.
Multiple-instruction, multiple-data (MIMD) systems:

A MIMD system is a multiprocessor machine capable of executing multiple instructions on multiple data sets. Each PE in the MIMD model has separate instruction and data streams; therefore machines built using this model are suited to any kind of application. Unlike SIMD and MISD machines, the PEs in MIMD machines work asynchronously.
- Pipeline: a pipelined system has 3 stages, … What is the maximum speedup?
Pipelining is an arrangement of the hardware elements of the CPU such that its overall performance is increased. The maximum speedup of a pipeline equals its number of stages, so a 3-stage pipeline has a maximum speedup of 3.
An example pipelined system has three stages: Inserting, Filling, Sealing (I, F, S).
Design of a basic pipeline:
A pipeline has two ends, the input end and the output end. Interface registers are used to hold the intermediate output between two stages; they are also called latches or buffers.
All stages and interface registers are controlled by a common clock.
Execution in a pipelined processor. For example, consider a processor having 4 stages and 2 instructions to be executed:
Non-overlapped execution:
Total time = 8 cycles.
Overlapped execution:
Total time = 5 cycles.
Pipeline stages. A RISC (Reduced Instruction Set Computer) processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set:
Stage 1: Fetch. The CPU reads the instruction from the memory address held in the program counter.
Stage 2: Decode. The instruction is decoded and the register file is accessed to get the values of the registers used in the instruction.
Stage 3: Execute. ALU operations are performed.
Stage 4: Memory Access. Memory operands named in the instruction are read from or written to memory.
Stage 5: Write Back. The computed/fetched value is written back to the register named in the instruction.
Performance of a pipelined processor. Consider a k-segment pipeline with clock cycle time Tp (the cycle time is the time from the start of work until the instruction is ready for delivery, or the time between two consecutive finished products). Let there be n tasks to be completed in the pipelined processor. The first instruction takes k cycles to come out of the pipeline, but the other n - 1 instructions take only 1 cycle each, i.e., a total of n - 1 cycles. So the time taken to execute n instructions in a pipelined processor is:
ET pipeline = (k + n - 1) cycles = (k + n - 1) * Tp.
In the same case, for a non-pipelined processor, the execution time of n instructions is:
ET non-pipeline = n * k * Tp.
So the speedup S of the pipelined processor over the non-pipelined processor, when n tasks are executed on the same processor, is:
S = performance of pipelined processor / performance of non-pipelined processor.
Because the performance of a processor is inversely proportional to the execution time, we have:
S = ET non-pipeline / ET pipeline
=> S = [n * k * Tp] / [(k + n - 1) * Tp]
S = [n * k] / [k + n - 1].
When the number of tasks n is significantly larger than k, that is, n >> k:
S ≈ (n * k) / n = k.
Efficiency = given speedup / maximum speedup = S / Smax.
We know that Smax = k, so Efficiency = S / k.
Throughput = number of instructions / total time to complete them = n / ((k + n - 1) * Tp).
The cycles-per-instruction (CPI) value of an ideal pipelined processor is 1.
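As a quick check of these formulas, here is a small C program (not from the notes; the values of k, n, and Tp are arbitrary example inputs) that evaluates speedup, efficiency, and throughput:

```c
#include <stdio.h>

int main(void) {
    double k  = 5.0;    /* number of pipeline stages (example value) */
    double n  = 100.0;  /* number of instructions (example value) */
    double tp = 1.0;    /* clock cycle time Tp, in ns (example value) */

    double et_pipe    = (k + n - 1.0) * tp;   /* pipelined execution time */
    double et_nonpipe = n * k * tp;           /* non-pipelined execution time */
    double speedup    = et_nonpipe / et_pipe; /* S = nk / (k + n - 1) */
    double efficiency = speedup / k;          /* E = S / Smax, with Smax = k */
    double throughput = n / et_pipe;          /* instructions per ns */

    printf("S = %.3f (max %.0f), E = %.3f, T = %.3f instr/ns\n",
           speedup, k, efficiency, throughput);
    return 0;
}
```

With k = 5 and n = 100 this prints S ≈ 4.808, close to the maximum of 5, illustrating that S approaches k as n grows.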
Dependencies in a pipelined processor:
There are three types of dependencies possible in a pipelined processor:
1/ Structural dependency.
2/ Control dependency.
3/ Data dependency.
These dependencies may introduce stalls in the pipeline (a stall is a cycle in the pipeline without new input).

Structural dependency:

This dependency arises because of resource conflicts in the pipeline. A resource conflict is a situation in which more than one instruction tries to access the same resource in the same cycle. A resource can be a register, memory, or the ALU. Example:

In the above scenario, in cycle 4, instructions I1 and I4 try to access the same resource (memory), which introduces a resource conflict. To avoid this problem, we have to keep the instruction waiting until the required resource (memory in our case) becomes available. This wait introduces stalls in the pipeline, as shown below:

To minimize structural dependency stalls in the pipeline, we use a hardware mechanism called renaming. We divide the memory into two independent modules, one to store the instructions and one to store the data (operands), called code memory (CM) and data memory (DM) respectively. CM contains all the instructions and DM contains all the operands that the instructions require.

Control dependency (branch hazards):

This type of dependency occurs with control-transfer instructions such as BRANCH, CALL, JMP, etc. On many instruction set architectures, the processor does not yet know the target address of these instructions when it needs to insert the next instruction into the pipeline. Because of this, unwanted instructions are fed into the pipeline.

Consider the following sequence of instructions in the program:

100: I1

101: I2 (JMP 250) (JMP = jump) (NOTE: generally, the target address of the JMP instruction is known only after the ID stage.)

102: I3

250: I250

Expected output: I1 -> I2 -> I250.

Actual output sequence: I1 -> I2 -> I3 -> I250, because when I2 is decoded, I3 has already been loaded.

To correct the problem we need to stop instruction fetch until we get the target address of the branch instruction. This can be done by introducing a delay slot until the target address is known.

Output sequence: I1 -> I2 -> delay (stall) -> I250.

- Parallel processing

 Parallel processing is a computing technique in which multiple streams of calculation or data-processing tasks occur simultaneously through numerous central processing units (CPUs) working concurrently.

How does parallel processing work?

 In general, parallel processing refers to dividing a task between at least two microprocessors, designating a specific processor for each part.
 There are two types of parallelism: fine-grained and coarse-grained. In fine-grained parallelism, tasks communicate with one another numerous times per second to deliver results in real time or very close to real time.
 After a task completes, the software fits all the data fragments together, assuming that all the processors have stayed in sync. If computers are networked to form a cluster, even machines without multiple processors can be used for parallel computing.

Types of parallel processing

 There are several varieties of parallel processing, such as MPP, SIMD, MISD, SISD, and MIMD, of which SIMD is probably the most popular.

1. Single Instruction, Single Data (SISD)

In Single Instruction, Single Data computing, a single processor manages a single instruction stream operating on a single data source. SISD carries out instructions sequentially and may or may not be capable of parallel processing, depending on its configuration. One control unit is in charge of all functional units.

2. Multiple Instruction, Single Data (MISD)

Computers that use the Multiple Instruction, Single Data model typically have multiple processors. MISD computers can simultaneously perform many operations on the same batch of data. The MISD structure consists of many processing units, each operating under its own instructions and over a comparable data flow.

3. Single Instruction, Multiple Data (SIMD)

Computers that use the Single Instruction, Multiple Data architecture have multiple processors that carry out identical instructions. The SIMD architecture has numerous processing components, and multiple modules included in the shared subsystem aid in simultaneous communication with every CPU.

4. Multiple Instruction, Multiple Data (MIMD)

Multiple Instruction, Multiple Data (MIMD) computers are characterized by the presence of multiple processors, each capable of independently accepting its own instruction stream. Additionally, each CPU draws data from a different data stream. Although MIMD computers are more adaptable than SIMD or MISD computers, developing the sophisticated algorithms that power these machines is more challenging.
5. Single Program, Multiple Data (SPMD)

SPMD (Single Program, Multiple Data) systems are a subset of MIMD. SPMD is a message-passing style of programming used on distributed-memory computer systems. Each node launches its own copy of the application and uses send/receive routines to exchange messages when interacting with other nodes.

6. Massively Parallel Processing (MPP)

Massively Parallel Processing is an architecture designed to manage the coordinated execution of program operations by numerous processors. MPP systems typically communicate through a messaging interface and can have 200 or more processors working on an application. Although SISD computers cannot run in parallel on their own, a cluster can be created by connecting many of them.

- Amdahl’s law (the standard formula is sketched after this list)

- Unpipelined processor, pipeline processor

Do exercises in Google classroom.
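The Amdahl's law bullet above is not expanded anywhere in these notes, so here is the standard statement as a sketch (the symbols f and s are my own notation): if a fraction f of a program's execution is sped up by a factor s, the overall speedup is

```latex
% Amdahl's law: overall speedup when a fraction f of the work
% is accelerated by a factor s (the remaining 1 - f is unchanged).
S_{\text{overall}} = \frac{1}{(1 - f) + f/s},
\qquad
\lim_{s \to \infty} S_{\text{overall}} = \frac{1}{1 - f}.
```

For example, if 90% of a program is parallelizable (f = 0.9) and runs on 10 processors (s = 10), then S = 1 / (0.1 + 0.09) ≈ 5.26, well below the ideal 10.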

- Topic 2A, 2B, 2C: Pipeline exercises


Pair 1

Without bypassing - the 2nd instruction has to wait at RR (register read) until the previous instruction has completed writing into the register (RW, first half cycle). Number of stalls = 4

Cycle:             1   2   3   4   5   6   7   8   9   10  11  12  13
ADD R3 ← R1 + R2   IF  DE  RR  AL  AL  DM  DM  RW
ADD R5 ← R3 + R4       IF  DE  X   X   X   X   RR  AL  AL  DM  DM  RW

With bypassing - the 2nd instruction has to wait at the execute (AL) stage only until the previous instruction has completed execution (AL stage). Data from the AL-stage output of the first instruction can be bypassed to the AL-stage input of the second instruction. Number of stalls = 1

Cycle:             1   2   3   4   5   6   7   8   9   10
ADD R3 ← R1 + R2   IF  DE  RR  AL  AL  DM  DM  RW
ADD R5 ← R3 + R4       IF  DE  RR  X   AL  AL  DM  DM  RW

Pair 2

Without bypassing - the 2nd instruction has to wait at RR (register read) until the previous instruction has completed writing into the register (RW, first half cycle). Number of stalls = 4

Cycle:             1   2   3   4   5   6   7   8   9   10  11  12  13
LD  R2 ← [R1]      IF  DE  RR  AL  AL  DM  DM  RW
ADD R4 ← R2 + R3       IF  DE  X   X   X   X   RR  AL  AL  DM  DM  RW

With bypassing - the 2nd instruction has to wait at the execute (AL) stage until the previous instruction has accessed memory and retrieved the data (DM stage). Data from the DM-stage output of the first instruction can be bypassed to the AL-stage input of the second instruction. Number of stalls = 3

Cycle:             1   2   3   4   5   6   7   8   9   10  11  12
LD  R2 ← [R1]      IF  DE  RR  AL  AL  DM  DM  RW
ADD R4 ← R2 + R3       IF  DE  RR  X   X   X   AL  AL  DM  DM  RW

Pair 3

Without bypassing - the 2nd instruction has to wait at RR (register read) until the previous instruction has completed writing into the register (RW, first half cycle). Number of stalls = 4

Cycle:             1   2   3   4   5   6   7   8   9   10  11  12  13
LD  R2 ← [R1]      IF  DE  RR  AL  AL  DM  DM  RW
SD  R3 → [R2]          IF  DE  X   X   X   X   RR  AL  AL  DM  DM  RW

With bypassing - the 2nd instruction has to wait at the execute (AL) stage until the previous instruction has accessed memory and retrieved the data (DM stage). Data from the DM-stage output of the first instruction can be bypassed to the AL-stage input of the second instruction for calculating the effective address using R2. Number of stalls = 3

Cycle:             1   2   3   4   5   6   7   8   9   10  11  12
LD  R2 ← [R1]      IF  DE  RR  AL  AL  DM  DM  RW
SD  R3 → [R2]          IF  DE  RR  X   X   X   AL  AL  DM  DM  RW

Pair 4

Without bypassing - the 2nd instruction has to wait at RR (register read) until the previous instruction has completed writing into the register (RW, first half cycle). Number of stalls = 4

Cycle:             1   2   3   4   5   6   7   8   9   10  11  12  13
LD  R2 ← [R1]      IF  DE  RR  AL  AL  DM  DM  RW
SD  R2 → [R3]          IF  DE  X   X   X   X   RR  AL  AL  DM  DM  RW

With bypassing - the 2nd instruction has to wait at the memory-access (DM) stage until the previous instruction has accessed memory and retrieved the data (DM stage). Data from the DM-stage output of the first instruction can be bypassed to the DM-stage input of the second instruction for the R2 value. Number of stalls = 1

Cycle:             1   2   3   4   5   6   7   8   9   10
LD  R2 ← [R1]      IF  DE  RR  AL  AL  DM  DM  RW
SD  R2 → [R3]          IF  DE  RR  AL  AL  X   DM  DM  RW

- Classroom slides: https://classroom.google.com/c/NTQzOTMxNzU5OTU4/p/NTQ0OTMzMTcxNTI3/details

Topic 5: Memory
- main memory access time
- Link: https://classroom.google.com/c/NTQzOTMxNzU5OTU4/p/NTAyODIwNzU4NDYz/details
Topic 8:

- Multiprocessor with UMA & NUMA

+ Consists of many fully programmable processors each capable of executing its own program

+ Shared address space architecture

Including:

* Uniform Memory Access (UMA) Multiprocessors

* Non-Uniform Memory Access (NUMA) Multiprocessors

Multiple processors connected to a single centralized memory – since all processors see the same memory organization → uniform memory
access (UMA)

- DSM (Distributed Shared Memory): is a resource management component of a distributed operating system that implements the
shared memory model in distributed systems, which have no physically shared memory. 

- UMA: is a shared memory architecture used in parallel computers.

- NUMA: is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the
processor

- Compare UMA and NUMA: in UMA, every processor has the same access time to the single centralized memory, which limits scalability; in NUMA, each processor accesses its local memory faster than remote memory, which scales better to larger processor counts.


- Amdahl’s law

- Calculate upper bound on the multiprocessor performance in GIPS

Consider a shared-memory multiprocessor built around a single bus with a data bandwidth of x GB/s. Instructions and data words are 4 B wide, and each instruction requires access to an average of 1.4 memory words (including the instruction itself). The combined hit rate for the caches is 98%. Compute an upper bound on the multiprocessor performance in GIPS. Address lines are separate and do not affect the bus data bandwidth.

Solution:

Hit rate = 98%, so miss rate = 2% = 0.02.
Average memory words per instruction = 1.4.
Instruction and data word size = 4 B.
Bus traffic per instruction = miss rate x average memory words x word size = 0.02 x 1.4 x 4 = 0.112 B.
Data bandwidth = x GB/s.
Absolute upper bound on performance = data bandwidth / bus traffic per instruction = x / 0.112 = 8.93x GIPS.

If instead we are given a bus width of 32 B (default) and a bus clock rate of y GHz, the data bandwidth is x = 32y GB/s, so the upper bound on multiprocessor performance is 8.93 x 32y = 285.76y ≈ 286y GIPS.
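As a sketch (my own helper; the values plugged in for x and y are arbitrary examples), the same bound can be computed in C:

```c
#include <stdio.h>

/* Upper bound on multiprocessor performance in GIPS for the bus
 * example above; x (GB/s) and y (GHz) are left as parameters. */
int main(void) {
    double miss_rate      = 0.02;  /* 1 - 0.98 combined cache hit rate */
    double words_per_inst = 1.4;   /* memory words touched per instruction */
    double word_size      = 4.0;   /* bytes per word */

    double bytes_per_inst = miss_rate * words_per_inst * word_size; /* 0.112 B */

    double x = 4.0;                /* assumed bus bandwidth in GB/s */
    printf("bound = %.2f GIPS for x = %.1f GB/s\n", x / bytes_per_inst, x);

    double y  = 1.0;               /* assumed bus clock in GHz */
    double bw = 32.0 * y;          /* 32 B wide bus => bandwidth in GB/s */
    printf("bound = %.2f GIPS for y = %.1f GHz\n", bw / bytes_per_inst, y);
    return 0;
}
```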

- Slides in Classroom: https://classroom.google.com/c/NTQzOTMxNzU5OTU4/p/NTU4MjUxMTQ2MjQ2/details

Part B: OS

- Slides in classroom: https://classroom.google.com/c/NTQzOTMxNzU5OTU4/p/NTYxODg1OTYzMTk2/details

- Exercises in the classroom

Chapter 1: Introduction

- Main purposes of an operating system:

+ Manage the computer's resources.


+ Establish a user interface.
+ Execute and provide services for application software.
- Services provided by an operating system

- System call
+ is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. A system call is a way for programs to interact with the operating system.
+ provides the services of the OS to user programs via an Application Program Interface (API).
+ provides an interface between a process and the OS, allowing user-level processes to request services of the OS.
+ system calls are the only entry points into the kernel.
+ programs needing resources must use system calls.
Services provided by system calls:
Process creation and management
Main memory management
File access, directory management
File system management
Device handling (I/O)
Protection
Networking.
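As a concrete illustration (a POSIX example of mine, not from the original notes), a user program enters the kernel through system calls such as write() and getpid():

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* write() is a thin library wrapper over the kernel's write
     * system call: the process traps from user mode into kernel
     * mode, the kernel does the I/O, and control returns. */
    write(STDOUT_FILENO, "hello via a system call\n", 24);

    /* getpid() is another system call: ask the kernel for our PID. */
    printf("pid = %ld\n", (long)getpid());
    return 0;
}
```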
- System programs, user programs
System program:
+ system programming can be defined as the act of building system software using system programming languages.

+ system programs can be divided into these categories:

File management.
Status information.
File modification.
Programming-language support.
Program loading and execution.
Communications.
User program:
+ is a program that must communicate with a resource manager for some or all of its processing. A user program starts a conversation with a resource manager to request a connection to a resource.
- OS structures
Monolithic structure:
A monolithic architecture is like a big container in which all the software components of an application are assembled and tightly coupled; each component fully depends on the others.
For example, if all the services provided by an application (customer services, cost services, product services) are directly connected, then changing the code in one place forces changes in all the connected services as well.
Advantage:
Fast execution speed.
Disadvantages:
Large and complex applications.
Slow development.
Unscalable.
Unreliable.
Inflexible.
Layered structure:
A layered structure is a type of system structure in which the different services of the operating system are split into various layers, where each layer has a specific, well-defined task to perform.
Design analysis:
The whole operating system is separated into several layers (from 0 to n). Each of the layers must have its own specific function to perform. There are some rules for implementing the layers:
1. The outermost layer must be the user interface layer.
2. The innermost layer must be the hardware layer.
3. A particular layer can access all the layers below it but cannot access the layers above it; that is, layer n-1 can access all the layers from n-2 down to 0, but it cannot access layer n.
Thus if the user layer wants to interact with the hardware layer, the request travels through all the layers from n-1 down to 1. Each layer must be designed and implemented such that it needs only the services provided by the layers below it.

Advantages:
Modularity.
Easy debugging.
Easy updates.
No direct access to hardware.
Abstraction.
Disadvantages:
Complex and careful implementation required.
Slower execution.
Microkernel:
The kernel is the core part of an OS that manages system resources. It also acts as a bridge between applications and the hardware of the computer, and it is one of the first programs loaded on start-up (after the bootloader).
A microkernel is one classification of kernel. Being a kernel, it manages all system resources, but in a microkernel the user services and kernel services are implemented in different address spaces: user services are kept in user address space and kernel services in kernel address space, which reduces the size of the kernel and of the operating system as well.
It provides minimal services for process and memory management.
Communication between client programs/applications and services running in user address space is established through message passing, which reduces the execution speed of the microkernel.
The OS remains largely unaffected because user services and kernel services are separated: if any user service fails, it does not affect kernel services.
It is easily extensible: if new services are to be added, they are added to user address space and hence require no modification of kernel space.
Advantages of the microkernel:
The kernel is small and separate, so it can function better.
Expanding the system is easier: new functionality is simply added as a system application without disturbing the kernel.
It is also portable, secure, and reliable.
Hybrid structure:
A hybrid structure combines the structures mentioned above instead of relying on just one.
For example, Linux's architecture is monolithic, yet many functions are organized as loadable modules, allowing them to be easily added to or removed from the kernel. The system keeps the fast speed of a monolithic structure while using the modular principle to increase flexibility.
As another example, the architecture of Windows is basically microkernel-like, but many Windows components still operate in kernel mode and share the same memory space so as not to hurt speed.
The iOS operating system combines a layered structure with a microkernel structure.
- Kernel mode and user mode
Kernel mode is the mode in which an executing program has full permission to access and control the computer hardware; for example, it can change the contents of any register.
User mode is the mode in which executing programs have very limited access to, and control of, the hardware.
- Distinction between kernel mode and user mode:
+ Access to resources: in kernel mode, the program has direct and unrestricted access to system resources; in user mode, the application program merely executes and starts.
+ Interruptions: in kernel mode, the whole operating system might go down if an interrupt occurs; in user mode, only the single process fails.
+ Names: kernel mode is also known as master mode, privileged mode, or system mode; user mode is also known as unprivileged mode, restricted mode, or slave mode.
+ Virtual address space: in kernel mode, all processes share a single virtual address space; in user mode, each process gets its own separate virtual address space.
+ Level of privilege: in kernel mode, applications have more privileges; in user mode, they have fewer privileges.
+ Restrictions: kernel mode can access both user programs and kernel programs, with no restrictions; user mode cannot directly access kernel programs and must request their services.
+ Mode bit value: the mode bit of kernel mode is 0; the mode bit of user mode is 1.
+ Memory references: kernel mode can reference both memory areas; user mode can only reference memory allocated for user mode.
+ System crash: a crash in kernel mode is severe and complicated to recover from; in user mode, a crash can be recovered from by simply resuming the session.
+ Access: only essential functionality is permitted to operate in kernel mode; user programs can access and execute in user mode.
+ Functionality: kernel mode can refer to any memory block in the system and can direct the CPU to execute an instruction, making it a very potent mode; user mode is the standard viewing mode, in which code cannot reference arbitrary memory blocks on its own and needs an Application Programming Interface (API) to achieve these things.

Chapter 3: Processes

- Process

- services provided by an operating system

- fork() operation

- distinction between kernel mode and user mode function

- Process schedule

- Classroom: https://classroom.google.com/c/NTQzOTMxNzU5OTU4/p/NTA3NjIzMjM5OTM3/details

PROCESS:

- PROCESS MEMORY:

- PROCESS STATE:
- PCB (PROCESS CONTROL BLOCK):

- OPERATION ON PROCESS:

- PROCESS SCHEDULING:

- CONTEXT SWITCH:

- INTER-PROCESS COMMUNICATION (IPC):

o SHARED MEMORY:

o MESSAGE PASSING:
Chapter 4: Multithread Programming
- Thread
Each execution unit of a process, i.e. a sequence of instructions allocated to the CPU for independent execution, is called a thread of execution.
Modern OSs support multithreading, allowing multiple threads to run at the same time.
In multithreading, the threads of a process share its text (program code), data, and files (the resources of that process), but each thread is managed by its own Thread Control Block (TCB) and has its own stack. Threads are managed in much the same way as processes. If one thread changes a shared variable, the other threads see the change.
Advantages:
Increased performance and saved time (due to parallel execution).
Easy sharing of resources and information.
Increased responsiveness.
Takes advantage of processor architectures with multiple CPUs.
Convenient program organization.
User-level thread: created and managed by the application; the OS does not know of the thread's existence.
Kernel-level thread: created and managed by the OS. The OS provides an API (application programming interface) for applications to create, delete, and modify thread-related parameters.
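As a minimal illustration (POSIX threads; the worker function and its names are my own), creating and joining two threads looks like this:

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread runs this function; threads share globals and the heap
 * but each has its own stack, as described above. */
static void *worker(void *arg) {
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t t[2];
    int ids[2] = {0, 1};

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);   /* wait for both threads to finish */
    return 0;
}
```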
- Differences between user-level threads and kernel-level threads:
1. Implemented by: user threads are implemented by users; kernel threads are implemented by the operating system (OS).
2. Recognition: the operating system does not recognize user-level threads; kernel threads are recognized by the operating system.
3. Implementation: implementing user threads is easy; implementing kernel threads is complicated.
4. Context switch time: less for user-level threads; more for kernel-level threads.
5. Hardware support: context switching between user-level threads requires no hardware support; kernel-level threads need hardware support.
6. Blocking operation: if one user-level thread performs a blocking operation, the entire process is blocked; if one kernel thread blocks, another thread can continue execution.
7. Multithreading: multithreaded applications built on user-level threads cannot take advantage of multiprocessing; kernels themselves can be multithreaded.
8. Creation and management: user-level threads can be created and managed more quickly; kernel-level threads take more time to create and manage.
9. Operating system: any operating system can support user-level threads; kernel-level threads are operating-system-specific.
10. Thread management: for user-level threads, the thread library contains the code for thread creation, message passing, thread scheduling, data transfer, and thread destruction; for kernel-level threads, the application code contains no thread-management code and is merely an API to kernel mode (the Windows operating system makes use of this feature).
11. Examples: user-level - Java threads, POSIX threads; kernel-level - Windows, Solaris.
12. Advantages: user-level threads are simple and quick to create, can run on any operating system, perform better than kernel threads (no system calls are needed to create threads), and switching between them does not need kernel-mode privileges; with kernel-level threads, multiple threads belonging to the same process can be scheduled on different processors, kernel routines themselves can be multithreaded, and when one thread is halted the kernel can schedule another thread of the same process.
13. Disadvantages: multithreaded applications on user-level threads cannot benefit from multiprocessing, and if a single user-level thread performs a blocking operation the entire process is halted; with kernel-level threads, transferring control within a process from one thread to another necessitates a mode switch to kernel mode, and they take more time to create and manage than user-level threads.

- Parent process and child process

A running program is a process. From this process, another process can be created, and there is a parent-child relationship between the two. This is achieved using a library function called fork(): fork() splits the running process into two processes; the existing one is known as the parent and the new process as the child.
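A minimal sketch of fork() in C (the printed messages are illustrative):

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();           /* split into parent and child */

    if (pid < 0) {
        perror("fork");           /* fork failed */
    } else if (pid == 0) {
        /* fork() returns 0 in the child */
        printf("child:  pid=%d parent=%d\n", getpid(), getppid());
    } else {
        /* fork() returns the child's PID in the parent */
        printf("parent: pid=%d child=%d\n", getpid(), pid);
        wait(NULL);               /* reap the child */
    }
    return 0;
}
```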
- Race conditions
A race condition is a scenario that occurs in a multithreaded environment when multiple threads share the same resource or execute the same piece of code. If not handled properly, it can lead to an undesirable situation in which the output state depends on the order of execution of the threads.
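A minimal sketch of a race condition using POSIX threads (the counter and loop bounds are arbitrary example values): two threads increment a shared counter without synchronization, and the final value depends on how their reads and writes interleave.

```c
#include <pthread.h>
#include <stdio.h>

long counter = 0;                 /* shared resource */

/* counter++ is a read-modify-write sequence; two threads doing it
 * concurrently can interleave and lose updates. */
static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                /* unsynchronized: race condition */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Expected 2000000, but typically less: the result depends on
     * the order of the two threads' reads and writes. */
    printf("counter = %ld\n", counter);
    return 0;
}
```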
- Synchronization

Processes that exist at the same time are called concurrent processes (also known as competing processes).
Managing concurrent processes is an important problem for the OS.
The problems raised by concurrent processes:
a/ Processes competing for resources
Processes always need resources such as memory, disks, peripheral devices, and I/O channels.
Processes have to wait to be granted resources, which affects their execution time; they are affected indirectly by the competing processes.
The critical section problem and guaranteeing mutual exclusion:
Example: two processes both request the printer; the printer can serve only one process at a time.
Mutual exclusion guarantees that if one process is using a resource, the other processes may not use that resource (such a resource is called a critical resource).
The section of a process's code that contains operations accessing critical resources is called its critical section.
Two processes must not be executing inside their critical sections at the same time. Solving this problem requires a mechanism for coordinating the activity of the OS and the processes.
Avoiding deadlock:
Guaranteeing mutual exclusion can itself lead to deadlock,
where the processes cannot proceed because each is waiting for the other.
Example:
Two processes P1 and P2 both need resources T1 and T2.
Initially P1 is granted T1 and P2 is granted T2.
P1 must wait for P2 to release T2, and P2 must wait for P1 to release T1.
=> Deadlock occurs; neither process can proceed.
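A minimal sketch of this deadlock in C, using two POSIX mutexes to stand in for resources T1 and T2 and threads to stand in for P1 and P2 (names and the sleep() timing are illustrative):

```c
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t t1 = PTHREAD_MUTEX_INITIALIZER;  /* resource T1 */
pthread_mutex_t t2 = PTHREAD_MUTEX_INITIALIZER;  /* resource T2 */

static void *p1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&t1);      /* P1 holds T1 */
    sleep(1);                     /* give P2 time to take T2 */
    pthread_mutex_lock(&t2);      /* ...and waits forever for T2 */
    pthread_mutex_unlock(&t2);
    pthread_mutex_unlock(&t1);
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    pthread_mutex_lock(&t2);      /* P2 holds T2 */
    sleep(1);
    pthread_mutex_lock(&t1);      /* ...and waits forever for T1 */
    pthread_mutex_unlock(&t1);
    pthread_mutex_unlock(&t2);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, p1, NULL);
    pthread_create(&b, NULL, p2, NULL);
    pthread_join(a, NULL);        /* never returns: deadlock */
    pthread_join(b, NULL);
    return 0;
}
```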
Avoiding starvation:
Because of mutual exclusion, a process can fall into starvation, i.e., it waits too long without ever getting its turn to use some resource.
Example:
Three processes P1, P2, P3 repeatedly request the same resource, with priority order 1 to 3.
P1 and P2 are granted the resource over and over: P1 has the highest priority so it goes first, then P2; when P2 finishes, P1 and P3 are both queued for the resource again, and by priority P1 again goes before P3.
P3 never gets its turn => it cannot make progress even though there is no deadlock.
b/ Processes cooperating through shared resources
Processes can share a common region of memory (global variables) or files.
This raises the problem of synchronization: the consistency of the data must be maintained.
It requires mutual exclusion and the avoidance of deadlock and starvation.
Example: two processes P1 and P2 both update two integer variables x and y:

Intended schedule:       A schedule produced by the dispatcher:
x = y = 2                x = y = 2
P1: x = x + 1            P1: x = x + 1
P1: y = y + 1            P2: x = x * 2   (P2 runs; this is a critical region, the processes execute in turn)
P2: x = x * 2            P2: y = y * 2
P2: y = y * 2            P1: y = y + 1
=> x == y                => x != y

Data consistency is broken.
A race condition is a situation in which several threads or processes read and write shared data and the result depends on the order of the read and write operations.
The problem can be solved by putting all operations that access and update the shared data of each process into a critical section and enforcing mutual exclusion.
c/ Processes communicating by message passing
Processes can exchange information directly with each other by sending messages (message passing),
via a programming-language library or the OS itself.
They neither share nor compete for resources, so mutual exclusion is not needed.
Deadlock and starvation can still occur, e.g., two processes each waiting for a message from the other, or a process waiting too long while the other processes exchange messages among themselves.
Requirements for a solution:
Mutual exclusion: only one process may be inside its critical section at a time.
Progress: a process executing outside its critical section must not prevent other processes from entering their critical sections.
Bounded waiting: a process must get to enter its critical section within some finite amount of time.
Software solutions:
Peterson's algorithm:
An algorithm for the critical section problem, used to synchronize two processes.
Two processes P0 and P1 (replacing i and j by 0 and 1) run concurrently. Suppose they are competing for a resource in their critical sections. Peterson's algorithm creates two global variables: an int turn, used to decide whether P0 or P1 enters the critical section, and a bool array flag of size 2, used to indicate which process is ready to enter. If flag[0] == true then P0 is ready to enter, and likewise for P1. Initially both flags are set to false.
The two processes run at the same time:
P0                                         P1
1/ flag[0] = true                          2/ flag[1] = true
   Both P0 and P1 are now ready to enter the critical section.
3/ turn = 1                                4/ turn = 0
   P0 sets turn to 1 to let P1 go first; P1 sets turn to 0 to let P0 go first.
   turn ends up holding the value written by whichever process yielded last (here P1, so turn = 0).
5/ P0's loop condition is false            6/ P1's loop condition is true
   Because its loop condition is false, P0 enters the critical section first.
   Because its loop condition is true, P1 keeps looping (the loop is a form of waiting).
7/ When done, P0 sets flag[0] = false      8/ P1's loop condition becomes false
   Once flag[0] is false, P1 exits its busy-wait loop and enters the critical section.
9/ flag[1] = false
   flag[1] is reset in the same way, so the processes alternate as in steps 5 to 8.

In practice this algorithm is awkward:
the waiting process busy-waits, still consuming CPU time just to keep checking, and
the repeated looping wastes the CPU.
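A sketch of Peterson's algorithm for two threads in C (my own rendering; C11 atomics stand in for the plain shared variables of the textbook version, since modern compilers and CPUs would otherwise reorder the accesses):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

atomic_bool flag[2];   /* flag[i]: thread i wants to enter */
atomic_int  turn;      /* whose turn it is to go first */
long counter = 0;      /* the shared data being protected */

static void *run(void *arg) {
    int i = *(int *)arg, j = 1 - i;
    for (int n = 0; n < 100000; n++) {
        atomic_store(&flag[i], true);   /* I want in */
        atomic_store(&turn, j);         /* but you may go first */
        while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
            ;                           /* busy-wait */
        counter++;                      /* critical section */
        atomic_store(&flag[i], false);  /* leave */
    }
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    int id0 = 0, id1 = 1;
    pthread_create(&t0, NULL, run, &id0);
    pthread_create(&t1, NULL, run, &id1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("counter = %ld (expected 200000)\n", counter);
    return 0;
}
```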
Hardware solutions:
Solutions in the hardware group:
Disabling interrupts (for example, with two processes, process 1 enters its critical section and uses interrupts to do some work such as writing to the screen; when process 2 then enters its critical section and wants to use interrupts, the OS refuses because process 1 is already using them); however, this solution reduces the flexibility of the OS.
Using special machine instructions, for example:
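The slide example is not reproduced in these notes; a typical special instruction for this purpose is test-and-set, sketched below with C11 atomics (atomic_flag_test_and_set atomically sets the flag and returns its old value, mirroring the hardware instruction):

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

/* Spin until the flag's previous value was false, i.e. until we are
 * the one who changed it from "free" to "taken" in a single step. */
void acquire(void) {
    while (atomic_flag_test_and_set(&lock))
        ;                      /* busy-wait */
}

void release(void) {
    atomic_flag_clear(&lock);  /* reset so another process may enter */
}
```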

Semaphores:
A mutual-exclusion method that does not depend on hardware support, using semaphores (signal flags), proposed by Dijkstra (a Dutch computer scientist).
It is a technique for managing concurrent processes by means of an integer value, called a semaphore.
A semaphore is a non-negative variable shared between threads. It is used to solve the critical section problem and to achieve process synchronization in a multiprocessing environment.
A semaphore S is an integer value that, apart from initialization, is accessed only through two operations, wait() and signal(). So what are wait() and signal()?
wait() is denoted P (short for proberen, Dutch for "to test").
signal() is denoted V (short for verhogen, Dutch for "to increment").
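A minimal sketch using POSIX semaphores (sem_wait corresponds to wait()/P and sem_post to signal()/V; the shared counter is illustrative):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t s;                        /* the semaphore protecting the data */
long counter = 0;

static void *run(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&s);           /* P / wait(): decrement or block */
        counter++;              /* critical section */
        sem_post(&s);           /* V / signal(): increment, wake a waiter */
    }
    return NULL;
}

int main(void) {
    sem_init(&s, 0, 1);         /* initial value 1 => binary semaphore */
    pthread_t a, b;
    pthread_create(&a, NULL, run, NULL);
    pthread_create(&b, NULL, run, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld\n", counter);
    sem_destroy(&s);
    return 0;
}
```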
The dining philosophers problem:
Five philosophers sit at a round table; in the middle of the table are food and five chopsticks, placed so that there is one chopstick to the right and to the left of each person.
Their job is thinking; when they eat, they stop thinking and use the two chopsticks beside them.
A philosopher may pick up the chopsticks one at a time, in any order.
After eating, the philosopher puts the two chopsticks down.
Model the 5 philosophers as 5 processes.
The critical resources are the chopsticks.
The critical section is the time spent eating.
Thinking is the part outside the critical section.
Solving the problem with semaphores:
Each chopstick (critical resource) is a semaphore.
PICKING UP a chopstick calls wait() on the corresponding semaphore.
PUTTING a chopstick down calls signal().
Create 5 semaphores (the 5 chopsticks) with initial value 1, and 5 philosopher processes.
Run the 5 processes at the same time. When philosopher i wants to eat, it must wait for chopsticks i and (i + 1) % number-of-semaphores (here (i + 1) % 5). The expression (i + 1) % 5 simply wraps around: for an index j that counts up toward max, (j + 1) % max gives the next index.
After acquiring both chopsticks, the philosopher may eat (enter the critical section). After eating, the philosopher puts chopsticks i and (i + 1) % 5 back down (signal) and then runs the part outside the critical section (thinking).
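A sketch of this semaphore solution in C using POSIX semaphores and threads (illustrative; note that this naive version can itself deadlock if every philosopher picks up their left chopstick at the same moment, which is why textbook treatments refine it):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 5
sem_t chopstick[N];             /* one binary semaphore per chopstick */

static void *philosopher(void *arg) {
    int i = *(int *)arg;
    for (int round = 0; round < 3; round++) {
        /* think ... */
        sem_wait(&chopstick[i]);            /* pick up left chopstick  */
        sem_wait(&chopstick[(i + 1) % N]);  /* pick up right chopstick */
        printf("philosopher %d eats\n", i); /* critical section        */
        sem_post(&chopstick[i]);            /* put both back down      */
        sem_post(&chopstick[(i + 1) % N]);
    }
    return NULL;
}

int main(void) {
    pthread_t t[N];
    int id[N];
    for (int i = 0; i < N; i++) sem_init(&chopstick[i], 0, 1);
    for (int i = 0; i < N; i++) {
        id[i] = i;
        pthread_create(&t[i], NULL, philosopher, &id[i]);
    }
    for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    return 0;
}
```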
- calculate the speedup gain of an application
- Classroom: https://classroom.google.com/c/NTQzOTMxNzU5OTU4/p/NTYyMTc0OTE5MTUy/details
