This document discusses parallel algorithms and their analysis. It begins by introducing parallel processors and algorithms that can leverage multiple processors working simultaneously. It then discusses different models of parallelism like SIMD and MIMD. A key model discussed is the PRAM model where multiple processors share memory. Examples are given of parallel search, finding the maximum value, and computing AND/OR of arrays. These examples show how algorithms can be parallelized to run in poly-log time, defining a new complexity class called NC.
Chapter 14: Parallel Algorithms
• As processing power continues to become cheaper, it is natural to build machines with multiple processors
  – parallel processors can execute multiple programs simultaneously instead of merely concurrently
• Can we also write a program such that parallel processors can work on that single program in parallel through some form of cooperation?
  – Yes, the solution will use a parallel algorithm
  – Questions:
    • How can we parallelize an algorithm?
    • How can we handle multiple processors accessing memory at the same time?
    • How will the parallel algorithm impact the problem's computational complexity?
    • What problems can be parallelized so that they obtain a speedup?

Parallelism vs. Sequentiality
• Suppose a sequential algorithm has a complexity of W(n) in the worst case
  – If we run this algorithm in parallel on p processors, the best we can hope for is a complexity of W(n) / p
• Will we achieve this maximum speedup?
  – Probably not, as any data dependencies will cause some processors to idle until the needed data becomes available
• Example from the book:
  – We want to put our shoes and socks on
  – Sequential algorithm: put on left sock, put on right sock, put on left shoe, put on right shoe
  – With 4 processors (hands), we cannot accomplish these 4 tasks in 1 time unit because the socks must be put on before the shoes
• We similarly could not insert the kth value in insertion sort before first inserting the first k-1 values in the list

Models of Parallelism
• SIMD: Single instruction, multiple data
  – Issue a single instruction to be carried out on different processors, where each processor handles a different datum
  – Within SIMD, we can subdivide models based on how processors communicate
    • Nearest-neighbor schemes:
      – Hypercube: has 2^d nodes where each node connects directly to exactly d neighbors (see figure 14.2a for a hypercube with d = 3)
      – Bounded degree: a node connects directly to d neighbors, but the size of the network is not restricted by d (see figure 14.3)
    • PRAMs (covered next)
• MIMD: Multiple instruction, multiple
data
  – Each processor receives its own instruction and data to work on (or, more commonly, each processor receives its own process and data to work on)

PRAMs
• PRAM: Parallel random access machine
  – A machine of p general-purpose processors
  – They all share RAM
  – Each processor may have its own local memory (e.g., cache, RAM or registers), but all communication between processors takes place in the shared RAM
  – Each processor knows its own id (processor id, or pid)
  – All processors are synchronized by one control unit to read, execute and write at the same time
  – Shared memory can accommodate concurrent reads; we will consider how writes are handled later
• PRAMs are impractical machines because of the complex nature of processor-to-processor communication and concurrent memory accesses
• They are, however, convenient theoretical machines for proving the complexity of parallel algorithms
  – Therefore, while hypercube, bounded-degree and MIMD machines are used in practice, we won't bother with those architectures for our consideration of parallel algorithms

Parallel Algorithms for PRAMs
• The basic strategy for parallelizing an algorithm is as follows:
  – Load the data (an array in our examples) into shared memory
  – Each processor can access a given array value concurrently (each processor will be interested in data from the array based on its pid)
  – Each processor performs its operation
  – Each processor writes its result to the array concurrently
    • Note: as long as each processor is writing to a different memory location in shared memory, the RAM can handle all writes concurrently (truly concurrent reads/writes of the same location are not possible in practice)
  – Communication of a previous result to another processor is performed through shared memory
  – As time goes on, fewer processors may be needed, although in practice each processor could continue to execute as long as it does not affect the result of a processor that is still needed
• Figure 14.3 demonstrates this idea, where each processor starts with data
pid and pid+1 (processors are even-numbered in this figure) and, as time goes on, fewer processors are used

Example: Parallel Search
• Search has improved from Θ(n) to Θ(n^(1/2)) to Θ(log n)
• With parallelism, search can be reduced to Θ(1):
  – In parallel:
    • for all processors, copy a[pid] to M[pid]
    • if (M[pid] == target) write pid to some variable k
  – If we assume only 1 element will equal target, then the location of that element is written to k, so that a[k] stores target
  – This algorithm uses n processors and is Θ(1)
• So, even though the array may be unsorted, we have improved over all previous search algorithms
• A problem with this algorithm arises if target appears in multiple array locations, because the PRAM cannot accommodate multiple writes of different values to k at the same time
• We will visit this problem of handling multiple writes to the same memory location later in this chapter

Parallel Tournament for Maximum
• Here we see a parallel algorithm to determine the maximum item in an array
  – The algorithm is based on the tournament idea from chapter 5; here, each tournament of a given round is performed by a different processor

    incr = 1
    while (incr < n)
        temp0 = M[pid]
        temp1 = M[pid + incr]
        if (temp0 < temp1)
            M[pid] = M[pid + incr]
        incr = incr * 2

• For n array items, this algorithm uses n / 2 processors and takes Θ(log n) time
• Each processor takes 2 array elements from positions M[pid] and M[pid + incr], finds the max and copies it into M[pid]
  – For instance: P0 looks at locations 0 and 1, P1 looks at locations 1 and 2, etc.
  – In the next iteration, P0 compares the items at 0 (max of 0, 1) and 2 (max of 2, 3)
  – Eventually, processor 0 compares the items at 0 and n / 2 for the max
• It should be easy to see that the above algorithm iterates log n times, so the complexity is 4 log n + 1 operations, or Θ(log n) comparisons

Formal Analysis
• The book presents the following theorem:
  – At the end of the tth iteration of the while loop, incr = 2^t and each cell M[i] for 0 <= i <= n - incr
contains the maximum of M[i] … M[i+incr-1]
  – The book proves this by induction
  – We do not need to be so formal to see that incr = 2^t: since we multiply incr by 2 each iteration, if incr starts at 1, then after t iterations incr = 2^t
  – By induction, we can see that the second result holds as follows:
    • Suppose that at the end of iteration t-1, M[i] contains max(M[i], …, M[i+incr/2-1]), where incr = 2^t
    • The next iteration then compares M[i+incr/2] and M[i], and by our assumption M[i+incr/2] stores max(M[i+incr/2], …, M[i+incr-1])
    • So M[i] becomes whichever is greater, and thus at the end of iteration t, M[i] is max(M[i], …, M[i+incr-1])

Example

Variations of the Parallel Tournament
• The "parallel tournament" approach is also called the "binary fan-in technique" – see figure 14.3
• We can use this approach and apply it to other problems:
  – Given n boolean values in an array, we can compute the
    • AND of the array by replacing M[i] = max(M[i], M[i+incr]) with M[i] = M[i] AND M[i+incr]
    • OR of the array by replacing M[i] = max(M[i], M[i+incr]) with M[i] = M[i] OR M[i+incr]
  – Given n int values, we can sum the items by replacing M[i] = max(M[i], M[i+incr]) with M[i] = M[i] + M[i+incr]
• All of these algorithms require n processors and perform the operation in Θ(log n) with concurrent reads but no write problems

A New Class: NC
• The class NC (poly-log time complexity) consists of those problems that can be solved by a PRAM with O(n^k) processors in O(log^m n) time
  – That is, if the problem can be solved using a PRAM whose number of processors is polynomial in n (n^2, n^3, n^6, etc.) and whose run time is O(log^m n) for some constant m, then the problem is in NC
• We have seen that finding the max, summation, array-AND and array-OR are all in NC
• What about other problems like sorting?
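The binary fan-in technique above can be sketched as a short sequential simulation. This is illustrative Python, not PRAM code: the inner loop runs the "parallel" comparisons of one round one after another, and the `op` parameter stands in for max, AND, OR or +.

```python
# Sequential simulation of binary fan-in: each round, active cells combine
# M[pid] with M[pid + incr]; incr doubles, so any associative operation
# finishes in ceil(log2 n) rounds.
def fan_in(values, op):
    M = list(values)
    n = len(M)
    incr = 1
    while incr < n:
        # Cells pid = 0, 2*incr, 4*incr, ... act "in parallel" this round.
        for pid in range(0, n - incr, 2 * incr):
            M[pid] = op(M[pid], M[pid + incr])
        incr *= 2
    return M[0]

print(fan_in([5, 1, 9, 3, 7, 2], max))                          # 9 (maximum)
print(fan_in([True, True, False, True], lambda a, b: a and b))  # False (array-AND)
print(fan_in([1, 2, 3, 4, 5], lambda a, b: a + b))              # 15 (summation)
```

With one processor per pair of cells, each round's combines are independent writes to distinct locations, which is why the slide's CREW PRAM needs no special write handling here.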
We will explore sorting later
  – Note: it may not be practical to have n^k processors (even when k = 1) for large input sizes
    • In practice, most parallel machines have no more than 1024 processors, often far fewer

Parallel Matrix Multiplication
• To perform matrix multiplication on two NxN matrices:

    x11 x12 x13     y11 y12 y13
    x21 x22 x23  *  y21 y22 y23
    x31 x32 x33     y31 y32 y33

  For instance:
    z11 = x11*y11 + x12*y21 + x13*y31
    z32 = x31*y12 + x32*y22 + x33*y32

  – we need to perform n multiplications and n-1 additions per element in the resulting matrix
  – The resulting matrix is NxN, so we have n^2 items
  – So matrix multiplication needs a total of n^3 multiplications and n^2 * (n-1) additions, making this an Θ(n^3) algorithm sequentially
• Parallel solutions:
  – Assume we have n^2 processors: each processor produces the result of one matrix entry, taking n multiplications and n-1 additions, or roughly 2n operations (Θ(n))
  – Assume we have n^3 processors: each processor does 1 multiplication, and then the additions for each entry can be done by binary fan-in in log n time on n processors, requiring log n + 1 operations (Θ(log n))

Parallel Reads vs. Writes
• All PRAMs can perform concurrent reads – that is, any number of processors can read the same memory location at the same time
• PRAMs can also perform multiple writes to different memory locations, but what about concurrent writes to the same location?
• The basic form of PRAM disallows parallel writes to the same memory location
  – This PRAM is known as a CREW PRAM (concurrent reads, exclusive writes)
• However, there are stronger PRAM models (known as CRCW), each of which relaxes the restriction more and more, resulting in more powerful but harder-to-implement PRAMs:
  – Common-write: processors can write to the same memory location concurrently as long as they all write the same value
  – Arbitrary-write: when multiple processors write to the same memory location, one is arbitrarily chosen
  – Priority-write: when multiple processors write to the same memory location, only the processor with the lowest pid is selected to write

Using a Priority-Write PRAM
• As an example, we implement insertion sort using a priority-write PRAM with n processors as follows:

    for (i = 1; i < n; i++)
        copy element a[j] to M[j] for all j between 0 and i-1
        temp = a[i]
        k = i                          // default position if no processor writes k
        if (M[pid] < temp)
            a[pid] = M[pid]            // element stays put
        else
            a[pid+1] = M[pid]          // element shifts right
            k = pid
        a[k] = temp

  – Notice that we must use a priority-write PRAM so that the proper location is written into k in the else branch: the lowest pid whose element is >= temp wins, giving the insertion point
• The loop iterates n - 1 times
  – Within each iteration, each processor performs up to 4 operations, or a total of 4 * (n - 1) parallel steps
  – So with n processors, we can perform a sort in Θ(n)

One Iteration of Parallel Insertion Sort
This process is repeated for i from 1 to n-1, so it takes 4 * (n - 1) operations if there are n processors

Boolean OR on n bits
• We want to OR together n different bits stored in n array elements
  – Notice that if any of the n bits is true, then the OR results in true
  – We can use this observation and the ability to handle concurrent writes to solve Boolean OR of n bits in Θ(1) time
• Use an n-processor PRAM that permits common-writes
  – Initialize a boolean variable k to false; each processor reads its bit a[pid]
  – If a[pid] == 1, then write true to k
  – k is then the result of the OR
• With 2 operations, this is Θ(1), requiring n processors
  – Note: without the ability to perform common-writes, this will not work!
• Could we use a similar approach for AND?
  – Yes: initialize k to true and have each processor write false to k if a[pid] is false

Finding Maximum in Θ(1)
• By using the common-write PRAM, we can now solve this problem in constant time
  – We use n^2 processors, where each processor is denoted p(i,j)
• Assign n processors p(i,0) … p(i,n-1) to array location i
  – Each processor p(i,j) compares the two array values a[i] and a[j]
  – If a[i] < a[j], then write 1 to b[i]; if a[j] < a[i], then write 1 to b[j]
    • There will be multiple concurrent writes to the same array location, but all writes are a 1, so this is permissible
  – Now assign a processor to each element in b (n total processors)
    • if (b[pid] == 0) then write pid to k
  – a[k] is the maximum item
• Why?
Because b[pid] == 0 only if a[pid] never lost a comparison, so a[pid] is the maximum in the array
• This algorithm takes 4 parallel operations, so it is Θ(1), requiring n^2 processors

Example of Parallel Max in Θ(1)

Parallel Merge
• We now focus on merging 2 subarrays in parallel, in support of a parallel MergeSort
  – Given two subarrays of k elements each, both already sorted
  – We want to combine them into a single sorted array of n = 2k elements
  – We will use 2k processors, each assigned to one value in one of the two subarrays
    • Assume the first subarray has elements 0…k-1 and the second has elements k…2k-1
    • Processor i (0 <= i <= 2k-1) will determine where to place element a[i] (if 0 <= i <= k-1) or b[i] (if k <= i <= 2k-1) in the merged array
  – We perform an "in-place merge" in parallel so that we do not need any additional array space
• We make the following observation:
  – An element a[x] will be placed into the array at position x + b(a[x]), where b(a[x]) tells us how many elements in b are less than a[x]; likewise, an element b[x] will be placed at position a(b[x]) + x, where a(b[x]) tells us how many elements in a are less than b[x]
  – How can we determine a(b[x]) or b(a[x])?
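The placement rule above can be sketched in Python: `bisect_left` counts how many elements of the other subarray are smaller than a given value (values assumed distinct, as the text assumes), which is exactly b(a[x]) or a(b[x]). For clarity this sketch writes into a separate output array rather than performing the text's in-place merge.

```python
# Each loop iteration plays the role of one processor: it computes its
# element's final position independently of all the others.
from bisect import bisect_left

def parallel_merge(a, b):
    # a and b are sorted lists with distinct values
    merged = [None] * (len(a) + len(b))
    for x, val in enumerate(a):
        merged[x + bisect_left(b, val)] = val   # x + b(a[x])
    for x, val in enumerate(b):
        merged[bisect_left(a, val) + x] = val   # a(b[x]) + x
    return merged

print(parallel_merge([1, 4, 6], [2, 3, 5]))  # [1, 2, 3, 4, 5, 6]
```

Since every position is computed from the inputs alone, no two "processors" ever write the same location, which is why a CREW PRAM suffices.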
Using Binary Search
• First, we assume that no two elements in the two combined subarrays share a value
  – This allows us to use a CREW PRAM
• A processor can find its element's proper position in the merged array using binary search
  – The following positions a[x] into the merged array:
    • position = findLocationUsingBinarySearch(b, a[x])
    • a[x + position] = a[x]
  – It should be obvious that this code is bounded by Θ(log n); in fact, it will take log n + 2 operations in the worst case, since binary search can require up to log n + 1 comparisons
  – We can therefore merge the two sorted subarrays into one combined sorted array using n processors in log n time, where each processor performs the above steps
• Pseudocode is given in figure 14.8

Parallel MergeSort
• Now that we have a parallel merge using n processors, we can rewrite MergeSort
  – Recall that in MergeSort we used recursion to divide the array in half and then merged the two sorted subarrays into a single array
    • We used recursion to keep track of the two subarrays to merge
  – Now that we have a parallel merge and parallel processors to combine any two subarrays, we can keep track of the various subarrays to merge on different processors
  – So we no longer need the recursive step; instead, we can replace it with iteration:

    for (k = 1; k < n; k = 2*k)
        for each i in 0, 2k, 4k, …, i < n do in parallel
            Pi executes merge(M, i, k)
            // that is, Pi merges the two subarrays i…i+k-1 and i+k…i+2k-1

Example and Analysis
• Each parallel merge is in Θ(log n)
• Since we double the size of the subarrays being merged at each iteration, it takes log n levels of merging, so we have log n * log n total parallel steps
  – log n * log n = log^2 n, or (log n)^2
• More precisely, the complexity is ½(log n + 1)(log n + 2) - 1
• This complexity is between Θ(log n) and Θ(n)
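The iterative (non-recursive) parallel MergeSort above can be sketched as a sequential simulation: the subarray width k doubles each pass, and all merges within one pass are independent, so on a PRAM they would run on separate processors.

```python
# Sequential simulation of iterative parallel MergeSort: pass t merges
# sorted runs of width k = 2^t; the merges inside a pass are independent.
def parallel_mergesort(values):
    M = list(values)
    n = len(M)
    k = 1
    while k < n:
        for i in range(0, n, 2 * k):        # these merges could run in parallel
            left, right = M[i:i + k], M[i + k:i + 2 * k]
            # standard two-way merge of the two sorted runs
            out, li, ri = [], 0, 0
            while li < len(left) and ri < len(right):
                if left[li] <= right[ri]:
                    out.append(left[li]); li += 1
                else:
                    out.append(right[ri]); ri += 1
            M[i:i + 2 * k] = out + left[li:] + right[ri:]
        k *= 2
    return M

print(parallel_mergesort([5, 3, 8, 1, 9, 2, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

The while loop runs ⌈log n⌉ times and each pass costs Θ(log n) of parallel merge time, matching the (log n)^2 bound above.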
CREW vs. CRCW PRAMs
• It is worth pointing out that any algorithm that is solvable using a CRCW PRAM can also be solved using a CREW PRAM
  – The CREW PRAM may require more time than the CRCW PRAM, but the problem can still be solved
  – How much more time? It has been shown that the difference is no more than a factor of log n
    • Recall that we could find the maximum array element in Θ(log n) using an n-processor CREW PRAM and in Θ(1) using an n^2-processor CRCW PRAM
• Therefore, any problem solvable by a CRCW PRAM in NC can also be solved by a CREW PRAM in NC, and so the class NC is independent of the type of PRAM
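As a closing illustration, the Θ(1) common-write maximum recalled above can be simulated sequentially. The nested loops stand in for the n^2 processors p(i,j); every concurrent writer stores the same value 1, which is exactly what a common-write CRCW PRAM permits.

```python
# Sequential simulation of the common-write CRCW maximum: b[i] is set to 1
# whenever a[i] loses a comparison; the element whose flag stays 0 never lost.
def crcw_max(a):
    n = len(a)
    b = [0] * n
    for i in range(n):          # all n*n comparisons happen "in parallel"
        for j in range(n):
            if a[i] < a[j]:
                b[i] = 1        # a[i] lost, so it cannot be the maximum
    k = b.index(0)              # the pid with b[pid] == 0 writes itself to k
    return a[k]

print(crcw_max([4, 9, 1, 7]))  # 9
```

On the CREW side, the same result comes from the Θ(log n) tournament with n/2 processors, illustrating the at-most-log-n gap between the two models.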