Data Structure (CTE 115)
LECTURE NOTE ON
DATA STRUCTURES
COURSE CODE: CTE 115
Mr Tajudeen A.A
Introduction to Data Structures
A data structure is a way of collecting and organising data so that we can perform operations
on the data effectively. Data structures arrange data elements according to some
relationship, for better organisation and storage. For example, suppose we have a
player's name, "Virat", and age, 26. Here "Virat" is of string data type and 26 is of integer data type.
We can organise this data as a record, such as a Player record, which holds both the player's name and age.
We can then collect and store players' records in a file or database as a data structure. For
example: "Dhoni" 30, "Gambhir" 31, "Sehwag" 33.
If you are familiar with object-oriented programming concepts, a class does the same thing: it
collects different types of data under one single entity. The only difference is that data structures
also provide techniques to access and manipulate data efficiently.
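The Player record described above can be sketched in C++ as a small struct. This is only an illustration: the names Player, name, age and oldest are assumptions introduced here, not part of the course material.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type: groups a player's name and age into one unit.
struct Player {
    std::string name;  // string data, e.g. "Virat"
    int age;           // integer data, e.g. 26
};

// Returns the oldest player in a non-empty collection of records.
Player oldest(const std::vector<Player>& players) {
    assert(!players.empty());  // precondition: at least one record
    Player best = players[0];
    for (const Player& p : players) {
        if (p.age > best.age) best = p;
    }
    return best;
}
```

Storing the three example records in a std::vector<Player> and passing it to oldest would return the record for "Sehwag".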
In simple language, data structures are structures programmed to store ordered data so that various
operations can be performed on it easily. A data structure represents how data is to be organised in
memory. It should be designed and implemented in a way that reduces complexity and
increases efficiency.
Data structures
A data structure is a specialized format for organizing, processing, retrieving and storing data. There
are several basic and advanced types of data structures, all designed to arrange data to suit a specific
purpose. Data structures make it easy for users to access and work with the data they need in
appropriate ways. Most importantly, data structures frame the organization of information so that
machines and humans can better understand it.
In computer science and computer programming, a data structure may be selected or designed to
store data for the purpose of using it with various algorithms. In some cases, the algorithm's basic
operations are tightly coupled to the data structure's design. Each data structure contains
information about the data values, relationships between the data and -- in some cases -- functions
that can be applied to the data.
For instance, in an object-oriented programming language, the data structure and its associated
methods are bound together as part of a class definition. In non-object-oriented languages, there
may be functions defined to work with the data structure, but they are not technically part of the
data structure.
A data item is a single unit of values. It is a raw fact which becomes information after processing. Data items
such as a date are called group items if they can be divided into sub-items. A date, for instance, is
represented by the day, the month and the year. A number is called an elementary item because it cannot be
subdivided into sub-items; it is treated as a single item. An entity is anything that has certain
attributes or properties, which may be assigned values. For example, the following are possible attributes and
their corresponding values for an entity known as STUDENT.
ATTRIBUTES   NAME   AGE   SEX    MATRIC NO
VALUES       Paul   21    Male   800654
Entities with similar attributes, for example all the 200-level Computer Science & Statistics students, form an
entity set.
Main functions of data Structures:
Seek to identify and develop entities, operations and appropriate classes of problems to use them.
Determine representations for abstract entities to implement abstract operations on concrete
representations.
Data structures are the building blocks for more sophisticated applications. They are designed by
composing data elements into a logical unit representing an abstract data type that has relevance to
the algorithm or application. An example of an abstract data type is a "customer name" that is
composed of the character strings for "first name," "middle name" and "last name."
It is not only important to use data structures, but it is also important to choose the proper data
structure for each task. Choosing an ill-suited data structure could result in slow runtimes or
unresponsive code. Five factors to consider when picking a data structure include the following:
Early programming languages -- such as Fortran, C and C++ -- enabled programmers to define their
own data structures. Today, many programming languages include an extensive collection of built-in
data structures to organize code and information. For example, Python lists and dictionaries,
and JavaScript arrays and objects are common coding structures used for storing and retrieving
information.
Software engineers use algorithms that are tightly coupled with the data structures -- such as lists,
queues and mappings from one set of values to another. This approach can be used in a variety of
applications, including managing collections of records in a relational database and creating an index
of those records using a data structure called a binary tree.
Some examples of how data structures are used include the following:
Storing data. Data structures are used for efficient data persistence, such as specifying the
collection of attributes and corresponding structures used to store records in a database
management system.
Managing resources and services. Core operating system (OS) resources and services are
enabled through the use of data structures such as linked lists for memory allocation, file
directory management and file structure trees, as well as process scheduling queues.
Data exchange. Data structures define the organization of information shared between
applications, such as TCP/IP packets.
Ordering and sorting. Data structures such as binary search trees -- also known as an ordered or
sorted binary tree -- provide efficient methods of sorting objects, such as character strings used
as tags. With data structures such as priority queues, programmers can manage items organized
according to a specific priority.
Indexing. Even more sophisticated data structures such as B-trees are used to index objects, such
as those stored in a database.
Searching. Indexes created using binary search trees, B-trees or hash tables speed the ability to
find a specific sought-after item.
Scalability. Big data applications use data structures for allocating and managing data storage
across distributed storage locations, ensuring scalability and performance. Certain big data
programming environments -- such as Apache Spark -- provide data structures that mirror the
underlying structure of database records to simplify querying.
Characteristics of data structures
Data structures are often classified by their characteristics. The following three characteristics are
examples:
1. Linear or non-linear. This characteristic describes whether the data items are arranged in
sequential order, such as with an array, or in an unordered sequence, such as with a graph.
2. Homogeneous or heterogeneous. This characteristic describes whether all data items in a given
repository are of the same type, as with a collection of elements in an array, or of various
types, as with an abstract data type defined as a structure in C or a class specification in Java.
3. Static or dynamic. This characteristic describes how the data structures are compiled. Static data
structures have fixed sizes, structures and memory locations at compile time. Dynamic data
structures have sizes, structures and memory locations that can shrink or expand, depending on
the use.
Linked List
Tree
Graph
Stack, Queue
etc.
All these data structures allow us to perform different operations on data. We select these data
structures based on which type of operation is required. We will look into these data structures in
more details in our later lessons.
Boolean, which stores logical values that are either true or false.
Integer, which stores a range of mathematical integers -- or counting numbers. Different sized
integers hold a different range of values -- e.g., a signed 8-bit integer holds values from -128 to
127, and an unsigned long 32-bit integer holds values from 0 to 4,294,967,295.
Floating-point numbers, which store a formulaic representation of real numbers.
Fixed-point numbers, which are used in some programming languages and hold real values but
are managed as digits to the left and the right of the decimal point.
Character, which uses symbols from a defined mapping of integer values to symbols.
Pointers, which are reference values that point to other values.
String, which is an array of characters followed by a stop code -- usually a "0" value -- or is
managed using a length field that is an integer value.
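The integer ranges quoted above can be checked with the fixed-width types from <cstdint>. This is a small sketch; the constant names and the helper wrapped_increment are assumptions chosen for illustration.

```cpp
#include <cstdint>
#include <limits>

// Smallest and largest values of a signed 8-bit integer.
constexpr int kInt8Min = std::numeric_limits<std::int8_t>::min();
constexpr int kInt8Max = std::numeric_limits<std::int8_t>::max();

// Largest value of an unsigned 32-bit integer.
constexpr std::uint32_t kUint32Max = std::numeric_limits<std::uint32_t>::max();

// Unsigned arithmetic wraps around: one past the maximum is zero.
constexpr std::uint32_t wrapped_increment(std::uint32_t v) {
    return static_cast<std::uint32_t>(v + 1u);
}
```

The wrap-around helper shows why the range matters: adding one to the largest unsigned value does not produce a larger number.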
Array. An array stores a collection of items at adjoining memory locations. Items that are the
same type are stored together so the position of each element can be calculated or retrieved
easily by an index. Arrays can be fixed or flexible in length.
Stack. A stack stores a collection of items in the linear order that operations are applied.
This order is last in, first out (LIFO); a queue, by contrast, uses first in, first out (FIFO) order.
Tree. A tree stores a collection of items in an abstract, hierarchical way. Each node is
associated with a key value, with parent nodes linked to child nodes -- or subnodes. There is
one root node that is the ancestor of all the nodes in the tree.
A binary search tree is a set of nodes where each has a value and can point to two child nodes.
Graph. A graph stores a collection of items in a nonlinear fashion. Graphs are made up of a finite
set of nodes, also known as vertices, and lines that connect them, also known as edges. These are
useful for representing real-world systems such as computer networks.
Trie. A trie, also known as a keyword tree, is a data structure that stores strings as data items that
can be organized in a visual graph.
The following are the units for identifying data: characters, fields, sub-fields,
records and files.
A file is a collection of logically related records, e.g. a students file or a stock file.
A record is a collection of logically related data fields, e.g. the data relating to one student in a students file.
In a database table, records are usually the rows. Therefore, the table below has three (3) records.
A field is a consecutive storage position of values. It is a unit of data within a record, e.g.
student's number, name, age. In a database, fields are usually the columns of a given table.
Data items such as a date are called group items if they can be divided into sub-items. A date,
for instance, is represented by the day, the month and the year. A number is called an elementary item because it
cannot be subdivided into sub-items, otherwise known as sub-fields. It is treated as a
single item.
A character is the smallest unit of information. Characters include letters, digits and special symbols such as +
(plus sign), - (minus sign), \, /, $, a, b, z, A, B, Z, etc. Every character requires one byte of memory
for storage in a computer system.
A set is a mathematical model for a collection of different things; a set contains elements or
members, which can be mathematical objects of any kind: numbers, symbols, points in space, lines,
other geometrical shapes, variables, or even other sets.
Need for Set Data Structure
Set data structures are commonly used in a variety of computer science applications, including
algorithms, data analysis, and databases. The main advantage of using a set data structure is that it
allows you to perform operations on a collection of elements in an efficient and organized way.
2. Ordered Set
An Ordered set is the common set data structure we are familiar with. It is generally implemented
using balanced BSTs and it supports O(log n) lookups, insertions and deletion operations.
Difference between Array, Set, and Map Data Structure:
Features           Array                                              Set
Duplicate values   Duplicate values                                   Unique values
Order              Ordered collection                                 Unordered collection
Size               Static                                             Dynamic
Retrieval          Elements in an array can be accessed using their   Iterate over the set to retrieve the value.
                   index
Operations         Adding, removing, and accessing elements           Set operations like union, intersection, and difference.
Memory             Stored as contiguous blocks of memory              Implemented using linked lists or trees
1. Insert an element: You can insert an element into a set using the insert function.
2. Check if an element is present: You can check if an element is present in a set using the
count function. The function returns 1 if the element is present, and 0 otherwise.
3. Remove an element: You can remove an element from a set using the erase function.
In the case of a hash table implementation, it will be like the following:
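As a minimal sketch of the insert, count and erase operations on a hash-table-based set, the following uses std::unordered_set from the C++ standard library. The struct SetDemoResult and the function set_demo are hypothetical names introduced only to collect the results; the original example from the notes is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical holder for what each operation reported.
struct SetDemoResult {
    std::size_t present_after_insert;  // count(20) after inserting 20
    std::size_t erased;                // elements removed by erase(20)
    std::size_t present_after_erase;   // count(20) after erasing 20
    std::size_t final_size;            // elements left in the set
};

SetDemoResult set_demo() {
    std::unordered_set<int> s;  // hash-table-based set
    s.insert(10);
    s.insert(20);
    s.insert(20);               // duplicate: the set keeps unique values
    assert(s.size() == 2);      // so only two elements are stored

    SetDemoResult r;
    r.present_after_insert = s.count(20);  // 1 if present, 0 otherwise
    r.erased = s.erase(20);                // number of elements removed
    r.present_after_erase = s.count(20);
    r.final_size = s.size();
    return r;
}
```

Replacing std::unordered_set with std::set would give the ordered, tree-based variant with the same three calls.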
Taking out Maximum and Minimum from Set Data Structure
Output
The set s1 is :
60 50 40 30 20 10
The set s2 after assign from s1 is :
10 20 30 40 50 60
s2 after removal of elements less than 30 :
30 40 50 60
s2.erase(50) : 1 removed
30 40 60
s1.lower_bound(40) : 40
s1.upper_bound(40) : 30
s2.lower_bound(40) : 40
s2.upper_bound(40) : 60
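Because an ordered set keeps its elements sorted, the minimum and maximum sit at its two ends. The sketch below assumes an ascending std::set<int>; the helper names set_min and set_max are illustrative and are not taken from the program that produced the output above.

```cpp
#include <cassert>
#include <set>

// Smallest element of a non-empty ascending set: the first element.
int set_min(const std::set<int>& s) {
    assert(!s.empty());
    return *s.begin();
}

// Largest element of a non-empty ascending set: the last element.
int set_max(const std::set<int>& s) {
    assert(!s.empty());
    return *s.rbegin();
}
```

On an ascending set holding 30, 40 and 60, lower_bound(40) refers to 40 (the first element not less than 40) and upper_bound(40) refers to 60 (the first element greater than 40), which matches the s2 lines in the output.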
Linear data structures are data structures in which data elements are stored in a linear sequence.
They include:
1. Arrays: A collection of elements stored in contiguous memory locations.
2. Linked Lists: A collection of nodes, each containing an element and a reference to the next node.
3. Stacks: A collection of elements with Last-In-First-Out (LIFO) order.
4. Queues: A collection of elements with First-In-First-Out (FIFO) order.
Linear data structures are used in many computer science applications such as searching, sorting,
and manipulating data. They offer efficient data access, but may require additional memory for
maintaining pointers between elements. A data structure is a particular way of organizing data in
a computer so that it can be used effectively. The idea is to reduce the space and time
complexities of different tasks. Below is an overview of some popular linear data structures.
1. Array
2. Linked List
3. Stack
4. Queue
1. Array
The array is a data structure used to store homogeneous elements at contiguous locations. The size
of an array must be provided before storing data.
An array is a linear data structure that stores a sequence of elements. It is defined as a
collection of items stored at contiguous memory locations.
We can also say that an array is a set of homogeneous data elements (it can hold only one type of data:
all floating-point numbers, all characters or all integers), with multiple
items of the same type stored together in one place in memory.
An array is an index-based data structure: the index identifies each element in the array and makes
it easy to calculate the position of each element by simply adding an offset to a base value.
An array whose elements are referenced by a single subscript is called a linear or one-dimensional array.
Arrays can also hold more complex data by using two subscripts; such an array is called a two-dimensional
array.
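The "base value plus offset" calculation can be written out by hand. This is a sketch for illustration only: element_address and flat_index are hypothetical helpers, and the two-dimensional formula assumes row-major layout, which is what C++ uses for built-in arrays.

```cpp
#include <cassert>
#include <cstddef>

// Address of element i: base address + i * (size of one element).
const int* element_address(const int* base, std::size_t i) {
    assert(base != nullptr);
    const char* bytes = reinterpret_cast<const char*>(base);
    return reinterpret_cast<const int*>(bytes + i * sizeof(int));
}

// Offset of element (row, col) in a two-dimensional array stored row by row.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}
```

In everyday code the compiler performs this arithmetic whenever marks[i] is written; the helper only makes the step visible.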
Example: For example, let us say, we want to store marks of all students in a class, we can use an
array to store them. This helps in reducing the use of a number of variables as we don’t need to
create a separate variable for marks of every subject. All marks can be accessed by simply traversing
the array.
Advantages of arrays:
1. Constant-time Access: Arrays allow for constant-time access to elements by using their index,
making it a good choice for implementing algorithms that need fast access to elements.
2. Memory Allocation: Arrays are stored in contiguous memory locations, which makes the memory
allocation efficient.
3. Easy to Implement: Arrays are easy to implement and can be used with basic programming
constructs like loops and conditionals.
Disadvantages of arrays:
1. Fixed Size: Arrays have a fixed size, so once they are created, the size cannot be changed. This can
lead to memory waste if an array is too large or dynamic resizing overhead if an array is too small.
2. Slow Insertion and Deletion: Inserting or deleting elements in an array can be slow, especially if
the operation needs to be performed in the middle of the array. This requires shifting all elements
to make room for the new element or to fill the gap left by the deleted element.
3. Cache Misses: Arrays can suffer from cache misses if elements are not accessed in sequential
order, which can lead to poor performance.
In summary, arrays are a good choice for problems where constant-time access to elements and
efficient memory allocation are required, but their disadvantages should be considered for
problems where dynamic resizing and fast insertion/deletion operations are important.
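The shifting cost of insertion and deletion in the middle of an array can be sketched as follows. The function names insert_at and remove_at, and the idea of tracking the element count separately from the fixed capacity, are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Inserts value at position pos in arr[0..count), shifting later elements
// one place to the right. Returns the new element count.
std::size_t insert_at(int arr[], std::size_t count, std::size_t capacity,
                      std::size_t pos, int value) {
    assert(count < capacity);  // fixed size: a free slot must exist
    assert(pos <= count);
    for (std::size_t i = count; i > pos; --i) {
        arr[i] = arr[i - 1];   // make room for the new element
    }
    arr[pos] = value;
    return count + 1;
}

// Removes the element at pos, shifting later elements one place to the left.
std::size_t remove_at(int arr[], std::size_t count, std::size_t pos) {
    assert(pos < count);
    for (std::size_t i = pos; i + 1 < count; ++i) {
        arr[i] = arr[i + 1];   // fill the gap left by the deleted element
    }
    return count - 1;
}
```

Both loops touch every element after pos, which is why these operations take time proportional to the array length in the worst case.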
2. Linked List
A linked list is a linear data structure (like an array) in which each element is a separate object
called a node. Each node is made up of two items: data and a reference to the next node. The data is
the value of the node, and the reference to the next node is held in a pointer.
A linked list is thus an ordered collection of nodes whose linear
order is maintained by pointers. It has the upper hand over an array in that the number of nodes, i.e. the
size of the linked list, is not fixed and can grow and shrink as required.
Types of Linked Lists:
1. Singly Linked List: In this type of linked list, every node stores the address or reference of the
next node in the list, and the last node has the next address or reference as NULL.
For example: 1->2->3->4->NULL
2. Doubly Linked List: In this type of linked list, there are two references associated with each node:
one reference points to the next node and one to the previous node. The advantage of this
data structure is that we can traverse in both directions, and for deletion we don't need to have
explicit access to the previous node. E.g. NULL<-1<->2<->3->NULL
3. Circular Linked List: Circular linked list is a linked list where all nodes are connected to form a
circle. There is no NULL at the end. A circular linked list can be a singly circular linked list or a doubly
circular linked list. The advantage of this data structure is that any node can be made as starting
node. This is useful in the implementation of the circular queues in the linked list. Eg. 1->2->3->1
[The next pointer of the last node is pointing to the first]
4. Circular Doubly Linked List: The circular doubly linked list is a combination of the doubly linked
list and the circular linked list. This linked list is bidirectional: each node contains two pointers,
the next pointer of the last node points to the first node, and the previous pointer of the first node
points to the last node.
Example: Consider the previous example where we made an array of marks of students. Now if a
new subject is added to the course, its marks are also to be added to the array of marks. But the size
of the array was fixed and it is already full, so no new element can be added. If we make the array
much larger than the number of subjects, most of the array may remain empty.
To reduce this wasted space, a linked list is used, which adds a node only when a new element is
introduced. Insertions and deletions also become easier with a linked list.
One big drawback of a linked list is that random access is not allowed. With arrays, we can access the i-th
element in O(1) time. In a linked list, it takes Θ(i) time.
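A singly linked list and the cost of reaching its i-th node can be sketched as below. The names Node, push_front, front, node_at and destroy are assumptions chosen for this illustration rather than an interface defined in the notes.

```cpp
#include <cassert>
#include <cstddef>

// One node: a value plus a pointer to the next node (nullptr at the end).
struct Node {
    int data;
    Node* next;
};

// Adds a node at the front in constant time and returns the new head.
Node* push_front(Node* head, int value) {
    return new Node{value, head};
}

// Value stored in the first node of a non-empty list.
int front(const Node* head) {
    assert(head != nullptr);
    return head->data;
}

// Follows i links from the head, so the cost grows with i.
// Returns nullptr if the list has fewer than i + 1 nodes.
Node* node_at(Node* head, std::size_t i) {
    Node* cur = head;
    while (cur != nullptr && i > 0) {
        cur = cur->next;
        --i;
    }
    return cur;
}

// Releases every node of the list.
void destroy(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```

Building the list 1->2->3->4->NULL takes four push_front calls in reverse order; no array has to be sized in advance, but node_at must walk the links one by one.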
Advantages of linked lists:
1. Dynamic Size: Linked lists are dynamic in size, so they can grow or shrink as needed without
wasting memory.
2. Efficient Insertion and Deletion: Linked lists provide efficient insertion and deletion operations, as
only the pointers to the previous and next nodes need to be updated.
3. Sequential Traversal: Linked lists support simple linear traversal from one node to the next.
Because nodes may be scattered in memory, however, they are usually less cache-friendly than arrays.
Disadvantages of linked lists:
1. Slow Access: Linked lists do not allow for constant-time access to elements by index, so accessing
an element in the middle of the list can be slow.
2. More Memory Overhead: Linked lists require more memory overhead compared to arrays, as each
element in the list is stored as a node, which contains a value and a pointer to the next node.
3. Harder to Implement: Linked lists can be harder to implement than arrays, as they require the use
of pointers and linked data structures.
In summary, linked lists are a good choice for problems where dynamic size and efficient
insertion/deletion operations are important, but their disadvantages should be considered for
problems where constant-time access to elements is necessary.
3. Stack
A stack, or LIFO (last in, first out) structure, is an abstract data type that serves as a collection of
elements, with two principal operations: push, which adds an element to the collection, and pop, which
removes the last element that was added. Both push and pop take place at the same
end, called the top of the stack. A stack can be implemented using either an array or a linked list.
It is an ordered collection of elements that behaves like a real physical stack or pile: as a linear data
structure, it allows insertion and deletion of items only at one end, the top of the stack. Stacks
are used throughout programming.
Example: Stacks are used for maintaining function calls (the last called function must finish execution
first); we can always remove recursion with the help of a stack. Stacks are also used to
reverse a word, to check for balanced parentheses, and in editors, where the word typed
last is the first to be removed by the undo operation. Similarly, stacks implement the back
functionality in web browsers.
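One of the uses mentioned above, checking for balanced parentheses, can be sketched with std::stack from the C++ standard library. The function name is_balanced and the choice of the three bracket pairs are assumptions made for this example.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Returns true if every bracket in text is closed in the right order.
bool is_balanced(const std::string& text) {
    std::stack<char> open;  // brackets seen but not yet closed
    for (char c : text) {
        if (c == '(' || c == '[' || c == '{') {
            open.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (open.empty()) return false;  // closing bracket with no opener
            char last = open.top();          // most recently opened bracket
            open.pop();
            bool match = (last == '(' && c == ')') ||
                         (last == '[' && c == ']') ||
                         (last == '{' && c == '}');
            if (!match) return false;
        }
    }
    return open.empty();  // anything left open is unbalanced
}
```

The last bracket opened must be the first one closed, which is exactly the LIFO order a stack provides.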
Primary Stack Operations:
void push(int data): When this operation is performed, an element is inserted
into the stack. The push operation puts a new item on the stack. Initially,
a stack is empty, with the top of the stack pointing to null. To add a data
element, item1, to the stack, the following operation is performed on the stack:
push(item1);
int pop(): When this operation is performed, an element is removed from the
top of the stack and is returned. The pop operation retrieves the top
of the stack and updates the top pointer to refer to the next top item. Thus, for
the stack shown in Figure 3.2, performing a pop() operation would return item4
and update the top of the stack to point to item3.
Auxiliary Stack Operations:
int top(): This operation will return the last inserted element that is at the top without removing
it.
int size(): This operation will return the size of the stack i.e. the total number of elements present
in the stack.
int isEmpty(): This operation indicates whether the stack is empty or not.
int isFull(): This operation indicates whether the stack is full or not.
Types of Stacks:
Register Stack: This type of stack is a memory element present in the memory unit and can
handle only a small amount of data. The height of the register stack is always limited, as the
register stack is very small compared to the memory.
Memory Stack: This type of stack can handle a large amount of data. The height of the
memory stack is flexible, as it can occupy a large amount of memory.
Array-Based Stack
In this section, an implementation of a stack using a one-dimensional array is provided. As shown in the
listing below, an array (myArray) of integers is used to implement a stack that can store a maximum of
five integers. The top variable is used to index the top item of the stack. The Stack class contains two
functions, push() and pop(), implementing the behaviour of the stack push and pop operations
described above.
#include <iostream>

using std::cout;

// Array-based stack that holds at most five integers.
class Stack
{
    int myArray[5];
    int top;   // index of the top item; -1 means the stack is empty
public:
    Stack()
    {
        top = -1;
    }
    void push(int x)
    {
        if (top >= 4)   // index 4 is the last valid slot
        {
            cout << "stack overflow\n";   // the stack is full
            return;
        }
        myArray[++top] = x;   // increase top by one and store the new item
        cout << "inserted " << x << "\n";
    }
    void pop()
    {
        if (top < 0)
        {
            cout << "stack is empty\n";
            return;
        }
        cout << "deleted " << myArray[top--] << "\n";   // report the item, then lower top
    }
};
Stack Example
Consider a stack S1 that is empty and implemented using an array. Specify the effect of the following
operations on S1:
push(item1) , push(item2), pop(), push(item3),pop(),pop(),pop().
Solution:
First of all the stack is empty, so top = -1, let’s solve this problem using a table as follows:
Operation     Resulting Stack Content   Return Value         Top Value
push(item1)   S1: [item1]               -                    item1
push(item2)   S1: [item1, item2]        -                    item2
pop()         S1: [item1]               item2                item1
push(item3)   S1: [item1, item3]        -                    item3
pop()         S1: [item1]               item3                item1
pop()         S1: []                    item1                -1
pop()         S1: []                    Error: empty stack   -
Advantages of Stacks:
1. LIFO (Last-In, First-Out) Order: Stacks allow for elements to be stored and retrieved in a LIFO
order, which is useful for implementing algorithms like depth-first search.
2. Efficient Operations: Stacks provide efficient push-and-pop operations, as only the top element
needs to be updated.
3. Easy to Implement: Stacks can be easily implemented using arrays or linked lists, making them a
simple data structure to understand and use.
Disadvantages of Stacks:
1. Fixed Size: Array-based stacks have a fixed size, so they can suffer from overflow if too many
elements are added. Any stack can suffer from underflow if too many elements are removed.
2. Limited Operations: Stacks only allow for push, pop, and peek (accessing the top element)
operations, so they are not suitable for implementing algorithms that require constant-time
access to elements or efficient insertion and deletion operations.
3. Unbalanced Operations: Stacks can become unbalanced if push and pop operations are
performed unevenly, leading to overflow or underflow.
In summary, stacks are a good choice for problems where LIFO order and efficient push and pop
operations are important, but their disadvantages should be considered for problems that require
dynamic resizing, constant-time access to elements, or more complex operations.
4. Queue
A queue, or FIFO (first in, first out) structure, is an abstract data type that serves as a collection of
elements, with two principal operations: enqueue, the process of adding an element to the collection
(the element is added at the rear), and dequeue, the process of removing the first element that was added
(the element is removed from the front). A queue can be implemented using either an array or a linked
list. A queue is defined as a linear data structure that is open at both ends.
Example: A queue, as the name says, is a data structure modelled on the queue at a bus stop
or train station, where the person standing at the front of the queue (who has waited the longest) is
the first to get a ticket. Queues suit any situation where resources are shared among multiple users and
served on a first-come, first-served basis. Examples include CPU scheduling and disk scheduling. Another
application of queues is when data is transferred asynchronously (data not necessarily received at the
same rate as sent) between two processes. Examples include IO buffers, pipes, file IO, etc.
Basic Operations on Queue:
void enqueue(int data): Inserts an element at the end of the queue i.e. at the rear end.
int dequeue(): This operation removes and returns an element that is at the front end of the
queue.
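The two operations can be sketched with a fixed-size circular array, in the same style as the array-based stack. This is one possible implementation under stated assumptions: the class name Queue, the capacity of five, and the helper functions isEmpty and isFull are chosen here for illustration.

```cpp
#include <cassert>

// Fixed-capacity queue of integers stored in a circular array.
class Queue {
    static const int kCapacity = 5;  // assumed capacity for illustration
    int items[kCapacity];
    int front;   // index of the oldest element
    int count;   // number of stored elements
public:
    Queue() : front(0), count(0) {}

    bool isEmpty() const { return count == 0; }
    bool isFull() const { return count == kCapacity; }

    // Inserts an element at the rear end.
    void enqueue(int data) {
        assert(!isFull());
        int rear = (front + count) % kCapacity;  // first free slot after the last element
        items[rear] = data;
        ++count;
    }

    // Removes and returns the element at the front end.
    int dequeue() {
        assert(!isEmpty());
        int value = items[front];
        front = (front + 1) % kCapacity;  // wrap around at the end of the array
        --count;
        return value;
    }
};
```

Wrapping the indices with the modulo operator lets slots freed by dequeue be reused, so no elements have to be shifted.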
Queue Example
Consider we have an empty queue Q1. The behaviour of the queue when performing a number of
operations is illustrated in the table below.
Insert(A); -
Insert (B); -
Remove(); A
Advantages of Queues:
1. FIFO (First-In, First-Out) Order: Queues allow for elements to be stored and retrieved in a FIFO
order, which is useful for implementing algorithms like breadth-first search.
2. Efficient Operations: Queues provide efficient enqueue and dequeue operations, as only the front
and rear of the queue need to be updated.
3. Dynamic Size: Queues can grow dynamically, so they can be used in situations where the number
of elements is unknown or can change over time.
Disadvantages of Queues:
1. Limited Operations: Queues only allow for enqueue, dequeue, and peek (accessing the front
element) operations, so they are not suitable for implementing algorithms that require constant-
time access to elements or efficient insertion and deletion operations.
2. Slow Random Access: Queues do not allow for constant-time access to elements by index, so
accessing an element in the middle of the queue can be slow.
3. Cache Unfriendly: Queues built on linked lists can be cache-unfriendly, as their nodes may be
scattered in memory, which can lead to poor cache utilization and performance.
Advantages of linear data structures:
Efficient data access: Elements can be easily accessed by their position in the sequence.
Dynamic sizing: Linear data structures can dynamically adjust their size as elements are added or
removed.
Ease of implementation: Linear data structures can be easily implemented using arrays or linked
lists.
Versatility: Linear data structures can be used in various applications, such as searching, sorting,
and manipulation of data.
Simple algorithms: Many algorithms used in linear data structures are simple and
straightforward.
Disadvantages of linear data structures:
1. Limited data access: Accessing elements not stored at the end or the beginning of the sequence
can be time-consuming.
2. Memory overhead: Maintaining the links between elements in linked lists and pointers in stacks
and queues can consume additional memory.
3. Complex algorithms: Some algorithms used in linear data structures, such as searching and
sorting, can be complex and time-consuming.
4. Inefficient use of memory: Linear data structures can result in inefficient use of memory if there
are gaps in the memory allocation.
5. Unsuitable for certain operations: Linear data structures may not be suitable for operations that
require constant random access to elements, such as searching for an element in a large dataset.
Sorting Techniques
Sorting refers to arranging data in a particular format. Sorting algorithm specifies the way to arrange
data in a particular order. Most common orders are in numerical or lexicographical order.
The importance of sorting lies in the fact that data searching can be optimized to a very high level, if
data is stored in a sorted manner. Sorting is also used to represent data in more readable formats.
Following are some of the examples of sorting in real-life scenarios −
Telephone Directory − The telephone directory stores the telephone numbers of people sorted
by their names, so that the names can be searched easily.
Dictionary − The dictionary stores words in an alphabetical order so that searching of any word
becomes easy.
In-place Sorting and Not-in-place Sorting
Sorting algorithms may require some extra space for comparison and temporary storage of a few data
elements. Some algorithms need no extra space beyond this small, fixed amount, and sorting happens
within the array itself. This is called in-place sorting. Bubble sort is an example of in-place sorting.
However, in some sorting algorithms, the program requires space which is more than or equal to the
elements being sorted. Sorting which uses equal or more space is called not-in-place sorting. Merge
sort is an example of not-in-place sorting.
Stable and Not Stable Sorting
If a sorting algorithm, after sorting the contents, does not change the sequence of similar content in
which they appear, it is called stable sorting.
If a sorting algorithm, after sorting the contents, changes the sequence of similar content in which
they appear, it is called unstable sorting.
Stability of an algorithm matters when we wish to maintain the sequence of original elements, like in a
tuple for example.
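Stability is easiest to see when records are sorted by only one of their fields. The sketch below uses std::stable_sort from the C++ standard library; the Record type, the function name sort_by_mark_stable and the sample data in the note that follows are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: (name, mark).
using Record = std::pair<std::string, int>;

// Sorts by mark only; records with equal marks keep their original order.
void sort_by_mark_stable(std::vector<Record>& records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) {
                         return a.second < b.second;  // compare marks, ignore names
                     });
}
```

Given the records (Ada, 70), (Ben, 60), (Cy, 70), (Dee, 60) in that order, a stable sort by mark yields Ben, Dee, Ada, Cy: Ben stays ahead of Dee and Ada stays ahead of Cy. An unstable algorithm is free to swap either pair.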
Important Terms: Some terms are commonly used while discussing sorting techniques.
Increasing Order
A sequence of values is said to be in increasing order if each successive element is greater than
the previous one. For example, 1, 3, 4, 6, 8, 9 are in increasing order, as every next element is
greater than the previous element.
Decreasing Order
A sequence of values is said to be in decreasing order if each successive element is less than the
previous one. For example, 9, 8, 6, 4, 3, 1 are in decreasing order, as every next element is less
than the previous element.
Bubble Sorting
Bubble sort is a simple sorting algorithm. It is a comparison-based algorithm in which each pair of
adjacent elements is compared and the elements are swapped if they are not in order.
How Bubble Sort Works?
We take an unsorted array for our example. Bubble sort takes O(n²) time, so we're keeping the example
short and precise.
Bubble sort starts with the very first two elements, comparing them to check which one is greater.
In this case, value 33 is greater than 14, so they are already in sorted locations. Next, we compare
33 with 27.
We find that 27 is smaller than 33, and these two values must be swapped.
The new array should look like this −
Implementation
One more issue we did not address in our original algorithm and its improved pseudocode is that, after
every iteration, the highest remaining value settles down at the end of the array. Hence, the next
iteration need not include the already sorted elements. For this purpose, in our implementation, we
restrict the inner loop to avoid already sorted values.
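The restriction on the inner loop can be sketched in Python as follows. This is one possible rendering, not the implementation from the original notes: the name bubble_sort, the early-exit swapped flag, and the sample values beyond 14, 33 and 27 are assumptions made for illustration.

```python
def bubble_sort(items):
    """Sort a list in place in increasing order and return it."""
    n = len(items)
    for done in range(n - 1):
        swapped = False
        # After `done` passes, the last `done` positions already hold
        # their final values, so the inner loop stops before them.
        for j in range(n - 1 - done):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        # Assumption: stop early if a full pass made no swaps.
        if not swapped:
            break
    return items


print(bubble_sort([14, 33, 27, 35, 10]))
```

Each pass compares one pair fewer than the previous pass, which saves work but does not change the O(n²) worst case.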
Insertion Sorting
This is an in-place comparison-based sorting algorithm. Here, a sub-list is maintained which is always
sorted. For example, the lower part of an array is maintained to be sorted. An element which is to be
inserted in this sorted sub-list has to find its appropriate place and then be inserted there.
Hence the name, insertion sort.
The array is searched sequentially and unsorted items are moved and inserted into the sorted sub-list
(in the same array).
How Insertion Sort Works?
We take an unsorted array for our example.
So we swap them.
This process goes on until all the unsorted values are covered in a sorted sub-list. Now we shall see
some programming aspects of insertion sort.
Algorithm
Now we have a bigger picture of how this sorting technique works, so we can derive simple steps
by which we can achieve insertion sort.
Step 1 − If it is the first element, it is already sorted.
Step 2 − Pick the next element.
Step 3 − Compare with all elements in the sorted sub-list.
Step 4 − Shift all the elements in the sorted sub-list that are greater than the value to be sorted.
Step 5 − Insert the value.
Step 6 − Repeat until the list is sorted.
Pseudocode
procedure insertionSort( A : array of items )
   int holePosition
   int valueToInsert

   /* array indices start at 0, so the last valid index is length(A) - 1 */
   for i = 1 to length(A) - 1 inclusive do:

      /* select value to be inserted */
      valueToInsert = A[i]
      holePosition = i

      /* locate hole position for the element to be inserted */
      while holePosition > 0 and A[holePosition-1] > valueToInsert do:
         A[holePosition] = A[holePosition-1]
         holePosition = holePosition - 1
      end while

      /* insert the number at hole position */
      A[holePosition] = valueToInsert

   end for

end procedure
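The pseudocode above translates fairly directly into Python. The sketch below keeps the same variable names in snake_case. The function name insertion_sort and the sample input are illustrative assumptions.

```python
def insertion_sort(a):
    """Sort list `a` in place in increasing order and return it."""
    for i in range(1, len(a)):
        # Select the value to be inserted.
        value_to_insert = a[i]
        hole_position = i

        # Shift larger elements of the sorted sub-list one place right
        # until the correct hole position is found.
        while hole_position > 0 and a[hole_position - 1] > value_to_insert:
            a[hole_position] = a[hole_position - 1]
            hole_position -= 1

        # Insert the value at the hole position.
        a[hole_position] = value_to_insert
    return a


print(insertion_sort([4, 3, 2, 10, 12, 1, 5, 6]))
```

Note that range(1, len(a)) stops at the last valid index, which matters because Python lists are indexed from 0.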
Selection Sorting
Selection sort is a simple sorting algorithm. It is an in-place comparison-based algorithm in which
the list is divided into two parts, the sorted part at the left end and the unsorted part at the right
end. Initially, the sorted part is empty and the unsorted part is the entire list.
The smallest element is selected from the unsorted part and swapped with the leftmost element of the
unsorted part, and that element becomes a part of the sorted part. This process continues, moving the
boundary of the unsorted part one element to the right each time.
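The description above can be sketched in Python as follows. The names selection_sort and boundary, and the sample values, are assumptions chosen for illustration.

```python
def selection_sort(items):
    """Sort a list in place in increasing order and return it."""
    n = len(items)
    # `boundary` is the index where the unsorted part begins.
    for boundary in range(n - 1):
        # Find the index of the smallest element in the unsorted part.
        smallest = boundary
        for j in range(boundary + 1, n):
            if items[j] < items[smallest]:
                smallest = j
        # Swap it with the leftmost unsorted element.
        if smallest != boundary:
            items[boundary], items[smallest] = items[smallest], items[boundary]
    return items


print(selection_sort([14, 33, 27, 10, 35, 19, 42, 44]))
```

Everything to the left of boundary is already in its final position, so the sorted part grows by one element per pass.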
How Selection Sort Works?
Consider the following depicted array as an
example.
The same process is applied to the rest of the
items in the array.
Following is a pictorial depiction of the entire
sorting process −
Tree
A tree is also one of the data structures that represent
hierarchical data. Suppose we want to show the
employees and their positions in the hierarchical form
then it can be represented as shown below:
A tree data structure is defined as a collection of objects or entities known as nodes that are
linked together to represent or simulate hierarchy.
A tree data structure is a non-linear data structure because it does not store data in a sequential
manner. It is a hierarchical structure, as elements in a tree are arranged in multiple levels.
In the tree data structure, the topmost node is known as the root node. Each node contains
some data, and the data can be of any type. In the above tree structure, the node contains the
name of the employee, so the type of data would be a string.
Each node contains some data and links or references to other nodes, which are called its
children.
Some basic terms used in Tree data structure.
Root: The root node is the topmost node in the tree hierarchy. In other words, the root node is the
one that doesn't have any parent. In the above structure, node numbered 1 is the root node of the
tree. If a node is directly linked to some other node, it would be called a parent-child relationship.
Child node: If the node is an immediate descendant of another node, then the node is known as a
child node of that node.
Parent: If the node contains any sub-node, then that node is said to be the parent of that sub-node.
Sibling: The nodes that have the same parent are known as siblings.
Leaf node: A node of the tree that doesn't have any child node is called a leaf node. A leaf node is
the bottom-most node on its branch of the tree. There can be any number of leaf nodes present
in a general tree. Leaf nodes can also be called external nodes.
Internal node: A node that has at least one child node is known as an internal node.
Ancestor node: An ancestor of a node is any predecessor node on the path from the root to that
node. The root node doesn't have any ancestors. In the tree shown in the above image, nodes
1, 2, and 5 are the ancestors of node 10.
Descendant: Any node that can be reached from a given node by following child links is a descendant
of that node; the immediate successor is its child. In the above figure, 10 is a descendant of node 5.
Recursive data structure: The tree is also known as a recursive data structure. A tree can be
defined recursively: it consists of a distinguished node, known as the root node, together with
links from the root to the roots of all its subtrees, each of which is itself a tree. The left subtree
is shown in the yellow color in the below figure, and the right subtree is shown in the red color.
The left subtree can be further split into subtrees shown in three different colors. Recursion means
reducing something in a self-similar manner. This recursive property of the tree data structure is
used in various applications.
Number of edges: If there are n nodes, then there would be n-1 edges. Each arrow in the structure
represents a link or path. Each node, except the root node, has exactly one incoming link, known
as an edge, which represents its parent-child relationship.
Depth of node x: The depth of node x can be defined as the length of the path from the root
to the node x. One edge contributes one unit of length to the path. So, the depth of node x can
also be defined as the number of edges between the root node and the node x. The root node
has depth 0.
Height of node x: The height of node x can be defined as the number of edges on the longest path
from the node x down to a leaf node. A leaf node has height 0.
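The definitions of edges, depth and height can be checked with a short Python sketch. The Node class, the helper names, and the exact tree shape are assumptions for illustration. Only the path 1, 2, 5, 10 is taken from the notes; nodes 3 and 4 are invented to fill out the example.

```python
class Node:
    """A tree node holding a value and a list of child nodes."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def count_nodes(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(count_nodes(c) for c in node.children)


def height(node):
    """Edges on the longest downward path from `node` to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(c) for c in node.children)


def depth(root, target, current=0):
    """Edges from `root` to the node holding `target`, or None."""
    if root.value == target:
        return current
    for child in root.children:
        found = depth(child, target, current + 1)
        if found is not None:
            return found
    return None


# Hypothetical tree: 1 -> (2, 3), 2 -> (4, 5), 5 -> (10)
tree = Node(1, [Node(2, [Node(4), Node(5, [Node(10)])]), Node(3)])

n = count_nodes(tree)
print("nodes:", n, "edges:", n - 1)
print("depth of 10:", depth(tree, 10))
print("height of root:", height(tree))
```

Both functions are recursive because each child of a node is itself the root of a smaller tree.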
Based on the properties of the Tree data structure, trees are classified into various categories.
Implementation of Tree
The tree data structure can be created by creating the nodes dynamically with the help of
pointers. The tree in memory can be represented as shown below:
The above picture can only be defined for binary trees, because a node in a binary tree can have at
most two children, while a node in a generic tree can have more than two children. The structure of
the node for generic trees would therefore be different from that of the binary tree.
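The difference between the two node structures can be sketched in Python, where object references play the role of pointers. The class names BinaryNode and GeneralNode and the sample labels are assumptions for illustration, not definitions from these notes.

```python
class BinaryNode:
    """Node of a binary tree: at most two children, left and right."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left      # reference to the left child, or None
        self.right = right    # reference to the right child, or None


class GeneralNode:
    """Node of a general tree: any number of children."""
    def __init__(self, data):
        self.data = data
        self.children = []    # list of references to child nodes

    def add_child(self, child):
        self.children.append(child)
        return child


# Hypothetical employee hierarchy stored as a binary tree.
root = BinaryNode("CEO", BinaryNode("CTO"), BinaryNode("CFO"))

# Hypothetical folder hierarchy stored as a general tree.
folder = GeneralNode("root")
for name in ("bin", "etc", "home"):
    folder.add_child(GeneralNode(name))

print(root.left.data, root.right.data)
print([child.data for child in folder.children])
```

A binary node needs only two fixed link fields, whereas a general node needs a growable collection of links.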
Applications of trees
Storing naturally hierarchical data: Trees are used to store data in a hierarchical
structure. For example, in the file system stored on a disk drive, the files and folders
form naturally hierarchical data and are stored in the form of trees.
Organize data: Trees are used to organize data for efficient insertion, deletion and searching. For
example, a balanced binary search tree takes O(log N) time to search for an element.
Trie: It is a special kind of tree that is used to store the dictionary. It is a fast and efficient way
for dynamic spell checking.
Heap: It is also a tree data structure implemented using arrays. It is used to implement priority
queues.
B-Tree and B+Tree: B-Tree and B+Tree are the tree data structures used to implement indexing
in databases.
Routing table: The tree data structure is also used to store the data in routing tables in the
routers.
General tree: The general tree is one of the types of tree data structure. In the general tree, a
node can have anywhere from 0 to n children. There is no restriction imposed on the degree of the
node (the number of children that a node can have). The topmost node in a general tree is known as
the root node. The children of a parent node are the roots of its subtrees.
There can be n subtrees in a general tree. In the general tree, the subtrees are
unordered, as no ordering is imposed on the nodes in a subtree.
In a non-empty tree, each downward edge from a node connects it to one of its child nodes. The root
node is labeled with level 0. The nodes that have the same parent are known as siblings.
Binary tree: Here, the name binary itself suggests two, as in the two binary digits 0 and 1. In a
binary tree, each node can have at most two child nodes. Here, at most means that a node may have
0, 1 or 2 children.
A complete binary tree is another specific type of binary tree where all the tree levels are filled
entirely with nodes, except possibly the lowest level of the tree. Also, in the last or the lowest
level of this binary tree, the nodes should reside as far to the left as possible. Here is the
structure of a complete binary tree:
A binary tree is said to be ‘perfect’ if all the internal nodes have exactly two children, and every
external or leaf node is at the same level or same depth within the tree. A perfect binary tree with
‘h’ levels has 2^h − 1 nodes; if the height h is instead counted in edges, as defined earlier with the
root at depth 0, the count is 2^(h+1) − 1. Here is the structure of a perfect binary tree:
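As a rough check of the node count, the following Python sketch builds a perfect binary tree with a given number of levels and counts its nodes. All names here (BinaryNode, build_perfect, count_nodes, perfect_levels) and the numbering of the node values are assumptions for illustration.

```python
class BinaryNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def build_perfect(levels, value=1):
    """Build a perfect binary tree with the given number of levels."""
    if levels == 0:
        return None
    return BinaryNode(value,
                      build_perfect(levels - 1, 2 * value),
                      build_perfect(levels - 1, 2 * value + 1))


def count_nodes(node):
    """Number of nodes in the subtree rooted at `node`."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def perfect_levels(node):
    """Return the number of levels if the tree is perfect, else -1."""
    if node is None:
        return 0
    left = perfect_levels(node.left)
    right = perfect_levels(node.right)
    # Both subtrees must be perfect and have the same number of levels.
    if left == -1 or left != right:
        return -1
    return left + 1


tree = build_perfect(3)
print(count_nodes(tree), 2 ** 3 - 1)   # both counts should agree
print(perfect_levels(tree))
```

Removing any single leaf makes the two subtrees of some node unequal, so perfect_levels then reports -1.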