Data Structures Using C
Introduction
In this book, we will explore various data structures, starting from the basics and
progressing to more complex concepts. Each topic will be explained with clear
examples, illustrations, and C code implementations to help you understand the
theoretical concepts and their practical applications. We will also delve into algorithm
analysis, which is vital for evaluating the efficiency of different data structures and
algorithms.
To begin our journey, let's clarify some fundamental terms. A data object is a named
place in memory that stores a value. It's the most basic unit of data. For example, an
integer variable int x; declares a data object x that can store an integer value.
An Abstract Data Type (ADT) is a mathematical model for data types. It defines the
logical properties of a data type, such as the values it can hold and the operations that
can be performed on it, without specifying how these properties are implemented. In
simpler terms, an ADT tells you what a data structure does, but not how it does it. For
example, a
List ADT defines operations like add , remove , get , and size , but doesn't specify
whether it's implemented using an array or a linked list. This abstraction allows
programmers to use the data structure without worrying about its underlying
implementation details, promoting modularity and reusability.
Data structures can be broadly classified into two categories: linear and non-linear
data structures.
Linear Data Structures: In linear data structures, data elements are arranged
sequentially, one after another. Each element has a predecessor and a successor,
except for the first and last elements. This arrangement makes it easy to traverse all
elements in a single run. Examples of linear data structures include:
Arrays: A collection of elements of the same data type stored in contiguous memory locations and accessed directly by index.
Linked Lists: A collection of nodes where each node contains data and a pointer
(or reference) to the next node in the sequence. Unlike arrays, elements are not
stored contiguously.
Stacks: A linear data structure that follows the Last-In, First-Out (LIFO) principle.
Operations are performed only at one end, called the 'top'.
Queues: A linear data structure that follows the First-In, First-Out (FIFO)
principle. Elements are added at one end (rear) and removed from the other end
(front).
Non-linear Data Structures: In non-linear data structures, data elements are not
arranged sequentially. Instead, they are organized in a hierarchical or network-like
manner. Each element can have multiple predecessors and successors, making them
more complex but also more flexible for representing complex relationships. Examples
of non-linear data structures include:
Trees: Hierarchical structures in which nodes are connected by edges, starting from a single root node and branching into child nodes (covered in detail in a later unit).
Graphs: A collection of nodes (vertices) and edges that connect pairs of nodes.
Graphs are used to model relationships between objects, such as social
networks, road maps, or computer networks.
When we talk about data structures and algorithms, it's crucial to understand how
efficient they are. Algorithm analysis is the process of determining the amount of
resources (like time and space) required by an algorithm to solve a given problem. This
helps us compare different algorithms and choose the most efficient one for a
particular task. The two primary measures of efficiency are time complexity and
space complexity.
Time Complexity: Time complexity measures the amount of time an algorithm takes
to run as a function of the input size. It's not about the actual execution time in
seconds, which can vary depending on the hardware, programming language, and
other factors. Instead, it's about how the number of operations grows with the input
size. We typically express time complexity using Big O notation (O), which describes
the upper bound or worst-case scenario of an algorithm's growth rate. For example:
O(1) - Constant Time: The execution time remains constant regardless of the
input size. Accessing an element in an array by its index is an O(1) operation.
O(log n) - Logarithmic Time: The execution time grows logarithmically with the
input size. This often occurs in algorithms that divide the problem into smaller
halves in each step, like binary search.
O(n) - Linear Time: The execution time grows linearly with the input size.
Traversing a linked list or searching for an element in an unsorted array are O(n)
operations.
O(n log n) - Linearithmic Time: The execution time grows proportionally to n
log n. Many efficient sorting algorithms, like Merge Sort and Quick Sort, have this
time complexity.
O(n^2) - Quadratic Time: The execution time grows quadratically with the input
size. This often occurs in algorithms with nested loops, like simple sorting
algorithms such as Bubble Sort or Insertion Sort.
O(2^n) - Exponential Time: The execution time doubles with each addition to
the input size. These algorithms are usually impractical for even moderately
sized inputs.
O(n!) - Factorial Time: The execution time grows extremely rapidly. These
algorithms are typically only feasible for very small input sizes.
Space Complexity: Space complexity measures the amount of memory an algorithm
requires as a function of the input size, again expressed using Big O notation. For example:
O(1) - Constant Space: The memory usage does not grow with the input size; the
algorithm needs only a fixed number of variables.
O(n) - Linear Space: The memory usage grows linearly with the input size. For
example, if an algorithm creates a copy of the input array, its space complexity
would be O(n).
O(log n) - Logarithmic Space: The memory usage grows logarithmically with the
input size. This can occur in recursive algorithms where the depth of recursion is
logarithmic.
Understanding both time and space complexity is crucial for designing efficient
algorithms and choosing the right data structure for a given problem. Often, there's a
trade-off between time and space; an algorithm that is very fast might require a lot of
memory, and vice-versa. The goal is to find an optimal balance based on the specific
requirements of the application.
Unit 2: Array
An Array is one of the simplest and most fundamental data structures. It is a collection
of elements, all of the same data type, stored in contiguous memory locations. This
contiguous storage is what makes arrays highly efficient for certain operations,
particularly direct access to elements. In C, arrays are declared with a fixed size,
meaning the number of elements they can hold is determined at compile time or
runtime and cannot be changed later. This fixed size is a key characteristic of arrays.
Homogeneous Elements: All elements in an array must be of the same data type
(e.g., all integers, all characters, all floats).
Fixed Size: Once an array is declared with a certain size, its size cannot be
changed during program execution. If you need more space, you typically have to
create a new, larger array and copy the elements.
Direct Access (Random Access): Elements can be accessed directly using their
index. The index typically starts from 0 for the first element, 1 for the second, and
so on. This means accessing the first element is as fast as accessing the last
element.
Array as an ADT:
get(i) : Returns the element stored at index i .
set(i, value) : Replaces the element at index i with value .
size() : Returns the total number of elements the array can hold.
While these are the core operations, other operations like insert (at a specific
position), delete (at a specific position), and search are also commonly performed
on arrays, though their efficiency can vary significantly depending on the
implementation.
// Declaring and initializing arrays (illustrative values)
int scores[3] = {100, 95, 87};
int numbers[2];
// Accessing elements
printf("First score: %d\n", scores[0]); // Output: 100
// Modifying elements
numbers[0] = 10;
numbers[1] = 20;
Arrays are fundamental to programming and serve as the building blocks for many
other complex data structures. Their simplicity and efficiency for direct access make
them invaluable for various applications.
Arrays are incredibly versatile and are used in a wide range of applications due to their
direct access capabilities and efficient storage of homogeneous data. Some common
applications include:
Implementing Other Data Structures: Arrays are often used as the underlying
storage mechanism for other data structures like stacks, queues, hash tables, and
even for representing graphs (e.g., adjacency matrix).
Sorting and Searching Algorithms: Many sorting algorithms (like Bubble Sort,
Selection Sort, Insertion Sort, Merge Sort, Quick Sort) and searching algorithms
(like Linear Search, Binary Search) operate directly on arrays.
Let's delve into two crucial applications of arrays: searching and sorting.
Binary Search
Binary Search is an efficient algorithm for locating a target element in a sorted array. It works by repeatedly halving the portion of the array that could contain the target:
1. Start in the Middle: The algorithm begins by comparing the target element with
the middle element of the sorted array.
2. Narrow the Search:
If the target element is smaller than the middle element, the search
continues in the left half of the array.
If the target element is larger than the middle element, the search
continues in the right half of the array.
3. Repeat: This process is repeated on the selected half until the element is found
or the search space is exhausted.
Prerequisite: The array must be sorted for Binary Search to work correctly.
Time Complexity: The time complexity of Binary Search is O(log n). This is because
with each comparison, the search space is halved, significantly reducing the number of
operations required as the input size grows. For an array of 1 million elements, a linear
search might take up to 1 million comparisons in the worst case, while a binary search
would take at most about 20 comparisons (log₂ 1,000,000 ≈ 19.9).
Example Implementation in C (Iterative):
#include <stdio.h>

int binarySearch(int arr[], int size, int target) {
    int low = 0;
    int high = size - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2; // Middle index (written this way to avoid overflow)
        if (arr[mid] == target) {
            return mid; // Element found, return its index
        } else if (arr[mid] < target) {
            low = mid + 1; // Target is in the right half
        } else {
            high = mid - 1; // Target is in the left half
        }
    }
    return -1; // Element not found
}
int main() {
int arr[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
int size = sizeof(arr) / sizeof(arr[0]);
int target = 23;
int result = binarySearch(arr, size, target);
if (result != -1) {
printf("Element %d found at index %d\n", target, result);
} else {
printf("Element %d not found in the array\n", target);
}
target = 10;
result = binarySearch(arr, size, target);
if (result != -1) {
printf("Element %d found at index %d\n", target, result);
} else {
printf("Element %d not found in the array\n", target);
}
return 0;
}
1. Insertion Sort
Insertion Sort is a simple sorting algorithm that builds the final sorted array (or list)
one item at a time. It is much less efficient on large lists than more advanced
algorithms such as quicksort, heapsort, or merge sort. However, it has some
advantages:
Efficient for Small Data Sets: It is efficient for small data sets or data sets that
are already substantially sorted.
In-Place and Stable: It requires only a constant amount of extra memory and does
not change the relative order of equal elements.
Imagine you have a hand of cards, and you want to sort them. You pick up one card at
a time and insert it into its correct position among the cards already in your hand.
Insertion sort works similarly:
1. The first element by itself forms the initial sorted part; the remaining elements
form the unsorted part.
2. Elements are taken from the unsorted part one at a time.
3. For each subsequent element in the unsorted part, it is picked and compared
with elements in the sorted part.
4. Elements in the sorted part that are greater than the picked element are shifted
one position to the right to make space.
5. The picked element is inserted into the gap, and the sorted part grows by one
element. The process repeats until the unsorted part is empty.
Time Complexity:
Worst Case: O(n^2) (e.g., reverse sorted array) - Each element might need to be
compared with and shifted past all elements in the sorted part.
Best Case: O(n) (e.g., already sorted array) - Only one comparison is needed for
each element.
Average Case: O(n^2) - on average, each element is shifted past about half of the
sorted part.
Example Implementation in C:
#include <stdio.h>
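// One possible implementation of the insertionSort function called in main below;
// the exact body is illustrative.
void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i]; // Element to be inserted into the sorted part
        int j = i - 1;
        // Shift elements of the sorted part that are greater than key one position right
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key; // Insert the element into the gap
    }
}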
int main() {
    int arr[] = {12, 11, 13, 5, 6};
    int n = sizeof(arr) / sizeof(arr[0]);
    insertionSort(arr, n);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]); // Prints: 5 6 11 12 13
    return 0;
}
2. Merge Sort
Merge Sort is an efficient, comparison-based sorting algorithm that follows the
divide-and-conquer paradigm:
1. Divide: The unsorted list is divided into n sublists, each containing one element
(a list of one element is considered sorted).
2. Conquer (Merge): Sublists are repeatedly merged to produce new sorted sublists
until only one sorted list remains; this final list is the sorted result.
Time Complexity: The time complexity of Merge Sort is O(n log n) in all cases (worst,
average, and best). This makes it a very reliable sorting algorithm, especially for large
datasets.
Space Complexity: Merge Sort typically requires O(n) auxiliary space because it
needs temporary arrays during the merging process. This can be a disadvantage for
memory-constrained environments.
Example Implementation in C:
#include <stdio.h>
#include <stdlib.h>
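// One possible implementation of the merge and mergeSort functions used by main
// below; the exact bodies are illustrative.
void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    // Temporary arrays for the two halves (the O(n) auxiliary space mentioned above)
    int* L = (int*) malloc(n1 * sizeof(int));
    int* R = (int*) malloc(n2 * sizeof(int));
    for (int i = 0; i < n1; i++) L[i] = arr[left + i];
    for (int j = 0; j < n2; j++) R[j] = arr[mid + 1 + j];
    int i = 0, j = 0, k = left;
    // Repeatedly take the smaller front element of the two halves
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) arr[k++] = L[i++];
        else arr[k++] = R[j++];
    }
    while (i < n1) arr[k++] = L[i++]; // Copy any remaining elements of the left half
    while (j < n2) arr[k++] = R[j++]; // Copy any remaining elements of the right half
    free(L);
    free(R);
}

void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);      // Sort the left half
        mergeSort(arr, mid + 1, right); // Sort the right half
        merge(arr, left, mid, right);   // Merge the two sorted halves
    }
}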
int main() {
    int arr[] = {12, 11, 13, 5, 6, 7};
    int arr_size = sizeof(arr) / sizeof(arr[0]);
    mergeSort(arr, 0, arr_size - 1); // Sorts the array in place: 5 6 7 11 12 13
    return 0;
}
3. Quick Sort
Quick Sort is another highly efficient, comparison-based sorting algorithm that also
follows the divide-and-conquer paradigm. It is generally considered one of the fastest
sorting algorithms in practice for large datasets. Its efficiency comes from its ability to
reduce the problem into smaller, independent sub-problems.
1. Choose a Pivot: Select an element from the array, called the 'pivot'. The choice
of pivot significantly impacts performance. Common strategies include picking
the first, last, middle, or a random element.
2. Partition: Rearrange the array such that all elements smaller than the pivot
come before it, and all elements greater than the pivot come after it. Elements
equal to the pivot can go on either side. After partitioning, the pivot is in its final
sorted position.
3. Recurse: Recursively apply the same two steps to the sub-array of elements to
the left of the pivot and to the sub-array of elements to the right of the pivot.
Time Complexity:
Worst Case: O(n^2) (e.g., already sorted array or reverse sorted array, if a bad
pivot choice is made consistently) - This happens when the partition always
results in one sub-array with n-1 elements and another with 0 elements.
Best Case: O(n log n) (e.g., pivot always divides the array into two roughly equal
halves).
Average Case: O(n log n) - In practice, Quick Sort performs very well on average.
Example Implementation in C:
#include <stdio.h>
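// One possible implementation of the partition and quickSort functions used by
// main below (Lomuto partition with the last element as pivot); the exact bodies
// are illustrative.
void swap(int* a, int* b) {
    int t = *a;
    *a = *b;
    *b = t;
}

int partition(int arr[], int low, int high) {
    int pivot = arr[high]; // Choose the last element as the pivot
    int i = low - 1;       // Boundary of the elements known to be smaller than the pivot
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]); // Place the pivot in its final sorted position
    return i + 1;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high); // Index of the pivot after partitioning
        quickSort(arr, low, pi - 1);        // Sort the elements before the pivot
        quickSort(arr, pi + 1, high);       // Sort the elements after the pivot
    }
}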
int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]); // Prints: 1 5 7 8 9 10
    return 0;
}
Strings
In C, a string is an array of characters terminated by a null character '\0' .
char str2[] = "Hello"; // Illustrative value; any string starting with 'H' matches the output below
// Accessing characters
printf("First character of str2: %c\n", str2[0]); // Output: H
C provides a rich set of library functions in the <string.h> header for manipulating
strings. Some of the most commonly used functions include:
strlen(str) : Returns the length of the string str , excluding the null
terminator.
strcpy(dest, src) : Copies the string src into dest , including the null terminator.
strcat(dest, src) : Appends a copy of the string src to the end of dest .
strcmp(s1, s2) : Compares two strings lexicographically; it returns 0 if they are
equal, a negative value if s1 comes before s2 , and a positive value if s1 comes
after s2 .
#include <stdio.h>
#include <string.h>
int main() {
char s1[50] = "Programming";
char s2[50] = " in C";
char s3[50];
int len;
// strlen example
len = strlen(s1);
printf("Length of s1: %d\n", len); // Output: 11
// strcpy example
strcpy(s3, s1);
printf("s3 after strcpy: %s\n", s3); // Output: Programming
// strcat example
strcat(s1, s2);
printf("s1 after strcat: %s\n", s1); // Output: Programming in C
// strcmp example
char s4[] = "apple";
char s5[] = "banana";
char s6[] = "apple";
printf("strcmp(s4, s5): %d\n", strcmp(s4, s5)); // Negative value: "apple" comes before "banana"
printf("strcmp(s4, s6): %d\n", strcmp(s4, s6)); // 0: the strings are equal
return 0;
}
Understanding strings as character arrays and utilizing the standard library functions
is crucial for effective text processing in C. While simple, they form the basis for more
complex text manipulation and parsing tasks.
Unit 3: Linked List
A Linked List is a linear data structure, similar to an array, but it stores elements non-
contiguously. Instead of storing data in adjacent memory locations, a linked list stores
elements at arbitrary locations and links them together using pointers. Each element
in a linked list is called a node. A node typically consists of two parts:
1. Data Part: Stores the actual value of the element.
2. Pointer (or Link) Part: Stores the address of the next node in the sequence.
The first node of the linked list is called the head. The head pointer stores the address
of the first node. If the list is empty, the head pointer is NULL . The last node in the
linked list points to NULL , indicating the end of the list.
Dynamic Size: Unlike arrays, linked lists can grow or shrink in size during
runtime. Memory is allocated dynamically as needed.
Efficient Insertions and Deletions: Inserting or deleting a node only requires
updating a few pointers; no shifting of elements is needed.
More Memory Overhead: Each node requires extra memory for the pointer part,
which can be a disadvantage compared to arrays for storing simple data types.
No Random Access: Accessing an element takes O(n) time in the worst case.
Linked lists come in various forms, each with its own advantages and use cases:
1. Singly Linked List:
This is the simplest form of a linked list, as described above. Each node contains data
and a pointer to the next node. Traversal is possible only in one direction (forward).
struct Node {
int data;
struct Node* next;
};
2. Doubly Linked List:
In a doubly linked list, each node contains data, a pointer to the next node, and a
pointer to the previous node. This allows for traversal in both forward and backward
directions.
Advantages:
Deletion of a given node is more efficient as you have access to the previous node
directly.
Disadvantages:
Each node requires extra memory for the additional previous pointer.
Insertion and deletion must update more pointers, so the operations are slightly more
complex to implement.
struct Node {
int data;
struct Node* prev;
struct Node* next;
};
3. Circular Linked List:
In a circular linked list, the last node points back to the first node (head), forming a
circle. This can be a singly or doubly linked list. There is no NULL pointer at the end of
the list.
Advantages:
The entire list can be traversed starting from any node, which is useful for
applications that repeatedly cycle through the elements (e.g., round-robin scheduling).
Disadvantages:
Care must be taken to avoid infinite loops during traversal if not handled
correctly.
struct Node {
int data;
struct Node* next;
}; // The last node's next pointer points to the head
Let's explore the fundamental operations performed on a singly linked list. These
operations form the building blocks for more complex linked list applications.
#include <stdio.h>
#include <stdlib.h>
struct Node {
int data;
struct Node* next;
};
// Insert a new node at the beginning of the list and return the new head
struct Node* insertAtBeginning(struct Node* head, int new_data) {
    struct Node* new_node = (struct Node*) malloc(sizeof(struct Node));
    new_node->data = new_data;
    new_node->next = head; // The new node points to the old head
    head = new_node;       // The new node becomes the head
    return head;
}
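// A sketch of the printList traversal routine referred to in the next section;
// the body shown here is illustrative.
void printList(struct Node* head) {
    struct Node* current = head;
    while (current != NULL) {
        printf("%d -> ", current->data);
        current = current->next;
    }
    printf("NULL\n");
}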
int main() {
    struct Node* head = NULL; // Start with an empty list
    head = insertAtBeginning(head, 30);
    head = insertAtBeginning(head, 20);
    head = insertAtBeginning(head, 10);
    printList(head); // Prints: 10 -> 20 -> 30 -> NULL
    return 0;
}
2. Traverse:
Traversing a linked list means visiting each node in the list, typically from the head to
the tail. This operation is essential for printing the list, searching for an element, or
performing any operation on all elements.
(See printList function in the insertAtBeginning example above for traversal
implementation.)
Inserting a node can be done at the beginning (as shown above), after a specific node,
or at the end of the list.
// Insert a new node at the end of the list and return the head
struct Node* insertAtEnd(struct Node* head, int new_data) {
    // 1. Allocate the new node
    struct Node* new_node = (struct Node*) malloc(sizeof(struct Node));
    // 2. Put in the data
    new_node->data = new_data;
    // 3. This new node is going to be the last node, so make its next as NULL
    new_node->next = NULL;
    // 4. If the Linked List is empty, then make the new node as head
    if (head == NULL) {
        head = new_node;
        return head;
    }
    // 5. Otherwise, traverse to the last node and link the new node after it
    struct Node* last = head;
    while (last->next != NULL)
        last = last->next;
    last->next = new_node;
    return head;
}
4. Delete:
Deleting a node from a linked list involves removing a specific node, which can be the
head, a node in the middle, or the tail. The key is to correctly update the pointers of
the surrounding nodes.
// Search for the key to be deleted, keep track of the previous node
// as we need to change 'prev->next'
while (temp != NULL && temp->data != key) {
prev = temp;
temp = temp->next;
}
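The fragment above is the search portion of a complete deletion routine. A minimal sketch of one such routine, assuming the same struct Node and headers as the earlier listings (the exact body and function name are illustrative):

struct Node* deleteNode(struct Node* head, int key) {
    struct Node* temp = head;
    struct Node* prev = NULL;
    // If the head node itself holds the key to be deleted
    if (temp != NULL && temp->data == key) {
        head = temp->next;
        free(temp);
        return head;
    }
    // Search for the key to be deleted, keep track of the previous node
    // as we need to change 'prev->next'
    while (temp != NULL && temp->data != key) {
        prev = temp;
        temp = temp->next;
    }
    // Key was not present in the list
    if (temp == NULL)
        return head;
    // Unlink the node from the list and free its memory
    prev->next = temp->next;
    free(temp);
    return head;
}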
5. Search:
Searching for an element in a linked list involves traversing the list from the head and
comparing the data of each node with the target value until a match is found or the
end of the list is reached.
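A minimal sketch of such a search routine, assuming the same struct Node as above (the name and return convention are illustrative):

// Returns the zero-based position of key in the list, or -1 if it is not present
int searchList(struct Node* head, int key) {
    struct Node* current = head;
    int position = 0;
    while (current != NULL) {
        if (current->data == key)
            return position;
        current = current->next;
        position++;
    }
    return -1;
}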
Linked lists are used in various applications where dynamic data storage and efficient
insertions/deletions are required. One significant application is the representation of
polynomials.
Using a linked list, each node can represent a term in the polynomial. A node would
typically contain:
The coefficient of the term.
The exponent (power) of the term.
A pointer to the next term in the polynomial.
struct PolyNode {
int coeff;
int exp;
struct PolyNode* next;
};
This dynamic nature of linked lists makes them a suitable choice for representing
sparse polynomials (polynomials with many zero terms) and for performing algebraic
operations on them.
Unit 4: Stack
A Stack is a linear data structure that follows a particular order in which operations are
performed. The order is LIFO (Last In, First Out) or FILO (First In, Last Out). This
means the element that is inserted last is the first one to be removed. Think of a stack
of plates: you can only add a new plate to the top, and you can only remove the
topmost plate. The plate that was put on last is the first one you take off.
LIFO Principle: The last element added is the first one to be removed.
Single End Operations: All operations (insertion and deletion) occur at one end,
called the top of the stack.
Abstract Data Type: A stack is an ADT, meaning its behavior is defined by its
operations, not by its implementation.
push(item) : Adds an element to the top of the stack.
pop() : Removes the element from the top of the stack. It also returns the
removed element.
peek() or top() : Returns the top element of the stack without removing it.
isEmpty() : Checks if the stack is empty. Returns true if empty, false otherwise.
isFull() : Checks if the stack is full. This operation is relevant only for fixed-size
(array-based) stack implementations.
Stacks can be implemented in two primary ways: using arrays (static implementation)
or using linked lists (dynamic implementation).
Static Implementation (Array-based Stack)
An array-based stack stores its elements in a fixed-size array and keeps an integer
top that holds the index of the topmost element; top is -1 when the stack is empty.
Advantages:
Simple to implement.
Disadvantages:
Fixed size: The maximum size of the stack must be defined at compile time. If the
stack overflows (attempts to push an element onto a full stack), it can lead to
errors.
Wasted space: If the stack is not always full, some memory allocated for the
array might be unused.
struct Stack {
int arr[MAX_SIZE];
int top;
};
A linked list-based stack uses a singly linked list to store the stack elements. The push
and pop operations are performed at the head of the linked list, as this is the most
efficient place for insertions and deletions in a singly linked list (O(1) time complexity).
The head pointer of the linked list acts as the top of the stack.
Advantages:
Dynamic size: The stack can grow or shrink as needed, limited only by available
memory. No overflow issues due to fixed size.
No wasted space: Memory is allocated only when an element is pushed onto the
stack.
Disadvantages:
Memory overhead: Each node requires extra memory for the pointer.
struct Node {
int data;
struct Node* next;
};
struct Stack {
struct Node* top;
};
Let's look at the implementation of the core stack operations using both array-based
and linked list-based approaches.
Array-based Stack Operations
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAX_SIZE 5 // Small capacity so the overflow case below can be demonstrated

struct Stack {
    int arr[MAX_SIZE];
    int top;
};
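// One possible implementation of the stack operations called in main below;
// the exact bodies are illustrative.
void initStack(struct Stack* s) {
    s->top = -1; // An empty stack has no top element
}

bool isEmpty(struct Stack* s) {
    return s->top == -1;
}

bool isFull(struct Stack* s) {
    return s->top == MAX_SIZE - 1;
}

void push(struct Stack* s, int value) {
    if (isFull(s)) {
        printf("Stack Overflow! Cannot push %d.\n", value);
        return;
    }
    s->arr[++(s->top)] = value;
    printf("Pushed %d onto the stack.\n", value);
}

int pop(struct Stack* s) {
    if (isEmpty(s)) {
        printf("Stack Underflow! Cannot pop.\n");
        return -1;
    }
    return s->arr[(s->top)--];
}

int peek(struct Stack* s) {
    return isEmpty(s) ? -1 : s->arr[s->top];
}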
int main() {
struct Stack myStack;
initStack(&myStack);
push(&myStack, 10);
push(&myStack, 20);
push(&myStack, 30);
push(&myStack, 40);
push(&myStack, 50);
push(&myStack, 60); // This will cause overflow
return 0;
}
Linked List-based Stack Operations
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
struct Node {
int data;
struct Node* next;
};
struct Stack {
struct Node* top;
};
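// One possible implementation of the linked list-based stack operations called in
// main below; the exact bodies are illustrative.
void initStack(struct Stack* s) {
    s->top = NULL; // An empty stack has no nodes
}

bool isEmpty(struct Stack* s) {
    return s->top == NULL;
}

void push(struct Stack* s, int value) {
    struct Node* newNode = (struct Node*) malloc(sizeof(struct Node));
    newNode->data = value;
    newNode->next = s->top; // The new node sits on top of the old top
    s->top = newNode;
    printf("Pushed %d onto the stack.\n", value);
}

int pop(struct Stack* s) {
    if (isEmpty(s)) {
        printf("Stack Underflow! Cannot pop.\n");
        return -1;
    }
    struct Node* temp = s->top;
    int value = temp->data;
    s->top = temp->next;
    free(temp);
    return value;
}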
int main() {
struct Stack myStack;
initStack(&myStack);
push(&myStack, 10);
push(&myStack, 20);
push(&myStack, 30);
push(&myStack, 40);
push(&myStack, 50);
return 0;
}
Stacks play a crucial role in handling and evaluating arithmetic expressions, especially
when dealing with different notations: infix, prefix, and postfix.
Infix Expression: This is the most common way we write expressions, where
operators are placed between operands (e.g., A + B , (A + B) * C ).
Parentheses are often used to define the order of operations.
Prefix Expression (Polish Notation): Operators are placed before their operands
(e.g., + A B , * + A B C ). No parentheses are required.
Postfix Expression (Reverse Polish Notation): Operators are placed after their
operands (e.g., A B + , A B + C * ). No parentheses are required, and the
expression can be evaluated in a single left-to-right scan using a stack.
Stacks are extensively used for converting expressions from one form to another,
particularly from infix to postfix or prefix.
They are also used to evaluate postfix expressions directly:
1. Scan the postfix expression from left to right.
2. If the scanned symbol is an operand, push it onto the stack.
3. If the scanned symbol is an operator, pop the top two operands from the stack,
apply the operator to them, and push the result back onto the stack.
4. After scanning the entire expression, the final result will be the only element left
on the stack.
Example: Evaluate 2 3 + 4 *
Symbol scanned    Stack contents (top first)
2                 2
3                 3, 2
+                 5            (pop 3 and 2, push 2 + 3)
4                 4, 5
*                 20           (pop 4 and 5, push 5 * 4)
Final Result: 20
#include <stdio.h>
#include <ctype.h>
#define MAX_SIZE 100
struct Stack {
    int arr[MAX_SIZE];
    int top;
};
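// One possible implementation of evaluatePostfix and the small stack helpers it
// needs, assuming single-digit operands; the exact bodies are illustrative.
void push(struct Stack* s, int value) {
    s->arr[++(s->top)] = value;
}

int pop(struct Stack* s) {
    return s->arr[(s->top)--];
}

int evaluatePostfix(char exp[]) {
    struct Stack s;
    s.top = -1;
    for (int i = 0; exp[i] != '\0'; i++) {
        char c = exp[i];
        if (isdigit(c)) {
            push(&s, c - '0'); // Operand: push its numeric value
        } else {
            int b = pop(&s);   // Second operand (pushed most recently)
            int a = pop(&s);   // First operand
            switch (c) {
                case '+': push(&s, a + b); break;
                case '-': push(&s, a - b); break;
                case '*': push(&s, a * b); break;
                case '/': push(&s, a / b); break;
            }
        }
    }
    return pop(&s); // The final result is the only element left on the stack
}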
int main() {
char exp[] = "23+4*"; // Corresponds to (2+3)*4 = 20
printf("Postfix evaluation of %s: %d\n", exp, evaluatePostfix(exp));
return 0;
}
4.5 Applications
Stacks are fundamental data structures with a wide range of applications in computer
science. Some of the most common and important applications include:
Undo/Redo Functionality: Editors and similar applications push each action onto a
stack (for undo) or a separate stack (for redo). When undo is performed, the action is
popped from the undo stack and pushed onto the redo stack.
Browser History: Web browsers use a stack to keep track of the pages visited. When
you click the 'back' button, the current page is popped, and the previous page is displayed.
Backtracking Algorithms: Algorithms that involve exploring multiple paths (like
solving mazes, the N-Queens problem, or a Sudoku solver) often use a stack to keep
track of the current path and to backtrack when a dead end is reached.
Syntax Parsing: Compilers use stacks to parse the syntax of programming languages,
checking for balanced parentheses, brackets, and braces.
Recursion: Recursive function calls implicitly use a call stack to manage their state.
Unit 5: Queue
A Queue is a linear data structure that follows the FIFO (First In, First Out) principle.
This means the element that is inserted first is the first one to be removed. Think of a
queue of people waiting in line at a ticket counter: the first person to join the line is the
first person to be served. This behavior is in contrast to a stack, which follows the LIFO
principle.
FIFO Principle: The first element added is the first one to be removed.
Abstract Data Type: Like a stack, a queue is an ADT, defined by its operations.
enqueue(item) : Adds an element to the rear of the queue.
dequeue() : Removes the element from the front of the queue. It also returns the
removed element.
front() or peek() : Returns the front element of the queue without removing
it.
rear() : Returns the rear element of the queue without removing it.
isEmpty() : Checks if the queue is empty. Returns true if empty, false otherwise.
isFull() : Checks if the queue is full. This operation is relevant only for fixed-
size (array-based) queue implementations.
5.2 Types of Queue
While the basic concept of a queue remains FIFO, there are several variations of
queues, each designed for specific use cases:
1. Simple (Linear) Queue:
This is the most basic form of a queue, where elements are added at the rear and
removed from the front. Once an element is dequeued, the space it occupied is not
immediately reused, leading to a potential issue of
a full queue even if there are empty slots at the beginning (this is addressed by circular
queues).
2. Circular Queue:
In a circular queue, the positions of the array are treated as a circle: when rear reaches
the end of the array, it wraps around to the beginning as long as free slots are available.
Advantages:
Reuses the slots freed by dequeue operations, avoiding the wasted space of a simple
linear queue.
3. Priority Queue:
A priority queue is a special type of queue where each element has a priority. Elements
are dequeued based on their priority, not necessarily their arrival order. Elements with
higher priority are served before elements with lower priority. If two elements have
the same priority, they are served according to their order in the queue.
Applications:
Event simulation.
Bandwidth management.
Similar to stacks, queues can also be implemented using arrays. This is known as a
static or array-based implementation. We use an array to store the elements and two
pointers, front and rear , to keep track of the front and rear ends of the queue,
respectively.
Initially, both front and rear are set to -1 or 0, indicating an empty queue. When an
element is enqueued, rear is incremented. When an element is dequeued, front is
incremented. A key challenge with linear array-based queues is that front keeps
moving forward, potentially leading to a situation where the queue is logically empty
but rear has reached the end of the array, preventing further enqueues even if there's
space at the beginning. This is where circular queues become beneficial.
struct Queue {
int arr[MAX_QUEUE_SIZE];
int front;
int rear;
};
#include <stdio.h>
#include <stdbool.h>
#define MAX_QUEUE_SIZE 5
struct Queue {
int arr[MAX_QUEUE_SIZE];
int front;
int rear;
};
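// One possible implementation of the linear (non-circular) queue operations called
// in main below; the exact bodies are illustrative.
void initQueue(struct Queue* q) {
    q->front = -1;
    q->rear = -1;
}

bool isQueueEmpty(struct Queue* q) {
    return q->front == -1 || q->front > q->rear;
}

bool isQueueFull(struct Queue* q) {
    return q->rear == MAX_QUEUE_SIZE - 1;
}

void enqueue(struct Queue* q, int item) {
    if (isQueueFull(q)) {
        printf("Queue Overflow! Cannot enqueue %d.\n", item);
        return;
    }
    if (q->front == -1)
        q->front = 0; // First element ever enqueued
    q->arr[++(q->rear)] = item;
    printf("Enqueued %d to queue.\n", item);
}

int dequeue(struct Queue* q) {
    if (isQueueEmpty(q)) {
        printf("Queue Underflow! Cannot dequeue.\n");
        return -1;
    }
    return q->arr[(q->front)++]; // front only moves forward, as described above
}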
int main() {
struct Queue myQueue;
initQueue(&myQueue);
enqueue(&myQueue, 10);
enqueue(&myQueue, 20);
enqueue(&myQueue, 30);
enqueue(&myQueue, 40);
enqueue(&myQueue, 50);
enqueue(&myQueue, 60); // This will cause overflow
return 0;
}
Implementing a queue using a linked list provides dynamic sizing. We maintain two
pointers: front (pointing to the first node) and rear (pointing to the last node).
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
struct Node {
int data;
struct Node* next;
};
struct Queue {
struct Node *front, *rear;
};
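// One possible implementation of initQueue, which main below calls; the body is
// illustrative.
void initQueue(struct Queue* q) {
    q->front = NULL;
    q->rear = NULL;
}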
void enqueue(struct Queue* q, int item) {
    // Create a new node to hold the item
    struct Node* newNode = (struct Node*) malloc(sizeof(struct Node));
    newNode->data = item;
    newNode->next = NULL;
    if (q->rear == NULL) {
        // If queue is empty, new node is both front and rear
        q->front = newNode;
        q->rear = newNode;
    } else {
        // Add the new node at the end of queue and change rear
        q->rear->next = newNode;
        q->rear = newNode;
    }
    printf("Enqueued %d to queue.\n", item);
}
int dequeue(struct Queue* q) {
    if (q->front == NULL)
        return -1; // Queue is empty, nothing to dequeue
    struct Node* temp = q->front;
    int dequeued_item = temp->data;
    q->front = q->front->next;
    if (q->front == NULL) {
        // If front becomes NULL, queue is empty, so rear must also be NULL
        q->rear = NULL;
    }
    free(temp);
    return dequeued_item;
}
int main() {
struct Queue myQueue;
initQueue(&myQueue);
enqueue(&myQueue, 10);
enqueue(&myQueue, 20);
enqueue(&myQueue, 30);
enqueue(&myQueue, 40);
enqueue(&myQueue, 50);
return 0;
}
5.5 Applications
Queues are widely used in various computing scenarios where elements need to be
processed in the order they arrive. Some common applications include:
Operating Systems:
CPU Scheduling: Processes waiting for CPU time are often managed in a
queue (e.g., Round Robin scheduling).
Spooling: Print jobs, keyboard buffers, and other I/O operations are
handled using queues.
Disk Scheduling: Requests for disk access are often processed in a queue.
Network Buffering: Routers and switches use queues to buffer data packets
when network traffic is high, ensuring packets are processed in the order they are
received.
Call Center Systems: Calls are typically placed in a queue and answered in the
order they are received.
Web Servers: Requests from web clients are often placed in a queue to be
processed by the server in a FIFO manner.
Unit 6: Trees and Graphs
So far, we have explored linear data structures like arrays, linked lists, stacks, and
queues, where elements are arranged sequentially. Now, we venture into the realm of
non-linear data structures, which allow for more complex relationships between data
elements. The two most prominent non-linear data structures are Trees and Graphs.
Trees: Trees are hierarchical data structures that simulate a tree structure with a root
value and subtrees of children, represented as a set of linked nodes. They are used to
represent data with a hierarchical relationship between elements, such as file systems,
organizational charts, or family trees.
Graphs: Graphs are more general than trees. They consist of a finite set of vertices (or
nodes) and a set of edges that connect pairs of vertices. Graphs are used to model
relationships between objects where there isn't necessarily a hierarchical order, such
as social networks, road networks, or electrical circuits.
Understanding trees and graphs is essential for solving a wide range of real-world
problems, from optimizing network routes to designing efficient search algorithms.
6.2 Concept and types of Binary trees - skewed tree, strictly binary
tree, full binary tree, complete binary tree, expression tree, binary
search tree, Heap
Depth of a Node: The length of the path from the root to that node. The root
node has a depth of 0.
Height of a Node: The length of the longest path from that node to a leaf node.
The height of a leaf node is 0.
A Binary Tree is a special type of tree in which each node can have at most two
children, referred to as the left child and the right child. This constraint makes binary
trees particularly useful and easier to manage than general trees.
struct Node {
int data;
struct Node* left;
struct Node* right;
};
There are several specific types of binary trees, each with unique properties:
1. Skewed Tree:
A skewed tree is a degenerate binary tree where all nodes have only one child, either a
left child or a right child. It essentially behaves like a linked list.
Left Skewed Tree: Every node has only a left child (except the leaf).
Right Skewed Tree: Every node has only a right child (except the leaf).
2. Strictly Binary Tree:
A strictly binary tree (also known as a proper binary tree or 2-tree) is a binary tree in
which every node has either zero or two children. No node has only one child.
3. Full Binary Tree:
A full binary tree is a binary tree in which every node has either zero or two children
and all leaf nodes are at the same level. Every level is therefore completely filled, and
the tree contains the maximum possible number of nodes for its height.
4. Complete Binary Tree:
A complete binary tree is a binary tree in which all levels are completely filled, except
possibly the last level, which is filled from left to right. This means that all nodes are as
far left as possible.
5. Expression Tree:
An expression tree is a binary tree used to represent arithmetic expressions. Internal
nodes are operators, and leaf nodes are operands. The order of operations is naturally
represented by the tree structure.
For example, for the expression (A + B) * C, the root would be the * operator and its
right child would be the operand C. Its left child would be the + operator, with A and B
as its children.
6. Binary Search Tree (BST):
A Binary Search Tree is a special type of binary tree that maintains a specific ordering
property: for every node, all values in its left subtree are less than the node's value,
and all values in its right subtree are greater than the node's value. This property
makes BSTs highly efficient for searching, insertion, and deletion operations.
The left subtree of a node contains only nodes with keys lesser than the node's
key.
The right subtree of a node contains only nodes with keys greater than the node's
key.
The left and right subtrees must also be binary search trees.
Advantages:
Search, insertion, and deletion take O(log n) time on average when the tree is
reasonably balanced.
An inorder traversal visits the keys in ascending (sorted) order.
Disadvantages:
Worst-case performance can degrade to O(n) if the tree becomes skewed (like a
linked list).
7. Heap:
A Heap is a specialized tree-based data structure that satisfies the heap property. In a
Max-Heap, for any given node C, if P is a parent of C, then the value of P is greater than
or equal to the value of C. In a Min-Heap, the value of P is less than or equal to the
value of C.
Heaps are typically implemented using an array, taking advantage of the fact that a
complete binary tree can be efficiently represented in an array. They are crucial for
implementing priority queues and for the Heap Sort algorithm. In the array
representation, the node stored at index i has its left child at index 2i + 1, its right
child at index 2i + 2, and its parent at index (i - 1) / 2 (integer division).
Tree traversal refers to the process of visiting each node in the tree exactly once. There
are three common ways to traverse a binary tree:
1. Inorder Traversal (Left, Root, Right):
In this traversal, we first visit the left subtree, then the root node, and finally the right
subtree. For a Binary Search Tree, an inorder traversal will always visit the nodes in
ascending order of their values.
Steps:
1. Recursively traverse the left subtree.
2. Visit the root node.
3. Recursively traverse the right subtree.
Example (for BST): If you have a BST with values, inorder traversal will print them in
sorted order.
C Implementation:
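A minimal recursive sketch, assuming the struct Node with left and right pointers defined earlier in this unit (the function name is illustrative):

void inorderTraversal(struct Node* root) {
    if (root == NULL)
        return;                       // Base case: empty subtree
    inorderTraversal(root->left);     // 1. Traverse the left subtree
    printf("%d ", root->data);        // 2. Visit the root node
    inorderTraversal(root->right);    // 3. Traverse the right subtree
}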
2. Preorder Traversal (Root, Left, Right):
In this traversal, we first visit the root node, then the left subtree, and finally the right
subtree. Preorder traversal is often used to create a copy of the tree or to get the prefix
expression of an expression tree.
Steps:
1. Visit the root node.
2. Recursively traverse the left subtree.
3. Recursively traverse the right subtree.
C Implementation:
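A minimal recursive sketch, under the same assumptions as the inorder example (the function name is illustrative):

void preorderTraversal(struct Node* root) {
    if (root == NULL)
        return;                       // Base case: empty subtree
    printf("%d ", root->data);        // 1. Visit the root node
    preorderTraversal(root->left);    // 2. Traverse the left subtree
    preorderTraversal(root->right);   // 3. Traverse the right subtree
}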
3. Postorder Traversal (Left, Right, Root):
In this traversal, we first visit the left subtree, then the right subtree, and finally the
root node. Postorder traversal is often used to delete a tree (deleting children before
the parent) or to get the postfix expression of an expression tree.
Steps:
1. Recursively traverse the left subtree.
2. Recursively traverse the right subtree.
3. Visit the root node.
C Implementation:
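A minimal recursive sketch, under the same assumptions as the previous two examples (the function name is illustrative):

void postorderTraversal(struct Node* root) {
    if (root == NULL)
        return;                       // Base case: empty subtree
    postorderTraversal(root->left);   // 1. Traverse the left subtree
    postorderTraversal(root->right);  // 2. Traverse the right subtree
    printf("%d ", root->data);        // 3. Visit the root node
}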
A Graph is a non-linear data structure that consists of a finite set of vertices (or nodes)
and a set of edges that connect pairs of vertices. Graphs are used to model
relationships between objects, where the relationships can be complex and non-
hierarchical. They are incredibly versatile and can represent a wide variety of real-
world systems, such as social networks, transportation networks, computer networks,
and even dependencies between tasks.
Directed Edge: An edge with a specific direction, meaning the connection goes
from one vertex to another in a one-way fashion (e.g., a one-way street).
Weighted Graph: A graph where each edge has a numerical value (weight)
associated with it. This weight can represent distance, cost, time, capacity, etc.
Adjacent Vertices: Two vertices are adjacent if they are connected by an edge.
Multiple Edges: More than one edge connecting the same pair of vertices.
Graphs can be represented in computer memory using various techniques. The two
most common representations are the Adjacency Matrix and the Adjacency List.
Adjacency Matrix
An adjacency matrix is a square matrix (2D array) used to represent a finite graph. The
size of the matrix is V x V, where V is the number of vertices in the graph. Each cell
matrix[i][j] stores information about the connection between vertex i and vertex
j.
For an unweighted graph, matrix[i][j] is 1 if there is an edge from vertex i to
vertex j , and 0 otherwise.
For a weighted graph, matrix[i][j] stores the weight of the edge from vertex
i to vertex j , and 0 or infinity if there is no edge.
Consider a graph with 4 vertices (0, 1, 2, 3) and edges (0,1), (0,2), (1,2), (2,3).
0 1 2 3
0 |0 1 1 0|
1 |1 0 1 0|
2 |1 1 0 1|
3 |0 0 1 0|
Advantages:
Easy to implement.
Checking for an edge: Checking if an edge exists between two vertices (i, j) is
O(1) (just check matrix[i][j] ).
Disadvantages:
Space complexity: O(V^2), which can be very inefficient for sparse graphs
(graphs with relatively few edges compared to the number of vertices). Even if
there are few edges, the matrix still requires V^2 space.
Finding all neighbors: To find all neighbors of a vertex, you need to iterate
through an entire row (O(V)).
Adjacency List
An adjacency list is a collection of linked lists or arrays. For each vertex u in the graph,
there is a list that contains all the vertices v such that there is an edge from u to v .
This is generally the preferred representation for sparse graphs.
Using the same graph with 4 vertices (0, 1, 2, 3) and edges (0,1), (0,2), (1,2), (2,3).
0: -> 1 -> 2
1: -> 0 -> 2
2: -> 0 -> 1 -> 3
3: -> 2
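The listing below sketches the structures and helper function that such a representation typically uses in C, matching the struct names that appear again in the traversal examples later in this unit; the exact details are illustrative.

#include <stdio.h>
#include <stdlib.h>

struct AdjListNode {
    int dest;                  // The vertex this entry points to
    struct AdjListNode* next;
};

struct AdjList {
    struct AdjListNode* head;  // Head of the list of neighbors
};

struct Graph {
    int V;                     // Number of vertices
    struct AdjList* array;     // One adjacency list per vertex
};

// Create a new adjacency list node for vertex 'dest'
struct AdjListNode* newAdjListNode(int dest) {
    struct AdjListNode* newNode = (struct AdjListNode*) malloc(sizeof(struct AdjListNode));
    newNode->dest = dest;
    newNode->next = NULL;
    return newNode;
}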
// Allocate a graph with V vertices and initially empty adjacency lists
struct Graph* createGraph(int V) {
    struct Graph* graph = (struct Graph*) malloc(sizeof(struct Graph));
    graph->V = V;
    graph->array = (struct AdjList*) malloc(V * sizeof(struct AdjList));
    for (int i = 0; i < V; i++)
        graph->array[i].head = NULL;
    return graph;
}
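// One possible implementation of addEdge and printGraph, which main below calls;
// each undirected edge is inserted into both endpoints' lists. The bodies are
// illustrative.
void addEdge(struct Graph* graph, int src, int dest) {
    // Add an edge from src to dest (insert at the head of src's list)
    struct AdjListNode* newNode = newAdjListNode(dest);
    newNode->next = graph->array[src].head;
    graph->array[src].head = newNode;
    // The graph is undirected, so also add an edge from dest to src
    newNode = newAdjListNode(src);
    newNode->next = graph->array[dest].head;
    graph->array[dest].head = newNode;
}

void printGraph(struct Graph* graph) {
    for (int v = 0; v < graph->V; v++) {
        struct AdjListNode* current = graph->array[v].head;
        printf("%d:", v);
        while (current != NULL) {
            printf(" -> %d", current->dest);
            current = current->next;
        }
        printf("\n");
    }
}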
int main() {
int V = 5;
struct Graph* graph = createGraph(V);
addEdge(graph, 0, 1);
addEdge(graph, 0, 4);
addEdge(graph, 1, 2);
addEdge(graph, 1, 3);
addEdge(graph, 1, 4);
addEdge(graph, 2, 3);
addEdge(graph, 3, 4);
printGraph(graph);
return 0;
}
Advantages:
Space efficiency: O(V + E), where E is the number of edges. This is much more
efficient for sparse graphs than an adjacency matrix.
Finding all neighbors: Efficiently find all neighbors of a vertex (just traverse its
linked list).
Disadvantages:
Checking for an edge: Checking if an edge exists between two vertices (i, j) is
O(V) in the worst case (you might have to traverse the entire list for vertex i).
6.6 Graph Traversals - Breadth First Search and Depth First Search
Graph traversal algorithms are used to visit every vertex and edge in a graph. The two
most common graph traversal algorithms are Breadth-First Search (BFS) and Depth-
First Search (DFS).
Breadth-First Search (BFS)
BFS is an algorithm for traversing or searching tree or graph data structures. It starts at
the tree root (or some arbitrary node of a graph, sometimes referred to as a 'search
key') and explores all of the neighbor nodes at the present depth prior to moving on to
the nodes at the next depth level. BFS uses a queue data structure to keep track of the
nodes to visit.
1. Start by putting any one of the graph's vertices at the back of a queue.
2. Take the front item of the queue and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which are not yet in the
visited list to the back of the queue.
4. Keep repeating steps 2 and 3 until the queue is empty.
Applications of BFS:
Web crawlers.
Finding the shortest path between two vertices in an unweighted graph.
Finding all vertices within one connected component.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAX_Q_SIZE 100

struct Queue {
    int arr[MAX_Q_SIZE];
    int front, rear;
};

struct AdjListNode {
    int dest;
    struct AdjListNode* next;
};

struct AdjList {
    struct AdjListNode *head;
};

struct Graph {
    int V;
    struct AdjList* array;
};
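// One possible implementation of the queue helpers and graph-construction
// routines that the BFS function and main below rely on; the exact bodies are
// illustrative.
void initQueue(struct Queue* q) {
    q->front = 0;
    q->rear = -1;
}

bool isQEmpty(struct Queue* q) {
    return q->rear < q->front;
}

void enqueue(struct Queue* q, int item) {
    q->arr[++(q->rear)] = item;
}

int dequeue(struct Queue* q) {
    return q->arr[(q->front)++];
}

struct AdjListNode* newAdjListNode(int dest) {
    struct AdjListNode* newNode = (struct AdjListNode*) malloc(sizeof(struct AdjListNode));
    newNode->dest = dest;
    newNode->next = NULL;
    return newNode;
}

struct Graph* createGraph(int V) {
    struct Graph* graph = (struct Graph*) malloc(sizeof(struct Graph));
    graph->V = V;
    graph->array = (struct AdjList*) malloc(V * sizeof(struct AdjList));
    for (int i = 0; i < V; i++)
        graph->array[i].head = NULL;
    return graph;
}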
void addEdge(struct Graph* graph, int src, int dest) {
    // Add an edge from src to dest
    struct AdjListNode* newNode = newAdjListNode(dest);
    newNode->next = graph->array[src].head;
    graph->array[src].head = newNode;
    // The graph is undirected, so also add an edge from dest to src
    newNode = newAdjListNode(src);
    newNode->next = graph->array[dest].head;
    graph->array[dest].head = newNode;
}
// BFS function
void BFS(struct Graph* graph, int startVertex) {
bool* visited = (bool*) malloc(graph->V * sizeof(bool));
for (int i = 0; i < graph->V; i++)
visited[i] = false;
struct Queue q;
initQueue(&q);
visited[startVertex] = true;
enqueue(&q, startVertex);
    while (!isQEmpty(&q)) {
        int currentVertex = dequeue(&q);
        printf("%d ", currentVertex);
        // Enqueue every unvisited neighbor of the current vertex
        struct AdjListNode* temp = graph->array[currentVertex].head;
        while (temp != NULL) {
            if (!visited[temp->dest]) {
                visited[temp->dest] = true;
                enqueue(&q, temp->dest);
            }
            temp = temp->next;
        }
    }
    printf("\n");
    free(visited);
}
int main() {
struct Graph* graph = createGraph(6);
addEdge(graph, 0, 1);
addEdge(graph, 0, 2);
addEdge(graph, 1, 3);
addEdge(graph, 1, 4);
addEdge(graph, 2, 4);
addEdge(graph, 3, 5);
addEdge(graph, 4, 5);
BFS(graph, 0);
return 0;
}
DFS is an algorithm for traversing or searching tree or graph data structures. The
algorithm starts at the root (or some arbitrary node) and explores as far as possible
along each branch before backtracking. DFS uses a stack data structure (implicitly
through recursion or explicitly with an iterative approach).
1. Start by putting any one of the graph's vertices on top of a stack.
2. Take the top item of the stack and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Push the ones which are not yet in
the visited list to the top of the stack.
4. Keep repeating steps 2 and 3 until the stack is empty.
Applications of DFS:
Topological sorting.
Pathfinding.
Detecting cycles in a graph.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

struct AdjListNode {
    int dest;
    struct AdjListNode* next;
};

struct AdjList {
    struct AdjListNode *head;
};

struct Graph {
    int V;
    struct AdjList* array;
};
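// One possible implementation of the helper routines and the recursive DFS that
// main below calls; the exact bodies are illustrative.
struct AdjListNode* newAdjListNode(int dest) {
    struct AdjListNode* newNode = (struct AdjListNode*) malloc(sizeof(struct AdjListNode));
    newNode->dest = dest;
    newNode->next = NULL;
    return newNode;
}

struct Graph* createGraph(int V) {
    struct Graph* graph = (struct Graph*) malloc(sizeof(struct Graph));
    graph->V = V;
    graph->array = (struct AdjList*) malloc(V * sizeof(struct AdjList));
    for (int i = 0; i < V; i++)
        graph->array[i].head = NULL;
    return graph;
}

// Visit a vertex, then recursively visit each of its unvisited neighbors
void DFSUtil(struct Graph* graph, int vertex, bool visited[]) {
    visited[vertex] = true;
    printf("%d ", vertex);
    struct AdjListNode* temp = graph->array[vertex].head;
    while (temp != NULL) {
        if (!visited[temp->dest])
            DFSUtil(graph, temp->dest, visited);
        temp = temp->next;
    }
}

void DFS(struct Graph* graph, int startVertex) {
    bool* visited = (bool*) malloc(graph->V * sizeof(bool));
    for (int i = 0; i < graph->V; i++)
        visited[i] = false;
    DFSUtil(graph, startVertex, visited);
    printf("\n");
    free(visited);
}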
void addEdge(struct Graph* graph, int src, int dest) {
    // Add an edge from src to dest
    struct AdjListNode* newNode = newAdjListNode(dest);
    newNode->next = graph->array[src].head;
    graph->array[src].head = newNode;
    // The graph is undirected, so also add an edge from dest to src
    newNode = newAdjListNode(src);
    newNode->next = graph->array[dest].head;
    graph->array[dest].head = newNode;
}
int main() {
struct Graph* graph = createGraph(6);
addEdge(graph, 0, 1);
addEdge(graph, 0, 2);
addEdge(graph, 1, 3);
addEdge(graph, 1, 4);
addEdge(graph, 2, 4);
addEdge(graph, 3, 5);
addEdge(graph, 4, 5);
DFS(graph, 0);
return 0;
}
Trees and Graphs are incredibly powerful and versatile data structures with a vast
array of applications across various domains of computer science and beyond.
Applications of Trees:
Compilers: Parse trees (syntax trees) are used by compilers to represent the
syntactic structure of source code.
Decision Making: Decision trees are used in machine learning for classification
and regression tasks.
Network Routing: Trees can represent network topologies, and spanning trees
are used in network protocols.
Heaps: Used to implement priority queues and the Heap Sort algorithm.
Applications of Graphs:
World Wide Web: Web pages can be considered vertices, and hyperlinks
between them are directed edges. Used by search engines for ranking pages.
Social Networks: People are represented as vertices and relationships such as
friendships or follows are represented as edges.
Road Networks and Navigation: Intersections are vertices and roads are weighted
edges; shortest-path algorithms over such graphs compute routes.
Unit 7: Searching and Sorting
We have already covered some fundamental searching and sorting algorithms in Unit 2
(Binary Search, Insertion Sort, Merge Sort, Quick Sort) as applications of arrays. This
unit will serve as a dedicated section to reinforce and potentially introduce other
important algorithms in these categories, emphasizing their principles, complexities,
and practical considerations.
7.1 Searching Algorithms
Searching algorithms are used to find the location of a target element within a data
structure. The efficiency of a search algorithm depends heavily on the organization of
the data.
Linear Search
Linear search is the simplest searching algorithm. It sequentially checks each element
of the list until a match is found or the end of the list is reached.
How it Works:
1. Start at the first element of the list.
2. Compare the current element with the target element.
3. If they match, the search is successful, and the index of the element is returned.
4. If they do not match, move to the next element and repeat; if the end of the list is
reached without a match, the search fails.
Time Complexity:
Worst and Average Case: O(n) - the target may be near the end of the list or absent
entirely.
Best Case: O(1) - the target is the first element examined.
Advantages:
Simple to implement.
Works on unsorted data; no preprocessing such as sorting is required.
Disadvantages:
Inefficient for large lists, since in the worst case every element must be examined.
C Implementation:
#include <stdio.h>
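// One possible implementation of the linearSearch function called in main below;
// the body is illustrative.
int linearSearch(int arr[], int n, int target) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == target)
            return i; // Match found: return its index
    }
    return -1; // Target is not present in the array
}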
int main() {
int arr[] = {10, 20, 80, 30, 60, 50, 110, 100, 130, 170};
int n = sizeof(arr) / sizeof(arr[0]);
int target = 50;
int result = linearSearch(arr, n, target);
if (result != -1) {
printf("Element %d found at index %d\n", target, result);
} else {
printf("Element %d not found in the array\n", target);
}
target = 99;
result = linearSearch(arr, n, target);
if (result != -1) {
printf("Element %d found at index %d\n", target, result);
} else {
printf("Element %d not found in the array\n", target);
}
return 0;
}
Binary Search
(Refer back to Unit 2.2.1 for detailed explanation and implementation of Binary
Search.)
Insertion Sort
(Refer back to Unit 2.2.2 for detailed explanation and implementation of Insertion
Sort.)
Merge Sort
(Refer back to Unit 2.2.2 for detailed explanation and implementation of Merge Sort.)
Quick Sort
(Refer back to Unit 2.2.2 for detailed explanation and implementation of Quick Sort.)
Bubble Sort
Bubble Sort is a simple sorting algorithm that repeatedly steps through the list,
compares adjacent elements and swaps them if they are in the wrong order. The pass
through the list is repeated until no swaps are needed, which indicates that the list is
sorted.
How it Works:
1. Starting from the beginning of the list, compare the first two elements.
2. If the first element is greater than the second, swap them.
3. Move to the next pair of elements and repeat the comparison and swap.
4. Continue this process until the end of the list. After the first pass, the largest
element will be at the end.
5. Repeat the entire process for the remaining unsorted part of the list, excluding
the last element (which is now sorted).
Time Complexity:
Worst and Average Case: O(n^2) - the nested passes over the list dominate.
Best Case: O(n) - when the list is already sorted and an optimized version detects
that a pass performed no swaps.
Disadvantages:
Very inefficient for large lists compared to O(n log n) algorithms such as Merge Sort
and Quick Sort.
C Implementation:
#include <stdio.h>
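// One possible implementation of the bubbleSort function called in main below;
// the body is illustrative.
void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        // After each pass, the largest remaining element settles at the end
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}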
int main() {
    int arr[] = {64, 34, 25, 12, 22, 11, 90};
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]); // Prints: 11 12 22 25 34 64 90
    return 0;
}
Selection Sort
Selection Sort is an in-place comparison sorting algorithm. It divides the input list into
two parts: a sorted sublist of items built up from left to right at the front (left) of the
list, and an unsorted sublist of the remaining items that occupy the rest of the list.
How it Works:
1. Find the minimum element in the unsorted array and place it at the beginning.
2. For the first position in the unsorted part, search the entire unsorted part for the
smallest element.
3. Swap the smallest element with the element at the current position.
4. Repeat the process for the next position in the unsorted part until the entire array
is sorted.
Time Complexity:
Worst, Average, and Best Case: O(n^2) - the unsorted part is scanned completely on
every pass, regardless of the initial order of the elements.
Advantages:
Simple to implement.
Performs at most n - 1 swaps, which is useful when writing to memory is expensive.
Disadvantages:
O(n^2) comparisons even when the array is already sorted, so it is generally slower
than Insertion Sort in practice.
C Implementation:
#include <stdio.h>
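// One possible implementation of the selectionSort function called in main below;
// the body is illustrative.
void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min_idx = i;
        // Find the smallest element in the unsorted part arr[i..n-1]
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[min_idx])
                min_idx = j;
        }
        // Swap it into position i, extending the sorted part by one element
        int temp = arr[min_idx];
        arr[min_idx] = arr[i];
        arr[i] = temp;
    }
}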
int main() {
    int arr[] = {64, 25, 12, 22, 11};
    int n = sizeof(arr) / sizeof(arr[0]);
    selectionSort(arr, n);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]); // Prints: 11 12 22 25 64
    return 0;
}
Heap Sort
Heap Sort is a comparison-based sorting technique based on the Binary Heap data
structure. It is an in-place sorting algorithm and is not stable. It works by first building
a max-heap (or min-heap) from the input data, and then repeatedly extracting the
maximum (or minimum) element from the heap and rebuilding the heap.
How it Works:
1. Build a Max-Heap: Convert the input array into a max-heap. In a max-heap, the
largest element is always at the root.
2. Extract Elements: Repeatedly extract the maximum element from the heap
(which is the root), and place it at the end of the sorted portion of the array. After
extraction, the heap property is restored by calling heapify on the reduced
heap.
Time Complexity: O(n log n) in all cases (worst, average, and best) - building the heap
takes O(n), and each of the n extractions costs O(log n).
Advantages:
In-place sorting.
Guaranteed O(n log n) performance even in the worst case.
Disadvantages:
Can be slower in practice than Quick Sort due to less optimal cache performance.
C Implementation:
#include <stdio.h>
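// One possible implementation of the heapify and heapSort functions called in
// main below; the exact bodies are illustrative.
void heapify(int arr[], int n, int i) {
    int largest = i;       // Assume the root of this subtree is the largest
    int left = 2 * i + 1;
    int right = 2 * i + 2;
    if (left < n && arr[left] > arr[largest])
        largest = left;
    if (right < n && arr[right] > arr[largest])
        largest = right;
    if (largest != i) {
        int temp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = temp;
        heapify(arr, n, largest); // Restore the heap property further down the tree
    }
}

void heapSort(int arr[], int n) {
    // Build a max-heap from the array
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);
    // Repeatedly move the root (maximum) to the end and shrink the heap
    for (int i = n - 1; i > 0; i--) {
        int temp = arr[0];
        arr[0] = arr[i];
        arr[i] = temp;
        heapify(arr, i, 0);
    }
}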
int main() {
    int arr[] = {12, 11, 13, 5, 6, 7};
    int n = sizeof(arr) / sizeof(arr[0]);
    heapSort(arr, n); // Sorts the array in place: 5 6 7 11 12 13
    return 0;
}
This unit will cover the final aspects of the book, including a summary of key
takeaways, suggestions for further learning, and a concluding message.
Data Structures are about Organization: They provide efficient ways to store,
organize, and manage data, which is crucial for building performant software.
Arrays are Fundamental: While simple, arrays are the basis for many other data
structures and are highly efficient for direct element access.
Linked Lists Offer Flexibility: Their dynamic nature makes them ideal for
scenarios requiring frequent insertions and deletions, overcoming the fixed-size
limitation of arrays.
Stacks and Queues Manage Order: Stacks (LIFO) and Queues (FIFO) are
essential for managing data flow in various applications, from function calls to
task scheduling.
Trees Model Hierarchy: Binary trees, BSTs, and Heaps are powerful for
representing hierarchical data and enabling efficient search and retrieval
operations.
Graphs Model Relationships: Graphs are versatile for representing complex
connections and are fundamental to solving problems in networks, social
systems, and more.
Searching and Sorting are Core Operations: Efficient algorithms for searching
(Binary Search) and sorting (Merge Sort, Quick Sort, Heap Sort) are critical for
data processing and optimization.
The world of data structures and algorithms is vast and continuously evolving. This
book has provided a solid foundation, but there's always more to explore. Here are
some suggestions for your continued learning journey:
Practice, Practice, Practice: The best way to master data structures and
algorithms is by solving coding problems. Websites like LeetCode, HackerRank,
and GeeksforGeeks offer a plethora of problems to test your understanding.
Explore Advanced Data Structures: Delve into more complex data structures
such as AVL Trees, Red-Black Trees, Hash Tables (with different collision
resolution techniques), Tries, and Disjoint Set Unions.
Read More Books and Research Papers: Consult other textbooks and academic
papers to gain deeper insights and different perspectives on various topics.
Congratulations on completing this journey through Data Structures using C! You have
now acquired a fundamental understanding of how data can be efficiently organized
and manipulated in computer programs. This knowledge is not just theoretical; it is
the bedrock upon which all efficient software is built.
As you continue your computer science education and career, you will find that the
principles and techniques learned here are indispensable. Whether you are developing
operating systems, designing databases, creating artificial intelligence applications, or
building web services, a strong grasp of data structures and algorithms will empower
you to write more efficient, scalable, and robust code.
Remember, the journey of learning is continuous. Keep exploring, keep practicing, and
keep building. The ability to choose the right data structure and algorithm for a given
problem is a hallmark of a skilled computer scientist. We hope this book has ignited
your passion for efficient programming and provided you with the tools to tackle
complex computational challenges. Happy coding!
Author: Manus AI