Advanced Data Structures

The document provides an overview of advanced data structures, focusing on sets, maps, dictionaries, and linked lists. It explains the properties, types, and operations associated with these data structures, including hash tables and skip lists. Additionally, it discusses various methods for implementing hash functions and their applications in data retrieval and storage.
ADVANCED DATA STRUCTURES
Introduction to Set

 In computer science, a set is an abstract data type that stores
unique values in no particular order. It is a computer
implementation of the mathematical concept of a finite set.
 Sets do not allow duplicate elements and usually support operations
such as adding, removing, and checking for the presence of elements.
 Sets can be implemented using a variety of data structures, including
arrays, linked lists, binary search trees, and hash tables.
 The SET data type is an unordered collection type that stores unique
elements. The number of elements in a SET data type can vary, but
no nulls are allowed.
A set can be implemented in various ways but the most common ways are:
 Hash-Based Set: the set is represented as a hash table where each element in
the set is stored in a bucket based on its hash code.
 Tree-based set: In this implementation, the set is represented as a binary
search tree where each node in the tree represents an element in the set.
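The hash-based representation described above can be sketched as follows. This is a minimal illustration, not a production implementation: the class name HashSet, the fixed bucket count of 8, and the sample values are assumptions made for the example.

```python
class HashSet:
    """Minimal hash-based set: each element lives in the bucket chosen by its hash code."""

    def __init__(self, num_buckets=8):  # bucket count is an arbitrary assumption
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def _bucket(self, item):
        # The hash code decides which bucket the element belongs to.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:  # duplicates are ignored
            bucket.append(item)
            self.count += 1

    def contains(self, item):
        return item in self._bucket(item)

    def remove(self, item):
        bucket = self._bucket(item)
        if item in bucket:
            bucket.remove(item)
            self.count -= 1


s = HashSet()
for value in [10, 20, 10, 30]:  # 10 is added twice but stored once
    s.add(value)
```

A tree-based set would replace the bucket array with a binary search tree, trading the average constant-time lookup for sorted iteration.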

Types of Set Data Structure: The set data structure can be classified into the
following two categories:
 An unordered set is an associative container implemented using a
hash table. Keys are hashed into indices of the table, so the elements
are stored in no predictable order.
 All operations on the unordered set take constant time O(1) on an average which
can go up to linear time O(n) in the worst case which depends on the internally
used hash function.
 An Ordered set is the common set data structure we are familiar with. It is
generally implemented using balanced BSTs and it supports O(log n) lookups,
insertions and deletion operations.
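To illustrate the ordered variant, here is a small sketch that keeps elements in a sorted Python list and uses binary search for lookups. It is a stand-in, not a balanced BST: lookups are O(log n), but insertion shifts list elements and is therefore O(n). The class name SortedListSet is hypothetical.

```python
import bisect


class SortedListSet:
    """Ordered set backed by a sorted list (a simplification of a balanced BST)."""

    def __init__(self):
        self.items = []  # invariant: always sorted, no duplicates

    def add(self, item):
        i = bisect.bisect_left(self.items, item)
        if i == len(self.items) or self.items[i] != item:
            self.items.insert(i, item)  # O(n) shift; a balanced BST would be O(log n)

    def contains(self, item):
        i = bisect.bisect_left(self.items, item)  # O(log n) binary search
        return i < len(self.items) and self.items[i] == item


ordered = SortedListSet()
for value in [42, 7, 19, 7]:
    ordered.add(value)
# Iterating over ordered.items visits the elements in sorted order.
```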
Properties of Set Data Structure:
 Storing order – An ordered set stores its elements in sorted order;
an unordered set guarantees no particular order.
 Values Characteristics – All the elements in a set have unique
values.
 Values Nature – The value of an element cannot be modified once
it is added to the set, though it is possible to remove the element and
then add its modified value. Thus, the values are immutable.
 Search Technique – Sets are searched using either a binary search
tree implementation or a hash table implementation.
 Arranging order – The values in a set are unindexed.
 Insert an element (BST implementation and hash table implementation)
 Remove an element
 Find the minimum/maximum element

 begin – Returns an iterator to the first element in the set.
 end – Returns an iterator to the past-the-end position, that is, the
theoretical element that follows the last element in the set.
 size – Returns the number of elements in the set.
 max_size – Returns the maximum number of elements that the set can
hold.
 empty – Returns whether the set is empty.
MAPS
 The map data structure (often implemented as a hash map) is defined as a data
structure that stores a collection of key-value pairs, where each key is
associated with a single value.
 Maps provide an efficient way to store and retrieve data based on a unique
identifier (the key).
 An ordered map keeps its key-value pairs sorted by key, so iterating over
the map returns the pairs in key order. Ordered maps are typically
implemented with a self-balancing tree or a skip list.
 An unordered map does not maintain the order of key-value pairs. The order
in which elements are returned during iteration is not guaranteed and may
vary across different implementations or executions. It is implemented with
a hash table.
 Example:
 Imagine a map storing student IDs and their corresponding names. The
student ID would be the key, and the student's name would be the value.
The map data structure (an associative array) is typically implemented as a hash table, which uses a
hash function to compute an index from each key. This index is then used to store and
retrieve the value associated with that key.

When a new key-value pair is added to the map, the hash function is applied to the key to compute
its index, and the pair is stored at that index. If the same key is already stored there, the
new value replaces the old one. If a different key already occupies that index, a collision has
occurred and must be resolved.
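The storage rule can be sketched with a very small hash map. This is an illustration under assumptions: the class name TinyHashMap, the table size of 4, and the student names are invented for the example, and keys that land on the same index are simply kept side by side in a list.

```python
class TinyHashMap:
    """Toy map: the hash of the key picks a slot; each slot holds (key, value) pairs."""

    def __init__(self, size=4):  # tiny table so that two keys share a slot below
        self.slots = [[] for _ in range(size)]

    def put(self, key, value):
        slot = self.slots[hash(key) % len(self.slots)]
        for i, (existing_key, _) in enumerate(slot):
            if existing_key == key:
                slot[i] = (key, value)  # same key: the new value replaces the old one
                return
        slot.append((key, value))  # new key, possibly sharing the slot with others

    def get(self, key):
        slot = self.slots[hash(key) % len(self.slots)]
        for existing_key, value in slot:
            if existing_key == key:
                return value
        raise KeyError(key)


students = TinyHashMap()
students.put(101, "Asha")
students.put(105, "Ravi")     # 105 % 4 == 101 % 4, so both keys share slot 1
students.put(101, "Asha K.")  # existing key: value is replaced, not duplicated
```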
Operations on Map Data Structures:
 A map is a data structure that allows you to store key-value pairs. Here are
some common operations that you can perform with a map:
 Insert: we can insert a new key-value pair into the map by assigning a
value to a key.
 Retrieve: we can retrieve the value associated with a key by passing in
the key as an argument.
 Update: we can update the value associated with a key by assigning a
new value to that key.
 Delete: we can delete a key-value pair from the map by using
the erase() method and passing in the key as an argument.
 Lookup: we can check whether a key exists in the map by using
the count() method. Comparing the value associated with the key against
the default value is less reliable, because a stored value may itself equal the default.
 Iteration: we can iterate over the key-value pairs in the map by using
a for loop or an iterator.
 Sorting: Depending on the implementation of the map, we can sort the
key-value pairs based on either the keys or the values.
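As a rough illustration, the operations above look like this with Python's built-in dict, used here as a stand-in for the map type the slides describe (the erase() and count() methods named above belong to other map APIs; Python uses del and the in operator instead). The student IDs and names are made up.

```python
students = {}

# Insert: assign a value to a new key.
students[101] = "Asha"
students[102] = "Ravi"

# Retrieve: pass the key to get its value.
name = students[101]

# Update: assign a new value to an existing key.
students[102] = "Ravi S."

# Delete: remove a key-value pair by key.
del students[101]

# Lookup: test whether a key exists.
has_101 = 101 in students

# Iteration: walk over the key-value pairs.
pairs = [(key, value) for key, value in students.items()]

# Sorting: order the pairs by key or by value.
students[100] = "Zoya"
by_key = sorted(students.items())
by_value = sorted(students.items(), key=lambda pair: pair[1])
```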
Key Characteristics of Maps:
 Key-Value Pairs:
Maps store data as pairs, where each key is linked to a corresponding value.
 Uniqueness of Keys:
Each key within a map must be unique, ensuring that a value can be easily accessed using its
key.
 Efficient Lookups:
A primary feature of maps is their ability to quickly retrieve a value given its key. This is often
achieved with a time complexity of O(1) on average, making them very efficient for searching
and retrieving data.
 Flexibility:
Maps can store a wide range of data types as both keys and values, offering flexibility in
representing various relationships between data.
 Dynamic Size:
Maps can typically grow or shrink as needed to accommodate the addition or removal of
key-value pairs.
 Implementation Variations:
Maps can be implemented using different underlying data structures like hash tables or trees,
which affect their performance characteristics (e.g., insertion and deletion time complexity).
 Order Preservation:
Some map implementations, like LinkedHashMap in Java, preserve the order in which
key-value pairs are inserted, while others, like HashMap, do not.
 Applications:
Maps are widely used in various applications, including symbol tables in compilers, caching,
database indexing, and more.
Dictionary
A dictionary is a collection of key-value pairs, where each key is unique
and associated with a value. The values may or may not be unique, and a
dictionary can hold values of heterogeneous types.
Lists maintain the order of the elements they contain. Dictionaries, in
general, do not guarantee the order of the elements they contain.
Dictionaries support various operations, including:
 Adding new key-value pairs.
 Retrieving a value given its key.
 Updating the value associated with an existing key.
 Deleting a key-value pair.
 Checking if a key exists in the dictionary.
Key-Value Pairs:
 Dictionaries store data in pairs, where each key is associated with a
specific value.
Uniqueness of Keys:
 A key in a dictionary must be unique; no two entries can have the same
key. This allows for efficient retrieval of values.
Accessing Values:
 Values are accessed using their corresponding keys. For example, if you
have a dictionary of student names and their grades, you can easily
retrieve a student's grade by using their name as the key.
 Examples:
 Dictionaries can be used to represent various real-world scenarios, such
as:
 A phone book where names are keys and phone numbers are values.
 A library catalog where book titles are keys and book details are values.
 A website's user data where usernames are keys and user profiles are
values.
Linear List
 A linked list is a linear data structure, meaning that one data element
follows another. Its values are stored in nodes, which may sit at
non-contiguous locations in memory and are connected by links.
Each node contains data and a pointer to the next node. Unlike
arrays, linked lists don't allow random access. All access is sequential.
 Node: a record in a linked list that contains a data field and a
reference, a self-referential structure.
 Next pointer: the field of a node that contains a reference to the next
node.
 Prev pointer: the field of a node that contains a reference to the
previous node.
 Head Node: the first node of the linked list.
 Tail Node: the last node of the linked list.
Singly-Linked List consists of nodes, starting from head node to NULL, where
each node contains a data field and a next pointer.

Doubly-Linked List consists of nodes, where each node contains a data field,
a next pointer and a prev pointer.

Circular Linked List is similar to a singly-linked list except that the last node
instead of connecting to NULL connects to the first node, creating a ring.
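The three variants differ only in which pointers each node carries and where the last node points. A minimal sketch, with hypothetical class names SinglyNode and DoublyNode and Python's None standing in for NULL:

```python
class SinglyNode:
    """Node with a data field and a next pointer."""

    def __init__(self, data):
        self.data = data
        self.next = None


class DoublyNode:
    """Node with a data field, a next pointer and a prev pointer."""

    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None


# Singly-linked list: 1 -> 2 -> None
a, b = SinglyNode(1), SinglyNode(2)
a.next = b

# Doubly-linked list: 1 <-> 2
x, y = DoublyNode(1), DoublyNode(2)
x.next, y.prev = y, x

# Circular linked list: the last node points back to the first, forming a ring.
p, q = SinglyNode(1), SinglyNode(2)
p.next, q.next = q, p
```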
 The following are some basic operations performed on a Singly-Linked
List:
 Insertion: The insertion operation can be performed in three ways.
They are as follows:
 Inserting At the Beginning of the list
 Inserting At End of the list
 Inserting At Specific location in the list
 Deletion: The deletion operation can be performed in three ways.
They are as follows:
 Deleting from the Beginning of the list
 Deleting from the End of the list
 Deleting a Specific Node
 Search: It is a process of determining and retrieving a specific node
either from the front, the end or anywhere in the list.
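The operations listed above can be sketched as follows. The class and method names are assumptions chosen for the example, and "specific location" is interpreted here as "after the first node holding a given value".

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def insert_at_beginning(self, data):
        self.head = Node(data, self.head)

    def insert_at_end(self, data):
        if self.head is None:
            self.head = Node(data)
            return
        node = self.head
        while node.next is not None:  # walk to the tail
            node = node.next
        node.next = Node(data)

    def insert_after(self, target, data):
        node = self.search(target)
        if node is not None:
            node.next = Node(data, node.next)

    def delete(self, target):
        """Remove the first node holding target (beginning, end or middle)."""
        if self.head is None:
            return
        if self.head.data == target:
            self.head = self.head.next
            return
        node = self.head
        while node.next is not None and node.next.data != target:
            node = node.next
        if node.next is not None:
            node.next = node.next.next  # unlink the matching node

    def search(self, target):
        """Sequential search from the head; returns the node or None."""
        node = self.head
        while node is not None and node.data != target:
            node = node.next
        return node

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.data)
            node = node.next
        return out


lst = SinglyLinkedList()
lst.insert_at_end(20)
lst.insert_at_beginning(10)
lst.insert_at_end(40)
lst.insert_after(20, 30)  # list is now 10, 20, 30, 40
lst.delete(10)            # delete from the beginning
```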
Skip List

The skip list is an extended version of the linked list. It allows the user to search, remove, and insert
elements very quickly. It consists of a base list that contains all the elements in sorted order, plus a
hierarchy of higher-level lists whose links skip over subsequent elements.

Basic Operations
 The following operations are performed on a skip list:
 Insertion Operation: To insert an element into the list
 Search Operation: To search for an element in the list
 Deletion Operation: To delete an element from the list
Insertion Operation
We start from the highest level in the list and compare the key of the next node of
the current node with the key to be inserted. The basic idea is:
 If the key of the next node is less than the key to be inserted, we keep
moving forward on the same level.
 If the key of the next node is greater than the key to be inserted, we store
a pointer to the current node at update[i], where i is the current level, then
move one level down and continue the search.
Once the lowest level has been processed, the new node is linked in after
update[i] on every level i up to its randomly generated level.
Example
 Starting with an empty skip list with MAXLEVEL 4, suppose we want
to insert the following keys with their "randomly generated
levels":
5 with level 1, 26 with level 1, 25 with level 4, 6 with level 3,
21 with level 1, 3 with level 2, 22 with level 2.
Searching Operation
 Searching for an element is very similar to searching for the spot
at which to insert an element in the skip list. The basic idea is:
1. If the key of the next node is less than the search key, we keep moving
forward on the same level.
2. If the key of the next node is greater than the search key, we
store a pointer to the current node at update[i], where i is the current
level, then move one level down and continue the search.
At the lowest level, the search succeeds if the key of the next node equals
the search key.
Example Of Searching
 Consider this example where we want to search for key 17.
 We compare the key of each next node with our search key (i.e. 17). If
the key of the next node is less than 17, we keep
moving on that same level; otherwise we store a pointer to the current
node at update[i], where i is the current level, and move one level down to
continue the search. Here, we stop at the node whose next node has key 19
(i.e. 17 < 19) and store a pointer to that node.
Deletion Operation
 Deletion of an element k is preceded by locating the element in the skip
list. Once the element is located, the pointers are rearranged to
remove the element from the list, just as in a singly linked list.
Example of deletion
 Consider this example where we want to delete element 6.
Deletion of element 6 is preceded by locating it in the
skip list using the search algorithm described above. Once 6 is located,
the pointers are rearranged to remove 6 from the list.
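The insertion, search and deletion procedures can be sketched together as below. This is a simplified illustration under several assumptions: levels are passed in explicitly (matching the "randomly generated levels" of the insertion example) instead of being drawn at random, keys are assumed unique, document levels 1 to 4 map to list indices 0 to 3, and the names SkipNode, SkipList and level_keys are invented for the example.

```python
MAX_LEVEL = 4


class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level index i


class SkipList:
    def __init__(self):
        self.head = SkipNode(None, MAX_LEVEL)  # sentinel present on every level

    def _find_predecessors(self, key):
        """Return update[], where update[i] is the last node on level i with a smaller key."""
        update = [self.head] * MAX_LEVEL
        node = self.head
        for i in reversed(range(MAX_LEVEL)):  # start from the highest level
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]  # keep moving forward on the same level
            update[i] = node  # remember the node, then move one level down
        return update

    def insert(self, key, level):
        update = self._find_predecessors(key)
        new_node = SkipNode(key, level)
        for i in range(level):  # splice the node in on each of its levels
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def search(self, key):
        candidate = self._find_predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def delete(self, key):
        update = self._find_predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):  # unlink on every level the node occupies
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        return True

    def level_keys(self, i):
        keys, node = [], self.head.forward[i]
        while node is not None:
            keys.append(node.key)
            node = node.forward[i]
        return keys


skip = SkipList()
for key, level in [(5, 1), (26, 1), (25, 4), (6, 3), (21, 1), (3, 2), (22, 2)]:
    skip.insert(key, level)
```

Calling skip.level_keys(0) walks the base list, which holds every key in sorted order; higher indices hold progressively fewer keys.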
Practice Problems
 Example 1: Create a skip list by inserting the following keys
into an empty skip list.
 6 with level 1.
 29 with level 1.
 22 with level 4.
 9 with level 3.
 17 with level 1.
 4 with level 2
Hash Table Implementation
A hash table is a data structure that allows for quick insertion, deletion,
and retrieval of data. It works by using a hash function to map a key to
an index in an array.
Types of Hash Functions
 Division Method.
 Mid Square Method.
 Folding Method.
 Multiplication Method.
Division Method
 The easiest and quickest way to create a hash value is through
division. In this hash function, the key value k is divided by M and the
remainder is used as the hash value.
Formula: h(k) = k mod M
where k = key value and M = the size of the hash table
For example, suppose the key value is 42 and the size of the hash table is 20.
When we apply the hash function to key 42, the index is:
h(42) = 42 % 20 = 2
Sr.No.   Key   Hash          Array Index
1        1     1 % 20 = 1    1
2        2     2 % 20 = 2    2
3        42    42 % 20 = 2   2
4        4     4 % 20 = 4    4
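The table above can be reproduced with a one-line function. The name division_hash is an assumption for the example.

```python
def division_hash(k, m):
    """Division method: the remainder of k divided by the table size m."""
    return k % m


# Same keys and table size (20) as in the table above.
indices = {k: division_hash(k, 20) for k in [1, 2, 42, 4]}
```

Keys 2 and 42 map to the same index, which is an instance of the collision problem discussed later.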
 2. Mid Square Method:
 The mid-square method computes the hash value in two steps:
 Square the value of the key k, i.e. compute k^2.
 Extract the middle r digits of k^2 as the hash value.
 Formula:
 h(k) = middle r digits of (k x k)
 Here,
k is the key value.
 The value of r is chosen based on the size of the table.
 Example:
 Suppose the hash table has 100 memory locations. So r = 2, because
two digits are required to map the key to a memory location.
 k = 60
k x k = 60 x 60
= 3600
h(60) = 60 (the middle two digits of 3600)
 The hash value obtained is 60.
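A possible sketch of the mid-square method follows. The function name and the tie-breaking rule are assumptions: when the square has no exact middle, this version takes the window that starts one digit further left.

```python
def mid_square_hash(k, r):
    """Mid-square method: square the key and keep the middle r digits."""
    digits = str(k * k).zfill(r)      # pad so that at least r digits exist
    start = (len(digits) - r) // 2    # leans left when no exact middle exists
    return int(digits[start:start + r])


h60 = mid_square_hash(60, 2)  # 60 x 60 = 3600, middle two digits are "60"
```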
 3. Digit Folding Method:
 This method involves two steps:
 Divide the key value k into a number of parts k1, k2, k3, ..., kn,
where each part has the same number of digits except for the last
part, which can have fewer digits than the other parts.
 Add the individual parts. The hash value is obtained by ignoring the
last carry, if any.
 Formula:
 k = k1, k2, k3, k4, ..., kn
s = k1 + k2 + k3 + k4 + ... + kn
h(k) = s
 Here,
s is obtained by adding the parts of the key k.
 Example:
 k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
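The folding steps can be sketched as below. The function name and the part_size parameter are assumptions, and "ignoring the last carry" is interpreted as keeping only the lowest part_size digits of the sum.

```python
def folding_hash(k, part_size):
    """Digit folding: split the key into parts, add them, drop any final carry."""
    digits = str(k)
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    s = sum(parts)
    return s % (10 ** part_size)  # ignore the carry beyond part_size digits


h12345 = folding_hash(12345, 2)  # parts 12, 34, 5
h9876 = folding_hash(9876, 2)    # parts 98, 76 sum to 174; the carry 1 is dropped
```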
 4. Multiplication Method
 This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value k by A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table, i.e. M.
5. Take the floor of the result obtained in step 4; this is the hash value.
 Formula:
 h(k) = floor (M (kA mod 1))
 Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
 Example:
 k = 12345
A = 0.357840
M = 100
 h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
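The worked example can be checked with a direct transcription of the formula. The function name is an assumption, and the fractional part is computed in floating point, so results for other constants may be affected by rounding.

```python
import math


def multiplication_hash(k, a, m):
    """Multiplication method: floor(m * fractional part of k * a), with 0 < a < 1."""
    fractional = (k * a) % 1  # fractional part of k * a
    return math.floor(m * fractional)


index = multiplication_hash(12345, 0.357840, 100)
```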
Collision resolution Techniques
 When two different keys produce the same hash value, they map to the
same slot in the hash table. This situation is known as a collision.
Separate Chaining/Open Hashing
 Let's first look at how chaining resolves a collision.
 Suppose we have a list of key values
 A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = (2k+3) % m

Key   Location(u)
3     ((2*3)+3)%10 = 9
2     ((2*2)+3)%10 = 7
9     ((2*9)+3)%10 = 1
6     ((2*6)+3)%10 = 5
11    ((2*11)+3)%10 = 5
13    ((2*13)+3)%10 = 9
7     ((2*7)+3)%10 = 7
12    ((2*12)+3)%10 = 7

The index of key value 6 is:
index = h(6) = (2(6)+3)%10 = 5. The value 6 would be stored at
index 5.
The index of key value 11 is:
index = h(11) = (2(11)+3)%10 = 5. The value 11 would also be stored at
index 5.
Two values (6, 11) are stored at the same index, i.e., 5. This leads to the
collision problem, so we use the chaining method to resolve the collision.
We create one more list and add the value 11 to this list. After the
creation of the new list, the newly created list is linked to the list
having value 6.
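A minimal sketch of separate chaining with the keys from this example: each table slot holds a list (the chain), and colliding keys are appended to the same chain. The function names are assumptions.

```python
def h(k, m=10):
    """Hash function from the example: (2k + 3) mod m."""
    return (2 * k + 3) % m


def build_chained_table(keys, m=10):
    table = [[] for _ in range(m)]  # one chain per slot
    for k in keys:
        table[h(k, m)].append(k)    # colliding keys share a chain
    return table


table = build_chained_table([3, 2, 9, 6, 11, 13, 7, 12])
```

Here table[5] holds both 6 and 11, so a lookup for 11 hashes to index 5 and then scans that chain.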
Open Addressing/Closed Hashing
 Linear probing is one of the simplest ways to implement Open
Addressing, a method to resolve hashing collisions. The main idea of
linear probing is that we perform a linear search to locate the next
available slot in the hash table when a collision happens.
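Linear probing can be sketched as follows. The keys and hash function are borrowed from the separate chaining example as an assumption; the slides do not work through linear probing with them. The function name is also hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key at the first free slot found by stepping one index at a time."""
    m = len(table)
    u = (2 * key + 3) % m  # initial index, same hash as the chaining example
    for i in range(m):
        index = (u + i) % m  # wrap around at the end of the table
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 10
placed = [linear_probe_insert(table, k) for k in [3, 2, 9, 6, 11, 13, 7, 12]]
```

Key 12 starts at index 7 and has to step past five occupied slots before landing at index 2, which illustrates the clustering that linear probing tends to produce.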
Quadratic Probing
 In linear probing, the search for a free slot proceeds linearly. In contrast,
quadratic probing is an open addressing technique that uses a quadratic
polynomial to compute the probe positions until an empty slot is found.
 Equivalently, a key k with initial index u is inserted at the first free location
among (u + i^2) % m, where i = 0 to m-1.
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = (2k+3) % m
 The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5, respectively.
We do not need to apply the quadratic probing technique to these key
values, as no collision occurs.
 The index value of 11 is 5, but this location is already occupied by 6.
So, we apply the quadratic probing technique.
When i = 0
Index = (5 + 0^2) % 10 = 5
When i = 1
Index = (5 + 1^2) % 10 = 6
Since location 6 is empty, the value 11 is added at index 6.
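The probe sequence can be sketched as below, using the same keys and hash function. The function name is an assumption, and the placements of 13, 7 and 12 are computed by this sketch; the text above only works through key 11.

```python
def quadratic_probe_insert(table, key):
    """Insert key at the first free slot among (u + i^2) % m for i = 0 .. m-1."""
    m = len(table)
    u = (2 * key + 3) % m  # initial index from the example hash function
    for i in range(m):
        index = (u + i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found in the probe sequence")


table = [None] * 10
placed = [quadratic_probe_insert(table, k) for k in [3, 2, 9, 6, 11, 13, 7, 12]]
```

Unlike linear probing, the quadratic sequence does not visit every slot in general, so the insert can fail even when the table still has free slots.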
Double Hashing
 Double hashing is a collision resolution technique used in hash tables.
It works by using two hash functions to compute two different hash
values for a given key.
 The first hash function is used to compute the initial hash value, and
the second hash function is used to compute the step size for the
probing sequence.
 Double hashing tends to cause less clustering than linear or quadratic
probing, because two keys that collide at the same initial index usually
get different step sizes and therefore follow different probe sequences.
 This means that repeated collisions during probing are less likely than
with linear probing or quadratic probing.
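A small sketch of double hashing follows. Everything specific here is an assumption, since the slides give no concrete functions: the first hash is k % m, the second hash (the step size) is 7 - (k % 7), and the table size is 11, a prime, so that every step size reaches all slots.

```python
def double_hash_insert(table, key):
    """Probe at (h1 + i * h2) % m, where h2 gives a key-dependent step size."""
    m = len(table)
    start = key % m          # first hash: initial index (assumed)
    step = 7 - (key % 7)     # second hash: step size in 1..7, never zero (assumed)
    for i in range(m):
        index = (start + i * step) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 11
placed = [double_hash_insert(table, k) for k in [3, 2, 9, 6, 11, 13, 7, 12]]
```

Key 13 collides with key 2 at index 2 and then advances by its own step size of 1 until it finds index 4 free; a different colliding key would generally follow a different sequence.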
Application
Text compression using dictionary
 Dictionary-based text compression is a widely used technique in data
compression that replaces repeated substrings with shorter representations
using a dictionary of commonly occurring phrases or words.
 A dictionary (or table) stores strings (words or phrases) that occur
multiple times in the text.
 The actual text is replaced by references (indices or codes) pointing to the
dictionary.
Steps
 Initialization: Start with an empty or predefined dictionary.
 Scanning: Read the input text and identify repeated sequences.
 Dictionary Entry: Add unique sequences to the dictionary.
 Substitution: Replace sequences in the text with a pointer (index/code) to
the dictionary.
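The four steps can be sketched with a word-level dictionary. This is a deliberately simple illustration, not a real compression format: it treats whitespace-separated words as the sequences, starts from an empty dictionary, and the function names and sample sentence are assumptions.

```python
def compress(text):
    """Replace each word with its index in a dictionary built while scanning."""
    dictionary = []  # Initialization: start with an empty dictionary
    index_of = {}
    codes = []
    for word in text.split():                 # Scanning
        if word not in index_of:              # Dictionary entry for unseen words
            index_of[word] = len(dictionary)
            dictionary.append(word)
        codes.append(index_of[word])          # Substitution by index
    return dictionary, codes


def decompress(dictionary, codes):
    return " ".join(dictionary[code] for code in codes)


dictionary, codes = compress("the cat and the dog and the bird")
restored = decompress(dictionary, codes)
```

The repeated words "the" and "and" are stored once in the dictionary and appear in the output only as small integers, which is where the saving comes from.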
