DS Unit 1
A data structure is a way of collecting and organizing data so that we can perform
operations on the data effectively.
In simple terms, data structures are structures programmed to store ordered data, so that
various operations can be performed on it easily.
DATA REPRESENTATION
Data: The term 'data' simply refers to a value or a set of values. These values may represent
anything about something: the roll number of a student, marks, the name of an employee, the address of
a person, etc.
Data item: A data item refers to a single unit of values. For example, the roll number of a student, marks,
the name of an employee, or the address of a person are data items. Data items that can be divided into
sub-items are called group items (e.g. address, date, name), whereas those that cannot be divided into
sub-items are called elementary items (e.g. roll number, marks, city, pin code).
Entity: An entity is something that has certain attributes, which may be assigned values. Entities
with similar attributes (e.g. all employees of an organization) form an entity set.
Information: Processed data; data with its attributes given meaning.
Name  Age  Sex  Roll Number  Branch
A     17   M    109cs0132    CSE
B     18   M    109ee1234    EE
An abstract data type (ADT) refers to a set of data values and associated operations that are specified
precisely, independent of any particular implementation. With an ADT, we know what a specific data
type can do, but how it actually does it is hidden: the implementation is simply not exposed.
We also have more complex data structures, which are used to store large and connected data. Some
examples of abstract data structures are:
● Array
● Linked List
● Stack
● Queue
● Tree
● Graph
All these data structures allow us to perform different operations on data. We select these data
structures based on which type of operation is required.
The data in the data structures are processed by certain operations. The particular data structure
chosen largely depends on the frequency of the operation that needs to be performed on the data
structure. The Operations are:
• Traversing
• Searching
• Insertion
• Deletion
• Sorting
• Merging
(1) Traversing: Accessing each record exactly once so that certain items in the record may be
processed.
(2) Searching: Finding the location of a particular record with a given key value, or finding
the locations of all records which satisfy one or more conditions.
(3) Insertion: Adding a new record to the structure.
(4) Deletion: Removing a record from the structure.
(5) Sorting: Arranging the data or records in some logical order (ascending or descending
order).
(6) Merging: Combining the records of two different sorted files into a single sorted file.
ARRAYS
An array is a container which can hold a fixed number of items, and these items should all be of the
same type. Most data structures make use of arrays to implement their algorithms. The following
term is important for understanding the concept of an array:
● Index − Each location of an element in an array has a numerical index, which is used to identify
the element.
Array Representation
An array is usually pictured as a row of contiguous cells. The index starts at 0, so an array of size n
has indices 0 to n-1, and the element at any index can be accessed directly.
Array Types:
1D array
2D array
Applications of Array :
Storing and accessing data: Arrays are used to store and retrieve data in a specific order. For example,
an array can be used to store the scores of a group of students, or the temperatures recorded by a weather
station.
Sorting: Arrays can be used to sort data in ascending or descending order. Sorting algorithms such as
bubble sort, merge sort, and quick sort rely heavily on arrays.
Searching: Arrays can be searched for specific elements using algorithms such as linear search and
binary search.
Matrices: Arrays are used to represent matrices in mathematical computations such as matrix
multiplication, linear algebra, and image processing.
Stacks and queues: Arrays are used as the underlying data structure for implementing stacks and queues,
which are commonly used in algorithms and data structures.
Graphs: Arrays can be used to represent graphs, most commonly as an adjacency matrix: entry [i][j]
records whether (or with what weight) node i is connected to node j.
Dynamic programming: Dynamic programming algorithms often use arrays to store intermediate results
of sub problems in order to solve a larger problem.
Dynamic memory allocation is the process of assigning memory space during execution time (run
time). The main reason for, and advantage of, allocating memory dynamically is that we may not know
beforehand how much memory the program will need.
INTRODUCTION TO LISTS:
A list is an ordered data structure that stores elements sequentially; elements can be accessed
by their index. Depending on the programming language being used, a list may store elements of
the same or of different data types.
LINKED LISTS
A linked list is a linear and very common data structure which consists of a group of
nodes in a sequence. Each node is divided into two parts: its own data and the address
of the next node, so the nodes form a chain. Linked lists are also used to build trees and graphs.
Commonly used operations on a Linked List:
The following operations are performed on a Linked List
Insertion: The insertion operation can be performed in four ways:
Insertion in an empty list
Insertion at the beginning of the list
Insertion at the end of the list
Insertion in between the nodes
Deletion: The deletion operation can be performed in three ways:
Deleting from the Beginning of the list
Deleting from the End of the list
Deleting a Specific Node
Display: This process displays the elements of a linked list.
Doubly Linked List: In a doubly linked list, each node contains two links: the first link points to the
previous node and the second link points to the next node in the sequence.
Circular Linked List : In the circular linked list the last node of the list contains the address of the
first node and forms a circular chain.
Linked lists allow efficient insertion of elements at the beginning of the list; elements can also be
inserted at the end or anywhere in between.
An algorithm is a step-by-step procedure which defines a set of instructions to be executed in a certain
order to get the desired output. Algorithms are generally created independently of underlying
languages, i.e. an algorithm can be implemented in more than one programming language.
An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain predefined
task. An algorithm is not the complete code or program; it is just the core logic (solution) of a problem,
which can be expressed either as an informal high-level description, as pseudocode, or using a
flowchart.
From the data structure point of view, some important categories of algorithms are: search, sort,
insert, update, and delete.
Characteristics of an Algorithm
Not all procedures can be called an algorithm. An algorithm should have the below mentioned
characteristics −
• Unambiguous − An algorithm should be clear and unambiguous. Each of its steps (or phases), and
their inputs/outputs, should be clear and must lead to only one meaning.
• Input − An algorithm should have 0 or more well-defined inputs.
• Output − An algorithm should have 1 or more well-defined outputs.
• Finiteness − An algorithm must terminate after a finite number of steps.
• Feasibility − An algorithm should be feasible with the available resources.
• Independent − An algorithm should have step-by-step directions that are independent of any
programming code.
An algorithm is said to be efficient and fast, if it takes less time to execute and consumes less memory
space. The performance of an algorithm is measured on the basis of following properties:
1. Time Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by the Algorithm X are
the two main factors which decide the efficiency of X.
• Time Factor − Time is measured by counting the number of key operations, such as
comparisons in a sorting algorithm.
• Space Factor − Space is measured by counting the maximum memory space required by the
algorithm.
The complexity of an algorithm f(n) gives the running time and / or storage space required by the
algorithm in terms of n as the size of input data.
2. Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the algorithm
during the course of its execution. Space complexity must be taken seriously for multi-user systems
and in situations where limited memory is available.
Space required by an algorithm is equal to the sum of the following two components −
• A fixed part: the space required to store certain data and variables, which is independent of
the size of the problem. For example, simple variables and constants used, program size,
etc.
• A variable part is a space required by variables, whose size depends on the size of the problem.
For example dynamic memory allocation, recursion stacks space etc.
An algorithm generally requires space for following components:
• Instruction Space: It is the space required to store the executable version of the program. This
space is fixed for a given program, though it varies from program to program with the number of lines of code.
• Data Space: It is the space required to store all the constants and variables value.
• Environment Space: It is the space required to store the environment information needed to
resume the suspended function.
Space complexity S(P) of any algorithm P is S(P) = C + Sp(I), where C is the fixed part and Sp(I) is the
variable part of the algorithm, which depends on instance characteristic I. The following is a simple
example illustrating the concept −
ASYMPTOTIC NOTATIONS:
The main idea of asymptotic analysis is to have a measure of efficiency of algorithms that doesn’t
depend on machine specific constants, and doesn’t require algorithms to be implemented and time
taken by programs to be compared. Asymptotic notations are mathematical tools to represent time
complexity of algorithms for asymptotic analysis. The following 3 asymptotic notations are mostly
used to represent time complexity of algorithms.
1) Θ Notation:
The theta notation bounds a function from above and below, so it defines exact asymptotic behavior. A
simple way to get the Theta notation of an expression is to drop low-order terms and ignore leading
constants. For example, consider the following expression: 3n^3 + 6n^2 + 6000 = Θ(n^3).
Dropping lower-order terms is always fine because there will always be an n0 beyond which
Θ(n^3) beats Θ(n^2), irrespective of the constants involved. For a given function g(n), Θ(g(n)) denotes
the following set of functions:
Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such
that 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0 }
The above definition means, if f(n) is theta of g(n), then the value f(n) is always between c1*g(n) and
c2*g(n) for large values of n (n >= n0). The definition of theta also requires that f(n) must be non-
negative for values of n greater than n0.
2. Big O Notation:
The Big O notation defines an upper bound of an algorithm; it bounds a function only from above. For
example, consider the case of Insertion Sort. It takes linear time in the best case and quadratic time in the
worst case. We can safely say that the time complexity of Insertion Sort is O(n^2). Note that O(n^2) also
covers linear time. If we used Θ notation to represent the time complexity of Insertion Sort, we would have
to use two statements for the best and worst cases:
1. The worst-case time complexity of Insertion Sort is Θ(n^2).
2. The best-case time complexity of Insertion Sort is Θ(n).
For a given function g(n), O(g(n)) denotes the following set of functions:
O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 <= f(n) <= c*g(n) for all n >= n0 }
3) Ω Notation:
Just as Big O notation provides an asymptotic upper bound on a function, Ω notation provides an
asymptotic lower bound. Ω notation can be useful when we have a lower bound on the time complexity of
an algorithm. Since the best-case performance of an algorithm is generally not useful, the Omega
notation is the least used of the three.
For a given function g(n), Ω(g(n)) denotes the following set of functions:
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 <= c*g(n) <= f(n) for all n >= n0 }
RECURSION:
The process in which a function calls itself directly or indirectly is called recursion
and the corresponding function is called a recursive function.
Properties of Recursion:
Performing the same operations multiple times with different inputs.
At every step, we try smaller inputs to make the problem smaller.
A base condition is needed to stop the recursion; otherwise, it recurses infinitely.
A Mathematical Interpretation
Consider the problem of computing the sum of the first n natural numbers. There are several ways of
doing this, but the simplest approach is to add the numbers from 1 to n. The function then looks like this:
Approach (1) – simply adding one by one:
f(n) = 1 + 2 + 3 + ... + n
There is another, recursive way of representing this:
Approach (2) – recursive adding:
f(n) = 1,           n = 1
f(n) = n + f(n-1),  n > 1
The difference between approach (1) and approach (2) is that in approach (2) the function f() is called
inside the function itself: this phenomenon is called recursion, and a function containing such a call is
called a recursive function. In the end, this is a great tool for programmers to code some problems in an
easier and more concise way.
int fact(int n)
{
    if (n <= 1) // base case
        return 1;
    else
        return n * fact(n - 1);
}
In the above example, the base case n <= 1 is defined, and a larger value of n is solved
by reducing it to a smaller one until the base case is reached.