Unit - 1 Introduction to DSA
A data structure is a way of collecting and organizing data so that operations on the data can be
performed efficiently.
In computer science, a data structure is a data organization, management and storage format that
enables efficient access and modification - Wikipedia
A data structure is not a physical entity, a data type (like integer or float) or actual memory; it is
a logical specification, or set of rules, for storing data in an efficient way.
Non-primitive data structure:- These are more sophisticated data structures, derived
from the primitive data structures. They emphasize the grouping of homogeneous or heterogeneous
data items. Arrays, lists and files are examples of non-primitive data structures.
enum (short for enumeration) is a data type that represents a set of named values. It is typically
used to define a list of constants that have a specific meaning or purpose within a program.
Enumerations provide a way to organize related values and improve code readability.
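#include <stdio.h>

/* Enumerators are numbered from 0: red = 0, blue = 1, green = 2, yellow = 3 */
enum colour { red, blue, green, yellow };

int main(void)
{
    enum colour x;
    x = yellow;
    printf("Value = %d", x);   /* prints: Value = 3 */
    return 0;
}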
ii) Structure:
A structure is a user defined data type in C/C++. A structure creates a data type that can be used
to group items of possibly different types into a single type.
struct address {
char name[50];
char street[100];
char city[50];
char state[20];
int pin;
};
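As a minimal sketch of how the address structure above might be used, the following code fills one variable and accesses its members with the dot operator. The helper name make_address and the sample values in the usage note are illustrative assumptions, not part of the original notes.

```c
#include <assert.h>
#include <stdio.h>

struct address {
    char name[50];
    char street[100];
    char city[50];
    char state[20];
    int pin;
};

/* Hypothetical helper: builds an address from a name, a city and a PIN.
   Members that are not set explicitly stay zeroed (empty strings). */
struct address make_address(const char *name, const char *city, int pin)
{
    struct address a = {0};            /* zero every member first */
    assert(name != NULL && city != NULL);
    /* snprintf truncates safely if the text is longer than the array */
    snprintf(a.name, sizeof a.name, "%s", name);
    snprintf(a.city, sizeof a.city, "%s", city);
    a.pin = pin;                       /* members are accessed with '.' */
    return a;
}
```

For instance, make_address("Ravi", "Pune", 411001) would return a structure whose name, city and pin members hold those three values, while street and state remain empty.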
iii) Union:
Union is a composite data type that allows two or more data types to be accessed via the same
memory location.
union Data {
int intValue;
float floatValue;
char stringValue[20];
};
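The sharing of one memory location can be checked directly. The sketch below reuses the Data union from above; the function names members_share_address and overwrite_demo are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

union Data {
    int intValue;
    float floatValue;
    char stringValue[20];
};

/* Returns 1 if every member starts at the same address, which the
   C language guarantees for a union. */
int members_share_address(void)
{
    union Data d;
    /* A union is at least as large as its largest member. */
    assert(sizeof(union Data) >= sizeof d.stringValue);
    return (void *)&d.intValue == (void *)&d.floatValue
        && (void *)&d.floatValue == (void *)d.stringValue;
}

/* Stores an int and then a float in the same union object.
   After the second store only floatValue holds a meaningful value. */
float overwrite_demo(void)
{
    union Data d;
    d.intValue = 42;
    d.floatValue = 1.5f;   /* reuses the bytes that held 42 */
    return d.floatValue;
}
```

Because all members overlap, only the member written most recently should be read back.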
iv) Typedef:
typedef is a keyword or construct that allows defining a new name for an existing data type. It is
often used to create aliases or alternative names for existing types, which can enhance code
readability and provide a level of abstraction.
struct student
{
char name[20];
int age;
};
typedef struct student stud;
stud s1, s2;
Operations on Data Structure:
The basic operations that are performed on data structures are as follows:
❖ Deletion: Deletion means removal of a data element from a data structure if it is found.
❖ Searching: Searching involves searching for the specified data element in a data
structure.
❖ Traversal: Traversal of a data structure means processing all the data elements present in
it.
❖ Sorting: Arranging data elements of a data structure in a specified order is called sorting.
❖ Merging: Combining elements of two similar data structures to form a new data structure
of the same type, is called merging.
In the programming world, code can grow so large that it becomes difficult to understand as a
whole. So we divide the code into small chunks, which makes the coding pattern easier to
understand and makes updates or new implementations in the code easier.
This is one of the most important concepts in the study of data structures. The implementation of
an Abstract Data Type (ADT) is not visible to the user. An ADT has an inside and an outside: the
outside is called the Interface and the inside is called the Implementation. The separation
between the inside and the outside is called the Abstraction barrier.
An abstract data type is a purely theoretical concept. It is used to describe abstract algorithms,
to evaluate data structures and to describe the type systems of programming languages.
Example of ADT:
Integers are an abstract data type, used in addition, multiplication, division and other
mathematical calculations. The computer displays the result of the given mathematical
instructions without disclosing how the integers are evaluated internally.
malloc():
malloc is a library function declared in the header file <stdlib.h>.
malloc is the short name for "memory allocation" and is used to dynamically allocate a single
large block of contiguous memory according to the size specified.
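Syntax:
ptr = malloc(X);
Here, ptr points to a newly allocated block of X bytes, or is NULL if the allocation fails.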
Example:
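A minimal sketch of a typical malloc() call is given below. The helper name make_sequence is hypothetical. The memory returned by malloc() is uninitialised, so the sketch fills it before use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an array of n ints and fills it
   with 1..n. Returns NULL if the allocation fails. The caller is
   responsible for calling free() on the result. */
int *make_sequence(size_t n)
{
    assert(n > 0);
    int *p = malloc(n * sizeof *p);   /* n elements of sizeof(int) bytes */
    if (p == NULL)
        return NULL;                  /* always check the result */
    for (size_t i = 0; i < n; i++)
        p[i] = (int)(i + 1);
    return p;
}
```

Writing the size as n * sizeof *p keeps the byte count correct even if the element type of p is changed later.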
calloc():
Example:
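calloc() is declared in <stdlib.h> like malloc(), but it takes two arguments, the number of elements and the size of one element, and it sets every byte of the allocated block to zero. A minimal sketch follows; the helper name make_zeroed is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n ints, all initialised to 0.
   Returns NULL if the allocation fails. The caller must free() it. */
int *make_zeroed(size_t n)
{
    assert(n > 0);
    /* calloc takes the element count and the size of one element */
    return calloc(n, sizeof(int));
}
```

Unlike a block obtained from malloc(), the block can be read immediately, because every element already holds 0.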
realloc()
If the dynamically allocated memory is insufficient or more than required, you can change the
size of previously allocated memory using the realloc() function.
Syntax:
ptr = realloc(ptr, X);
Here, ptr is reallocated with a new size X.
free()
Dynamically allocated memory created with either calloc() or malloc() is not freed on its
own. You must explicitly use free() to release the space.
Syntax:
free(ptr);
This statement frees the memory block pointed to by ptr.
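The sketch below combines realloc() and free(). It stores the result of realloc() in a temporary pointer first: if realloc() fails it returns NULL and the old block stays allocated, so assigning the result straight back to the original pointer would lose that block. The helper name resize_ints is an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *arr so it can hold new_count ints.
   Returns 1 on success. On failure it returns 0 and leaves *arr
   untouched, so the caller can still use or free the old block. */
int resize_ints(int **arr, size_t new_count)
{
    assert(arr != NULL && new_count > 0);
    int *tmp = realloc(*arr, new_count * sizeof **arr);
    if (tmp == NULL)
        return 0;          /* the old block is still allocated */
    *arr = tmp;            /* existing elements are preserved */
    return 1;
}
```

A caller would typically allocate with malloc(), grow the block with resize_ints(&a, 4), and finally call free(a). Setting the pointer to NULL after free() helps avoid accidental reuse.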
Algorithm:
An algorithm is a finite set of instructions, written in order to accomplish a certain predefined
task.
An algorithm is not the complete code or program; it is just the core logic of a solution, which
can be expressed either as an informal high-level description (pseudocode) or as a flowchart.
Properties of algorithm:
Types of Algorithms:
i) Deterministic and Non-deterministic Algorithm:
Deterministic and non-deterministic algorithms are two different types of algorithms based on
the predictability of their execution and the output they produce.
Deterministic Algorithms:
➢ Deterministic algorithms are those in which the execution steps are precisely defined and
will produce the same output for a given input every time they are run.
➢ These algorithms follow a well-defined sequence of steps, where each step is
deterministic and has a unique outcome.
➢ Examples of deterministic algorithms include sorting algorithms like bubble sort,
insertion sort, and merge sort, as well as mathematical algorithms like Euclid's algorithm
for finding the greatest common divisor.
Non-deterministic Algorithms:
➢ Non-deterministic algorithms are those in which the execution steps are not fixed in
advance; they may follow different execution paths, and possibly produce different
outputs, for the same input on different runs.
➢ Non-deterministic algorithms are often used in optimization problems, where they
explore various possibilities to find an optimal solution.
➢ Examples of non-deterministic algorithms include randomized algorithms like quicksort
with random pivot selection, genetic algorithms, and simulated annealing.
Non-deterministic algorithms can be used to solve problems that deterministic algorithms may
struggle with or take a long time to solve. However, the non-deterministic nature of these
algorithms means that they may not always produce the optimal solution, and their performance
can vary across different runs.
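The contrast can be sketched with two small functions. The first is Euclid's algorithm, which the notes list as deterministic. The second is the random pivot choice used by randomized quicksort. The function names gcd and random_pivot_index are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Deterministic: Euclid's algorithm. The same inputs always go
   through the same steps and give the same result. */
int gcd(int a, int b)
{
    assert(a >= 0 && b >= 0);
    while (b != 0) {
        int r = a % b;   /* remainder becomes the new divisor */
        a = b;
        b = r;
    }
    return a;
}

/* Randomized: picks a pivot index in [lo, hi] using rand().
   The chosen index can differ between runs when the generator
   is seeded differently. */
int random_pivot_index(int lo, int hi)
{
    assert(lo <= hi);
    return lo + rand() % (hi - lo + 1);
}
```

Note that quicksort with a random pivot still returns the same sorted array on every run; what varies is the sequence of steps and therefore the running time.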
Examples
The following computer algorithms are based on divide-and-conquer programming approach
• Merge Sort
• Quick Sort
Depending on the architecture, computers are of two types:
• Sequential Computer
• Parallel Computer
Depending on the architecture of computers, we have two types of algorithms:
• Sequential Algorithm: An algorithm in which consecutive steps of instructions are
executed one after another, in order, to solve a problem.
• Parallel Algorithm: An algorithm in which the problem is divided into sub-problems that
are executed simultaneously on different processing devices. The individual outputs are
then combined to produce the final result.
It's important to note that while heuristic algorithms can provide efficient and practical solutions,
they do not guarantee optimality and may not always produce the best possible solution for a given
problem.
Algorithm Complexity
The complexity of an algorithm f(n) gives the running time and/or the storage space required by
the algorithm in terms of n as the size of input data.
1. Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the following
two components −
• A fixed part that is a space required to store certain data and variables, that are independent
of the size of the problem. For example, simple variables and constants used, program size,
etc.
• A variable part is a space required by variables, whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.
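The difference between the fixed part and the variable part can be seen by summing an array in two ways. This is a sketch only; the function names sum_iterative and sum_recursive are assumptions, and the stack usage described assumes the compiler does not optimise the recursion away.

```c
#include <assert.h>

/* Iterative sum: uses a fixed number of variables regardless of n,
   so the extra space is constant, O(1). */
long sum_iterative(const int a[], int n)
{
    assert(n >= 0);
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Recursive sum: each call waits for the next one, so up to n + 1
   stack frames are alive at once. The extra space grows with n, O(n). */
long sum_recursive(const int a[], int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    return a[n - 1] + sum_recursive(a, n - 1);
}
```

Both functions return the same value; only their memory requirements differ.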
2. Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run
to completion. Time requirements can be defined as a numerical function T(n), where T(n) can be
measured as the number of steps, provided each step consumes constant time.
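As a rough illustration of T(n) as a step count, the sketch below counts how many times the innermost statement runs in a single loop and in two nested loops. The function names are hypothetical.

```c
#include <assert.h>

/* Single loop over n items: the body runs n times, so T(n) = n. */
long steps_single_loop(int n)
{
    assert(n >= 0);
    long steps = 0;
    for (int i = 0; i < n; i++)
        steps++;
    return steps;
}

/* Two nested loops over n items: the body runs n * n times,
   so T(n) = n^2. */
long steps_nested_loops(int n)
{
    assert(n >= 0);
    long steps = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            steps++;
    return steps;
}
```

For n = 10 the single loop takes 10 steps and the nested loops take 100, which is why the second form is called quadratic.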
Asymptotic Notations
Asymptotic notations are the mathematical notations used to describe the running time of an
algorithm when the input tends towards a particular value or a limiting value.
For example: In bubble sort, when the input array is already sorted, the time taken by the algorithm
is linear i.e. the best case.
But, when the input array is in reverse order, the algorithm takes the maximum time (quadratic)
to sort the elements i.e. the worst case.
When the input array is neither sorted nor in reverse order, then it takes average time. These
durations are denoted using asymptotic notations.
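The bubble sort behaviour described above can be observed by counting comparisons. The sketch below assumes the common variant of bubble sort that stops as soon as a full pass makes no swap; without that early exit the best case would not be linear. The function name bubble_sort_count is hypothetical.

```c
#include <assert.h>

/* Sorts a[0..n-1] in ascending order and returns the number of
   comparisons made. Stops early when a pass performs no swap. */
long bubble_sort_count(int a[], int n)
{
    assert(n >= 0);
    long comparisons = 0;
    for (int pass = 0; pass < n - 1; pass++) {
        int swapped = 0;
        /* after each pass the largest remaining element is in place */
        for (int i = 0; i < n - 1 - pass; i++) {
            comparisons++;
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped)
            break;   /* already sorted: best case */
    }
    return comparisons;
}
```

For 5 elements, an already sorted array needs 4 comparisons (n - 1), while a reverse-ordered array needs 10 comparisons (n(n - 1)/2).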
There are mainly three asymptotic notations:
• Big-O notation
• Omega notation
• Theta notation
Big-O notation represents the upper bound of the running time of an algorithm. Thus, it is commonly
used to express the worst-case complexity of an algorithm.
Big-O gives the upper bound of a function:
O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all
n ≥ n0 }
For sufficiently large n, the running time of the algorithm does not exceed a constant multiple of g(n).
Since it describes the worst-case running time of an algorithm, it is widely used to analyze an
algorithm, as we are usually most interested in the worst-case scenario.
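Omega notation represents the lower bound of the running time of an algorithm. Thus, it is commonly
used to express the best-case complexity of an algorithm.
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all
n ≥ n0 }
The above expression can be described as: a function f(n) belongs to the set Ω(g(n)) if there exists
a positive constant c such that f(n) lies on or above c·g(n) for sufficiently large n.
For sufficiently large n, the minimum time required by the algorithm is given by Omega Ω(g(n)).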
Theta notation encloses the function from above and below. Since it represents the upper and the
lower bound of the running time of an algorithm, it is used for analyzing the average-case
complexity of an algorithm.
Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤
c2·g(n) for all n ≥ n0 }
The above expression can be described as: a function f(n) belongs to the set Θ(g(n)) if there exist
positive constants c1 and c2 such that f(n) can be sandwiched between c1·g(n) and c2·g(n) for
sufficiently large n.
If a function f(n) lies between c1·g(n) and c2·g(n) for all n ≥ n0, then g(n) is said to be an
asymptotically tight bound for f(n).
Big O Notation
• It looks for the upper bound, in the worst case, of the given expression
• It always ignores the lower-order terms
• It always ignores constant factors as well
Examples:
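As a worked illustration of these rules, take the (arbitrarily chosen) function f(n) = 3n^2 + 5n + 7. Dropping the lower-order terms 5n and 7 and the constant factor 3 suggests f(n) = O(n^2). The derivation below checks this against the definition of Big-O by finding the two constants it requires, written here as c and n_0.

```latex
\begin{align*}
f(n) &= 3n^2 + 5n + 7 \\
     &\le 3n^2 + 5n^2 + 7n^2 && \text{since } n \le n^2 \text{ and } 1 \le n^2 \text{ for all } n \ge 1 \\
     &= 15n^2
\end{align*}
```

So 0 ≤ f(n) ≤ 15·n^2 for all n ≥ 1, which means f(n) = O(n^2) with c = 15 and n_0 = 1. Other choices of c and n_0 would work equally well; the definition only asks that some pair exists.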
Logarithms
Example - 1
a. 2^x = 8
Here 2 * 2 * 2 = 8, so 2 multiplied by itself 3 times equals 8, and the value of x is 3
2^3 = 8
In the above expression,
2 is the base,
3 is the exponent, and
8 is the number
This is equivalent to
b. log_2 8 = 3
Here, 2 is the base, 8 is the number and 3 is the exponent
The log of the number to the base is equal to the exponent
Example - 2
a. 3^x = 81
Here 3 * 3 * 3 * 3 = 81, so the value of x is 4
3^4 = 81
Here, 3 is the base, 4 is the exponent and 81 is the number
log_3 81 = 4
So the log of the number to the base is equal to the exponent
Note:
"The base raised to the exponent equals the number" is always equivalent to "the log of the
number to the base equals the exponent"
i.e.
a^Exponent = N is equivalent to
log_a N = Exponent
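The relationship between exponent and logarithm can be checked with a short function that finds the exponent by repeated division. This is only a sketch for exact powers; the function name exact_log is hypothetical, and it deliberately avoids the floating-point log functions of <math.h>.

```c
#include <assert.h>

/* Returns the exponent x such that base^x == n, found by dividing
   n by base until nothing more can be divided out.
   Returns -1 when n is not an exact power of base. */
int exact_log(int base, int n)
{
    assert(base > 1 && n > 0);
    int exponent = 0;
    while (n % base == 0) {
        n /= base;       /* strip one factor of base */
        exponent++;
    }
    return (n == 1) ? exponent : -1;
}
```

With the two examples above, exact_log(2, 8) gives 3 and exact_log(3, 81) gives 4, matching log_2 8 = 3 and log_3 81 = 4.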