Data Structures
What is Information?
If we arrange some data in an appropriate sequence, it forms a structure and gives us a meaning. This meaning is called information. The basic unit of information in computer science is the bit (binary digit). So we find two things in information: one is data and the other is structure.

What is a Data Structure?
1. A data structure is a systematic way of organizing and accessing data.
2. A data structure tries to structure data:
   o usually more than one piece of data
   o it should define legal operations on the data
   o the data might be grouped together (e.g. in a linked list)
3. When we define a data structure we are in fact creating a new data type of our own, i.e. using predefined types or previously user-defined types. Such new types are then used to declare variables within a program.

Why Data Structures?
1. Data structures study how data are stored in a computer so that operations can be implemented efficiently.
2. Data structures are especially important when you have a large amount of information.
3. They provide conceptual and concrete ways to organize data for efficient storage and manipulation.

Linear data structures:
Array: In computer programming, a group of homogeneous elements of a specific data type is known as an array, one of the simplest data structures. Arrays hold a series of data elements, usually of the same size and data type. Individual elements are accessed by their position in the array. The position is given by an index, which is also called a subscript. The index usually runs over a consecutive range of integers (as opposed to an associative array), but it can have any ordinal set of values.
Some arrays are multi-dimensional, meaning they are indexed by a fixed number of integers, for example by a tuple of four integers. Generally, one- and two-dimensional arrays are the most common. Most programming languages have a built-in array data type.
Linked List: In computer science, a linked list is one of the fundamental data structures used in computer programming. It consists of a sequence of nodes, each containing arbitrary data fields and one or two references ("links") pointing to the next and/or previous nodes. A linked list is a self-referential data type because it contains a link to another datum of the same type. Linked lists permit insertion and removal of nodes at any point in the list in constant time, but do not allow random access.
Types of Linked List
1. Linearly-linked list
   o Singly-linked list
   o Doubly-linked list
2. Circularly-linked list
   o Singly-circularly-linked list
   o Doubly-circularly-linked list
Stack:
A stack is a linear structure in which items may be added or removed only at one end. There are frequent situations in computer science when one wants to restrict insertions and deletions so that they can take place only at the beginning or the end of the list, not in the middle. Two of the data structures that are useful in such situations are stacks and queues. A stack is a list of elements in which elements may be inserted or deleted only at one end, called the top. This means, in particular, that elements are removed from a stack in the reverse of the order in which they were inserted into the stack. The stack is also called a "last-in, first-out" (LIFO) list.
Special terminology is used for the two basic operations associated with a stack:
1. "Push" is the term used to insert an element into a stack.
2. "Pop" is the term used to delete an element from a stack.
Queue:
A queue is a linear list of elements in which deletions can take place only at one end, called the "front", and insertions can take place only at the other end, called the "rear". The terms "front" and "rear" are used in describing a linear list only when it is implemented as a queue.
Queues are also called "first-in, first-out" (FIFO) lists, since the first element into a queue will be the first element out of the queue. In other words, the order in which elements enter a queue is the order in which they leave. A real-life example: the people waiting in a line at a railway ticket counter form a queue, where the first person in the line is the first person to be served. An important example of a queue in computer science occurs in a timesharing system, in which programs with the same priority form a queue while waiting to be executed.
Trees:
Data frequently contain a hierarchical relationship between various elements. The non-linear data structure which reflects this relationship is called a rooted tree graph, or simply a tree. This structure is mainly used to represent data containing a hierarchical relationship between elements, e.g. records, family trees and tables of contents. A tree consists of a distinguished node r, called the root, and zero or more (sub)trees t1, t2, ..., tn, each of whose roots is connected by a directed edge to r. In the tree of the figure, the root is A; node t2 has r as a parent and t2.1, t2.2 and t2.3 as children. Each node may have an arbitrary number of children, possibly zero. Nodes with no children are known as leaves.
Graph :
A graph consists of a set of nodes (or vertices) and a set of arcs (or edges). Each arc in a graph is specified by a pair of nodes. A node n is incident to an arc x if n is one of the two nodes in the ordered pair of nodes that constitute x. The degree of a node is the number of arcs incident to it. The in-degree of a node n is the number of arcs that have n as the head, and the out-degree of n is the number of arcs that have n as the tail. The graph is a non-linear data structure. The graph shown in the figure has 7 vertices and 12 edges. The vertices are {1, 2, 3, 4, 5, 6, 7} and the arcs are {(1,2), (1,3), (1,4), (2,4), (2,5), (3,4), (3,6), (4,5), (4,6), (4,7), (5,7), (6,7)}. Node 4 in the figure has in-degree 3, out-degree 3 and degree 6.
Abstract Data Types (ADTs) are a model used to understand the design of a data structure.
1. "Abstract" implies that we give an implementation-independent view of the data structure.
2. ADTs specify the type of data stored and the operations that support the data.
3. Viewing a data structure as an ADT allows a programmer to focus on an idealized model of the data and its operations.

Introduction to Algorithms:
An algorithm is a well-defined computational method that takes some value(s) as input and produces some value(s) as output. In other words, an algorithm is a sequence of computational steps that transforms input(s) into output(s). An algorithm is correct if, for every input, it halts with the correct output. A correct algorithm solves the given problem, whereas an incorrect algorithm might not halt at all on some input instance, or it might halt with an answer other than the designed one. Each algorithm must have:
Specification: Description of the computational procedure. Pre-conditions: The condition(s) on input. Body of the Algorithm: A sequence of clear and unambiguous instructions. Post-conditions: The condition(s) on output.
Consider a simple algorithm for finding the factorial of n.

Algorithm Factorial(n)
Step 1: FACT = 1
Step 2: for i = 1 to n do
Step 3:     FACT = FACT * i
Step 4: print FACT
For better understanding, conditions can also be defined after any statement, to specify the values in particular variables. Pre-conditions and post-conditions can also be defined for a loop, to specify the conditions satisfied before starting and after completion of the loop, respectively. What remains true before execution of the ith iteration of a loop is called the "loop invariant". These conditions are useful during the debugging process of an algorithm's implementation. Moreover, these conditions can also be used for giving a correctness proof.
A Sorting Algorithm:
Now we take a more complex problem, called sorting. Problem definition: sort the given n numbers into non-descending order. There are many sorting algorithms; insertion sort is a simple one. Insertion Sort: We can assume the first number by itself is sorted. Then sort up to two numbers. Next, sort up to three numbers. This process continues till we sort all n numbers. Consider the following example of five integers: 79 43 39 58 13. Up to the first number, 79, the list is sorted.
That is, if the first (i-1) numbers are sorted, then insert the ith number into its correct position. This can be done by shifting numbers right one position at a time till a position for the ith number is found. That is, shift the number at the (i-1)th position to the ith position, the number at the (i-2)th position to the (i-1)th position, and so on, till we find a correct position for the ith number. This method is depicted in the figure on the right side.
Time Complexity:
Measuring the time complexity in absolute time units has the following problems:
1. The time required by an algorithm depends on the number of instructions executed, which is a complex polynomial.
2. The execution time of an instruction depends on the computer's power; different computers take different amounts of time for the same instruction.
3. Different types of instructions take different amounts of time on the same computer.
Complexity analysis abstracts away these machine-dependent factors. In this approach, we assume that every instruction takes a constant amount of time to execute. Asymptotic bounds on the polynomial estimating the number of instructions executed by the algorithm are used as the measure. Three main types of asymptotic order notations are used in practice:

1. Θ-notation: For a given function g(n), Θ(g(n)) is defined as

   Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0 }

   A function f(n) is said to belong to Θ(g(n)) if such constants exist for all sufficiently large values of n. For example, (1/2)n^2 - 3n = Θ(n^2), because we can find constants (e.g. c1 = 1/14, c2 = 1/2) such that c1*n^2 <= (1/2)n^2 - 3n <= c2*n^2 for all n >= 7.

2. O-notation: This notation provides an asymptotic upper bound. For a given function g(n), O(g(n)) is defined as

   O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 <= f(n) <= c*g(n) for all n >= n0 }

   For example, n = O(n^2), as n <= 1*n^2 for all n >= 1.

3. Ω-notation: This notation provides an asymptotic lower bound. For a given function g(n), Ω(g(n)) is defined as

   Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 <= c*g(n) <= f(n) for all n >= n0 }

   For example, n^2 = Ω(n), because 1*n <= n^2 for all n >= 1.
Traversal: visit every part of the data structure.
Search: traverse the data structure for a given element.
Insertion: adding new elements to the data structure.
Deletion: removing an element from the data structure.
Sorting: rearranging the elements in some type of order (e.g. increasing or decreasing).
Merging: combining two similar data structures into one.
STACKS:
1. A stack is basically a data object.
2. The operational semantics (meaning) of a stack is LIFO, i.e. last in, first out.
Definition: It is an ordered list of elements in which all insertions and deletions are made at one end, called the top.
3. Primary operations defined on a stack:
1. PUSH: add an element at the top of the list.
2. POP: remove the element at the top of the list.
3. Also "IsEmpty()" and "IsFull()" functions, which test whether a stack is empty or full, respectively.
Example:
1. Practical daily life: a pile of heavy books kept in a vertical box; dishes kept one on top of another.
2. In the computer world: in the processing of subroutine calls and returns, there is an explicit use of a stack of return addresses. A stack is also used in the evaluation of arithmetic expressions. A large number of stacks can be expressed using a single one-dimensional array only; such an array is called a multiple stack array.
Algorithms:
Push(item, array, n, top)
{
    if (top >= n) then
        print("stack is full");
    else
    {
        top = top + 1;
        array[top] = item;
    }
}

Algorithm: Pop(item, array, top)
{
    if (top <= 0) then
        print("stack is empty");
    else
    {
        item = array[top];
        top = top - 1;
    }
}
Arithmetic Expressions :
Arithmetic expressions are expressed as combinations of:
1. Operands
2. Operators (arithmetic, Boolean, relational operators)
Various rules have been formulated to specify the order of evaluation of a combination of operators in any expression. Arithmetic expressions are expressed in 3 different notations:
1. Infix:
In this notation, if the operator is binary, the operator is placed between the two operands; if the operator is unary, it precedes the operand.
2. Prefix :
In this notation, for the case of binary operators, the operator precedes both operands. A simple stack-based algorithm can be used to evaluate the final answer.
3. Postfix :
In this notation, for the case of binary operators, the operator comes after both of its operands. A simple stack-based algorithm can be used to evaluate the final answer.
Always remember that the order of appearance of the operands does not change in any notation; what changes is the position of the operators working on those operands.
RULES FOR EVALUATION OF ANY EXPRESSION: An expression can be interpreted in many different ways if parentheses are not mentioned in the expression.
For example, the expression given below can be interpreted in many different ways. Hence we specify some basic rules for the evaluation of any expression:
A priority table is specified for the various types of operators being used:

PRIORITY LEVEL   OPERATORS
6                ** ; unary - ; unary +
5                * ; /
4                + ; -
3                relational operators: < ; <= ; > ; >= ; != ; !> ; !<
2                logical and
1                logical or
We assume that the end of the given prefix notation can be detected (for example, with IsEmpty()). If the number of symbols in an infix expression is n, then the number of operations performed is some constant times n. Here the next-token function gives us the next occurring element in the expression in a left-to-right scan. The PUSH function adds element x to stack Q, which is of maximum length n.
Algorithm:
We assume that the postfix notation specifies the end of the expression by appending NULL at the end.
Here the next-token function gives us the next occurring element in the expression in a left-to-right scan. The PUSH function adds element x to stack Q, which is of maximum length n.
QUEUES:
Introduction:
1. A queue is basically a data object.
2. The operational semantics of a queue is FIFO, i.e. first in, first out.

Definition: It is an ordered list of elements in which all deletions are made at one end, called the front end, and all insertions at the other end, called the rear end.

Primary operations defined on a queue:
1. EnQueue: used to add elements to the queue at the rear end.
2. DeQueue: used to delete elements from the queue at the front end.
3. Also "IsEmpty()" and "IsFull()" can be defined to test whether the queue is empty or full.

Example:
1. Practical example: a line at a ticket counter for buying tickets operates on the above rules.
2. In the computer world: in a batch processing system, jobs are queued up for processing.

Circular queue: In a queue, if the array elements can be accessed in a circular fashion, the queue is a circular queue.
Priority queue: Often the items added to a queue have a priority associated with them; this priority determines the order in which they exit the queue, with the highest-priority items removed first.

CIRCULAR QUEUE:
Primary operations defined for a circular queue are : 1. add_circular - It is used for addition of elements to the circular queue. 2. delete_circular - It is used for deletion of elements from the queue.
We will see that in a circular queue, unlike the static linear array implementation of the queue, memory is utilized more efficiently. The shortcoming of the static linear queue, namely that once rear points to n (the maximum size of our array) we cannot insert any more elements even if there is space in the queue, is removed efficiently by using a circular queue. As in the case of the linear queue, we'll see that the condition for zero elements still remains the same, i.e. rear = front.
ALGORITHM FOR ADDITION AND DELETION OF ELEMENTS
PRIORITY QUEUE: Queues are dynamic collections, which have some concept of order. This can be either based on order of entry into the queue - giving us First-In-First-Out (FIFO) or Last-In-First-Out (LIFO) queues. Both of these can be built with linked lists: the simplest "add-to-head" implementation of a linked list gives LIFO behavior. A minor modification - adding a tail pointer and adjusting the addition method implementation - will produce a FIFO queue. Often the items added to a queue have a priority associated with them: this priority determines the order in which they exit the queue - highest priority items are removed first. This situation arises often in process control systems. Imagine the operator's console in a large automated factory. It receives many routine messages from all parts of the system: they are assigned a low priority because they just report the normal functioning of the system - they update various parts of the operator's console display simply so that there is some confirmation that there are no problems. It will make little difference if they are delayed or lost. However, occasionally something breaks or fails and alarm messages are sent. These have high priority because some action is required to fix the problem (even if it is mass evacuation because nothing can stop the imminent explosion!). Typically such a system will be composed of many small units, one of which will be a buffer for messages received by the operator's console. The communications system places messages in the buffer so that communications links can be
freed for further messages while the console software is processing the message. The console software extracts messages from the buffer and updates appropriate parts of the display system. Obviously we want to sort messages on their priority so that we can ensure that the alarms are processed immediately and not delayed behind a few thousand routine messages while the plant is about to explode.
LINKED LISTS:
A linked list is one of the fundamental data structures, and can be used to implement other data structures. It consists of a sequence of nodes, each containing arbitrary data fields and one or two references ("links") pointing to the next and/or previous nodes. The principal benefit of a linked list over a conventional array is that the order of the linked items may differ from the order in which the data items are stored in memory or on disk, allowing the list of items to be traversed in a different order. A linked list is a self-referential data type because it contains a pointer or link to another datum of the same type. Linked lists permit insertion and removal of nodes at any point in the list in constant time, but do not allow random access. Several different types of linked list exist: singly linked lists, doubly linked lists, and circularly linked lists.
Types of linked lists
Singly-linked list: The simplest kind of linked list is a singly-linked list (or slist for short), whose node has two fields: an information field and a link field. The link points to the next node in the list, and the last node's link points to a null value.
[A singly-linked list containing two values: the value of the current node
and a link to the next node]
A singly linked list's node is divided into two parts. The first part holds or points to information about the node, and the second part holds the address of the next node. A singly linked list is traversed in one direction only.
Doubly-linked list: A more sophisticated kind of linked list is a doubly-linked list or two-way linked list. Each node has two links: one points to the previous node (or to a null value or empty list if it is the first node), and one points to the next node (or to a null value or empty list if it is the final node).
[A doubly linked list containing three integer values: the value, the link
forward to the next node, and the link backward to the previous node]
In some very low-level languages, XOR-linking offers a way to implement doubly linked lists using a single word for both links, although the use of this technique is usually discouraged. Circularly-linked list: In a circularly linked list, the first and final nodes are linked together. This can be done for both singly and doubly linked lists. To traverse a circular linked list, you begin at any node and follow the list in either direction until you return to the original node. Viewed another way, circularly linked lists can be seen as having no beginning or end. This type of list is most useful for managing buffers for data ingest, and in cases where you have one object in a list and wish to iterate through all other objects in the list in no particular order. The pointer pointing to the whole list may be called the access pointer.
Linked list operations: The linked list data structure has two fields per node. We also keep a variable firstNode, which always points to the first node in the list, or is null for an empty list. Traversal: Traversal of a singly-linked list is simple: begin at the first node and follow each next link until we come to the end:
node := list.firstNode
while node not null {
    (do something with node.data)
    node := node.next
}
Insertion: The following code inserts a node after an existing node in a singly linked list. The diagram shows how it works. Inserting a node before an existing one cannot be done directly; instead, you have to locate that node while keeping track of the previous node.
Deletion:
Similarly, we have functions for removing the node after a given node, and for removing a node from the beginning of the list. The diagram demonstrates the former. To find and remove a particular node, one must again keep track of the previous element.
With doubly-linked lists there are even more pointers to update, but also less information is needed, since we can use backward pointers to observe preceding elements in the list. This enables new operations and eliminates special-case functions. We add a prev field to our nodes, pointing to the previous element, and a lastNode field to our list structure, which always points to the last node in the list. Both list.firstNode and list.lastNode are null for an empty list.

Traversal: Iterating through a doubly linked list can be done in either direction. In fact, the direction can change many times, if desired.

Forwards:
    node := list.firstNode
    while node not null
        <do something with node.data>
        node := node.next

Backwards:
    node := list.lastNode
    while node not null
        <do something with node.data>
        node := node.prev

Insertion: These symmetric functions add a node either after or before a given node, with the diagram demonstrating "after":
1. function insertAfter(List list, Node node, Node newNode)
       newNode.prev := node
       newNode.next := node.next
       if node.next = null
           list.lastNode := newNode
       else
           node.next.prev := newNode
       node.next := newNode

2. function insertBefore(List list, Node node, Node newNode)
       newNode.prev := node.prev
       newNode.next := node
       if node.prev is null
           list.firstNode := newNode
       else
           node.prev.next := newNode
       node.prev := newNode

HEADER NODES:
A header linked list is a linked list which always contains a special node, called the header node, at the beginning of the list. It is an extra node kept at the front of a list. Such a node does not represent an item in the list, and its information portion might be unused. There are two types of header list:
1. Grounded header list: a header list where the last node contains the null pointer.
2. Circular header list: a header list where the last node points back to the header node.
More often, the information portion of such a node is used to keep global information about the entire list, such as:
1. the number of nodes (not including the header) in the list: the count in the header node must be adjusted after adding or deleting an item from the list
2. a pointer to the last node in the list: it simplifies the representation of a queue
3. a pointer to the current node in the list: it eliminates the need for an external pointer during traversal
Searching: Retrieval of information is a basic requirement of every database. To retrieve any data, first we have to locate it. The operation that finds the location of a given element in a list is called searching. If the element to be searched for is found, then the search is successful; otherwise it is unsuccessful. We will discuss two standard searching methods:
1. Linear search
2. Binary search

Linear Search: Linear search is the simplest of all searching techniques. It is also called sequential search. To find an element with key value = key, every element of the list is checked sequentially, one by one. If such an element is found, the search stops. But if we eventually reach the end of the list and the required element is still not found, we terminate the search as unsuccessful. Linear search can be applied to both unsorted and sorted lists.

Linear Search for an Unsorted List: In the case of an unsorted list, we have to search the entire list every time, i.e. we keep searching until we find the required element or reach the end of the list. This is because the elements are not in any order, so any element can be found just anywhere.

Algorithm:
linear_search(int x[], int n, int key)
{
    int i, flag = 0;
    for (i = 0; i < n; i++)
    {
        if (x[i] == key)
        {
            flag = 1;
            break;
        }
    }
    if (flag == 0)
        return (-1);   /* unsuccessful search */
    else
        return (1);    /* successful search */
}

Linear Search for a Sorted List: The efficiency of linear search can be increased if we take a previously sorted array, say in ascending order. The basic algorithm remains the same as for an unsorted array; the only difference is that we do not have to search the entire array every time. Whenever we encounter an element y greater than the key to be searched for, we conclude that there is no element equal to the key in the list.
This is because all the elements in the list are in ascending order, and all elements to the right of y will be greater than or equal to y, i.e. greater than the key. So there is no point in continuing the search even if the end of the list has not been reached and the required element has not been found.

linear_search(int x[], int n, int key)
{
    int i, flag = 0;
    for (i = 0; i < n && x[i] <= key; i++)
    {
        if (x[i] == key)
        {
            flag = 1;
            break;
        }
    }
    if (flag == 0)
        return (-1);   /* unsuccessful search */
    else
        return (1);    /* successful search */
}
Linear Search for a Sorted List, Illustrative Explanation: The array to be searched is as follows: 21 35 41 65 72. It is sorted in ascending order. Now let key = 40. At first 21 is checked, as x[0] = 21. It is smaller than 40, so the next element, 35, is checked; it is also smaller than 40. So 41 is checked next. But 41 > 40, and all elements to the right of 41 are also greater than 40. So we terminate the search as an unsuccessful one, without having to search the entire list.

Binary Search: The most efficient method of searching a sequential list is binary search. This method is applicable to a sorted list only. To search for an element, we compare it with the center element of the list. If it matches, the search is successful and is terminated. If it does not match, the list is divided into two halves: the first half consists of the 0th element up to the center element, and the second half consists of the element after the center element up to the last element. It is obvious that all elements in the first half will be less than or equal to the center element, and all elements in the second half will be greater than the center element. If the element to be searched for is greater than the center element, searching continues in the second half; otherwise in the first half. The same process of comparing the element with the center element, and if not found, dividing the elements into two halves, is repeated for the first or second half. This process is repeated till the required element is found or the division into halves yields a single element.

Algorithm for Binary Search, Illustrative Explanation: Let the array to be searched be as follows: 11 23 31 33 65 68 71 89 100. Now let the element to be searched for, i.e. the key, be 31. At first hi = 8, low = 0, so mid = 4 and x[mid] = 65 is the center element, but 65 > 31. So now hi = 4 - 1 = 3. Now mid = (0 + 3)/2 = 1, so x[mid] = 23 < 31, so low = 1 + 1 = 2.
Now mid = (2 + 3)/2 = 2 and x[mid] = 31 = key, so the search is successful. Similarly, had the key been 32, it would have been an unsuccessful search.
SORTING:
Sorting is a process by which a collection of items is placed into order.
* It is the operation of arranging data.
* The order is typically based on a data field called a key.
Sink Sort:
The main idea in this method is to compare two adjacent elements and put them in the proper order if they are not. Do this by scanning the elements from first to last. After the first scan, the largest element is placed at the last position; this is like the largest object sinking to the bottom first. After the second scan, the second largest is placed at the second-last position. After n passes, all elements are placed in their correct positions, hence sorted.
Selection Sort: As we have discussed earlier, the purpose of sorting is to arrange a given set of records in order by a specified key. In the case of student records, the key can be the roll number; in the case of employee records, the key can be the employee identification number. That is, we sort a set of records based on a specific key. Insertion and sink sorts compare the keys of two records and swap them if the keys are not in order. This is not just swapping keys; we have to swap the whole records associated with those keys. The time required to swap records is proportional to their size, that is, the number of fields. In these methods, the same record may be swapped many times before reaching its final position in sorted order. This method, selection sort, minimizes the number of swaps, and is hence efficient if the records are large. Look at the keys of all records to find a record with the smallest key, and place that record in the first position. Next, find the smallest-key record among the remaining records and swap it with the record at the second position. Continue this procedure till all records are placed correctly.
Merge Sort: The time complexities of the sorting algorithms discussed till now are in O(n^2). That is, the number of comparisons performed by these algorithms is bounded above by c*n^2, for some constant c > 1. Can we have better sorting algorithms? Yes: the merge sort method sorts a given set of n numbers in O(n log n) time. Before discussing merge sort, we need to understand what merging means.
Merging - Definition: Given two sorted arrays, a[p] and b[q], create one sorted array of size p + q containing the elements of a[p] and b[q].
Assume that we have to keep the p+q sorted numbers in an array c[p+q]. One way of doing this is by copying the elements of a[p] and b[q] into c[p+q] and sorting c[p+q], which has time complexity at least O(n log n). A more efficient method is merging, which has time complexity O(n). The method is simple: take one number from each of the arrays a and b, and place the smaller in the array c. Take the next elements and repeat the procedure till all p+q elements are placed in the array c.
Merge Sort: We can regard each element of a given array as a sorted subarray. Take adjacent subarrays in pairs and merge them to obtain sorted arrays of two elements. In the next step, take adjacent sorted arrays of size two in pairs and merge them to get sorted arrays of four elements. Repeat the step until the whole array is sorted.
Quick Sort: Another very interesting sorting algorithm is quick sort. Its worst-case time complexity is O(n^2), but it is faster than many optimal algorithms on many input instances. It is another divide-and-conquer algorithm. The main idea of the algorithm is to partition the given elements into two sets, such that the smaller numbers go into one set and the larger numbers into the other. This partition is done with respect to an element called the pivot. Repeat the process for both sets until each partition contains only one element.
Radix Sort:
All the algorithms discussed till now are comparison-based algorithms; that is, comparison is the key to sorting. Radix sort is not. Assume that the input numbers are three-digit decimal numbers. First we sort the numbers by the least significant digit, then by the next significant digit, and so on.
TREES
A tree is a finite set of nodes having a distinct node called root. Binary Tree is a tree which is either empty or has at most two subtrees, each of the subtrees also being a binary tree. It means each node in a binary tree can have 0, 1 or 2 subtrees. A left or right subtree can be empty. A binary tree is made of nodes, where each node contains a "left" pointer, a "right" pointer, and a data element. The "root" pointer points to the topmost node in the tree. The left and right pointers point to smaller "subtrees" on either side. A null pointer represents a binary tree with no elements -- the empty tree. The formal recursive definition is: a binary tree is either empty (represented by a null pointer), or is made of a single node, where the left and right pointers (recursive definition ahead) each point to a binary tree.
Binary Tree : In the figure, the tree has a distinct node called the root (node 2), and every node has either 0, 1 or 2 children, so it is a binary tree, since every node has at most 2 children. If A is the root of a binary tree and B the root of its left or right subtree, then A is the parent (or father) of B and B is the left or right child of A. Nodes having no children are leaf nodes. A node A is an ancestor of node B (and B a descendant of A) if A is either the father of B or the father of some ancestor of B. Two nodes having the same father are called brothers or siblings. Going from the leaves to the root is called climbing the tree, and going from the root to the leaves is called descending the tree. A binary tree in which every non-leaf node has non-empty left and right subtrees is called a strictly binary tree. The tree shown below is a strictly binary tree.
The number of children a node has is called its degree. The level of the root is 0, and the level of any other node is one more than that of its father. In the strictly binary tree shown above, A is at level 0, B and C at level 1, D and E at level 2, and F and G at level 3.
The depth of a binary tree is the length of the longest path from the root to any leaf. In the above tree the depth is 3. Representation of Binary Tree : The structure of each node of a binary tree contains one data field and two pointers, one each for the left and right child. Each child, being a node, has the same structure. The structure of a node is shown below.
The structure defining a node of a binary tree in C is as follows.

struct node
{
    struct node *lc;   /* left child  */
    int data;
    struct node *rc;   /* right child */
};

Binary Tree Traversal : Traversal of a binary tree means visiting each node in the tree exactly once. Tree traversal is used in all applications of trees. In a linear list, nodes are visited from first to last, but a tree being non-linear, we need definite rules. There are a number of ways to traverse a tree; they differ only in the order in which they visit the nodes. The three main methods of traversing a tree are:
In all of them, nothing needs to be done to traverse an empty tree. All the traversal methods are based on recursive functions, since a binary tree is itself recursive: every child of a node in a binary tree is itself the root of a binary tree. Inorder Traversal :
To traverse a non-empty tree in inorder, the following steps are followed recursively:
1. Traverse the left subtree
2. Visit the root node
3. Traverse the right subtree
Preorder Traversal : To traverse a non-empty tree in preorder, the following steps are followed recursively:
1. Visit the root
2. Traverse the left subtree
3. Traverse the right subtree
Preorder: 43, 15, 8, 30, 20, 35, 60, 50, 82, 70
Postorder Traversal : To traverse a non-empty tree in postorder, the following steps are followed recursively:
1. Traverse the left subtree
2. Traverse the right subtree
3. Visit the root node
Postorder: 8, 20, 35, 30, 15, 50, 70, 82, 60, 43
Inorder Traversal : Algorithm The algorithm for inorder traversal, using the node structure defined earlier, is as follows.

void inorder(struct node *root)
{
    if (root != NULL)
    {
        inorder(root->lc);
        printf("%d\t", root->data);
        inorder(root->rc);
    }
}

Preorder Traversal : Algorithm The algorithm for preorder traversal is as follows.

void preorder(struct node *root)
{
    if (root != NULL)
    {
        printf("%d\t", root->data);
        preorder(root->lc);
        preorder(root->rc);
    }
}
Postorder Traversal : Algorithm The algorithm for postorder traversal is as follows.

void postorder(struct node *root)
{
    if (root != NULL)
    {
        postorder(root->lc);
        postorder(root->rc);
        printf("%d\t", root->data);
    }
}

Array Representation of Binary Tree : A single array can be used to represent a binary tree. For this, the nodes are numbered (indexed) according to a scheme that gives 0 to the root. Then all the nodes are numbered from left to right, level by level, from top to bottom. Empty positions are also numbered. Then each node having index i is put into the array as its ith element. In the figure shown below the nodes of a binary tree are numbered according to the given scheme.
The figure shows how a binary tree is represented as an array. The root 3 is the 0th element, while its left child 5 is the 1st element of the array. Node 6 does not have any children, so its children, i.e. the 7th and 8th elements of the array, are shown as null values. In general, if n is the index of a node, then its left child occurs at position (2n + 1) and its right child at position (2n + 2) of the array. If a node does not have one of its children, a null value is stored at the corresponding index of the array. Operations on Binary Tree : The operations on a binary tree are as follows.
Searching: Searching a binary search tree for a specific value is a process that can be performed recursively because of the order in which values are stored. We begin by examining the root. If the value equals the root, the value exists in the tree. If it is less than the root, then it must be in the left subtree, so we recursively search the left subtree in the same manner. Similarly, if it is greater than the root, then it must be in the right subtree, so we recursively search the right subtree. If we reach a leaf and have not found the value, then the item does not lie in the tree at all. Here is the search algorithm in pseudocode:

search_btree(node, key):
    if node is None:
        return None                    // key not found
    if key < node.key:
        return search_btree(node.left, key)
    else if key > node.key:
        return search_btree(node.right, key)
    else:                              // key equals this node's key
        return node.value              // found key
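The same search can be written in C against the node structure used earlier in this section (a sketch; `search` is an illustrative name):

```c
#include <stddef.h>

struct node {
    struct node *lc;   /* left child  */
    int data;
    struct node *rc;   /* right child */
};

/* Recursive binary search tree lookup: returns the matching node, or
   NULL when the key is not present. */
struct node *search(struct node *root, int key)
{
    if (root == NULL)
        return NULL;                   /* reached an empty subtree: not found */
    if (key < root->data)
        return search(root->lc, key);  /* smaller keys live in the left subtree */
    if (key > root->data)
        return search(root->rc, key);  /* larger keys live in the right subtree */
    return root;                       /* key found */
}
```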
Insertion
One way to insert a new node in the tree is to first compare its value with the value of the root. If its value is less than the root's, it is then compared with the value of the root's left child; if its value is greater, it is compared with the root's right child. This process continues until the new node is compared with a leaf node, and then it is added as that node's right or left child, depending on its value. Another way is to examine the root and recursively insert the new node into the left subtree if the new value is less than or equal to the root, or into the right subtree if the new value is greater than the root.
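The second (recursive) strategy can be sketched in C as follows (illustrative; values less than or equal to the node go left, as described above):

```c
#include <stdlib.h>

struct node {
    struct node *lc;   /* left child  */
    int data;
    struct node *rc;   /* right child */
};

/* Recursively insert value into the tree rooted at root and return
   the (possibly new) root of that subtree. */
struct node *insert(struct node *root, int value)
{
    if (root == NULL) {                    /* empty spot found: attach here */
        struct node *n = malloc(sizeof *n);
        n->data = value;
        n->lc = n->rc = NULL;
        return n;
    }
    if (value <= root->data)
        root->lc = insert(root->lc, value);
    else
        root->rc = insert(root->rc, value);
    return root;
}
```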
Deletion
There are several cases to be considered:
1. Deleting a leaf: If the node to be deleted has no children, deletion is easy; we can simply remove it from the tree.
2. Deleting a node with one child: Delete the node and fill its place with its child.
3. Deleting a node with two children: Suppose the node to be deleted is called K. We replace the key of K with either its in-order successor (the left-most node of the right subtree) or its in-order predecessor (the right-most node of the left subtree). That is, we find the in-order successor or predecessor, copy its key into K, and then delete that node. Since either of these nodes must have fewer than two children (otherwise it could not be the in-order successor or predecessor), it can be deleted using the previous two cases.
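The three cases can be sketched in C using the in-order successor (illustrative; `delete_key` and `make_node` are names chosen here, not from the text):

```c
#include <stdlib.h>

struct node {
    struct node *lc;
    int data;
    struct node *rc;
};

static struct node *make_node(int v)
{
    struct node *n = malloc(sizeof *n);
    n->data = v;
    n->lc = n->rc = NULL;
    return n;
}

static struct node *min_node(struct node *n)
{
    while (n->lc != NULL) n = n->lc;       /* left-most node of this subtree */
    return n;
}

/* Delete key from the subtree rooted at root; return the new subtree root. */
struct node *delete_key(struct node *root, int key)
{
    if (root == NULL) return NULL;         /* key not in the tree */
    if (key < root->data)
        root->lc = delete_key(root->lc, key);
    else if (key > root->data)
        root->rc = delete_key(root->rc, key);
    else if (root->lc == NULL || root->rc == NULL) {
        /* cases 1 and 2: replace the node with its (possibly empty) child */
        struct node *child = (root->lc != NULL) ? root->lc : root->rc;
        free(root);
        return child;
    } else {
        /* case 3: copy the in-order successor's key here, then delete it */
        struct node *succ = min_node(root->rc);
        root->data = succ->data;
        root->rc = delete_key(root->rc, succ->data);
    }
    return root;
}
```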
Sort :
A binary search tree can be used to implement a simple (though in the worst case inefficient) sorting algorithm: we insert all the values we wish to sort into the tree, and an in-order traversal then visits them in sorted order.
Heap Sort :
Heap sort is an efficient sorting algorithm whose average and worst-case time complexities are both in O(n log n).
Heap sort is an in-place algorithm, i.e. it does not use any extra space, unlike merge sort. The method is based on a data structure called a heap. The heap data structure can also be used as a priority queue.
Heap :
A binary heap is a complete binary tree in which every node other than the root is no larger than its parent; hence the maximum element is at the root. Heap example:
Heap Representation:
A heap can be efficiently represented as an array. The root is stored in the first place, i.e. a[1]. The children of node i are located at 2*i and 2*i + 1. In other words, the parent of a node stored at location i is at floor(i/2). The array representation of a heap is given in the figure below.
Fig 2
Before discussing the method for building a heap out of an arbitrary complete binary tree, we discuss a simpler problem. Consider a binary tree in which the left and right subtrees of the root satisfy the heap property, but the root does not. See the following figure.
Now the question is how to transform the above tree into a heap. Swap the root with the maximum of its children, so that the root satisfies the heap property. Then check whether the subtree rooted at the swapped child is a heap. If it is, we are done; if not, repeat the action of swapping the root of that subtree with the maximum of its children. That is, push the element down from the root until it satisfies the heap property. The following sequence of figures depicts this heapification process.
(fig 4 )
(Fig : 4.2)
Heap building can be done efficiently in bottom-up fashion. Given an arbitrary complete binary tree, we can regard each leaf as a heap of one element.
Start building the heap from the parents of these leaves, i.e., heapify the subtrees rooted at the parents of the leaves. Then heapify the subtrees rooted at their parents. Continue this process until we reach the root of the tree.
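The heapify and bottom-up heap-building steps, together with heap sort itself, can be sketched in C using the 1-based array indexing described above (an illustrative max-heap version; the names are chosen here):

```c
static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Push the element at index i down until the subtree rooted there is a heap.
   Root at a[1]; children of i at 2i and 2i+1; n is the heap size. */
void heapify(int a[], int n, int i)
{
    while (2 * i <= n) {
        int big = 2 * i;                           /* left child */
        if (big + 1 <= n && a[big + 1] > a[big])
            big++;                                 /* right child is larger */
        if (a[i] >= a[big]) break;                 /* heap property holds */
        swap(&a[i], &a[big]);
        i = big;                                   /* continue pushing down */
    }
}

/* Bottom-up building: leaves are already heaps, so start at the last parent. */
void build_heap(int a[], int n)
{
    for (int i = n / 2; i >= 1; i--)
        heapify(a, n, i);
}

void heap_sort(int a[], int n)
{
    build_heap(a, n);
    for (int i = n; i > 1; i--) {
        swap(&a[1], &a[i]);                /* move current maximum to the end */
        heapify(a, i - 1, 1);              /* restore the heap on the rest */
    }
}
```

Note that a[0] is unused with this 1-based convention, matching the array layout in the text.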
Search Trees :
As we know, searching in a binary search tree is efficient when the heights of the left and right sub-trees of each node are about the same. But frequent insertions and deletions affect the shape of the tree and can make a binary search tree inefficient. The efficiency of searching is ideal if the difference in height of the left and right sub-trees of every node is at most one. Such a binary search tree is called a balanced binary tree (often an AVL tree). REPRESENTATION OF AVL TREE In order to represent a node of an AVL tree, we need four fields: one for the data, two for storing the addresses of the left and right child, and one to hold the balance factor. The balance factor is calculated by subtracting the height of the right sub-tree from the height of the left sub-tree. The structure of an AVL tree node can be represented by:

struct AVL
{
    struct AVL *left;
    int data;
    struct AVL *right;
    int balfact;   /* height(left) - height(right) */
};
The value of the balance factor may be -1, 0 or 1; any other value indicates that the tree is not an AVL tree.
1. If the balance factor is -1, the height of the right sub-tree is one more than the height of the left sub-tree at the given node.
2. If the balance factor is 0, the heights of the right and left sub-trees at the given node are equal.
3. If the balance factor is 1, the height of the right sub-tree is one less than the height of the left sub-tree at the given node.
INVENTION AND DEFINITION The AVL tree was invented in 1962 by two Russian mathematicians, G.M. Adelson-Velskii and E.M. Landis, and is named after them. It is a binary tree in which the heights of the two sub-trees of any node never differ by more than one.
INSERTION OF A NODE IN AVL TREE : Insertion is done by finding an appropriate place for the node to be inserted. But this can disturb the balance of the tree if the difference in height of the sub-trees at some node comes to exceed one. If the insertion is done as a child of a non-leaf node, it does not affect the balance, since the height does not increase. But if the insertion is done as a child of a leaf node, it can genuinely disturb the balance of the tree. Whether it does depends on whether the node is inserted into the left or the right sub-tree, which in turn changes the balance factors. If the node is inserted into a sub-tree of smaller height, there is no effect. If the left and right sub-trees have the same height, insertion into either does not affect the balance of the AVL tree. But if it is inserted into the sub-tree of larger height, the balance will be disturbed. To rebalance the tree, the nodes need to be properly adjusted. So, after insertion of a new node, the tree is traversed starting from the new node back to the node where the balance has been disturbed, and the nodes are adjusted in such a way that the balance is regained.
REBALANCING OF AVL TREE: When we insert a node into the taller sub-tree, four cases arise, and we have different rebalancing methods to bring the tree back to a balanced form. The basic rebalancing operations are: 1. Left Rotation 2. Right Rotation LEFT ROTATION :
( Before Rotation)
(After Rotation )
In the given AVL tree, when we insert node 8, it becomes the left child of node 9 and the balance is lost, as the balance factor of node 3 becomes -2. So we try to rebalance the tree by performing a left rotation at node 3. Node 5 then becomes the left child of the root; node 9 and node 3 become the right and left child of node 5, respectively; node 2 and node 4 become the left and right child of node 3, respectively; and finally node 8 becomes the left child of node 9. The balance is thus regained, and we again have an AVL tree after the left rotation. Right rotation:
Before Rotation
After Rotation
In the given AVL tree, when we insert node 7, it becomes the right child of node 5 and the balance is lost, as the balance factor of node 20 becomes 2. So we try to rebalance the tree by performing a right rotation at node 20. Node 10 then becomes the root; node 12 and node 7 become the right and left child of the root, respectively; node 20 becomes the right child of node 12; node 30 becomes the right child of node 20; and finally node 5 becomes the left child of node 7. The balance is thus regained, and we again have an AVL tree after the right rotation.
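The two rotations can be sketched in C on the struct AVL node defined earlier (a simplified sketch: the balance-factor updates that a full AVL implementation performs after each rotation are omitted here):

```c
#include <stddef.h>

struct AVL {
    struct AVL *left;
    int data;
    struct AVL *right;
    int balfact;
};

/* Left rotation at x: x's right child y becomes the root of this subtree,
   y's old left subtree becomes x's new right subtree. */
struct AVL *rotate_left(struct AVL *x)
{
    struct AVL *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;                          /* new root of the rotated subtree */
}

/* Right rotation at x: the mirror image of the left rotation. */
struct AVL *rotate_right(struct AVL *x)
{
    struct AVL *y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}
```

The caller is responsible for storing the returned pointer back into the parent (or the tree's root pointer), since the subtree's root changes.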