Advanced Data Research Paper
Saurabh Pandey
Department of Computer Application
Azad Institute of Engineering & Technology
Lucknow, India
e-mail: [email protected]
Abstract— This paper investigates how to maintain an efficient and dynamic ordered set of bit strings, which is an important problem in the field of information search and information processing. Generally, a dynamic ordered set is required to support five essential operations: search, insertion, deletion, max-value retrieval and next-larger-value retrieval. Building on previous research, we present an advanced data structure named the rich binary tree (RBT), which follows both the binary-search-tree property and the digital-search-tree property. In addition, every key K keeps the most significant difference bit (MSDB) between itself and the next larger value among K's ancestors, as well as that between itself and the next smaller one among its ancestors. With the new data structure, we can maintain a dynamic ordered set in O(L) time. Since computers represent objects in binary form, our method has great potential in application. In fact, RBT can be viewed as a general-purpose data structure for problems concerning order, such as searching, sorting and maintaining a priority queue. For example, when RBT is applied to sorting, we get an algorithm that is linear in the number of keys and whose performance is far better than quicksort. What makes it more powerful than quicksort is that RBT also supports constant-time dynamic insertion and deletion.

Keywords— information processing; dynamic ordered set; algorithms and data structures; rich binary tree
I. INTRODUCTION
Data structures play a central role in modern computer science. You interact with data structures much more often than with algorithms (think of Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This paper covers major results and current directions of research in data structures:

• Classic comparison-based data structures. The area is still rich with open problems, such as whether there is a single best (dynamically optimal) binary search tree.

• Dynamic graph problems. In almost any network, a link's availability and speed are anything but constant, which has led to a re-evaluation of the common understanding of graph problems: how to maintain essential information, such as a minimum-weight spanning forest, while the graph changes.

• Integer data structures: beating the O(lg n) barrier in sorting and searching. If you haven't seen this before, beating O(lg n) may come as a surprise. If you have seen this before, you might think that it is about a bunch of grungy bit tricks. In fact, it is about fundamental issues regarding information and communication. We hope to give a cleaner and more modern view than you might have seen before.

• Data structures for range queries (and other geometric problems) and problems on trees. There are surprising equivalences between such problems, as well as interesting solutions.

Figure 1: Structure in lab view

• Data structures optimized for external memory, and cache-oblivious data structures. Any problem (e.g., sorting, priority queues) is different when you are dealing with disk instead of main memory, or when you care about cache performance. Memory hierarchies have become important in practice because of the recent escalation in data size.

I. COMPLEXITY OF ALGORITHM

An algorithm is a sequence of steps that gives a method of solving a problem; it defines the logic of a program. The efficiency of an algorithm depends on two major criteria: the first is the run time of the algorithm and the second is the space it requires. The run time of an algorithm is the time taken by the program for execution. Complexity tends to be used to characterize something with many parts in an intricate arrangement. The study of these complex linkages is the main goal of network theory and network science. In science [1] there are at this time a number of approaches to characterizing complexity, many of which are reflected in this article. In a business context, complexity management is the methodology to minimize value-destroying complexity and efficiently control value-adding complexity in a cross-functional approach.
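As a rough illustration of these two criteria, the following Python sketch (ours, not part of the original paper; the input sizes are arbitrary) measures the run time of a simple linear pass over inputs of growing size and reports the memory occupied by the input list:

    import sys
    import time

    def total(values):
        # A single linear-time pass over the input.
        s = 0
        for v in values:
            s += v
        return s

    for n in (10_000, 100_000, 1_000_000):
        data = list(range(n))
        start = time.perf_counter()
        total(data)
        elapsed = time.perf_counter() - start
        # Run time grows roughly linearly with n; sys.getsizeof reports only
        # the list object itself (not the integers it references), so it is a
        # rough proxy for the space criterion.
        print(f"n={n:>9}  time={elapsed:.6f}s  approx. space={sys.getsizeof(data)} bytes")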
II. CHOOSING AN ALGORITHM

• For every problem there is a multitude of algorithms that solve it, so you have a choice of algorithms to code up as programs.

• If a program is likely to be used only once on a small amount of data, then you should select the algorithm that is easiest to implement. Code it up correctly, run it and move on to something else.

• But if the program will be used many times and has a lifetime that makes maintenance likely, then other factors come into play, including readability, extensibility, portability, reusability, ease of use and efficiency. It is only efficiency that we will be looking at in this part of the module.

III. BEST, WORST AND AVERAGE CASE

In computer science, the best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively. Usually the resource being considered is running time, but it could also be memory or other resources.

In real-time computing, the worst-case execution time is often of particular concern, since it is important to know how much time might be needed in the worst case to guarantee that the algorithm will always finish on time.

Average performance and worst-case performance are the most used in algorithm analysis. Less widely found is best-case performance, but it does have uses: for example, where the best cases of individual tasks are known, they can be used to improve the accuracy of an overall worst-case analysis. The term best-case performance is used in computer science to describe the way an algorithm behaves under optimal conditions. For example, the best case for a simple linear search on a list occurs when the desired element is the first element of the list.

Figure 2: Web Mining Process

Worst-case performance analysis and average-case performance analysis have some similarities, but in practice they usually require different tools and approaches.

On the other hand, some algorithms, such as hash tables, have very poor worst-case behaviour, but a well-written hash table of sufficient size will statistically never give the worst case; the average number of operations performed follows an exponential decay curve, and so the run time of an operation is statistically bounded.

Consider linear search on a list of n elements. In the worst case, the search must visit every element once. This happens when the value being searched for is either the last element in the list or is not in the list. However, on average, assuming the value searched for is in the list and each list element is equally likely to be the value searched for, the search visits only n/2 elements.
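As a concrete illustration of these cases, here is a minimal linear-search sketch in Python (illustrative code, not taken from the paper); the comments relate its behaviour to the best, average and worst cases described above.

    def linear_search(items, target):
        # Best case: target is the first element      -> 1 comparison.
        # Average case: target equally likely anywhere -> about n/2 comparisons.
        # Worst case: target is last or absent         -> n comparisons.
        for index, value in enumerate(items):
            if value == target:
                return index
        return -1

    data = [7, 3, 9, 1, 4]
    print(linear_search(data, 7))   # best case: found at index 0
    print(linear_search(data, 4))   # found only after scanning the whole list
    print(linear_search(data, 8))   # worst case: not present, returns -1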
Quicksort applied to a list of n elements, again assumed to be all different and initially in random order, is another instructive example. This popular sorting algorithm has an average-case performance of O(n log n), which contributes to making it a very fast algorithm in practice. But given a worst-case input, its performance degrades to O(n²).
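A short quicksort sketch (again illustrative rather than the paper's own listing) makes the gap between the two cases visible: when the pivot splits the list roughly in half the recursion depth is about log n, but when it repeatedly splits off only one element the depth becomes n and the total work is quadratic.

    def quicksort(items):
        # Average case: balanced partitions -> O(n log n).
        # Worst case: pivot is always an extreme value -> O(n^2).
        if len(items) <= 1:
            return items
        pivot = items[len(items) // 2]              # middle element as pivot
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        return quicksort(smaller) + equal + quicksort(larger)

    print(quicksort([5, 2, 9, 1, 5, 6]))            # [1, 2, 5, 5, 6, 9]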
IV. SORTING ALGORITHM

In computer science and mathematics, a sorting algorithm is an algorithm that puts the elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important to optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output.

The output is in non-decreasing order (each element is no smaller than the previous element according to the desired total order).

Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956. Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide-and-conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space trade-offs, and lower bounds.
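Bubble sort itself, mentioned above and shown in Figure 3, can be written in a few lines; the sketch below is ours rather than the paper's own listing, and its output satisfies the non-decreasing-order condition stated earlier.

    def bubble_sort(items):
        # Repeatedly swap adjacent out-of-order pairs; after pass i,
        # the i largest elements are already in their final positions.
        a = list(items)
        n = len(a)
        for i in range(n - 1):
            swapped = False
            for j in range(n - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
                    swapped = True
            if not swapped:                 # already sorted: stop early
                break
        return a

    print(bubble_sort([5, 1, 4, 2, 8]))     # [1, 2, 4, 5, 8]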
V. RED-BLACK TREE

A red-black tree is a binary search tree where each node has a color attribute, the value of which is either red or black.
In addition to the ordinary requirements imposed on binary
search trees, the following requirements apply to red-black
trees:
• A node is either red or black.

• The root is black. (This rule is sometimes omitted from other definitions. Since the root can always be changed from red to black, but not necessarily vice versa, this rule has little effect on analysis.)

• All leaves are black.

• Both children of every red node are black.

• Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
Time complexity in big O notation:

Operation   Average     Worst case
Space       O(n)        O(n)
Search      O(log n)    O(log n)
Insert      O(log n)    O(log n)
Delete      O(log n)    O(log n)
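The sketch below (illustrative; the class and field names are ours, not the paper's) shows a minimal node representation and a check of the last property, namely that every root-to-leaf path contains the same number of black nodes.

    RED, BLACK = "red", "black"

    class Node:
        def __init__(self, key, color, left=None, right=None):
            self.key = key
            self.color = color      # each node is either red or black
            self.left = left        # None plays the role of a (black) leaf
            self.right = right

    def black_height(node):
        # Returns the number of black nodes on every path to a leaf,
        # or -1 if two paths disagree (the property is violated).
        if node is None:
            return 1                # leaves count as black
        left = black_height(node.left)
        right = black_height(node.right)
        if left == -1 or right == -1 or left != right:
            return -1
        return left + (1 if node.color == BLACK else 0)

    # A small valid example: a black root with two red children.
    root = Node(10, BLACK, Node(5, RED), Node(15, RED))
    print(black_height(root))       # 2 -> consistent black height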
Figure 3: Bubble Sort

VI. EULER TOUR TREE