
DATA STRUCTURES AND ALGORITHMS

For M.Sc., M.C.A., & M.Tech Programmes

Dr. S. RAMAMOORTHY & Dr. S. SATHYALAKSHMI

SWAMEGA PUBLICATIONS, 2014
UNIT - IV
GRAPHS:

An undirected graph G = (V, E) consists of two finite sets: the vertex set V = {v1, v2, ...}, which contains the vertices of G, and the edge set E = {e1, e2, ...}, which is a set of unordered pairs of distinct vertices of G.

Some Important Terms: Directed graph, adjacency, incidence, degree, self-loops, walk, path, circuit, cycle, sub-graphs, tree, cliques, knots, components, connectivity, distance, colouring, etc., are some of the important terms that play a vital role in the theory of graphs and networks.

Representation of a Graph: A graph can be represented by (i) an Adjacency Matrix and (ii) an Adjacency List. An illustration will help in this regard.

A given graph: [Figure: five vertices v1, ..., v5 with edges (v1, v2), (v1, v4), (v2, v5), (v3, v5), and (v4, v5).]

Adjacency Matrix

A[i, j] = 1, if the edge (i, j) is in the set E
          0, otherwise

        1  2  3  4  5
    1   0  1  0  1  0
    2   1  0  0  0  1
A = 3   0  0  0  0  1
    4   1  0  0  0  1
    5   0  1  1  1  0
Adjacency List

V1 → 4 → 2
V2 → 1 → 5
V3 → 5
V4 → 5 → 1
V5 → 2 → 3 → 4
Basic Search Techniques: A graph can be searched using two basic techniques: (i) Breadth First Search (BFS) and (ii) Depth First Search (DFS). The corresponding algorithms are discussed below.

Breadth First Search:

BFS (GRAPH G = (V, E), vertex S)    // S is the source – the starting point //

1. for each v ∈ V do
2.     d[v] ← ∞                     // unmark all the vertices //
3. d[S] ← 0                         // mark the source //
4. Enqueue (Q, S)                   // enter the source into the queue //
5. while Empty (Q) = false do
6.     v ← Dequeue (Q)              // delete a node from the queue //
7.     for each u ∈ adjacent [v] do
8.         if d[u] = ∞ then         // vertex u is unmarked //
9.             d[u] ← d[v] + 1      // mark vertex u //
10.            Enqueue (Q, u)

E.g., BFS on a six-vertex graph (v1, ..., v6), starting at v1. The d-values and the queue Q at each stage:

Initially:        d = (0, ∞, ∞, ∞, ∞, ∞)    Q: v1
After v1:         d = (0, 1, ∞, 1, ∞, ∞)    Q: v2, v4
After v2:         d = (0, 1, 2, 1, 2, ∞)    Q: v4, v5, v3
At termination:   d = (0, 1, 2, 1, 2, ∞)    Q: empty (v6 is unreachable, so d[v6] remains ∞)
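The pseudocode translates directly into a short program. Below is a minimal Python sketch of BFS using a deque as the queue; the adjacency lists at the bottom are a hypothetical edge set chosen only to reproduce the trace above (with v6 isolated), not taken from the figure.

    from collections import deque

    def bfs(adj, source):
        # adj maps each vertex to a list of adjacent vertices.
        dist = {v: float('inf') for v in adj}   # unmark all vertices
        dist[source] = 0                        # mark the source
        q = deque([source])                     # enqueue the source
        while q:
            v = q.popleft()                     # dequeue
            for u in adj[v]:
                if dist[u] == float('inf'):     # u is unmarked
                    dist[u] = dist[v] + 1       # u is one level deeper than v
                    q.append(u)
        return dist

    # Edge set inferred from the trace above; v6 has no edges.
    graph = {'v1': ['v2', 'v4'], 'v2': ['v1', 'v5', 'v3'], 'v3': ['v2'],
             'v4': ['v1'], 'v5': ['v2'], 'v6': []}
    print(bfs(graph, 'v1'))   # d[v6] stays inf, matching the trace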

Depth First Search:


DFS (GRAPH G = (V, E))

1. for each v ∈ V do
2.     d[v] ← 0                     // every vertex is undiscovered, i.e., unmarked //
3. time ← 0                         // time is a global variable //
4. for each S ∈ V do
5.     if d[S] = 0 then             // is vertex S unmarked? //
6.         DFS-Visit (G, S)

DFS-Visit (GRAPH G = (V, E), vertex S)    // S is the source //

1. d[S] ← time ← time + 1           // vertex S is marked (discovered) //
2. for each u ∈ adjacent [S] do
3.     if d[u] = 0 then             // is vertex u unmarked? //
4.         DFS-Visit (G, u)
5. f[S] ← time ← time + 1           // finished with vertex S //

E.g., [Figure: DFS applied to a six-vertex graph with vertices V1, V2, V3 (top row) and V4, V5, V6 (bottom row).]
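As with BFS, the two procedures above map directly onto code. The following Python sketch records the discovery time d and finishing time f of every vertex; the adjacency-dict representation is the same assumption as in the BFS sketch.

    def dfs(adj):
        # adj maps each vertex to a list of adjacent vertices.
        d = {v: 0 for v in adj}   # 0 means undiscovered (unmarked)
        f = {}
        time = 0

        def visit(s):
            nonlocal time
            time += 1
            d[s] = time           # vertex s is marked (discovered)
            for u in adj[s]:
                if d[u] == 0:     # u is unmarked
                    visit(u)
            time += 1
            f[s] = time           # finished with vertex s

        for s in adj:             # restart in every undiscovered component
            if d[s] == 0:
                visit(s)
        return d, f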

SHORTEST SPANNING TREE / MINIMUM SPANNING TREE (SST / MST):

Spanning Tree: A tree in a graph that contains all the vertices of the graph is called a spanning tree.

Shortest Spanning Tree / Minimum Spanning Tree (SST / MST): In a weighted graph, if the sum of the weights of the edges of a spanning tree is minimum, then that spanning tree is called a shortest spanning tree / minimum spanning tree. The following algorithm is a generic version that finds the shortest (minimum) spanning tree of an undirected weighted graph. It is followed by the algorithms for the same problem given by (i) PRIM and (ii) KRUSKAL.

MST-Generic (UNDIRECTED WEIGHTED GRAPH G = (V, E))

1. T ← {(a, b)}                     // (a, b) is the lowest-weight edge in E //
2. VT ← {a, b}
3. while VT ≠ V do
4.     find the lightest edge (u, v) that connects u ∈ VT and v ∈ V − VT
5.     T ← T ∪ {(u, v)}             // add edge (u, v) to T //
6.     VT ← VT ∪ {v}                // add vertex v to VT //
7. return T                         // T is the minimum spanning tree //

PRIM’s Algorithm:

PRIM (UNDIRECTED WEIGHTED GRAPH G = (V, E))

1. T ← ϕ                            // start with an empty tree //
2. for each v ∈ V do
3.     key [v] ← ∞
4. key [r] ← 0                      // r ∈ V is an arbitrary start vertex //
5. MakePriorityQueue (P, V)         // initialize P with the elements of V //
6. while Empty (P) = false do
7.     u ← DeleteMin (P)
8.     min_wt ← ∞                   // reinitialize min_wt //
9.     for each v ∈ adjacent [u] do
10.        if v ∈ P and w(u, v) < key [v] then
11.            DecreaseKey (P, v, w(u, v))
12.        if w(u, v) < min_wt then
13.            min_wt ← w(u, v)
14.            vmin ← v
15.    T ← T ∪ {(u, vmin)}          // add edge (u, vmin) to T //
16. return T                        // T is the minimum spanning tree //

Illustration for PRIM's Algorithm:

[Figure: a weighted undirected graph; the Roman numerals I–VII mark the seven branches (edges) of the spanning tree in the order in which the algorithm selects them.]

The numbers indicate the weights of the edges, and the Roman numerals indicate the order of selection of the branches of the spanning tree. Its total weight, i.e., the sum of the weights of the edges that constitute this spanning tree, is 36 (minimum).
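A concrete version of Prim's algorithm is sketched below in Python. It follows the priority-queue outline above, but substitutes heapq with lazy deletion for the DecreaseKey operation, which Python's heap does not provide; the weighted adjacency-dict format is an assumption of this sketch.

    import heapq

    def prim(adj, r):
        # adj maps each vertex to a list of (neighbour, weight) pairs; r is the start vertex.
        key = {v: float('inf') for v in adj}   # best known edge weight into each vertex
        parent = {v: None for v in adj}
        key[r] = 0
        in_tree = set()
        pq = [(0, r)]                          # priority queue ordered by key value
        T = []
        while pq:
            k, u = heapq.heappop(pq)           # DeleteMin
            if u in in_tree:
                continue                       # stale entry left by lazy DecreaseKey
            in_tree.add(u)
            if parent[u] is not None:
                T.append((parent[u], u, k))    # add edge to T
            for v, w in adj[u]:
                if v not in in_tree and w < key[v]:
                    key[v] = w                 # DecreaseKey: push a fresh, cheaper entry
                    parent[v] = u
                    heapq.heappush(pq, (w, v))
        return T                               # T is the minimum spanning tree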

KRUSKAL’s Algorithm:

KRUSKAL (UNDIRECTED WEIGHTED GRAPH G = (V, E))

1. T ← ϕ                            // start with an empty tree //
2. for each v ∈ V do
3.     Makeset (D, v)               // create a collection D of |V| disjoint sets //
4. sort the edges of E by non-decreasing weight
5. for each (u, v) ∈ E, in order of non-decreasing weight, do
6.     if Findset (D, u) ≠ Findset (D, v) then
7.         T ← T ∪ {(u, v)}         // add edge to T //
8.         Union (D, u, v)          // unite the sets containing u and v //
9. return T                         // T is the minimum spanning tree //
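The disjoint-set operations Makeset, Findset, and Union can be realized with a simple parent dictionary. The Python sketch below assumes the edges are given as (weight, u, v) triples so that sorting orders them by non-decreasing weight.

    def kruskal(vertices, edges):
        # edges is a list of (weight, u, v) triples.
        parent = {v: v for v in vertices}      # Makeset for every vertex

        def find(x):                           # Findset, with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        T = []
        for w, u, v in sorted(edges):          # non-decreasing weight order
            ru, rv = find(u), find(v)
            if ru != rv:                       # u and v lie in different sets
                T.append((u, v, w))            # add edge to T
                parent[ru] = rv                # Union the two sets
        return T                               # T is the minimum spanning tree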

Shortest Path Algorithm:

DIJKSTRA’S Algorithm:

DIJKSTRA (GRAPH G = (V, E), vertex s)

1. for each v ∈ V do
2.     d[v] ← ∞
3. d[s] ← 0
4. MakePriorityQueue (P, V)         // initialize P with the elements of V //
5. while Empty (P) = false do
6.     u ← DeleteMin (P)
7.     for each v ∈ adjacent [u] do
8.         if d[v] > d[u] + w(u, v) then
9.             d[v] ← d[u] + w(u, v)          // relax the edge (u, v) //
10.            DecreaseKey (P, v, d[v])
11. return d                        // d[v] is the length of the shortest path from s to v //
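Below is a minimal Python sketch of the algorithm, again substituting heapq with lazy deletion for the DecreaseKey operation. It assumes non-negative edge weights and the same (neighbour, weight) adjacency-dict format used in the Prim sketch.

    import heapq

    def dijkstra(adj, s):
        # adj maps each vertex to a list of (neighbour, weight) pairs.
        d = {v: float('inf') for v in adj}
        d[s] = 0
        pq = [(0, s)]                          # priority queue of (distance, vertex)
        while pq:
            du, u = heapq.heappop(pq)          # DeleteMin
            if du > d[u]:
                continue                       # stale entry; a shorter path was found
            for v, w in adj[u]:
                if d[v] > d[u] + w:            # relax the edge (u, v)
                    d[v] = d[u] + w
                    heapq.heappush(pq, (d[v], v))
        return d                               # shortest distances from s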

HASHING:

A hash function is any function that can be used to map digital data of arbitrary
size to digital data of fixed size, with slight differences in input data producing very big
differences in output data. The values returned by a hash function are called hash values,
hash codes, hash sums, or simply hashes. One practical use is a data structure called a hash
table, widely used in computer software for rapid data lookup. Hash functions accelerate
table or database lookup by detecting duplicated records in a large file. An example is finding
similar stretches in DNA sequences. Hash functions are related to (and often confused with)
checksums, check digits, fingerprints, randomization functions, error-correcting codes, and
ciphers. Although these concepts overlap to some extent, each has its own uses and
requirements and is designed and optimized differently.

USES:

Hash tables

Hash functions are primarily used in hash tables, to quickly locate a data record
(e.g., a dictionary definition) given its search key (the keyword). Specifically, the hash
function is used to map the search key to an index (an ordinal number); the index gives the
place in the hash table where the corresponding record should be stored. Hash tables, in turn,
are used to implement associative arrays and dynamic sets.

Typically, the domain of a hash function (the set of possible keys) is larger than
its range (the number of different table indexes), and so it will map several different keys to
the same index. Therefore, each slot of a hash table is associated with (implicitly or
explicitly) a set of records, rather than a single record. For this reason, each slot of a hash
table is often called a bucket, and hash values are also called bucket indices.

Thus, the hash function only hints at the record's location — it tells where one
should start looking for it. Still, in a half-full table, a good hash function will typically narrow
the search down to only one or two entries.

Caches

Hash functions are also used to build caches for large data sets stored in slow
media. A cache is generally simpler than a hashed search table, since any collision can be
resolved by discarding or writing back the older of the two colliding items. This is also used
in file comparison.

Finding duplicate records

When storing records in a large unsorted file, one may use a hash function to
map each record to an index into a table T, and collect in each bucket T[i] a list of the
numbers of all records with the same hash value i. Once the table is complete, any two
duplicate records will end up in the same bucket. The duplicates can then be found by
scanning every bucket T[i] which contains two or more members, fetching those records, and
comparing them. With a table of appropriate size, this method is likely to be much faster than
any alternative approach (such as sorting the file and comparing all consecutive pairs).
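A sketch of this duplicate-finding procedure in Python, assuming the records are hashable so that the built-in hash() can play the role of the hash function:

    from collections import defaultdict

    def find_duplicates(records, table_size=1024):
        T = defaultdict(list)                  # bucket i holds the numbers of records hashing to i
        for num, rec in enumerate(records):
            T[hash(rec) % table_size].append(num)
        pairs = []
        for bucket in T.values():
            if len(bucket) >= 2:               # only these buckets can hold duplicates
                for a in range(len(bucket)):
                    for b in range(a + 1, len(bucket)):
                        if records[bucket[a]] == records[bucket[b]]:
                            pairs.append((bucket[a], bucket[b]))
        return pairs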

Protecting data

A hash value can be used to uniquely identify secret information. This requires that the hash function is collision-resistant, which means that it is very hard to find data that generate the same hash value. These functions are categorized into cryptographic hash functions and provably secure hash functions. Functions in the second category are the most secure but also too slow for most practical purposes. Collision resistance is accomplished in part by generating very large hash values. For example, SHA-1, one of the most widely used cryptographic hash functions, generates 160-bit values.

Finding similar records

Hash functions can also be used to locate table records whose key is similar,
but not identical, to a given key; or pairs of records in a large file which have similar keys.
For that purpose, one needs a hash function that maps similar keys to hash values that differ
by at most m, where m is a small integer (say, 1 or 2). If one builds a table T of all record
numbers, using such a hash function, then similar records will end up in the same bucket, or
in nearby buckets. Then one need only check the records in each bucket T[i] against those in
buckets T[i+k] where k ranges between −m and m.

Finding similar substrings

The same techniques can be used to find equal or similar stretches in a large
collection of strings, such as a document repository or a genomic database. In this case, the
input strings are broken into many small pieces, and a hash function is used to detect
potentially equal pieces, as above. The Rabin–Karp algorithm is a relatively fast string
searching algorithm that works in O(n) time on average. It is based on the use of hashing to
compare strings.

Geometric hashing

This principle is widely used in computer graphics, computational geometry


and many other disciplines, to solve many proximity problems in the plane or in three-
dimensional space, such as finding closest pairs in a set of points, similar shapes in a list of
shapes, similar images in an image database, and so on. In these applications, the set of all
inputs is some sort of metric space, and the hashing function can be interpreted as a partition
of that space into a grid of cells. The table is often an array with two or more indices (called a
grid file, grid index, bucket grid, and similar names), and the hash function returns an index
tuple. This special case of hashing is known as geometric hashing or the grid method.
Geometric hashing is also used in telecommunications (usually under the name vector
quantization) to encode and compress multi-dimensional signals.

Hash function algorithms:

For most types of hashing functions the choice of the function depends
strongly on the nature of the input data, and their probability distribution in the intended
application.

Trivial hash function

If the datum to be hashed is small enough, one can use the datum itself
(reinterpreted as an integer) as the hashed value. The cost of computing this "trivial"
(identity) hash function is effectively zero. This hash function is perfect, as it maps each input
to a distinct hash value. The meaning of "small enough" depends on the size of the type that
is used as the hashed value. For example, in Java, the hash code is a 32-bit integer. Thus the
32-bit integer Integer and 32-bit floating-point Float objects can simply use the value
directly; whereas the 64-bit integer Long and 64-bit floating-point Double cannot use this
method.

Other types of data can also use this perfect hashing scheme. For example, when mapping character strings between upper and lower case, one can use the binary encoding of each character, interpreted as an integer, to index a table that gives the alternative form of that character ("A" for "a", "8" for "8", etc.). If each character is stored in 8 bits (as in ASCII or ISO Latin 1), the table has only 2⁸ = 256 entries; in the case of Unicode characters, the table would have 17 × 2¹⁶ = 1,114,112 entries. The same technique can be used to map two-letter country codes like "us" or "za" to country names (26² = 676 table entries), 5-digit zip codes like 13083 to city names (100,000 entries), etc. Invalid data values (such as the country code "xx" or the zip code 00000) may be left undefined in the table, or mapped to some appropriate "null" value.

Perfect hashing


A hash function that is injective—that is, maps each valid input to a different
hash value—is said to be perfect. With such a function one can directly locate the desired
entry in a hash table, without any additional searching.

Minimal perfect hashing



A perfect hash function for n keys is said to be minimal if its range consists of n
consecutive integers, usually from 0 to n−1. Besides providing single-step lookup, a minimal
perfect hash function also yields a compact hash table, without any vacant slots. Minimal
perfect hash functions are much harder to find than perfect ones with a wider range.

Hashing uniformly distributed data

If the inputs are bounded-length strings and each input may independently occur
with uniform probability (such as telephone numbers, car license plates, invoice numbers,
etc.), then a hash function needs to map roughly the same number of inputs to each hash
value. For instance, suppose that each input is an integer z in the range 0 to N−1, and the
output must be an integer h in the range 0 to n−1, where N is much larger than n. Then the
hash function could be h = z mod n (the remainder of z divided by n), or h = (z × n) ÷ N (the
value z scaled down by n/N and truncated to an integer), or many other formulas.

h = z mod n was used in many of the original random number generators, but was found to have a number of issues. One issue is that as n approaches N, this function becomes less and less uniform.
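The two formulas can be written out directly; a small Python sketch:

    def h_mod(z, n):
        # Remainder formula: sensitive mainly to the trailing digits of z.
        return z % n

    def h_scale(z, n, N):
        # Scaling formula: z scaled down by n/N and truncated to an integer,
        # so it depends mainly on the leading digits of z.
        return (z * n) // N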

Hashing data with other distributions

These simple formulas will not do if the input values are not equally likely, or are not independent. For instance, most patrons of a supermarket will live in the same geographic area, so their telephone numbers are likely to begin with the same 3 to 4 digits. In that case, if n is 10000 or so, the formula (z × n) ÷ N, which depends mainly on the leading digits, will generate a lot of collisions; whereas the remainder formula z mod n, which is quite sensitive to the trailing digits, may still yield a fairly even distribution.

Hashing variable-length data

When the data values are long (or variable-length) character strings—such as
personal names, web page addresses, or mail messages—their distribution is usually very
uneven, with complicated dependencies. For example, text in any natural language has highly
non-uniform distributions of characters, and character pairs, very characteristic of the
language. For such data, it is prudent to use a hash function that depends on all characters of
the string—and depends on each character in a different way.

In cryptographic hash functions, a Merkle–Damgård construction is usually used. In general, the scheme for hashing such data is to break the input into a sequence of small units (bits, bytes, words, etc.) and combine all the units b[1], b[2], ..., b[m] sequentially, as follows:

S ← S0                              // initialize the state
for k in 1, 2, ..., m do            // scan the input data units
    S ← F(S, b[k])                  // combine data unit k into the state
return G(S, n)                      // extract the hash value from the state

This schema is also used in many text checksum and fingerprint algorithms. The
state variable S may be a 32- or 64-bit unsigned integer; in that case, S0 can be 0, and G(S, n)
can be just S mod n. The best choice of F is a complex issue and depends on the nature of the
data. If the units b[k] are single bits, then F(S, b) could be, for instance

if highbit(S) = 0 then
    return 2 * S + b
else
    return (2 * S + b) ^ P

Here highbit(S) denotes the most significant bit of S; the '*' operator denotes unsigned integer
multiplication with lost overflow; '^' is the bitwise exclusive or (XOR) operation applied to
words; and P is a suitable fixed word.
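A runnable Python version of this bit-by-bit scheme is sketched below. The word width (32 bits) and the fixed word P are illustrative assumptions; any suitable constant could serve as P.

    def hash_bits(bits, n, P=0xB7E15163, width=32):
        # bits: an iterable of 0/1 data units; n: the number of hash values.
        mask = (1 << width) - 1                 # model a 32-bit unsigned state
        S = 0                                   # S0: initialize the state
        for b in bits:                          # scan the input data units
            if S >> (width - 1) == 0:           # highbit(S) = 0
                S = (2 * S + b) & mask          # overflow beyond 'width' bits is lost
            else:
                S = ((2 * S + b) ^ P) & mask    # fold the high bit back in with XOR
        return S % n                            # G(S, n): extract the hash value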

Special-purpose hash functions

In many cases, one can design a special-purpose (heuristic) hash function that
yields many fewer collisions than a good general-purpose hash function. For example,
suppose that the input data are file names such as FILE0000.CHK, FILE0001.CHK,
FILE0002.CHK, etc., with mostly sequential numbers. For such data, a function that extracts
the numeric part k of the file name and returns k mod n would be nearly optimal. Needless to
say, a function that is exceptionally good for a specific kind of data may have dismal
performance on data with different distribution.

Rolling hash

In some applications, such as substring search, one must compute a hash function h for every k-character substring of a given n-character string t, where k is a fixed integer and n is the length of t. The straightforward solution, which is to extract every such substring s of t and compute h(s) separately, requires a number of operations proportional to k·n. However, with the proper choice of h, one can use the technique of rolling hash to compute all those hashes with an effort proportional to k + n.
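A Rabin–Karp style rolling hash in Python is sketched below; each window hash is updated in constant time by removing the outgoing character and appending the incoming one. The base and modulus are illustrative choices, not prescribed by the text.

    def rolling_hashes(t, k, base=256, mod=1_000_003):
        # Returns the hash of every k-character substring of t in O(n) total time.
        n = len(t)
        if n < k:
            return []
        high = pow(base, k - 1, mod)            # weight of the outgoing character
        h = 0
        for c in t[:k]:                         # hash the first window directly
            h = (h * base + ord(c)) % mod
        hashes = [h]
        for i in range(k, n):                   # slide the window one character
            h = ((h - ord(t[i - k]) * high) * base + ord(t[i])) % mod
            hashes.append(h)
        return hashes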

Universal hashing

A universal hashing scheme is a randomized algorithm that selects a hashing


function h among a family of such functions, in such a way that the probability of a collision
of any two distinct keys is 1/n, where n is the number of distinct hash values desired—
independently of the two keys. Universal hashing ensures (in a probabilistic sense) that the
hash function application will behave as well as if it were using a random function, for any
distribution of the input data. It will, however, have more collisions than perfect hashing, and may require more operations than a special-purpose hash function.

Choosing a good hash function


A good hash function and implementation algorithm are essential for good hash
table performance, but may be difficult to achieve.

A basic requirement is that the function should provide a uniform distribution of


hash values. A non-uniform distribution increases the number of collisions and the cost of
resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated
empirically using statistical tests, e.g. a Pearson's chi-squared test for discrete uniform
distributions.

The distribution needs to be uniform only for table sizes that occur in the
application. In particular, if one uses dynamic resizing with exact doubling and halving of the
table size s, then the hash function needs to be uniform only when s is a power of two. On the
other hand, some hashing algorithms provide uniform hashes only when s is a prime number.

For open addressing schemes, the hash function should also avoid clustering,
the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup
cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular
multiplicative hash is claimed to have particularly poor clustering behavior.

Cryptographic hash functions are believed to provide good hash functions for
any table size s, either by modulo reduction or by bit masking. They may also be appropriate
if there is a risk of malicious users trying to sabotage a network service by submitting
requests designed to generate a large number of collisions in the server's hash tables.
However, the risk of sabotage can also be avoided by cheaper methods (such as applying a
secret salt to the data, or using a universal hash function).

Collision resolution

Hash collisions are practically unavoidable when hashing a random subset of a


large set of possible keys. For example, if 2,450 keys are hashed into a million buckets, even
with a perfectly uniform random distribution, according to the birthday problem there is
approximately a 95% chance of at least two of the keys being hashed to the same slot.

Therefore, most hash table implementations have some collision resolution


strategy to handle such events. Some common strategies are described below. All these
methods require that the keys (or pointers to them) be stored in the table, together with the
associated values.

Separate chaining

In the method known as separate chaining, each bucket is independent, and has
some sort of list of entries with the same index. The time for hash table operations is the time
to find the bucket (which is constant) plus the time for the list operation. (The technique is
also called open hashing or closed addressing.)

In a good hash table, each bucket has zero or one entries, and sometimes two or
three, but rarely more than that. Therefore, structures that are efficient in time and space for
these cases are preferred. Structures that are efficient for a fairly large number of entries per
bucket are not needed or desirable. If these cases happen often, the hashing is not working
well, and this needs to be fixed.

Separate chaining with linked lists

Chained hash tables with linked lists are popular because they require only basic
data structures with simple algorithms, and can use simple hash functions that are unsuitable
for other methods. Chained hash tables remain effective even when the number of table
entries n is much higher than the number of slots. Their performance degrades more
gracefully (linearly) with the load factor. For example, a chained hash table with 1000 slots
and 10,000 stored keys (load factor 10) is five to ten times slower than a 10,000-slot table
(load factor 1); but still 1000 times faster than a plain sequential list, and possibly even faster
than a balanced search tree.

For separate chaining, the worst-case scenario is when all entries are inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure. If the latter is a linear list, the lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the number n of entries in the table.
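A minimal separate-chaining table in Python, with an ordinary list as the chain in each slot (the built-in hash() again stands in for the hash function):

    class ChainedHashTable:
        def __init__(self, slots=8):
            self.buckets = [[] for _ in range(slots)]   # one chain per slot

        def _chain(self, key):
            return self.buckets[hash(key) % len(self.buckets)]

        def insert(self, key, value):
            chain = self._chain(key)
            for i, (k, _) in enumerate(chain):
                if k == key:                   # key already present: overwrite
                    chain[i] = (key, value)
                    return
            chain.append((key, value))         # collision: the chain grows by one

        def lookup(self, key):
            for k, v in self._chain(key):      # constant time to find the bucket,
                if k == key:                   # then a scan of one short chain
                    return v
            return None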
Separate chaining with list head cells


Some chaining implementations store the first record of each chain in the slot array itself.
The number of pointer traversals is decreased by one for most cases. The purpose is to
increase cache efficiency of hash table access.

The disadvantage is that an empty bucket takes the same space as a bucket with
one entry. To save space, such hash tables often have about as many slots as stored entries,
meaning that many slots have two or more entries.

Separate chaining with other structures

Instead of a list, one can use any other data structure that supports the required
operations. For example, by using a self-balancing tree, the theoretical worst-case time of
common hash table operations (insertion, deletion, lookup) can be brought down to O(log n)
rather than O(n). However, this approach is only worth the trouble and extra memory cost if
long delays must be avoided at all costs (e.g. in a real-time application), or if one must guard
against many entries hashed to the same slot (e.g. if one expects extremely non-uniform
distributions, or in the case of web sites or other publicly accessible services, which are
vulnerable to malicious key distributions in requests).

Robin Hood hashing

One interesting variation on double-hashing collision resolution is Robin Hood hashing. The idea is that a new key may displace a key already inserted if its probe count is larger than that of the key at the current position. The net effect is that worst-case search times in the table are reduced. This is similar to ordered hash tables, except that the criterion for bumping a key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes are reduced dramatically, an interesting variation is to probe the table starting at the expected successful probe value and then expand from that position in both directions. External Robin Hood hashing is an extension of this algorithm in which the table is stored in an external file and each table position corresponds to a fixed-sized page or bucket with B records.

2-choice hashing

2-choice hashing employs 2 different hash functions, h1(x) and h2(x), for the
hash table. Both hash functions are used to compute two table locations. When an object is
inserted in the table, then it is placed in the table location that contains fewer objects (with the
default being the h1(x) table location if there is equality in bucket size). 2-choice hashing
employs the principle of the power of two choices.
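A sketch of 2-choice insertion in Python, assuming two independent hash functions h1 and h2 are supplied by the caller:

    def insert_two_choice(table, key, h1, h2):
        # table is a list of buckets (lists); h1 and h2 are hash functions.
        i = h1(key) % len(table)
        j = h2(key) % len(table)
        # Place the key in the less loaded bucket; ties go to the h1 location.
        target = i if len(table[i]) <= len(table[j]) else j
        table[target].append(key)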

Advantages

The main advantage of hash tables over other table data structures is speed. This
advantage is more apparent when the number of entries is large. Hash tables are particularly
efficient when the maximum number of entries can be predicted in advance, so that the
bucket array can be allocated once with the optimum size and never resized.

If the set of key-value pairs is fixed and known ahead of time (so insertions and
deletions are not allowed), one may reduce the average lookup cost by a careful choice of the
hash function, bucket table size, and internal data structures. In particular, one may be able to
devise a hash function that is collision-free, or even perfect. In this case the keys need not be
stored in the table.

Drawbacks

Although operations on a hash table take constant time on average, the cost of a
good hash function can be significantly higher than the inner loop of the lookup algorithm for
a sequential list or search tree. Thus hash tables are not effective when the number of entries
is very small. (However, in some cases the high cost of computing the hash function can be
mitigated by saving the hash value together with the key.)

For certain string processing applications, such as spell-checking, hash tables may
be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by
a small enough number of bits, then, instead of a hash table, one may use the key directly as
the index into an array of values. Note that there are no collisions in this case.

The entries stored in a hash table can be enumerated efficiently (at constant cost per
entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate
an entry whose key is nearest to a given key. Listing all n entries in some specific order
generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In
comparison, ordered search trees have lookup and insertion cost proportional to log(n), but
allow finding the nearest key at about the same cost, and ordered enumeration of all entries at
constant cost per entry.

If the keys are not stored (because the hash function is collision-free), there may
be no easy way to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a
single operation may be quite high. In particular, if the hash table uses dynamic resizing, an
insertion or deletion operation may occasionally take time proportional to the number of
entries. This may be a serious drawback in real-time or interactive applications.

Hash tables in general exhibit poor locality of reference—that is, the data to be
accessed is distributed seemingly at random in memory. Because hash tables cause access
patterns that jump around, this can trigger microprocessor cache misses that cause long
delays. Compact data structures such as arrays searched with linear search may be faster, if
the table is relatively small and keys are compact. The optimal performance point varies from
system to system.

Hash tables become quite inefficient when there are many collisions. While
extremely uneven hash distributions are extremely unlikely to arise by chance, a malicious
adversary with knowledge of the hash function may be able to supply information to a hash
that creates worst-case behavior by causing excessive collisions, resulting in very poor
performance, e.g. a denial of service attack. In critical applications, universal hashing can be
used; a data structure with better worst-case guarantees may be preferable.

Questions:

2 Marks:

1) Define the term Graph. Give an example.

2) What are the different ways by which a graph can be represented?

3) What is a path in a graph? Give an example.

4) Define “Spanning Tree”. Give an example.

5) What is the weight of a graph?

6) What is hashing?

7) List the different Hashing Techniques.

8) What is a bucket index?

9) What is static hashing?

10) What is dynamic hashing?

5 Marks:

1) How will you choose a good hash function?

2) Illustrate Breadth First Search.


3) Illustrate Depth First Search.

4) Discuss about static hashing.

5) Give an account on dynamic hashing.

8 Marks:

1) Highlight the advantages and drawbacks of Hashing.

2) Illustrate the Prim’s algorithm.

3) Illustrate the Kruskal’s algorithm.

4) Illustrate Dijkstra’s algorithm.

5) Explain separate chaining with linked lists with reference to hashing.

16 Marks:

1) Explain the process of hashing using the different methods.

2) Explain the different types of hashing.

3) Explain the application of Graphs.

4) Discuss the various terms associated with a Graph.
