
Database System Ch-6

The document discusses data storage and querying. It describes primary and secondary storage, with secondary storage including magnetic disks. Disks are organized into platters, tracks, cylinders, and sectors. Records are stored using fixed-length representations, with fields of specific data types occupying set numbers of bytes. Records can be stored spanned or unspanned (each record within one block), and files can be organized as unordered heap files or ordered sorted files.

Uploaded by

Frtuna Haile
Copyright © All Rights Reserved

Chapter six

Data Storage and Querying

School of ECE

10/16/2023 School of ECE


Outline
Storage and File Structure
Secondary Storage Device (Disk) Structure
File organization
Indexing and hashing
Query Processing and Optimization



Storage And File Structure
A database system is designed to hold large volumes of data that must be stored physically (permanently) on a storage medium.
The storage medium can be categorized as:
Primary Storage: storage media directly accessible by the CPU, namely the main memory and the cache.
Cache is the fastest level of the memory hierarchy and is built inside the microprocessor chip. Typically its response time is in nanoseconds.
Main memory is the next level in the hierarchy; it provides the main working environment in which the CPU keeps programs and data.



Cont..

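Memory Hierarchy (figure): from top to bottom, tertiary memory, secondary memory, main memory and cache memory; main memory and cache form primary storage. Moving down the hierarchy toward the cache, speed increases and size decreases.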



Cont..
Secondary storage: Storage media for permanent storage such as:
Magnetic disk and
Optical disk.
Larger in capacity but significantly slower than primary storage.
Typical response time is in milliseconds.
Secondary storage is used for virtual memory, disk storage, and the file system.



Cont..

Secondary Storage Device (Disk) Structure
The disk drive consists of two moving structures:
Disk Assembly
Head Assembly
The disk consists of circular platters that the disk assembly rotates around the spindle.
Each platter surface is covered with a thin layer of magnetic material.



Cont..
The platters may be double-sided (both the upper and lower surfaces are used) or single-sided.
Each surface is organized into tracks, which are concentric circles of distinct diameters.
The tracks with the same diameter on all surfaces of the disk pack form a cylinder.
The tracks are further divided into sectors, which are segments of the track separated by gaps.



Disk structure



Cont..
The head assembly positions the read/write head of each surface over the required track, and the disk assembly rotates the disk until the first sector to be read or written is under the head.
The movement of the disk assembly and the head assembly for data read/write is managed by a processor known as the disk controller.
Typical block sizes range from 512 to 4096 bytes.



Cont..
Example: A disk has 8 double-sided platters. Each surface is divided into 2^14 = 16,384 tracks with 128 sectors per track, and each sector holds 4,096 bytes. Determine the size of the disk.
- Bytes per sector = 4,096 bytes
- Bytes per track = 128*4,096 = 524,288 bytes
- Bytes per surface = 16,384*524,288 = 8,589,934,592 bytes
- Number of surfaces = 8*2 = 16
- Bytes in disk = 16*8,589,934,592 = 137,438,953,472 bytes
- Disk size = 2^37 bytes = 128GB
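The same arithmetic can be written as a small Python function, a sketch for checking similar exercises; the name `disk_capacity` and its parameters are illustrative, not taken from the slides.

```python
def disk_capacity(platters: int, sides_per_platter: int, tracks_per_surface: int,
                  sectors_per_track: int, bytes_per_sector: int) -> int:
    """Return the raw capacity of a disk in bytes."""
    surfaces = platters * sides_per_platter            # 8 double-sided platters -> 16 surfaces
    bytes_per_track = sectors_per_track * bytes_per_sector
    bytes_per_surface = tracks_per_surface * bytes_per_track
    return surfaces * bytes_per_surface


capacity = disk_capacity(platters=8, sides_per_platter=2, tracks_per_surface=2**14,
                         sectors_per_track=128, bytes_per_sector=4096)
print(capacity)            # 137438953472
print(capacity // 2**30)   # 128 (binary gigabytes)
```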



Cont..

Exercise: An HDD (hard disk drive) is labeled with the parameters given below. Determine the permissible sector size.
- 30GB
- 16383 Cylinders
- 16 Heads
- 224 Sectors per Track.



Cont..

The major components of disk access latency are:

- Seek Time: time taken for the read/write head to reach the proper track (cylinder). Typical seek time is 7 to 10 milliseconds.

- Rotational Latency (Delay): time taken for the sector containing the first desired block to rotate under the head. A typical disk completes one rotation in about 10 milliseconds, so the average rotational latency is about half a rotation.

- Transfer Time: time to transfer the data to memory.
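As a rough sketch, the three components can be added to estimate the time to read one block. The figures below (8 ms seek, 10 ms per rotation, about 100 MB/s transfer rate) are assumptions picked from the typical ranges above, and the average rotational delay is taken as half a rotation.

```python
def access_time_ms(seek_ms: float, rotation_ms: float, block_bytes: int,
                   transfer_bytes_per_ms: float) -> float:
    """Estimated time to read one block = seek + average rotational delay + transfer."""
    rotational_delay = rotation_ms / 2            # on average, half a rotation
    transfer = block_bytes / transfer_bytes_per_ms
    return seek_ms + rotational_delay + transfer


# Assumed figures: 8 ms seek, one rotation per 10 ms,
# 4096-byte block, 100,000 bytes per ms (about 100 MB/s).
t = access_time_ms(seek_ms=8, rotation_ms=10, block_bytes=4096, transfer_bytes_per_ms=100_000)
print(round(t, 3))   # 13.041
```

With these numbers the transfer time is tiny compared with the seek and rotational delay, which is why database systems try to minimize the number of block accesses.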



Data representation

Data is stored in the form of records, each consisting of a collection of related data items. The data items (values) form a sequence of bytes corresponding to particular fields. Data type representation:

- INTEGER – 4 Bytes

- FLOAT – 4 or 8 Bytes

- DATETIME – 8 Bytes



Data representation

- CHAR(n) – n Bytes; a pad character (┴) fills the unused bytes.

- VARCHAR(n) – at most n+1 Bytes; the extra byte holds the length (or an end-of-string marker), and unused bytes are ignored.

- Enumerated types – values are represented as integer codes using as many bytes as required.
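A minimal Python sketch of the CHAR(n) and VARCHAR(n) encodings described above, assuming ASCII text, a NUL byte as the pad character, and a one-byte length prefix for VARCHAR (so n is limited to 255 here); the function names are hypothetical.

```python
PAD = b"\x00"   # assumed pad byte standing in for the slide's pad character


def encode_char(value: str, n: int) -> bytes:
    """CHAR(n): always exactly n bytes; unused bytes are filled with the pad byte."""
    data = value.encode("ascii")
    if len(data) > n:
        raise ValueError(f"value longer than CHAR({n})")
    return data + PAD * (n - len(data))


def encode_varchar(value: str, n: int) -> bytes:
    """VARCHAR(n): one length byte followed by the actual characters (at most n + 1 bytes)."""
    data = value.encode("ascii")
    if len(data) > n or n > 255:
        raise ValueError(f"value does not fit VARCHAR({n})")
    return bytes([len(data)]) + data


print(len(encode_char("Abebe", 30)))   # 30
print(encode_varchar("Abebe", 50))     # b'\x05Abebe'
```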



Fixed Length Record Representation

Example: Consider the Employees table:

Employees(EmpId, Name, BDate, Address, Salary)

- EmpId – INTEGER – 4 Bytes

- Name – CHAR(30) – 30 Bytes

- BDate – DATETIME – 8 Bytes

- Address – VARCHAR(50) – 51 Bytes

- Salary – FLOAT – 4 Bytes


Cont..

- The fields are stored one after another, so the record takes 4 + 30 + 8 + 51 + 4 = 97 Bytes.

- The number of bytes at which a field begins is called the offset of the field.
- Thus the offset of EmpId is 0, Name is 4, BDate is 34, …
- On some machines, the offset of every field must be a multiple of 4; padding bytes are then inserted before fields as needed, and the record takes 100 Bytes.
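The offsets can be computed mechanically. This Python sketch assumes the fields are stored in declaration order with no record header; `layout` is a hypothetical helper, and `align=4` models the machines that require offsets to be multiples of 4.

```python
from typing import List, Tuple

# Field sizes of the Employees record: (field name, size in bytes)
EMPLOYEES = [("EmpId", 4), ("Name", 30), ("BDate", 8), ("Address", 51), ("Salary", 4)]


def layout(fields: List[Tuple[str, int]], align: int = 1) -> Tuple[List[int], int]:
    """Return each field's offset and the total record size.

    With align=4 every field must start at a multiple of 4, so padding
    bytes are inserted before a field when needed.
    """
    offsets, pos = [], 0
    for _, size in fields:
        pos = -(-pos // align) * align   # round pos up to a multiple of align
        offsets.append(pos)
        pos += size
    return offsets, pos


print(layout(EMPLOYEES))            # ([0, 4, 34, 42, 93], 97)
print(layout(EMPLOYEES, align=4))   # ([0, 4, 36, 44, 96], 100)
```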



File organization

File organization refers to the method of arranging the data of a file into records on external storage.
It refers to the logical relationship among the various records.
It provides the means of identifying and accessing any required data.

(Figure: CPU, main memory (M.M) and disk: a record is first searched for in main memory, then on the disk, and the record is taken to the CPU.)



File organization

Spanned: a single record can be placed across multiple blocks.
Unspanned: a single record must be placed in one block.
(Figure: 3-byte records stored on disk in blocks of 10 bytes.)

Disadvantage of spanned: accessing a record stored in two blocks is costly.


Unspanned: a single record is placed in one block.
Most databases use unspanned organization.
Disadvantage: wastage of space at the end of each block (usually not a problem).

(Figure: 3-byte records stored on disk in blocks of 10 bytes.)
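The trade-off in the figure can be quantified with the blocking factor (records per block). A Python sketch, assuming fixed-length records; the helper names are hypothetical.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """Records per block with unspanned organization (a record never crosses a block)."""
    return block_size // record_size


def blocks_needed(num_records: int, block_size: int, record_size: int, spanned: bool) -> int:
    if spanned:
        # Records are packed back to back and may continue in the next block.
        return math.ceil(num_records * record_size / block_size)
    return math.ceil(num_records / blocking_factor(block_size, record_size))


B, R = 10, 3                      # 10-byte blocks and 3-byte records, as in the figure
bfr = blocking_factor(B, R)       # 3 records per block
wasted = B - bfr * R              # 1 unused byte per block
print(bfr, wasted)                                # 3 1
print(blocks_needed(10, B, R, spanned=False))     # 4
print(blocks_needed(10, B, R, spanned=True))      # 3
```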



Cont..

Records can also be organized within a file in the following ways:

- Unordered (heap, random-order) files, also known as pile files, are suitable when the typical access is a file scan retrieving all records.
Insertion is very efficient: the last disk block of the file is copied into memory, the new record is added, and the block is rewritten to the disk.
Searching is expensive: the only search possible is a linear (exhaustive) search, block by block.

(Figure: inserting R2 into a heap file: R1 R3 … R6 becomes R1 R3 … R6 R2, i.e. the new record is appended at the end.)



Cont..

Deletion: the block containing the record to be deleted is located and fetched into memory, the record is deleted, and the block is rewritten to the disk. Because deletions leave unused space, the file requires periodic reorganization.
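A toy Python model of a heap file, assuming integer keys stand in for whole records and each block is a Python list with a fixed capacity; `HeapFile` is a hypothetical class, and the number of blocks read is counted to show why search is linear.

```python
from typing import List, Optional, Tuple


class HeapFile:
    """Unordered (heap) file: a list of fixed-capacity blocks of records."""

    def __init__(self, block_capacity: int):
        self.block_capacity = block_capacity
        self.blocks: List[List[int]] = [[]]

    def insert(self, record: int) -> None:
        last = self.blocks[-1]                 # fetch only the last block
        if len(last) == self.block_capacity:
            last = []                          # last block full: start a new one
            self.blocks.append(last)
        last.append(record)                    # add the record (block is rewritten)

    def search(self, record: int) -> Tuple[Optional[int], int]:
        """Linear search; returns (block number or None, blocks read)."""
        for i, block in enumerate(self.blocks):
            if record in block:
                return i, i + 1
        return None, len(self.blocks)

    def delete(self, record: int) -> bool:
        block_no, _ = self.search(record)      # locate and fetch the block
        if block_no is None:
            return False
        self.blocks[block_no].remove(record)   # leaves a gap in that block
        return True


f = HeapFile(block_capacity=3)
for r in [1, 3, 6, 2]:
    f.insert(r)
print(f.blocks)       # [[1, 3, 6], [2]]
print(f.search(2))    # (1, 2)
```

Because insertion only looks at the last block, space freed by a deletion in an earlier block is never reused, which is why heap files need periodic reorganization.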



Sorted files(order)

Sorted files are also known as sequential files: the records are kept in order on one of the attributes.
The records are physically ordered based on the value of the chosen ordering field.
Insertion is expensive: the proper location for the incoming record must be found and space has to be created (which may require moving data); only then can the record be added.

(Figure: inserting R2 into a sorted file: R1 R3 … R6 becomes R1 R2 R3 … R6, i.e. R2 is placed in key order.)



Cont..

- Searching is efficient: binary search can be used on the ordering key. Searching on any other field, however, requires a linear scan, as in a heap file.

- Deletion is expensive: like insertion, deletion may involve large data movement.

- Update: may require data reorganization if the updated field is the ordering key.
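A Python sketch of binary search on the ordering key, assuming each block is a non-empty sorted list of keys; it counts how many blocks are read, which grows roughly as log2 of the number of blocks instead of linearly.

```python
from typing import List, Optional, Tuple


def binary_search_blocks(blocks: List[List[int]], key: int) -> Tuple[Optional[int], int]:
    """Binary search over the blocks of a file sorted on `key`.

    Returns (number of the block holding the key, or None; blocks read).
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]              # one disk block access
        reads += 1
        if key < block[0]:
            hi = mid - 1                 # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1                 # key can only be in a later block
        else:
            return (mid if key in block else None), reads
    return None, reads


blocks = [[1, 3], [5, 7], [9, 11], [13, 15]]   # sorted file, 2 records per block
print(binary_search_blocks(blocks, 9))    # (2, 2)
print(binary_search_blocks(blocks, 8))    # (None, 2)
```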



Index

An index is a way of optimizing the performance of a database by minimizing the number of disk accesses required.
It is a data structure used to quickly locate and access the data in a database table.
- Structure of an index entry: Search key | Data pointer
Search key: contains a copy of the primary key or a candidate key of the table.
Data pointer: contains a set of pointers holding the disk address where a particular key value can be found.



Index

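(Figure: an index block whose entries pair a search key (SK) with a block pointer (BP), e.g. 2 → B1, 21 → B2, 23 → B3, …, 92 → B10, 101 → B11, 111 → B12, 121 → B13; each block pointer leads to a data block (Block1, Block2, …, Block11, Block12, …) holding the records.)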



Classification of index

Indexes are auxiliary access structures that are used to speed up the retrieval of records in response to a certain search condition.
There are two different kinds of indexes:
- Ordered Indexes: based on the sorted order of the values in a key field.
- Hash Indexes: based on a uniform distribution of values across a range of buckets, determined by a hash function.
- A bucket is simply one block on the disk.



Cont..

Ordered Indexes
A file with a record structure having several fields (or attributes) is often accessed through an index structure defined on a single field of the file, called the search key or indexing field.
A single file may have several index structures on various search keys.
If the file is physically ordered (sequentially organized) on the search key, the index is called a Primary Index or Clustering Index.
If the search key specifies an order different from the sequential order of the file, the index is called a Secondary Index or Non-clustering Index.
Primary Index

Index records (index entries) are stored in an index file separate from the data file; each consists of a search key value and pointers to one or more records.
The search key of a primary index is the ordering key of the data file (normally its primary key).
An index entry is created for the first record of each block, not for every record.



Cont..

There are two types of ordered indexes: dense indexes and sparse indexes.
Dense Index: has an index record for every search key value in the data file.
When the search key is unique, the number of entries in a dense index is equal to the number of records in the data file.

A) Dense index



Cont..

Sparse (Non-dense) Index: has an index entry only for the first record in each block, known as the anchor record of the block.
The number of entries in the index file is equal to the number of blocks in the data file.
NOTE: A single data file can have only one primary or clustering index, because the file can be physically ordered on only one field.
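The difference between dense and sparse indexes can be sketched in Python, assuming a data file sorted on the search key and represented as a list of blocks; the key values and helper names are hypothetical. A sparse lookup finds the last anchor that is not greater than the key and then searches only that block.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Index = List[Tuple[int, int]]   # (search key, block number), sorted by key


def build_dense_index(blocks: List[List[int]]) -> Index:
    """One entry for every search key value in the data file."""
    return [(key, b) for b, block in enumerate(blocks) for key in block]


def build_sparse_index(blocks: List[List[int]]) -> Index:
    """One entry per block: the block's anchor (first) record."""
    return [(block[0], b) for b, block in enumerate(blocks)]


def sparse_lookup(index: Index, blocks: List[List[int]], key: int) -> Optional[int]:
    """Find the last anchor <= key, then search inside that single block."""
    anchors = [k for k, _ in index]
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None                     # key is smaller than every anchor
    block_no = index[pos][1]
    return block_no if key in blocks[block_no] else None


# Data file sorted on the search key (hypothetical values), 5 blocks.
blocks = [[2, 5, 9], [21, 22], [23, 40, 92], [101, 105], [111, 115, 120]]
sparse = build_sparse_index(blocks)
print(len(build_dense_index(blocks)))      # 13 entries (one per record)
print(len(sparse))                         # 5 entries (one per block)
print(sparse_lookup(sparse, blocks, 120))  # 4
```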



Cont..

Example:

B) Sparse index for a primary index



Secondary Index
A secondary index provides a secondary way of accessing the data file. Since the data file is not ordered on the search key of the secondary index, block anchors cannot be used, so a secondary index cannot be sparse; it has to be dense.

Secondary index



Multilevel index

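Multilevel Indexes: when the data file has a large number of blocks, a single-level index itself becomes large, and searching it requires multiple disk accesses. A multilevel index treats the index as a file and builds an outer index on the inner index.

(Figure: two-level index on a dense primary index: outer index → inner index → data blocks.)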
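A rough Python sketch of how many levels a multilevel index needs, assuming each index block holds `fan_out` entries (the values 30,000 and 68 below are hypothetical). Each level indexes the blocks of the level below until a single top block remains.

```python
import math
from typing import List


def index_levels(first_level_entries: int, fan_out: int) -> List[int]:
    """Blocks at each index level, from the first (inner) level up to the single top block."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks        # the next level has one entry per block below it


print(index_levels(30_000, 68))   # [442, 7, 1]
```

With three index levels, a lookup reads one block per level plus one data block, i.e. 4 block accesses.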


N-way tree

- It has at most N children (N-way).
- It has at most N-1 keys.

E.g: N=3: the root [20 | 50] has 2 keys and 3 children, namely [10 | 15], [30 | 40] and [70 | 80].



Cont..

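(Figure: a 5-way node holds four keys K1, K2, K3, K4, five child pointers (CP) and four record pointers (RP).)

There are no guidelines to control insertion: keys can be inserted in any way, so the tree can become unbalanced.
Searching an unbalanced tree consumes a lot of time.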
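Searching an N-way search tree follows one path from the root: in each node, the keys bracket the search key and select one child. A Python sketch using the N = 3 example above; `MWayNode` and `mway_search` are hypothetical names, and record pointers are omitted.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class MWayNode:
    keys: List[int]                                            # at most N-1 sorted keys
    children: List["MWayNode"] = field(default_factory=list)  # at most N children; empty for a leaf


def mway_search(node: MWayNode, key: int) -> bool:
    """Pick the child between the two keys that bracket `key` until it is found or a leaf is reached."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False
        node = node.children[i]


# The N = 3 example: root [20 | 50] with children [10 | 15], [30 | 40], [70 | 80]
root = MWayNode([20, 50], [MWayNode([10, 15]), MWayNode([30, 40]), MWayNode([70, 80])])
print(mway_search(root, 40))   # True
print(mway_search(root, 45))   # False
```

The number of nodes visited equals the depth of the path, so an unbalanced tree built by arbitrary insertions can make searches slow.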



B-tree index file

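A B-tree is an N-way search tree with guidelines:

- Each non-leaf node other than the root has between ⌈n/2⌉ and n children, where n is fixed for a particular tree.
- The root has at least two children (unless it is a leaf).
- All leaves are at the same level.
- The tree is created bottom up.
- When a node overflows, it is split and its middle key moves up to the parent.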



B-tree index file

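Eg: N=4 (at most 3 keys per node); the keys 10, 20, 40, 50, 60, 70, 80, 30, 35, 5 and 15 are inserted in this order.

After 10, 20 and 40 the node [10 20 40] is full: there is no space to add 50, so the node is split and one more node is created.

Final tree:
root: [40]
second level: [15 30] and [70]
leaves: [5 10], [20], [35] under [15 30]; [50 60], [80] under [70]

Each node has block pointers and record pointers.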
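The insertion sequence above can be reproduced with a short Python sketch. It stores keys only (record pointers omitted) and assumes that when a node of order n overflows with n keys, the key at position n // 2 (the third of four keys for n = 4) moves up; textbooks differ on this choice, and this one yields the tree in the example.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)   # empty for a leaf


class BTree:
    """B-tree of order n: at most n children and n - 1 keys per node."""

    def __init__(self, n: int):
        self.n = n
        self.root = Node()

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:              # the root overflowed: tree grows one level
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node: Node, key: int) -> Optional[Tuple[int, Node]]:
        i = bisect_left(node.keys, key)
        if not node.children:              # leaf: insert the key in sorted position
            node.keys.insert(i, key)
        else:                              # internal node: insert into the proper child
            split = self._insert(node.children[i], key)
            if split is not None:          # child was split: adopt its middle key
                middle, right = split
                node.keys.insert(i, middle)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.n:        # still at most n - 1 keys
            return None
        return self._split(node)

    @staticmethod
    def _split(node: Node) -> Tuple[int, Node]:
        """Split an overflowing node; the key at index len // 2 moves up."""
        mid = len(node.keys) // 2
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right

    def levels(self) -> List[List[List[int]]]:
        """Keys of every node, level by level from the root."""
        result, frontier = [], [self.root]
        while frontier:
            result.append([node.keys for node in frontier])
            frontier = [child for node in frontier for child in node.children]
        return result


tree = BTree(n=4)
for k in [10, 20, 40, 50, 60, 70, 80, 30, 35, 5, 15]:
    tree.insert(k)
for level in tree.levels():
    print(level)
# [[40]]
# [[15, 30], [70]]
# [[5, 10], [20], [35], [50, 60], [80]]
```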



B-tree index file



B+-tree index file

The B+-tree index structure is a form of balanced tree in which every path from the root of the tree to a leaf of the tree has the same length.
Each non-leaf node in the tree (other than the root) has between ⌈n/2⌉ and n children, where n is fixed for a particular tree.
Non-leaf nodes do not have record pointers.
Only the leaf nodes have record pointers.
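A Python sketch of a B+-tree search, assuming a small hand-built tree with n = 3; the record pointers are hypothetical strings. Internal nodes carry only separator keys, so every search descends to a leaf, and all leaves are at the same depth.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    record_ptrs: List[str]          # one record pointer per key (hypothetical addresses)


@dataclass
class Internal:
    keys: List[int]                 # separator keys only: no record pointers here
    children: List[Union["Internal", Leaf]]


def bplus_search(node: Union[Internal, Leaf], key: int) -> Optional[str]:
    """Descend from the root to a leaf; only the leaf yields a record pointer."""
    while isinstance(node, Internal):
        # keys >= a separator belong to the subtree on its right
        node = node.children[bisect_right(node.keys, key)]
    for k, rp in zip(node.keys, node.record_ptrs):
        if k == key:
            return rp
    return None


def leaf(*keys: int) -> Leaf:
    return Leaf(list(keys), [f"rec{k}" for k in keys])


root = Internal([30], [
    Internal([20], [leaf(10, 15), leaf(20, 25)]),
    Internal([40], [leaf(30, 35), leaf(40, 50)]),
])
print(bplus_search(root, 25))   # rec25
print(bplus_search(root, 30))   # rec30 (30 is also a separator, but its pointer is in a leaf)
print(bplus_search(root, 36))   # None
```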



B+-tree index file

(Figure: record pointers (Rp) appear only in the leaf nodes.)



B+-tree index file

Example:

B+-Tree Index Structure with n=3


Hash Index

Hashing avoids the need to access an index structure, which may require additional disk accesses (I/O operations).
In a hash file organization, the block of a record is determined by computing a hash function on the search key.
A unit of storage that can store one or more records whose search keys give the same hash function result is referred to as a bucket.
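A Python sketch of static hashing, assuming integer search keys and the hash function h(K) = K mod M, where M is the number of buckets; the class name is hypothetical. Keys with the same hash result share a bucket.

```python
from typing import List


class StaticHashFile:
    """Hash file organization: the bucket of a record is computed from its key."""

    def __init__(self, num_buckets: int):
        self.buckets: List[List[int]] = [[] for _ in range(num_buckets)]

    def bucket_of(self, key: int) -> int:
        return key % len(self.buckets)        # assumed hash function h(K) = K mod M

    def insert(self, key: int) -> None:
        self.buckets[self.bucket_of(key)].append(key)

    def search(self, key: int) -> bool:
        # Only one bucket is examined: no index structure has to be read first.
        return key in self.buckets[self.bucket_of(key)]


f = StaticHashFile(num_buckets=3)
for k in [10, 11, 12, 22]:
    f.insert(k)
print(f.buckets)        # [[12], [10, 22], [11]]
print(f.search(22))     # True
```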



Cont..

The hash function takes the search key and distributes the records uniformly and randomly across the buckets.
- Uniform distribution: the hash function assigns each bucket the same number of search key values from the set of all possible search key values.
- Reading assignment:
  - A sequential search on the dense primary index search key.
  - A sequential search on the sparse primary index search key.
  - A binary search on a dense primary index search key.
  - A binary search on a sparse primary index search key.
  - Hash index.



Query processing and optimization

Query Processing refers to the range of activities involved in


extracting data from a database.
The basic steps in query processing are:
- Parsing and Translation
- Optimization
- Evaluation



Cont..

Parser and Translator: the parser identifies the language tokens in the text query, such as:
- SQL keywords,
- attribute names, and
- relation names,
and checks the query syntax.
The translator then translates the query blocks from the query data structure into relational algebra expressions.
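A toy Python sketch of the two steps for a single SELECT-FROM-WHERE block: a regular-expression tokenizer followed by a translation into relational algebra (π for projection, σ for selection, × for the product of relations). It works only under strong assumptions (no string literals, explicit joins or nested queries) and is not how a real DBMS parser is built.

```python
import re
from typing import List


def tokenize(query: str) -> List[str]:
    """Split a query into keywords/identifiers, numbers, operators, commas and '*'."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+(?:\.\d+)?|<=|>=|<>|[=<>,*]", query)


def translate(query: str) -> str:
    """Translate a SELECT-FROM[-WHERE] block into a relational algebra string."""
    tokens = tokenize(query)
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise SyntaxError("expected SELECT ... FROM ...")
    f = upper.index("FROM")
    w = upper.index("WHERE") if "WHERE" in upper else len(tokens)
    attributes = [t for t in tokens[1:f] if t != ","]
    relations = [t for t in tokens[f + 1:w] if t != ","]
    if not attributes or not relations:
        raise SyntaxError("missing attribute list or relation name")
    expr = " × ".join(relations)                           # product of the relations
    if w < len(tokens):
        expr = f"σ[{' '.join(tokens[w + 1:])}]({expr})"    # selection
    if attributes != ["*"]:
        expr = f"π[{', '.join(attributes)}]({expr})"       # projection
    return expr


print(translate("SELECT Name FROM employee WHERE Salary > 5000"))
# π[Name](σ[Salary > 5000](employee))
```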



Cont..

Optimizer: optimizes the relational algebra expression using various algorithms for the query blocks and produces an evaluation plan for execution.
The optimizer estimates the cost of the alternative operations in order to select the evaluation plan with the lowest cost.
Evaluation Engine: also known as the Query Execution Engine, it takes a query evaluation plan from the optimizer, executes the plan, and returns the answer to the query.
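Cost-based selection can be sketched as estimating the block accesses of each alternative plan and picking the cheapest. The Python sketch below assumes an equality search, worst-case costs, and hypothetical plan names.

```python
import math
from typing import Dict, Optional


def plan_costs(num_blocks: int, sorted_on_key: bool,
               index_levels: Optional[int] = None) -> Dict[str, int]:
    """Estimated block accesses for an equality search, per alternative plan."""
    costs = {"linear scan": num_blocks}
    if sorted_on_key:
        costs["binary search"] = math.ceil(math.log2(num_blocks))
    if index_levels is not None:
        costs["index lookup"] = index_levels + 1   # one block per index level + the data block
    return costs


def choose_plan(costs: Dict[str, int]) -> str:
    return min(costs, key=costs.get)


costs = plan_costs(num_blocks=1000, sorted_on_key=True, index_levels=3)
print(costs)               # {'linear scan': 1000, 'binary search': 10, 'index lookup': 4}
print(choose_plan(costs))  # index lookup
```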



Query processing and optimization

Steps in Query Processing


Chapter seven
integrity and security

School of ECE



Fundamental concepts of integrity and security

- The two fundamental concepts that need to be considered while designing database systems are:
  - Maintaining the consistency of the database under all changes, and
  - Protecting the database from unauthorized users.

- Integrity constraints ensure that the changes made to the database by authorized users do not result in a loss of data consistency.



Types of Constraints

♥ Key Constraints (Entity Integrity)


♥ Foreign Key Constraints (Referential Integrity)
♥ Domain Constraints (Domain Integrity)
♥ General Constraints (User Defined Integrity)



Domain constraint

♦ A domain constraint: every attribute value is bound to a specific range (domain) of values.

E.g.: a name should be of type CHAR or VARCHAR, not INTEGER.

Age cannot have a negative value (less than zero).

A phone number should contain only the digits 0-9, etc.



Cont..

♦ Syntax: CREATE DOMAIN <domain_name> <data_type>

CONSTRAINT <constraint_name> CHECK (<condition>);

♦ Example: the salary of an employee is a numeric field with two decimal places in the range 150 to 6000:

CREATE DOMAIN BasicSalary NUMERIC(9, 2)

CONSTRAINT SalaryRange CHECK (VALUE >= 150.00 AND VALUE <= 6000.00);
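SQLite, used here through Python's standard `sqlite3` module, has no CREATE DOMAIN, so this sketch attaches the same CHECK condition directly to the column; the table layout and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The domain's CHECK condition is attached to the Salary column instead.
conn.execute("""
    CREATE TABLE employee (
        Eid    INTEGER PRIMARY KEY,
        Salary NUMERIC(9, 2)
               CONSTRAINT SalaryRange CHECK (Salary >= 150.00 AND Salary <= 6000.00)
    )
""")


def try_insert(eid: int, salary: float) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (eid, salary))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 2500.00))   # True
print(try_insert(2, 9000.00))   # False: outside 150 to 6000
```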



General Constraint (A User Defined Constraint)

♦ A general constraint is an assertion defined by the user's requirements; it is a means of specifying an integrity constraint.
- The syntax for a general assertion is: CREATE ASSERTION <assertion_name> CHECK (<predicate>)
- The <predicate> is a valid conditional expression similar to the <condition> in the WHERE clause of the SELECT-FROM-WHERE statement.
- When an assertion is created, the system tests the predicate for validity; if the assertion is valid, any future modification to the database is allowed only if it does not violate the assertion.
General Constraint (A User Defined Constraint)

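♦ Example: no employee should have a salary greater than that of his or her manager (assuming the employee table has the columns Eid, Salary and ManagerId):

CREATE ASSERTION salarylessthanmanager CHECK (NOT EXISTS
(SELECT * FROM employee E, employee M
WHERE E.ManagerId = M.Eid AND E.Salary > M.Salary));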
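SQLite does not support CREATE ASSERTION. This Python sketch therefore evaluates the assertion's predicate after each modification and rolls the change back if it fails, mimicking how a DBMS would reject a violating update; the `employee` columns and the helper names are assumptions.

```python
import sqlite3

# The assertion's predicate, evaluated as an ordinary query.
PREDICATE = """
    SELECT NOT EXISTS (
        SELECT * FROM employee E JOIN employee M ON E.ManagerId = M.Eid
        WHERE E.Salary > M.Salary
    )
"""


def assertion_holds(conn: sqlite3.Connection) -> bool:
    return bool(conn.execute(PREDICATE).fetchone()[0])


def guarded_execute(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Apply a modification only if the assertion still holds afterwards."""
    conn.execute(sql, params)          # opens a transaction implicitly
    if not assertion_holds(conn):
        conn.rollback()                # undo the violating change
        return False
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Eid INTEGER PRIMARY KEY, Salary NUMERIC, ManagerId INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, 6000, None), (2, 4000, 1), (3, 3500, 1)])
conn.commit()

print(guarded_execute(conn, "UPDATE employee SET Salary = ? WHERE Eid = ?", (7000, 3)))  # False
print(guarded_execute(conn, "UPDATE employee SET Salary = ? WHERE Eid = ?", (5000, 3)))  # True
```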



Triggers

Triggers are executed automatically in response to database object, database, and server events.
Triggers need to specify:

♣ The event that will cause or initiate the trigger execution.

♣ The condition that must hold for the trigger execution to proceed.

♣ The action to be taken in response.



Cont..

The trigger action may notify the respective administrators (e.g. through email) so that they can take action, or it may execute some operation in response.
The trigger events are INSERT, DELETE and UPDATE.
The trigger actions may be taken:
- After successful completion of the operation (event): AFTER
- Before the execution of the operation (event): BEFORE, or in place of the operation: INSTEAD OF



Cont..

The syntax for the trigger statement is:

CREATE TRIGGER <trigger_name>
ON {<table> | <view>}
{FOR | AFTER | INSTEAD OF} {[INSERT] [,] [UPDATE] [,] [DELETE]}
AS <SQL_Statement>



Cont..
CREATE TRIGGER trigger_update ON employee
FOR UPDATE AS
BEGIN
    SELECT * FROM deleted;    -- row values before the update
    SELECT * FROM inserted;   -- row values after the update
END
GO
UPDATE employee SET salary = salary + 2000
WHERE Eid = 23;
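For comparison, a sketch of a similar trigger in SQLite through Python's `sqlite3` module. SQLite refers to the old and new row values as `OLD` and `NEW` rather than SQL Server's `deleted` and `inserted` tables; the `salary_log` table is hypothetical, and the WHEN clause plays the role of the trigger condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (Eid INTEGER PRIMARY KEY, Salary NUMERIC);
    CREATE TABLE salary_log (Eid INTEGER, OldSalary NUMERIC, NewSalary NUMERIC);

    -- Event: UPDATE of Salary; condition: the value really changed;
    -- action: record the old and new values.
    CREATE TRIGGER trigger_update AFTER UPDATE OF Salary ON employee
    FOR EACH ROW
    WHEN NEW.Salary <> OLD.Salary
    BEGIN
        INSERT INTO salary_log VALUES (OLD.Eid, OLD.Salary, NEW.Salary);
    END;

    INSERT INTO employee VALUES (23, 5000);
""")

conn.execute("UPDATE employee SET Salary = Salary + 2000 WHERE Eid = 23")
print(conn.execute("SELECT * FROM salary_log").fetchall())   # [(23, 5000, 7000)]
```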



Security and Authorization

Database security refers to protection of the database from

malicious access such as:

♫ Unauthorized reading of data,


♫ Unauthorized modification of data, and
♫ Unauthorized destruction of data



Cont..

Some of the threats to the database because of malicious access are:


- Loss of integrity,
- Loss of availability,
- Loss of confidentiality

Security measure levels:

- Database System level
- Operating System level
- Network level, etc.



Cont..

Database system security can be implemented with the use of:

♫ Account and role creation,

♫ Privilege granting,
♫ Privilege revocation, and

♫ Security level assignment.

Authorization in a database system can be set at the following broad levels:
Data Level Authorization – Read, Insert, Update, Delete



Cont..

Schema Level Authorization:

♫ Index

♫ Resource

♫ Alter

♫ Drop



Cont..

The syntax for giving and removing privileges is as follows:

♥ GRANT <privileges> ON table_name [(column_list)] TO user_name;

♥ REVOKE <privileges> ON table_name [(column_list)] FROM user_name;

♥ DENY <privileges> ON table_name [(column_list)] TO user_name;

where <privileges> is a comma-separated list of SELECT, UPDATE, INSERT and DELETE, or simply ALL.



Encryption and Authentication

Encryption is the transformation of an intelligible message (plain text) into an unintelligible message (cipher text).



Cont..

Decryption is the reverse process of encryption, in which the cipher text is translated back into plain text.



Cont..

Cryptography is the art or science concerning the principles, means,


and methods for rendering plain information unintelligible, and for
restoring the encrypted information to intelligible form.
Modern cryptographic systems can be broadly classified into:
- Symmetric-key systems, which use a single key that both the sender and the recipient have.
- Asymmetric-key systems, also known as public-key systems, which use two keys: a public key known to everyone and a private key that only the recipient of messages uses.
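A toy illustration of the symmetric-key idea in Python: the same key both encrypts and decrypts. Repeating-key XOR stands in for a real cipher such as AES (which is not part of Python's standard library), and it is NOT secure; the key and message are hypothetical.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the repeating key.

    The same function and the same key both encrypt and decrypt.
    NOT secure; for illustration only.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"shared-secret"                   # known to both sender and recipient
plain = b"Salary of Eid 23 is 7000"
cipher = xor_cipher(plain, key)          # encryption: plain text -> cipher text
print(cipher != plain)                   # True
print(xor_cipher(cipher, key))           # b'Salary of Eid 23 is 7000'
```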



Authentication

Authentication is the process of verifying that a user is who he or she claims to be.

There are two ways of authenticating a user:

- Use of a password: the user is asked for a user name and password upon logging in to the system.



Cont..

- Challenge-response: upon a login request, the system sends a challenge string to the user; the user encrypts the challenge (for example, using the password as the key) and sends the encrypted message back to the system.

The system decrypts the received message and verifies the user by comparing it with the challenge string it originally sent.
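A Python sketch of both methods, using only the standard library. Two substitutions are assumptions: the password is stored as a salted PBKDF2 hash rather than as plain text, and the challenge-response uses an HMAC keyed with a shared secret in place of encrypting and decrypting the challenge (the system recomputes the expected response instead of decrypting). All names and secrets are hypothetical.

```python
import hashlib
import hmac
import os
import secrets
from typing import Tuple

ITERATIONS = 100_000   # assumed work factor for password hashing


# Password authentication
def register(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, hash) to store instead of the plain-text password."""
    salt = os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def password_login(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the entered password and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, stored)


# Challenge-response
def make_challenge() -> bytes:
    return secrets.token_bytes(16)           # fresh random challenge for every login


def respond(challenge: bytes, secret: bytes) -> bytes:
    """User side: transform the challenge with the shared secret (HMAC, not encryption)."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(challenge: bytes, response: bytes, secret: bytes) -> bool:
    """System side: recompute the expected response and compare."""
    return hmac.compare_digest(respond(challenge, secret), response)


salt, stored = register("s3cret!")
print(password_login("s3cret!", salt, stored))   # True
print(password_login("guess", salt, stored))     # False

shared = b"hypothetical-shared-secret"
c = make_challenge()
print(verify(c, respond(c, shared), shared))            # True
print(verify(c, respond(c, b"wrong secret"), shared))   # False
```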



Chapter Eight
Introduction to Distributed and Parallel Databases



Database performance

Basic performance measures of a database:

- Throughput: the number of queries (transactions) that can be completed in a given time interval.
- Response time: the amount of time it takes to complete a single task from the time it is submitted.
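The two measures can be computed from (submission time, completion time) pairs. A Python sketch with hypothetical timings:

```python
from typing import List, Tuple


def throughput(jobs: List[Tuple[float, float]], interval: float) -> float:
    """Transactions completed per second within the interval [0, interval]."""
    return sum(1 for _, done in jobs if done <= interval) / interval


def avg_response_time(jobs: List[Tuple[float, float]]) -> float:
    """Mean time from submission to completion."""
    return sum(done - submitted for submitted, done in jobs) / len(jobs)


# Hypothetical (submitted, completed) times in seconds for five transactions
jobs = [(0.0, 0.4), (0.1, 0.5), (0.2, 0.9), (0.5, 1.5), (1.0, 2.5)]
print(throughput(jobs, interval=2.0))   # 2.0 (4 of 5 finished within 2 s)
print(avg_response_time(jobs))          # about 0.8
```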



Distributed database

A distributed database system allows applications to access data from local and remote databases.
Distributed databases include the following features:
- Location independence.
- Distributed query processing.
- Distributed transaction management.
- Hardware independence.
Cont..

- Operating system independence.

- Network independence.

- Transaction transparency.

- DBMS independence.

Data is physically stored across several sites (locations).
Cont..

Each site (location) is managed by an independent DBMS.
Multiple sites work on a database that is distributed across the network.
Sites are connected through a WAN (e.g. the Internet).
No shared resource management is required.
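A toy Python sketch of distributed query processing over horizontally fragmented data: each hypothetical site holds some rows of an employee table, the query runs locally at every site, and the partial results are merged, so the caller does not need to know where rows are stored. Network communication and distributed transaction management are left out.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

# Hypothetical horizontal fragments of an employee table, one per site.
SITES: Dict[str, List[Row]] = {
    "site_A": [{"Eid": 1, "Salary": 6000}, {"Eid": 2, "Salary": 4000}],
    "site_B": [{"Eid": 3, "Salary": 3500}],
    "site_C": [{"Eid": 4, "Salary": 5200}],
}


def local_query(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Runs at one site against its local data only."""
    return [r for r in rows if predicate(r)]


def distributed_query(sites: Dict[str, List[Row]], predicate: Callable[[Row], bool]) -> List[Row]:
    """Send the query to every site and merge the partial results."""
    result: List[Row] = []
    for rows in sites.values():          # in a real system these would be network calls
        result.extend(local_query(rows, predicate))
    return sorted(result, key=lambda r: r["Eid"])


print(distributed_query(SITES, lambda r: r["Salary"] > 5000))
# [{'Eid': 1, 'Salary': 6000}, {'Eid': 4, 'Salary': 5200}]
```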



Cont..



Parallel database

A parallel database system exploits multiprocessing to


improve performance.
Parallel database architectures can be broadly classified
into three categories:
- Shared memory,
- Shared disk, and
- Shared nothing.



Parallel database

Machines are physically close to each other, e.g. in the same server room.
Multiple processors handle the database.
The database is shared and partitioned across disks.
Multiple processors execute the database operations in parallel.
Nodes are connected through a LAN, so communication is fast.
Supports shared resource management.
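A Python sketch of a parallel partitioned scan: records are partitioned round-robin across hypothetical "disks", each partition is scanned by its own worker, and the partial results are combined. Threads only illustrate the structure (CPython's global interpreter lock limits CPU-bound parallelism); a `ProcessPoolExecutor` would give real multiprocessing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(records: List[int], num_disks: int) -> List[List[int]]:
    """Round-robin partitioning: record i goes to disk i mod num_disks."""
    return [records[d::num_disks] for d in range(num_disks)]


def scan_partition(part: List[int], threshold: int) -> int:
    """Work done by one processor: count matching records in its partition."""
    return sum(1 for r in part if r > threshold)


def parallel_count(records: List[int], num_disks: int, threshold: int) -> int:
    parts = partition(records, num_disks)
    with ThreadPoolExecutor(max_workers=num_disks) as pool:
        partial_counts = list(pool.map(scan_partition, parts, [threshold] * num_disks))
    return sum(partial_counts)          # combine the partial results


salaries = [1200, 5400, 3300, 7000, 2500, 6100, 4800, 900]
print(partition(salaries, 3))   # [[1200, 7000, 4800], [5400, 2500, 900], [3300, 6100]]
print(parallel_count(salaries, num_disks=3, threshold=4000))   # 4
```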


End of class. I wish you good luck for everything!!
