
Chapter 4 - Update Operations, Update Anomalies, and Normalization

Update Operations
Update operation terminology note: in practice there are two different uses of the term “update
operation”:

A) Update operation as a collective term for insert, delete and modify operations

B) Update operation as a synonym for the modify operation

In this chapter we will use the term update operation as defined in A)

Insert operation - Used for entering new data in the relation

Delete operation - Used for removing data from the relation

Modify operation - Used for changing the existing data in the relation

Update Anomalies
Anomalies in relations that contain redundant (unnecessarily repeating) data, caused by update
operations

• Insertion anomaly - occurs when inserting data about one real-world entity requires inserting
data about another real-world entity
• Deletion anomaly - occurs when deletion of data about a real-world entity forces deletion of
data about another real-world entity
• Modification anomaly - occurs when, in order to modify one real-world value, the same
modification has to be made multiple times
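
For illustration, consider a hypothetical relation CLIENT (ClientID, ClientName, AcctMgrID, AcctMgrName), in which the account manager's data repeats in every row for that manager's clients:

• Inserting a new account manager who has no clients yet requires inserting client data as well - an insertion anomaly
• Deleting a manager's last client also deletes all data about that manager - a deletion anomaly
• Renaming a manager requires changing AcctMgrName in every one of that manager's rows - a modification anomaly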

Functional Dependencies
Functional dependency - occurs when the value of one (or more) column(s) in each record of a relation
uniquely determines the value of another column in that same record of the relation

For example

A→B

ClientID → ClientName

Two functional dependency notations exist: a dependency can be written out as text (for example, ClientID → ClientName) or depicted graphically, as an arrow drawn from the determining column(s) to the determined column.

Streamlining functional dependencies

Not all functional dependencies need to be depicted. The following types of functional dependencies can be omitted:

Trivial functional dependencies - occur when an attribute (or a set of attributes) functionally determines itself or its subset. Trivial functional dependencies are not depicted. For example:

A → A
A, B → A, B
A, B → A
CampaignMgrID, CampaignMgrName → CampaignMgrName

Augmented functional dependencies - a functional dependency that contains an existing functional dependency. It does not add new information to what is already described by the existing functional dependency, so it can be omitted. For example, if the functional dependency A → B exists in a relation, then A, C → B is an augmented functional dependency.

Equivalent functional dependencies - occur when two columns (or sets of columns) that functionally determine each other determine other columns. If one of the equivalent functional dependencies is depicted, the other can be omitted. For example, if the functional dependencies A → B and B → A exist in a relation, then:

A → B and B → A are equivalent functional dependencies
A → B, X and B → A, X are equivalent functional dependencies
Y, A → B, X and Y, B → A, X are equivalent functional dependencies
Types of functional dependencies
The functional dependencies that are used as a basis for the typical normalization process can be
classified in one of the three categories:
Partial functional dependency - occurs when a column of a relation is functionally dependent on a
component of a composite primary key

• Only composite primary keys have separate components, while single-column primary keys do
not have separate components
• Hence, partial functional dependency can occur only in cases when a relation has a composite
primary key

Full key functional dependency - occurs when a primary key functionally determines the column of a
relation and no separate component of the primary key partially determines the same column

• If a relation has a single component (non-composite) primary key, the primary key fully
functionally determines all the other columns of a relation
• If a relation has a composite key, and portions of the key partially determine columns of a
relation, then the primary key does not fully functionally determine the partially determined
columns

Transitive functional dependency - occurs when nonkey columns functionally determine other nonkey
columns of a relation

A nonkey column is a column in a relation that is neither a primary key nor a candidate key column

Normalization
A process used to improve the design of relational databases

The normalization process involves examining each table and verifying if it satisfies a particular normal
form

If a table satisfies a particular normal form, then the next step is to verify if that relation satisfies the next
higher normal form

If a table does not satisfy a particular normal form, actions are taken to convert the table into a set of
tables that satisfy the particular normal form

Normal form - term representing a set of particular conditions (whose purpose is reducing data
redundancy) that a table has to satisfy

From a lower to a higher normal form, these conditions are increasingly stricter and leave less possibility
for redundant data

There are several normal forms, most fundamental of which are: First normal form (1NF), Second normal
form (2NF) and Third normal form (3NF).

First Normal Form (1NF) - A table is in 1NF if each row is unique and no column in any row contains
multiple values

• 1NF states that each value in each column of a table must be a single value from the domain of
the column
• Every relational table is, by definition, in 1NF
• Related multivalued columns - columns in a table that refer to the same real-world concept
(entity) and can have multiple values per record
• Normalizing to 1NF involves eliminating groups of related multi-valued columns

Second Normal Form (2NF) - A table is in 2NF if it is in 1NF and if it does not contain partial functional
dependencies

• If a relation has a single-column primary key, then there is no possibility of partial functional
dependencies
• Such a relation is automatically in 2NF and it does not have to be normalized to 2NF
• If a relation with a composite primary key has partial dependencies, then it is not in 2NF and has to be normalized to 2NF
• Normalization of a relation to 2NF creates additional relations for each set of partial
dependencies in a relation
• The primary key of the additional relation is the portion of the primary key that functionally
determines the columns in the original relation
• The columns that were partially determined in the original relation are part of the additional
table
• The original table remains after the process of normalizing to 2NF, but it no longer contains the
partially dependent columns
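
As a hypothetical illustration, consider ORDERLINE (OrderID, ProductID, Quantity, ProductName) with the composite primary key (OrderID, ProductID). The partial dependency ProductID → ProductName violates 2NF, so the relation is decomposed into:

ORDERLINE (OrderID, ProductID, Quantity)
PRODUCT (ProductID, ProductName)

The new PRODUCT relation takes as its primary key the portion of the old key (ProductID) that determined the partially dependent column.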

Third Normal Form (3NF) - A table is in 3NF if it is in 2NF and if it does not contain transitive functional
dependencies

• Normalization of a relation to 3NF creates additional relations for each set of transitive
dependencies in a relation.
o The primary key of the additional relation is the nonkey column (or columns) that
functionally determined the nonkey columns in the original relation
o The nonkey columns that were transitively determined in the original relation are part of
the additional table.
• The original table remains after normalizing to 3NF, but it no longer contains the transitively
dependent columns
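
As a hypothetical illustration, consider EMPLOYEE (EmpID, EmpName, DeptID, DeptName), where the nonkey column DeptID functionally determines the nonkey column DeptName - a transitive dependency. Normalizing to 3NF decomposes it into:

EMPLOYEE (EmpID, EmpName, DeptID)
DEPARTMENT (DeptID, DeptName)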

Normalization Exceptions
In general, database relations are normalized to 3NF in order to eliminate unnecessary data redundancy
and avoid update anomalies

However, normalization to 3NF should be done judiciously and pragmatically, which may in some cases
call for deliberately not normalizing certain relations to 3NF

Denormalization
Reversing the effect of normalization by joining normalized relations into a relation that is not
normalized, in order to improve query performance

The data that resided in fewer relations prior to normalization is spread out across more relations after
normalization

This has an effect on the performance of data retrievals


Denormalization can be used in dealing with the normalization vs. performance issue

Denormalization is not a default process that is to be undertaken in all circumstances

Instead, denormalization should be used judiciously, after analyzing its costs and benefits

ER-Modeling versus Normalization


ER modeling followed by mapping into a relational schema is one of the most common database design
methods

When faced with a non-normalized table, instead of identifying functional dependencies and going
through normalization to 2NF and 3NF, a designer can analyze the table and create an ER diagram based
on it (and subsequently map it into a relational schema)

Additional Streamlining of Database Content


Designer-added entities (tables) and keys

• Even if a relation is in 3NF additional opportunities for streamlining database content may still
exist
• Designer-added entities (tables) and designer-added keys can be used for additional streamlining
• Augmenting databases with designer added tables and keys is not a default process that is to be
undertaken in all circumstances
• Instead, augmenting databases with designer added tables and keys should be done judiciously,
after analyzing pros and cons for each augmentation

Chapter 5 – SQL
SQL - Structured Query Language

SQL is used for:

• Creating databases
• Adding, modifying, and deleting database structures
• Inserting, deleting, and modifying records in databases
• Querying databases (data retrieval)

SQL became the standard language for creating relational databases and querying the data they contain.

It can be used (with minor dialectical variations) with most relational DBMS software tools.

RDBMS packages include MySQL, Microsoft SQL Server, and PostgreSQL.

A semicolon follows the end of each SQL statement and indicates the end of the SQL command.

SQL keywords, table names, and column names in SQL commands are not case sensitive.

Though usually broken into multiple lines for readability, an SQL statement can be written as one long sentence in a single line of text.
SQL Command Categories
Data Definition Language (DDL) - Used to create and modify the structure of the database. Ex: CREATE,
ALTER, DROP

Data Manipulation Language (DML) - Used to insert, modify, delete, and retrieve data. Ex: INSERT INTO,
UPDATE, DELETE, SELECT

Data Control Language (DCL) - Used for data access control.

Transaction Control Language (TCL) - Used for managing database transactions.

SQL data types


Each column of each SQL created relation has a specified data type

Commonly used SQL data types:

• CHAR(n) – fixed length n-character string


• VARCHAR(n) – variable length character string with a maximum size of n characters
• INT – integer
• NUMERIC(x, y) – number with x digits, y of which are after the decimal point
• DATE – date values (year, month, day)

SQL Commands

CREATE TABLE - Used for creating and connecting relational tables

DROP TABLE - Used to remove a table from the database

INSERT INTO - Used to populate the created relations with data

SELECT - Used for the retrieval of data from the database relations; the most commonly issued SQL statement. Basic form:

SELECT <columns>
FROM <table>

In addition to displaying columns, the SELECT clause can be used to display derived attributes (calculated columns) represented as expressions:

SELECT <columns, expressions>
FROM <table>

The SELECT FROM statement can contain other optional keywords, such as WHERE, GROUP BY, HAVING, and ORDER BY, appearing in this order:

SELECT <columns, expressions>
FROM <tables>
WHERE <row selection condition>
GROUP BY <grouping columns>
HAVING <group selection condition>
ORDER BY <sorting columns, expressions>

DISTINCT - Used in conjunction with the SELECT statement; eliminates duplicate values from a query result

ORDER BY - Used to sort the results of the query by one or more columns (or expressions)

LIKE - Used for retrieval of records whose values partially match a certain criterion

GROUP BY - Enables summarizations across the groups of related data within tables and enables the use of aggregate functions on values in a table

HAVING - Determines which groups will be displayed in the result of a query and, consequently, which groups will not be displayed. A query that contains a HAVING clause must also contain a GROUP BY clause

IN - Used for comparison of a value with a set of values

JOIN - Facilitates the querying of multiple tables

• Self-join - a join statement that includes a relation containing a foreign key referring to itself, and joins the relation with itself in a query
• Inner join - a regular JOIN joins records from two tables where the value of a specified column in a record of one table matches the value of a specified column in another (or the same) table; a regular JOIN is also sometimes referred to as an INNER JOIN, to differentiate it from the OUTER JOIN
• Outer join - a variation of the JOIN operation that supplements the results with the records from one relation that have no match in the other relation; its variants are LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN

ALTER TABLE tableName (ADD/DROP) - Used to change the structure of a relation once the relation is already created; would be used, for example, to drop a column from a table

UPDATE - Used to modify the data stored in database relations (change a value in a record of a table)

DELETE - Used to delete the data stored in database relations

WHERE - Determines which rows should be retrieved and, consequently, which rows should not be retrieved. The logical condition determining which records to retrieve can use the following comparison operators:

= Equal to
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
!= Not equal to
<> Not equal to (alternative notation)

VIEW - A mechanism in SQL that allows the structure of a query to be saved in the RDBMS; also known as a virtual table. A view is not an actual table and does not have any data physically saved. Every time a view is invoked, it executes a query that retrieves the data from the actual tables. A view can be used in SELECT statements just like any other table from a database

IS NULL - Used in queries that contain comparisons with an empty value in a column of a record

EXISTS - In queries where the inner query (nested query) uses columns from the relations listed in the SELECT part of the outer query, the inner query is referred to as a correlated subquery. In such cases, the EXISTS operator can be used to check whether the result of the inner correlated query is empty

NOT - Can be used in conjunction with condition comparison statements returning the Boolean values TRUE or FALSE

COUNT - When counting records in a column, does not count records whose value in that column is NULL
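
A minimal sketch tying several of these commands together, using hypothetical CUSTOMER and ORDERS tables (all table, column, and view names below are assumptions for illustration):

SELECT c.custname, o.orderdate, o.total
FROM customer c JOIN orders o ON c.custid = o.custid
WHERE o.total > 100
ORDER BY o.total DESC;

CREATE VIEW big_orders AS
SELECT custid, orderdate, total
FROM orders
WHERE total > 100;

The view saves only the query structure; each time big_orders is queried, the data is retrieved from the ORDERS table.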
Aggregate Functions
For calculating and summarizing values in queries, SQL provides the following aggregate functions:

• COUNT
• SUM
• AVG
• MIN
• MAX
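
A sketch of aggregate functions combined with GROUP BY and HAVING, again on the hypothetical ORDERS table:

SELECT custid, COUNT(*) AS numorders, SUM(total) AS totalspent
FROM orders
GROUP BY custid
HAVING COUNT(*) > 2;

GROUP BY produces one result row per customer; HAVING then keeps only the groups with more than two orders.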

Nested Queries
A query that is used within another query; that is, a query may contain another query (or queries)

A nested query is also referred to as an inner query

The query that uses the nested query is referred to as an outer query
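
A sketch of a nested query with IN, and of a correlated subquery with EXISTS, on the same hypothetical tables:

SELECT custname
FROM customer
WHERE custid IN (SELECT custid FROM orders);

SELECT custname
FROM customer c
WHERE EXISTS (SELECT * FROM orders o WHERE o.custid = c.custid);

In the second query the inner query refers to the outer query's CUSTOMER row, which makes it a correlated subquery.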

Alias
An alternative and usually shorter name that can be used anywhere within a query instead of the full
relation name

Joining Multiple Relations


A query can contain multiple JOIN conditions, joining multiple relations

Inserting From a Query


A query retrieving the data from one relation can be used to populate another relation
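
A sketch, assuming a hypothetical TOPCUSTOMER table that already exists with compatible columns:

INSERT INTO topcustomer (custid, totalspent)
SELECT custid, SUM(total)
FROM orders
GROUP BY custid
HAVING SUM(total) > 1000;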

Inappropriate use of Observed Values in SQL


A common beginner's SQL mistake occurs when a novice user creates a simplistic query that produces the correct result only by inappropriately using values observed in the current data, as illustrated below.
SQL standard and SQL syntax differences


Minor SQL syntax differences exist in SQL implementations in various popular RDBMS packages, such as
differences in:

• DATE and TIME data types


• FOREIGN KEY syntax
• Usage of AS keyword with aliases
• ALTER TABLE syntax
• Set operators
• FULL OUTER JOIN implementation
• Constraint management
• GROUP BY restrictions

The differences are minor, and a user of SQL in one RDBMS is able to switch to another RDBMS with very little additional learning effort.

Set Operators
Standard set operators: union, intersection, and difference
Used to combine the results of two or more SELECT statements that are union compatible

Two sets of columns are union compatible if they contain the same number of columns, and if the data
types of the columns in one set match the data types of the columns in the other set

The first column in one set has a compatible data type with the data type of the first column in the other
set, the second column in one set has a compatible data type with the data type of the second column in
the other set, and so on.

The set operators can combine results from SELECT statements querying relations, views, or other
SELECT queries.

UNION - Used to combine the union compatible results of two SELECT statements by listing all rows from the result of the first SELECT statement and all rows from the result of the other SELECT statement. If two or more rows are identical, only one of them is shown (duplicates are eliminated from the result)

INTERSECT - Used to combine the results of two union compatible SELECT statements by listing every row that appears in the result of both SELECT statements

MINUS (EXCEPT) - Used to combine the results of two union compatible SELECT statements by listing every row from the result of the first SELECT statement that does not appear in the result of the other SELECT statement
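
A sketch on hypothetical CUSTOMER and SUPPLIER tables, each SELECT returning two union compatible columns:

SELECT custname, city FROM customer
UNION
SELECT suppliername, city FROM supplier;

Both result sets consist of a name column followed by a city column, so they are union compatible; duplicate rows appear only once. (MINUS is the Oracle keyword; several other RDBMS packages use EXCEPT instead.)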

Join Without Using a Primary Key/Foreign Key Combination


It is possible to join two tables without joining a foreign key column in one table with a primary key
column in another table.

A JOIN condition can connect a column from one table with a column from the other table as long as
those columns contain the same values.
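
A sketch joining the same hypothetical tables on columns that are neither primary nor foreign keys:

SELECT c.custname, s.suppliername
FROM customer c JOIN supplier s ON c.city = s.city;

The join condition matches values in the two city columns, even though neither column is part of a primary key/foreign key pair.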

Chapter 6 - Database Implementation and Use


Referential Integrity Constraint - In each row of a relation containing a foreign key, the value of the
foreign key EITHER matches one of the values in the primary key column of the referred relation OR the
value of the foreign key is null (empty).

Regulates the relationship between a table with a foreign key and a table with a primary key to which
the foreign key refers
Most RDBMS packages do NOT implement assertions using the CREATE ASSERTION statement.

Referential Integrity Constraint: Delete and Update Implementation Options


Delete options:

• DELETE RESTRICT - makes it so you are unable to delete instances/rows that are referred to in another table
• DELETE CASCADE - makes it so that when a value or instance that is referred to is deleted, the rows that referred to it in the other table are deleted as well
• DELETE SET-TO-NULL - makes it so that when a value or instance that is referred to is deleted, the value(s) that referred to it in the other table are set to the null value
• DELETE SET-TO-DEFAULT - makes it so that when a value or instance that is referred to is deleted, the value(s) that referred to it in the other table are assigned a default value

Update options:

• UPDATE RESTRICT - makes it so you are unable to update primary key values that are referred to in another table
• UPDATE CASCADE - makes it so that when you update a value that is referred to in another table, the value(s) that referred to it are updated to match; that is to say, the original value and the foreign key value stay the same
• UPDATE SET-TO-NULL - makes it so that when a value or instance that is referred to is updated, the value(s) that referred to it in the other table are set to the null value
• UPDATE SET-TO-DEFAULT - makes it so that when a value or instance that is referred to is updated, the value(s) that referred to it in the other table are assigned a default value
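
A minimal sketch of how such options are typically declared (this syntax is supported by, for example, PostgreSQL; the table and column names are assumptions):

CREATE TABLE orders (
orderid INT PRIMARY KEY,
custid INT,
FOREIGN KEY (custid) REFERENCES customer (custid)
ON DELETE CASCADE
ON UPDATE CASCADE );

Here, deleting a customer also deletes that customer's orders, and changing a custid value propagates to the matching rows in ORDERS.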

Implementing User-Defined Constraints


Methods for implementing user-defined constraints include:

• CHECK clause - used to specify a constraint on a particular column of a relation (see the sketch after this list)
• Assertions and triggers
• Coding in specialized database programming languages that combine SQL with additional non-SQL statements for processing data from databases (such as PL/SQL)
• Embedding SQL with code written in regular programming languages (such as C++ or Java)
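
A minimal sketch of the CHECK clause (hypothetical table and value range):

CREATE TABLE product (
productid INT PRIMARY KEY,
productname VARCHAR(25),
price NUMERIC(7,2) CHECK (price > 0) );

An INSERT or UPDATE that would set price to zero or a negative value is rejected by the DBMS.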

In many cases the logic of user-defined constraints is not implemented as a part of the database, but as a
part of the front-end database application

For the proper use of the database, it is important that user-defined constraints are implemented fully

Indexing
INDEX - Mechanism for increasing the speed of data search and data retrieval on relations with a large
number of records

Most relational DBMS software tools enable definition of indexes

Conceptually, an index can be pictured as a sorted copy of the indexed column on which binary search is applied; this is a simplified illustration of the principles on which an index is based.

Instead of simply sorting on the indexed column and applying binary search, contemporary DBMS tools implement indexes using different logical and technical approaches, such as:
• Clustering indexes
• Hash indexes
• B+ trees
• etc.

Each of the available approaches has the same goal – increase the speed of search and retrieval on the
columns that are being indexed

CREATE INDEX Example:


CREATE INDEX custname_index ON customer(custname);

Once this statement is executed, the effect is that the searches and retrievals involving the CustName
column in the relation CUSTOMER are faster

DROP INDEX Example:


DROP INDEX custname_index;

This statement drops the index, and the index is no longer used

Database Front-End
Provides access to the database for indirect use

In most cases, a portion of intended users (often a majority of the users) of the database lack the time
and/or expertise to engage in the direct use of the data in the database. It is not reasonable to expect
every person who needs to use the data from the database to write his or her own queries and other
statements.

A website can be an interface to a database.

Form - a mechanism that provides an interface into a query or relation in a database to input data or
retrieve data for an end-user.

Report - a mechanism that displays data, and calculations on that data, from one or more database tables in a formatted way, either on screen or as a printed hard copy

In addition to the forms and reports, database front-end applications can include many other
components and functionalities, such as:

• menus
• charts
• graphs
• maps
• etc.

The choice of how many different components to use and to what extent is driven by the needs of the
end-users

A database can have multiple sets of front-end applications for different purposes or groups of end-users

Front-end applications can be accessible separately on their own or via an interface that allows the user to choose the application they need.

Data Quality
The data in a database is considered of high quality if it correctly and unambiguously reflects the real world it is designed to represent

Data quality characteristics

• Accuracy - the extent to which data correctly reflects the real-world instances it is supposed to
depict
• Uniqueness - requires each real-world instance to be represented only once in the data
collection
o The uniqueness data quality problem is sometimes also referred to as data duplication
• Completeness - the degree to which all the required data is present in the data collection
• Consistency - the extent to which the data properly conforms to and matches up with the other
data
• Timeliness - the degree to which the data is aligned with the proper time window in its
representation of the real world
o Typically, timeliness refers to the “freshness” of the data
• Conformity - the extent to which the data conforms to its specified format

Preventive data quality actions - Actions taken to preclude data quality problems

• Mandating instantaneous entry of all new data
• Using an input mask such as dd/mm/yyyy for entry of a date value is an example of a preventive data quality action

Corrective data quality actions - Actions taken to correct the data quality problems

Transaction Management and Concurrency Control


What Is a Transaction?
Logical unit of work that must be either entirely completed or aborted

A successful transaction changes the database from one consistent state to another, one in which all data integrity constraints are satisfied.

Most real-world database transactions are formed by two or more database requests; a database request is the equivalent of a single SQL statement in an application program or transaction.

When you read from or update a database entry, you create a transaction.

When the end of a program is successfully reached, it is equivalent to the execution of a COMMIT
command.

Evaluating Transaction Results


Not all transactions update the database; SQL code represents a transaction because the database was accessed.

Improper or incomplete transactions can have a devastating effect on database integrity. Some DBMSs provide means by which the user can define enforceable constraints; other integrity rules are enforced automatically by the DBMS.

Transaction Properties
Atomicity - All operations of a transaction must be completed

Consistency - Permanence of database’s consistent state

Isolation - Data used during transaction cannot be used by second transaction until the first is completed

Durability - Once transactions are committed, they cannot be undone

Serializability - Concurrent execution of several transactions yields consistent results

Multiuser databases are subject to multiple concurrent transactions

Transaction Management with SQL


ANSI has defined standards that govern SQL database transactions

Transaction support is provided by two SQL statements: COMMIT and ROLLBACK

Transaction sequence must continue until:

• COMMIT statement is reached


• ROLLBACK statement is reached
• End of program is reached
• Program is abnormally terminated
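
A minimal sketch of a transaction (explicit transaction-start syntax varies by RDBMS; BEGIN works in, for example, PostgreSQL, and the ACCOUNT table is an assumption):

BEGIN;
UPDATE account SET balance = balance - 100 WHERE acctid = 1;
UPDATE account SET balance = balance + 100 WHERE acctid = 2;
COMMIT;

Both updates become permanent together; had anything failed before the COMMIT, issuing ROLLBACK would restore the previous consistent state.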

The Transaction Log


Stores:

• A record for the beginning of transaction


• For each transaction component:
o Type of operation being performed (update, delete, insert)
o Names of objects affected by transaction
o “Before” and “after” values for updated fields
o Pointers to previous and next transaction log entries for the same transaction
• Ending (COMMIT) of the transaction

The information stored in the transaction log is used by the DBMS for a recovery requirement triggered
by a program's abnormal termination or a system failure such as a disk crash.

Concurrency Control
Coordination of simultaneous transaction execution in a multiprocessing database

The objective is to ensure serializability of transactions in a multiuser environment

As long as two transactions, T1 and T2, access unrelated data, there is no conflict and the order of execution is irrelevant to the final outcome

Three main problems:

• Lost updates
• Uncommitted data
• Inconsistent retrievals

Lost Updates
One of the three most common data integrity and consistency problems.

Two concurrent transactions update the same data element, and one of the updates is lost (overwritten by the other transaction)

Uncommitted Data
Two transactions are executed concurrently

First transaction rolled back after second already accessed uncommitted data

Inconsistent Retrievals
The first transaction accesses data, a second transaction alters the data, and the first transaction accesses the data again

The transaction might read some data before they are changed and other data after they are changed, thus yielding inconsistent results

A consistent database is one in which all data integrity constraints are satisfied.

The Scheduler
Special DBMS program - Purpose is to establish order of operations within which concurrent transactions
are executed

Interleaves execution of database operations. Ensures serializability and isolation

Serializable schedule - Interleaved execution of transactions yields same results as serial execution

Concurrency Control with Locking Methods


Lock - Guarantees exclusive use of a data item to a current transaction. Required to prevent another
transaction from reading inconsistent data

Lock manager - Responsible for assigning and policing the locks used by transactions

Lock Granularity
Indicates level of lock use

Locking can take place at following levels:

Database-level lock - Entire database is locked


Table-level lock - Entire table is locked

Page-level lock - Entire disk page is locked (a disk page, or page, is the equivalent of a disk block)

Row-level lock - Allows concurrent transactions to access different rows of same table, even if rows are
located on same page

Field-level lock - Allows concurrent transactions to access same row. Requires use of different fields
(attributes) within the row

Lock Types
Locks are required to prevent another transaction from reading inconsistent data.

Binary lock. Has two states: locked (1) or unlocked (0)

Exclusive lock. Access is specifically reserved for transaction that locked object. Must be used when
potential for conflict exists

Shared lock. Concurrent transactions are granted read access on basis of a common lock. A shared lock
produces no conflict as long as all the transactions are read only.

Two-Phase Locking to Ensure Serializability


Defines how transactions acquire and relinquish locks

Guarantees serializability, but does not prevent deadlocks

Growing phase - Transaction acquires all required locks without unlocking any data

Shrinking phase - Transaction releases all locks and cannot obtain any new lock

Governed by the following rules:

• Two transactions cannot have conflicting locks


• No unlock operation can precede a lock operation in the same transaction
• No data are affected until all locks are obtained

Deadlocks
A condition that occurs when two transactions wait for each other to unlock data.

A deadlock is possible only if one (or both) of the waiting transactions wants to obtain an exclusive lock on a data item; no deadlock condition can exist among shared locks

Three techniques to control deadlock: Prevention, Detection, Avoidance

Choice of deadlock control method depends on database environment

• Low probability of deadlock; detection recommended


• High probability; prevention recommended
Concurrency Control with Time Stamping Methods
Assigns global unique time stamp to each transaction

Produces explicit order in which transactions are submitted to DBMS

Uniqueness - Ensures that no equal time stamp values can exist

Monotonicity - Ensures that time stamp values always increase

Wait/Die and Wound/Wait Schemes


Wait/die - Older transaction waits and younger is rolled back and rescheduled

Wound/wait - Older transaction rolls back younger transaction and reschedules it

Concurrency Control with Optimistic Methods


Optimistic approach

• Based on assumption that majority of database operations do not conflict


• Does not require locking or time stamping techniques
• Transaction is executed without restrictions until it is committed
• Phases: read, validation, and write

Database Recovery Management


Restores database to previous consistent state

Based on atomic transaction property

• All portions of transaction are treated as single logical unit of work


• All operations are applied and completed to produce consistent database

If transaction operation cannot be completed, the transaction is aborted and changes to database are
rolled back.

Transaction Recovery
Write-ahead-log protocol: ensures transaction logs are written before data is updated

Redundant transaction logs: ensure physical disk failure will not impair ability to recover

Buffers: temporary storage areas in primary memory


Checkpoints: operations in which DBMS writes all its updated buffers to disk

Deferred-write technique - Only transaction log is updated

Recovery process: identify last checkpoint

• If transaction committed before checkpoint, do nothing


• If transaction committed after checkpoint, use transaction log to redo the transaction
• If transaction had ROLLBACK operation, do nothing

Write-through technique - Database is immediately updated by transaction operations during transaction's execution

Recovery process: identify last checkpoint

If transaction committed before checkpoint, Do nothing

If transaction committed after last checkpoint, DBMS redoes the transaction using “after” values

If transaction had ROLLBACK or was left active, Do nothing because no updates were made

Summary
Transaction: sequence of database operations that access database

Logical unit of work

No portion of transaction can exist by itself

Five main properties: atomicity, consistency, isolation, durability, and serializability

A single user database system automatically ensures serializability and isolation of the database because
only one transaction is executed at a time.

The implicit beginning of a transaction is when the first SQL statement is encountered.

COMMIT saves changes to disk

ROLLBACK restores previous database state

The rollback segment table space is used for transaction-recovery purposes.

SQL transactions are formed by several SQL statements or database requests

Transaction log keeps track of all transactions that modify database

Concurrency control coordinates simultaneous execution of transactions

Scheduler establishes order in which concurrent transaction operations are executed

Lock guarantees unique access to a data item by transaction

Two types of locks: binary locks and shared/exclusive locks

Serializability of schedules is guaranteed through the use of two-phase locking


Deadlock: when two or more transactions wait indefinitely for each other to release lock

Three deadlock control techniques: prevention, detection, and avoidance

Time stamping methods assign unique time stamp to each transaction

Schedules execution of conflicting transactions in time stamp order

Optimistic methods assume the majority of database transactions do not conflict

Transactions are executed concurrently, using private copies of the data

Database recovery restores database from given state to previous consistent state

Database Performance Tuning and Query Optimization


Database Performance-Tuning Concepts
Goal of database performance is to execute queries as fast as possible

Database performance tuning – a set of activities and procedures designed to reduce response time of
database system

The database performance tuning activities can be divided into those taking place on the client side or
on the server side.

All factors must operate at optimum level with minimal bottlenecks

Good database performance starts with good database design

Performance Tuning: Client and Server


Client side generates the SQL query that returns correct answer in least amount of time, using the
minimum amount of resources at server. SQL performance tuning

Server side is the DBMS environment configured to respond to clients’ requests as fast as possible,
achieving optimum use of existing resources. DBMS performance tuning

DBMS Architecture
All data in database are stored in data files

Data files automatically expand in predefined increments known as extents and are grouped in file groups or table spaces

Table space or file group - Logical grouping of several data files that store data with similar characteristics

Data cache or buffer cache: shared, reserved memory area that stores most recently accessed data
blocks in RAM

SQL cache or procedure cache: stores most recently executed SQL statements but also PL/SQL
procedures, including triggers and functions

DBMS retrieves data from permanent storage and places it in RAM

Input/output request: low-level data access operation to/from computer devices


Retrieving data from the data cache is faster than retrieving it from the data files, because the DBMS does not wait for the hard disk to retrieve the data

Majority of performance-tuning activities focus on minimizing I/O operations

Typical DBMS processes: Listener, user, scheduler, lock manager, optimizer

Database Statistics
Measurements about database objects and available resources:

Tables, Indexes, Number of processors used, Processor speed and Temporary space available

Make critical decisions about improving query processing efficiency

Can be gathered manually by DBA (Database Administrator) or automatically by DBMS

Query Processing
DBMS processes queries in three phases

Parsing - DBMS parses the query and chooses the most efficient access/execution plan

Execution - DBMS executes the query using chosen execution plan

Fetching - DBMS fetches the data and sends the result back to the client

Optimization is the central activity during the parsing phase in query processing.

SQL Parsing Phase


Break down query into smaller units

Transform original SQL query into slightly different version of original SQL code that is still fully
equivalent (results are always the same as the original query) but more efficient (will almost always
execute faster than original query).

Query optimizer analyzes SQL query and finds most efficient way to access data

The system table’s tablespace is used to store the data dictionary tables.

The SQL query is:

• Validated for syntax compliance
• Validated against the data dictionary (table and column names are correct and the user has proper access rights)
• Analyzed and decomposed into components
• Optimized
• Prepared for execution

Access plans are DBMS specific. They translate client’s SQL query into a series of complex I/O operations
and are required to read the data from the physical data files and generate result set

DBMS checks if access plan already exists for query in SQL cache

DBMS reuses the access plan to save time

If not, optimizer evaluates various plans, and the chosen plan is placed in SQL cache

SQL Execution Phase & SQL Fetching Phase


All I/O operations indicated in access plan are executed
• Locks acquired
• Data retrieved and placed in data cache
• Transaction management commands processed

Rows of resulting query result set are returned to client

DBMS may use temporary table space to store temporary data

Query Processing Bottlenecks


Delay introduced in the processing of an I/O operation that slows the system: CPU, RAM, Hard disk,
Network, Application code

Indexes and Query Optimization


Indexes are crucial in speeding up data access. They facilitate searching, sorting, and using aggregate
functions as well as join operations. They are an ordered set of values that contains index key and
pointers

More efficient to use index to access table than to scan all rows in table sequentially (full table scan)

Data sparsity - number of different values a column could possibly have. A measure that determines the
need for an index is the data sparsity of the column you want to index.

Indexes implemented using: Hash indexes, B-tree indexes, Bitmap indexes

DBMSs determine best type of index to use

Optimizer Choices
Most DBMSs operate in one of two optimization modes.

Rule-based optimizer – Has preset rules and points. Rules assign a fixed cost to each operation

Cost-based optimizer - Algorithms based on statistics about objects being accessed. Adds up processing
cost, I/O costs, resource costs to derive total cost

Using Hints to Affect Optimizer Choices


The optimizer might not choose the best plan

It makes decisions based on existing statistics, but the statistics may be old, and it might therefore make less efficient decisions

Optimizer hints - special instructions for the optimizer embedded in the SQL command text
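
A sketch in the Oracle hint style (hint syntax is DBMS specific; the table and index names are assumptions carried over from the earlier CREATE INDEX example):

SELECT /*+ INDEX(customer custname_index) */ custname
FROM customer
WHERE custname LIKE 'A%';

The hint embedded in the comment asks the optimizer to use the index rather than a full table scan.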
SQL Performance Tuning
Evaluated from client perspective

Most current relational DBMSs perform automatic query optimization at the server end

Most SQL performance optimization techniques are DBMS specific and rarely portable

Majority of performance problems are related to poorly written SQL code

Carefully written query usually outperforms a poorly written query

Other SQL performance tuning considerations include index selectivity and the careful writing of conditional expressions.

Query Formulation
Identify what columns and computations are required

Identify source tables

Determine how to join tables

Determine what selection criteria are needed

Determine in what order to display output

DBMS Performance Tuning


Includes managing DBMS processes in primary memory and structures in physical storage

DBMS performance tuning at server end focuses on setting parameters used for:

• Data cache
• SQL cache
• Sort cache
• Optimizer mode

The data cache is where the data read from the database files are stored after the data have been read
or before the data are written to the database files.

Some general recommendations for creation of databases:

• Use RAID (Redundant Array of Independent Disks) to provide balance between performance and
fault tolerance
• Minimize disk contention
• Put high-usage tables in their own table spaces
• Assign separate data files in separate storage volumes for indexes, system, high-usage tables
• Take advantage of table storage organizations in database
• Partition tables based on usage
• Use denormalized tables where appropriate
• Store computed and aggregate attributes in tables

Summary
Database performance tuning - Refers to activities to ensure query is processed in minimum amount of
time

SQL performance tuning - refers to activities on client side to generate SQL code

• Returns correct answer in least amount of time


• Uses minimum amount of resources at server end

DBMS architecture is represented by processes and structures used to manage a database

Database statistics refers to measurements gathered by the DBMS. Describe snapshot of database
objects’ characteristics

DBMS processes queries in three phases: parsing, execution, and fetching

Indexes are crucial in process that speeds up data access

During query optimization, DBMS chooses: Indexes to use, how to perform join operations, table to use
first, etc.

Hints change optimizer mode for current SQL statement

SQL performance tuning deals with writing queries that make good use of statistics

Query formulation deals with translating business questions into specific SQL code

The Scheduler establishes the order of concurrent transaction operations before they are executed, and the lock manager assigns and regulates the locks used by the transactions.
