
Chapter 1: Database systems

1. Why Databases
2. Ubiquity and Pervasiveness of Data

Data is described as abundant, global, and everywhere in today's world.

Data is inescapable, prevalent, and persistent by nature; it exists from birth to death.

Individuals continuously generate and consume a lot of data throughout their lives.

It starts with birth certificates and extends to death certificates, highlighting the lifelong
data generation process.

3. Importance of Databases

Databases are the optimal solution for storing and managing data effectively.

Databases make data persistent, shareable, and secure, addressing the challenges posed
by the sheer volume of generated data.

4. Business Necessity for Data

Data is considered essential for the survival and prosperity of organizations.

It is impossible to operate a business without crucial data on customers, products, employees, and financial transactions.

5. Role of Business Information Systems:

Help businesses use information as an organizational resource.

Help in the collection, storage, aggregation, manipulation, dissemination, and management of data.

6. Challenges in Data Management

Without databases, businesses are unable to store and retrieve huge collections of data efficiently.

Databases are the solution to efficiently process, store, and retrieve vast amounts of
data for timely decision-making.

7. Databases as Specialized Structures


Databases are specialized structures allowing computer-based systems to store,
manage, and retrieve data very quickly.

2. Data versus information

Data- raw facts (facts that have not been processed to reveal their meaning)

Information - raw data that has been processed to reveal its meaning


Importance of processing data- helps in decision making

Database - a shared, integrated computer structure that stores a collection of data

End-user data - raw facts of interest to the end user

Metadata - the data characteristics and the set of relationships that link the data found within the database.

Database management system (DBMS) - a collection of programs that manages the database structure and controls access to the data stored in the database.

Role of the DBMS - serves as an intermediary between the user and the database.

Advantages of DBMS

 It improves data sharing


 It improves data security
 It helps in better data integration
 It minimizes data inconsistency
 It improves Data access
 It improves decision making
 It increases end user productivity
Types of databases

Single user database- supports only one user at a time

Desktop database- single user database that runs on personal computer

Multiuser database- support multiple users at a time

Workgroup database - a multiuser database that supports a relatively small number of users

Enterprise database –when the database is used by the entire organization

Centralized database – database that supports data located at a single site

Distributed database – database that supports data distributed across several different sites
Cloud database- Database that is created and maintained using cloud data services

General purpose database- contains a wide variety of data used in different disciplines

Discipline specific database- contains data focused on specific subject.

Operational database (online transaction processing database, transactional database, or production database) – supports a company's day-to-day operations.

Analytical database- stores historical data and business metrics used exclusively for tactical or
strategic decision making.

Data warehouse: stores data in a format optimized for decision support.

Online analytical processing (OLAP): a set of tools that work together to provide an advanced data analysis environment for retrieving, processing and modelling data from the data warehouse.

Unstructured data- data that exists in its raw state

Structured data: formatted raw data to facilitate storage, use and generation of information.

Semi-structured data-data that has already been structured to some extent

Extensible Markup Language (XML) - a specialized language used to manipulate data elements in textual format.

Database design - activities that focus on the design of the database structure that will be used to store and manage end-user data.

Problems with file data management

 Lengthy development time


 Difficulty in getting quick answers
 Complex system administration
 Lack of security and limited data sharing
 Extensive programing
Structural and data independence

Structural dependence- access to the file is dependent on the structure

Structural independence- exist when you can change the file structure without affecting the
applications’ ability to access data

Logical data format – how human beings view data

Physical data format- How computers must work with the data
Data redundancy: occurs when the same data is stored unnecessarily at different places

Effects of data Redundancy

 Poor data security


 Data inconsistency
 Data entry errors
 Data integrity problems
Data anomalies - develop when not all of the required changes in the redundant data are made successfully.

Types of data anomalies

 Update anomalies
 Insertion anomalies
 Deletion anomalies
Database system environment- is an organization of components that define and regulate the
collection, storage, management and use of data within database environment.

 Components of the database system environment:
 Hardware, e.g. routers, PCs, tablets, supercomputers
 Software, e.g. Linux, UNIX
 People
 Procedures
 Data
Functions of DBMS

 Data dictionary management


 Data storage management
 Data transformation
 Data presentation
Disadvantages of database systems

o Increased cost
o Management complexities
o Maintaining currency
o Vendor dependence
o Needs frequent upgrades
Chapter 2: Data Models
Data model: a simple representation, usually graphical, of a more complex real-world data structure

Importance of Data Models


It facilitates interaction among the designer, the application programmer, and the end user

Building blocks of data modelling

Entity- a person, place, thing or event about which data will be collected and stored

Attributes – characteristic of an entity

Relationships – describe associations among entities. Designers usually use shorthand notations to represent one-to-many, many-to-many and one-to-one relationships [1:M, M:N (or *..*), and 1:1 (or 1..1), respectively]

Constraints - restrictions placed on data. They ensure data integrity and are expressed in the form of rules.

Business rule: a brief, precise and unambiguous description of a policy, procedure, or principle within a specific organization

The main sources of business rules are company managers, policy makers, department managers, and written documentation such as a company's procedures, manuals and standards.

Importance of documenting business rules

 It helps to standardize the company’s view of data


 It can be a communication tool between users and designers
 It allows designers to understand the nature, role and scope of data
 It allows designers to understand business processes
 It allows designers to develop appropriate relationship participation rule and constraints
and to create accurate data models
Evolution of data models

The quest for a better data management model led to the development of several models
Hierarchical model – developed in the 1960s to manage large amounts of complex data

- the model is represented by an upside-down tree

- used in the Apollo program that landed on the moon in 1969

- has levels, or segments

Segment: the equivalent of the file system's record type

Network model: represents complex data relationships more effectively than the hierarchical model, improves database performance, and imposes a database standard

Schema- is the conceptual organization of the entire database as viewed by the database
administrator.

Subschema – defines the portion of the database "seen" by the application programs that actually produce the desired information from the data within the database.

Schema data definition language- enables the database administrator to define the schema
components.

Data manipulation language (DML) - defines the environment in which data can be managed; it is used to work with the data within the database.

Relational model - introduced in 1970 by E.F. Codd

- it was a major breakthrough for designers and users

- has the ability to hide its complexities from the user

Relational diagram - a representation of the relational database's entities, the attributes within those entities, and the relationships among those entities.

Parts of any SQL relational database application

 The end user interface


 A collection of tables stored in Database
 SQL engine
Entity relationship model (components)

 Entity – anything about which data will be collected and stored

 Entity instance or occurrence - each row in a relational table
 Entity set - a collection of like entities
Object-oriented model (components) – also called the semantic data model because it indicates meaning

Inheritance- is the ability of the object within the class hierarchy to inherit the attributes and
methods of the classes above it.

UML class diagrams - used to represent data and their relationships within the larger UML object-oriented systems modeling language

Object/relational and XML

Emerging data models [Big data and NoSQL]

Big Data: the movement to find new and better ways to manage large amounts of web- and sensor-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.

Volume, velocity, and variety (the 3 Vs)

Volume- amount of data being stored

Velocity- it is the speed with which data grows and the need to process this data quickly in order
to generate information and insight.

Variety - refers to the fact that the data being collected comes in multiple different formats.

Big data technologies – Hadoop - a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework

- uses low-cost hardware and is based on work that originated at Google

- has two components: the Hadoop Distributed File System and MapReduce

a. Hadoop Distributed File System (HDFS) - highly distributed, fault-tolerant, high-speed; uses a write-once, read-many model. Has three types of nodes: name node, data node, and client node

b. MapReduce – an open-source application programming interface (API) that provides fast data analytics

- works with structured and unstructured data

- provides two functions: map and reduce

NoSQL - a large-scale distributed database system

- stores both structured and unstructured data

- refers to a new generation of databases that address the specific challenges of the big data era

Characteristics of NoSQL databases

 Not based on the relational model or SQL, hence the name NoSQL

 Support highly distributed database architectures
 Highly available, scalable, and fault tolerant
 Support very large amounts of sparse data
Chapter 3: The Relational Database Model
Tables and their characteristics
A table- is a two dimensional structure composed of rows and columns

- Also called a relation, because the relational model's creator, E.F. Codd, used the two terms as synonyms.
Characteristics of relational tables

- perceived as a two-dimensional structure composed of rows and columns

- each row represents a single entity
- each column represents an attribute, and each column has a distinct name
- each intersection of a row and column represents a single data value
- all values in a column must conform to the same data format
- each column has a specific range of values known as the attribute domain
- the order of the rows and columns is immaterial to the DBMS
- each table must have an attribute or combination of attributes that uniquely identifies each row
Primary key (PK) - is an attribute or combination of attributes that uniquely identifies any given row.

Keys: - ensure each row in a table is uniquely identifiable

- also used to establish relationships among tables


Dependencies:

Determination is the state in which knowing the value of one attribute makes it possible to determine the value of another.

Functional dependency - a relationship based on determination

Determinant- is the attribute in a functional dependency whose value determines another.

Dependent- is the attribute whose value is determined by the other attribute.

Full functional dependence – a functional dependency in which the entire collection of attributes in the determinant is necessary for the relationship.

Types of Keys

Composite key – a key composed of more than one attribute

Key attribute – an attribute that is part of a key

Superkey – a key that can uniquely identify any row in a table
Candidate key - one specific type of superkey (a minimal superkey)

Primary key - a candidate key selected to uniquely identify all other attribute values in any given row; cannot contain null entries

Foreign key - an attribute or combination of attributes in one table whose values must either match the primary key in another table or be null

Entity integrity- is the condition in which each row in a table has its own unique identity

To ensure entity integrity, the primary key has two requirements:

- All values in the primary key must be unique


- No key attribute in the primary key can contain null
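
The key and integrity concepts above can be illustrated with a short SQL sketch. This is only an illustrative example with assumed VENDOR and PRODUCT tables and columns, not tables defined in this text:

-- VENDOR's primary key enforces entity integrity: unique, no nulls
CREATE TABLE VENDOR (
    V_CODE  INTEGER      NOT NULL UNIQUE,
    V_NAME  VARCHAR(35)  NOT NULL,
    PRIMARY KEY (V_CODE)
);

-- PRODUCT carries a foreign key whose values must match VENDOR.V_CODE or be null
CREATE TABLE PRODUCT (
    P_CODE     VARCHAR(10)  NOT NULL UNIQUE,
    P_DESCRIPT VARCHAR(35)  NOT NULL,
    V_CODE     INTEGER,
    PRIMARY KEY (P_CODE),
    FOREIGN KEY (V_CODE) REFERENCES VENDOR (V_CODE)
);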
Relational algebra- defines the theoretical way of manipulating table content using relational operators.
Chapter 4: Entity Relationship (ER)
Modeling
 ER model stands for an Entity-Relationship model. This model is used to define the data
elements and relationship for a specified system.

 It develops a conceptual design for the database. It also develops a very simple and easy to
design view of data.

 In ER modeling, the database structure is portrayed as a diagram called an entity-relationship


diagram.

 Entity - a table that holds specific information (data)


 Relationships- associations or interactions between entities
 Attribute-characteristics of entities.

Types of attributes

 Required attribute - an attribute that must have a value

 Optional attribute - an attribute that does not require a value and can be left empty
 Key attribute - used to represent the main characteristic of an entity. It represents a primary key and is drawn as an ellipse with the text underlined
 Composite attribute - an attribute that is composed of several other attributes. It is represented by an ellipse connected to the ellipses of its component attributes
 Single-valued attribute - an attribute that can have only one single value
 Multivalued attribute - an attribute that can have more than one value. A double oval is used to represent a multivalued attribute
Domain- is a set of possible value for a given attribute

3. Relationship - a relationship is used to describe an association between entities.

Connectivity and cardinality


Connectivity- describes the relationship classification
Cardinality - expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity.

Existence dependence
Existence dependence – occurs when an entity can exist in the database only when it is associated with another related entity occurrence

Existence independence – occurs when an entity can exist apart from all of its related entities. Such an entity is also called a strong or regular entity.

Relationship strength
 Relationships are the glue that holds the tables together. They are used to connect related information
between tables.
 Relationship strength is based on how the primary key of a related entity is defined.
 A weak, or non-identifying, relationship exists if the primary key of the related entity does not contain a
primary key component of the parent entity
 A strong, or identifying, relationship exists when the primary key of the related entity contains the
primary key component of the parent entity
Relationship participation
Optional relationship- one entity occurrence does not require the corresponding entity occurrence in
particular relationship.

Mandatory relationship- one entity occurrence requires the corresponding entity occurrence in a
particular relationship.

Relationship Degree
 Ternary relationship: a relationship in which three entities are associated.
 Binary relationship: occurs when two entities are associated in a relationship
 Unary relationship: one in which a relationship exists between occurrences of the same entity
set.
 Recursive relationship - one in which a relationship exists between occurrences of the same entity set (an entity is related to itself).
Database design Challenges

Design standards - the database design must conform to design standards

Processing speed - processing speed must be high enough to minimize access time

Information requirements
Chapter 5: Advanced Data Modeling
Extended entity relationship model

 Result of adding more semantic constructs to original entity relationship (ER) model
 Diagram using this model is called an EER diagram (EERD)
Entity Supertypes and Subtypes

Entity supertype: -Generic entity type related to one or more entity subtypes

-Contains common characteristics

Entity subtypes – Contains unique characteristics of each entity subtype

Specialization Hierarchy

 Depicts arrangement of higher-level entity supertypes and lower-level entity subtypes
 Relationships described in terms of "IS-A" relationships
 Subtype exists only within context of supertype
 Every subtype has only one supertype to which it is directly related
 Can have many levels of supertype/subtype relationships
Inheritance

 Enables entity subtype to inherit attributes and relationships of supertype


 All entity subtypes inherit their primary key attribute from their supertype
 At implementation level, supertype and its subtype(s) maintain a 1:1 relationship
 Entity subtypes inherit all relationships in which supertype entity participates
 Lower-level subtypes inherit all attributes and relationships from all upper level-supertypes
Subtype Discriminator

Attribute in supertype entity

 Determines to which entity subtype each supertype occurrence is related


 Default comparison condition for subtype discriminator attribute is equality comparison
 Subtype discriminator may be based on other comparison condition
Disjoint and Overlapping Constraints

 Disjoint subtypes – Also known as non-overlapping subtypes – Subtypes that contain unique
subset of supertype entity set
 Overlapping subtypes – Subtypes that contain nonunique subsets of supertype entity set
Completeness Constraint

 Specifies whether entity supertype occurrence must be a member of at least one subtype
 Partial completeness – Symbolized by a circle over a single line – Some supertype occurrences
that are not members of any subtype
 Total completeness – Symbolized by a circle over a double line – Every supertype occurrence
must be member of at least one subtype
Specialization and Generalization

Specialization

 Identifies more specific entity subtypes from higher-level entity supertype


 Top-down process
 Based on grouping unique characteristics and relationships of the subtypes
Generalization

 Identifies more generic entity supertype from lower-level entity subtypes


 Bottom-up process
 Based on grouping common characteristics and relationships of the subtypes
Entity Clustering

 “Virtual” entity type used to represent multiple entities and relationships in ERD
 Considered “virtual” or “abstract” because it is not actually an entity in final ERD
 Temporary entity used to represent multiple entities and relationships
 Eliminate undesirable consequences – Avoid display of attributes when entity clusters are used
Entity Integrity: Selecting Primary Keys

 Primary key most important characteristic of an entity – Single attribute or some combination of
attributes
 Primary key’s function is to guarantee entity integrity
 Primary keys and foreign keys work together to implement relationships
 Properly selecting primary key has direct bearing on efficiency and effectiveness
Natural Keys and Primary Keys

 Natural key is a real-world identifier used to uniquely identify real-world objects – Familiar to
end users and forms part of their day-to-day business vocabulary
Primary Key Guidelines

 Attribute that uniquely identifies entity instances in an entity set – Could also be combination of
attributes
 Main function is to uniquely identify an entity instance or row within a table
 Guarantee entity integrity, not to “describe” the entity
 Primary keys and foreign keys implement relationships among entities – Behind the scenes,
hidden from user
When to Use Composite Primary Keys

Composite primary keys are useful in two cases:
 As identifiers of composite entities, where each primary key combination is allowed only once in an M:N relationship
 As identifiers of weak entities, where the weak entity has a strong identifying relationship with the parent entity
 Automatically provides the benefit of ensuring that there cannot be duplicate values
When To Use Surrogate Primary Keys

 Especially helpful when there is:


 No natural key
 Selected candidate key has embedded semantic contents
 Selected candidate key is too long or cumbersome
Design Cases

• Four special design cases that highlight:

–Importance of flexible design

– Proper identification of primary keys

– Placement of foreign keys

a. Design Case #1: Implementing 1:1 Relationships


 Foreign keys work with primary keys to properly implement relationships in the relational model
 Put the primary key of the “one” side on the “many” side as a foreign key – Primary key: parent entity – Foreign key: dependent entity
 In a 1:1 relationship there are two options: – Place a foreign key in both entities (not recommended) – Place a foreign key in one of the entities
 The primary key of one of the two entities appears as the foreign key of the other
b. Design Case #2: Maintaining History of Time-Variant Data
 Time-variant data: values change over time, and a history of the data changes must be kept; normally, existing attribute values are simply replaced with the new value without regard to the previous value
 Keeping a history of time-variant data is equivalent to having a multivalued attribute in your entity
 Must create a new entity in a 1:M relationship with the original entity
 The new entity contains the new value and the date of the change
Design Case #3: Fan Traps

 Design trap occurs when a relationship is improperly or incompletely identified – represented in a way not consistent with the real world
 Most common design trap is known as a fan trap
 Fan trap occurs when one entity is in two 1:M relationships to other entities – produces an association among other entities not expressed in the model
Design Case #4: Redundant Relationships

 Redundancy is seldom a good thing in database environment


 Occur when there are multiple relationship paths between related entities
 Main concern is that redundant relationships remain consistent across model
 Some designs use redundant relationships to simplify the design
Chapter 6: Database Tables and Normalization

• Normalization: evaluating and correcting table structures to minimize data redundancies

Normal forms

 First normal form (1NF)


 Second normal form (2NF)
 Third normal form (3NF)

Structural point of view of normal forms

• Higher normal forms are better than lower normal forms


• Properly designed 3NF structures meet the requirement of fourth normal form (4NF)

• Denormalization: produces a lower normal form • Results in increased performance but greater data redundancy

The Need for Normalization

• Used while designing a new database structure or when adding to an existing structure

• Analyzes the relationship among the attributes within each entity

• Determines if the structure can be improved through normalization

• Improves the existing data structure and creates an appropriate database design

The Normalization Process

Objective is to ensure that each table conforms to the concept of well-formed relations

• Each table represents a single subject


• Each row/column intersection contains only one value and not a group of values

• No data item will be unnecessarily stored in more than one table

• All nonprime attributes in a table are dependent on the primary key

• Each table has no insertion, update, or deletion anomalies

Ensures that all tables are in at least 3NF

• Higher forms are not likely to be encountered in business environment

• Works one relation at a time

• Identifies the dependencies of a relation (table)

• Progressively breaks the relation up into a new set of relations

Conversion to First Normal Form (1NF)

Repeating group: a group of multiple entries of the same type can exist for any single key attribute occurrence

• Reduces data redundancies

• Three step procedure

• Eliminate the repeating groups

• Identify the primary key (may be composite)

• Identify all dependencies

• Dependency diagram: depicts all dependencies found within a given table structure

• Helps to get an overview of all relationships among table’s attributes

• Makes it less likely that an important dependency will be overlooked

• 1NF describes tabular format in which:

• All key attributes are defined

• There are no repeating groups in the table

• All attributes are dependent on the primary key


• All relational tables satisfy 1NF requirements

• Some tables may still contain partial dependencies, which cause update, insertion, or deletion anomalies

Conversion to First Normal Form (1NF)

• The relation or table or report given (called a relation from now on) should have NO
repeating groups and should identify any multi-valued attributes (these will be broken out later
after the normalization process)

• Normalizing the relation will help to reduce or eliminate possible data redundancies
and anomalies

• 1NF is the first step in the normalization process … it requires three (3) steps:

Step 1: Eliminate (fill in) any Repeating Groups (nulls) and identify any multi-valued
attributes in given relation

• See next view graph where we have filled-in the repeating groups (nulls) in the given
relation

• We have identified emp_name as a multi-valued attribute … we will leave it as is for now (it will be broken out into a new table/relation later, after the normalization process)

Step 2: Identify the Primary Key (PK) or composite PK by analyzing the existing data

• Attribute(s) chosen must uniquely identify each row in the given relation

• The composite or singular PK must be determined from existing attributes

Step 3: Define any Partial and/or Transitive Dependencies

• All dependencies need to be shown on a 1NF dependency diagram

Conversion to Second Normal Form (2NF)

• Conversion to 2NF occurs only when the 1NF has a composite primary key

• If the 1NF has a single-attribute primary key, then the table is automatically in 2NF

• The 1NF-to-2NF conversion is simple

• Make new tables to eliminate partial dependencies

• Reassign corresponding dependent attributes

• Table is in 2NF when it:


• Is in 1NF

• Includes no partial dependencies

Conversion to Third Normal Form (3NF)

The data anomalies created by the database organization shown in Figure 6.4 are easily
eliminated

• Make new tables to eliminate transitive dependencies

• Reassign corresponding dependent attributes

• Table is in 3NF when it:

• Is in 2NF

• Contains no transitive dependencies
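
As an illustration of the 2NF and 3NF steps above, here is a hedged SQL sketch. It assumes a hypothetical 1NF relation with the composite key (PROJ_NUM, EMP_NUM) and attributes EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS, where EMP_NAME and JOB_CLASS depend only on EMP_NUM (partial dependency) and CHG_HOUR depends on JOB_CLASS (transitive dependency); none of these names come from this text:

-- 3NF: CHG_HOUR depends on JOB_CLASS, so it moves to its own table
CREATE TABLE JOB (
    JOB_CLASS VARCHAR(20)  PRIMARY KEY,
    CHG_HOUR  NUMERIC(7,2)
);

-- 2NF: attributes that depend only on EMP_NUM move to EMPLOYEE
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER      PRIMARY KEY,
    EMP_NAME  VARCHAR(50),
    JOB_CLASS VARCHAR(20)  REFERENCES JOB (JOB_CLASS)
);

-- only attributes that depend on the full composite key remain here
CREATE TABLE ASSIGNMENT (
    PROJ_NUM  INTEGER,
    EMP_NUM   INTEGER REFERENCES EMPLOYEE (EMP_NUM),
    HOURS     NUMERIC(5,1),
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);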

Improving the Design

Normalization is valuable because its use helps eliminate data redundancies

• Evaluate PK assignments and naming conventions

• Refine attributes atomicity

• Identify new attributes and new relationships

• Refine primary keys as required for data granularity

• Maintain historical accuracy and evaluate using derived attributes

Denormalization

Denormalization produces a lower normal form

-Results in increased performance but greater data redundancy

• Creation of normalized relations is an important database design goal

• Processing requirements should also be a goal

• If tables are decomposed to conform to normalization requirements:

-Number of database tables expands

Joining the larger number of tables reduces system speed

• Conflicts are often resolved through compromises that may include denormalization
• Defects of unnormalized tables:

-Data updates are less efficient because tables are larger

-Indexing is more cumbersome

-No simple strategies for creating virtual tables known as views


Chapter 7: Introduction to Structured Query
Language (SQL)
Data definition command

Data manipulation command


Tasks to be completed before Using a New RDBMS

• Create database structure


• RDBMS creates physical files that will hold database
• Differs from one RDBMS to another
• Authentication: Process DBMS uses to verify that only registered users access the data
• Required for the creation of tables
• User should log on to RDBMS using user ID and password created by database
administrator

The Database Schema

• Logical group of database objects related to each other


• Command
• CREATE SCHEMA AUTHORIZATION {creator};
• Seldom used directly

Common SQL Data Types


Numeric
NUMBER(L,D) or NUMERIC(L,D)

Character
CHAR(L)
VARCHAR(L) or VARCHAR2(L)

Date
DATE

Creating Table Structures


• Use one line per column (attribute) definition
• Use spaces to line up attribute characteristics and constraints
• Table and attribute names are capitalized
• Features of table creating command sequence
• NOT NULL specification
• UNIQUE specification
• Syntax to create table
• CREATE TABLE tablename();
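
A short sketch of the conventions listed above (one attribute per line, aligned characteristics, NOT NULL and UNIQUE specifications); the CUSTOMER table and its columns are hypothetical examples:

CREATE TABLE CUSTOMER (
    CUS_CODE   INTEGER      NOT NULL UNIQUE,   -- will serve as the primary key
    CUS_LNAME  VARCHAR(25)  NOT NULL,
    CUS_FNAME  VARCHAR(20)  NOT NULL,
    CUS_PHONE  CHAR(8)      NOT NULL,
    PRIMARY KEY (CUS_CODE)
);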

Primary Key and Foreign Key


• Primary key attributes contain both a NOT NULL and a UNIQUE specification
• RDBMS will automatically enforce referential integrity for foreign keys
• Command sequence ends with semicolon
• ANSI SQL allows use of following clauses to cover CASCADE, SET NULL, or SET DEFAULT
• ON DELETE and ON UPDATE

SQL Constraints
SQL Indexes

When primary key is declared, DBMS automatically creates unique index

Composite index:

• Is based on two or more attributes


• Prevents data duplication
• Syntax to create SQL indexes
• CREATE INDEX indexname ON tablename(column1 [, column2]);
• Syntax to delete an index
• DROP INDEX indexname;
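
A hedged example of the index syntax above, using the hypothetical CUSTOMER table from the earlier sketch; the composite UNIQUE index prevents two rows from sharing the same name and phone combination:

-- simple index to speed up searches on customer last name
CREATE INDEX CUS_LNAME_DX ON CUSTOMER (CUS_LNAME);

-- composite unique index on two attributes; also prevents duplicate combinations
CREATE UNIQUE INDEX CUS_NAME_PHONE_DX ON CUSTOMER (CUS_LNAME, CUS_PHONE);

-- remove an index that is no longer needed
DROP INDEX CUS_LNAME_DX;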

Data Manipulation Commands

Inserting Table Rows with a SELECT Subquery


 Syntax
• INSERT INTO tablename SELECT columnlist FROM tablename
• Used to add multiple rows using another table as source
• SELECT command - Acts as a subquery and is executed first
• Subquery: Query embedded/nested inside another query
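
A sketch of the subquery form described above, assuming hypothetical PART and PRODUCT tables with compatible columns:

-- copy selected rows from PRODUCT into PART; the SELECT acts as a subquery and runs first
INSERT INTO PART (PART_CODE, PART_DESCRIPT, PART_PRICE)
SELECT P_CODE, P_DESCRIPT, P_PRICE
FROM PRODUCT
WHERE P_PRICE > 100.00;   -- only rows meeting this condition are copied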

Selecting Rows Using Conditional Restrictions


 Following syntax enables to specify which rows to select
 SELECT columnlist
 FROM tablelist
 [WHERE conditionlist];
 Used to select partial table contents by placing restrictions on the rows
 Optional WHERE clause
 Adds conditional restrictions to the SELECT statement
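
A minimal example of a conditional restriction, assuming a hypothetical PRODUCT table with a V_CODE column:

-- list only the products supplied by vendor 21344
SELECT P_CODE, P_DESCRIPT, P_PRICE
FROM PRODUCT
WHERE V_CODE = 21344;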
Comparison Operators

 Add conditional restrictions on selected table contents


 Used on:
 Character attributes
 Dates


Comparison Operators: Computed Columns and Column Aliases


 SQL accepts any valid expressions/formulas in the computed columns
 Alias: Alternate name given to a column or table in any SQL statement to improve the
readability
 Computed column, an alias, and date arithmetic can be used in a single query
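
A sketch combining a computed column and a column alias, assuming hypothetical PRODUCT columns P_QOH (quantity on hand) and P_PRICE:

-- compute the total value of each product line and name the result TOTVALUE
SELECT P_DESCRIPT,
       P_QOH,
       P_PRICE,
       P_QOH * P_PRICE AS TOTVALUE   -- computed column given an alias
FROM PRODUCT;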

Arithmetic operators

 The Rule of Precedence: Establish the order in which computations are completed
 Perform:
 Operations within parentheses
 Power operations
 Multiplications and divisions
 Additions and subtractions


Special Operators
Advanced Data Definition Commands

 ALTER TABLE command: To make changes in the table structure


 Keywords use with the command
 ADD - Adds a column
 MODIFY - Changes column characteristics
 DROP - Deletes a column
 Used to:
 Add table constraints
 Remove table constraints

Changing Column’s Data Type

 ALTER can be used to change data type

 Some RDBMSs do not permit changes to data types unless column is empty

 Syntax –

 ALTER TABLE tablename MODIFY (columnname(datatype));

Changing Column’s Data Characteristics

 Use ALTER to change data characteristics


 Changes in column’s characteristics are permitted if changes do not alter the existing data
type
 Syntax
 ALTER TABLE tablename MODIFY (columnname(characteristic));

Adding Column, Dropping Column

 Adding a column
o Use ALTER and ADD
 Do not include the NOT NULL clause for new column
 Dropping a column
o Use ALTER and DROP
 Some RDBMSs impose restrictions on the deletion of an attribute

Advanced Data Updates

 UPDATE command updates only data in existing rows


 If a relationship is established between entries and existing columns, the relationship can assign
values to appropriate slots
 Arithmetic operators are useful in data updates
 In Oracle, the ROLLBACK command undoes any changes made since the last COMMIT (for example, the preceding UPDATE statements)
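
A hedged sketch of an arithmetic update and its reversal, assuming the hypothetical PRODUCT table used earlier:

-- raise the price of all products supplied by vendor 21344 by 10 percent
UPDATE PRODUCT
SET    P_PRICE = P_PRICE * 1.10
WHERE  V_CODE = 21344;

-- undo all uncommitted changes (everything since the last COMMIT)
ROLLBACK;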

Copying Parts of Tables

 SQL permits copying contents of selected table columns


 Data need not be reentered manually into newly created table(s)
 Table structure is created
 Rows are added to new table using rows from another table

Adding Primary and Foreign Key Designations

 ALTER TABLE command


 Followed by a keyword that produces the specific change one wants to make
 Options include ADD, MODIFY, and DROP
 Syntax to add or modify columns
 ALTER TABLE tablename
 {ADD | MODIFY} ( columnname datatype [ {ADD | MODIFY} columnname datatype] ) ;
 ALTER TABLE tablename
 ADD constraint [ ADD constraint ] ;

Deleting a Table from the Database

 DROP TABLE: Deletes table from database


 Syntax - DROP TABLE tablename;
 Can drop a table only if it is not the one side of any relationship
 RDBMS generates a foreign key integrity violation error message if the table is dropped
Chapter 8: Advanced SQL
Data Definition Commands:

Creating the database

• Before a new RDBMS can be used, the database structure and the tables that will hold the
end-user data must be created

• The database schema- Logical group of database objects—such as tables and indexes—that
are related to each other

• Data types- Character, numeric, and date

Creating table structures

SQL constraints

• FOREIGN KEY

• NOT NULL

• UNIQUE

• DEFAULT

• CHECK

• Create a table with a SELECT statement

 Rapidly creates a new table based on selected columns and rows of an existing table
using a subquery
 Automatically copies all of the data rows returned

• SQL indexes

 CREATE INDEX improves the efficiency of searches and avoids duplicate column values
 DROP Index deletes an index.
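
The "create a table with a SELECT statement" feature listed above can be sketched as follows; the PART and PRODUCT names are hypothetical, and the CREATE TABLE ... AS form shown is Oracle-style (some RDBMSs use SELECT ... INTO instead):

-- build PART from selected PRODUCT columns; the rows returned are copied automatically
CREATE TABLE PART AS
SELECT P_CODE     AS PART_CODE,
       P_DESCRIPT AS PART_DESCRIPT,
       P_PRICE    AS PART_PRICE,
       V_CODE
FROM PRODUCT;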

Altering table structures

 All changes in the table structure are made by using the ALTER TABLE command followed by a
keyword that produces the specific change you want to make
 ADD, MODIFY, and DROP
 Changing a column’s data type
 ALTER
 Changing a column’s data characteristics
 If the column to be changed already contains data, you can make changes in the column’s
characteristics if those changes do not alter the data type
 Adding a column
 You can alter an existing table by adding one or more columns
 Be careful not to include the NOT NULL clause for the new column
 Adding primary key, foreign key, and check constraints
 Primary key syntax:

ALTER TABLE PART

ADD PRIMARY KEY (PART_CODE);

 Foreign key syntax:

ALTER TABLE PART

ADD FOREIGN KEY (V_CODE) REFERENCES

VENDOR;

 Check constraint syntax:


ALTER TABLE PART

ADD CHECK (PART_PRICE >= 0);

 Dropping a column

Syntax:

ALTER TABLE VENDOR

DROP COLUMN V_ORDER;

 Deleting a table from the database

Syntax:

DROP TABLE PART;

Data manipulation command


 Adding table rows
INSERT command syntax:

INSERT INTO tablename VALUES (value1, value2, …, valuen)

 Inserting rows with null attributes: use NULL entry


 Inserting rows with optional attributes: indicate attributes that have required values

• Inserting table rows with a SELECT subquery

• Add multiple rows to a table, using another table as the source, at the same time

• SELECT syntax:

INSERT INTO target_tablename[(target_columnlist)]

SELECT source_columnlist

FROM source_tablename;

• Saving table changes

• COMMIT command syntax:

COMMIT [WORK]

Updating table rows

• UPDATE command is used to modify data in a table

• UPDATE syntax:

UPDATE tablename

SET columnname = expression [, columnname =expression]

[WHERE conditionlist ];

• Deleting table rows

DELETE statement syntax:

DELETE FROM tablename

[WHERE conditionlist ];

• Restoring table contents

The ROLLBACK command is used to restore the database to its previous condition. Syntax: ROLLBACK;

Virtual Tables creating view


• View: virtual table based on a SELECT query
Base tables: tables on which the view is based

• CREATE VIEW statement: data definition command that stores the subquery specification in the data
dictionary

CREATE VIEW command syntax:

CREATE VIEW viewname AS SELECT query
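
A small example of the CREATE VIEW syntax above, assuming a hypothetical PRODUCT base table:

-- virtual table listing only higher-priced products; it always reflects current PRODUCT data
CREATE VIEW PRICEGT50 AS
SELECT P_DESCRIPT, P_QOH, P_PRICE
FROM   PRODUCT
WHERE  P_PRICE > 50.00;

-- the view is then queried like any table
SELECT * FROM PRICEGT50;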

Updating views

• Used to update attributes

• Batch update routine: pools multiple transactions into a single batch to update a master table
field in a single operation

• Updatable view restrictions

 GROUP BY expressions or aggregate functions cannot be used


 Set operators cannot be used
 Most restrictions are based on the use of JOINs or group operators in views

Sequences

• Many similarities in the use of sequences across these DBMSs

 Independent object in the database


 Have a name and can be used anywhere a value is expected
 Not tied to a table or column
 Generate a numeric value that can be assigned to any column in any table
 Table attribute with an assigned value can be edited and modified
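
A hedged Oracle-style sketch of a sequence supplying numeric key values; the sequence and table names are hypothetical, and other DBMSs use different mechanisms (e.g. IDENTITY or AUTO_INCREMENT columns):

-- independent database object that generates numbers on demand; not tied to any table
CREATE SEQUENCE CUS_CODE_SEQ START WITH 20010 NOCACHE;

-- NEXTVAL draws the next value and assigns it to the CUSTOMER key column
INSERT INTO CUSTOMER (CUS_CODE, CUS_LNAME)
VALUES (CUS_CODE_SEQ.NEXTVAL, 'Connery');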

Procedural SQL

 Performs a conditional or looping operation by isolating critical code and making all application
programs call the shared code

Better maintenance and logic control

 Persistent stored module (PSM): block of code

Contains standard SQL statements and procedural extensions that is stored and executed at
the DBMS server

 Procedural Language SQL (PL/SQL)

• Use and storage of procedural code and SQL statements within the database

• Merging of SQL and traditional programming constructs

 Procedural code is executed as a unit by DBMS when invoked by end user

• Anonymous PL/SQL blocks


• Triggers

• Stored procedures

• PL/SQL functions

Triggers

 Procedural SQL code automatically invoked by RDBMS when given data manipulation event
occurs
 Parts of a trigger definition

• Triggering timing: indicates when trigger’s PL/SQL code executes

• Triggering event: statement that causes the trigger to execute

- Triggering level: statement- and row-level

- Triggering action: PL/SQL code enclosed between the BEGIN and END keywords

 DROP TRIGGER trigger_name command

• Deletes a trigger without deleting the table

• Trigger action based on conditional DML predicates

• Actions depend on the type of DML statement that fires the trigger
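
A hedged Oracle PL/SQL sketch showing the parts of a trigger definition named above (timing, event, level, and action); the PRODUCT table and its columns are hypothetical:

CREATE OR REPLACE TRIGGER TRG_PROD_UPDATE_DATE
BEFORE INSERT OR UPDATE OF P_QOH ON PRODUCT   -- triggering timing and triggering event
FOR EACH ROW                                  -- triggering level: row-level
BEGIN                                         -- triggering action between BEGIN and END
  :NEW.P_UPDATE_DATE := SYSDATE;              -- record when the quantity on hand changed
END;
/

-- remove the trigger without deleting the PRODUCT table
DROP TRIGGER TRG_PROD_UPDATE_DATE;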

Stored Procedures

• Named collection of procedural and SQL statements

• Stored in the database

• Can be used to encapsulate and represent business transactions

 Advantages

• Reduce network traffic and increase performance

• Decrease code duplication by means of code isolation and code sharing
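
A hedged Oracle PL/SQL sketch of a stored procedure that encapsulates a simple business update in shared code; the table, procedure, and parameter names are hypothetical:

CREATE OR REPLACE PROCEDURE PRC_PROD_DISCOUNT
  (p_vcode IN NUMBER, p_pct IN NUMBER)
AS
BEGIN
  -- apply a percentage discount to every product supplied by the given vendor
  UPDATE PRODUCT
     SET P_PRICE = P_PRICE * (1 - p_pct / 100)
   WHERE V_CODE = p_vcode;
END;
/
-- invoked as a single unit, e.g. from SQL*Plus:
-- EXEC PRC_PROD_DISCOUNT(21344, 10);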

PL/SQL Processing with Cursors

Cursor: special construct used to hold data rows returned by a SQL query

• Implicit cursor: automatically created when SQL statement returns only one value

• Explicit cursor: holds the output of a SQL statement that may return two or more rows

• Syntax:

CURSOR cursor_name IS select-query;


• Cursor-style processing involves retrieving data from the cursor one row at a time

• Current row is copied to PL/SQL variables
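
A hedged PL/SQL sketch of explicit cursor processing, one row at a time; the PRODUCT table is hypothetical:

DECLARE
  -- explicit cursor: the query may return two or more rows
  CURSOR prod_cursor IS
    SELECT P_CODE, P_PRICE FROM PRODUCT WHERE P_PRICE > 50;
  v_code  PRODUCT.P_CODE%TYPE;    -- PL/SQL variables that receive the current row
  v_price PRODUCT.P_PRICE%TYPE;
BEGIN
  OPEN prod_cursor;
  LOOP
    FETCH prod_cursor INTO v_code, v_price;   -- copy the current row into the variables
    EXIT WHEN prod_cursor%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE(v_code || ' costs ' || v_price);  -- requires SERVEROUTPUT in SQL*Plus
  END LOOP;
  CLOSE prod_cursor;
END;
/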

PL/SQL Stored Functions


Stored function: named group of procedural and SQL statements that returns a value

• Indicated by a RETURN statement in its program code

• Can be invoked only from within stored procedures or triggers

• Cannot be invoked from SQL statements unless the function follows some very specific
compliance rules
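
A hedged PL/SQL sketch of a stored function; the RETURN statement marks it as a function, and it would typically be called from a stored procedure or trigger. The name and tax rate are hypothetical:

CREATE OR REPLACE FUNCTION FN_PRICE_WITH_TAX (p_price IN NUMBER)
RETURN NUMBER
AS
BEGIN
  -- returns the price plus an assumed 16 percent tax
  RETURN p_price * 1.16;
END;
/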

Embedded SQL

SQL statements contained within an application programming language

Host language: any language that contains embedded SQL statements

• Differences between SQL and procedural languages

• Run-time mismatch

- SQL is executed one instruction at a time

- Host language runs at client side in its own memory space

• Processing mismatch

- Conventional programming languages process one data element at a time

- Newer programming environments manipulate data sets in a cohesive manner

• Data type mismatch - Data types provided by SQL might not match data types used in different host
languages

• Embedded SQL framework defines:

• Standard syntax to identify embedded SQL code within the host language

• Standard syntax to identify host variables

• Communication area used to exchange status and error information between SQL and host
language
Chapter 9: Database design
The Information System

 A database is a carefully designed collection of facts within a larger system called an information
system.

 The information system collects, stores, transforms, and retrieves data, helping to manage both
data and information.

 People, hardware, software, databases, application programs, and procedures are all part of a
complete information system.

 Systems analysis establishes the need for an information system, while systems development is
the process of creating it.

 Information systems today should be aligned with strategic business goals and integrated with
the company’s wider information systems architecture.

 Applications within the system turn data into useful information for decision-making through
reports, tabulations, and graphics.

 The performance of an information system depends on database design, application design, and
administrative procedures.

 Database design should focus on creating complete, normalized, and integrated models that are
flexible and scalable over time.

 Procedures for database development are applicable across different types of information
systems, but the scale may vary.

 The Systems Development Life Cycle (SDLC) helps understand the activities required to develop
and maintain information systems.

 Different methodologies like Unified Modeling Language (UML), Rapid Application Development
(RAD), and Agile Software Development offer alternative approaches but work within the same
framework.
The Systems Development Life Cycle

 The Systems Development Life Cycle (SDLC) guides the history of an information system.

 SDLC offers a comprehensive view for designing and developing databases and applications.

 The traditional SDLC includes five phases: planning, analysis, detailed systems design,
implementation, and maintenance.

 Planning phase involves assessing the company's objectives and considering whether to
continue, modify, or replace the existing system.

 Feasibility study in planning phase addresses technical, cost, and operational aspects of the new
system.

 Analysis phase examines user requirements, existing systems, and creates a logical system
design.

 Detailed Systems Design phase completes the design process and plans for system conversion
and training.

 Implementation phase involves installing hardware, software, and application programs, testing,
debugging, and training end-users.

 Maintenance phase handles changes and updates to the system, including corrective, adaptive,
and perfective maintenance activities.

 CASE tools aid in software engineering, making systems more structured, documented, and
standardized, thus extending their operational life.

The Database Life Cycle

• The Database Life Cycle (DBLC) comprises six phases: database initial study, database design,
implementation and loading, testing and evaluation, operation, and maintenance and evolution.
• The initial study phase involves analyzing the company situation, defining problems and
constraints, defining objectives, and setting scope and boundaries.
• Analyzing the company situation includes understanding the company's operational
components, structure, and mission.
• Defining problems involves collecting information on how the existing system functions and
identifying areas of inefficiency or failure.
• Defining objectives aims to address major problems identified during the initial study, focusing
on creating efficient solutions.
• Setting scope and boundaries determines the extent of the design according to operational
requirements and external constraints like time, budget, and existing hardware and software.
Database Design
• The second phase of the Database Life Cycle (DBLC) focuses on database design.
• This phase ensures the final product meets user and system requirements.
• Two views of data are considered: the business view and the designer's view.
• Database design is an iterative process with three essential stages: conceptual, logical, and
physical design.
• Most designs and implementations are based on the relational model.
• The design process includes selecting the DBMS, creating the database, and loading or
converting the data.
• Implementation may involve installing the DBMS, creating the database, and loading data.
• Cloud-based database services may incur additional costs for data loading due to network traffic
charges.
Testing and Evaluation
• In the implementation phase, decisions from the design phase are put into action for integrity,
security, performance, and recoverability.
• The DBA tests and fine-tunes the database, ensuring its performance aligns with expectations,
often in conjunction with application programming.
• Database testing ensures data integrity, security, and adherence to management policies.
• Testing also addresses broader security concerns like physical security, password security,
access rights, audit trails, data encryption, and diskless workstations.
• Database performance is evaluated based on various factors such as hardware, software
environment, data characteristics, and configuration parameters.
• Evaluation includes broader system tests, integration issues, deployment plans, user training,
and finalizing system documentation.
• Backup and recovery plans are tested to protect against data loss, often employing fault-tolerant
components and automated backup functions.
• Recovery procedures involve restoring the database from backups following hardware or
software failures, with recovery processes varying based on the extent of the failure.
• Testing, evaluation, and modification iteratively continue until the system is certified for
operational use.
Maintenance and Evolution
 Routine maintenance tasks are crucial for database administrators.
 Periodic maintenance activities include:
 Preventive maintenance (backup)
 Corrective maintenance (recovery)
 Adaptive maintenance (performance enhancement, adding entities and attributes)
 Assignment and maintenance of access permissions
 Generating database access statistics for audits and performance monitoring
 Conducting periodic security audits
 Creating system usage summaries for internal billing or budgeting purposes
Conceptual Design
• Second phase of DBLC
• Comprises conceptual, logical, and physical design stages
• Aimed at software and hardware-independent database design
Conceptual Design:
• Goal: Develop database independent of software and physical details
• Output: Conceptual data model describing entities, attributes, relationships, constraints
• Utilizes data modeling for abstract database structure representing real-world objects
• Flexibility needed for future hardware and database model choices
Minimal Data Rule:
• All needed data must be in the model, all data in the model must be needed
• Focus on future data needs to ensure flexibility and endurance of investment
Steps in Conceptual Design:
• Data analysis and requirements
• Entity relationship modeling and normalization
• Data model verification
• Distributed database design
Data Analysis and Requirements

 Information Needs:

 Output requirements (reports, queries)

 End-user data views

 Information sources and extraction methods

 Data elements, attributes, relationships, volume, usage, transformations

 Data Gathering Methods:

 Developing end-user data views

 Observing existing system

 Interfacing with systems design group

 Business Rules:

 Derived from operations description

 Define entities, attributes, relationships, constraints

 Critical for database design accuracy and effectiveness

Entity Relationship Modeling and Normalization

 Standards Enforcement:

 Use of diagrams, symbols, writing style, layout

 Ensures effective communication and documentation

 Steps in Developing ER Model:

 Business rule analysis

 Main entity identification

 Relationship definition
 Attribute, primary key, and foreign key definition

 Entity normalization

 ER diagram completion

 Model validation against end-user requirements

 Iterative Process:

 Continuous refinement until model accurately reflects system demands

 Activities often parallel and iterative

Data Model Verification

 Verification Process:

 Test against end-user data views, transactions, security, constraints

 Modules defined based on business functions

 Modules verified against system requirements

 Iterative process for each module

 Merge into Enterprise ER Model:

 Merge module ER fragments into single model

 Validate entities and attributes

 Ensure consistency and accuracy in design

 Iterative Nature:

 Continuous verification against business transactions and requirements

 Sequential process for each module


Distributed Database Design:

 Some databases may need distribution across multiple locations.

 Database fragments, subsets of a database, may be stored at various locations.

 Design ensures integrity, security, and performance of distributed database fragments.

 Detailed in Chapter 12, Distributed Database Management Systems.

DBMS Software Selection:


 Critical for smooth system operation.

 Considerations include cost, features, underlying model, portability, and hardware


requirements.

 Tools like query by example (QBE), data dictionaries, and security influence selection.

 End users must be aware of limitations.

Logical Design:

 Second stage in database design process.

 Aims to create an enterprise-wide database based on specific data model.

 Involves mapping conceptual model to logical constructs.

 Steps include mapping entities, relationships, and constraints; validating model; ensuring
normalization.

Database Design Strategies:

 Top-down approach: Identifies data sets, defines data elements, suitable for larger, complex
systems.

 Bottom-up approach: Identifies data elements, groups them into sets, suitable for smaller
databases.

 Often complementary, selection depends on problem scope and personal preference.

Centralized Versus Decentralized Design:

 Centralized design: All decisions made centrally by small group, suitable for small-scale
problems.

 Decentralized design: Complex projects divided into modules, each designed independently,
suitable for large, distributed systems.

 Requires precise definition of boundaries and interrelations among data subsets.


Chapter 10: Transaction Management
and Concurrency Control
What Is a Transaction?

 A transaction in a database involves actions like writing new invoices, updating inventory, and modifying account transactions.

 Transactions include various SQL statements like SELECT, UPDATE, INSERT, or a combination of these.

 A transaction is a logical unit of work that must be entirely completed or entirely aborted, ensuring database consistency.

 Database integrity is maintained by executing transactions in a consistent state, adhering to data integrity constraints.

Evaluating Transaction Results

 Transactions can involve single or multiple SQL statements, each constituting a database
request.

 Even read-only transactions, like SELECT queries, are considered transactions because
they access the database.

 The text provides an example transaction involving sales, showing how different tables
are affected and how the database state changes.

 Successful transactions are finalized with a COMMIT statement, ensuring changes are
permanently saved.

Handling Transaction Failures

 Incomplete transactions due to system failures can leave the database in an inconsistent
state.
 Sophisticated DBMSs support transaction management to handle such failures and roll
back to a consistent state.

 Users are responsible for ensuring that transactions accurately represent real-world
events to maintain database integrity.

 DBMSs can enforce integrity rules like primary key constraints automatically to validate
transactions and prevent errors.

Transaction Properties

Atomicity: Ensures all parts of a transaction are treated as a single unit of work. If any part fails,
the entire transaction is aborted.

Consistency: Maintains database integrity by transitioning it from one consistent state to another. If any part of a transaction violates integrity constraints, the entire transaction is aborted.

Isolation: Prevents concurrent transactions from accessing data used by another transaction
until it completes, ensuring data integrity.

Durability: Guarantees that once changes are committed, they cannot be lost, even in the event
of system failure.

Serializability: Ensures that the results of concurrent transaction execution are consistent with
those of a serial execution, crucial in multiuser and distributed databases.

Transaction Management with SQL: ANSI standards govern SQL transactions, supported by
COMMIT and ROLLBACK statements.

• COMMIT: Permanently records changes within the database, ending the transaction.
• ROLLBACK: Aborts changes and restores the database to its previous state.
• Transactions begin implicitly with the first SQL statement, although specific
implementations may have transaction management statements.
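
A hedged sketch of a two-statement sales transaction, assuming hypothetical INVOICE and PRODUCT tables; both changes succeed or are rolled back together as one logical unit of work:

-- record a sale of two units of product '123' on invoice 1009
INSERT INTO INVOICE (INV_NUM, CUS_CODE, INV_AMOUNT)
VALUES (1009, 10016, 45.95);

UPDATE PRODUCT
SET    P_QOH = P_QOH - 2
WHERE  P_CODE = '123';

COMMIT;   -- makes both changes permanent and ends the transaction
-- if either statement had failed, ROLLBACK; would restore the previous consistent state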

Transaction Log
 Maintains a record of all transactions updating the database, crucial for recovery.

 Stores transaction details like operation type, object names, before-and-after values, and
transaction boundaries.

 Used by the DBMS for recovery in cases of ROLLBACK, abnormal termination, or system failure.

Concurrency Control

 Manages simultaneous transaction execution in multiuser environments to ensure serializability.

 Aims to preserve isolation property to prevent issues like lost updates, uncommitted data, and
inconsistent retrievals.

 Problems like lost updates occur when concurrent transactions overwrite each other's changes.

 Uncommitted data arises when a transaction accesses data rolled back by another transaction.

 Inconsistent retrievals happen when a transaction reads data before and after updates by other
transactions, leading to erroneous results.
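
A minimal sketch of the lost update problem described above, written as two interleaved sessions against a hypothetical PRODUCT table that starts with PROD_QOH = 35:

-- Session 1 (T1 wants to add 100 units)
SELECT PROD_QOH FROM PRODUCT WHERE PROD_CODE = 'P1';   -- reads 35

-- Session 2 (T2 wants to subtract 30 units) reads before T1 writes
SELECT PROD_QOH FROM PRODUCT WHERE PROD_CODE = 'P1';   -- also reads 35

-- Session 1 writes its result and commits
UPDATE PRODUCT SET PROD_QOH = 135 WHERE PROD_CODE = 'P1';
COMMIT;

-- Session 2 writes a value computed from its stale read and commits
UPDATE PRODUCT SET PROD_QOH = 5 WHERE PROD_CODE = 'P1';
COMMIT;

-- Final stored value is 5, so T1's +100 update is lost.
-- Either serial order (T1 then T2, or T2 then T1) would have produced 105.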

Introduction to Transaction Consistency

• The concept of database transactions, emphasizing the need for consistency before and
after transaction execution
• It introduces the idea of temporary inconsistency during transaction execution,
especially when multiple tables and rows are updated.

Role of the Scheduler

• The scheduler, a special component of the DBMS, determines the order of operations
within concurrent transactions to ensure serializability and isolation.
• It achieves this by using concurrency control algorithms like locking or time stamping.

Efficient Resource Utilization

• The scheduler also ensures efficient utilization of CPU and storage resources by
interleaving transaction operations effectively.
• Without scheduling, transactions would be executed on a first-come, first-served basis,
leading to inefficiencies.

Data Isolation

• The scheduler ensures that transactions do not update the same data simultaneously,
thus preventing conflicts.
• Various conflict scenarios are discussed, highlighting the importance of proper
scheduling methods.

Concurrency Control with Locking Method

• Locking methods, including pessimistic locking, are introduced as common techniques to manage concurrency.

• Locks ensure exclusive access to data items by current transactions, preventing inconsistencies.

Lock Granularity

• Lock granularity refers to the level at which locks are applied, such as database, table,
page, row, or field.
• Different levels have varying degrees of restrictiveness and efficiency, with page-level
locks being most commonly used in multiuser DBMSs.

Database-Level Lock

Locks the entire database, suitable for batch processes but not for multiuser DBMSs due to slow
data access.

Table-Level Lock

Locks entire tables, causing traffic jams in multiuser environments and delaying transactions
unnecessarily

Page-Level Lock: Locks entire disk pages, allowing concurrent access to different parts of the
same table.

It's the most frequently used method for multiuser DBMSs.


Row-Level Lock: Less restrictive than table or page-level locks, allows concurrent access to
different rows on the same page, but increases overhead due to managing individual row locks.

Field-Level Lock: Allows concurrent access to different fields within the same row, but rarely
implemented due to high overhead and the practicality of row-level locks.
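
A brief, hedged sketch of how lock granularity appears in SQL. Syntax and defaults vary by DBMS and the table name is hypothetical; LOCK TABLE ... IN EXCLUSIVE MODE is Oracle/PostgreSQL-style, while SELECT ... FOR UPDATE acquires row-level locks in DBMSs such as Oracle, PostgreSQL, and MySQL/InnoDB:

-- Table-level lock: blocks other writers on the whole table until COMMIT or ROLLBACK
LOCK TABLE PRODUCT IN EXCLUSIVE MODE;

-- Row-level lock: locks only the selected row, so other rows on the same page
-- remain available to concurrent transactions
SELECT PROD_QOH
FROM   PRODUCT
WHERE  PROD_CODE = '123-XYZ'
FOR UPDATE;

UPDATE PRODUCT
SET    PROD_QOH = PROD_QOH - 1
WHERE  PROD_CODE = '123-XYZ';

COMMIT;   -- releases the locks held by the transaction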

Concurrency Control with Time Stamping

 Time stamping assigns a unique global time stamp to each transaction.

 Time stamps must be both unique and monotonically increasing.

 All operations within a transaction share the same time stamp.

 Conflicting operations are executed in time stamp order to ensure serializability.

 Disadvantages include increased memory needs and processing overhead.

Wait/Die and Wound/Wait Schemes

 Two schemes for managing concurrent transactions: wait/die and wound/wait.

 In wait/die, an older transaction requesting a resource held by a younger one waits; a younger transaction requesting a resource held by an older one dies (it is rolled back and rescheduled).

 In wound/wait, an older transaction requesting a resource held by a younger one wounds (preempts and rolls back) the younger transaction; a younger transaction requesting a resource held by an older one waits.

 Deadlocks can occur, so transactions have associated time-out values.

Concurrency Control with Optimistic Methods

 Optimistic approach assumes most operations don't conflict.

 Transactions proceed through read, validation, and write phases.

 Changes are made to a private copy of the database until validation.

 If validation fails, transaction restarts; if successful, changes are applied permanently.

ANSI Levels of Transaction Isolation

 ANSI SQL standard defines isolation levels: Read Uncommitted, Read Committed, Repeatable
Read, Serializable.
 Each level controls what data transactions can see during execution.

 Oracle and SQL Server provide additional isolation levels.

 Transaction isolation levels affect performance and data consistency.
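
A short, hedged example of selecting an ANSI isolation level before running a transaction (SQL Server-style syntax shown; other DBMSs differ slightly, and the query is hypothetical):

-- Strongest isolation: the transaction behaves as if it ran serially,
-- at the cost of additional locking overhead
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT SUM(INV_TOTAL) FROM INVOICE WHERE INV_DATE = '2024-01-15';
-- ... other statements that must see a stable view of the data ...
COMMIT;

-- Weakest isolation: may read uncommitted (dirty) data, but minimizes blocking
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;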

Database Recovery Management:

 Recovery restores a database from an inconsistent to a consistent state.

 Recovery techniques are based on atomic transaction property.

 Hardware/software failures, human-caused incidents, and natural disasters can cause critical
errors.

 Recovery techniques also apply to system and database after critical errors occur.

Transaction Recovery: Involves restoring a database to a consistent state after a failure.

Write-Ahead-Log Protocol: Ensures transaction logs are written before database data is updated.

Redundant Transaction Logs: Multiple copies of transaction logs prevent data loss due to disk
failures.

Database Buffers: Temporary storage in primary memory to speed up disk operations.

Database Checkpoints: Operations in which updated database buffers are written to disk, ensuring
that the database and the transaction log are synchronized.

Deferred Write (Deferred Update) Technique: Transaction operations update only the transaction
log during execution; the physical database is updated after the transaction reaches its commit
point.

Write-Through (Immediate Update) Technique: The physical database is immediately updated by
transaction operations as they are executed, even before the transaction reaches its commit point.


Summary of Transaction Management: Covers transaction properties, SQL support, concurrency
control, and serialization.

Concurrency Control: Techniques to coordinate simultaneous transaction execution.

Locks: Ensure exclusive access to data items during transactions.

Serializability: Can be ensured by two-phase locking; note that two-phase locking by itself does not prevent deadlocks.

Time Stamping Methods: Assigns unique time stamps to transactions to resolve conflicts.

Optimistic Methods: Assumes most transactions don't conflict, updates private copies of data.

Database Recovery: Restores database to consistent state after critical events like hardware errors.
Chapter 11: Database Performance Tuning
and Query Optimization
Database Performance-Tuning Concepts

• Goal of database performance is to execute queries as fast as possible

• Database performance tuning

– Set of activities and procedures designed to reduce response time of database system

• All factors must be checked to ensure that each one operates at its optimum level and has
sufficient resources to minimize occurrence of bottlenecks

• Good database performance starts with good database design


DBMS Architecture

• All data in database are stored in data files

• Data files

– Automatically expand in predefined increments known as extents

– Generally grouped in file groups or table spaces

• Table space or file group is logical grouping of several data files that store data with similar
characteristics

• The DBMS retrieves data from permanent storage and places it in RAM


• Data cache or buffer cache is shared, reserved memory area that stores most recently accessed
data blocks in RAM
• SQL cache or procedure cache is shared, reserved memory area that stores most recently
executed SQL statements or PL/SQL procedures, including triggers and functions.
Database Statistics
• Refers to number of measurements about database objects and available resources
– Tables
– Indexes
– Number of processors used
– Processor speed
– Temporary space available
• Make critical decisions about improving query processing efficiency
• Can be gathered manually by DBA or automatically by DBMS
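
As a hedged illustration of manual statistics gathering, the command is DBMS-specific; the table name is hypothetical:

-- Oracle (from SQL*Plus): gather statistics for one table with the DBMS_STATS package
EXEC DBMS_STATS.GATHER_TABLE_STATS('SALESCO', 'CUSTOMER');

-- MySQL / MariaDB
ANALYZE TABLE CUSTOMER;

-- SQL Server
UPDATE STATISTICS CUSTOMER;
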
Query Processing
• DBMS processes queries in three phases
– Parsing
– Execution
– Fetching
SQL Parsing Phase

• Breaking down (parsing) query into smaller units and transforming original SQL query into
slightly different version of original SQL code
– Fully equivalent

• Optimized query results are always the same as original query

– More efficient

• Optimized query will almost always execute faster than original query

SQL Execution Phase

• All I/O operations indicated in access plan are executed

SQL Fetching Phase

• Rows of resulting query result set are returned to client

• DBMS may use temporary table space to store temporary data


Indexes and Query Optimization

• Indexes

– Crucial in speeding up data access

– Facilitate searching, sorting, and using aggregate functions as well as join operations

– Ordered set of values that contains index key and pointers

• More efficient to use index to access table than to scan all rows in table sequentially
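
A minimal sketch (hypothetical table and columns): creating an index on a frequently searched column so the optimizer can avoid a full sequential scan:

-- Index on the column used in the search criteria
CREATE INDEX IDX_CUSTOMER_STATE ON CUSTOMER (CUS_STATE);

-- Likely to use the index: the indexed column appears by itself in WHERE
SELECT CUS_NUM, CUS_NAME
FROM   CUSTOMER
WHERE  CUS_STATE = 'FL';

-- Less likely to use it: wrapping the indexed column in a function
-- usually prevents a simple index lookup
SELECT CUS_NUM, CUS_NAME
FROM   CUSTOMER
WHERE  UPPER(CUS_STATE) = 'FL';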

Optimizer Choices

• Rule-based optimizer

– Uses set of preset rules and points to determine best approach to execute query

• Cost-based optimizer

– Uses sophisticated algorithms based on statistics about objects being accessed to determine best approach to execute query

SQL Performance Tuning

• Evaluated from client perspective

– Most current-generation relational DBMSs perform automatic query optimization at the server end

– Most SQL performance optimization techniques are DBMS-specific and are rarely
portable

Index Selectivity

• Index selectivity is a measure of how likely an index is to be used in query processing

• Indexes are likely to be used when:

– Indexed column appears by itself in the search criteria of a WHERE or HAVING clause

– Indexed column appears by itself in a GROUP BY or ORDER BY clause

– A MAX or MIN function is applied to the indexed column

– Data sparsity on the indexed column is high

Conditional Expressions

• Normally expressed within WHERE or HAVING clauses of SQL statement

• Restricts output of query to only rows matching conditional criteria

Query Formulation

• Identify what columns and computations are required

• Identify source tables

• Determine how to join tables

• Determine what selection criteria is needed

• Determine in what order to display output

DBMS Performance Tuning

• Includes global tasks such as managing DBMS processes in primary memory and structures in
physical storage

• Includes applying several practices examined in previous section

• DBMS performance tuning at server end focuses on setting parameters used for:

– Data cache

– SQL cache

– Sort cache

– Optimizer mode

• Some general recommendations for creation of databases:


– Use RAID (Redundant Array of Independent Disks) to provide balance between
performance and fault tolerance

– Minimize disk contention

– Put high-usage tables in their own table spaces


Chapter 12: Distributed Database
Management Systems

The Evolution of Distributed Database Management Systems

• Distributed database management system (DDBMS)

– Governs storage and processing of logically related data over interconnected computer
systems in which both data and processing functions are distributed among several
sites

• Centralized database required that corporate data be stored in a single central site

• Dynamic business environment and centralized database’s shortcomings spawned a demand for
applications based on data access from different sources at multiple locations (PDAs for
example)

Distributed Processing and Distributed Databases

• Distributed processing

– Database’s logical processing is shared among two or more physically independent sites

– Connected through a network

– For example, the data input/output (I/O), data selection, and data validation might be
performed on one computer, and a report based on that data might be created on
another computer

• Distributed database

– Stores logically related database over two or more physically independent sites

– Database composed of database fragments


DDBMS Advantages

• Advantages include:

– Data are located near “greatest demand” site

– Faster data access

– Faster data processing

– Growth facilitation: New sites can be added to the network without affecting the
operations of other sites.

– Improved communications: Because local sites are smaller and located closer to
customers

– Reduced operating costs: Add workstation not mainframe

– User-friendly interface

– Less danger of a single-point failure

– Processor independence: end user is able to access any available copy of the data, and
an end user’s request is processed by any processor at the data location.

DDBMS Disadvantages
• Disadvantages include:

– Complexity of management and control

– Security

– Lack of standards

– Increased storage requirements: Multiple copies of data are required at different sites

– Increased training cost

Characteristics of Distributed Database Management Systems

• Application interface: interact with the end user, application programs, and other DBMSs

• Validation: to analyze data requests for syntax correctness

• Transformation: to decompose complex requests into atomic data request components

• Query optimization: to find the best access strategy

• Mapping: to determine the data location of local and remote fragments

• I/O interface: to read or write data from or to permanent local storage

• Formatting: to prepare the data for presentation to the end user or to an application program

• Security: to provide data privacy at both local and remote databases

• Backup and recovery: to ensure the availability and recoverability of DB in case of a failure

• DB administration

• Concurrency control: to manage simultaneous data access and to ensure data consistency

• Transaction management: to ensure that the data moves from one consistent state to another

• Must perform all the functions of centralized DBMS

• Must handle all necessary functions imposed by distribution of data and processing

• Must perform these additional functions transparently to the end user

DDBMS Components

• Must include (at least) the following components:

– Computer workstations
– Network hardware and software

– Communications media

– Transaction processor (application processor, transaction manager)

Software component, found in each computer that requests data, that receives and processes the application's data requests

Single-Site Processing, Single-Site Data (SPSD)

• All processing is done on single CPU or host computer (mainframe, midrange, or PC)

• All data are stored on host computer’s local disk

• Processing cannot be done on the end user's side of the system, although several processes may run concurrently on the host computer while accessing a single data processor (DP)

• Typical of most mainframe and midrange computer DBMSs

• DBMS is located on host computer, which is accessed by dumb terminals connected to it

Multiple-Site Processing, Single-Site Data (MPSD)

• Multiple processes run on different computers sharing single data repository

• MPSD scenario requires network file server running conventional applications that are accessed
through LAN

• Many multiuser accounting applications, running under personal computer network, fit such a
description

Multiple-Site Processing, Multiple-Site Data (MPMD)

• Fully distributed database management system with support for multiple data processors and
transaction processors at multiple sites

• Classified as either homogeneous or heterogeneous

• Homogeneous DDBMSs

– Integrate only one type of centralized DBMS over a network

• Heterogeneous DDBMSs

– Integrate different types of centralized DBMSs over a network

DDBMS Transparency Features

• Allow the end user to feel like the database's only user

• Features include:

– Distribution transparency

– Transaction transparency

– Failure transparency

– Performance transparency

– Heterogeneity transparency

Distribution Transparency: Allows management of physically dispersed database as though it were a centralized database

Transaction Transparency

• Ensures database transactions will maintain distributed database’s integrity and consistency

• Ensures transaction completed only when all database sites involved complete their part

• Distributed database systems require complex mechanisms to manage transactions

• To ensure consistency and integrity

Distributed Requests and Distributed Transactions

• Remote request: single SQL statement accesses data from single remote database

• Remote transaction: accesses data at single remote site

• Distributed transaction: requests data from several different remote sites on network

• Distributed request: single SQL statement references data at several DP sites
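
A hedged sketch of the difference between a remote request and a distributed request. The site-qualified names (SITE2, SITE3) and tables are hypothetical; real systems expose remote data through vendor-specific mechanisms such as database links or linked servers:

-- Remote request: a single SQL statement accessing data at one remote site
SELECT *
FROM   SITE2.CUSTOMER
WHERE  CUS_STATE = 'TN';

-- Distributed request: a single SQL statement referencing data at several sites
SELECT C.CUS_NAME, I.INV_TOTAL
FROM   SITE2.CUSTOMER C
JOIN   SITE3.INVOICE  I ON I.CUS_NUM = C.CUS_NUM;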

Performance Transparency

• Objective of query optimization routine is to minimize total cost associated with execution of
request

• Costs associated with request are function of:

– Access time (I/O) cost

– Communication cost
– CPU time cost

• Must provide:

– distribution transparency: Allows management of physically dispersed database as though it were a centralized database

Distributed Database Design

• Design concepts for centralized Database:

– The Relational Database Model

– Entity Relationship Modeling; and

– Normalization of Database Tables

• Three new issues for distributed Database:

– Data fragmentation

• How to partition database into fragments

– Data replication

• Which fragments to replicate

– Data allocation

• Where to locate those fragments and replicas

Data Fragmentation

• Breaks a single object (database or table) into two or more segments, or fragments

• Each fragment can be stored at any site over the computer network

• Information about data fragmentation is stored in the distributed data catalog (DDC), from which it
is accessed by the TP to process user requests (see the sketch below)
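
A hedged sketch of horizontal fragmentation, one common way to partition a table; the CUSTOMER table and the site assignments are hypothetical:

-- Fragment 1, stored at the site serving Tennessee customers
CREATE TABLE CUSTOMER_TN AS
    SELECT * FROM CUSTOMER WHERE CUS_STATE = 'TN';

-- Fragment 2, stored at the site serving Georgia customers
CREATE TABLE CUSTOMER_GA AS
    SELECT * FROM CUSTOMER WHERE CUS_STATE = 'GA';

-- The DDC records which fragment lives at which site; the TP can reassemble
-- the full table when a request spans both fragments:
SELECT * FROM CUSTOMER_TN
UNION ALL
SELECT * FROM CUSTOMER_GA;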

Data Replication

• Storage of data copies at multiple sites served by computer network


• Fragment copies can be stored at several sites to serve specific information requirements

– Can enhance data availability and response time

– Can help to reduce communication and total query costs

• Replication scenarios

– Fully replicated database

• Stores multiple copies of each database fragment at multiple sites

• Can be impractical due to amount of overhead

– Partially replicated database

• Stores multiple copies of some database fragments at multiple sites

• Most DDBMSs are able to handle the partially replicated database well

– Unreplicated database

• Stores each database fragment at single site

• No duplicate database fragments

Data Allocation

• Deciding where to locate data: which data to locate where

• Data distribution over computer network is achieved through data partition, data replication, or
combination of both

• Allocation strategies

– Centralized data allocation

• Entire database is stored at one site

– Partitioned data allocation

• Database is divided into several disjointed parts (fragments) and stored at several sites

– Replicated data allocation

• Copies of one or more database fragments are stored at several sites

Client/Server vs. DDBMS


• Way in which computers interact to form system

• Features user of resources, or client, and provider of resources, or server

• Can be used to implement a DBMS in which client is the TP and server is the DP

• Client/server advantages

• Less expensive than alternate minicomputer or mainframe solutions

• Allow end user to use microcomputer’s GUI, thereby improving functionality and
simplicity

• More people in job market have PC skills than mainframe skills

• PC is well established in workplace

• Numerous data analysis and query tools exist to facilitate interaction with DBMSs
available in PC market

• Considerable cost advantage to offloading applications development from mainframe to powerful PCs

• Client/server disadvantages

• Creates more complex environment

• Different platforms (LANs, operating systems, and so on) are often difficult to
manage

• An increase in number of users and processing sites often paves the way for security
problems

• Possible to spread data access to much wider circle of users

• Increases demand for people with broad knowledge of computers and software

• Increases burden of training and cost of maintaining the environment


Chapter 13: Business Intelligence and Data
Warehouses
The Need for Data Analysis
• Managers track daily transactions to evaluate how the business is performing

• Strategies should be developed to meet organizational goals using operational databases

• Data analysis provides information about short-term tactical evaluations and strategies

Business Intelligence
• Comprehensive, cohesive, integrated tools and processes

– Capture, collect, integrate, store, and analyze data

– Generate information to support business decision making

• Framework that allows a business to transform:

– Data into information

– Information into knowledge

– Knowledge into wisdom

Business Intelligence Architecture

• Composed of data, people, processes, technology, and management of components

• Focuses on strategic and tactical use of information

• Key performance indicators (KPI)

– Measurements that assess company’s effectiveness or success in reaching goals

• Multiple tools from different vendors can be integrated into a single BI framework

Business Intelligence Benefits


• Main goal: improved decision making

• Other benefits
– Integrating architecture

– Common user interface for data reporting and analysis

– Common data repository fosters single version of company data

– Improved organizational performance

Business Intelligence Technology Trends


• Data storage improvements

• Business intelligence appliances

• Business intelligence as a service

• Big Data analytics

• Personal analytics

Decision Support Data


• BI effectiveness depends on quality of data gathered at operational level

• Operational data are seldom well suited for decision support tasks

• Data must be reformatted in order to be useful for business intelligence

Decision Support Database Requirements

• Specialized DBMS tailored to provide fast answers to complex queries

• Three main requirements

– Database schema

– Data extraction and loading

– Database size

The Data Warehouse

• Integrated, subject-oriented, time-variant, and nonvolatile collection of data

– Provides support for decision making

• Usually a read-only database optimized for data analysis and query processing

• Requires time, money, and considerable managerial effort to create


Data Marts

• Small, single-subject data warehouse subset

• More manageable data set than data warehouse

• Provides decision support to small group of people

• Typically lower cost and lower implementation time than data warehouse

Star Schemas

• Data-modeling technique

– Maps multidimensional decision support data into relational database

• Creates near equivalent of multidimensional database schema from relational data

• Easily implemented model for multidimensional data analysis while preserving relational
structures

• Four components: facts, dimensions, attributes, and attribute hierarchies

Facts

• Numeric measurements that represent specific business aspect or activity

– Normally stored in fact table that is center of star schema

• Fact table contains facts linked through their dimensions

• Metrics are facts computed at run time

Attributes

• Used to search, filter, and classify facts

• Dimensions provide descriptions of facts through their attributes

• No mathematical limit to the number of dimensions

Attribute Hierarchies

• Provide top-down data organization

• Two purposes:

– Aggregation

– Drill-down/roll-up data analysis


• Determine how the data are extracted and represented

• Stored in the DBMS’s data dictionary

• Used by OLAP tool to access warehouse properly
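
A short, hedged example of roll-up along an attribute hierarchy (region → state) using the SQL ROLLUP extension supported by Oracle, SQL Server, and PostgreSQL; the tables and columns are hypothetical:

-- Aggregates sales at the state level, rolls up to region subtotals,
-- and finishes with a grand total row
SELECT   D.REGION, D.STATE, SUM(F.SALES_AMOUNT) AS TOTAL_SALES
FROM     SALES_FACT F
JOIN     LOCATION_DIM D ON D.LOC_ID = F.LOC_ID
GROUP BY ROLLUP (D.REGION, D.STATE)
ORDER BY D.REGION, D.STATE;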

Star Schema Representation

• Facts and dimensions represented in physical tables in data warehouse database

• Many fact rows related to each dimension row

– Primary key of fact table is a composite primary key

– Fact table primary key formed by combining foreign keys pointing to dimension tables

• Dimension tables are smaller than fact tables
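
A minimal sketch of a star schema in SQL (hypothetical names): a central fact table whose composite primary key is formed from the foreign keys pointing to the dimension tables:

CREATE TABLE TIME_DIM     (TIME_ID INTEGER PRIMARY KEY, SALE_DATE DATE, SALE_QUARTER INTEGER);
CREATE TABLE LOCATION_DIM (LOC_ID  INTEGER PRIMARY KEY, REGION VARCHAR(20), STATE VARCHAR(2));
CREATE TABLE PRODUCT_DIM  (PROD_ID INTEGER PRIMARY KEY, PROD_DESC VARCHAR(50), LINE VARCHAR(20));

CREATE TABLE SALES_FACT (
    TIME_ID      INTEGER NOT NULL REFERENCES TIME_DIM (TIME_ID),
    LOC_ID       INTEGER NOT NULL REFERENCES LOCATION_DIM (LOC_ID),
    PROD_ID      INTEGER NOT NULL REFERENCES PRODUCT_DIM (PROD_ID),
    SALES_AMOUNT DECIMAL(10,2),   -- fact (numeric measurement)
    UNITS_SOLD   INTEGER,         -- fact (numeric measurement)
    -- composite primary key built from the dimension foreign keys
    PRIMARY KEY (TIME_ID, LOC_ID, PROD_ID)
);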

Performance-Improving Techniques for the Star Schema

• Four techniques to optimize data warehouse design:

– Normalizing dimensional tables

– Maintaining multiple fact tables to represent different aggregation levels

– Denormalizing fact tables

– Partitioning and replicating tables

Data Analytics

• Subset of BI functionality

• Encompasses a wide range of mathematical, statistical, and modeling techniques

– Purpose of extracting knowledge from data

• Tools can be grouped into two separate areas:

– Explanatory analytics

– Predictive analytics

Data Mining

• Data-mining tools do the following:

– Analyze data
– Uncover problems or opportunities hidden in data relationships

– Form computer models based on their findings

– Use models to predict business behavior

• Runs in two modes

– Guided

– Automated

Predictive Analytics

• Employs mathematical and statistical algorithms, neural networks, artificial intelligence, and
other advanced modeling tools

• Create actionable predictive models based on available data

• Models are used in areas such as:

– Customer relationships, customer service, customer retention, fraud detection, targeted


marketing, and optimized pricing

Multidimensional Data Analysis Techniques

• Data are processed and viewed as part of a multidimensional structure

• Augmented by the following functions:

– Advanced data presentation functions

– Advanced data aggregation, consolidation, and classification functions

– Advanced computational functions

– Advanced data modeling functions

Advanced Database Support

• Advanced data access features include:

– Access to many different kinds of DBMSs, flat files, and internal and external data
sources

– Access to aggregated data warehouse data

– Advanced data navigation

– Rapid and consistent query response times


– Maps end-user requests to appropriate data source and to proper data access language

– Support for very large databases

OLAP Architecture

• Three main architectural components:

– Graphical user interface (GUI)

– Analytical processing logic

– Data-processing logic
Chapter 14: Big Data Analytics and NoSQL
 Big Data: Refers to data whose volume, velocity, and variety challenge traditional database
management.

 Volume: Quantity of data to be stored, requiring large storage capacities.

 Scaling Up vs. Scaling Out:

 Strategies to handle increased data volume either by upgrading systems (scaling up) or by distributing the workload across clusters (scaling out).

 Velocity: Rate at which new data enters and must be processed.

 Stream Processing: Analyzing data as it enters the system to decide what to keep and discard.

 Feedback Loop Processing: Analyzing data to produce actionable results, focusing on both
inputs and outputs.

Variety: Refers to the diverse formats and structures of data in Big Data.

Data can be structured, unstructured, or semistructured.

Structured Data: Organized to fit a predefined data model, typical in relational databases.

Unstructured Data: Not organized to fit into a predefined data model, includes various formats
like text, images, and videos.

Semistructured Data: Combines elements of both structured and unstructured data.

Variability: Refers to changes in the meaning of data over time, relevant in sentiment analysis.

Veracity: Concerns the trustworthiness of data and the accuracy of information generated from
it.

Value: Relates to the meaningful insights derived from analyzed data that can impact
organizational behavior.

Visualization: Graphically presenting data to enhance understanding and gain insights.

Hadoop: A Java-based framework for distributed storage and processing of large datasets.

HDFS (Hadoop Distributed File System): Designed for high volume, write-once, read-many
access, streaming, and fault tolerance.

HDFS Assumptions: Large file sizes, write-once model, streaming access, fault tolerance through
replication.
HDFS Nodes: Client nodes, NameNode, and DataNodes manage data storage and retrieval.

MapReduce: Computing framework for processing large datasets across clusters.

o Divides tasks into smaller subtasks, processes them concurrently, and combines results.

Map Function: Sorts and filters data into key-value pairs, performed by a mapper program.

Reduce Function: Summarizes key-value pairs with the same key into a single result, performed
by a reducer program.

MapReduce Implementation: Pushes copies of the program to nodes containing data instead of
transferring data to a central node.

Hadoop Structure: Composed of a job tracker (JobTracker) and task trackers (TaskTrackers).

Hadoop Workflow: Client node submits MapReduce job to job tracker, which communicates
with name node to locate data nodes.

 Job tracker determines available task trackers, assigns tasks, and manages failures.

Hadoop Ecosystem: Collection of related applications around Hadoop for easier use and
accessibility.

MapReduce Simplification Applications:

 Tools like Hive and Pig simplify creating MapReduce jobs, especially for users without
extensive programming skills.

Hive:

 Data warehousing system on HDFS with HiveQL language for ad hoc queries, processed
into MapReduce jobs.
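
A brief HiveQL sketch (hypothetical table and HDFS path): the query reads like ordinary SQL, but Hive compiles it into one or more MapReduce jobs over files stored in HDFS:

-- External table defined over comma-delimited log files already in HDFS
CREATE EXTERNAL TABLE web_clicks (
    user_id  STRING,
    page_url STRING,
    click_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/web_clicks';

-- Ad hoc aggregation; Hive turns this into MapReduce work behind the scenes
SELECT page_url, COUNT(*) AS visits
FROM   web_clicks
GROUP BY page_url
ORDER BY visits DESC
LIMIT 10;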

Batch Processing: Runs from start to finish without user interaction, often used for tasks
requiring extended time or system resources.

NoSQL:

 NoSQL refers to non-relational database technologies developed to address Big Data challenges.
 The term "NoSQL" doesn't describe what the technologies are but what they aren't.
 Originally coined as a Twitter hashtag to discuss non-relational databases used by tech
giants like Google, Amazon, and Facebook
 Recent interpretations suggest "NoSQL" could mean "Not Only SQL."
 Focus should be on understanding the range of technologies rather than the name itself.

Types of NoSQL Databases:


Key-Value Databases:

 Simplest NoSQL model storing data as key-value pairs, organized into buckets.
 Only supports basic operations like get, store, and delete.

Document Databases:

 Store data as tagged documents in key-value pairs, allowing more structured data
storage.
 Schema-less but rely on tags for querying.

Column-Oriented Databases:

 Store data by column instead of by row, suitable for systems requiring queries over few
columns but many rows.
 Can refer to traditional relational databases or NoSQL databases like Cassandra and
HBase.
 In NoSQL, organizes data into key-value pairs where the value is a set of columns varying
by row.
 Supports super columns grouping logically related columns together.

Graph Databases:

 Graph databases store data based on graph theory, focusing on relationships between nodes.

 They excel in relationship-rich environments like social networks, logistics, and identity
management.

 Components: nodes (representing entities), edges (representing relationships), and properties (attributes of nodes or edges).

 Queries in graph databases are called traversals, focusing on relationships between nodes.

 Graph databases perform best in centralized or lightly clustered environments.

NewSQL Databases:

 Aim to bridge the gap between traditional RDBMS and NoSQL.

 Provide ACID-compliant transactions over distributed infrastructure.

 Examples include ClustrixDB and NuoDB.

 Support SQL and distributed clusters like NoSQL, but use in-memory storage, impacting
durability.

 Limited scalability compared to NoSQL databases.


Data Analytics:

 Subset of business intelligence (BI) for extracting knowledge from data.

 Includes explanatory analytics (explaining past and present) and predictive analytics (forecasting
future).

 Explanatory analytics discovers relationships and patterns in existing data.

 Predictive analytics uses advanced statistical tools to predict future outcomes accurately.

 Predictive analytics models are used in customer relationships, fraud detection, marketing, etc.

Data Mining:

 Analyzing large datasets to uncover hidden trends, patterns, and relationships.

 Four phases: data preparation, analysis, knowledge acquisition, and prognosis.

 Uses algorithms like neural networks, decision trees, and regression to create predictive models.

 Helps in understanding customer behavior, improving product development, and more.


Chapter 15: Database Connectivity and
Web Technologies
