Database Unit 1
A database is a collection of data that contains information relevant to an
enterprise.
DBMS
A database management system (DBMS) is a collection of interrelated data and a
set of programs to access those data.
Goals of DBMS
The primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient.
Databases provide better data organization, reduced redundancy, easier data access, and
improved data integrity compared to simple file storage.
Every organization needs a good database. Databases support internal business processes
and record communications with suppliers and customers. They include more specialized data,
including economic or technical models, as well as administrative data. Systems for digital
libraries, vacation reservations, and inventory are a few examples. Databases are important for
the reasons listed below:
✧ Efficient scaling: Database applications may scale to billions of records, making them crucial
for digital data storage.
✧ Data integrity: Data consistency can be maintained via database rules and conditions.
✧ Data security: Databases support privacy and compliance requirements associated with any
data.
Data are raw facts and figures without context, while information is processed data that is
meaningful and useful for decision-making.
1. Data
Data refers to raw, unprocessed facts and figures without context or meaning. Data can be in
the form of numbers, text, images, or sounds, but on its own, it doesn’t provide any clear
understanding.
Examples of Data:
✧ A list of numbers: 30,35,77,70
✧ A set of names: sakshi, yogi, ruthvik, arun
✧ A series of dates: 2023-12-15, 2023-12-16, 2023-12-17
✧ Individual sensor readings (e.g., temperature in degrees or heartbeat in beats per minute).
Types of Data Stored in a Database
1. Structured Data
Data that is organized into predefined formats, such as rows and columns in tables.
2. Unstructured Data
Data that doesn’t fit neatly into tables.
Examples include images, videos, documents, and social media posts.
3. Semi-Structured Data
Data that doesn’t fit fully into a table structure but contains tags or markers to separate
elements, such as XML files or JSON.
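As a rough illustration of semi-structured data, the sketch below parses a small JSON document with Python's standard json module. The field names (tags, notes) are made up for the example; the point is only that each record carries its own markers and the records need not share the same fields.

```python
import json

# Hypothetical semi-structured records: the keys act as the "tags or markers",
# but the two records do not have the same set of fields.
raw = '[{"name": "sakshi", "tags": ["student"]}, {"name": "yogi", "notes": "no tags field"}]'

records = json.loads(raw)

# Collect the field names each record actually carries.
fields_per_record = [sorted(record.keys()) for record in records]
print(fields_per_record)
```

A fixed table would need a column for every possible field; here each record simply omits what it does not have.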
2. Information
Information is data that has been processed or organized in a way that it is meaningful and
useful to the recipient. Information provides context, relevance, and purpose to data, enabling
people to make decisions, draw conclusions, or take actions.
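To make the distinction concrete, here is a minimal sketch that turns the raw list of numbers from the data examples into a piece of information. It assumes, purely for illustration, that the numbers are exam scores and that the pass mark is 40; neither assumption comes from these notes.

```python
from statistics import mean

# Raw data: the list of numbers from the data examples. On their own they carry no meaning.
readings = [30, 35, 77, 70]

# Assumption for illustration only: the numbers are exam scores out of 100
# and the pass mark is 40.
PASS_MARK = 40

# Processing adds context: an average and a count against a threshold.
average_score = mean(readings)
passed = [score for score in readings if score >= PASS_MARK]

print(f"Average score: {average_score}")
print(f"{len(passed)} of {len(readings)} scores meet the pass mark")
```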
Examples:
A database is a structured collection of data that is stored and managed in a way that allows
efficient retrieval, updating, and manipulation.
1. Database:
A database is a structured collection of data that is stored and managed in a way that
allows efficient retrieval, updating, and manipulation.
2. DBMS:
Database Management System (DBMS) is software that provides a systematic and
efficient way to store, retrieve, and manage data in databases.
3. Data Model:
A framework that defines how data is structured, stored, and related (e.g., relational,
hierarchical).
4. Schema:
The logical structure or blueprint of a database, defining tables, fields, and relationships.
5. Instance:
The actual data stored in the database at a particular moment.
6. Data Independence:
The ability to change the schema at one level without affecting other levels. Includes:
a. Physical Data Independence
b. Logical Data Independence
7. Keys:
Attributes used to identify and link records. A primary key uniquely identifies each record in a table; a foreign key references the primary key of another table.
8. Normalization:
Process of organizing data to reduce redundancy and improve data integrity.
9. Query Language:
Language used to interact with the database, such as SQL (Structured Query Language).
10. Transaction:
A unit of work performed on the database ensuring ACID properties (Atomicity,
Consistency, Isolation, Durability).
11. Concurrency Control:
Mechanisms to manage simultaneous data access without conflicts.
12. Backup and Recovery:
Methods to safeguard data and restore it after failures.
13. Data Integrity:
Ensuring accuracy and correctness of data using constraints.
14. Views:
Virtual tables derived from base tables to provide customized data access.
15. Security:
Protecting data from unauthorized access through authentication and authorization.
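The sketch below touches several of the terms above (schema, instance, keys, views, transactions) using Python's built-in sqlite3 module and an in-memory database. The table names, column names, and the 'Tablet' row are illustrative assumptions, not part of these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Schema: the blueprint (tables, fields, keys, and a view).
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product     TEXT NOT NULL
);
CREATE VIEW customer_orders AS
    SELECT c.name, o.product
    FROM customers c JOIN orders o ON o.customer_id = c.customer_id;
""")

# Instance: the rows stored at this moment.
with conn:  # commits on success, rolls back if an exception escapes
    conn.execute("INSERT INTO customers VALUES (1, 'Akshay')")
    conn.execute("INSERT INTO orders VALUES (1, 1, 'Laptop')")

# Transaction (atomicity): both inserts succeed together or not at all.
try:
    with conn:
        conn.execute("INSERT INTO orders VALUES (2, 1, 'Smartphone')")
        conn.execute("INSERT INTO orders VALUES (3, 99, 'Tablet')")  # customer 99 does not exist
except sqlite3.IntegrityError:
    pass  # the whole transaction, including order 2, was rolled back

view_rows = conn.execute("SELECT name, product FROM customer_orders").fetchall()
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(view_rows, order_count)
```

The foreign key violation in the second transaction undoes the valid insert that preceded it, which is the atomicity part of ACID in miniature.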
Hierarchical databases
Network databases
Object-oriented databases
Relational databases
NoSQL databases
1. Hierarchical Databases:
A hierarchical database organizes data in ranks or levels, like a tree. Records are grouped
under a common point of linkage: the shared record takes the higher rank (the parent), and the
records linked to it take the lower rank (the children). Each child has exactly one parent.
2. Network Databases:
A network database extends the hierarchical model with one major change: child records
may be associated with multiple parent records. The result is a network (a graph) of records
connected by multiple links.
3. Object-Oriented Databases:
Those familiar with the object-oriented programming paradigm will find this model easy to
relate to. Information is stored in the database as objects, so an application can reference
a stored object directly, much as it would reference an object in a program. This can reduce
the work needed to translate between program objects and stored data.
4. Relational Databases:
Considered the most mature of all database types, relational databases and their
management systems are the most widely used in production. Data is stored in tables
(relations) made up of rows and columns. Each row is a record that can be uniquely identified
by a key, and tables are related to one another through shared column values.
5. NoSQL Databases:
Advantages of NoSQL –
There are many advantages of working with NoSQL databases such as MongoDB and
Cassandra. The main advantages are high scalability and high availability.
Disadvantages of NoSQL –
DBMS helps in efficient organization of data in a database, which has the following
advantages over a typical file system:
A database management system (DBMS) is a collection of programs that manages the database
structure and controls access to the data stored in the database.
The number of users determines whether the database is classified as single user or multiuser.
A single-user database supports only one user at a time. In other words, if user A is using the
database, users B and C must wait until user A is done. A single-user database that runs on a
personal computer is called a desktop database.
A multiuser database supports multiple users at the same time. When the multiuser database
supports a relatively small number of users (usually fewer than 50) or a specific department
within an organization, it is called a workgroup database.
When the database is used by the entire organization and supports many users (more than 50,
usually hundreds) across many departments, the database is known as an enterprise database.
Location might also be used to classify the database. For example, a database that supports data
located at a single site is called a centralized database.
A database that supports data distributed across several different sites is called a distributed
database.
A cloud database is a database that is created and maintained using cloud data services, such
as Microsoft Azure or Amazon Web Services (AWS).
1. general-purpose databases
2. discipline-specific databases
General-purpose databases
Discipline-specific databases contain data focused on specific subject areas. The data in this
type of database are used mainly for academic or research purposes within a small set of
disciplines. Examples of discipline-specific databases include medical databases that store
confidential medical history data.
A database that is designed primarily to support a company’s day-to-day operations is
classified as an operational database, also known as an online transaction processing
(OLTP), transactional, or production database.
An analytical database focuses primarily on storing historical data and business metrics used
exclusively for tactical or strategic decision making.
The data warehouse is a specialized database that stores data in a format optimized for
decision support. The data warehouse contains historical data obtained from the operational
databases as well as data from other external sources.
Online analytical processing (OLAP) is a set of tools that work together to provide an
advanced data analysis environment for retrieving, processing, and modeling data from the
data warehouse.
Extensible Markup Language (XML) is a special language used to represent and manipulate
data elements in a textual format.
An XML database supports the storage and management of semi-structured XML data.
1.4 FILE SYSTEMS
A file system is a way of storing and organizing files on storage devices, where data is stored in separate files
without any structured relationship between them.
A manual file system is a way of organizing and storing data that does not involve the use of automated
tools or software. The file system can be saved to a local or external hard disk, flash drive, or other storage
device.
Examples:
Filing Cabinets
Address Books
Yellow Pages
Telephone Directories
Diaries
Guest Lists
Portfolios, etc.
Advantages:
Disadvantages:
Data and information can be stored and arranged on a computer using computerized file systems. They offer a
method for classifying, searching, and retrieving data, which simplifies handling large volumes of data.
There are several types of Computerized file systems, including:
Hierarchical File System (HFS): Apple’s Macintosh operating system uses a file system called HFS.
Each directory can hold additional subdirectories and files, and it arranges files and directories in a
structure like a tree.
New Technology File System (NTFS): Microsoft Windows operating systems use the NTFS file
system. It offers advanced functions like encryption, compression, and permissions for files and
folders.
Extended File System (EXT): Many Linux-based operating systems employ the EXT family of file systems.
The later versions, ext3 and ext4, are journaling file systems: they keep a log of file system
modifications, which enhances stability and data integrity.
Data – Raw facts, such as a telephone number, a birth date, a customer name, and a year-to-date (YTD)
sales value.
Field – A character or group of characters (alphabetic or numeric) that has a specific meaning. A field
is used to define and store data.
Record – A logically connected set of one or more fields that describes a person, place, or thing. For
example, the fields that constitute a record for a customer might consist of the customer’s name,
address, phone number, date of birth, credit limit, and unpaid balance.
File – A collection of related records. For example, a file might contain data about the students
currently enrolled at Gigantic University.
Structural dependence: Changing the database schema requires changes to all access programs.
Structural independence: Changing the database schema does not affect data access.
Data dependence: A data condition in which data representation and manipulation are dependent on
the physical data characteristics.
Data independence: A condition in which data access is unaffected by changes in the physical data
storage characteristics.
o Physical data independence: The physical schema can be modified without affecting the logical schema.
o Logical data independence: The logical schema can be modified without affecting application programs.
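As a loose analogy for structural dependence versus structural independence, the sketch below compares two ways of reading the same table with Python's sqlite3 module. One reader relies on the exact column layout, the other names the columns it needs. The table, the phone value, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, phone TEXT)")
conn.execute("INSERT INTO customer VALUES ('Akshay', '000-0000')")  # placeholder phone number

def read_by_position():
    # Structurally dependent: assumes the table has exactly two columns in this order.
    name, phone = conn.execute("SELECT * FROM customer").fetchone()
    return name, phone

def read_by_name():
    # Structurally independent: asks only for the columns it needs, by name.
    return conn.execute("SELECT name, phone FROM customer").fetchone()

before = (read_by_position(), read_by_name())

# Structural change: a new column is added to the table.
conn.execute("ALTER TABLE customer ADD COLUMN credit_limit REAL")

after_by_name = read_by_name()
try:
    read_by_position()
    positional_still_works = True
except ValueError:
    # Unpacking three columns into two names fails.
    positional_still_works = False

print(after_by_name, positional_still_works)
```

The reader that names its columns keeps working after the change, while the one that depends on the layout has to be rewritten.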
The term data redundancy describes the unnecessary duplication of information in a database. This usually
happens when the same piece of data is kept in several tables or locations.
Data anomalies
Lower query performance
Raised storage costs
Difficulty maintaining consistency
Poor database design: When tables are not normalized properly, the same data can end up in multiple
tables.
Lack of proper relationship between tables: For example, storing customer information in multiple
places (orders, payments, and customer profiles).
Imagine a database that stores information about customers, orders, and products.
In this example, the customer information (Name, Address) is repeated for each order they place. This is a
classic case of data redundancy, where Akshay’s name and address appear twice, and Banu’s name and
address appear twice.
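The redundancy described above can be sketched in a few lines of Python. The addresses, Banu's products, and Banu's order dates are placeholders invented for this sketch; only the names and Akshay's two orders come from the surrounding text.

```python
from collections import Counter

# Unnormalized rows: (OrderID, Name, Address, ProductName, OrderDate).
# "Address A", "Address B", "Product X", "Product Y" and Banu's dates are placeholders.
orders = [
    (1, "Akshay", "Address A", "Laptop", "2024-12-15"),
    (2, "Akshay", "Address A", "Smartphone", "2024-12-15"),
    (3, "Banu", "Address B", "Product X", "2024-12-16"),
    (4, "Banu", "Address B", "Product Y", "2024-12-17"),
]

# Count how many times each (Name, Address) pair is stored.
copies = Counter((name, address) for _, name, address, _, _ in orders)

# Every copy beyond the first one is redundant.
redundant_copies = sum(count - 1 for count in copies.values())

print(copies)
print("Redundant customer entries:", redundant_copies)
```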
Note:
Data integrity refers to the accuracy and consistency of data. In other words, data integrity means that:
Data anomalies refer to inconsistencies, inaccuracies, or unexpected results that occur when performing
operations like insert, update, or delete on a database. Data anomalies are often caused by redundant data or
poor database design.
1. Insert Anomaly
2. Update Anomaly
3. Delete Anomaly
1. Insert Anomaly
An insert anomaly occurs when we are unable to add data to the database due to how tables are designed.
This often happens when the database schema does not allow certain records unless other data is also present.
In the unnormalized table below, adding a new customer who has not placed any order would be problematic,
as the customer's data would still need to be inserted alongside an empty order:
If a new customer, Chandru, is added but has not yet placed any order, we would have to insert dummy values into
the ProductName and OrderDate columns, which is problematic and results in an insert anomaly.
Solution:
Normalization eliminates this problem by ensuring that only necessary data is entered in each table.
For example, you can add a Customers table and an Orders table. If Chandru has no orders yet, his record can
be added in the Customers table without needing to enter data in the Orders table.
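A minimal sketch of the insert anomaly and its fix, using Python's sqlite3 module. It assumes the order columns in the single-table design are declared NOT NULL, which is one way the problem can show up; the combined table name and the address value are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized design: customer and order columns share one table.
# Assumption: the order columns are mandatory (NOT NULL).
conn.execute("""
    CREATE TABLE customer_orders (
        OrderID     INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        Address     TEXT NOT NULL,
        ProductName TEXT NOT NULL,
        OrderDate   TEXT NOT NULL
    )
""")

try:
    # Chandru has no order yet, so there is nothing valid to put in
    # ProductName and OrderDate.
    conn.execute(
        "INSERT INTO customer_orders (Name, Address) VALUES (?, ?)",
        ("Chandru", "Address C"),
    )
    insert_blocked = False
except sqlite3.IntegrityError:
    insert_blocked = True

# Normalized design: customers live in their own table.
conn.execute("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Address    TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO Customers (Name, Address) VALUES (?, ?)",
    ("Chandru", "Address C"),
)
customers = conn.execute("SELECT Name FROM Customers").fetchall()

print(insert_blocked, customers)
```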
2. Update Anomaly
An update anomaly occurs when data is not updated consistently in all places where it appears.
This typically happens in situations where data is redundant.
If Akshay moves to a new address and we only update one of the records, the other record will still have his
old address, leading to an inconsistent state in the database.
Solution:
To resolve this, the database should be normalized so that each customer’s information is stored only once.
Customers Table:
CustomerID Name Address
Orders Table:
OrderID CustomerID ProductName OrderDate
1 1 Laptop 2024-12-15
2 1 Smartphone 2024-12-15
Updating Akshay’s address in the Customers table ensures that all of Akshay’s orders reflect the correct
address.
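The update anomaly can be reproduced with a short sqlite3 sketch that runs the same address change against both designs. The combined table name and the address strings ('Old Address', 'New Address') are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalized: the address is repeated on every order row.
CREATE TABLE customer_orders (
    OrderID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, ProductName TEXT
);
INSERT INTO customer_orders VALUES (1, 'Akshay', 'Old Address', 'Laptop');
INSERT INTO customer_orders VALUES (2, 'Akshay', 'Old Address', 'Smartphone');

-- Normalized: the address is stored once, in Customers.
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID),
    ProductName TEXT
);
INSERT INTO Customers VALUES (1, 'Akshay', 'Old Address');
INSERT INTO Orders VALUES (1, 1, 'Laptop');
INSERT INTO Orders VALUES (2, 1, 'Smartphone');
""")

# Unnormalized: updating only one of the two rows leaves conflicting addresses.
conn.execute("UPDATE customer_orders SET Address = 'New Address' WHERE OrderID = 1")
addresses_unnormalized = {
    row[0]
    for row in conn.execute("SELECT Address FROM customer_orders WHERE Name = 'Akshay'")
}

# Normalized: one update reaches every order through the join.
conn.execute("UPDATE Customers SET Address = 'New Address' WHERE CustomerID = 1")
addresses_normalized = {
    row[0]
    for row in conn.execute(
        "SELECT c.Address FROM Orders o JOIN Customers c ON c.CustomerID = o.CustomerID"
    )
}

print(addresses_unnormalized, addresses_normalized)
```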
3. Delete Anomaly
A delete anomaly occurs when deleting a piece of data inadvertently leads to the loss of other valuable
data.
If we delete Akshay’s Laptop order (OrderID 1), we also delete Akshay’s address and name because the
customer’s information is tied to the order in the same table.
In this case, deleting one order could remove valuable information about Akshay.
Solution:
Normalization prevents this problem by ensuring that customer data and order data are stored in separate
tables.
Customers Table:
CustomerID Name Address
Orders Table:
OrderID CustomerID ProductName OrderDate
1 1 Laptop 2024-12-15
2 1 Smartphone 2024-12-15
Now, deleting Akshay’s Laptop order (OrderID 1) does not affect his name and address in the Customers
table.
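The delete anomaly can be sketched the same way. For simplicity the example assumes the Laptop order is the only row stored for Akshay in the single-table design; the combined table name and the address value are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalized: the only record of Akshay is his order row.
CREATE TABLE customer_orders (
    OrderID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, ProductName TEXT
);
INSERT INTO customer_orders VALUES (1, 'Akshay', 'Address A', 'Laptop');

-- Normalized: customer data and order data are kept apart.
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customers(CustomerID),
    ProductName TEXT
);
INSERT INTO Customers VALUES (1, 'Akshay', 'Address A');
INSERT INTO Orders VALUES (1, 1, 'Laptop');
""")

# Unnormalized: deleting the order also deletes everything known about Akshay.
conn.execute("DELETE FROM customer_orders WHERE OrderID = 1")
rows_left_unnormalized = conn.execute(
    "SELECT COUNT(*) FROM customer_orders WHERE Name = 'Akshay'"
).fetchone()[0]

# Normalized: the order goes away, the customer record stays.
conn.execute("DELETE FROM Orders WHERE OrderID = 1")
customer_after_delete = conn.execute(
    "SELECT Name, Address FROM Customers WHERE CustomerID = 1"
).fetchone()

print(rows_left_unnormalized, customer_after_delete)
```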
1.6 DATABASE SYSTEMS
1.6.1 The Database System Environment
A database system is a grouping of elements that specify and control how data is gathered, stored, managed,
and used in a database environment.
1. Hardware
Includes:
Computers (PCs, tablets, workstations, servers, and supercomputers),
Storage devices, printers, network devices (hubs, switches, routers, fiber optics),
and other devices (automated teller machines, ID readers, etc.)
All are considered hardware.
2. Software
Three different types of software are required for a database system to function properly.
Even though DBMS software is the most recognized, the full list includes:
Operating System Software
Examples:
Microsoft Windows, Linux, Mac OS, UNIX, and MVS
DBMS Software
Examples:
Microsoft’s SQL Server, Oracle Corporation’s Oracle, Oracle’s MySQL, and IBM’s DB2.
The most popular way to access data in a database and create reports, tabulations, and other
information for decision-making is through application programs.
Utilities
Utilities are software tools used to help manage the database system’s computer components.
For example:
Major DBMS vendors now provide graphical user interfaces (GUIs) to:
3. People
1. System Administrators
o Manage the general operations of the database system.
2. Database Administrators (DBAs)
o Manage the DBMS
o Ensure the database operates correctly
o Work with database designers to create the database structure
3. Database Designers
o Responsible for designing the structure of the database.
o Often referred to as database architects
4. System Analysts and Programmers
o Design and implement application programs
o Design and develop:
Procedures
Reports
Data-entry screens used by end users
5. End Users
o Use applications and interfaces developed by analysts/programmers to interact with the database
End Users
End users are individuals who use application programs to carry out the day-to-day operations of
the organization.
Examples include:
o Managers
o Directors
o Supervisors
o Sales clerks
4. Procedures
Procedures are the guidelines and directives that control how the database system is designed and
used.
While sometimes overlooked, procedures are:
o Essential to system functionality
o Enforce business standards
o Ensure systematic operations
They help:
5. Data
Integrity
Consistency
...of data through several core functions, which are usually transparent to end users:
✧ Security Management
Sets of rules and tools for enabling software applications to communicate with one another
✧ Database Communication Interfaces
Implementing a database system (vs. a file system) allows for the application of rigorous policies and
guidelines.
Focus shifts from:
o Programming tasks (in file systems)
o To resource management and software administration in database systems
✔️Advantages
Disadvantages:
⛔ Increased costs
⛔ Management complexity
⛔ Maintaining currency
⛔ Vendor dependence
⛔ Frequent upgrade/replacement cycles
✔️Characteristics
Data models are essential for effective database design and implementation. Their importance is highlighted in
the following ways:
1. Entities
2. Attributes
3. Relationships
4. Constraints
These components form the foundation of data modeling and help define how data is structured and connected
within a database.
1.9.1 Entities
Definition: An entity is a key concept representing a distinct object, concept, or thing that exists in the
domain being modeled.
Example (E-Commerce System):
o Customer: Represents a customer in the system.
o Product: Represents a product available for purchase.
o Order: Represents a customer order.
1.9.2 Attributes
1.9.3 Relationships
Definition: Relationships in data models define how two or more entities are connected.
Example: A Customer can place multiple Orders → This is a 1:N (One-to-Many) relationship.
Types of Relationships:
1. One-to-One (1:1):
o One entity is related to only one other entity.
o Example: One person has one passport.
2. One-to-Many (1:N):
o One entity is related to many other entities.
o Example: One department has many employees.
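One common way to express these two relationship types in a relational schema is sketched below with Python's sqlite3 module: a UNIQUE foreign key for 1:1 and a plain foreign key for 1:N. The table names, column names, and the short placeholder values ('P1', 'D1', 'E1', 'E2') are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- 1:1: UNIQUE on the foreign key allows at most one passport per person.
CREATE TABLE passport (
    passport_id INTEGER PRIMARY KEY,
    person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
);

CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- 1:N: no UNIQUE on dept_id, so many employees can share one department.
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);

INSERT INTO person VALUES (1, 'P1');
INSERT INTO passport VALUES (100, 1);
INSERT INTO department VALUES (10, 'D1');
INSERT INTO employee VALUES (1, 'E1', 10);
INSERT INTO employee VALUES (2, 'E2', 10);
""")

try:
    # A second passport for the same person violates the 1:1 rule.
    conn.execute("INSERT INTO passport VALUES (101, 1)")
    second_passport_allowed = True
except sqlite3.IntegrityError:
    second_passport_allowed = False

employees_in_dept = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_id = 10"
).fetchone()[0]

print(second_passport_allowed, employees_in_dept)
```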
1.9.4 Constraints
Definition: Constraints are restrictions placed on the data. They are usually derived from
business rules: brief, simple, and unambiguous descriptions of a policy, procedure, or idea
within an organization.
Examples:
Importance:
Ensures:
o Data integrity
o Consistency
o Compliance with organizational processes
Helps the database reflect real-world operations accurately.
Objective: To ensure the database properly reflects business logic, restrictions, and relationships.
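As a hedged illustration, the sketch below encodes one hypothetical business rule ("a customer's credit limit cannot be negative") as a CHECK constraint, using Python's sqlite3 module. The rule, the table, and the values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rule: a credit limit may not be negative.
conn.execute("""
    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        credit_limit REAL NOT NULL CHECK (credit_limit >= 0)
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Akshay', 500.0)")

try:
    # This row breaks the rule, so the database refuses it.
    conn.execute("INSERT INTO customer VALUES (2, 'Banu', -10.0)")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(violation_rejected, row_count)
```

Placing the rule in the schema means every application that writes to the table is held to it, rather than each program re-implementing the check.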
Entity Names:
Attribute Names:
Relationship Names:
Avoid Abbreviations:
Consistency:
Follow a consistent pattern throughout the database schema. If you start using underscores (e.g., First_Name),
continue using them throughout your database.
Advantages
Disadvantages
1. Understanding the physical properties of data storage is necessary for complex implementation.
2. Navigational systems require an understanding of the hierarchical path and result in complex
application development, management, and use.
3. All application programs must adapt to structural changes.
4. There are restrictions on implementation (no M:N or multiparent relationships).
5. The DBMS lacks a language for data definition and data manipulation.
6. Standards are lacking.
1.11.2 Network Models
An early data model that represented data as a collection of record types in 1:M relationships.
The network model allows a record to have more than one parent. It is a graph-like structure.
The schema is the conceptual organization of the entire database as viewed by the database
administrator.
The subschema defines the portion of the database “seen” by the application programs that actually
produce the desired information from the data within the database.
A data manipulation language (DML) defines the environment in which data can be managed and
is used to work with the data in the database.
A schema data definition language (DDL) enables the database administrator to define the schema
components.
Example: CODASYL DBTG model.
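The structural difference between the two navigational models can be sketched with plain Python dictionaries: one parent per record for the hierarchical model, a list of parents for the network model. The record names are hypothetical, and the sketch ignores everything a real CODASYL system does beyond parent links.

```python
# Hierarchical model: every child record has exactly one parent (a tree).
hierarchical_parent = {
    "Sales": "Company",
    "Engineering": "Company",
    "Alice": "Sales",
    "Bob": "Engineering",
}

# Network model: a child record may have several parents (a graph).
network_parents = {
    "Sales": ["Company"],
    "Engineering": ["Company"],
    "Alice": ["Sales"],
    "Bob": ["Engineering", "Sales"],  # Bob is linked to two parent records
}

def path_to_root(record, parent_of):
    """Follow the single parent link upward until the root is reached."""
    path = [record]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

def ancestors(record, parents_of):
    """Collect every record reachable by following any parent link."""
    seen = set()
    stack = list(parents_of.get(record, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents_of.get(node, []))
    return seen

print(path_to_root("Bob", hierarchical_parent))
print(sorted(ancestors("Bob", network_parents)))
```

In both cases the program has to know which links to follow, which is what makes these systems navigational.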
Advantages
Disadvantages
1. The system is still a navigational system, but its efficiency is limited by its complexity.
2. Complex application development, management, and implementation result from navigational systems.
3. All application programs must be modified in order to implement structural change.
1.11.3 Relational Models
Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and
represents data as independent relations.
Each relation (table) is conceptually represented as a two-dimensional structure of intersecting rows and
columns.
The relations are related to each other through the sharing of common entity characteristics (values in
columns).
table (relation):
A logical construct perceived to be a two-dimensional structure composed of intersecting rows (entities) and
columns (attributes) that represents an entity set in the relational model.
tuple:
In the relational model, a table row.
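In set-theoretic terms, a relation can be pictured as a set of tuples that all have the same attributes. The Python sketch below models two relations this way and relates them through a shared column value. The class names, attribute names, and Banu's customer_id of 2 are assumptions for illustration.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    customer_id: int
    name: str

class Order(NamedTuple):
    order_id: int
    customer_id: int
    product_name: str
    order_date: str

# Each relation is a set of tuples; each tuple is one table row.
customers = {Customer(1, "Akshay"), Customer(2, "Banu")}
orders = {
    Order(1, 1, "Laptop", "2024-12-15"),
    Order(2, 1, "Smartphone", "2024-12-15"),
}

# The relations are linked by matching values in a shared column (customer_id).
joined = sorted(
    (c.name, o.product_name)
    for c in customers
    for o in orders
    if c.customer_id == o.customer_id
)

print(joined)
```

Nothing in either set points at the other; the link exists only because the customer_id values match, which is what keeps the relations independent.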
Advantages