Everything About DBMS Database Management System
What is DBMS?
● What is Data?
○ A collection of information obtained through observations, measurements, study, or analysis is referred to
as data. It could include information such as facts, numbers, figures, names, or even basic descriptions of
objects.
○ Data can be numerical, text-based, or images and is used to analyze patterns and trends, make decisions,
and develop strategies.
○ A data set is an ordered collection of data.
● What is Database?
○ A database is an organized collection of structured information, or data, typically stored electronically in
a computer system.
○ A database management system (DBMS) is the software that interacts with end users, applications, and
the database itself to capture and analyze the data.
● Types of Data?
○ Static data is data that does not often change, such as a list of countries or products.
○ Dynamic data is data that frequently changes, such as a list of customers or a list of orders.
● One of the first database management systems was developed in the early 1960s by American computer scientist
Charles W. Bachman.
○ In 1960 he joined General Electric, where by 1963 he developed the Integrated Data Store (IDS).
○ Examples of open source DBMS - MySQL, PostgreSQL, MariaDB, SQLite, MongoDB
What is RDBMS?
● RDBMS stands for Relational Database Management System.
● It is a database management system based on the relational data model. Here all the information is stored in
tables made up of rows and columns.
● RDBMS e.g. – Microsoft SQL Server, Oracle, MySQL, MariaDB, and SQLite.
● Edgar Frank Codd proposed the relational model, on which relational database management systems are based,
in 1970 while working for IBM.
● In a relational database, each row in the table is a record with a unique ID called the key.
○ Keys play an important role in the relational database. Keys are used to identify a row (tuple) in a relation
(table).
○ It is also used to establish and identify relationships between tables.
● Query by Example (QBE) is a database query language for relational databases. It was devised by Moshé M.
Zloof at IBM Research during the mid-1970s, in parallel to the development of SQL.
○ It was the first graphical query language, using visual tables in which the user enters commands.
● Types of Keys
○ Primary Key
■ The primary key is the column (or set of columns) in a table that uniquely identifies every row in that
table.
■ A primary key value can't be duplicated, meaning the same value can't appear more than once in the
primary key column.
■ A table cannot have more than one primary key, and a primary key cannot contain null values.
■ In the example, either Aadhaar number or Roll number can be chosen as the primary key.
○ Candidate Key
■ A table may have several columns, or groups of columns, whose values are unique for every row. Each
such minimal set of columns is a candidate key, and the primary key is chosen from among the
candidate keys.
■ Every table must have at least a single candidate key. A table can have multiple candidate keys
but only a single primary key.
■ For example: Aadhaar number and Roll number are both candidate keys, because neither column
has duplicate values and either can be used to identify a row of the table. Either of the two can be
taken as the primary key. A candidate key that is not chosen as the primary key may allow null
values in some systems.
○ Alternate Key
■ Every candidate key that is not chosen as the primary key is an alternate key.
■ For example: In the above example, Aadhaar number is an alternate key if Roll number is chosen
as the primary key, and vice versa.
○ Super Key
■ A super key is a single attribute or a group of attributes that can uniquely identify tuples in
a table.
■ Example: (Roll number) and (Aadhaar number) are super keys, and they remain super keys when
unnecessary attributes are added: (Roll number, Name), (Roll number, Location), (Roll number,
Stream), (Roll number, Aadhaar number, Name, Location, Stream) etc.
■ A super key may therefore contain redundant attributes that are not needed for identifying tuples.
■ Every super key is a superset of a candidate key. It may contain null values.
○ Compound Key
■ A compound key has two or more attributes that together allow you to uniquely recognize a specific
record. Each column may not be unique by itself within the table.
■ However, when the columns are combined, the combination becomes unique. The purpose of the
compound key is to uniquely identify each record in the table.
■ For example: In the example, neither name nor stream can be a primary key on its own, as neither
uniquely identifies a record. However, a compound key of name and stream could be used if the
combination uniquely identifies each record.
○ Foreign Key
■ A Foreign Key is a column (attribute), or set of columns, in one table that refers to the Primary Key
of another table. Thus, when an attribute of one table is the Primary Key of another table, it is
termed a Foreign Key in the first table.
■ A Foreign Key helps us establish relationships with other tables. It points to the Primary Key of
another table.
■ Example: Roll No in the Hostel Table is a Foreign Key that refers to Roll No, the Primary Key of the
Student Table.
● Student Table
● Hostel Table
● This example shows that a hostel can be allocated only to students who are studying in that
college. The Roll No in the Hostel Table is checked against the Student Table to verify this.
○ Natural Key
■ A natural key is a key that uniquely identifies each record and whose values have a meaning in
the real world, outside the database.
■ For example, the Aadhaar Card Number column is a natural key: it uniquely identifies a person
and is also an identifier used in the real world.
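The key types above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table and column names below are assumptions modelled on the Student and Hostel example, not a schema taken from the original figures.

```python
import sqlite3

# Hypothetical schema; names and sample values are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,      -- primary key
    aadhaar TEXT NOT NULL UNIQUE,     -- candidate key not chosen as primary: alternate key
    name    TEXT NOT NULL,
    stream  TEXT NOT NULL,
    UNIQUE (name, stream)             -- compound key: unique only in combination
);
CREATE TABLE hostel (
    room_no INTEGER PRIMARY KEY,
    roll_no INTEGER NOT NULL REFERENCES student(roll_no)  -- foreign key
);
""")

conn.execute("INSERT INTO student VALUES (1, '1111-2222-3333', 'Asha', 'Science')")
conn.execute("INSERT INTO hostel VALUES (101, 1)")


def is_rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_primary = is_rejected(
    "INSERT INTO student VALUES (1, '4444-5555-6666', 'Ravi', 'Arts')")
duplicate_alternate = is_rejected(
    "INSERT INTO student VALUES (2, '1111-2222-3333', 'Ravi', 'Arts')")
duplicate_compound = is_rejected(
    "INSERT INTO student VALUES (3, '7777-8888-9999', 'Asha', 'Science')")
missing_parent = is_rejected("INSERT INTO hostel VALUES (102, 99)")
```

Each of the four inserts is refused: the first three repeat a value that a key requires to be unique, and the last refers to a roll number that does not exist in the student table.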
Non-Relational Database
● A non-relational database is any database that does not use the tabular schema of rows and columns like in
relational databases. Rather, its storage model is optimized for the type of data it’s storing.
● Non-relational databases are also known as NoSQL databases, which stands for “Not Only SQL.” Where
relational databases are queried with SQL, non-relational databases can use other types of query language.
● NoSQL or non-relational databases examples:
○ MongoDB, Apache Cassandra, Redis, Couchbase and Apache HBase.
● In a non-relational database, one piece of stored data might have different fields or attributes from the piece of
data next to it in the same database.
● Non-relational databases are really helpful when there are a lot of unknowns about exactly what data you need to
store, or when there is a large volume of data that might hold different attributes but that you still need to
compare side-by-side.
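As a rough illustration of records with differing fields, the sketch below uses plain Python dictionaries in place of a real document store. The documents and field names are made up for the example.

```python
# Hypothetical product documents; field names are assumptions for illustration.
documents = [
    {"_id": 1, "name": "Laptop", "ram_gb": 16},
    {"_id": 2, "name": "T-shirt", "size": "M", "color": "blue"},
]

# Collect every field that appears in at least one document.
all_fields = sorted({field for doc in documents for field in doc})

# Lay the documents side by side, using None where a field is absent.
rows = [[doc.get(field) for field in all_fields] for doc in documents]
```

Neither document had to declare its fields in advance, yet both can still be compared on the union of their attributes.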
Normalization in DBMS
● Normalization is a database design method that avoids data duplication and removes undesired traits such as
Insertion, Update, and Deletion Anomalies.
● Normalization serves the dual purpose of removing unnecessary (repetitive) data and ensuring that data is
stored logically.
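A small sketch of the update anomaly that normalization avoids, using Python's sqlite3 module. The tables and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalized: the customer's city is repeated on every order row.
CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT);
INSERT INTO orders_flat VALUES (1, 'Asha', 'Pune'), (2, 'Asha', 'Pune'), (3, 'Ravi', 'Delhi');

-- Normalized: each customer's city is stored once and referenced by id.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(customer_id));
INSERT INTO customer VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
""")

# Changing one customer's city touches every matching row in the flat table,
# but only a single row in the normalized design.
flat_rows_changed = conn.execute(
    "UPDATE orders_flat SET city = 'Mumbai' WHERE customer = 'Asha'").rowcount
normalized_rows_changed = conn.execute(
    "UPDATE customer SET city = 'Mumbai' WHERE name = 'Asha'").rowcount
```

In the flat table, missing one of the repeated rows during an update would leave the data inconsistent; the normalized design has only one place to change.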
ACID (Atomicity, Consistency, Isolation, Durability)
● In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties of database
transactions intended to guarantee data validity despite errors, power failures, and other mishaps.
● In the context of databases, a sequence of database operations that satisfies the ACID properties (which can be
perceived as a single logical operation on the data) is called a transaction.
● For example, a transfer of funds from one bank account to another, even involving multiple changes such as
debiting one account and crediting another, is a single transaction.
● In 1983, Andreas Reuter and Theo Härder coined the acronym ACID.
● Atomicity
○ Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely
or fails completely: if any of the statements constituting a transaction fails to complete, the entire
transaction fails and the database is left unchanged. An atomic system must guarantee atomicity in each
and every situation, including power failures, errors, and crashes.
● Consistency
○ Data is in a consistent state when a transaction starts and when it ends. For example, in an application
that transfers funds from one account to another, the consistency property ensures that the total value of
funds in both the accounts is the same at the start and end of each transaction.
● Isolation
○ This property ensures that multiple transactions can occur concurrently without leading to an
inconsistent database state. Transactions occur independently without interference. Changes
made in a particular transaction will not be visible to any other transaction until that transaction
has been committed.
● Durability
○ Durability ensures that once a transaction is committed, its changes are permanent and will survive any
subsequent system failures. The transaction’s changes are saved to the database permanently, and even if
the system crashes, the changes remain intact and can be recovered.
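The funds-transfer example can be sketched with Python's sqlite3 module to show atomicity and consistency in practice. The account table, the names, and the CHECK rule that balances may not go negative are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, source, target, amount):
    """Credit and debit in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)       # both updates succeed and commit
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK, credit is undone
```

After the failed transfer the balances are exactly what they were after the successful one: the credit that had already been applied is rolled back together with the rejected debit, so the total across both accounts never changes.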
Network Model
● This model was introduced by CODASYL (Conference on Data Systems Languages) in 1969.
● The data elements (records) are nodes connected through links. Unlike the hierarchical model, a record is not
restricted to a single parent: nodes are linked to other nodes without any strict hierarchy, so a node can be
reached along any one of several paths. This forms a graph-like structure rather than a tree. Because a child
node can have many parent nodes, the network model can represent many-to-many relationships.
● Tree-like vs Graph-like Structure
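The difference from a tree can be sketched with a plain Python dictionary in which one record has two parents. The record names are hypothetical.

```python
# Hypothetical records; each owner lists the member records it links to.
links = {
    "Department": ["Employee"],
    "Project": ["Employee"],
    "Employee": [],
}


def parents_of(record):
    """Return every owner record that links to the given record."""
    return sorted(owner for owner, members in links.items() if record in members)


employee_parents = parents_of("Employee")
```

In a tree-like (hierarchical) structure `parents_of` could return at most one owner; here Employee is reachable from both Department and Project.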
DBMS Architecture
● In One-Tier Architecture the database is directly available to the user, who works on the DBMS itself. That
is, the client, server, and database are all present on the same machine.
● Two-Tier Architecture
○ Here the application on the client side establishes a direct connection with the database through an
application programming interface (API) such as ODBC (Open Database Connectivity). The ODBC drivers
provide an interface between the database (present on the server-side) and the application program
(present on the client-side). Once the client/server connection is established, DBMS functionality is
ready to use, and the user can query and manipulate data.
● Three-Tier Architecture
○ In this type of architecture, another layer is sandwiched between the client-side and the server-side. This
intermediate layer is called the Application Layer. It hosts the application logic and the connectivity
software, and it controls what data may be transferred. This layer makes sure that only the required data
is processed and passed from the database to the client-side.
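A toy sketch of the three tiers inside a single Python script, assuming an in-memory SQLite database as the data tier. The function name `list_products` and the `MAX_ROWS` limit are hypothetical; in a real deployment each tier would usually run as a separate process or machine.

```python
import sqlite3

# Data tier: an in-memory SQLite database stands in for the database server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO product (name) VALUES (?)",
               [("pen",), ("book",), ("lamp",)])

MAX_ROWS = 2  # hypothetical transfer limit enforced by the application layer


def list_products(limit):
    """Application tier: validate the request, then query the data tier."""
    limit = max(0, min(limit, MAX_ROWS))
    rows = db.execute("SELECT name FROM product ORDER BY id LIMIT ?", (limit,))
    return [name for (name,) in rows]


# Client tier: calls the application layer and never touches the database directly.
client_view = list_products(10)
```

The client asked for ten rows but receives only two, because the application layer decides how much data leaves the database.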
Components of DBMS
● Hardware - Hardware refers to the physical parts of the computer. It includes storage devices like hard disks
and input-output devices like monitors, printers etc. The hardware is the actual computer system used
for keeping and accessing the database.
● Software - Software is a collection or set of programs or instructions that tell a computer what to do. The software
comprises the entire set of programs, procedures, and routines associated with the operation of a computer
system.
● Data - Collection of facts stored in the database.
● Procedures - Procedures refer to general instructions for using a database management system. This includes
procedures to set up and install a DBMS, log in to and out of DBMS software, manage databases, take
backups, generate reports etc.
● Database Access Language - Database Access Language is a simple language that allows users to write
commands to access, insert, update, and delete the data that is stored in the database.
Structured Query Language SQL (DDL, DQL, DML, DCL and TCL Commands)
A. Data Definition Language (DDL)
● By using DDL we can define and change the structure of our tables. In many systems (for example Oracle
and MySQL) DDL commands are auto-committed, so their changes are saved permanently; some
systems, such as PostgreSQL, allow DDL inside a transaction.
B. Data Manipulation Language (DML)
● By using DML we can insert, update and delete the data in our database. Retrieval is sometimes counted
as DML too, but here it is listed separately under DQL.
C. Transaction Control Language (TCL)
● TCL commands (such as COMMIT and ROLLBACK) are used to maintain consistency of our databases and
to manage the transactions made by DML commands. We can only use TCL commands with DML
commands like INSERT, DELETE and UPDATE.
D. Data Control Language (DCL)
● By using DCL we can grant or revoke a user's privileges to access or modify data, in order to
control access to the database.
E. Data Query Language (DQL)
● By using DQL we can fetch data from the database.
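A brief sketch of four of the command categories using Python's sqlite3 module. The table and sample values are assumptions. DCL is left out because SQLite has no GRANT or REVOKE statements; in server-based systems those commands would be issued by an administrator.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction below is controlled explicitly with BEGIN and ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the structure.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# DML: change the data.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

# TCL: group DML into a transaction and undo it.
conn.execute("BEGIN")
conn.execute("UPDATE student SET name = 'Ravi' WHERE roll_no = 1")
conn.execute("ROLLBACK")

# DQL: fetch data.
name = conn.execute("SELECT name FROM student WHERE roll_no = 1").fetchone()[0]
```

Because the UPDATE was rolled back, the query still returns the originally inserted name.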
Database Schema
● A database schema is the logical representation of a database, which shows how the data is stored logically in the
entire database.
● A database schema is a blueprint or architecture of how our data will look.
● The data is physically stored in files that may be in unstructured form, but to retrieve it and use it, we need to put
it in a structured form. To do this, a database schema is used.
● Data definition language (DDL) creates the framework of the database by specifying the database schema, which
is the structure that represents the organization of data. Using the DDL statements, you can create the skeleton of
the database.
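As a minimal sketch, a DDL statement can create part of a schema and the database can then be asked to describe it. The employee table is hypothetical; `PRAGMA table_info` is specific to SQLite, and other systems expose the schema in other ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL creates the skeleton: table name, column names, types, and constraints.
conn.execute("CREATE TABLE employee ("
             "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]
```

The table holds no rows yet, but its structure is already fully described, which is what the schema captures.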
Metadata
● Metadata is 'data about data’.
● It's the context for the information you’re about to consume. Nearly everything technical has it; when you take a
picture on your smartphone, it stores the date and time the photo was taken, and some kind of generic name. If
you have location services enabled, it can even record where the picture was taken in the photo's metadata.
● In a database, data is typically organized into a hierarchical structure, with several levels of organization. The
main levels of this hierarchy are:
○ Bit
■ A bit is the smallest unit of data in a computer, representing either 0 or 1.
○ Byte
■ A byte is a collection of 8 bits, which can represent a single character or number in a computer
system.
○ Field
■ A field is a specific piece of data within a record, such as a name, address, or phone number.
○ Record
■ A record is a collection of related fields, such as all the fields that make up a single customer’s
information.
○ File
■ A file is a group of related records.
○ Database
■ A database is an integrated collection of logically related files.
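The levels above can be mirrored loosely with ordinary Python values. The customer data is invented, and the byte count assumes ASCII text, where each character occupies one byte.

```python
# Hypothetical customer data for illustration.
name_field = "Asha"                                   # field: one piece of data
record = {"name": name_field, "city": "Pune"}         # record: related fields
customer_file = [record, {"name": "Ravi", "city": "Delhi"}]  # file: related records
database = {"customers": customer_file}               # database: related files

byte_count = len(name_field.encode("ascii"))  # one byte per ASCII character
bit_count = byte_count * 8                    # eight bits per byte
```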
ER Model
● ER model stands for an Entity-Relationship model. This model is used to define the data elements and
relationship for a specified system.
● Peter Chen introduced it in 1976.
● Let’s say that we are designing the database of a company.
○ Here an employee is an entity, with attributes like name, address, age, etc.
○ The address of the employee can be another entity with attributes like city, street name, pin code, etc., and
there will be a relationship between them.
● Entity: An entity may be any object, class, person or place. In the ER diagram, an entity is represented as a
rectangle. For example, a box is a real-world object which can be described in terms of shape, size, and color.
Here the box is the entity, and shape, size, and color are its features.
● Attributes: The attributes are the characteristics that describe the entity of the database. For example, shape, size,
and color.
● Relationship: Relationship is the logical binding between the different entities which exist in the Database.
Suppose, two entities are Employee and Company. Then, the relationship would be ‘works in.’ It means the
employee works in the company.
● Weak Entity: A weak entity is dependent on a strong entity to ensure its existence. Unlike a strong entity, a weak
entity does not have a primary key of its own. A weak entity is represented by a double rectangle. The
relationship between one strong and one weak entity is represented by a double diamond.
● Strong Entity: A strong entity is not dependent on any other entity in the schema. A strong entity will always
have a primary key. Strong entities are represented by a single rectangle. The relationship of two strong entities is
represented by a single diamond.
● Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a student's
phone number is an atomic value of 10 digits.
● Composite attribute − Composite attributes are made of more than one simple attribute. For example, a student's
complete name may have a first name and last name.
● A multivalued attribute of an entity is an attribute that can have more than one value associated with the key of
the entity.
○ Example - Phone number of a student: Landline and mobile.
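The attribute kinds and the ‘works in’ relationship can be sketched with Python dataclasses. The class names, field names, and sample values are assumptions chosen to match the Employee and Company example; this is a way to picture the model, not a database implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FullName:
    """Composite attribute: made of more than one simple attribute."""
    first_name: str
    last_name: str


@dataclass
class Company:
    """Entity with a single simple attribute."""
    name: str


@dataclass
class Employee:
    """Entity with simple, composite, and multivalued attributes."""
    emp_id: int                                              # simple attribute (key)
    name: FullName                                           # composite attribute
    phone_numbers: List[str] = field(default_factory=list)   # multivalued attribute


@dataclass
class WorksIn:
    """Relationship binding an Employee to a Company."""
    employee: Employee
    company: Company


asha = Employee(1, FullName("Asha", "Rao"), ["020-555-0100", "98765-43210"])
works_in = WorksIn(asha, Company("Acme"))
```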