MCA S3 SQL for Data Science U1
Unit-01
Introduction to RDBMS and SQL
Semester-03
Master of Computer Application
UNIT
Names of Sub-Units
Relational Database Management System (RDBMS), Structure of RDBMS, Table, Entity Relationship (ER)
Diagrams, Data Types, Database Languages, DBMS Examples
Overview
This unit introduces the concept of the relational database management system (RDBMS) and then
describes the structure of an RDBMS. It goes on to explain tables, entities, attributes, and
relationships, followed by data types and database languages. Towards the end, the unit
presents examples of DBMS products.
Learning Objectives
Learning Outcomes
http://www.svecw.edu.in/docs/2020/IT-II-II-CO3.pdf
1.1 INTRODUCTION
A Relational Database Management System (RDBMS) is a set of applications and features that allows IT
professionals and others to develop, edit, manage, and interact with relational databases. It is an
application that lets us build, remove, and update relational databases. In a relational database, data is
stored and retrieved in tabular format (in the form of rows and columns).
The RDBMS is the most widely used database system in businesses all over the world. It offers a stable
means of storing and retrieving massive amounts of data, as well as a balance of system performance and
ease of use.
Some of the available RDBMSs include Oracle, MySQL, IBM DB2 and Microsoft Access. Some
major functions of an RDBMS are as follows:
Security: This is one of the most significant functions of a relational database management system. The
database access rules are set by security management. This function also limits the amount of data that
any user can see or write.
Accuracy: Using primary and foreign key concepts, an RDBMS links multiple tables together, so the same
data need not be repeated in several places. This reduces the possibility of data duplication, which
means that the accuracy of data in an RDBMS is high.
Flexibility: Data can be updated in one place rather than by making changes to many files.
Databases can be expanded easily to accommodate additional records, which increases scalability. The
tabular structure also makes it easier to use SQL queries.
Users: An RDBMS supports a client-server architecture, which allows numerous users to work with the
same database together. It allows enormous amounts of data to be stored and retrieved with ease.
Data Handling Made Simple: The relational architecture allows for faster data retrieval. Keys,
indexes, and normalisation principles help avoid data redundancy or duplication. Because an RDBMS is
built on ACID (atomicity, consistency, isolation, durability) principles for data transactions, data
consistency is ensured.
Integrity: Data integrity is maintained by enforcing three kinds of integrity constraints. For example,
a primary key should be present in a table to ensure entity integrity.
Fault Tolerance: Database replication allows for concurrent access and aids in the recovery of the
system in situations such as a power outage or an abrupt shutdown.
Consistency: The relational model of an RDBMS is well suited to keeping data consistent between
applications and database copies.
Provides the ability to view the content of the database according to different views as required by
users
Manages large amounts of data related to the organisation more easily and efficiently
Provides the ability to access, update and share data
Ensures data security and integrity
Ensures reduction in data redundancy
Provides support for normalisation
Provides facilities for analysing data and producing reports accordingly
Provides compatibility with a number of third-party tools
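The accuracy and integrity functions described above can be tried out in a few lines. The following is a minimal sketch rather than a reference implementation: it assumes Python's built-in sqlite3 module as a stand-in RDBMS, and the tables departments and employees are hypothetical names chosen only for this illustration.

```python
import sqlite3

# In-memory SQLite database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE departments (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employees (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_id  INTEGER NOT NULL REFERENCES departments(dept_id)
    )
""")

conn.execute("INSERT INTO departments VALUES (1, 'Analytics')")
conn.execute("INSERT INTO employees VALUES (100, 'Asha', 1)")
conn.execute("INSERT INTO employees VALUES (101, 'Ravi', 1)")


def try_insert(sql):
    """Run an INSERT and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Entity integrity: the primary key value 100 is already in use.
duplicate_key = try_insert("INSERT INTO employees VALUES (100, 'Duplicate', 1)")
# Referential integrity: department 99 does not exist.
missing_parent = try_insert("INSERT INTO employees VALUES (102, 'Orphan', 99)")

# The department name is stored once and reached through the key.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employees AS e
    JOIN departments AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(duplicate_key, missing_parent, rows)
```

Because the department name lives in one row of departments, renaming the department means updating a single row, which is the "update in one place" behaviour mentioned under flexibility.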
Figure 1 shows the structure of RDBMS:
Figure 1: Structure of RDBMS (diagram not reproduced; legible labels include Users, Programs, Database
Scheme, DML Processor, Database Manager, Transaction Manager, Scheduler, Buffer Manager, and Data Files)
Query processor, storage manager, and disk storage are the three components that make up the
database system.
The description of the different components is as follows:
Query processor: This component converts end-user requests (queries) into instructions via an
application software. It also performs the user request that the DML compiler sends it.
The components of the query processor are as follows:
DML compiler: It converts DML statements into machine language instructions so that they may be
executed.
DDL interpreter: This programme converts DDL statements into a collection of tables that
include meta information (data about data).
Embedded DML pre-compiler: It converts DML statements embedded in a programme into
procedural calls.
Query optimizer: This component works out an efficient way to execute the instructions generated by
the DML compiler.
Storage manager: The storage manager is a programme that connects the data in the database to the
queries that are sent to it. It is also known as the Database Control System. By imposing constraints and
executing DCL commands, it ensures the database's reliability and accuracy. It is in charge of
updating, storing, removing, and retrieving data in the database.
It is made up of the following elements:
Authorization manager: This component enables role-based access control, i.e., it determines
whether or not a certain user is authorised to do the requested action.
Integrity manager: When the database is changed, it verifies the integrity constraints.
Data dictionary: It stores information about the data that is stored in the database.
Thus, a data dictionary defines the actual database with its schema structure. It keeps track of all
the tables and their constraints in the case of a relational database. This repository of metadata
provides the following information about a database:
I. Provides a logical abstraction of the actual database by storing the information of its physical
structure
II. Stores the information about the number of tables and their constraints (that is, primary key,
composite key, foreign key, rules, and indices)
III. Provides a description of the whole schema, the mappings, and knowledge of the application
programs
IV. Stores the database user information for reference by the application
program that checks the integrity constraints applied to database tables for a particular user
Each time an application program wants to fetch the database content, it checks the data
dictionary to retrieve information.
Indices: An index is used to fetch data from the database efficiently. Indices store one or more
columns of a data table to speed up the retrieval of data from the database.
An index takes less storage than the table itself, since the index of a table contains only
the key columns. There are two types of indices: clustered index and non-clustered index, which are
explained as follows:
Clustered index: A clustered index is generally implemented on the key field of a table.
It provides a sequential way to fetch information from the database. It arranges the
internal files of a database table in sequential order to speed up operations. This type of
indexing is very useful when a user wants to fetch the data in either sequential
or reverse order. These indices are also called primary indices
because, in most real-life implementations, clustered indices are created on the primary key
field for its uniqueness. A table can have only one clustered index, because its records can be stored
in only one physical order; using that index, we can find a range of records in a sequential manner.
Non-clustered index: Non-clustered index is generally used on the records that do not
maintain any particular order. This type of indexing can be done by using a tree representation
based on any non-key column of the table. In case of a clustered index, it only stores some of the
key values since all the records are stored sequentially, and it is easy to find any desired
record. However, in case of secondary or non-clustered index, it is required to store all the
distinct values of the column on which indexing has been applied.
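The data dictionary and the effect of an index can both be observed directly. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 module, where the catalog table sqlite_master plays the role of the data dictionary. SQLite does not use the terms clustered and non-clustered; its INTEGER PRIMARY KEY table is stored in key order, which is only roughly analogous to a clustered index. The table customers and the index idx_customers_city are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id  INTEGER PRIMARY KEY,
        company_name TEXT NOT NULL,
        city         TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO customers (company_name, city) VALUES (?, ?)",
    [("Acme", "Pune"), ("Globex", "Delhi"), ("Initech", "Pune")],
)

# Secondary index on a non-key column.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")

# sqlite_master acts as the data dictionary: it records every table and index.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Ask the engine how it would run a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT company_name FROM customers WHERE city = ?",
    ("Pune",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last field holds the plan description

print(catalog)
print(plan_text)
```

The exact wording of the plan differs between SQLite versions, so the sketch only relies on the index name appearing in it.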
1.4 TABLE
A database table is a collection of columns (also known as fields or attributes) and rows (also known as
tuples or records) in which data is actually stored within a database. Two columns in a table cannot be given
the same name. Besides, the data in each column must be of the same type. We can also say that each column
has a specific category of information stored in it; for example, a table named Customers has a unique ID
stored in the CustomerID column for each customer, and the name of each company is stored in the
CompanyName column. A database usually has some system tables, or special tables, with the help of which
SQL Server works with the database.
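As a small illustration of the points above, the Customers table can be created and inspected as follows. This is a sketch that assumes Python's built-in sqlite3 module in place of SQL Server; the sample rows and the table name Broken are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        CustomerID  INTEGER PRIMARY KEY,
        CompanyName TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Customers (CustomerID, CompanyName) VALUES (?, ?)",
    [(1, "Acme Traders"), (2, "Globex Exports")],
)

# Column names come from the cursor description; each fetched tuple is one row.
cursor = conn.execute(
    "SELECT CustomerID, CompanyName FROM Customers ORDER BY CustomerID"
)
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Two columns in one table cannot share a name, so this definition is refused.
try:
    conn.execute("CREATE TABLE Broken (Name TEXT, Name TEXT)")
    duplicate_allowed = True
except sqlite3.OperationalError:
    duplicate_allowed = False

print(columns, rows, duplicate_allowed)
```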
You can see the structure of a table named Customers in Figure 1:
The prefix applied to lookup tables should be the name of the table they relate to; for example,
products_types.
Attributes
Relationships
An ER diagram uses different shapes to represent each one of these elements. To represent an entity, a
rectangular box is used; for a relationship, a diamond is used; and for an attribute, an oval is used. We shall
now discuss each of these in detail.
1.5.1 Entities
Entities are easily recognisable things, either living or non-living. An entity is anything that
needs to be represented in the database of a company. It can be a tangible object, a fact about the
company, or an event that occurs in the world. A rectangle is used to represent entities.
Furthermore, each rectangle in the diagram contains the name of the entity set it represents.
A place, person, object, event, or idea about which data is stored in a database is known as an entity.
An attribute and a unique key are required for each entity in the ER diagram. Every entity is made up of
a collection of ‘attributes’ that define it. Figure 2 shows the employee and project entities:
Figure 2 (diagram not reproduced; legible label: Employee)
1.5.2 Attributes
An attribute can be defined as a property or descriptor of an entity. Examples of attributes of the entity
Employee are EmployeeID, Name, Address, and Phone No. An ER diagram uses an oval shape to represent an
attribute, as shown in Figure 3:
Figure 3 (diagram not reproduced; legible labels: Name, Employee)
Derived Attribute: In an ER diagram, this is an attribute whose value can be derived from the
entity type’s other attributes.
1.5.3 Relationships
A relationship defines a connection between two entities. A verb gives the relationship its name, just as
a verb connects two parts of a sentence. Consider the sentence: an employee works on a project. In this
example, works defines the relationship between the Employee and Project entities. An ER diagram uses a
diamond shape to represent the relationship between two entities, as shown in Figure 4:
Figure 4 (diagram not reproduced; legible labels: Project ID, Project Name)
Many-to-many (m:m)
One-To-One Relationship
In this kind of relationship, one instance of a certain entity relates to only one instance of the other. For
example, a person can possess only one passport, and this is an example of a one-to-one relationship between
the entities person and passport. An ER diagram showing the one-to-one relationship appears in Figure 5:
Figure 5: Representing One-To-One Relationship (diagram not reproduced; legible labels: Person, Has)
One-To-Many Relationship
A one-to-many relationship is one in which an instance of one entity can relate to multiple
instances of another entity. For example, if a class has many students, the relationship can be represented with
the help of an ER diagram as shown in Figure 6:
Many-To-Many Relationship
In a many-to-many relationship, multiple instances of one entity can relate to multiple instances of
another entity. Consider an example in which a customer can purchase more than one book, and more than
one customer can buy copies of the same book. This kind of relationship can be represented with the help of an
ER diagram as shown in Figure 7:
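The three relationship types described above are commonly turned into tables in the following way. This is one possible sketch, assuming Python's built-in sqlite3 module; the table and column names are hypothetical, and other designs are equally valid. A UNIQUE foreign key models one-to-one, a plain foreign key models one-to-many, and a separate junction table (here called purchase) models many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One-to-one: UNIQUE on the foreign key allows at most one passport per person.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE passport (
        passport_no TEXT PRIMARY KEY,
        person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
    );

    -- One-to-many: many students point at the same class.
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        class_id   INTEGER NOT NULL REFERENCES class(class_id)
    );

    -- Many-to-many: a junction table pairs customers with books.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE purchase (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        book_id     INTEGER NOT NULL REFERENCES book(book_id),
        PRIMARY KEY (customer_id, book_id)
    );

    INSERT INTO person VALUES (1, 'Meera');
    INSERT INTO passport VALUES ('P100', 1);
    INSERT INTO class VALUES (1, 'MCA S3');
    INSERT INTO student VALUES (1, 'Kiran', 1), (2, 'Lata', 1);
    INSERT INTO customer VALUES (1, 'Anil'), (2, 'Bina');
    INSERT INTO book VALUES (10, 'SQL Basics'), (11, 'Data Modelling');
    INSERT INTO purchase VALUES (1, 10), (1, 11), (2, 10);
""")

# One-to-one: a second passport for the same person violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO passport VALUES ('P200', 1)")
    second_passport = "accepted"
except sqlite3.IntegrityError:
    second_passport = "rejected"

# One-to-many: count the students that belong to class 1.
students_in_class = conn.execute(
    "SELECT COUNT(*) FROM student WHERE class_id = 1"
).fetchone()[0]

# Many-to-many: one book has several buyers, and one customer has several books.
buyers_of_book_10 = [r[0] for r in conn.execute("""
    SELECT c.name FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
    WHERE p.book_id = 10 ORDER BY c.name
""")]
books_of_customer_1 = [r[0] for r in conn.execute("""
    SELECT b.title FROM purchase AS p
    JOIN book AS b ON b.book_id = p.book_id
    WHERE p.customer_id = 1 ORDER BY b.title
""")]
print(second_passport, students_in_class, buyers_of_book_10, books_of_customer_1)
```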
int: Stores integer values in the range from -2^31 (-2,147,483,648) to 2^31 – 1
(2,147,483,647). The int data type is also known by the name integer. The storage
size for this data type is 4 bytes.
smallint: Stores integer values in the range from -2^15 (-32,768) to 2^15 – 1 (32,767). The storage size
for this data type is 2 bytes.
tinyint: Stores integer values in the range from 0 to 255. The storage size for this data type is 1
byte.
bigint: Stores integer values in the range from -2^63 (-9,223,372,036,854,775,808) to 2^63 – 1
(9,223,372,036,854,775,807). The storage size for this data type is 8 bytes.
decimal/numeric: Stores fixed-point numbers, i.e., numbers with a fixed precision and scale that are
stored exactly rather than approximately. Depending on the precision used, the storage size for this
data type ranges from 5 to 17 bytes. The decimal data type is also known by the name dec.
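The integer ranges listed above follow directly from the storage sizes: a signed integer stored in n bits runs from -2^(n-1) to 2^(n-1) – 1, and an unsigned one from 0 to 2^n – 1. The short Python sketch below recomputes them; the helper name integer_range is hypothetical and exists only for this illustration.

```python
def integer_range(storage_bytes, signed=True):
    """Return (minimum, maximum) for an integer stored in the given number of bytes."""
    bits = 8 * storage_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


ranges = {
    "tinyint": integer_range(1, signed=False),  # tinyint holds 0 to 255, so it is unsigned
    "smallint": integer_range(2),
    "int": integer_range(4),
    "bigint": integer_range(8),
}

for name, (low, high) in ranges.items():
    print(f"{name:8} {low:>27,} to {high:,}")
```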
Approximate numerics: Approximate numerics are the data types that store numbers with fractional or
decimal units, such as 12.50 and 225.75. Table 2 lists data types belonging to the approximate
numeric category:
real: Stores floating point values, similar to the float data type. The range of positive values
supported by this data type is approximately 1.18E-38 to 3.40E+38, and the range of negative
values is approximately -3.40E+38 to -1.18E-38. It can also store the zero value. The storage size for
real is 4 bytes. For the float data type, the storage size is 4 bytes when the precision is below 25 and
8 bytes when the precision is 25 or more.
Date and Time: SQL provides various data types in which date and time values are stored. Table 3 lists the
data types that fall in the date and time category:
smalldatetime: Stores date and time values, with date values in the range from January 1, 1900 to June 6,
2079 and time values with an accuracy of 1 minute. The storage size for this data type is 4
bytes.
date: Stores date values from January 1, 0001 to December 31, 9999, with an accuracy of 1
day. The storage size for this data type is 3 bytes.
time: Stores time values with an accuracy of 100 nanoseconds. The storage size for this data
type is 5 bytes.
datetime2: Stores high-precision date and time values. The storage size for this data type varies
from 6 to 8 bytes.
datetimeoffset: Stores date and time values similar to the datetime2 data type, except that the
datetimeoffset data type allows you to specify time zone offsets.
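The idea of a time zone offset can be illustrated outside the database as well. The sketch below uses Python's standard datetime module as an analogy only; it does not reproduce the storage format of datetimeoffset, and the date, time, and +05:30 offset are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value: 10:30 local time at an offset of +05:30 from UTC.
offset = timezone(timedelta(hours=5, minutes=30))
local_value = datetime(2024, 1, 15, 10, 30, tzinfo=offset)

# The same instant expressed in UTC (offset +00:00).
utc_value = local_value.astimezone(timezone.utc)

local_text = local_value.isoformat()
utc_text = utc_value.isoformat()
print(local_text)
print(utc_text)

# Both values denote the same moment even though their clock readings differ.
same_instant = local_value == utc_value
```

A value without an offset, such as a plain date and time, cannot be compared across time zones in this way, which is the practical reason for storing the offset alongside the value.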
Character strings: Character strings are data types that store character data which is non-Unicode.
Table 4 lists the data types belonging to the character strings category:
Unicode character strings: Unicode character strings are data types storing Unicode character
data. Table 5 lists the data types belonging to the Unicode character strings category:
nvarchar: Stores variable-length strings of Unicode characters. The nvarchar data type and the varchar
data type differ in that the nvarchar data type uses 2 bytes of storage to store each
character, whereas the varchar data type uses 1 byte of storage to store each character. Just as
in the case of the nchar data type, the maximum number of characters in a column defined with the
nvarchar data type is 4000.
ntext: Stores variable-length Unicode data with a maximum length of 2^30 - 1 (1,073,741,823)
characters.
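The difference between 1 byte and 2 bytes per character can be seen by encoding the same text in two ways. This Python sketch is an analogy under stated assumptions: Latin-1 stands in for a single-byte varchar code page and UTF-16 (little-endian) for the two-byte Unicode storage; the actual bytes a database writes depend on its collation and encoding settings.

```python
text = "data"

# One byte per character, as with a single-byte code page (Latin-1 assumed here).
single_byte = text.encode("latin-1")
# Two bytes per character, as with UTF-16 (little-endian, no byte-order mark).
double_byte = text.encode("utf-16-le")

sizes = (len(text), len(single_byte), len(double_byte))
print(sizes)

# A character outside the single-byte code page cannot be stored in it at all.
try:
    "\u20b9500".encode("latin-1")  # U+20B9 is the rupee sign
    fits_single_byte = True
except UnicodeEncodeError:
    fits_single_byte = False

# The Unicode encoding stores the same four characters in eight bytes.
rupee_bytes = len("\u20b9500".encode("utf-16-le"))
print(fits_single_byte, rupee_bytes)
```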
Binary strings: Binary strings are the data types that store binary data streams. Table 6 lists data
types belonging to the binary strings category:
You have studied some data types so far. SQL also supports some other data types that do not
fall in the data type categories that have already been mentioned. The following list shows some other
data types used in SQL:
varchar(max)/nvarchar(max)
varbinary(max)
uniqueidentifier
sql_variant
hierarchyid
timestamp
xml
geometry
geography
such as table, reports and forms in the database. The SQL provides various commands for creating and
making modifications in the database objects.
The SQL commands can be categorized into different categories:
Data Definition Language (DDL): This language comprises SQL commands that are used for defining
the database schema. The commands in DDL can also specify the structure of database objects present in
a database. Some DDL commands are as follows:
CREATE: This command is used in SQL to create database or tables in a database.
DROP: This command is used to delete different objects present in a database.
ALTER: This command is used to modify the structure of a database.
TRUNCATE: This command is used to delete all records from a table.
COMMENT: This command is used to add comments in the data dictionary.
RENAME: This command is used for renaming an object in the existing database.
Data Manipulation Language (DML): The commands in the data manipulation language are used for
manipulating data present in an existing database. Some DML commands are as follows:
SELECT: This command is used for retrieving data from a database.
INSERT: This command is used for inserting data into a table.
UPDATE: This command is used for updating existing data within a table.
DELETE: This command is used for deleting records from a database table.
Data Control Language (DCL): The commands in this language are used to provide rights and
permissions to users for making operations in the database. Some DCL commands are as follows:
GRANT: This command is used for providing access privileges to database.
REVOKE: This command is used for withdrawing user’s access privileges.
Transaction Control Language (TCL): The commands in TCL mainly deal with the transactions that take
place in the database. Some TCL commands are as follows:
COMMIT: This command is used for committing a transaction.
ROLLBACK: This command is used to roll back a transaction in case of error.
SAVEPOINT: This command is used for setting a savepoint within a transaction.
SET TRANSACTION: This command is used for specifying characteristics for a transaction.
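Several of the commands listed above can be tried together in a short session. The sketch below assumes Python's built-in sqlite3 module, which supports only part of the list: SQLite has no TRUNCATE, COMMENT, GRANT, REVOKE, or SET TRANSACTION, so DCL is not shown and a DELETE without a WHERE clause stands in for removing all rows. The table names staff and employee and the savepoint name are hypothetical.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction handling,
# so BEGIN, COMMIT, SAVEPOINT and ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the structure, then alter and rename it.
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("ALTER TABLE staff ADD COLUMN salary INTEGER")
conn.execute("ALTER TABLE staff RENAME TO employee")

# DML inside an explicit transaction that is made permanent with COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO employee (id, name, salary) VALUES (1, 'Asha', 50000)")
conn.execute("INSERT INTO employee (id, name, salary) VALUES (2, 'Ravi', 40000)")
conn.execute("UPDATE employee SET salary = 45000 WHERE id = 2")
conn.execute("COMMIT")

# TCL: a savepoint lets part of a transaction be undone.
conn.execute("BEGIN")
conn.execute("DELETE FROM employee WHERE id = 1")
conn.execute("SAVEPOINT before_bulk_delete")
conn.execute("DELETE FROM employee")            # removes every remaining row
conn.execute("ROLLBACK TO before_bulk_delete")  # undoes only the bulk delete
conn.execute("COMMIT")

rows = conn.execute("SELECT id, name, salary FROM employee ORDER BY id").fetchall()
print(rows)

# DDL: DROP removes the table definition itself, not just its rows.
conn.execute("DROP TABLE employee")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)
```

The first DELETE survives because it happened before the savepoint, while the bulk DELETE is undone by ROLLBACK TO, so only the row for id 2 remains when the transaction commits.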
MySQL: MySQL is an RDBMS that is free and open-source. A relational database organises data
into one or more data tables in which data can be related to one another, and these relations help
to structure the data. SQL is a programming language that allows programmers to create,
change, and extract data from relational databases, as well as control user access. An RDBMS like
MySQL, in addition to supporting relational databases and SQL, works with an operating system to
implement a relational database in a computer’s storage system, manages users, provides network access,
and makes database integrity testing and backup creation easier.
PostgreSQL: PostgreSQL is an open-source object-relational database system that has a strong reputation
for reliability, feature robustness, and performance after more than 30 years of active development.
PostgreSQL is a high-performance, enterprise-class open-source relational database that supports both
SQL and JSON querying. It is a reliable database management system with more than
20 years of community work to thank for its high levels of resilience, integrity and correctness. Many web,
mobile, geospatial, and analytics applications use PostgreSQL as their primary data store or data
warehouse.
Microsoft Access: Microsoft Access is a database application that allows you to store data in the form
of various objects, such as tables, reports, and forms. These objects allow you to access, manage, and
update the data easily and effectively. For example, by storing data in tables, where the data is actually
stored in the form of rows and columns, you can easily search for a specific record in a large collection of
data. Microsoft Access (MS Access) allows you to store simple data, such as name and address, as well as
complex data, such as pictures, sounds, and videos. With advances in technology, improved versions
of Access are released from time to time, the latest at the time of writing being MS Access 2019.
SQL Server: Microsoft SQL Server is a relational database management system that Microsoft has
created. It is a database server, which is a software package whose principal role is to store and
retrieve data as required by other software programmes, which may operate on the same computer or
on a networked computer (including the Internet). Microsoft SQL Server is available in at least a dozen
distinct editions, each aimed at a particular audience and suited to workloads ranging from
modest single-machine applications to huge Internet-facing systems with many concurrent users.
FileMaker: Claris International, a subsidiary of Apple Inc., produces FileMaker, a cross-platform
relational database programme. It combines a database engine with a graphical user interface (GUI)
and security mechanisms, and lets users modify a database by dragging new elements into layouts,
screens, or forms. It is available in desktop, server, iOS, and web-delivery configurations.
Oracle DB: It is an RDBMS that was created in 1977 by Lawrence Ellison and other engineers. It is
among the most extensively used relational database engines, which can be used for storing,
organising, and retrieving data by type while maintaining relationships between them.
The system is based on a relational database foundation that allows users (or an application front end)
to access data objects directly using structured query language (SQL). Oracle is a relational database
architecture that is fully scalable, and it is frequently used by global organisations to manage and
process data across wide and local area networks. The Oracle database has its own network
component that allows communication across networks.
dBASE: dBASE is a Database Management System (DBMS) for microcomputers that runs on the
Windows operating system. dBASE is unusual in that it enables the development of a wide range of
programs, such as middleware, Web apps running on Windows servers, and Windows rich client
applications, with ease. dBASE is a relational database management system. It also offers a capable
debugger and a flexible third-generation language with non-procedural functionality.
Apache Cassandra: Cassandra is a distributed, wide-column store, NoSQL database management
system that can handle enormous volumes of data across multiple commodity servers while
maintaining high availability and avoiding single points of failure. Cassandra supports
multi-datacenter clusters, with asynchronous masterless replication allowing all clients to operate with low
latency. Cassandra was created to combine the distributed storage and replication techniques of
Amazon’s Dynamo with the data and storage engine model of Google’s Bigtable.
1.9 CONCLUSION
A Relational Database Management System (RDBMS) is a set of applications and features that
allows IT professionals and others to develop, edit, manage, and interact with relational databases.
A Relational Database Management System (RDBMS) is based on the relational model as introduced by
E. F. Codd.
A Database Management System (DBMS) is a piece of software that allows you to access data
contained in a database.
Query processor converts end-user requests (queries) into instructions via an application software. It
also performs the user request that the DML compiler sends it.
Storage manager is a programme that connects the data in the database to the queries that are
sent to it.
A database table is a collection of columns (also known as fields or attributes) and rows (also known as
tuples or records) in which data is actually stored within a database.
An ER diagram is a diagram representing the inter-relationship among entities present in a
database.
Entities are easily recognisable things, either living or non-living. An entity is
anything that needs to be represented in the database of a company.
An attribute can be defined as a property or descriptor of an entity.
A relationship defines a connection between two entities. A verb gives the relationship its name, just as
a verb connects two parts of a sentence; consider the sentence: an employee works on a project.
Any object’s data type is specified by the Data Type property. Each column, variable, and expression has
a corresponding data type.
Database languages are used to perform different types of operations on an existing database.
Structured Query Language (SQL) is a database language that provides commands with which
changes can be made in the database.
PostgreSQL is an open-source object-relational database system that has a strong reputation for
reliability, feature robustness, and performance after more than 30 years of active development.
Microsoft SQL Server is a database server, which is a software package whose principal role is to store
and retrieve data as required by other software programmes, which may operate on the same computer
or on a networked computer (including the Internet).
1.10 GLOSSARY
1.11 SELF-ASSESSMENT QUESTIONS
PostgreSQL
1.12
1.13
https://www.journaldev.com/16774/sql-data-types
Discuss the RDBMS with your classmates. Also, discuss the different data types and
database languages used in SQL.