Chapter 3 New
Centralized Database
In a centralized database there is a single database file at one location
in the network.
Multiple users can access this single database over a computer network
(LAN, WAN, etc.).
This type of database is mainly used by institutions or organizations.
Classification of DBMS based on Number of Databases
Distributed Database
A distributed database consists of multiple databases that are connected
with each other and spread across different physical locations.
Communication between the databases at the different physical locations
takes place over a computer network.
Types of DDB system
Distributed database systems are classified into two types:
1. Homogeneous distributed database system
– In a homogeneous distributed database system, all sites have
identical database management system software, are aware of one
another, and agree to cooperate in processing users’ requests.
2. Heterogeneous distributed database system
– In a heterogeneous distributed database, different sites may use
different schemas and different database management system
software.
– The sites may not be aware of one another, and they may provide
only limited facilities for cooperation in transaction
processing.
– The differences in schemas are often a major problem for query
processing, while the divergence in software becomes a hindrance
for processing transactions that access multiple sites.
Relational Data Model
The relational data model was first introduced in 1970 by the computer
scientist and mathematician Dr. Edgar Frank Codd.
The Relational Database Management System (RDBMS) has become the
dominant data-processing software in use today.
In the relational model, all data is logically structured within
relations (tables).
Each relation has a name and is made up of named attributes (columns)
of data. Each tuple (row) contains one value per attribute.
A great strength of the relational model is this simple logical
structure.
Yet, behind this simple structure is a sound theoretical foundation
that is lacking in the first generation of DBMSs (the network and
hierarchical DBMSs).
Most modern database management systems, such as Microsoft SQL Server,
Oracle, and MySQL, are based on the relational model.
Terminologies in RDBMS
Relation
A relation is a table with columns and rows.
The RDBMS database uses tables to store data.
A table is a collection of related data entries and contains rows
and columns to store data.
Each table represents a real-world object, such as a person, place,
or event, about which information is collected.
A relation has the following properties:
– Each relation has a unique name by which it is identified in the
database.
– A relation does not contain duplicate tuples.
– The tuples of a relation have no specific order.
– All attributes in a relation are atomic, i.e., each cell of a
relation contains exactly one value.
Attribute
An attribute is a named column of a relation.
A column is a vertical entity in the table which contains all information
associated with a specific field in a table.
Properties of an Attribute:
– Every attribute of a relation must have a name.
– Null values are permitted for the attributes.
– A default value can be specified for an attribute; it is inserted
automatically if no other value is supplied for that attribute.
– Attributes that uniquely identify each tuple of a relation are the
primary key.
Tuple
A tuple is a row of a relation.
The elements of a relation are the rows, or tuples, in the table.
Tuples can appear in any order and the relation will still be the same
relation, and therefore convey the same meaning.
Properties of a row:
– No two tuples are identical to each other in all their entries.
– All tuples of the relation have the same format and the same number of entries.
– The order of the tuples is irrelevant. They are identified by their content, not by
their position.
Degree
The degree of a relation is the number of attributes it contains.
A relation with only one attribute would have degree one and be called a
unary relation or one-tuple. A relation with two attributes is called binary,
one with three attributes is called ternary, and after that the term n-ary is
usually used.
Cardinality
The cardinality of a relation is the number of tuples it contains.
The cardinality changes as tuples are added or deleted, whereas the
degree changes only when the structure of the relation is altered.
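
The difference between degree and cardinality can be sketched in a few lines of Python. This is only an illustration: the STUDENT relation, its attribute names, and its rows are hypothetical, and the relation is modelled as a set of Python tuples so that duplicates and ordering behave as described above.

```python
# Hypothetical STUDENT relation: attribute names plus a set of tuples.
attributes = ("student_id", "name", "department")
tuples = {
    (1, "Abebe", "Computer Science"),
    (2, "Sara", "Information Systems"),
    (3, "Ahmed", "Computer Science"),
}


def degree(attrs):
    """Number of attributes (columns) in the relation."""
    return len(attrs)


def cardinality(rows):
    """Number of tuples (rows) in the relation."""
    return len(rows)


print(degree(attributes))    # three attributes, so a ternary relation
print(cardinality(tuples))   # three tuples

# A set ignores duplicates, so re-adding an existing tuple changes nothing.
tuples.add((1, "Abebe", "Computer Science"))
cardinality_after_duplicate = cardinality(tuples)

# A genuinely new tuple raises the cardinality; the degree is unaffected.
tuples.add((4, "Hana", "Statistics"))
cardinality_after_insert = cardinality(tuples)
```

Using a set rather than a list is a deliberate choice here: it mirrors the rule that a relation has no duplicate tuples and no tuple order.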
Data Item (Cell)
The smallest unit of data in the table is the individual data item. It
is stored at the intersection of tuples and attributes.
Relational Database
A relational database is a collection of normalized relations with
distinct relation names.
The terminology for the relational model can be quite confusing.
The following table summarizes the different terms used in the
relational model.
The following figure shows examples of terminologies used in
RDBMS
Relational Keys
Keys are a very important part of the relational database model. They are
used to uniquely identify any record (row) of data inside a table and to
establish relationships between tables.
A key can be a single attribute or a group of attributes. When a key
consists of more than one attribute, it is called a composite key.
There are two main types of database keys:
1. Primary Key
Since a relation has no duplicate tuples, it is always possible to identify each row
uniquely. This means that a relation always has a primary key: the key that is
selected to identify tuples uniquely within the relation.
In the worst case, the entire set of attributes could serve as the primary key, but
usually some smaller subset is sufficient to distinguish the tuples.
2. Foreign Key
Foreign keys are used to define relationships between tables and to enforce
referential integrity in a database by ensuring that each value in the foreign key
column is actually a valid entry in the primary key column of another table.
When an attribute appears in more than one relation, its appearance usually
represents a relationship between tuples of the two relations.
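
A minimal sketch of primary and foreign keys, using Python's built-in sqlite3 module. The department and student tables and their columns are hypothetical, chosen only for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other database systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER REFERENCES department(dept_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (1, 'Sara', 10)")  # valid reference


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Same primary key value as an existing row: violates uniqueness.
dup_result = try_insert("INSERT INTO student VALUES (1, 'Hana', 10)")
# dept_id 99 does not exist in department: violates referential integrity.
fk_result = try_insert("INSERT INTO student VALUES (2, 'Ahmed', 99)")

print(dup_result)
print(fk_result)
```

Both rejected inserts leave the student table with its single valid row, which is the behaviour referential integrity is meant to guarantee.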
Building Blocks of Relational Data Model
In the table above, the tuple with Student ID = 4 and name
= “Ahmed Ali” has marks = A.
Because A is not an integer or float value, the domain
constraint is violated here.
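
The violation described above can be reproduced with a small sqlite3 sketch. The table layout is an assumption modelled on the example (student ID, name, marks). Because SQLite is loosely typed, the domain of the marks column is enforced here with an explicit CHECK on `typeof(marks)`; strictly typed database systems would reject the value from the column type alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- Domain of marks: numeric values only.
        marks      REAL CHECK (typeof(marks) IN ('integer', 'real'))
    )
""")

# A numeric mark lies inside the domain and is accepted.
conn.execute("INSERT INTO student VALUES (1, 'Sara', 87.5)")

# The letter grade 'A' lies outside the domain and is rejected.
try:
    conn.execute("INSERT INTO student VALUES (4, 'Ahmed Ali', 'A')")
    outcome = "accepted"
except sqlite3.IntegrityError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```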
Constraints
Entity Integrity Constraint
The entity integrity constraint states that a primary key value cannot be null.
This is because the primary key value is used to identify individual rows in a
relation; if the primary key has a null value, those rows cannot be
identified.
Attributes other than the primary key may contain null values.
Example: Consider the following relation
Here, the second tuple has null value stored in the Student_ID attribute
which is a Primary Key.
Hence, the Entity Integrity constraint is violated here.
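
A minimal sqlite3 sketch of the entity integrity constraint follows. The table and its values are hypothetical. One SQLite-specific detail: for historical reasons SQLite permits NULL in a non-integer PRIMARY KEY column unless NOT NULL is also declared, so the sketch states it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY NOT NULL,  -- NOT NULL stated explicitly for SQLite
        name       TEXT
    )
""")

conn.execute("INSERT INTO student VALUES ('S1', 'Sara')")
# A null in a non-key attribute is allowed.
conn.execute("INSERT INTO student VALUES ('S2', NULL)")

# A null in the primary key violates entity integrity.
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Ahmed')")
    outcome = "accepted"
except sqlite3.IntegrityError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```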
Key Constraint
The key constraint is closely related to the entity integrity constraint, but
the two are not identical.
The key constraint states that primary key values must be unique and must not
be null, whereas the entity integrity constraint only requires that no
attribute of a primary key be null.
Example: Consider the following relation
Data Definition Language (DDL)
Data Definition Language commands are used to create and alter the
database and database objects in a relational database management system.
• Data Definition Language (DDL) is a set of special commands that allows us to
define and modify the structure and the metadata of the database.
• These commands can be used to create, modify, and delete the database
structures such as schema, tables, indexes, etc.
• These commands are normally not used by an end user (someone who accesses
the database via an application).
The most used DDL commands are:
A. Create Command:- used to create a database and database objects such as
a table, index, or view
B. Alter Command:- used to modify the definition of existing
objects
C. Drop Command:- used to remove an existing database and other database
objects
D. Truncate Table Command:- used to remove all the data from a table
• Note: the drop table and truncate table commands serve different purposes. The drop
table command removes the structure of a table along with its content, whereas the
truncate table command removes the data inside a table but not the table itself.
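
The commands above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the course table is hypothetical, and SQLite has no TRUNCATE TABLE statement, so a `DELETE FROM` without a WHERE clause stands in for it here. The point it illustrates is the one in the note: emptying a table keeps its structure, dropping it does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    """List the tables currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


# CREATE: define a new table.
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")

# ALTER: change the definition of the existing table.
conn.execute("ALTER TABLE course ADD COLUMN credit_hours INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

conn.execute("INSERT INTO course VALUES (1, 'Databases', 3)")

# Stand-in for TRUNCATE TABLE: remove every row but keep the table.
conn.execute("DELETE FROM course")
rows_after_truncate = conn.execute("SELECT COUNT(*) FROM course").fetchall()[0][0]
tables_after_truncate = table_names()

# DROP: remove the table structure together with any content.
conn.execute("DROP TABLE course")
tables_after_drop = table_names()

print(columns)
print(rows_after_truncate, tables_after_truncate)
print(tables_after_drop)
```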
Data Manipulation Language
• Data Manipulation Language (DML) is a set of special commands
that allows us to access and manipulate data stored in
existing database objects.
• These commands are used to perform certain operations such as
insertion, deletion, updating, and retrieval of the data from
the database.
• The most widely used DML commands include:
A. Select Command:- used to retrieve one or more rows from table
B. Insert Command:- used to insert one or more rows into a table
C. Update Command:- used to change/ update existing data in the
database table.
D. Delete Command:- used to remove one or more rows from a table
or view.
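
The four DML commands can be sketched with Python's built-in sqlite3 module. The student table and its rows are hypothetical and serve only to show each command once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, marks REAL)"
)

# INSERT: add rows to the table.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Sara", 87.5), (2, "Hana", 72.0), (3, "Ahmed", 64.0)],
)

# SELECT: retrieve the rows that satisfy a condition.
high_scorers = conn.execute(
    "SELECT name FROM student WHERE marks >= 70 ORDER BY student_id"
).fetchall()

# UPDATE: change existing data.
conn.execute("UPDATE student SET marks = 68.0 WHERE student_id = 3")

# DELETE: remove rows from the table.
conn.execute("DELETE FROM student WHERE student_id = 2")

remaining = conn.execute(
    "SELECT student_id, name, marks FROM student ORDER BY student_id"
).fetchall()

print(high_scorers)
print(remaining)
```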
Data Control Language (DCL)
Data Control Language commands are used to control
access to the data stored in a database and to provide data
security.
The most widely used DCL commands include:
A. Grant Command:- used to give access right/ privilege on
database objects
B. Revoke Command:- used to cancel/ take away privileges
that were given on database objects using grant command
Transaction Control Language (TCL)
Transaction Control Language commands are used to manage
changes made by DML commands.
It allows statements to be grouped together into one
logical transaction.
The most widely used TCL commands include:
A. Begin_transaction command:- marks the starting point of a
transaction
B. Commit_transaction command:- marks the end of a
successful transaction and makes its changes permanent
C. Rollback_transaction command:- rolls the transaction back to
its beginning, undoing its changes
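
A minimal sketch of transaction control with Python's built-in sqlite3 module follows. In SQLite the three commands above are written `BEGIN`, `COMMIT`, and `ROLLBACK`. The account table, the `transfer` helper, and the balances are hypothetical; `isolation_level=None` is passed so that the module does not open transactions implicitly and every boundary is explicit.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT/ROLLBACK are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")


def transfer(src, dst, amount):
    """Move money between accounts as one logical transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE owner = ?", (amount, dst)
        )
        # Fails the CHECK constraint if src would go negative.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # Undo the credit that was already applied to dst.
        conn.execute("ROLLBACK")
        return False


def balances():
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT owner, balance FROM account"))


ok = transfer("A", "B", 30)        # both updates succeed and are committed
after_commit = balances()
failed = transfer("A", "B", 500)   # debit fails, so the credit is rolled back
after_rollback = balances()

print(ok, after_commit)
print(failed, after_rollback)
```

The second transfer credits account B before the debit fails; the rollback is what prevents that half-finished change from persisting.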
Introduction to Big Data
Big data refers to datasets that are too large or complex for
traditional data-processing software to handle.
These are extremely large and diverse collections of
structured, semi-structured, and unstructured data that
continue to grow exponentially over time.
The datasets are so large and complex in volume, velocity,
and variety that traditional data management systems cannot
store, process, or analyze them.
Big data is used in machine learning, predictive modeling, and
other advanced analytics to solve business problems and make
informed decisions.
Characteristics of Big data
5 V’s of Big Data
Volume
Volume refers to the size of the data.
Whether a dataset is considered big data depends largely on its volume.
Example: In 2016, the estimated global mobile traffic was 6.2 exabytes (6.2
billion GB) per month. It was also projected that by 2020 there would be almost
40,000 exabytes of data.
Velocity
Velocity refers to the speed at which data accumulates.
It deals with the speed at which data flows in from sources such as application
logs, business processes, networks, social media sites, sensors, and mobile
devices.
Example: In 2000, Google was receiving 32.8 million searches per day. As
of 2018, Google was receiving 5.6 billion searches per day.
Veracity
It refers to the assurance of quality/integrity/credibility/accuracy of the data.
Since the data is collected from multiple sources, we need to check the data for
accuracy before using it for business insights.
Value
True to its name, value refers to the actionable insight that can be
derived from big data sets.
A bulk of data that has no value is of no use to a company unless it is
turned into something useful.
Data in itself has little importance; it must be processed to extract
valuable information.
Variety
Variety refers to the heterogeneity of data sources and formats.
It reflects the arrival of data from new sources that are both
inside and outside of an enterprise.
The data can be structured, semi-structured, or unstructured.