
Denormalization in Databases

When we normalize tables, we break them into multiple smaller tables, so retrieving related data means performing join operations across those tables. Denormalization is the technique used to eliminate this drawback of normalization.

Denormalization is a technique used by database administrators to optimize the efficiency of their database infrastructure. It allows us to add redundant data to a normalized database to alleviate issues with queries that merge data from several tables into a single result. The denormalization concept builds on the definition of normalization, which is defined as arranging a database into tables correctly for a particular purpose.

For example, suppose we have two tables, student and branch, after performing normalization. The student table has the attributes roll_no, stud_name, age, and branch_id.

Additionally, the branch table is related to the student table, with branch_id acting as the student table's foreign key.

A JOIN operation between these two tables is needed whenever we retrieve student names together with the branch name. That is fine as long as the tables are small, but if the tables are big, joins can take excessively long.

In this case, we update the database with denormalization, accepting redundancy and extra maintenance effort in exchange for the efficiency benefit of fewer joins. We can copy the branch name from the branch table into the student table, thereby optimizing the read path.
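
As a minimal sketch (the table and column names follow the example above but are otherwise assumptions), the join query and its denormalized equivalent might look like this:

-- Normalized: every lookup of a student's branch name requires a join.
SELECT s.roll_no, s.stud_name, b.branch_name
FROM student s
JOIN branch b ON b.branch_id = s.branch_id;

-- Denormalized: branch_name is copied into student, so no join is needed.
ALTER TABLE student ADD COLUMN branch_name VARCHAR(35);

UPDATE student s
SET branch_name = (SELECT b.branch_name
                   FROM branch b
                   WHERE b.branch_id = s.branch_id);

SELECT roll_no, stud_name, branch_name FROM student;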

Normalization vs. denormalization


Denormalization addresses a fundamental fact of databases: read and join operations are slow.

In a fully normalized database, each piece of data is stored only once, generally in separate tables with relations to one another. For this information to become usable, it must be read out from the individual tables by a query and then joined together. If this process involves large amounts of data or must run many times a second, it can quickly overwhelm the database hardware and slow performance, or even crash the database.

As an example, imagine a fruit seller has a daily list of what fruit is in stock at their stand and a daily list of market prices for all fruits and vegetables. These would be two separate tables in a normalized database. If a customer wanted to know the price of an item, the seller would need to check both lists to determine whether it is in stock and at what price. This would be slow and annoying.

Therefore, every morning the seller creates another list with just the items in stock and their daily prices, combining the two lists into a quick reference to use throughout the day. This would be a denormalized table that speeds up reading the data.
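
In SQL terms (a sketch only; the stock and market_price tables are hypothetical), the seller's combined list is simply a prejoined table rebuilt each morning:

-- Hypothetical normalized tables: stock(item) and market_price(item, price).
DROP TABLE IF EXISTS daily_price_list;

CREATE TABLE daily_price_list AS
SELECT s.item, p.price
FROM stock s
JOIN market_price p ON p.item = s.item;

-- Throughout the day, lookups hit one small table and need no join:
SELECT price FROM daily_price_list WHERE item = 'apple';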

An important consideration when deciding whether to normalize data is whether the workload will be read-heavy or write-heavy. Because data is duplicated in a denormalized database, adding or modifying data requires changing several tables, which results in slower write operations.

Therefore, the fundamental tradeoff becomes fast writes and slow reads in normalized databases versus slow writes and fast reads in denormalized ones.

For example, imagine a database of customer orders from a website. If customers place many orders every second but each order is read out only a few times during order processing, prioritizing write performance may be more important (a normalized database). On the other hand, if each order is read out hundreds of times per second to provide a 'based on your order' recommendations list or is read by big data trending systems, then faster read performance becomes more important (a denormalized database).

Another important consideration in a denormalized system is data consistency. In a normalized database, each piece of data is stored in one place; therefore, the data will always be consistent and will never produce contradictory results. Since data may be duplicated in a denormalized database, it is possible for one copy of a piece of data to be updated while a duplicate elsewhere is not, resulting in a data inconsistency called an update anomaly. This places extra responsibility on the application or database system to keep the duplicates in sync and handle these errors.
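
Continuing the earlier student/branch sketch (the names are assumptions, not a fixed schema), an update anomaly arises when only one copy of a duplicated value is changed:

-- branch_name is stored in branch and, redundantly, in student.
UPDATE branch
SET branch_name = 'Computer Science and Engineering'
WHERE branch_id = 10;

-- If the application forgets this second statement, the two copies
-- of branch_name now disagree: an update anomaly.
UPDATE student
SET branch_name = 'Computer Science and Engineering'
WHERE branch_id = 10;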

Denormalization has become commonplace in database design. Advancing technology is addressing many of the issues it presents, while the decreasing cost of both disk and RAM storage has reduced the impact of storing redundant data. Additionally, the increased emphasis on read performance and making data quickly available has necessitated the use of denormalization in many databases.

DATA SECURITY
Data security refers to protective digital privacy measures that are applied to prevent
unauthorized access to computers, databases and websites. Data security also protects data from
corruption. Data security is an essential aspect of IT for organizations of every size and type.

Data security is also known as information security (IS) or computer security.


In simple terms, data security is the practice of keeping data protected from corruption and
unauthorized access. The focus behind data security is to ensure privacy while protecting
personal or corporate data.

Data is the raw form of information, stored as columns and rows in our databases, network servers and personal computers. This may be a wide range of information, from personal files and intellectual property to market analytics and details intended to be top secret. Data could be anything of interest that can be read or otherwise interpreted in human form.

However, some of this information isn't intended to leave the system. Unauthorized access to this data can lead to numerous problems for a large corporation or even a personal home user. Having your bank account details stolen is just as damaging for you as a stolen client database is for the system administrator responsible for it.

There has been a huge emphasis on data security of late, largely because of the internet. There are a number of options for locking down your data, from software solutions to hardware mechanisms, and computer users are certainly more conscious these days. The following are essential guidelines for securing your sensitive information.

Encryption

Encryption has become a critical security feature for thriving networks and active home users alike. This security mechanism uses mathematical schemes and algorithms to scramble data into unreadable text. It can only be decoded, or decrypted, by a party that possesses the associated key.
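
At the database level, a minimal sketch might use built-in functions (this assumes MySQL, whose AES_ENCRYPT/AES_DECRYPT functions are just one of many possible mechanisms; the accounts table is hypothetical, with card_number as a VARBINARY column):

-- Scramble the value before storing it.
INSERT INTO accounts (id, card_number)
VALUES (1, AES_ENCRYPT('4111-1111-1111-1111', 'secret-key'));

-- Only a holder of the same key can read it back.
SELECT CAST(AES_DECRYPT(card_number, 'secret-key') AS CHAR) AS card_number
FROM accounts
WHERE id = 1;
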
Strong User Authentication

Authentication is another part of data security that we encounter in everyday computer usage. Just think about when you log into your email or blog account. That single sign-on process is a form of authentication that allows you to log into applications, files, folders and even an entire computer system. Once logged in, you have various privileges until you log out. Strong user authentication goes further: it requires individuals to log in using multiple factors of authentication, which may include a password, a one-time password, a smart card or even a fingerprint.
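
Within a database, authentication starts with user accounts and least-privilege grants. A sketch, assuming MySQL syntax (the account and database names are hypothetical):

-- Create an account that must authenticate with a password.
CREATE USER 'report_user'@'localhost' IDENTIFIED BY 'S3cure!pass';

-- Grant only the privileges the account actually needs.
GRANT SELECT ON sales_db.* TO 'report_user'@'localhost';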

Backup Solutions

Data security wouldn't be complete without a solution to back up your critical information. Though data may appear secure while locked away inside a machine, there is always a chance it can be compromised. You could suddenly be hit with a malware infection in which a virus destroys all of your files, or someone could enter your computer and steal data by slipping through a security hole in the operating system. A reliable backup solution will allow you to restore your data instead of starting completely from scratch.

THREATS TO DATA SECURITY


There are many different threats to computer systems and the data stored on them. These threats increased considerably when computers started to be networked, and with the Internet they have become one of the most important considerations in managing a computer system.

Hackers

Unless they are protected, computer systems are vulnerable to anyone who wants to edit, copy or
delete files without the owner’s permission. Such individuals are usually called hackers.

Malware

Malware, short for malicious software, is software designed to gain access to a computer system without the owner's consent. The expression is a general term used by the computer industry for a variety of forms of hostile, intrusive, or annoying software. Such software is sometimes, incorrectly, referred to as a computer virus. Software is considered malware based on the perceived intent of its creator rather than on any particular features.

Virus

A computer virus is a piece of software designed to disrupt or stop the normal working of a computer. Viruses are so called because, like a biological virus, they are passed on from one infected machine to another. Downloading software from the Internet, opening email attachments and using USB memory sticks are the most common ways for a virus to infect your computer.
Worms

A computer worm is a self-replicating program. It uses a computer network to send copies of itself to other computers on the network, and it may do so without any user intervention. It is able to do this because of security weaknesses on the target computer. Unlike a virus, it does not need to attach itself to an existing program. Worms almost always cause at least some harm to the network, if only by consuming bandwidth, whereas viruses almost always corrupt or modify files on a targeted computer.

Trojan Horses

Trojan horses are designed to allow a hacker remote access to a target computer system. Once a
Trojan horse has been installed on a target computer system, it is possible for a hacker to access
it remotely and perform various operations. The operations that a hacker can perform are limited
by user privileges on the target computer system and the design of the Trojan horse.

Spyware

Spyware is a type of malware that is installed on a computer and collects little bits of information at a time about users without their knowledge. It can be very difficult for a user to tell whether spyware is present on a computer. Sometimes, however, spyware such as a key logger is installed deliberately by a company, or on a public computer such as one in a library, in order to secretly monitor other users.

While the term spyware suggests software that secretly monitors the user's computing, the functions of spyware extend well beyond simple monitoring. Spyware programs can collect various types of personal information, such as Internet surfing habits and the sites that have been visited, but can also interfere with the user's control of the computer in other ways, such as installing additional software and redirecting Web browser activity. Spyware is known to change computer settings, resulting in slow connection speeds, different home pages, and/or loss of Internet connectivity or of the functionality of other programs. Spyware is also known more formally as privacy-invasive software.

Adware
Adware, or advertising-supported software, is any software package that automatically plays, displays, or downloads advertisements to a computer after the software is installed on it or while the application is being used. Common forms of this type of malware appear on websites as popup windows that open when you land on the site. Some types of adware are also spyware.
Crimeware

Crimeware is a class of malware designed specifically to automate cybercrime. Its purpose is to carry out identity theft. It is most often targeted at financial services companies such as banks and online retailers, for the purpose of taking funds from accounts or making unauthorized transactions that benefit the thief controlling the crimeware.

Spam

Spam is the abuse of electronic messaging systems to send unsolicited bulk messages indiscriminately. While the most widely recognized form of spam is e-mail spam, the term is applied to similar abuses in other media: instant messaging spam, web search engine spam and social networking spam, for example.

Phishing

Phishing is an e-mail fraud method in which the criminal sends out legitimate-looking email in an attempt to gather personal and financial information from recipients. Typically, the messages appear to come from well-known and trustworthy Web sites. Web sites that are frequently spoofed by phishers include PayPal, eBay, MSN and Yahoo. A phishing expedition, like the fishing expedition it's named after, is a speculative venture: the phisher puts out the lure hoping that at least a few of the prey that encounter it will take the bait. The criminal can then use the information gathered, for example, to take money from the person's account.

DATA BACKUP
In a computer system we have primary and secondary memory storage. Primary memory storage, RAM, is volatile memory that stores the disk buffer, active logs, and other related data of a database, including all the recent transactions and their results. When a query is fired, the database first looks in primary memory for the data; if it does not exist there, the database moves to secondary memory to fetch the record. Fetching a record from primary memory is always faster than from secondary memory. If the primary memory crashes, all the data in it is lost, and we cannot recover the database from it.

In such cases, we can follow either of the following steps so that the data in primary memory is not lost.

● We can periodically copy the contents of primary memory, with all the logs and buffers, into the database. In case of a failure we will then not lose all the data; we can recover it up to the point at which it was last copied to the database.
● We can create checkpoints at several points in time so that data is copied to the database.
Suppose the secondary memory itself crashes. Then all the data is lost and cannot be recovered. We have to think of an alternative solution, because we cannot afford to lose data in a huge database.

There are three methods used to back up the data in secondary memory so that it can be recovered after a failure; a brief SQL sketch follows the list.

● Remote Backup: A copy of the database is created and stored on a remote network. This remote database is periodically updated from the current database so that the two stay in sync. The remote database can be updated manually, which is called offline backup, or it can be backed up online, where data is written to the current and remote databases simultaneously. In the online case, as soon as the current database fails, the system automatically switches over to the remote database and keeps functioning; the user will not even know there was a failure.
● In the second method, the database is copied to storage devices such as magnetic tapes and kept in a secure place. If there is a failure, the data is copied back from these tapes to bring the database up.
● As the database grows, backing up the whole database becomes an overhead. Hence only the log files are backed up at regular intervals. These log files contain all the information about the transactions being made, so by replaying them the database can be recovered. In this method the log files are backed up at regular intervals while the full database is backed up, say, once a week.
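
As a minimal sketch of this full-plus-log pattern (this assumes SQL Server's BACKUP statement; other systems use different tools, such as pg_dump or mysqldump, and the database name and paths here are hypothetical):

-- Weekly full backup of the whole database.
BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB_full.bak';

-- Frequent log backups in between capture every transaction since the
-- last log backup, allowing recovery up to a point in time.
BACKUP LOG SalesDB TO DISK = 'D:\backups\SalesDB_log.trn';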

There are two types of data backup: physical data backup and logical data backup. The physical data backup includes physical files such as data files, log files, control files, and redo/undo logs. They are the foundation of the recovery mechanism in the database, as they provide the minute details about transactions and modifications to the database.

Logical backup includes backup of logical objects such as tables, views, procedures, and functions. A logical data backup alone is not sufficient to recover the database, as it provides only structural information. The physical data backup provides the minute details about the database and is therefore essential for recovery.

RECOVERY

A database is a very large system with lots of data and transactions. Transactions are executed against the database every second and are critical to it. If there is a failure or crash while executing a transaction, it is expected that no data is lost; the changes of the transaction must be reverted to the previously committed point. There are various techniques to recover the data, depending on the type of failure or crash.

● Transaction Failure: This is the condition in which a transaction cannot execute any further. This type of failure affects only a few tables or processes. The failure can be caused by logical errors in the code or by system errors such as deadlock or the unavailability of the system resources needed to execute the transaction.
● System Crash: This can be caused by hardware or software failure or by external factors such as a power failure. In most cases data in secondary memory is not affected by such a crash, because the database maintains many integrity checkpoints to prevent the loss of data from secondary memory.
● Disk Failure: These are issues with hard disks, such as the formation of bad sectors, a disk head crash, or unavailability of the disk.

As we have seen already, each transaction has the ACID properties. In case of a transaction failure or system crash, the database must still maintain these properties; failing to maintain ACID is a failure of the database system. That means no transaction in the system can be left at the stage of its failure: it must either be completed fully or rolled back to the previous consistent state.

Suppose there is a transaction on the student database that enters a student's marks in three subjects and then calculates the total. Suppose the transaction fails just as the third mark is being entered into the table. The transaction cannot be left at this stage, because marks in two subjects have already been entered; if the system recovered and calculated the total at that point, it would be based on only two subject marks, which is incorrect. Either the transaction has to be completed fully, entering the third mark and calculating the total, or the marks that have already been entered must be removed. Completing the transaction fully or reverting it fully brings the database into a consistent state, so the data will not lead to any miscalculation.
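
In SQL, this all-or-nothing behavior is expressed with an explicit transaction (a sketch; the marks table, its columns, and the PostgreSQL-style BEGIN/COMMIT syntax are assumptions):

-- Either all three marks and the total are recorded, or none of them are.
BEGIN;

UPDATE marks SET subject1 = 78 WHERE roll_no = 101;
UPDATE marks SET subject2 = 85 WHERE roll_no = 101;
UPDATE marks SET subject3 = 91 WHERE roll_no = 101;
UPDATE marks SET total = subject1 + subject2 + subject3 WHERE roll_no = 101;

COMMIT; -- on any failure before this point, issue ROLLBACK instead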

● Log Based Recovery: In this method, a log of each transaction is maintained in some stable storage, so that in case of any failure the database can be recovered from it. The log must be written to stable storage before the actual transaction is applied to the database.

Every log record in this case holds information such as which transaction is being executed, which values have been modified from what to what, and the state of the transaction. All this log information is stored in the order of execution.
● Shadow Paging: This is the method in which all of a transaction's work is carried out in primary memory. Only once the transaction has completely executed is it applied to the database. Hence, if there is a failure in the middle of a transaction, it will not be reflected in the database; the database is updated only after the whole transaction completes.

Constraints in DBMS

Constraints enforce limits on the data, or the type of data, that can be inserted into, updated in, or deleted from a table. The whole purpose of constraints is to maintain data integrity during update/delete/insert operations on a table. In this tutorial we will learn about the several types of constraints that can be created in an RDBMS.

Types of constraints

● NOT NULL
● UNIQUE
● DEFAULT
● CHECK
● Key Constraints – PRIMARY KEY, FOREIGN KEY
● Domain constraints
● Mapping constraints

NOT NULL:

The NOT NULL constraint makes sure that a column does not hold a NULL value. When we don't provide a value for a particular column while inserting a record into a table, it takes the value NULL by default. By specifying a NOT NULL constraint, we can be sure that the particular column(s) cannot have NULL values.

Example:

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (235),
PRIMARY KEY (ROLL_NO)
);
UNIQUE:

UNIQUE Constraint enforces a column or set of columns to have unique values. If a column has
a unique constraint, it means that particular column cannot have duplicate values in a table.

Example:

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);

DEFAULT:

The DEFAULT constraint provides a default value to a column when there is no value provided
while inserting a record into a table.

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);
CHECK:

This constraint is used to specify a range of values for a particular column of a table. When this constraint is set on a column, it ensures that the column accepts only values falling in the specified range.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL CHECK (ROLL_NO > 1000),
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35),
PRIMARY KEY (ROLL_NO)
);
In the above example we have set a check constraint on the ROLL_NO column of the STUDENT table. Now, the ROLL_NO field must have a value greater than 1000.

Key constraints:

PRIMARY KEY:

A primary key uniquely identifies each record in a table. It must have unique values and cannot contain NULLs. In the example below the ROLL_NO field is marked as the primary key, which means the ROLL_NO field cannot have duplicate or NULL values.

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
FOREIGN KEY:

Foreign keys are columns of a table that point to the primary key of another table. They act as a cross-reference between tables.
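
For example, mirroring the student/branch discussion at the start of this document (a sketch in the style of the earlier examples; the BRANCH table definition here is an assumption):

Example:

CREATE TABLE BRANCH(
BRANCH_ID INT NOT NULL,
BRANCH_NAME VARCHAR (35) NOT NULL,
PRIMARY KEY (BRANCH_ID)
);

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
BRANCH_ID INT,
PRIMARY KEY (ROLL_NO),
FOREIGN KEY (BRANCH_ID) REFERENCES BRANCH (BRANCH_ID)
);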

Domain constraints:

Each table has a certain set of columns, and each column allows only data of one type, based on its data type; the column does not accept values of any other data type.
Domain constraints are user-defined data types, and we can define them like this:

Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY / CHECK / DEFAULT)
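
Some systems support this directly. A sketch assuming PostgreSQL's CREATE DOMAIN syntax (the domain name and table are hypothetical):

CREATE DOMAIN POSITIVE_AGE AS INT
CHECK (VALUE > 0 AND VALUE < 150);

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_AGE POSITIVE_AGE NOT NULL,
PRIMARY KEY (ROLL_NO)
);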
