
Dbms and Data Warehouse

Uploaded by

Suman Chatterjee
Copyright
© All Rights Reserved

What is Database

A database is a collection of interrelated data, organized so that the data can be
retrieved, inserted, and deleted efficiently. A database also organizes the data in the
form of tables, schemas, views, reports, and so on.

For example, a college database organizes the data about the admin, staff, students,
and faculty.

Using the database, you can easily retrieve, insert, and delete the information.
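
As a minimal sketch of these three operations, the following Python snippet uses the
standard-library sqlite3 module with an in-memory database. The table name "students"
and its columns are hypothetical, chosen only to mirror the college example above.

```python
import sqlite3

# In-memory database; the "students" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")

# Insert: add three student records.
conn.executemany(
    "INSERT INTO students (id, name, course) VALUES (?, ?, ?)",
    [(1, "Asha", "Physics"), (2, "Ravi", "History"), (3, "Meera", "Physics")],
)

# Retrieve: names of all students enrolled in Physics.
physics = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM students WHERE course = ? ORDER BY id", ("Physics",)
    )
]

# Delete: remove one student, then count what is left.
conn.execute("DELETE FROM students WHERE id = ?", (2,))
remaining = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

print(physics, remaining)
```

The same three statements (INSERT, SELECT, DELETE) apply to any relational DBMS; only
the connection call would differ.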

Database Management System


o A database management system (DBMS) is software used to manage a database. For
example, MySQL and Oracle are popular database management systems used in
many different applications.
o A DBMS provides an interface for operations such as creating a database, creating
tables in it, storing data, updating data, and more.
o It provides protection and security for the database. When there are multiple users, it
also maintains data consistency.

Advantages of DBMS
o Controls data redundancy: A DBMS can control data redundancy because all the data
is stored in one central database instead of being duplicated across separate files.
o Data sharing: In a DBMS, the authorized users of an organization can share the data
among multiple users.
o Easy maintenance: The centralized nature of the database system makes it easy to
maintain.
o Reduced time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems that automatically back up data
to guard against hardware and software failures and restore the data if required.
o Multiple user interfaces: It provides different types of user interfaces, such as
graphical user interfaces and application program interfaces.

Disadvantages of DBMS
o Cost of hardware and software: Running DBMS software requires a high-speed
processor and a large amount of memory.
o Size: It occupies a large amount of disk space and memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a large impact because, in most organizations,
all the data is stored in a single database. If the database is damaged by a power
failure or database corruption, the data may be lost forever.

Some examples of Database Management System

o MySQL
o PostgreSQL
o Microsoft Access
o SQL Server
o Oracle

The differences between a DBMS and a file system are as follows:

Basis: Meaning
DBMS approach: A DBMS is software that manages a collection of data. The user is not
required to write the procedures for managing it.
File system approach: The file system is a collection of data. In this system, the user has
to write the procedures for managing the data.

Basis: Sharing of data
DBMS approach: Due to the centralized approach, data sharing is easy.
File system approach: Data is distributed across many files, possibly in different formats,
so it isn't easy to share data.

Basis: Data abstraction
DBMS approach: A DBMS gives an abstract view of data that hides the details.
File system approach: The file system exposes the details of data representation and
storage.

Basis: Security and protection
DBMS approach: A DBMS provides a good protection mechanism.
File system approach: It isn't easy to protect a file under the file system.

Basis: Recovery mechanism
DBMS approach: A DBMS provides a crash recovery mechanism, i.e., it protects the user
from system failure.
File system approach: The file system doesn't have a crash recovery mechanism, i.e., if the
system crashes while some data is being entered, the content of the file may be lost.

Basis: Manipulation techniques
DBMS approach: A DBMS contains a wide variety of sophisticated techniques to store and
retrieve data.
File system approach: The file system can't store and retrieve data efficiently.

Basis: Concurrency problems
DBMS approach: A DBMS takes care of concurrent access to data using some form of
locking.
File system approach: Concurrent access causes many problems, such as one user
redirecting the file while another is deleting or updating some information.

Basis: Where to use
DBMS approach: The database approach is used in large systems that interrelate many
files.
File system approach: The file system approach is used in small systems with few
interrelated files.

Basis: Cost
DBMS approach: The database system is expensive to design.
File system approach: The file system approach is cheaper to design.

Basis: Data redundancy and inconsistency
DBMS approach: Due to the centralization of the database, the problems of data
redundancy and inconsistency are controlled.
File system approach: The files and application programs are created by different
programmers, so there is a lot of duplicated data, which may lead to inconsistency.

Basis: Structure
DBMS approach: The database structure is complex to design.
File system approach: The file system approach has a simple structure.

Basis: Data independence
DBMS approach: Data independence exists, and it can be of two types:
o Logical data independence
o Physical data independence
File system approach: There is no data independence.

Basis: Integrity constraints
DBMS approach: Integrity constraints are easy to apply.
File system approach: Integrity constraints are difficult to implement.

Basis: Data models
DBMS approach: Three types of data models exist:
o Hierarchical data models
o Network data models
o Relational data models
File system approach: There is no concept of data models.

Basis: Flexibility
DBMS approach: Changes to the content of the stored data are often necessary in any
system, and these changes are made more easily with a database approach.
File system approach: The system is less flexible than the DBMS approach.
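
Two of the rows above, integrity constraints and the recovery mechanism, can be sketched
with Python's standard-library sqlite3 module. This is only an illustration under stated
assumptions: the "accounts" table, its CHECK rule, and the transfer() helper are
hypothetical names, and SQLite stands in for any DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint: the DBMS itself refuses a negative balance.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 'Asha', 100)")
conn.execute("INSERT INTO accounts VALUES (2, 'Ravi', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The debit would violate the CHECK constraint, so the earlier credit is undone too.
ok = transfer(conn, 1, 2, 500)
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(ok, balances)
```

With plain files, the application would have to implement both the validity check and the
undo logic itself.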

3-schema architecture of DBMS:


The DBMS architecture can be described in terms of three schemas.
o The three-schema architecture is used to separate the user applications from the
physical database.

The three-schema architecture consists of three levels: the internal level, the conceptual
level, and the external level.

The main objective of the three-level architecture is to enable multiple users to access the
same data with a personalized view while storing the underlying data only once. Thus it
separates the user's view from the physical structure of the database. This separation is
desirable for the following reason:

o The internal structure of the database should be unaffected by changes to the physical
aspects of storage.

1. Internal Level

o The internal level has an internal schema, which describes the physical storage
structure of the database.
o The internal schema is also known as the physical schema.
o It uses the physical data model and defines how the data will be stored in blocks.
o The physical level is used to describe complex low-level data structures in detail.

2. Conceptual Level

o The conceptual schema describes the design of a database at the conceptual
level. The conceptual level is also known as the logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data is to be stored in the database and
what relationships exist among that data.
o At the conceptual level, internal details such as the implementation of data
structures are hidden.
o Programmers and database administrators work at this level.

3. External Level

o At the external level, a database contains several schemas, sometimes called
subschemas. A subschema describes one particular view of the database.
o An external schema is also known as a view schema.
o Each view schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group.
o The view schema describes the end user's interaction with the database system.
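
The three levels can be sketched loosely in Python with the standard-library sqlite3
module. The mapping is an assumption made for illustration: a table definition plays the
role of the conceptual schema, SQL views play the role of external schemas, and an index
stands in for an internal-level storage choice. The names "employee", "directory_view",
and "payroll_view" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one logical description of all employee data.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 52000), (2, "Ravi", "HR", 48000)],
)

# External level: each user group gets its own view; the rest stays hidden.
conn.execute("CREATE VIEW directory_view AS SELECT name, dept FROM employee")
conn.execute("CREATE VIEW payroll_view AS SELECT id, salary FROM employee")

before = conn.execute("SELECT name, dept FROM directory_view ORDER BY name").fetchall()

# Internal level: a physical storage change (adding an index) ...
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# ... leaves what the views expose unchanged.
after = conn.execute("SELECT name, dept FROM directory_view ORDER BY name").fetchall()

directory_cols = [d[0] for d in conn.execute("SELECT * FROM directory_view").description]
payroll_cols = [d[0] for d in conn.execute("SELECT * FROM payroll_view").description]
print(directory_cols, payroll_cols, before == after)
```

The directory users never see the salary column, and neither group is affected when the
index is added.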

Applications of Database management system in different sectors

o Manufacturing: Product-based industries manufacture different
types of products and deliver them on a daily, weekly, or monthly basis.
In this sector, a DBMS is used to store information about products,
such as product quantities, product bills, and supply chain
management data.
o Banking & Finance: In banking, a DBMS is used to store
customer transaction details, and in the finance sector, it is used to
hold data about sales, stocks, and bonds.
o Education sector: A DBMS is very useful for this sector. Information
about students, their attendance, courses, fees, and results is
stored in the database. Apart from this, a DBMS is also used to store
staff data. In addition, many colleges and universities use a
DBMS for conducting online examinations.
o Credit card transactions: A DBMS is used when making purchases with
a credit card and when creating the monthly statement.
o Social media sites: Nowadays, social media are popular
platforms for sharing our thoughts and views with the world and with
our friends. Social media also allow us to connect with our friends.
Millions of sign-ins and sign-ups happen daily on social media like
Facebook, Twitter, LinkedIn, and so on. All of this happens
with the help of a DBMS.
o Telecommunication: All telecommunication companies use a
DBMS. The database management system is crucial for this sector
to store monthly postpaid bills and customer call details.
o Railways & airlines reservation system: A DBMS is necessary to
store ticket booking data and keep information about train and
airplane arrivals, departures, and delay status.
o Human resource management: Big industries have many
employees. A DBMS is required to store employee information such as
permanent address, salary, tax, and other details.
o Online shopping: Online shopping saves time and has become a
trend. People love to shop online through websites like Amazon and
Flipkart. All the transactions, such as products added, products sold,
generation of invoice bills, and payments, happen with the help of a
DBMS.

Data Warehouse Defined

A data warehouse is a type of data management system that is
designed to enable and support business intelligence (BI)
activities, especially analytics. Data warehouses are solely intended
to perform queries and analysis and often contain large amounts
of historical data. The data within a data warehouse is usually
derived from a wide range of sources such as application log files
and transaction applications.

A data warehouse centralizes and consolidates large amounts of
data from multiple sources. Its analytical capabilities allow
organizations to derive valuable business insights from their data
to improve decision-making. Over time, it builds a historical
record that can be invaluable to data scientists and business
analysts. Because of these capabilities, a data warehouse can be
considered an organization’s “single source of truth.”

Benefits of a Data Warehouse

Data warehouses offer the overarching and unique benefit of
allowing organizations to analyze large amounts of variant data
and extract significant value from it, as well as to keep a historical
record.

Four unique characteristics (described by computer scientist
William Inmon, who is considered the father of the data
warehouse) allow data warehouses to deliver this overarching
benefit. According to this definition, data warehouses are:

o Subject-oriented. They can analyze data about a particular
subject or functional area (such as sales).
o Integrated. Data warehouses create consistency among
different data types from disparate sources.
o Nonvolatile. Once data is in a data warehouse, it’s stable and
doesn’t change.
o Time-variant. Data warehouse analysis looks at change over
time.

A well-designed data warehouse will perform queries very quickly,
deliver high data throughput, and provide enough flexibility for
end users to “slice and dice” or reduce the volume of data for
closer examination to meet a variety of demands, whether at a
high level or at a very fine, detailed level. The data warehouse
serves as the functional foundation for middleware BI
environments that provide end users with reports, dashboards,
and other interfaces.

How can a data warehouse benefit an organization? Or features of Data Warehouse.

1. Subject-oriented

A specific business purpose can be analyzed with the data
collected from here.

Suppose the business wants to understand machine
downtime and how it can be reduced. In that case, data can be
collected from the data warehouse to understand the various
times or situations during which the machines stopped working,
the reasons behind the stoppages, and how they can be reduced.

2. Integrated

Data from different sources is integrated to provide collective
data. For instance, if a company wants to do budgeting for the
next quarter, a data warehouse will have all the information
required. From incurred costs to depreciation costs, the entire set
of data is available in one single source.

3. Time-variant

The company uses the historical data stored in the system to
extract relevant reports and understand the organization’s overall
health. But data such as the employee database, which includes
addresses and phone numbers, must not be included as they are
subject to change.

4. Non-volatile

Once data is entered, it remains the same. Therefore, the firm
must ensure that the information is highly protected and that there
is no chance of alteration.

If any modifications are made, they will affect the reports
and analysis.

5. Improved data quality

A data warehouse helps to improve data quality by providing consistent,
accurate data and fixing insufficient data.

Disadvantages of data warehouse

Cost v/s Benefit

A data warehouse is an IT project, and it consumes more man-hours
and more money from the budget. Moreover, its implementation
and maintenance are very expensive.
Hence the benefit-to-cost ratio can be low. In particular, if the
organization is small or medium-sized, it may affect the revenue of the
organization.

Data Ownership

We know that data warehouses are software applications for
service. The main concern with them is the security of data.

You have to be sure that the people who handle and
analyze the customer data are employees that your company
trusts.

Leaking of a customer’s personal data within the
organization may cause problems for executives and also affect
the relationship between the company and the customer.

Data Rigidity

The data that is imported into the data warehouse is often in the form
of static data sets that have less flexibility. They have less ability to
generate a particular solution.

Ad hoc queries against a warehouse are highly difficult because of
its limited processing and query speed.

Miscalculation of ETL processing time

The entire process of data warehouse development, that is, the
extraction, cleaning, and loading of consolidated data into the
warehouse, takes more time.

But organizations usually underestimate the time required for the
ETL process. As a result, it leads to a backlog of work in the
organization.

Discuss ETL process:

1) Data Extraction: This step has to deal with numerous data
sources. We have to employ the appropriate techniques for each
data source.

2) Data Transformation: As we know, data for a data warehouse
comes from many different sources. If data extraction for a data
warehouse poses big challenges, data transformation presents
even greater challenges. We perform several individual tasks as
part of data transformation.

First, we clean the data extracted from each source. Cleaning may
be the correction of misspellings, the provision of default values
for missing data elements, or the elimination of duplicates when we
bring in the same data from various source systems.

Standardization of data components forms a large part of data
transformation. Data transformation also involves many forms of
combining pieces of data from different sources. We combine
data from a single source record or related data parts from many
source records.

On the other hand, data transformation also involves purging
source data that is not useful and separating out source records
into new combinations. Sorting and merging of data take place on
a large scale in the data staging area. When the data
transformation function ends, we have a collection of integrated
data that is cleaned, standardized, and summarized.

3) Data Loading: Two distinct categories of tasks form the data
loading function. When we complete the structure and
construction of the data warehouse and go live for the first time,
we do the initial loading of the information into the data
warehouse storage. The initial load moves high volumes of data
and uses up a substantial amount of time.
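
The three ETL steps above can be sketched in a few lines of Python. Everything here is
illustrative: the two source lists, the transform() and load() helpers, and the
"dim_customer" table are hypothetical, and an in-memory SQLite database stands in for the
warehouse storage.

```python
import sqlite3

# Extract: records pulled from two hypothetical source systems.
crm_rows = [
    {"name": " asha rao ", "city": "Kolkata", "email": "asha@example.com"},
    {"name": "Ravi Sen", "city": None, "email": "ravi@example.com"},
]
billing_rows = [
    {"name": "ASHA RAO", "city": "kolkata", "email": "ASHA@example.com"},
    {"name": "Meera Das", "city": "Delhi", "email": "meera@example.com"},
]


def transform(rows):
    """Clean, standardize, and de-duplicate customer records."""
    seen = {}
    for row in rows:
        email = row["email"].strip().lower()          # standardize the matching key
        name = " ".join(row["name"].split()).title()  # trim spaces, normalize case
        city = (row["city"] or "Unknown").title()     # default for a missing value
        seen.setdefault(email, (name, city, email))   # keep the first copy only
    return list(seen.values())


def load(conn, records):
    """Initial load of the transformed records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(name TEXT, city TEXT, email TEXT PRIMARY KEY)"
    )
    with conn:
        conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", records)


conn = sqlite3.connect(":memory:")
load(conn, transform(crm_rows + billing_rows))
loaded = conn.execute("SELECT name, city FROM dim_customer ORDER BY name").fetchall()
print(loaded)
```

The duplicate "Asha Rao" record from the second source is eliminated, and the missing city
receives a default value, which are two of the cleaning tasks described above.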

Why do we need a separate Data Warehouse?

Data warehouse queries are complex because they involve the
computation of large groups of data at summarized levels.

They may require the use of distinctive data organization, access, and
implementation methods based on multidimensional views.

Performing OLAP queries in an operational database degrades the
performance of operational tasks.

A data warehouse is used for analysis and decision making, which
requires an extensive database, including historical data, that an
operational database does not typically maintain.

The separation of an operational database from data warehouses
is based on the different structures and uses of data in these
systems.

Because the two systems provide different functionalities and
require different kinds of data, it is necessary to maintain separate
databases.

Difference between Database and Data Warehouse

1. Database: It is used for Online Transactional Processing (OLTP) but can be used
for other purposes such as data warehousing. It records the data from the clients
for history.
Data Warehouse: It is used for Online Analytical Processing (OLAP). It reads the
historical information about the customers for business decisions.

2. Database: The tables and joins are complicated since they are normalized for the
RDBMS. This is done to reduce redundant data and to save storage space.
Data Warehouse: The tables and joins are simple since they are de-normalized. This
is done to minimize the response time for analytical queries.

3. Database: Data is dynamic.
Data Warehouse: Data is largely static.

4. Database: Entity-relationship modeling techniques are used for RDBMS database
design.
Data Warehouse: Data modeling techniques are used for the data warehouse design.

5. Database: Optimized for write operations.
Data Warehouse: Optimized for read operations.

6. Database: Performance is low for analysis queries.
Data Warehouse: High performance for analytical queries.

7. Database: The database is the place where the data is stored and managed to
provide fast and efficient access.
Data Warehouse: The data warehouse is the place where the application data is
handled for analysis and reporting purposes.

What is Data Mart?

A data mart is a subset of an organizational data store,
generally oriented to a specific purpose or primary data subject,
which may be distributed to support business needs. Data marts
are analytical data stores designed to focus on particular
business functions for a specific community within an
organization. Data marts are derived from subsets of data in a
data warehouse, though in the bottom-up data warehouse design
methodology, the data warehouse is created from the union of
organizational data marts.

The fundamental use of a data mart is in Business Intelligence
(BI) applications. BI is used to gather, store, access, and analyze
data. A data mart can be used by smaller businesses to utilize the data
they have accumulated, since it is less expensive than
implementing a data warehouse.

Reasons for creating a data mart

o Creates a collective view of data for a group of users
o Easy access to frequently needed data
o Ease of creation
o Improves end-user response time
o Lower cost than implementing a complete data warehouse
o Potential clients are more clearly defined than in a
comprehensive data warehouse
o It contains only essential business data and is less cluttered.

Types of Data Marts


There are mainly two approaches to designing data marts:

o Dependent Data Marts
o Independent Data Marts

Dependent Data Marts

A dependent data mart is a logical or physical subset of a larger
data warehouse. In this technique, the data marts are treated as
subsets of a data warehouse: a data warehouse is created first,
and the various data marts are then created from it. These data
marts depend on the data warehouse and extract the essential
records from it. Because the data warehouse creates the data marts,
there is no need for data mart integration. It is also
known as a top-down approach.
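
A minimal sketch of the top-down approach, assuming a single hypothetical warehouse table
("warehouse_facts") and a hypothetical sales mart ("sales_mart") derived from it with
CREATE TABLE ... AS SELECT. It uses Python's standard-library sqlite3 module; a real
warehouse would have many tables and a scheduled refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The warehouse holds several subject areas in this toy example.
    CREATE TABLE warehouse_facts (subject TEXT, region TEXT, year INTEGER, amount INTEGER);
    INSERT INTO warehouse_facts VALUES
        ('sales',   'East', 2023, 400),
        ('sales',   'West', 2023, 550),
        ('finance', 'East', 2023, 120),
        ('sales',   'East', 2024, 450);

    -- Dependent data mart: one subject area, extracted from the warehouse.
    CREATE TABLE sales_mart AS
        SELECT region, year, SUM(amount) AS total
        FROM warehouse_facts
        WHERE subject = 'sales'
        GROUP BY region, year;
    """
)

mart_rows = conn.execute(
    "SELECT region, year, total FROM sales_mart ORDER BY region, year"
).fetchall()
print(mart_rows)
```

Because the mart is built from the warehouse, it needs no separate integration step; the
finance rows simply never reach it.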


Independent Data Marts


The second approach is independent data marts (IDM). Here,
independent data marts are created first, and then a data
warehouse is designed using these multiple independent data
marts. Because all the data marts are designed
independently, integration of the data marts is required.
It is also termed a bottom-up approach, as the data marts are
integrated to develop a data warehouse.

Other than these two categories, one more type exists, which is
called "Hybrid Data Marts."

Hybrid Data Marts

A hybrid data mart allows us to combine input from sources other
than a data warehouse. This can be helpful in many situations,
especially when ad hoc integrations are needed, such as after a new
group or product is added to the organization.

Difference between Data Warehouse and Data Mart


Data Warehouse: A data warehouse is a vast repository of information collected from
various organizations or departments within a corporation.
Data Mart: A data mart is only a subtype of a data warehouse. It is architected to meet
the requirements of a specific user group.

Data Warehouse: It may hold multiple subject areas.
Data Mart: It holds only one subject area, for example, Finance or Sales.

Data Warehouse: It holds very detailed information.
Data Mart: It may hold more summarized data.

Data Warehouse: It works to integrate all data sources.
Data Mart: It concentrates on integrating data from a given subject area or set of
source systems.

Data Warehouse: In data warehousing, the fact constellation schema is used.
Data Mart: In a data mart, the star schema and snowflake schema are used.

Data Warehouse: It is a centralized system.
Data Mart: It is a decentralized system.

Types of data warehouse architecture


Single-tier architecture:

In this type of architecture, only the source layer is physically
available. The single tier consists of the source layer, data
warehouse layer, and analysis layer.

Two-tier architecture: This architecture adds a data staging area, or ETL
(extraction, transformation, and loading) layer, to the source layer.

The staging layer helps to merge diversified data into one standard
schema. This type of architecture consists of the source layer,
data staging layer, data warehouse layer, and analysis layer.

Three-tier architecture: This architecture contains a reconciled
layer along with the data staging and source layers.

The source layer contains multiple sources in this architecture, and
the data warehouse layer has data warehouses and data marts.

The role of the reconciled layer is to generate a standard data
model for the entire enterprise. The reconciled layer can also be used
for some operational work such as reporting.

This architecture consists of the source, data staging,
reconciled, data warehouse, and analysis layers.

What is online analytical processing (OLAP)?

Online analytical processing (OLAP) is a category of software
tools that analyze data stored in a database. OLAP tools enable
users to analyze different dimensions of multidimensional data.

Generally, these tools are used for budget forecasting, sales
forecasting, financial reporting, and other planning and forecasting
needs of the organization.

How does it work?

The chief component of online analytical processing is the OLAP
server, which sits between a client and a database management
system (DBMS). It understands how data is organized in
the database and has special functions for analyzing the data.

There are OLAP servers available for nearly all the major database
systems.

It is also important to know about the OLAP cube. The OLAP cube is
a data structure that allows you to analyze data very quickly.

In the OLAP cube, numeric facts (measures) are categorized by
dimensions. OLAP holds multidimensional data, and the OLAP cube
helps to store and analyze this multidimensional data.
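
One simple way to picture the cube is as a mapping from dimension values to a measure. The
Python sketch below is an assumption made for illustration, not how any particular OLAP
server stores data: the three dimensions (city, quarter, product), the sample figures, and
the total_by() helper are all hypothetical.

```python
from collections import defaultdict

# Each cell is addressed by one member per dimension; the value is the measure.
DIMENSIONS = ("city", "quarter", "product")
cube = {
    ("New York", "Q1", "Laptop"): 250,
    ("New York", "Q1", "Phone"): 150,
    ("Washington", "Q1", "Laptop"): 300,
    ("Washington", "Q1", "Phone"): 250,
}


def total_by(cube, dimension):
    """Sum the measure over every dimension except the named one."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)


by_city = total_by(cube, "city")
by_product = total_by(cube, "product")
print(by_city, by_product)
```

Real OLAP servers typically precompute such totals so that they can be read back quickly
instead of being recalculated for every query.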

Advantages of Online Analytical Processing

o It helps to bring all the data together to create accurate and quick
information about the business.
o It helps to analyze time series.
o It provides a platform for all types of business activity, including planning,
budgeting, forecasting, financial reporting, and data warehouse
reporting.
o It allows users to perform consistent calculations.
o It allows users to slice and dice cube data by several
dimensions, measures, and filters.
o It helps end users to analyze data in multiple dimensions so
that they can make better business decisions.

Disadvantages of Online Analytical Processing

o It is challenging to have a large number of dimensions in a single
OLAP cube.
o The snowflake schema required for organizing the data is complex to
implement.
o Modifying an OLAP cube requires a complete update of the
cube, which is time-consuming.

Analytical operations in OLAP

Generally, OLAP has four basic analytical operations.

Roll-up operation: It is also called ‘aggregation.’ We can perform
this operation in two ways.

o Reduction of dimension: The cube is aggregated by removing one or more of
its dimensions.
o Climbing up the concept hierarchy: Members are grouped into a higher
level of the hierarchy.

Example: roll-up on Geography from cities to country

o Here the cities New York and Washington are rolled up into USA.
o The sales figures of the cities were 400 and 550, and they became 950
after being rolled up.
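
The roll-up example above can be reproduced with a short Python sketch. The sales figures
come from the example; the roll_up() helper and the city-to-country mapping are
hypothetical names used only for illustration.

```python
from collections import defaultdict

sales_by_city = {"New York": 400, "Washington": 550}

# Concept hierarchy: each city belongs to one country.
city_to_country = {"New York": "USA", "Washington": "USA"}


def roll_up(measures, hierarchy):
    """Aggregate each member's measure into its parent level."""
    totals = defaultdict(int)
    for member, value in measures.items():
        totals[hierarchy[member]] += value
    return dict(totals)


sales_by_country = roll_up(sales_by_city, city_to_country)
print(sales_by_country)  # {'USA': 950}
```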

Drill-down operation: It is the opposite process of roll-up. It
can be performed in two ways.

o Increasing the number of dimensions
o Climbing down the concept hierarchy

Example: drill-down on Time from quarter to month

o Quarter 1 is divided into the months January, February, and March.
o The Months dimension is added.
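
A drill-down is only possible when the finer-grained data is available. The Python sketch
below assumes hypothetical monthly sales figures and a month-to-quarter mapping; the
helper names quarterly_view() and drill_down() are also made up for illustration.

```python
# Hypothetical monthly figures; the quarterly totals are derived from them.
monthly_sales = {"January": 100, "February": 120, "March": 130, "April": 90}
month_to_quarter = {
    "January": "Q1", "February": "Q1", "March": "Q1", "April": "Q2",
}


def quarterly_view(monthly, hierarchy):
    """Summarized level: one total per quarter."""
    totals = {}
    for month, value in monthly.items():
        quarter = hierarchy[month]
        totals[quarter] = totals.get(quarter, 0) + value
    return totals


def drill_down(monthly, hierarchy, quarter):
    """Detailed level: the months that make up one quarter."""
    return {m: v for m, v in monthly.items() if hierarchy[m] == quarter}


by_quarter = quarterly_view(monthly_sales, month_to_quarter)
q1_detail = drill_down(monthly_sales, month_to_quarter, "Q1")
print(by_quarter, q1_detail)
```

The detail for Quarter 1 adds up to the quarterly total, which is what makes drill-down
the inverse of roll-up.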

Slice and dice operation: In slice operation, one dimension is
selected, and a subcube is created. In dice operation, two or more
dimensions are selected, and subcubes are created.
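
Slice and dice can be sketched on a small cube held in a Python dictionary. The cube
contents, the dimension names, and the helpers slice_cube() and dice_cube() are
hypothetical; they only illustrate the selection logic described above.

```python
DIMENSIONS = ("city", "quarter", "product")
cube = {
    ("New York", "Q1", "Laptop"): 250,
    ("New York", "Q2", "Laptop"): 270,
    ("New York", "Q1", "Phone"): 150,
    ("Washington", "Q1", "Laptop"): 300,
    ("Washington", "Q2", "Phone"): 260,
}


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value; the result drops that dimension."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


def dice_cube(cube, criteria):
    """Keep cells whose members are allowed on two or more dimensions."""
    axes = {DIMENSIONS.index(dim): allowed for dim, allowed in criteria.items()}
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in allowed for axis, allowed in axes.items())
    }


q1_slice = slice_cube(cube, "quarter", "Q1")
diced = dice_cube(cube, {"city": {"New York"}, "product": {"Laptop"}})
print(q1_slice)
print(diced)
```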
Pivot operation: In this operation, you need to rotate the data
axes to provide a substitute presentation of data.
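
As a small illustration of rotating the axes, the Python sketch below swaps the rows and
columns of a two-dimensional view. The table contents and the pivot() helper are
hypothetical.

```python
# Rows are cities, columns are quarters.
table = {
    "New York": {"Q1": 400, "Q2": 420},
    "Washington": {"Q1": 550, "Q2": 510},
}


def pivot(table):
    """Rotate the view so that the column labels become the row labels."""
    rotated = {}
    for row_label, columns in table.items():
        for col_label, value in columns.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


pivoted = pivot(table)  # rows are now quarters, columns are cities
print(pivoted)
```

Pivoting changes only the presentation; applying pivot() twice returns the original table.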

When do you use online analytical processing?

You can use OLAP in the following situations.

o When you are required to perform complex analytical and ad hoc
queries quickly without interrupting or affecting the OLTP system.
o When you need to issue reports using your data to business
users in an easy way.
o When you want to deliver several aggregations to help the user
with consistent and quick results.

Difference between OLAP and OLTP

What is OLTP?

OLTP means online transaction processing. It is an operational
system used for handling recent operational data.

The differences between OLAP and OLTP are as follows.

o OLAP is a system used for data analysis, whereas OLTP is a system used
for data transactions.
o OLAP is characterized by a large volume of data, whereas OLTP is
characterized by a large number of small transactions.
o OLAP is large in size, typically ranging from 1 TB to 100 PB, whereas OLTP is
small in size, ranging from 1 MB to 10 GB.
o OLAP operates with a data warehouse, whereas OLTP operates with a
traditional database management system.
o Processing is slower in OLAP, whereas OLTP has a faster processing
speed.
o OLAP response time is longer, usually seconds to minutes,
whereas OLTP responds quickly, in milliseconds.
o OLTP needs both read and write operations, but OLAP mostly needs only
read operations.
o The objective of OLAP is to make decisions with the help of large
data sources. On the other hand, the objective of OLTP is to support
day-to-day operations.
o Queries are complex in OLAP and simple in OLTP.
o The number of users is low in OLAP. Its database allows only hundreds of
users, whereas the OLTP database allows thousands of users.
o OLAP helps to improve the productivity of business analysts, whereas OLTP
helps to improve the productivity and self-service of users.
o OLAP is created for business analysis, whereas OLTP is created for
real-time business operations.
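
The contrast in query style can be sketched with Python's standard-library sqlite3 module.
The "orders" table and its rows are hypothetical, and a single in-memory database is used
only to keep the example short; in practice the analytical query would usually run against
a separate data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount INTEGER)"
)

# OLTP style: many short transactions, each writing a row or two.
for region, year, amount in [
    ("East", 2023, 100),
    ("East", 2024, 120),
    ("West", 2024, 200),
    ("West", 2024, 50),
]:
    with conn:  # one small transaction per order
        conn.execute(
            "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
            (region, year, amount),
        )

# OLAP style: one read-only query that scans and summarizes many rows.
report = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
print(report)
```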
