Dataware House Lec 2

dataware lecture notes

Uploaded by

Milan Deep
Copyright © All Rights Reserved

Components and building blocks of a data warehouse

30 January 2024 12:08

Data Warehouse vs DBMS


| Database | Data Warehouse |
|---|---|
| A common database is based on operational or transactional processing. Each operation is an indivisible transaction. | A data warehouse is based on analytical processing. |
| Generally, a database stores current, up-to-date data that is used for daily operations. | A data warehouse maintains historical data over time. Historical data is data kept over years and can be used for trend analysis, future predictions and decision support. |
| A database is generally application-specific. Example: a database stores related data, such as the student details in a school. | A data warehouse is generally integrated at the organization level by combining data from different databases. Example: a data warehouse integrates the data from one or more databases so that analysis can be done to get results such as the best-performing school in a city. |
| Constructing a database is not very expensive. | Constructing a data warehouse can be expensive. |
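The integration example in the table (finding the best-performing school in a city) can be sketched with Python's built-in `sqlite3` module. Everything here is hypothetical: each school's application-specific database is an in-memory SQLite database with a `results` table, and the warehouse is a separate database that combines them and runs the analytical query.

```python
import sqlite3

def make_school_db(rows):
    """Create a hypothetical application-specific database for one school."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (student TEXT, score REAL)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    conn.commit()
    return conn

# Hypothetical operational databases, one per school
schools = {
    "North High": make_school_db([("Asha", 82), ("Ben", 74)]),
    "South High": make_school_db([("Chen", 91), ("Dina", 88)]),
}

# The warehouse: one integrated table combining every school's data
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE exam_results (school TEXT, student TEXT, score REAL)")
for school, conn in schools.items():
    rows = conn.execute("SELECT student, score FROM results").fetchall()
    warehouse.executemany(
        "INSERT INTO exam_results VALUES (?, ?, ?)",
        [(school, student, score) for student, score in rows],
    )
warehouse.commit()

# Analytical query across schools: best-performing school by average score
best_school, avg_score = warehouse.execute(
    "SELECT school, AVG(score) AS avg_score FROM exam_results "
    "GROUP BY school ORDER BY avg_score DESC LIMIT 1"
).fetchone()
print(best_school, avg_score)  # South High 89.5
```

Neither school database alone can answer the question; only the integrated copy can.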
Example Applications of Data Warehousing
Data warehousing can be applied wherever there is a large amount of data and we want statistical results that
support decision-making.
• Social Media Websites: Social networking websites such as Facebook, Twitter and LinkedIn are based on
analyzing large data sets. These sites gather data related to members, groups, locations, etc., and store it in a
single central repository. Because the volume of data is so large, a data warehouse is needed to implement this.
• Banking: Most banks today use data warehouses to study the spending patterns of account holders and cardholders.
They use this to provide them with special offers, deals, etc.
• Government: Governments use data warehouses to store and analyze tax payments, which helps to detect tax
fraud.
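The spending-pattern analysis in the banking example can be sketched as a simple aggregation. The transaction records, cardholder IDs and categories are hypothetical, and a bank would run this kind of query against warehouse tables rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical card transactions: (cardholder, merchant category, amount)
transactions = [
    ("C001", "groceries", 120.0),
    ("C001", "travel", 640.0),
    ("C001", "travel", 210.0),
    ("C002", "dining", 45.0),
    ("C002", "groceries", 300.0),
    ("C002", "dining", 80.0),
]

# Total spend per cardholder per category
spend = defaultdict(lambda: defaultdict(float))
for cardholder, category, amount in transactions:
    spend[cardholder][category] += amount

# Each cardholder's top spending category, e.g. to target a special offer
top_category = {
    cardholder: max(totals, key=totals.get)
    for cardholder, totals in spend.items()
}
print(top_category)  # {'C001': 'travel', 'C002': 'groceries'}
```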
Components of a data warehouse

The figure shows the essential elements of a typical data warehouse. The Source Data component appears on the
left. The Data Staging component serves as the next building block. In the middle is the Data Storage component,
which handles the data warehouse's data. This component not only stores and manages the data; it also keeps track
of the data using the metadata repository. The Information Delivery component, on the right, consists of all the
different ways of making the information in the data warehouse available to users.

Source Data Component


Source data coming into the data warehouse may be grouped into four broad categories:
Production Data: This type of data comes from the various operational systems of the enterprise. Based on the
data requirements of the data warehouse, we choose segments of data from the various operational systems.
Internal Data: In each organization, users keep their "private" spreadsheets, reports, customer profiles,
and sometimes even departmental databases. This is the internal data, part of which could be useful in a data
warehouse.
Archived Data: Operational systems are mainly intended to run the current business. In every operational
system, we periodically take the old data and store it in archived files.
External Data: Most executives depend on external sources for a large percentage of the information they use.
For example, they use statistics relating to their industry that are produced by external agencies.

Data Staging Component


After data has been extracted from the various operational systems and external sources, we have to prepare it
for storage in the data warehouse. The extracted data coming from several different sources needs to be
changed, converted, and made ready in a format that is suitable for querying and analysis.
We will now discuss the three primary functions that take place in the staging area.

1) Data Extraction: This function has to deal with numerous data sources. We have to employ the appropriate
technique for each data source.
2) Data Transformation: As we know, data for a data warehouse comes from many different sources. If data
extraction for a data warehouse poses big challenges, data transformation presents even greater challenges.
We perform several individual tasks as part of data transformation.
First, we clean the data extracted from each source. Cleaning may involve correcting misspellings, providing
default values for missing data elements, or eliminating duplicates when the same data is brought in from
several source systems.
Standardization of data elements forms a large part of data transformation. Data transformation also includes
many forms of combining pieces of data from different sources: we combine data from a single source record or
related data elements from many source records.
In addition, data transformation involves purging source data that is not useful and separating source records
into new combinations. Sorting and merging of data take place on a large scale in the data staging area. When the
data transformation function ends, we have a collection of integrated data that is cleaned, standardized, and
summarized.
3) Data Loading: Two distinct categories of tasks form the data loading function. When we complete the design
and construction of the data warehouse and go live for the first time, we do the initial load of the data into the
data warehouse storage. The initial load moves high volumes of data and takes a substantial amount of time. After
that, the warehouse is kept up to date with incremental loads that bring in only the changes made in the source
systems since the previous load.
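The three staging functions can be sketched end to end in Python. This is a minimal illustration under assumptions: the source formats (one CSV export, one JSON feed), the field names `order_id`, `city` and `amount`, the correction table and all function names are hypothetical, and an in-memory SQLite database stands in for warehouse storage. The text names two categories of loading tasks; the sketch assumes the second is the ongoing incremental load of changed records.

```python
import csv
import io
import json
import sqlite3

# --- Extraction: a different technique for each (hypothetical) source format ---
def extract_csv(text):
    """Parse a CSV export, e.g. from an operational system."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_json(text):
    """Parse a JSON feed, e.g. from an external source."""
    return json.loads(text)

# --- Transformation: clean, standardize, de-duplicate, sort ---
SPELLING_FIXES = {"Banglore": "Bangalore"}  # hypothetical correction table

def transform(records):
    rows, seen = [], set()
    for r in records:
        # Default value for a missing city, then one standard capitalization
        city = (r.get("city") or "unknown").strip().title()
        city = SPELLING_FIXES.get(city, city)   # correct known misspellings
        order_id = str(r["order_id"])           # one standard key type across sources
        if order_id in seen:                    # same record arrived from two sources
            continue
        seen.add(order_id)
        rows.append((order_id, city, float(r.get("amount") or 0)))
    rows.sort(key=lambda row: row[0])
    return rows

def summarize(rows):
    """Summarize: total amount per city."""
    totals = {}
    for _, city, amount in rows:
        totals[city] = totals.get(city, 0.0) + amount
    return totals

# --- Loading: one initial full load, then incremental loads of changes only ---
def initial_load(conn, rows):
    conn.execute("CREATE TABLE sales (order_id TEXT PRIMARY KEY, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def incremental_load(conn, changed_rows):
    # INSERT OR REPLACE adds new orders and overwrites revised ones
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", changed_rows)
    conn.commit()

csv_source = "order_id,city,amount\n1,banglore,100\n2,Delhi,50\n"
json_source = (
    '[{"order_id": 2, "city": "delhi", "amount": 50}, '
    '{"order_id": 3, "city": null, "amount": "25"}]'
)

rows = transform(extract_csv(csv_source) + extract_json(json_source))
warehouse = sqlite3.connect(":memory:")
initial_load(warehouse, rows)
incremental_load(warehouse, transform([
    {"order_id": 2, "city": "Delhi", "amount": 55},  # revised order
    {"order_id": 4, "city": "Delhi", "amount": 70},  # new order
]))
print(summarize(rows))  # {'Bangalore': 100.0, 'Delhi': 50.0, 'Unknown': 25.0}
```

Order 2 appears in both sources but is loaded once, and the later incremental load touches only the two changed rows instead of reloading everything.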

Data Storage Component


Data storage for the data warehouse is a separate repository. The data repositories for the operational systems
generally include only the current data. Also, these repositories hold data structured in a highly normalized
form for fast and efficient transaction processing. A data warehouse, by contrast, keeps large volumes of
historical data for analysis.
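The difference between an operational repository that keeps only current data and warehouse storage that keeps history can be sketched with two separate in-memory SQLite databases. The account table, dates and balances are hypothetical, and both stores are updated by one hypothetical helper only to keep the example short.

```python
import sqlite3

operational = sqlite3.connect(":memory:")  # current data only
warehouse = sqlite3.connect(":memory:")    # separate repository that keeps history

operational.execute("CREATE TABLE account_balance (account_id TEXT PRIMARY KEY, balance REAL)")
warehouse.execute("CREATE TABLE balance_history (account_id TEXT, snapshot_date TEXT, balance REAL)")

def record_balance(account_id, snapshot_date, balance):
    """Hypothetical helper: overwrite the current row, append a historical row."""
    operational.execute(
        "INSERT OR REPLACE INTO account_balance VALUES (?, ?)", (account_id, balance)
    )
    warehouse.execute(
        "INSERT INTO balance_history VALUES (?, ?, ?)", (account_id, snapshot_date, balance)
    )
    operational.commit()
    warehouse.commit()

for day, balance in [("2024-01-01", 500.0), ("2024-01-02", 350.0), ("2024-01-03", 420.0)]:
    record_balance("A1", day, balance)

current = operational.execute("SELECT balance FROM account_balance").fetchall()
history = warehouse.execute(
    "SELECT snapshot_date, balance FROM balance_history ORDER BY snapshot_date"
).fetchall()
print(current)  # [(420.0,)]
print(history)
```

The operational table can only answer "what is the balance now?", while the history table also supports trend questions over time.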
Information Delivery Component
The information delivery component enables users to subscribe to data warehouse files and have them
transferred to one or more destinations according to a user-specified schedule.
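A minimal sketch of subscription-based delivery, assuming each subscription names a report, its destinations and a fixed repeat interval. The `Subscription` class, report names and destinations are hypothetical, and actually sending the files is left out; the function only decides which deliveries are due.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Subscription:
    report: str              # hypothetical name of a warehouse file or report
    destinations: list[str]  # e.g. e-mail addresses or file shares (illustrative)
    every: timedelta         # user-specified delivery interval
    last_sent: datetime

def due_deliveries(subs, now):
    """Return (report, destination) pairs whose scheduled time has arrived."""
    due = []
    for sub in subs:
        if now - sub.last_sent >= sub.every:
            due.extend((sub.report, dest) for dest in sub.destinations)
            sub.last_sent = now  # mark this subscription as delivered
    return due

subs = [
    Subscription("weekly_sales", ["sales@example.com", "/share/reports"],
                 timedelta(days=7), datetime(2024, 1, 1)),
    Subscription("daily_stock", ["ops@example.com"],
                 timedelta(days=1), datetime(2024, 1, 5)),
]
first_run = due_deliveries(subs, datetime(2024, 1, 8))
print(first_run)
```

Calling `due_deliveries` again at the same time returns an empty list, because every subscription's `last_sent` was moved forward.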

Metadata Component
Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database management
system. In the data dictionary, we keep data about the logical data structures, data about the records and
addresses, information about the indexes, and so on.
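The DBMS data dictionary that warehouse metadata is compared to can be seen in SQLite, which exposes its catalog through the `sqlite_master` table and `PRAGMA` statements. The `customer` table and its index are hypothetical; the point is that structures, columns and indexes are themselves queryable data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customer_city ON customer(city)")

# Logical data structures: the tables the database knows about
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column metadata; each PRAGMA row is (cid, name, type, notnull, default, pk)
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(customer)")]

# Index metadata; each PRAGMA row is (seq, name, unique, origin, partial)
indexes = [r[1] for r in conn.execute("PRAGMA index_list(customer)")]

print(tables)   # ['customer']
print(columns)  # [('customer_id', 'INTEGER'), ('name', 'TEXT'), ('city', 'TEXT')]
print(indexes)  # ['idx_customer_city']
```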
Data Marts
A data mart contains a subset of corporate-wide data that is of value to a specific group of users. The scope is confined to
particular selected subjects. Data in a data warehouse should be fairly current, but not necessarily up to the minute,
although developments in the data warehouse industry have made regular and incremental data dumps more
achievable. Data marts are smaller than data warehouses and usually serve a single part of the organization. The
current trend in data warehousing is to develop a data warehouse with several smaller related data marts for
particular kinds of queries and reports.
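A data mart as a subject-limited subset of the warehouse can be sketched with a SQLite view. The `sales` table, the electronics team and the rows are hypothetical; in practice a data mart may be a separately stored database, and a view is used here only to keep the sketch short.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, department TEXT, product TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Electronics", "TV", 900.0),
        ("North", "Grocery", "Rice", 40.0),
        ("South", "Electronics", "Phone", 600.0),
        ("South", "Grocery", "Tea", 15.0),
    ],
)

# Hypothetical data mart for the electronics team: only the subject it cares about
warehouse.execute(
    "CREATE VIEW electronics_mart AS "
    "SELECT region, product, amount FROM sales WHERE department = 'Electronics'"
)

by_region = warehouse.execute(
    "SELECT region, SUM(amount) FROM electronics_mart GROUP BY region ORDER BY region"
).fetchall()
print(by_region)  # [('North', 900.0), ('South', 600.0)]
```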
Management and Control Component
The management and control component coordinates the services and functions within the data warehouse. It
controls the data transformation and the data transfer into the data warehouse storage. It also moderates the
information delivery to users. It works with the database management systems and ensures that data is correctly
saved in the repositories. It monitors the movement of data into the staging area and from there into the data
warehouse storage itself.
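A minimal sketch of the coordinating role, assuming the staging steps are passed in as plain functions. `run_pipeline` and the toy steps are hypothetical; the sketch only runs the steps in order and logs how many records move into staging and on into storage, which is one small part of what a real management and control component does.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("warehouse.control")

def run_pipeline(extract, transform, load):
    """Run the staging steps in order, logging the volume moved at each step."""
    try:
        raw = extract()
        log.info("extracted %d records into staging", len(raw))
        clean = transform(raw)
        log.info("transformed %d records", len(clean))
        loaded = load(clean)
        log.info("loaded %d records into warehouse storage", loaded)
        return loaded
    except Exception:
        log.exception("pipeline failed")
        raise

store = []  # stand-in for warehouse storage

def load_into_store(rows):
    store.extend(rows)
    return len(rows)

count = run_pipeline(
    extract=lambda: [" a ", "b", "b"],
    transform=lambda rows: sorted({r.strip() for r in rows}),
    load=load_into_store,
)
```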
