Unit 3 Notes

The document provides an introduction to data warehouses. It discusses how a data warehouse stores integrated information from across an organization to support decision making. It describes the key components of a data warehouse including the central data warehouse, data marts, and legacy systems. It also discusses dimensional modeling techniques used to design relational data warehouses and categorizes different types of data warehouse users.

Uploaded by

Rajkumar Dharmaraj

UNIT 3 NOTES

DATA WAREHOUSE – INTRODUCTION

A Data Warehouse (DW) is a database that stores information organized to support decision-making
requests. A very frequent problem in enterprises is the inability to access complete, integrated
corporate information that can satisfy those requests. A paradox occurs: data exists but
information cannot be obtained. In general, a DW is constructed with the goal of storing and
providing all the relevant information that is generated across the different databases of an
enterprise.

A DW is a database with particular features. Concerning the data it contains, it is the result of
transformation, quality improvement and integration of data that comes from operational databases.
In addition, it includes indicators that are derived from operational data and give it additional
value. Concerning its use, it is expected to support complex queries (summarization, aggregates,
cross-referencing of data), while its maintenance does not involve a transactional load.

In addition, in a DW environment end users make queries directly against the DW through
user-friendly query tools, instead of accessing information through reports generated by
specialists.

Building and maintaining a DW requires solving problems of many different kinds. In this chapter
we concentrate on DW design.

A data warehouse has three main components:

1. A “Central Data Warehouse” or “Operational Data Store (ODS)”, which is a database
organized according to the corporate data model.

2. One or more “data marts”—extracts from the central data warehouse that are organized
according to the particular retrieval requirements of individual users.

3. The “legacy systems” where an enterprise’s data are currently kept.

THE CENTRAL DATA WAREHOUSE

The Central Data Warehouse is just that: a warehouse. All the enterprise’s data are stored
there, “normalized”, in order to minimize redundancy and so that each item may be found easily.
This is accomplished by organizing the data according to the enterprise’s corporate data model.
Think of it as a giant grocery store warehouse where the chocolates are kept in one section, the
T-shirts are in another, and the CDs are in a third.

The literature describes two broad approaches to relational DW design: one that applies
dimensional modeling techniques, and another that is based mainly on the concept of
materialized views.

Dimensional models represent data with a “cube” structure, making the logical data
representation more compatible with OLAP data management. The objectives of dimensional
modeling are:

(i) To produce database structures that are easy for end-users to understand and write queries
against,

(ii) To maximize the efficiency of queries.

Dimensional modeling achieves these objectives by minimizing the number of tables and the
relationships between them.
Normalized databases have some characteristics that are appropriate for OLTP systems, but not
for DWs:

1. Their structure is not easy for end-users to understand and use. In OLTP systems this is not a
problem because end-users usually interact with the database through a layer of software.

2. Data redundancy is minimized. This maximizes the efficiency of updates, but tends to penalize
retrievals. Data redundancy is not a problem in DWs because data is not updated on-line.

The basic concepts of dimensional modeling are facts, dimensions and measures.

• A fact is a collection of related data items, consisting of measures and context data. It
typically represents business items or business transactions.
• A dimension is a collection of data that describes one business dimension. Dimensions
determine the contextual background for the facts; they are the parameters over which we
want to perform OLAP.
• A measure is a numeric attribute of a fact, representing the performance or behavior of
the business relative to the dimensions.
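
To make the three concepts concrete, here is a minimal Python sketch. The sample facts, the dimension names (product, month) and the measure names (quantity, amount) are all made up for illustration; they do not come from any particular warehouse.

```python
from collections import defaultdict

# Hypothetical sales facts: each fact carries context data (the dimension
# values "product" and "month") and numeric measures ("quantity", "amount").
facts = [
    {"product": "chocolate", "month": "Jan", "quantity": 10, "amount": 25.0},
    {"product": "chocolate", "month": "Feb", "quantity": 4, "amount": 10.0},
    {"product": "t-shirt", "month": "Jan", "quantity": 2, "amount": 30.0},
]


def total_by(dimension, measure):
    """Sum one measure over the values of one dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[dimension]] += fact[measure]
    return dict(totals)


# The same measure analysed along two different dimensions.
amount_by_product = total_by("product", "amount")
amount_by_month = total_by("month", "amount")
print(amount_by_product)
print(amount_by_month)
```

The dimension decides how the facts are grouped; the measure is what gets aggregated inside each group.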

GOALS OF DATA WAREHOUSE ARCHITECTURE

A data warehouse exists to serve its users—analysts and decision makers. A data warehouse
must be designed to satisfy the following requirements:

1. Deliver a great user experience—user acceptance is the measure of success.

2. Function without interfering with OLTP systems.

3. Provide a central repository of consistent data.

4. Answer complex queries quickly.

5. Provide a variety of powerful analytical tools such as OLAP and data mining.

Most successful data warehouses that meet these requirements have these common
characteristics:

1. Based on a dimensional model

2. Contain historical data

3. Include both detailed and summarized data

4. Consolidate disparate data from multiple sources while retaining consistency

5. Focus on a single subject such as sales, inventory, or finance

DATA WAREHOUSE USERS

The success of a data warehouse is measured solely by its acceptance by users. Without users,
historical data might as well be archived to magnetic tape and stored in the basement. Successful
data warehouse design starts with understanding the users and their needs.

Data warehouse users can be divided into four categories:

• Statisticians
• Knowledge workers
• Information consumers
• Executives

Each type makes up a portion of the user population as illustrated in this diagram.

Statisticians

• There are typically only a handful of statisticians and operations research types in any
organization.
• Their work can contribute to closed loop systems that deeply influence the operations and
profitability of the company.

Knowledge Workers

• A relatively small number of analysts perform the bulk of new queries and analyses
against the data warehouse.
• These are the users who get the Designer or Analyst versions of user access tools. They
will figure out how to quantify a subject area. After a few iterations, their queries and
reports typically get published for the benefit of the Information Consumers.
• Knowledge Workers are often deeply engaged with the data warehouse design and place
the greatest demands on the ongoing data warehouse operations team for training and
support.

Information Consumers

• Most users of the data warehouse are Information Consumers; they will probably never
compose a true ad hoc query.
• They use static or simple interactive reports that others have developed. They usually
interact with the data warehouse only through the work product of others.
• This group includes a large number of people, and published reports are highly visible.
Set up a great communication infrastructure for distributing information widely, and
gather feedback from these users to improve the information sites over time.

Executives

• Executives are a special case of the Information Consumers group.

Process Managers

Process managers are responsible for maintaining the flow of data both into and out of the data
warehouse. There are three different types of process managers:

1. Load manager

2. Warehouse manager

3. Query manager

1. Load Manager

• The load manager performs the operations required to extract and load the data into the
database.
• The size and complexity of a load manager varies between specific solutions from one
data warehouse to another.
• The load manager performs the following functions:
  • Extract data from the source system.
  • Fast load the extracted data into a temporary data store.
  • Perform simple transformations into a structure similar to the one in the data warehouse.

EXTRACT DATA FROM SOURCE

• The data is extracted from the operational databases or from external information
providers.
• Gateways are the application programs that are used to extract data. A gateway is supported
by the underlying DBMS and allows the client program to generate SQL to be executed at a
server.
• Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are examples
of gateways.

FAST LOAD

• In order to minimize the total load window, the data needs to be loaded into the
warehouse in the fastest possible time.
• Transformations affect the speed of data processing.
• It is more effective to load the data into a relational database prior to applying
transformations and checks.
• Gateway technology is not suitable for this step, since gateways are inefficient when large
data volumes are involved.

SIMPLE TRANSFORMATIONS

While loading, it may be necessary to perform simple transformations. After completing the simple
transformations, we can do complex checks.

Suppose we are loading EPOS (electronic point of sale) sales transactions; we need to perform the
following steps:

• Strip out all the columns that are not required within the warehouse.
• Convert all the values to the required data types.
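
The fast load and the simple transformations above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module as a stand-in for the warehouse database; the table names (staging_sales, sales), the column names and the sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical raw EPOS extract: every value arrives as text, and the
# last column (till operator) is not required within the warehouse.
raw_rows = [
    ("1001", "2024-01-05", "3", "7.50", "alice"),
    ("1002", "2024-01-05", "1", "2.25", "bob"),
]

conn = sqlite3.connect(":memory:")

# Fast load: bulk-insert the extract untouched into a temporary staging table.
conn.execute(
    "CREATE TABLE staging_sales "
    "(prod_id TEXT, sale_date TEXT, quantity TEXT, amount TEXT, till_operator TEXT)"
)
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?, ?, ?)", raw_rows)

# Simple transformations, applied after the load: drop the unused column
# and convert the remaining values to the required data types.
conn.execute(
    "CREATE TABLE sales "
    "(prod_id INTEGER, sale_date TEXT, quantity INTEGER, amount REAL)"
)
conn.execute(
    """
    INSERT INTO sales (prod_id, sale_date, quantity, amount)
    SELECT CAST(prod_id AS INTEGER), sale_date,
           CAST(quantity AS INTEGER), CAST(amount AS REAL)
    FROM staging_sales
    """
)

loaded = conn.execute(
    "SELECT prod_id, sale_date, quantity, amount FROM sales ORDER BY prod_id"
).fetchall()
print(loaded)
```

Loading first and transforming inside the database keeps the load window short, which is the point made under FAST LOAD.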

2. Warehouse Manager

The warehouse manager is responsible for the warehouse management process. It consists of
third-party system software, C programs, and shell scripts.

The size and complexity of a warehouse manager varies between specific solutions.

WAREHOUSE MANAGER ARCHITECTURE

A warehouse manager includes the following:

• The controlling process
• Stored procedures or C with SQL
• Backup/Recovery tool
• SQL scripts

FUNCTIONS OF WAREHOUSE MANAGER

A warehouse manager performs the following functions:

• Analyzes the data to perform consistency and referential integrity checks.
• Creates indexes, business views, and partition views against the base data.
• Generates new aggregations and updates the existing aggregations.
• Generates normalizations.
• Transforms and merges the source data of the temporary store into the published data
warehouse.
• Backs up the data in the data warehouse.
• Archives the data that has reached the end of its captured life.

Note: A warehouse manager analyzes query profiles to determine whether the indexes and
aggregations are appropriate.
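
Two of these functions, the referential integrity check and the generation of an aggregation, can be illustrated with a small sketch. It uses Python's sqlite3 module rather than the third-party software, C programs and shell scripts mentioned above, and the tables (product_dim, sales, sales_by_month) and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE sales (prod_id INTEGER, month TEXT, amount REAL);
    INSERT INTO product_dim VALUES (1, 'Chocolate'), (2, 'T-shirt');
    INSERT INTO sales VALUES
        (1, '2024-01', 10.0), (1, '2024-01', 5.0),
        (2, '2024-02', 30.0), (9, '2024-02', 1.0);
    """
)

# Referential integrity check: fact rows whose product key has no
# matching row in the dimension table.
orphans = conn.execute(
    """
    SELECT s.prod_id
    FROM sales s LEFT JOIN product_dim p ON p.prod_id = s.prod_id
    WHERE p.prod_id IS NULL
    """
).fetchall()

# Generate an aggregation (summary table) and an index on it.
conn.execute(
    """
    CREATE TABLE sales_by_month AS
    SELECT prod_id, month, SUM(amount) AS amount
    FROM sales
    GROUP BY prod_id, month
    """
)
conn.execute("CREATE INDEX sales_by_month_idx ON sales_by_month (month)")

summary = conn.execute(
    "SELECT prod_id, month, amount FROM sales_by_month ORDER BY prod_id, month"
).fetchall()
print(orphans)
print(summary)
```

In this sample the row with prod_id 9 is reported as an orphan; a real warehouse manager would decide whether to reject, repair or quarantine such rows before publishing the data.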

3. Query Manager

The query manager is responsible for directing the queries to suitable tables. By directing the
queries to appropriate tables, it speeds up the query request and response process.

In addition, the query manager is responsible for scheduling the execution of the queries posted
by the user.

QUERY MANAGER ARCHITECTURE

A query manager includes the following components:

• Query redirection via C tool or RDBMS
• Stored procedures
• Query management tool
• Query scheduling via C tool or RDBMS
• Query scheduling via third-party software

FUNCTIONS OF QUERY MANAGER

• It presents the data to the user in a form they understand.
• It schedules the execution of the queries posted by the end-user.
• It stores query profiles to allow the warehouse manager to determine which indexes and
aggregations are appropriate.
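
Query redirection and query profiling can be pictured with the sketch below. It is not how any particular product works: the table catalog, the row counts and the route function are invented for illustration, and the routing rule (pick the smallest table that holds every requested column) is just one plausible policy.

```python
from collections import Counter

# Hypothetical catalog: table name -> (available columns, approximate row count).
TABLES = {
    "sales": ({"prod_id", "cust_id", "time_id", "amount"}, 500_000_000),
    "sales_by_month": ({"prod_id", "month", "amount"}, 2_000_000),
    "sales_by_year": ({"prod_id", "year", "amount"}, 200_000),
}

# Query profile: how often each table was chosen. The warehouse manager
# could use such counts to judge which aggregations are worth keeping.
query_profile = Counter()


def route(columns):
    """Return the smallest table that contains every requested column."""
    needed = set(columns)
    candidates = [
        (rows, name) for name, (cols, rows) in TABLES.items() if needed <= cols
    ]
    if not candidates:
        raise LookupError(f"no table provides columns {sorted(needed)}")
    _, name = min(candidates)
    query_profile[name] += 1
    return name


chosen = route(["prod_id", "amount"])
print(chosen, dict(query_profile))
```

A request that needs customer-level detail, such as route(["cust_id", "amount"]), falls through to the large detail table because no aggregation holds that column.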

DATA WAREHOUSING OBJECTS

The following types of objects are commonly used in dimensional data warehouse schemas:

FACT TABLES

• Fact tables are the large tables in your warehouse schema that store business
measurements.
• Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables
represent data, usually numeric and additive, that can be analyzed and examined.
Examples include sales, cost, and profit.

DIMENSION TABLES

• Dimension tables, also known as lookup or reference tables, contain the relatively static
data in the warehouse.
• Dimension tables store the information you normally use to constrain queries.
• Dimension tables are usually textual and descriptive and you can use them as the row
headers of the result set.
• Examples are customers, location, time, suppliers or products.

Fact Tables

• A fact table typically has two types of columns: those that contain numeric facts (often
called measurements), and those that are foreign keys to dimension tables.
• A fact table contains either detail-level facts or facts that have been aggregated. Fact
tables that contain aggregated facts are often called summary tables.
• A fact table usually contains facts with the same level of aggregation.
• Though most facts are additive, they can also be semi-additive or non-additive.
• Additive facts can be aggregated by simple arithmetical addition. A common example of
this is sales.
• Non-additive facts cannot be added at all. An example of this is averages.
• Semi-additive facts can be aggregated along some of the dimensions and not along others.
An example of this is inventory levels, where you cannot tell what a level means simply
by looking at it: levels can be added up across products or stores, but adding them across
time periods gives no meaningful total.
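
The difference between the three kinds of facts shows up in a few lines of Python. The rows below are made-up daily snapshots (units sold and closing stock level per store); only the arithmetic matters.

```python
# Hypothetical daily snapshots for one product. Store B was closed on day 2.
rows = [
    {"day": 1, "store": "A", "units_sold": 5, "stock_level": 20},
    {"day": 1, "store": "B", "units_sold": 3, "stock_level": 10},
    {"day": 2, "store": "A", "units_sold": 4, "stock_level": 16},
]


def average(values):
    values = list(values)
    return sum(values) / len(values)


# Additive: units sold can be summed across every dimension.
total_units = sum(r["units_sold"] for r in rows)

# Semi-additive: stock levels can be summed across stores for one day,
# but a sum across days counts the same goods more than once.
stock_day1 = sum(r["stock_level"] for r in rows if r["day"] == 1)
stock_all_days = sum(r["stock_level"] for r in rows)  # not a meaningful figure

# Non-additive: averages cannot be combined by adding or averaging them.
# They have to be recomputed from the underlying additive values.
avg_a = average(r["units_sold"] for r in rows if r["store"] == "A")
avg_b = average(r["units_sold"] for r in rows if r["store"] == "B")
mean_of_store_averages = average([avg_a, avg_b])
overall_average = total_units / len(rows)

print(total_units, stock_day1, mean_of_store_averages, overall_average)
```

The average of the per-store averages differs from the true overall average here, which is why summary tables usually store sums and counts instead of averages.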

Creating a new fact table

• You must define a fact table for each star schema.
• From a modeling standpoint, the primary key of the fact table is usually a composite key
that is made up of all of its foreign keys.
• Fact tables contain business event details for summarization. Fact tables are often very
large, containing hundreds of millions of rows and consuming hundreds of gigabytes or
multiple terabytes of storage.
• Because dimension tables contain records that describe facts, the fact table can be
reduced to columns for dimension foreign keys and numeric fact values. Text, BLOBs,
and denormalized data are typically not stored in the fact table.

The definition of this ‘sales’ fact table follows:

CREATE TABLE sales (
  prod_id       NUMBER(7)    CONSTRAINT sales_product_nn  NOT NULL,
  cust_id       NUMBER       CONSTRAINT sales_customer_nn NOT NULL,
  time_id       DATE         CONSTRAINT sales_time_nn     NOT NULL,
  ad_id         NUMBER(7),
  quantity_sold NUMBER(4)    CONSTRAINT sales_quantity_nn NOT NULL,
  amount        NUMBER(10,2) CONSTRAINT sales_amount_nn   NOT NULL,
  cost          NUMBER(10,2) CONSTRAINT sales_cost_nn     NOT NULL
);

Multiple Fact Tables:

• Multiple fact tables are used in data warehouses that address multiple business functions,
such as sales, inventory, and finance.
• Each business function should have its own fact table and will probably have some
unique dimension tables.
• Any dimensions that are common across the business functions must represent the
dimension information in the same way, as discussed in “Dimension Tables.”
• Each business function will typically have its own schema that contains a fact table,
several conforming dimension tables, and some dimension tables unique to the specific
business function.
• Such business-specific schemas may be part of the central data warehouse or
implemented as data marts. Very large fact tables may be physically partitioned for
implementation and maintenance design considerations.
• The partition divisions are almost always along a single dimension, and the time
dimension is the most common one to use because of the historical nature of most data
warehouse data.
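
Partitioning along the time dimension can be pictured with the toy sketch below. Real databases implement partitioning inside the storage engine; here a Python dictionary stands in for the partitions, and the naming scheme (sales_YYYY_MM) and the sample rows are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (sale date, product key, amount).
fact_rows = [
    (date(2024, 1, 5), 101, 20.0),
    (date(2024, 1, 28), 102, 5.0),
    (date(2024, 2, 2), 101, 7.5),
]


def partition_name(day):
    """One partition per calendar month, e.g. sales_2024_01."""
    return f"sales_{day.year}_{day.month:02d}"


# Each row is placed in exactly one partition according to its date.
partitions = defaultdict(list)
for row in fact_rows:
    partitions[partition_name(row[0])].append(row)


def rows_for_month(year, month):
    """A query restricted to one month only has to read one partition."""
    return partitions.get(f"sales_{year}_{month:02d}", [])


print(sorted(partitions))
print(rows_for_month(2024, 1))
```

Old months can then be archived or dropped one partition at a time without touching current data, which is one reason time is the usual partitioning dimension.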

Dimension Tables

• A dimension is a structure, often composed of one or more hierarchies, that categorizes
data.
• Dimensional attributes help to describe the dimensional value. They are normally
descriptive, textual values. Several distinct dimensions, combined with facts, enable you
to answer business questions.
• Commonly used dimensions are customers, products, and time. Dimension data is
typically collected at the lowest level of detail and then aggregated into higher-level
totals that are more useful for analysis.
• These natural rollups or aggregations within a dimension table are called hierarchies.
• A dimension table may be used in multiple places if the data warehouse contains multiple
fact tables or contributes data to data marts.
• A dimension such as customer, time, or product that is used in multiple schemas is called
a conforming dimension if all copies of the dimension are the same. Summarization data
and reports will not correspond if different schemas use different versions of a dimension
table.

The definition of this ‘customers’ dimension table follows:

CREATE TABLE customers (
  cust_id             NUMBER,
  cust_first_name     VARCHAR2(20) CONSTRAINT customer_fname_nn      NOT NULL,
  cust_last_name      VARCHAR2(40) CONSTRAINT customer_lname_nn      NOT NULL,
  cust_sex            CHAR(1),
  cust_year_of_birth  NUMBER(4),
  cust_marital_status VARCHAR2(20),
  cust_street_address VARCHAR2(40) CONSTRAINT customer_st_addr_nn    NOT NULL,
  cust_postal_code    VARCHAR2(10) CONSTRAINT customer_pcode_nn      NOT NULL,
  cust_city           VARCHAR2(30) CONSTRAINT customer_city_nn       NOT NULL,
  cust_state_district VARCHAR2(40),
  country_id          CHAR(2)      CONSTRAINT customer_country_id_nn NOT NULL,
  cust_phone_number   VARCHAR2(25),
  cust_income_level   VARCHAR2(30),
  cust_credit_limit   NUMBER,
  cust_email          VARCHAR2(30)
);

The hierarchy of a ‘products’ dimension can be declared with Oracle’s CREATE DIMENSION
statement:

CREATE DIMENSION products_dim
  LEVEL product     IS (products.prod_id)
  LEVEL subcategory IS (products.prod_subcategory)
  LEVEL category    IS (products.prod_category)
  HIERARCHY prod_rollup (
    product     CHILD OF
    subcategory CHILD OF
    category
  )
  ATTRIBUTE product     DETERMINES products.prod_name
  ATTRIBUTE product     DETERMINES products.prod_desc
  ATTRIBUTE subcategory DETERMINES products.prod_subcat_desc
  ATTRIBUTE category    DETERMINES products.prod_cat_desc;

• The records in a dimension table establish one-to-many relationships with the fact table.
• For example, there may be a number of sales to a single customer, or a number of sales of
a single product.
• The dimension table contains attributes associated with the dimension entry; these
attributes are rich and user-oriented textual details, such as product name or customer
name and address.
• Attributes serve as report labels and query constraints. Attributes that are coded in an
OLTP database should be decoded into descriptions.
• For example, product category may exist as a simple integer in the OLTP database, but
the dimension table should contain the actual text for the category.
• The code may also be carried in the dimension table if needed for maintenance. This
denormalization simplifies and improves the efficiency of queries and simplifies user
query tools.
• However, if a dimension attribute changes frequently, maintenance may be easier if the
attribute is assigned to its own table to create a snowflake dimension.

Hierarchies:

• The data in a dimension is usually hierarchical in nature. Hierarchies are determined by
the business need to group and summarize data into usable information. For example, a
time dimension often contains the hierarchy elements: (all time), Year, Quarter, Month,
Day or Week.
• A dimension may contain multiple hierarchies – a time dimension often contains both
calendar and fiscal year hierarchies.
• Geography is seldom a dimension of its own; it is usually a hierarchy that imposes a
structure on sales points, customers, or other geographically distributed dimensions.
• An example geography hierarchy for sales points is: (all), country, region, state or
district, city, store.
• Level relationships specify top-to-bottom ordering of levels from most general (the root)
to most specific information.
• They define the parent-child relationship between the levels in a hierarchy. Hierarchies
are also essential components in enabling more complex query rewrites.
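
Rolling data up a time hierarchy can be sketched as follows. The sales figures are invented, and the three levels (month, quarter, year) are only the calendar part of the hierarchy described above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-level sales: (day, amount).
sales = [
    (date(2024, 1, 15), 100.0),
    (date(2024, 2, 3), 50.0),
    (date(2024, 4, 20), 70.0),
    (date(2025, 1, 2), 30.0),
]

# Each level maps a day to the key of its parent at that level.
LEVELS = {
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "year": lambda d: (d.year,),
}


def roll_up(level):
    """Aggregate the day-level amounts to the given hierarchy level."""
    totals = defaultdict(float)
    for day, amount in sales:
        totals[LEVELS[level](day)] += amount
    return dict(totals)


by_quarter = roll_up("quarter")
by_year = roll_up("year")
print(by_quarter)
print(by_year)
```

Because every day belongs to exactly one month, quarter and year, totals at a higher level can always be derived from the level below it.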

Multi-use dimensions

• Sometimes data warehouse design can be simplified by combining a number of small,
unrelated dimensions into a single physical dimension, often called a junk dimension.
• This can greatly reduce the size of the fact table by reducing the number of foreign keys
in fact table records. Often the combined dimension will be prepopulated with the
Cartesian product of all dimension values.
• If the number of discrete values creates a very large table of all possible value
combinations, the table can be populated with value combinations as they are
encountered during the load or update process.
• A common example of a multi-use dimension is a dimension that contains customer
demographics selected for reporting standardization.
• Another multi-use dimension might contain useful textual comments that occur
infrequently in the source data records; collecting these comments in a single dimension
removes a sparse text field from the fact table and replaces it with a compact foreign key.
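
Prepopulating a junk dimension with the Cartesian product of its values can be sketched in Python. The three small dimensions used here (payment type, gift wrap flag, sales channel) are hypothetical examples.

```python
from itertools import product

# Three small, unrelated dimensions (illustrative values only).
payment_types = ["cash", "card"]
gift_wrap = [False, True]
channels = ["store", "web", "phone"]

# The junk dimension: one surrogate key per combination of values.
junk_dimension = {
    key: combo
    for key, combo in enumerate(product(payment_types, gift_wrap, channels), start=1)
}

# Reverse lookup used while loading facts: combination -> surrogate key.
lookup = {combo: key for key, combo in junk_dimension.items()}

# A fact row stores this single key instead of three separate foreign keys.
key_for_order = lookup[("card", True, "web")]
print(len(junk_dimension), key_for_order)
```

With 2 x 2 x 3 values the table has only 12 rows; when the product would be very large, the alternative mentioned above is to add combinations to the lookup only as they are encountered during loading.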

DATA WAREHOUSING SCHEMAS

• A schema is a collection of database objects, including tables, views, indexes, and
synonyms.
• You can arrange schema objects in the schema models designed for data warehousing in
a variety of ways.
• Most data warehouses use a dimensional model. The model of your source data and the
requirements of your users help you design the data warehouse schema.
• You can sometimes get the source model from your company’s enterprise data model and
reverse-engineer the logical data model for the data warehouse from this.
• The physical implementation of the logical data warehouse model may require some
changes to adapt it to your system parameters: size of machine, number of users, storage
capacity, type of network, and software.

Dimensional Model Schemas

• The principal characteristic of a dimensional model is a set of detailed business facts
surrounded by multiple dimensions that describe those facts.
• When realized in a database, the schema for a dimensional model contains a central fact
table and multiple dimension tables.
• A dimensional model may produce a star schema or a snowflake schema.

Star Schemas

• A schema is called a star schema if all dimension tables can be joined directly to the fact
table.
• The following diagram shows a classic star schema. In the star schema design, a single
object (the fact table) sits in the middle and is radially connected to other surrounding
objects (dimension lookup tables) like a star.
• A star schema can be simple or complex. A simple star consists of one fact table; a
complex star can have more than one fact table.

Steps in Designing Star Schema

• Identify a business process for analysis (like sales).
• Identify measures or facts (sales dollar).
• Identify dimensions for facts (product dimension, location dimension, time dimension,
organization dimension).
• List the columns that describe each dimension (region name, branch name, subregion
name).
• Determine the lowest level of summary in a fact table (sales dollar).

[Figure: Star schema with time dimension]
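
The design steps can be tried out on a very small star schema. The sketch below uses Python's sqlite3 module; the table names (product_dim, time_dim, sales_fact), their columns and the sample rows are assumptions chosen to mirror the sales example, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive columns, one primary key each.
    CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);

    -- Fact table: foreign keys to every dimension plus the measure.
    CREATE TABLE sales_fact (
        prod_id INTEGER REFERENCES product_dim (prod_id),
        time_id INTEGER REFERENCES time_dim (time_id),
        sales_dollar REAL
    );

    INSERT INTO product_dim VALUES (1, 'Dark chocolate', 'Food'), (2, 'T-shirt', 'Clothing');
    INSERT INTO time_dim VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
    INSERT INTO sales_fact VALUES (1, 1, 20.0), (1, 2, 15.0), (2, 1, 30.0);
    """
)

# Every dimension joins directly to the fact table: that is the "star".
sales_by_category = conn.execute(
    """
    SELECT p.category, t.year, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p ON p.prod_id = f.prod_id
    JOIN time_dim t ON t.time_id = f.time_id
    GROUP BY p.category, t.year
    ORDER BY p.category
    """
).fetchall()
print(sales_by_category)
```

Note that the category is stored as text directly in product_dim, so the query needs only one join per dimension.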

Snowflake Schemas

• A schema is called a snowflake schema if one or more dimension tables do not join
directly to the fact table but must join through other dimension tables.
• For example, a dimension that describes products may be separated into three tables
(snowflaked).
• The snowflake schema is an extension of the star schema where each point of the star
explodes into more points.
• The main advantage of the snowflake schema is reduced disk storage, because redundancy
in the dimension data is removed and the lookup tables are smaller. This can help some
queries, although the additional joins often make queries slower than against a star schema.
• The main disadvantage of the snowflake schema is the additional maintenance effort
needed due to the increased number of lookup tables.
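
For comparison, here is the product dimension snowflaked into three tables, again as a sqlite3 sketch with made-up table names and rows. The same question (sales per category) now has to join through the chain product, subcategory, category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The product dimension split into three normalized tables.
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE subcategory (
        subcategory_id INTEGER PRIMARY KEY,
        subcategory_name TEXT,
        category_id INTEGER REFERENCES category (category_id)
    );
    CREATE TABLE product (
        prod_id INTEGER PRIMARY KEY,
        prod_name TEXT,
        subcategory_id INTEGER REFERENCES subcategory (subcategory_id)
    );
    CREATE TABLE sales_fact (prod_id INTEGER REFERENCES product (prod_id), amount REAL);

    INSERT INTO category VALUES (1, 'Food'), (2, 'Clothing');
    INSERT INTO subcategory VALUES (10, 'Confectionery', 1), (20, 'Tops', 2);
    INSERT INTO product VALUES
        (100, 'Dark chocolate', 10), (101, 'Milk chocolate', 10), (200, 'T-shirt', 20);
    INSERT INTO sales_fact VALUES (100, 5.0), (101, 4.0), (200, 12.0);
    """
)

# Only "product" joins to the fact table; the category is two joins further away.
sales_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.amount)
    FROM sales_fact f
    JOIN product p ON p.prod_id = f.prod_id
    JOIN subcategory s ON s.subcategory_id = p.subcategory_id
    JOIN category c ON c.category_id = s.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(sales_by_category)
```

Each category name is stored once instead of once per product, which saves space, but the query needs three joins where a star schema would need one.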

Important Aspects of Star Schema & Snowflake Schema

• In a star schema every dimension will have a primary key.
• In a star schema, a dimension table will not have any parent table.
• Whereas in a snowflake schema, a dimension table will have one or more parent tables.
• Hierarchies for the dimensions are stored in the dimensional table itself in a star schema.
• Whereas hierarchies are broken into separate tables in a snowflake schema. These
hierarchies help to drill down the data from topmost hierarchies to the lowermost
hierarchies.
