Module 3 - Datawarehousing

Uploaded by

ambika venkatesh

Module-3

Data Warehousing
Contents

 Data Warehousing Definitions and Concepts

 Data Warehousing Process Overview

 Data Warehousing Architectures

 Data Integration and the Extraction, Transformation, and Load (ETL) Processes
What Is a Data Warehouse?

 A data warehouse (DW) is a pool of data produced to support decision
making.
 It is a repository of current and historical data of potential interest to
managers throughout the organization.
 Data are usually structured to be available in a form ready for analytical
processing activities (i.e., online analytical processing [OLAP], data
mining, querying, reporting, and other decision support applications).
 The data warehouse is a collection of integrated, subject-oriented
databases designed to support DSS functions, where each unit of data is
nonvolatile and relevant to some moment in time.
A Historical Perspective to Data Warehousing

1970s
• Mainframe computers
• Simple data entry
• Routine reporting
• Primitive database structures
• Teradata incorporated

1980s
• Mini/personal computers (PCs)
• Business applications for PCs
• Distributed DBMS
• Relational DBMS
• Teradata ships commercial DBs
• Business Data Warehouse coined

1990s
• Centralized data storage
• Data warehousing was born
• Inmon, Building the Data Warehouse
• Kimball, The Data Warehouse Toolkit
• EDW architecture design

2000s
• Exponentially growing Web data
• Consolidation of DW/BI industry
• Data warehouse appliances emerged
• Business intelligence popularized
• Data mining and predictive modeling
• Open source software
• SaaS, PaaS, Cloud Computing

2010s
• Big Data analytics
• Social media analytics
• Text and Web analytics
• Hadoop, MapReduce, NoSQL
• In-memory, in-database
 The motivations that led to developing data warehousing technologies go
back to the 1970s, when the computing world was dominated by the
mainframes.
 Real business data-processing applications, the ones run on the corporate
mainframes, had complicated file structures using early-generation
databases in which they stored data.
 Although these applications did a decent job of performing routine
transactional data-processing functions, the data created as a result of
these functions was locked away in the depths of the files and databases.
 When aggregated information such as sales trends by region and by
product type was needed, one had to formally request it from the
data-processing department, where it was put on a waiting list with a
couple of hundred other report requests.
 Later in this decade, commercial hardware and software companies began to emerge
with solutions to this problem. Founders worked to design a database management
system for parallel processing with multiple microprocessors, targeted specifically for
decision support.
 The 1980s were the decade of personal computers and minicomputers.
 Real computer applications were no longer only on mainframes; they were all over the
place, everywhere you looked in an organization. That led to a portentous problem
called islands of data.
 The solution was the distributed database management system, which would pull the
requested data from databases across the organization, bring all the data back to the
same place, and then consolidate it, sort it, and do whatever else was necessary to
answer the user's question.
 Although the concept was a good one and early results from research were promising,
the results were plain and simple: such systems just didn't work efficiently in the
real world.
• In the 1990s a new approach to solving the islands-of-data problem
surfaced. The 1990s philosophy involved going back to the 1970s
method, in which data from those places was copied to another
location, only doing it right this time; hence, data warehousing was born.
• In 1993, Bill Inmon wrote the seminal book Building the Data
Warehouse. Many people recognize Bill as the father of data
warehousing.
• In the 2000s, in the world of data warehousing, both popularity and the
amount of data continued to grow.
• In the 2010s the big buzz has been Big Data. The technologies that
came with Big Data include Hadoop, MapReduce, NoSQL, Hive, and so
forth.
Characteristics of Data Warehousing

1. Subject oriented
2. Integrated
3. Time-variant (time series)
4. Nonvolatile
5. Web based
6. Relational/multidimensional
7. Client/server
8. Real time
9. Include metadata
Characteristics of Data Warehousing

1. Subject oriented
 Data are organized by detailed subject, such as sales, products, or
customers, containing only information relevant for decision support.
 Subject orientation enables users to determine not only how their
business is performing, but why.
 Subject orientation provides a more comprehensive view of the
organization.
2. Integrated
 A data warehouse is developed by integrating data from varied sources
into a consistent format.
 The data must be stored in the warehouse in a consistent and universally
acceptable manner in terms of naming, format, and coding.
3. Time variant
 A warehouse maintains historical data. The data do not necessarily
provide current status (except in real-time systems).
 They detect trends, deviations, and long-term relationships for
forecasting and comparisons, leading to decision making.
 The data stored in a data warehouse is documented with an element of
time, either explicitly or implicitly.
 Data for analysis from multiple sources contains multiple time points
(e.g., daily, weekly, monthly views).
4. Nonvolatile
 Data once entered into a data warehouse must remain unchanged.
 All data is read-only. Previous data is not erased when current data is
entered.
 This helps you to analyze what has happened and when.
5. Web based
 Data warehouses are typically designed to provide an efficient
computing environment for Web-based applications.
6. Relational/multidimensional
 A data warehouse uses either a relational structure or a
multidimensional structure.
 Relational models are flat, i.e., tables are two-dimensional;
multidimensional models can have more than two dimensions.
7. Client/server
 A data warehouse uses the client/server architecture to provide easy
access for end users.
8. Real time
 Newer data warehouses provide real-time, or active, data-access and
analysis capabilities
9. Include metadata
 A data warehouse contains metadata (data about data) about how
the data are organized and how to effectively use them.
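The time-variant and nonvolatile characteristics above can be pictured with a small append-only table. This is a minimal sketch, assuming a hypothetical `price_history` table and Python's built-in `sqlite3` module; real warehouses usually enforce read-only behavior through load processes and permissions rather than application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history ("
    " product TEXT NOT NULL,"
    " price REAL NOT NULL,"
    " valid_from TEXT NOT NULL)"  # explicit time element on every row
)

def record_price(product, price, valid_from):
    """Append a new row; earlier rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?)",
        (product, price, valid_from),
    )

record_price("widget", 9.99, "2023-01-01")
record_price("widget", 10.49, "2023-07-01")  # new price; the old row is kept

# The full history stays available for trend analysis.
history = conn.execute(
    "SELECT valid_from, price FROM price_history"
    " WHERE product = ? ORDER BY valid_from",
    ("widget",),
).fetchall()
```

Because nothing is overwritten, a query can reconstruct what the price was at any recorded point in time.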
Data Marts

 A data mart is a subset of a data warehouse, typically consisting of a single
subject area (e.g., marketing, operations).
 A data mart can be either dependent or independent.
 Dependent data mart
• a subset that is created directly from the data warehouse.
• has the advantages of using a consistent data model and providing quality
data.
• ensures that the end user is viewing the same version of the data that is
accessed by all other data warehouse users.
 Independent data mart
• The high cost of data warehouses limits their use to large companies; an
independent data mart is a lower-cost, scaled-down version of a data warehouse.
• a small warehouse designed for a strategic business unit (SBU) or a department.
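A dependent data mart can be sketched as a table derived directly from the warehouse. The example below is illustrative only: the table names `dw_facts` and `mart_marketing` and the sample rows are assumptions, and `sqlite3` stands in for whatever DBMS the warehouse actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_facts (subject TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_facts VALUES (?, ?, ?)",
    [
        ("marketing", "east", 120.0),
        ("marketing", "west", 80.0),
        ("operations", "east", 300.0),
    ],
)

# Dependent data mart: built from the warehouse table, so it inherits the
# warehouse's data model and already-cleansed values.
conn.execute(
    "CREATE TABLE mart_marketing AS"
    " SELECT region, amount FROM dw_facts WHERE subject = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM mart_marketing ORDER BY region"
).fetchall()
```

An independent data mart, by contrast, would be loaded from source systems without passing through `dw_facts`.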
Operational Data Stores

 An operational data store (ODS) is a type of database often used as an interim
area for a data warehouse.
 Unlike the static contents of a data warehouse, the contents of an ODS are updated
throughout the course of business operations.
 An ODS is used for short-term decisions involving mission-critical applications rather
than for the medium- and long-term decisions associated with an EDW.
 An ODS is similar to short-term memory in that it stores only very recent
information. In comparison, a data warehouse is like long-term memory because it
stores permanent information.
 An ODS consolidates data from multiple source systems and provides a
near-real-time, integrated view of volatile, current data.
 Oper marts are created when operational data needs to be analyzed
multidimensionally. The data for an oper mart come from an ODS.
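The short-term versus long-term memory contrast can be made concrete with two tables. This is a simplified sketch: the table names are hypothetical, and writing to both stores from one function is a shortcut for illustration (in practice the warehouse would typically be loaded from the ODS by a separate process).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ods_order_status (order_id TEXT PRIMARY KEY, status TEXT)"
)
conn.execute(
    "CREATE TABLE dw_order_status (order_id TEXT, status TEXT, recorded_at TEXT)"
)

def record_status(order_id, status, recorded_at):
    # ODS: overwrite in place, so only the current state is kept.
    conn.execute(
        "INSERT OR REPLACE INTO ods_order_status VALUES (?, ?)",
        (order_id, status),
    )
    # Warehouse: append, so the full history stays available.
    conn.execute(
        "INSERT INTO dw_order_status VALUES (?, ?, ?)",
        (order_id, status, recorded_at),
    )

record_status("O-1", "placed", "2023-05-01T09:00")
record_status("O-1", "shipped", "2023-05-02T14:30")

ods_rows = conn.execute("SELECT order_id, status FROM ods_order_status").fetchall()
dw_count = conn.execute("SELECT COUNT(*) FROM dw_order_status").fetchone()[0]
```

After two status changes the ODS holds one row (the latest status), while the warehouse table holds both.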

Enterprise Data Warehouses (EDW)

 An enterprise data warehouse (EDW) is a large-scale data warehouse that is
used across the enterprise for decision support.
 The large-scale nature provides integration of data from many sources
into a standard format for effective BI and decision support applications.
 EDWs are used to provide data for many types of DSS, including CRM,
supply chain management (SCM), business performance management
(BPM), business activity monitoring (BAM), product life-cycle
management (PLM), revenue management, and sometimes even
knowledge management systems (KMS).
Metadata

 Metadata are data about data.

 Metadata describe the structure of and some meaning about data,
thereby contributing to their effective or ineffective use.
 In a data warehouse, metadata describe the contents of a data
warehouse and the manner of its acquisition and use
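As a rough illustration of "data about data", the sketch below records the structure, meaning, and acquisition details of one warehouse table. The class names, the `sales_fact` table, and its fields are all hypothetical; commercial metadata repositories are far richer.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMeta:
    name: str
    dtype: str
    meaning: str  # business meaning of the column

@dataclass
class TableMeta:
    table: str
    source_system: str  # where the data were acquired from
    load_schedule: str  # how often the data are loaded
    columns: list = field(default_factory=list)

    def describe(self):
        """Return one readable line per column."""
        return [
            f"{self.table}.{c.name} ({c.dtype}): {c.meaning}"
            for c in self.columns
        ]

sales_meta = TableMeta(
    table="sales_fact",
    source_system="POS",
    load_schedule="nightly",
    columns=[
        ColumnMeta("sale_date", "DATE", "calendar day of the sale"),
        ColumnMeta("amount", "DECIMAL", "sale value in local currency"),
    ],
)
lines = sales_meta.describe()
```

A catalog like this lets both IT staff and end users look up what a column means before using it in a report.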
Data Warehousing Process Overview

 Many organizations need to create data warehouses: massive data
stores of time series data for decision support.
 Data are imported from various external and internal resources
and are cleansed and organized in a manner consistent with the
organization's needs.
 After the data are populated in the data warehouse, data marts
can be loaded for a specific area or department.
 Alternatively, data marts can be created first, as needed, and then
integrated into an EDW.
Fig: A Data Warehouse Framework and Views.
[Figure: data sources (ERP, legacy, POS, other OLTP/Web, external data) feed the ETL
process (select, extract, transform, integrate, load), which populates the enterprise
data warehouse and its metadata. Data are replicated to data marts (marketing,
engineering, finance, ...) or, in the no-data-marts option, accessed directly.
API/middleware provides access to applications: routine business reporting, data/text
mining, OLAP, dashboards, Web, and custom-built applications.]

 The following are the major components of the data warehousing process:

1. Data sources
- Data are sourced from multiple independent operational "legacy" systems and
possibly from external data providers (such as the U.S. Census).
- Data may also come from an OLTP or ERP system. Web data in the form of Web
logs may also feed a data warehouse.
2. Data extraction and transformation
- Data are extracted and properly transformed using custom-written or
commercial software called ETL.
3. Data loading
- Data are loaded into a staging area, where they are transformed and cleansed.
- The data are then ready to load into the data warehouse and/or data marts.
4. Comprehensive database
- This is the EDW, which supports all decision analysis by providing relevant
summarized and detailed information originating from many different sources.
5. Metadata
- Metadata are maintained so that they can be assessed by IT personnel
and users.
- Metadata include software programs about data and rules for organizing
data summaries that are easy to index and search, especially with Web
tools.
6. Middleware tools
- enable access to the data warehouse
- Power users such as analysts may write their own SQL queries
- Others may employ a managed query environment, such as Business
Objects, to access data.
- There are many front-end applications that business users can use to
interact with data stored in the data repositories, including data mining,
OLAP, reporting tools, and data visualization tools.
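The extract, transform, and load components listed above can be sketched in a few lines. This is a toy example under stated assumptions: the two source extracts, their field names, the `GENDER_CODES` mapping, and the `dim_customer` table are all invented for illustration, and `sqlite3` stands in for the warehouse DBMS.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent
# field names, date formats, and gender coding.
erp_rows = [{"cust": "A1", "sex": "M", "signup": "2023-01-15"}]
legacy_rows = [{"customer_id": "B7", "gender": "female", "joined": "15/02/2023"}]

GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def transform_erp(row):
    return (row["cust"], GENDER_CODES[row["sex"].lower()], row["signup"])

def transform_legacy(row):
    day, month, year = row["joined"].split("/")
    return (
        row["customer_id"],
        GENDER_CODES[row["gender"].lower()],
        f"{year}-{month}-{day}",  # normalize dates to YYYY-MM-DD
    )

def run_etl(conn):
    # Staging: rows from every source are cleansed into one consistent format.
    staged = [transform_erp(r) for r in erp_rows]
    staged += [transform_legacy(r) for r in legacy_rows]
    # Load: write the staged rows into the warehouse table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer"
        " (customer_id TEXT PRIMARY KEY, gender TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", staged)
    conn.commit()
    return len(staged)

conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)

# A power user could then query the warehouse directly with SQL.
customers = conn.execute(
    "SELECT customer_id, gender, signup_date FROM dim_customer"
    " ORDER BY customer_id"
).fetchall()
```

The transformation step is also where the "integrated" characteristic is enforced: both sources end up with the same naming, format, and coding.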
DATA WAREHOUSING ARCHITECTURES
 client/server or n-tier architectures
• two-tier architectures
• three-tier architectures
 Multi-tiered architectures are known to be capable of serving the needs of large-scale,
performance-demanding information systems such as data warehouses.

• Three parts:

1. The data warehouse itself, which contains the data and associated software

2. Data acquisition (back-end) software, which extracts data from legacy systems and
external sources, consolidates and summarizes them, and loads them into the
data warehouse

3. Client (front-end) software, which allows users to access and analyze data from
the warehouse
3-tier architecture

Tier 1: Client workstation
Tier 2: Application server
Tier 3: Database server

 In a three-tier architecture, operational systems contain the data and the
software for data acquisition in one tier (i.e., the server), the data
warehouse is another tier, and the third tier includes the DSS/BI/BA engine
(i.e., the application server) and the client.
 Data from the warehouse are processed twice and deposited in an additional
multidimensional database, organized for easy multidimensional analysis and
presentation, or replicated in data marts.
 Advantage: separation of the functions of the data warehouse, which eliminates
resource constraints and makes it possible to easily create data marts.
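The step in which warehouse data are deposited in a multidimensional database can be pictured as turning flat rows into cells addressed by several dimensions. The sketch below is a plain-Python illustration with made-up sales rows; the `build_cube` and `roll_up` helpers are hypothetical and not part of any OLAP product.

```python
from collections import defaultdict

# Flat, relational-style rows: (region, product, quarter, sales)
rows = [
    ("east", "laptop", "Q1", 100),
    ("east", "laptop", "Q2", 150),
    ("east", "phone", "Q1", 70),
    ("west", "laptop", "Q1", 90),
]

def build_cube(rows):
    """Aggregate sales into cells addressed by three dimensions."""
    cube = defaultdict(int)
    for region, product, quarter, sales in rows:
        cube[(region, product, quarter)] += sales
    return cube

def roll_up(cube, keep):
    """Sum the cube over every dimension whose index is not in `keep`."""
    result = defaultdict(int)
    for key, value in cube.items():
        result[tuple(key[i] for i in keep)] += value
    return dict(result)

cube = build_cube(rows)
by_region = roll_up(cube, keep=(0,))             # one dimension
by_region_product = roll_up(cube, keep=(0, 1))   # two dimensions
```

Each roll-up answers a different analytical question from the same cube, which is the kind of work an application-server tier would take off the database server.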
2-tier architecture

Tier 1: Client workstation
Tier 2: Application & database server

 In a two-tier architecture, the DSS engine physically runs on the same
hardware platform as the data warehouse. Therefore, it is more
economical than the three-tier structure.
 The two-tier architecture can have performance problems for large data
warehouses that work with data-intensive applications for decision
support.
Web-based data warehousing
 Data warehousing and the Internet are two key technologies that offer
important solutions for managing corporate data.
 The integration of these two technologies produces Web-based data
warehousing.
 The architecture is three tiered and includes the PC client, Web server, and
application server.
1. On the client side, the user needs an Internet connection and a Web
browser (preferably Java enabled) through the familiar graphical user
interface (GUI).
2. The Internet/intranet/extranet is the communication medium between
client and servers.
3. On the server side, a Web server is used to manage the inflow and
outflow of information between client and server. It is backed by both a
data warehouse and an application server.
 Web-based data warehousing offers several compelling advantages,
including ease of access, platform independence, and lower cost.
 Page-loading speed is an important consideration in designing Web-based
 Several issues must be considered when deciding which architecture to
use. Among them are the following:

1. Which database management system (DBMS) should be used?
2. Will parallel processing and/or partitioning be used?
3. Will data migration tools be used to load the data warehouse?
4. What tools will be used to support data retrieval and analysis?

1. Which database management system (DBMS) should be used?
 Most data warehouses are built using relational database management
systems (RDBMS). Oracle, SQL Server, and DB2 are the ones most
commonly used.
 Each of these products supports both client/server and Web-based
architectures.
2. Will parallel processing and/or partitioning be used?
 Parallel processing enables multiple CPUs to process data warehouse
query requests simultaneously and provides scalability.
 Data warehouse designers need to decide whether the database tables
will be partitioned (i.e., split into smaller tables) for access efficiency and
what the criteria will be.
 This is an important consideration that is necessitated by the large
amounts of data contained in a typical data warehouse.
3. Will data migration tools be used to load the data warehouse?
 Moving data from an existing system into a data warehouse is a tedious
and laborious task.
 Depending on the diversity and the location of the data assets, migration
may be a relatively simple procedure or (in contrast) a months-long
project.
 The results of a thorough assessment of the existing data assets should
be used to determine whether to use migration tools and, if so, what
capabilities to seek in those commercial tools.
4. What tools will be used to support data retrieval and analysis?
 Often it is necessary to use specialized tools to periodically locate,
access, analyze, extract, transform, and load necessary data into a
data warehouse.
 A decision has to be made on
(1) developing the migration tools in-house
(2) purchasing them from a third-party provider, or
(3) using the ones provided with the data warehouse system.
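The partitioning and parallel-processing question (item 2 above) can be illustrated with a small split-and-merge sketch. The sample rows and the choice of year as the partitioning criterion are assumptions. Python threads are used only to show the pattern; a warehouse DBMS would distribute the work across CPUs or nodes itself.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

sales = [
    ("2022-03-01", 100.0),
    ("2022-11-20", 50.0),
    ("2023-01-05", 75.0),
    ("2023-06-30", 25.0),
]

def partition_by_year(rows):
    """Split one large table into smaller per-year partitions."""
    parts = defaultdict(list)
    for sale_date, amount in rows:
        parts[sale_date[:4]].append((sale_date, amount))
    return dict(parts)

def partition_total(rows):
    return sum(amount for _, amount in rows)

partitions = partition_by_year(sales)

# Each partition is scanned by its own worker; partial results are merged.
with ThreadPoolExecutor() as pool:
    totals = dict(
        zip(partitions, pool.map(partition_total, partitions.values()))
    )

# A query restricted to 2023 only needs to touch one partition.
total_2023 = totals["2023"]
grand_total = sum(totals.values())
```

The partitioning criterion matters: splitting by year helps queries that filter on date, but would not help a query that filters only on region.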
Alternative Data Warehousing Architectures

 Five architecture alternatives to the basic architectural design types:

1. Independent data marts
2. Data mart bus architecture
3. Hub-and-spoke architecture
4. Centralized data warehouse
5. Federated data warehouse
1. Independent data marts.

 simplest and the least costly architecture alternative

 The data marts are developed to operate independently of one
another to serve the needs of individual organizational units.
 Because of their independence, they may have inconsistent data
definitions and different dimensions and measures, making it difficult to
analyze data across the data marts.
2. Data mart bus architecture

 This architecture is a viable alternative to the independent data marts,
where the individual marts are linked to each other via some kind of
middleware.
 Because the data are linked among the individual marts, there is a better
chance of maintaining data consistency across the enterprise
 Even though it allows for complex data queries across data marts, the
performance of these types of analysis may not be at a satisfactory level.
3. Hub-and-spoke architecture.

 perhaps the most famous data warehousing architecture today

 Here the attention is focused on building a scalable and maintainable
infrastructure that includes a centralized data warehouse and several
dependent data marts (each for an organizational unit)
 This architecture allows for easy customization of user interfaces and reports.
 On the negative side, this architecture lacks the holistic enterprise view, and
may lead to data redundancy and data latency.
4. Centralized data warehouse

 Similar to the hub-and-spoke architecture except that there are no dependent
data marts; instead, there is a gigantic enterprise data warehouse that serves the
needs of all organizational units.
 It provides users with access to all data in the data warehouse instead of limiting
them to data marts.
 It reduces the amount of data the technical team has to transfer or change,
therefore simplifying data management and administration.
 If designed and implemented properly, this architecture provides a timely and
holistic view of the enterprise to whomever, whenever, and wherever they may be
within the organization.
5. Federated data warehouse.

 The federated approach is a concession to the natural forces that
undermine the best plans for developing a perfect system.
 It uses all possible means to integrate analytical resources from multiple
sources to meet changing needs or business conditions.
 Essentially, the federated approach involves integrating disparate systems.
 In a federated architecture, existing decision support structures are left in
place, and data are accessed from those sources as needed.
 The federated approach is supported by middleware vendors that propose
distributed query and join capabilities.
 These eXtensible Markup Language (XML)-based tools offer users a global
view of distributed data sources, including data warehouses, data marts,
Web sites, documents, and operational systems.
 When users choose query objects from this view and press the submit
button, the tool automatically queries the distributed sources, joins the
results, and presents them to the user.
 Because of performance and data quality issues, most experts agree that
federated approaches work well to supplement data warehouses, not
replace them
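The "query the distributed sources, join the results" behavior described above can be sketched with two independent stores that are left in place. This is only an illustration: the two in-memory `sqlite3` databases, their tables, and the `federated_sales_by_customer` helper are hypothetical, and real federation middleware handles query planning, security, and source heterogeneity.

```python
import sqlite3

# Two existing, independent stores that are left in place.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE sales (customer_id TEXT, amount REAL)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("C1", 40.0), ("C1", 60.0), ("C2", 25.0)],
)

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id TEXT, name TEXT)")
crm_db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("C1", "Acme"), ("C2", "Globex")],
)

def federated_sales_by_customer():
    """Query each source at request time and join the results in memory."""
    totals = dict(
        sales_db.execute(
            "SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id"
        ).fetchall()
    )
    names = dict(
        crm_db.execute("SELECT customer_id, name FROM customers").fetchall()
    )
    return sorted((names[cid], total) for cid, total in totals.items())

report = federated_sales_by_customer()
```

Every request re-reads the sources, which hints at why performance and data quality are the usual concerns with a purely federated design.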
 Each architecture has advantages and disadvantages!
 Which architecture is the best?
 Ten factors that potentially affect the architecture selection decision:

1. Information interdependence between organizational units
2. Upper management's information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
Teradata Corp. DW Architecture
Data Integration and the Extraction, Transformation, and Load Process

Data Integration
 Data integration comprises three major processes that, when correctly
implemented, permit data from many sources to be accessed and made accessible
to an array of ETL and analysis tools and the data warehousing environment:
- data access (i.e., the ability to access and extract data from any data
source)
- data federation (i.e., the integration of business views across multiple
data stores)
- change capture (based on the identification, capture, and delivery of
the changes made to enterprise data sources)
 Some vendors, such as SAS Institute, Inc., have developed strong data
integration tools.
 The SAS enterprise data integration server includes customer data
integration tools that improve data quality in the integration process.
 The Oracle Business Intelligence Suite assists in integrating data as well.
 A major purpose of a data warehouse is to integrate data from multiple
systems.
 Various integration technologies enable data and metadata integration:
• Enterprise application integration (EAI)
• Service-oriented architecture (SOA)
• Enterprise information integration (EII)
• Extraction, transformation, and load (ETL)
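Change capture, one of the three data integration processes named above, can be approximated by comparing two snapshots of a source table. This snapshot-diff approach is just one simple technique chosen for illustration (production tools often read database logs instead), and the sample records are invented.

```python
def capture_changes(previous, current):
    """Diff two snapshots keyed by primary key."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}

yesterday = {
    1: {"name": "Ann", "city": "Pune"},
    2: {"name": "Raj", "city": "Delhi"},
}
today = {
    1: {"name": "Ann", "city": "Mumbai"},
    3: {"name": "Lee", "city": "Goa"},
}

changes = capture_changes(yesterday, today)
```

Only the captured changes need to be delivered to the warehouse, rather than reloading the whole source table.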
Enterprise application integration (EAI)
 provides a vehicle for pushing data from source systems into the data
warehouse.
 It involves integrating application functionality and is focused on sharing
functionality (rather than data) across systems, thereby enabling flexibility and
reuse.
 Traditionally, EAI solutions have focused on enabling application reuse at the
application programming interface (API) level.
 Recently, EAI is accomplished by using SOA coarse-grained services (a collection
of business processes or functions) that are well defined and documented.
• Using Web services is a specialized way of implementing an SOA.
• EAI can be used to facilitate data acquisition directly into a near-real-time data
warehouse or to deliver decisions to the OLTP systems.
• There are many different approaches to and tools for EAI implementation.
Enterprise information integration (EII)
 an evolving tool space that promises real-time data integration from a variety of
sources, such as relational databases, Web services, and multidimensional
databases.
 It is a mechanism for pulling data from source systems to satisfy a request for
information. EII tools use predefined metadata to populate views that make
integrated data appear relational to end users.
 XML may be the most important aspect of EII because XML allows data to be
tagged either at creation time or later.
 These tags can be extended and modified to accommodate almost any area of
knowledge.
 Physical data integration has conventionally been the main
mechanism for creating an integrated view with data warehouses and data marts.
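The point about XML tagging (data tagged at creation time or later, then presented relationally) can be shown with Python's standard `xml.etree.ElementTree` module. The `order` record and its tag names are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Tag a record at creation time.
order = ET.Element("order", attrib={"id": "1001"})
ET.SubElement(order, "customer").text = "Acme"
ET.SubElement(order, "amount", attrib={"currency": "USD"}).text = "250.00"

# Extend the record later with a tag the original producer did not define.
ET.SubElement(order, "region").text = "east"

xml_text = ET.tostring(order, encoding="unicode")

# A consumer can read the tagged fields back into a flat, relational-style row.
parsed = ET.fromstring(xml_text)
row = {child.tag: child.text for child in parsed}
row["id"] = parsed.get("id")
```

Because the tags travel with the data, a new field such as `region` can be added without changing how existing consumers read `customer` or `amount`.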
