Unit 1 - DWM
Uploaded by vikasbhowate

St. Vincent Pallotti College of Engineering & Technology
Data Warehousing and Mining (BEIT701T)
7th Sem B.E. (IT)
Presented By
Samir Siddiqui
CR FINAL YEAR
Department of Information Technology
Decision Support System
DSS (Decision Support Systems), also known as EIS (Executive Information Systems), support an organization's leading decision makers in making complex and important decisions.
Why DSS?
• Management is decision making.
• The manager is a decision maker.
• Organizations are filled with decision makers at different levels.
• However, decision making today is becoming more complicated:
– Technology / Information / Computers: increasing → more alternatives to choose from
– Structural complexity / Competition: increasing → larger cost of error
– International markets / Consumerism: increasing → more uncertainty about the future
– Changes, fluctuations: increasing → need for quick decisions
Management problems
Most management problems for which decisions are sought can be represented by three standard elements: objectives, decision variables, and constraints.
• Objectives
– Maximize profit
– Provide earliest entry into market
– Minimize employee discomfort/turnover
• Decision variables
– Determine what price to use
– Determine the length of time tests should be run on a new product/service
– Determine the responsibilities to assign to each worker
• Constraints
– Can't charge below cost
– Test enough to meet minimum safety regulations
– Ensure each responsibility is shared by at most two workers
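To make the three elements concrete, here is a minimal Python sketch of a pricing decision. It assumes a hypothetical unit cost of 40 and a made-up linear demand curve (neither comes from the slides): the decision variable is the price, the objective is profit, and the constraint is "can't charge below cost".

```python
# Minimal sketch: objective, decision variable and constraint for a pricing decision.
# UNIT_COST and the demand curve are hypothetical numbers chosen for illustration.

UNIT_COST = 40.0


def demand(price: float) -> float:
    """Assumed linear demand: units sold fall as price rises (never below zero)."""
    return max(0.0, 1000 - 10 * price)


def profit(price: float) -> float:
    """Objective: profit = margin per unit x units sold."""
    return (price - UNIT_COST) * demand(price)


def best_price(candidates):
    """Pick the feasible decision (price >= cost) that maximizes profit."""
    feasible = [p for p in candidates if p >= UNIT_COST]  # constraint
    return max(feasible, key=profit)


if __name__ == "__main__":
    price = best_price(range(0, 101))  # decision variable: whole-unit prices 0..100
    print(price, profit(price))
```

Under these assumed numbers the best feasible price is 70, giving a profit of 9000.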
Characteristics and Capabilities of DSS

The key DSS characteristics and capabilities are as follows:


1. Support for decision makers in semistructured and unstructured problems.
2. Support managers at all levels.
3. Support individuals and groups.
4. Support for interdependent or sequential decisions.
5. Support intelligence, design, choice, and implementation.
6. Support a variety of decision processes and styles.
7. DSS should be adaptable and flexible.
8. DSS should be interactive and easy to use.
9. Emphasis on effectiveness rather than efficiency.
10. Complete control by decision makers.
11. Ease of development by end users.
12. Support for modeling and analysis.
13. Data access.
14. Standalone, integrated, and Web-based deployment.
Decision making characteristics
• Decisions are made based on the information available.
• At each stage of the assessment, iterative refinement may be needed to take account of improvements in data that take place as the project proceeds.
• A project will not go ahead unless there is adequate funding.
Types of Decisions
The most common types of decisions that an organization usually makes are as follows:
Programmed decisions: Repetitive practices and routine jobs, e.g., handling a customer complaint.
Non-programmed decisions: Non-routine, unplanned jobs for which no guidelines or rules are set in advance, e.g., a decision on whether the firm should go for a merger/acquisition.
Strategic decisions: Long-term decisions, e.g., a decision on whether the company should launch a new product.
Tactical decisions: Medium-term decisions that implement strategic decisions, e.g., market analysis for a new product.
Operational decisions: Short-term decisions that guide the performance of regular operations, e.g., the decision to hire a particular logistics company to make deliveries.
History of DSS
Goal: Use the best parts of IS, OR/MS, AI and cognitive science to support more effective decision making.
What is Operations Research?
Operations: The activities carried out in an organization.
Research: The process of observation and testing characterized by the scientific method: situation, problem statement, model construction, validation, experimentation, candidate solutions.
Model: An abstract representation of reality, e.g., mathematical, physical, narrative, or a set of rules in a computer program.
Systems Approach: Includes the broad implications of decisions for the organization at each stage of the analysis. Both quantitative and qualitative factors are considered.
Optimal Solution: A solution to the model that optimizes (maximizes or minimizes) some measure of merit over all feasible solutions.
Team: A group of individuals bringing various skills and viewpoints to a problem.
Operations Research Techniques: A collection of general mathematical models, analytical procedures, and algorithms.
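The definition of an optimal solution can be illustrated by checking every feasible solution of a very small model and keeping the best one. This sketch uses an assumed two-product model (the coefficients and capacities are illustrative, not from the slides); practical OR work would use a linear-programming solver rather than brute-force enumeration.

```python
# Sketch: find the optimal solution of a small integer model by checking every
# feasible solution. Objective coefficients and constraints are illustrative only.
from itertools import product


def objective(x: int, y: int) -> int:
    """Measure of merit to maximize (e.g. profit from x units of A and y units of B)."""
    return 3 * x + 5 * y


def is_feasible(x: int, y: int) -> bool:
    """Constraints of the model (e.g. limited capacity in three plants)."""
    return x <= 4 and 2 * y <= 12 and 3 * x + 2 * y <= 18


def optimal_solution(max_x: int = 10, max_y: int = 10):
    """Return (x, y, value) maximizing the objective over all feasible integer points."""
    feasible = [(x, y) for x, y in product(range(max_x + 1), range(max_y + 1))
                if is_feasible(x, y)]
    x, y = max(feasible, key=lambda p: objective(*p))
    return x, y, objective(x, y)


print(optimal_solution())  # (2, 6, 36)
```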
Artificial Intelligence
• Behavior by a machine that, if performed by
a human being, would be considered
intelligent
• “…study of how to make computers do
things at which, at the moment, people are
better” (Rich and Knight [1991])
• Theory of how the human mind works
(Mark Fox)

Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson, 6th ed., Copyright 2001, Prentice Hall, Upper Saddle River, NJ
AI Objectives
• Make machines smarter (primary goal)
• Understand what intelligence is (Nobel
Laureate purpose)
• Make machines more useful
(entrepreneurial purpose)

(Winston and Prendergast [1984])

What Is Cognitive Science?
• The (interdisciplinary) study of mind and intelligence (e.g., neural networks and robots).
• The study of the cognitive processes involved in the acquisition, representation and use of human knowledge.
• The scientific study of the mind, the brain, and intelligent behaviour, whether in humans, animals, machines or the abstract.
• A discipline in the process of construction.


Information Systems to Support Decisions
Aspect | Management Information Systems | Decision Support Systems
Decision support provided | Provide information about the performance of the organization | Provide information and techniques to analyze specific problems
Information form and frequency | Periodic, exception, demand, and push reports and responses | Interactive inquiries and responses
Information format | Prespecified, fixed format | Ad hoc, flexible, and adaptable format
Information processing methodology | Information produced by extraction and manipulation of business data | Information produced by analytical modeling of business data
Essential steps in the process of making a decision
At the end of each step there is either a decision to proceed or a decision to abandon.
Step 1: Concept of project is identified.
Step 2: Project assessment, taking account of all issues involved.
Step 3: Project goes to detailed specification for tender.
Step 4: Tender accepted; construction starts.
Step 5: Operation starts.
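The five steps can be read as a chain of go/no-go gates. The sketch below is a hypothetical model of that flow: the project fields and the check attached to each step (for example `funding_ok` for Step 3) are illustrative assumptions, not part of the original process description.

```python
# Sketch: the five-step project decision process as a sequence of go/no-go gates.
# The gate checks and the project fields are hypothetical placeholders.
from typing import Callable, Dict, List, Tuple

Project = Dict[str, bool]
Gate = Tuple[str, Callable[[Project], bool]]

STEPS: List[Gate] = [
    ("Concept of project is identified", lambda p: p["need_identified"]),
    ("Project assessment", lambda p: p["assessment_ok"]),
    ("Detailed specification for tender", lambda p: p["funding_ok"]),
    ("Tender accepted, construction starts", lambda p: p["tender_ok"]),
    ("Operation starts", lambda p: p["environment_stable"]),
]


def run_decision_process(project: Project) -> Tuple[str, str]:
    """Walk the steps in order; abandon at the first step whose check fails."""
    for name, check in STEPS:
        if not check(project):
            return "abandon", name
    return "proceed", STEPS[-1][0]
```

For example, a project whose funding is not confirmed returns `("abandon", "Detailed specification for tender")`.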


Step 1
• The conceptual need for a project arises mainly as a result of an assessment of future requirements.
• It may be made by a team of experts.
• Typically a conceptual study will identify the technical solution required, the economic merits, and the acceptability of the project in socio-political terms.
• It may require discussion with financial institutions about whether or not they will provide the necessary funds.
Step 2
• Assuming the decision has been made to develop the project further, a detailed assessment will have to be made of all technical, economic and socio-political factors.
• The details may be quantitative or based on subjective knowledge.
• A major issue in decision making is the novelty of the project.
– A project may be technically novel (e.g., building a new airplane).
– The project may employ an established technology in a novel environment (e.g., using electric trains in a developing country).
• In this step the degree of uncertainty associated with each factor will begin to emerge.
• An understanding of the uncertainty associated with any proposal is essential for sound decision making.
Step 3
• If the outcome of Step 2 is to proceed with the project, then a tender specification has to be prepared.
• It should define exactly what work the tenderer is required to do. Ideally, it should define everything that has to be done.
• The magnitude of uncertainty associated with this stage is a reason for possible variations in the cost and duration of projects.
• Before a tender specification is issued, it is prudent to confirm that the project is acceptable to regulatory authorities and that adequate finance is available.
• The financier needs to be convinced that the project is viable, and that the proposer is sound and has the experience and capability to drive the project to a successful conclusion.
Step 4, 5
• Step 4
– The first action is to decide whether one of the tenders should be accepted.
– The tenderer should have the appropriate experience, capability and adequate financial resources.
• Step 5
– Assuming all steps have been completed satisfactorily, a decision has to be taken to start the project.
– Even after the project starts, it might have to be stopped if the environment in which it operates changes.
Disadvantages of DSS
• Monetary Cost
• Overemphasize Decision Making
• Assumption of Relevance
• Transfer of Power
• Unanticipated Effects
• Obscuring Responsibility
• False Belief in Objectivity
• Status Reduction
• Information Overload
Applications

Decision Support System for Infrastructure Planning


https://www.youtube.com/watch?v=VrmMF9Be_DE
Question Bank
Q.1 Define DSS. What are the features of past DSS? (8M)(S18), (5M)(W16)
Q.2 Give details on DSS characteristics.
Q.3 Why do we need a DSS?
Q.4 Write a short note on the evolution or history of DSS.
Q.5 What are the essential steps in the process of making a decision?
Q.6 Differentiate between operational data and DSS. (4M)(S19), (3M)(W17)
Q.7 Explain the components of DSS.
Q.8 Illustrate the advantages and disadvantages of DSS.
Q.9 Explain the various types of decisions.
Q.10 Explain in brief the failures of past decision support systems. (6M)(S16)
Inmon
• Father of the data warehouse
• Co-creator of the Corporate
Information Factory.
• He has 35 years of
experience in database
technology management
and data warehouse design.
Inmon-Cont’d
• Bill has written about a variety of topics on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory.
• He has written more than 650 articles (Datamation, ComputerWorld, and Byte Magazine).
• Inmon has published 45 books.
– Many of his books have been translated into Chinese, Dutch, French, German, Japanese, Korean, Portuguese, Russian, and Spanish.
Introduction
• What is a Data Warehouse?
A data warehouse is a collection of integrated databases designed to support a DSS.
• According to Inmon's (father of data warehousing) definition:
– It is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is non-volatile and relevant to some moment in time.
Need for a Data Warehouse
Characteristics of Data Warehouse
What is a Data Warehouse?
A Practitioner's Viewpoint
"A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context."
-- Barry Devlin, IBM Consultant
CS 336
Introduction-Cont’d.
• Where is it used?
It is used for evaluating future strategy.
• It needs a successful technician who is:
– Flexible.
– A team player.
– Well balanced in business and technical understanding.
• The key to survival:
– The ability to analyze, plan, and react to changing business conditions much more rapidly.
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer,
product, sales.
• Focusing on the modeling and analysis of data for
decision makers, not on daily operations or
transaction processing.
• Provide a simple and concise view around particular
subject issues by excluding data that are not useful in
the decision support process.

A Data Warehouse is Subject Oriented
Subject Orientation
Application Environment | Data Warehouse Environment
Design activities must be equally focused on both process and database design | The DW world is primarily void of process design and tends to focus exclusively on issues of data modeling and database design
Data Warehouse - Integrated
• Constructed by integrating multiple,
heterogeneous data sources
– relational databases, flat files, on-line transaction
records
• Data cleaning and data integration techniques are
applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different
data sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.
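As an illustration of the hotel-price example, the sketch below converts records from two hypothetical sources, which differ in currency, tax treatment, field names and encoding of the breakfast flag, into one warehouse convention. The exchange rate, the field names and the `WarehousePrice` layout are assumptions for illustration.

```python
# Sketch: converting hotel prices from two heterogeneous sources into one
# warehouse convention (price in USD, tax included, boolean breakfast flag).
# Exchange rates, tax rates and field names are hypothetical.
from dataclasses import dataclass

USD_PER_UNIT = {"USD": 1.0, "EUR": 1.1}  # assumed fixed rates for the example


@dataclass
class WarehousePrice:
    hotel: str
    price_usd: float          # always tax-inclusive
    breakfast_included: bool


def from_source_a(rec: dict) -> WarehousePrice:
    """Source A: price in EUR, tax NOT included, breakfast encoded as 'Y'/'N'."""
    gross = rec["price"] * (1 + rec["tax_rate"])
    return WarehousePrice(rec["hotel"], round(gross * USD_PER_UNIT["EUR"], 2),
                          rec["breakfast"] == "Y")


def from_source_b(rec: dict) -> WarehousePrice:
    """Source B: price in USD, tax already included, breakfast as a boolean."""
    return WarehousePrice(rec["name"], round(rec["total_usd"], 2), rec["has_breakfast"])
```

Each source gets its own converter, so every record reaches the warehouse in the same units and encoding.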

Data Integration
Problem:
• Different interfaces
• Different data representations
• Duplicated information
• Inconsistent information
Goal:
• Collect and combine information
• Provide an integrated view
• Provide a uniform user interface
• Support sharing of data
Problem: Heterogeneous Information Sources
"Heterogeneities are everywhere"
(Figure: personal databases, the World Wide Web, scientific databases and digital libraries as separate information sources.)
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
Problem: Data Management in Large Enterprises
• Vertical fragmentation of informational systems (vertical stove pipes)
• Result of application (user)-driven development of operational systems
(Figure: separate stove-pipe systems for Sales, Administration, Finance, Manufacturing, ..., e.g., Sales Planning, Stock Mngmt, Suppliers, Debt Mngmt, Num. Control, Inventory.)
Goal: Unified Access to Data
(Figure: an integration system providing unified access to the World Wide Web, personal databases, digital libraries and scientific databases.)
• Collects and combines information
• Provides integrated view, uniform user interface
• Supports sharing
Data Integrated
Data Warehouse - Time Variant
• The time horizon for the data warehouse is significantly longer than that of operational systems.
– Operational database: current value data.
– Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain a "time element".
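The difference between operational keys and warehouse keys can be sketched with two dictionaries: the operational store is keyed by customer only, while the warehouse key also carries a snapshot date. The names and values are hypothetical.

```python
# Sketch: an operational record keeps only the current value under its key,
# while the warehouse key includes a time element (here, a snapshot date).
# Table layouts and values are hypothetical.
from datetime import date
from typing import Dict, Tuple

operational: Dict[str, int] = {}               # customer_id -> current credit limit
warehouse: Dict[Tuple[str, date], int] = {}    # (customer_id, snapshot_date) -> limit


def apply_change(customer_id: str, limit: int, as_of: date) -> None:
    """Update the operational value and load a dated snapshot into the warehouse."""
    operational[customer_id] = limit            # overwritten: history is lost
    warehouse[(customer_id, as_of)] = limit     # new key with a time element: history kept


apply_change("C1", 1000, date(2019, 1, 1))
apply_change("C1", 1500, date(2020, 1, 1))

# The warehouse can still answer "what was the limit over time?"
history = sorted((d, v) for (c, d), v in warehouse.items() if c == "C1")
```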

Time Variant
• Every piece of data contained within the
warehouse must be associated with a
particular point in time if any useful analysis is
to be conducted with it.
• Another aspect of time variance in DW data is
that, once recorded, data within the
warehouse cannot be updated or changed.
Data Warehouse - Non Updatable
• A physically separate store of data transformed from the operational environment.
• Operational update of data does not occur in
the data warehouse environment.
– Does not require transaction processing, recovery,
and concurrency control mechanisms.
– Requires only two operations in data accessing:
• initial loading of data and access of data.

Nonvolatility
• Typical activities such as deletes, inserts, and
changes that are performed in an operational
application environment are completely
nonexistent in a DW environment.
• Only two data operations are ever performed
in the DW: data loading and data access
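A minimal sketch of a nonvolatile store, assuming a hypothetical `NonVolatileStore` class: it offers only the two DW operations (load and access), keeps rows as read-only mappings, and deliberately has no update or delete methods.

```python
# Sketch: a nonvolatile store that exposes only the two DW operations,
# data loading and data access. Class and method names are hypothetical.
from types import MappingProxyType
from typing import Iterable, List, Mapping


class NonVolatileStore:
    def __init__(self) -> None:
        self._rows: List[Mapping] = []

    def load(self, batch: Iterable[dict]) -> int:
        """Data loading: append a batch; existing rows are never modified."""
        new_rows = [MappingProxyType(dict(r)) for r in batch]  # read-only views
        self._rows.extend(new_rows)
        return len(new_rows)

    def access(self, **criteria) -> List[Mapping]:
        """Data access: return rows matching all given column=value criteria."""
        return [r for r in self._rows
                if all(r.get(k) == v for k, v in criteria.items())]

    # Deliberately no update() or delete(): those operational activities
    # do not exist in the DW environment.
```

Trying to change a returned row (for example `row["sales"] = 99`) raises `TypeError`, because rows are stored as read-only mapping proxies.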
Definition
• Data Warehouse (W. H. Inmon):
– A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
– Subject-oriented: e.g., customers, patients, students, products
– Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources
– Time-variant: can study trends and changes
– Nonupdatable: read-only, periodically refreshed
Data Warehouse
• In order for data to be effective, DW must be:
– Consistent.
– Well integrated.
– Well defined.
– Time stamped.
• DW environment:
– The data store, data mart & the metadata.
Differentiate between Operational Data Store and Data Warehouse
Characteristics | Operational Data Store | Data Warehouse
How is it built? | One application or subject area at a time | Typically multiple subject areas at a time
Area of support? | Day-to-day business operations | Decision support for managerial activities
Currency of data? | Up-to-the-minute, real time | Typically represents a static point in time
Typical unit for analysis? | Small, manageable, transaction-level units | Large, unpredictable, variable units
Design focus? | High performance, limited flexibility | High flexibility, high performance
Characteristics of Data Warehouse
• Subject oriented. Data are organized based on how the users refer to them.
• Integrated. All inconsistencies regarding naming
convention and value representations are removed.
• Nonvolatile. Data are stored in read-only format and
do not change over time.
• Time variant. Data are not current but normally time
series.
Characteristics of Data Warehouse
• Summarized. Operational data are mapped into a decision-usable format.
• Large volume. Time series data sets are normally
quite large.
• Not normalized. DW data can be, and often are,
redundant.
• Metadata. Data about data are stored.
• Data sources. Data come from internal and external
unintegrated operational systems.
Warehouse is a Specialized DB
Standard DB (OLTP) | Warehouse (OLAP)
Mostly updates (current data) | Mostly reads (historic data)
Many small transactions | Queries are long and complex
MB - GB of data | GB - TB of data
Current snapshot | History
Raw data | Summarized, reconciled data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)
Transaction oriented | Subject oriented
Normalized table structure (many tables, few columns per table) | De-normalized table structure (few tables, many columns per table)
Continuous updates | Batch updates
Simple to complex queries | Usually very complex queries
Building a Data Warehouse

Data Warehouse Lifecycle

– Analysis
– Design
– Import data
– Install front-end tools
– Test and deploy
Stage 1: Analysis
• Identify:
– Target questions
– Data needs
– Timeliness of data
– Granularity
• Create an enterprise-level data dictionary
• Dimensional analysis
– Identify facts and dimensions
Stage 2: Design
• Dimensional modeling
• Star schema
• Data transformation
• Aggregates
• Pre-calculated values
• HW/SW architecture
Dimensional Modeling

• Fact Table – The primary table in a dimensional model that is meant to contain measurements of the business.
measurements of the business.
• Dimension Table – One of a set of companion
tables to a fact table. Most dimension tables
contain many textual attributes that are the
basis for constraining and grouping within
data warehouse queries.

SOURCE: Ralph Kimball
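A tiny star schema illustrates the two table types: the fact table holds the business measurements (units, revenue), and the dimension tables hold textual attributes used to constrain and group queries. This sketch uses Python's built-in `sqlite3` module; the table names, columns and rows are hypothetical.

```python
# Sketch: a tiny star schema (one fact table, two dimension tables) in SQLite,
# queried by constraining and grouping on dimension attributes.
# Table names, columns and rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales  (product_key INTEGER REFERENCES dim_product,
                          date_key INTEGER REFERENCES dim_date,
                          units INTEGER, revenue REAL);   -- business measurements
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Laptop', 'Electronics');
INSERT INTO dim_date    VALUES (1, 2019, 'Q1'), (2, 2019, 'Q2');
INSERT INTO fact_sales  VALUES (1, 1, 100, 150.0), (1, 2, 80, 120.0), (2, 2, 3, 2400.0);
""")

# Join the fact table to its dimensions, constrain on year, group by category.
rows = conn.execute("""
    SELECT p.category, SUM(f.units), SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_date d    ON f.date_key = d.date_key
    WHERE d.year = 2019
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Electronics', 3, 2400.0), ('Stationery', 180, 270.0)]
```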


Stage 3: Import Data
• Identify data sources
• Extract the needed data from existing systems to a data staging area
• Transform and clean the data
– Resolve data type conflicts
– Resolve naming and key conflicts
– Remove, correct, or flag bad data
– Conform dimensions
• Load the data into the warehouse
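The extract, transform/clean and load steps can be sketched in a few functions. The two source layouts, their field names and the bad-data rule are hypothetical; the point is to show naming and type conflicts being resolved and bad values being flagged rather than silently dropped.

```python
# Sketch of Stage 3: extract from two hypothetical source systems into a
# staging list, transform/clean, then load. Source layouts are assumptions.
from typing import Dict, List

source_sales = [{"CustID": "7", "Amt": "120.50"}, {"CustID": "8", "Amt": "n/a"}]
source_crm = [{"customer_no": 9, "amount": 75.0}]


def extract() -> List[Dict]:
    """Copy raw records into a staging area, tagging where each came from."""
    staging = [dict(r, _src="sales") for r in source_sales]
    staging += [dict(r, _src="crm") for r in source_crm]
    return staging


def transform(staging: List[Dict]) -> List[Dict]:
    clean = []
    for r in staging:
        # Resolve naming conflicts: map each source's column names to one name.
        cust = r.get("CustID", r.get("customer_no"))
        amt = r.get("Amt", r.get("amount"))
        # Resolve data type conflicts: keys become int, amounts become float.
        try:
            amount, bad = float(amt), False
        except (TypeError, ValueError):
            amount, bad = None, True   # flag bad data instead of dropping it
        clean.append({"customer_id": int(cust), "amount": amount, "bad_data": bad})
    return clean


def load(warehouse: List[Dict], rows: List[Dict]) -> None:
    """Load cleaned rows into the (here in-memory) warehouse."""
    warehouse.extend(rows)


warehouse: List[Dict] = []
load(warehouse, transform(extract()))
```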
Importing Data Into the Warehouse
(Figure: operational (source) systems OLTP 1, OLTP 2 and OLTP 3 feed a data staging area, which loads the data warehouse.)
Stage 4: Install Front-end Tools
• Reporting tools
• Data mining tools
• GIS
• Etc.
Stage 5: Test and Deploy
• Usability tests
• Software installation
• User training
• Performance tweaking based on usage
Components of Data Warehouse
• Identifying the source
• Cleaning the data
• Transformation tools
Data Warehouse Attributes
• A DWH provides a mechanism for separating
operational and informational processing.
• A DWH is designed to help resolve inconsistencies in
data formats, semantics and usage across multiple
operational systems.
• DWH procedures include aggregating and
summarizing data to make it more relevant and
useful for users.
Cont…
• The data content of the warehouse is a subset of all data in an organization.
• Automating data extraction and setting the required frequency of updates need to be the warehouse's responsibility.
Data Warehouse Examples
• Credit card usage information.
• Advertising medium information.
• College applicant information.
• Store sales information by product, region and time period.
• Medical insurance claim information by city, age, occupation and time of policies.
Benefits of Data Warehouse
• Understand business trends and make better
forecasting decisions.
• Bring better products to market in a more
timely manner.
• Analyze daily sales information and make
quick decisions that can significantly affect
your company’s performance.
Cont…
• Locating the right information.
• Presentation of information (reports, graphs)
• Testing of hypothesis
• Discovery of information
• Sharing the analysis.
Performance
How can the performance of a DWH application be improved?
• Summarization
• Denormalization
• Partitioning
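Two of these techniques, partitioning and summarization, can be sketched on a small in-memory sales list (denormalization is not shown here). The data and the choice of year as the partition key are assumptions for illustration.

```python
# Sketch of partitioning rows by year and summarizing (pre-aggregating)
# totals per (year, region) on a hypothetical sales list.
from collections import defaultdict

sales = [
    {"year": 2018, "region": "East", "amount": 100},
    {"year": 2019, "region": "East", "amount": 150},
    {"year": 2019, "region": "West", "amount": 50},
]

# Partitioning: a query for one year only scans that year's partition.
partitions = defaultdict(list)
for row in sales:
    partitions[row["year"]].append(row)

# Summarization: store totals once so reports need not rescan detail rows.
summary = defaultdict(int)
for row in sales:
    summary[(row["year"], row["region"])] += row["amount"]

east_2019 = summary[(2019, "East")]          # answered from the summary table
rows_scanned_2019 = len(partitions[2019])    # only the 2019 partition is touched
```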
Challenges of the DWH
• Technical Challenges 42%
• Data Management 40%
• Hardware, software, staffing 32%
• Selling to management 26%
• Training users 16%
• Managing expectations 11%
• Managing change 8%
Future of the DWH
• Petabyte systems (1 PB = 1024 TB)
• Database size grows from very large databases (VLDB) to extremely large databases (ELDB).
• Integration and manipulation of non-textual (multimedia) and textual data.
• Web-enabled applications grow.
Cont…
• Building and running ever larger data warehouse systems.
• Handling vast quantities of multi-format data.
• Distributed databases will be used.
• Cross-database integrity.
• Use of middleware and multiple tiers.
Building A Data Warehouse
• The builders of a data warehouse should take a broad view of the anticipated use of the warehouse.
– The design should support ad-hoc querying.
– An appropriate schema should be chosen that reflects the anticipated usage.
Building A Data Warehouse
• The design of a data warehouse involves the following steps:
– Acquisition of data for the warehouse.
– Ensuring that data storage meets the query requirements efficiently.
– Giving full consideration to the environment in which the data warehouse resides.
Building A Data Warehouse
• Acquisition of data for the warehouse
– The data must be extracted from multiple,
heterogeneous sources.
– Data must be formatted for consistency within the
warehouse.
– The data must be cleaned to ensure validity.
• It is difficult to automate the cleaning process.
• Backflushing: upgrading the source data with the cleaned data.
Building A Data Warehouse
• Acquisition of data for the warehouse (contd.)
– The data must be fitted into the data model of the
warehouse.
– The data must be loaded into the warehouse.
• Proper design for refresh policy should be considered.
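One possible refresh policy, among others, is an incremental load driven by a watermark: only rows changed since the last load are brought in. This sketch assumes each source row carries a hypothetical `updated_at` timestamp.

```python
# Sketch of an incremental refresh policy driven by a "last loaded" timestamp
# (watermark). The row layout and field names are hypothetical.
from datetime import datetime
from typing import Dict, List


def incremental_refresh(source: List[Dict], warehouse: List[Dict],
                        watermark: datetime) -> datetime:
    """Load only source rows changed after the watermark; return the new watermark."""
    new_rows = [r for r in source if r["updated_at"] > watermark]
    warehouse.extend(new_rows)
    # If nothing new arrived, keep the old watermark.
    return max((r["updated_at"] for r in new_rows), default=watermark)
```

Calling it on a schedule with the watermark returned by the previous run loads each changed row once, instead of reloading the whole source.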

Building A Data Warehouse
• Storing the data according to the data model of
the warehouse
• Creating and maintaining required data
structures
• Creating and maintaining appropriate access
paths
• Providing for time-variant data as new data are
added
• Supporting the updating of warehouse data.
• Refreshing the data
• Purging data
Building A Data Warehouse
• Usage projections
• The fit of the data model
• Characteristics of available resources
• Design of the metadata component
• Modular component design
• Design for manageability and change
• Considerations of distributed and parallel
architecture
– Distributed vs. federated warehouses

Generic Warehouse Architecture
Issues in Data Warehousing
• Warehouse Design
• Extraction
– Wrappers, monitors (change detectors)
• Integration
– Cleansing & merging
• Warehousing specification & Maintenance
• Optimizations
• Miscellaneous (e.g., evolution)

Question Bank
Q.1 What is DWM? Give its architecture. (8M)(S16)
Q.2 What are the components of a data warehouse? (5M)(S18), (9M)(S17)
Q.3 How is data acquired or collected in a data warehouse?
Q.4 Give the conceptual view of a data warehouse.
Q.5 What are the advantages and disadvantages of a data warehouse?
Q.6 Why do we need a separate data warehouse?
Q.7 What do you mean by a subject-oriented, integrated, non-volatile and time-variant collection of data in data warehousing?
Q.8 Differentiate between operational data store and data warehouse. (4M)(S16)
Q.9 Discuss the characteristics of DWH.
Q.10 Explain the building blocks of a data warehouse.
Q.11 Explain the 3-tier architecture of a data warehouse with a neat diagram. (6M)(S19), (7M)(W17), (7M)(W16)
Q.12 Explain the life cycle of a data warehouse with a neat sketch. (7M)(W17), (5M)(S18), (6M)(S17), (6M)(W16)
Data Mart
• Smaller, local data warehouses are called data marts.
• A subset of a data warehouse that supports the requirements of a particular department or business function.
• There are two kinds of data marts:
Dependent Data Mart
(Figure: operational systems, flat files and external data feed the data warehouse, which in turn feeds the Marketing, Sales, Finance and Human Resources data marts.)
Cont…
• In a dependent data mart, the data can be derived from an enterprise-wide data warehouse.
• A dependent data mart is one whose source is a data warehouse.
• All dependent data marts are fed by the same source: the data warehouse.
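A dependent data mart can be sketched as a department-specific subset copied out of the warehouse. This example uses Python's built-in `sqlite3` module with hypothetical table and column names; the mart's only source is the warehouse table.

```python
# Sketch: a dependent data mart built as a subset of an enterprise warehouse
# table. Table, column and department names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_facts (department TEXT, region TEXT, month TEXT, amount REAL);
INSERT INTO dw_facts VALUES
  ('Sales', 'East', '2019-01', 100.0),
  ('Sales', 'West', '2019-01',  60.0),
  ('Finance', 'East', '2019-01', 999.0);

-- The mart's only source is the warehouse: copy just the Sales department's rows.
CREATE TABLE sales_mart AS
  SELECT region, month, amount FROM dw_facts WHERE department = 'Sales';
""")

print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales_mart").fetchone())  # (2, 160.0)
```

Defining `sales_mart` with `CREATE VIEW` instead of `CREATE TABLE ... AS` would give a virtual data mart over the same warehouse data.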
Independent Data Mart
(Figure: operational systems, flat files and external data feed a Sales or Marketing data mart directly, without a data warehouse.)
Cont…
• An independent data mart is one whose data is derived independently, directly from operational data.
• In an independent data mart, data can be collected directly from the sources.
Reasons for Creating a Data Mart
• To give users more flexible access to the
data they need to analyze most often.
• To provide data in a form that matches the
collective view of a group of users
Cont…
• To improve end-user response time due to the
reduction in the volume of data to be accessed.
• The cost of implementing data marts is far less than
that required to establish a data warehouse.
• To provide appropriately structured data as
dictated by the requirements of the end-user
access tools.

• Building a data mart is simpler compared with establishing a corporate data warehouse.
Cont…
• Users' access to data in multiple marts: one approach is to replicate data between different data marts; alternatively, build a virtual data mart, i.e., views over several physical data marts or the corporate data warehouse.
• Data mart installation: data marts are becoming increasingly complex to build.
• Data mart load performance: the two critical components are end-user response time and data loading performance. To improve loading, use incremental database updating, so that only the cells affected by a change are updated rather than the entire database structure.
Cont…
• Data mart Internet/intranet access: this offers users low-cost access to data marts and the DWH using web browsers.
• Data mart administration: an organization cannot easily perform the administration of multiple data marts, which gives rise to issues such as data mart versioning, data and metadata consistency and integrity, enterprise-wide security, and performance. Data mart administrative tools are commercially available.
Data Marts Issues
• Data mart functionality
• Data mart size
• Data mart load performance
• Users access to data in multiple data marts
• Data mart Internet / Intranet access
• Data mart administration
• Data mart installation
Data Warehouses Vs Data Marts
Property | Data Warehouse | Data Mart
Scope | Enterprise | Department
Subjects | Multiple | Single-subject
Data Source | Many | Few
Size (typical) | 100 GB to > 1 TB | < 100 GB
Implementation time | Months to years | Months

Characteristics
1. Data marts do not normally contain detailed
operational data unlike data warehouses as
data marts contain less data compared with
data warehouses, data marts are more easily
understood and navigated.

2. The data mart focuses on only the


requirements of users associated with one
department or business function.
Security in a data mart
• Sensitive information includes financial information, medical records, human resources information, etc.
• The data mart administrator should make the necessary security arrangements, such as firewalls, log on/off security, application-based security, database security, encryption and decryption, etc.
Metadata
• Metadata means data about something else.
• Metadata is data that describes the source, location and meaning of another piece of data.
• Metadata is data about data: it is like a card index describing how information is structured within the data warehouse.
• Metadata acts as a logical link, or bridge, between the DSS and the DWH.
Types of metadata
• Technical Metadata- Technical metadata,
which contains information about warehouse
data for use by warehouse designers and
administrators.
- Information about data sources.
Cont…
- Transformation descriptions, that is, the mapping method from operational databases into the warehouse, and the algorithms used to convert, enhance or transform data.
- Access authorization, backup history, archive
history, information delivery history, data
acquisition history, data access etc.
Cont…
• Business metadata- Business metadata
contains information that gives users an easy
to understand perspective of the information
stored in the data warehouse.
- Subject areas and information object type,
including queries, reports, images, video, and
audio clips
Cont…
- Data warehouse operational information, data history (snapshots, versions), ownership, extract audit data, etc.
- Internet home pages.
Cont…
• Acquisition metadata- Acquisition metadata
maps the translation of information from the
operational system to the analytical system.
This includes an extract history describing
data origins, updates, algorithms used to
summarize data, and frequency of extractions
from operational systems.
Cont…
• Transformation metadata- Transformation
metadata includes a history of data
transformations , changes in names and other
physical characteristics.
• Access metadata- Access metadata provides navigation and graphical user interfaces that allow non-technical business users to interact with the contents of the warehouse.
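Several of these metadata types can be seen together in one catalog entry for a warehouse column: technical details (source, transformation) alongside a business-friendly meaning. The sketch below is a hypothetical in-memory catalog, not any particular metadata tool; all names and values are illustrative.

```python
# Sketch: a metadata catalog entry describing the source, location and meaning
# of one warehouse column. All names and values are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    column: str          # location: table.column in the warehouse
    source: str          # technical metadata: where the data came from
    transformation: str  # transformation metadata: how it was derived
    meaning: str         # business metadata: meaning shown to end users


CATALOG = {
    "fact_sales.revenue": ColumnMetadata(
        column="fact_sales.revenue",
        source="orders_db.order_lines.line_total",
        transformation="summed per day; converted to USD",
        meaning="Total sales revenue in US dollars",
    ),
}


def describe(column: str) -> str:
    """Answer an end user's 'what does this column mean and where is it from?'."""
    m = CATALOG[column]
    return f"{m.column}: {m.meaning} (from {m.source}; {m.transformation})"
```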
Uses of Metadata
• It is used for building, maintaining, managing and using the data warehouse.
• It is used by end users for querying purposes,
as well as by the data manager for structuring
the management of a database site.
• It is used in data acquisition/collection, data
transformation, and data access.
WHY METADATA IS IMPORTANT
• Metadata in a data warehouse contains the answers to
questions about the data in the data warehouse.
• A sample list of definitions:
– Data about the data
– Table of contents for the data
– Catalog for the data
– Data warehouse atlas
– Data warehouse roadmap
– Data warehouse directory
– Glue that holds the data warehouse contents together
– Tongs to handle the data
– The nerve center
Question Bank
Q1. What is METADATA? State and explain its categories. (4M)(S17)

Q2. Explain METADATA and its importance.

Q3. Explain the concept of a data mart. How does it differ from a data warehouse?

Q4. Differentiate between dependent and independent data marts.

Q5. What are the reasons for creating a data mart?

Q6. Differentiate between data warehouse and data mart. (3M)(W17), (4M)(S16), (4M)(W16), (4M)(S19)

Q7. Why is METADATA important?

Q8. Differentiate between data marts and METADATA. (2M)(S18)
