Unit 1 - DWM
Samir Siddiqui
CR FINAL YEAR
Department of Information Technology
Decision Support System
Objective
Maximize profit
Provide earliest entry into market
Minimize employee discomfort/turnover
Decision variables
Determine what price to use
Determine length of time tests should be run on a new product/service
Determine the responsibilities to assign to each worker
Constraints
Can’t charge below cost
Test enough to meet minimum safety regulations
Ensure responsibilities are at most shared by two workers
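The first objective, decision variable, and constraint above (maximize profit, choose a price, never charge below cost) can be read as a small optimization problem. The following is a minimal sketch only: the linear demand curve, the unit cost, and the candidate price grid are made-up assumptions, not values from the course material.

```python
# Hypothetical numbers: the unit cost and demand curve are assumptions.
UNIT_COST = 4.0


def demand(price: float) -> float:
    """Assumed demand: 100 units at price 0, falling 5 units per unit of price."""
    return max(0.0, 100.0 - 5.0 * price)


def profit(price: float) -> float:
    """Objective: the profit we want to maximize."""
    return (price - UNIT_COST) * demand(price)


def best_price(candidates):
    """Decision variable: the price. Constraint: can't charge below cost."""
    feasible = [p for p in candidates if p >= UNIT_COST]
    return max(feasible, key=profit)


candidates = [p / 2 for p in range(0, 41)]  # 0.0, 0.5, ..., 20.0
choice = best_price(candidates)
print(choice, profit(choice))
```

A real DSS would replace the brute-force search with an OR technique such as linear programming, but the three ingredients stay the same.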
Characteristics and Capabilities of DSS
Goal: Use the best parts of IS, OR/MS, AI & cognitive science to support more
effective decision making
What is Operations Research?
Operations
The activities carried out in an organization.
Research
The process of observation and testing characterized
by the scientific method. Situation, problem
statement, model construction, validation,
experimentation, candidate solutions.
Model
An abstract representation of reality. Mathematical,
physical, narrative, set of rules in computer program.
Systems Approach
Include broad implications of decisions for the
organization at each stage in analysis. Both quantitative
and qualitative factors are considered.
Optimal Solution
A solution to the model that optimizes (maximizes or
minimizes) some measure of merit over all feasible
solutions.
Team
A group of individuals bringing various skills and
viewpoints to a problem.
Operations Research Techniques
A collection of general mathematical models, analytical
procedures, and algorithms.
Artificial Intelligence
• Behavior by a machine that, if performed by
a human being, would be considered
intelligent
• “…study of how to make computers do
things at which, at the moment, people are
better” (Rich and Knight [1991])
• Theory of how the human mind works
(Mark Fox)
Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson 22
6th ed, Copyright 2001, Prentice Hall, Upper Saddle River, NJ
AI Objectives
• Make machines smarter (primary goal)
• Understand what intelligence is (Nobel
Laureate purpose)
• Make machines more useful
(entrepreneurial purpose)
What Is Cognitive Science?
The (interdisciplinary) study of mind and
intelligence. (e.g. neural network and robot)
CS 336 40
Introduction-Cont’d.
• Where is it used?
It is used for evaluating future strategy.
• It needs a successful technician:
– Flexible.
– Team player.
– Good balance of business and technical understanding.
• The key to survival:
– The ability to analyze, plan, and react to changing business
conditions much more rapidly.
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer,
product, sales.
• Focusing on the modeling and analysis of data for
decision makers, not on daily operations or
transaction processing.
• Provide a simple and concise view around particular
subject issues by excluding data that are not useful in
the decision support process.
[Figure: A Data Warehouse is Subject Oriented]
Data Integration
Problem:
• Different interfaces
• Different data representations
• Duplicated information
• Inconsistent information
Goal:
• Collect and combine information
• Provide an integrated view
• Provide a uniform user interface
• Support sharing of data
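The integration goal above can be sketched in a few lines of code. This is an illustration under stated assumptions: the two sources, their field layouts, and the rule "first source wins on duplicates" are all hypothetical.

```python
from datetime import date

# Two hypothetical sources holding overlapping customer data in different formats.
sales_db = [
    {"cust_id": "C1", "name": "Asha Rao", "joined": "2021-03-15"},
    {"cust_id": "C2", "name": "Ravi Patil", "joined": "2022-07-01"},
]
billing_rows = [
    ("c1", "RAO, ASHA", "15/03/2021"),    # duplicate of C1, different representation
    ("c3", "KHAN, IMRAN", "09/11/2020"),
]


def from_sales(row):
    """Convert a sales record to the uniform representation."""
    return {"id": row["cust_id"].upper(), "name": row["name"],
            "joined": date.fromisoformat(row["joined"])}


def from_billing(row):
    """Convert a billing record ("LAST, FIRST", dd/mm/yyyy) to the uniform one."""
    cust_id, name, joined = row
    last, first = [part.strip().title() for part in name.split(",")]
    day, month, year = (int(x) for x in joined.split("/"))
    return {"id": cust_id.upper(), "name": f"{first} {last}",
            "joined": date(year, month, day)}


def integrate(sales, billing):
    """Combine both sources into one integrated, duplicate-free view."""
    merged = {}
    for record in [from_sales(r) for r in sales] + [from_billing(r) for r in billing]:
        merged.setdefault(record["id"], record)  # first source wins on duplicates
    return sorted(merged.values(), key=lambda r: r["id"])


customers = integrate(sales_db, billing_rows)
```

Here `customers` holds three records in one representation, even though the sources disagreed on key case, name order, and date format.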
Problem: Heterogeneous Information
Sources
“Heterogeneities are everywhere”
[Figure: heterogeneous sources: personal databases, World Wide Web, scientific databases, digital libraries]
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
Problem: Data Management in Large
Enterprises
• Vertical fragmentation of informational systems
(vertical stove pipes)
• Result of application (user)-driven development
of operational systems
[Figure: stove-pipe operational systems such as Sales Planning, Stock Mngmt, Suppliers, Debt Mngmt, Num. Control, Inventory, ...]
[Figure: an Integration System over personal databases, the World Wide Web, digital libraries, and scientific databases]
Time Variant
• Every piece of data contained within the
warehouse must be associated with a
particular point in time if any useful analysis is
to be conducted with it.
• Another aspect of time variance in DW data is
that, once recorded, data within the
warehouse cannot be updated or changed.
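A minimal sketch of time variance, assuming a hypothetical price history: every row carries the date it became valid, rows are only appended, and analysis asks what was true as of a given date.

```python
from datetime import date

# Each row carries the date it was valid from; rows are appended, never edited.
# The product name and prices are made-up illustration data.
price_history = [
    ("widget", date(2023, 1, 1), 10.0),
    ("widget", date(2023, 6, 1), 12.0),
    ("widget", date(2024, 1, 1), 11.5),
]


def price_as_of(product, when):
    """Return the latest recorded price on or before `when`, or None."""
    rows = [(d, p) for name, d, p in price_history if name == product and d <= when]
    return max(rows)[1] if rows else None


mid_2023 = price_as_of("widget", date(2023, 7, 15))
```

Because old rows are kept rather than overwritten, the question "what was the price in July 2023?" still has an answer after later price changes.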
[Figure: Data Warehouse - Non Updatable]
Nonvolatility
• Typical activities such as deletes, inserts, and
changes that are performed in an operational
application environment are completely
nonexistent in a DW environment.
• Only two data operations are ever performed
in the DW: data loading and data access
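The two permitted operations can be sketched as a table type that exposes nothing else. The class name and row layout below are hypothetical; real warehouses enforce this through process and permissions rather than a Python class.

```python
class WarehouseTable:
    """Hypothetical table that supports only data loading and data access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Data loading: append a batch of rows as immutable tuples."""
        self._rows.extend(tuple(r) for r in rows)

    def query(self, predicate=lambda row: True):
        """Data access: return matching rows as a new list."""
        return [row for row in self._rows if predicate(row)]


table = WarehouseTable()
table.load([("2024-01-01", "north", 120), ("2024-01-01", "south", 90)])
table.load([("2024-01-02", "north", 135)])
north = table.query(lambda row: row[1] == "north")
```

There is deliberately no update or delete method, which mirrors the nonvolatility property.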
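Definition
• Data Warehouse: a subject-oriented, integrated, time-variant, and
non-volatile collection of data in support of management's
decision-making process (W.H. Inmon)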
Data Warehouse
• In order for data to be effective, DW must be:
– Consistent.
– Well integrated.
– Well defined.
– Time stamped.
• DW environment:
– The data store, data mart & the metadata.
Differentiate between Operational Data Store and
Data Warehouse
Currency of data?
– Operational Data Store: up to the minute, real time.
– Data Warehouse: typically represents a static point in time.
Stages:
– Analysis
– Design
– Import data
– Install front-end tools
– Test and deploy
Stage 1: Analysis
• Identify:
– Target Questions
– Data needs
– Timeliness of data
– Granularity
• Create an enterprise-level data dictionary
• Dimensional analysis
– Identify facts and dimensions
Stage 2: Design
• Star schema
• Data Transformation
• Aggregates
• Pre-calculated Values
• HW/SW Architecture
Dimensional Modeling
[Figure: operational (source) systems OLTP 1, OLTP 3]
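The star schema and aggregates named in the design stage can be illustrated with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names, columns, and sample rows are invented, and a production warehouse would use a dedicated DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "by what" of each measurement.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: one row per sale, with a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Book');
INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_date    VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales  VALUES (1, 1, 1, 10.0), (1, 2, 1, 5.0),
                               (2, 1, 2, 40.0), (1, 1, 2, 7.5);
""")

# Aggregate: total sales per product, joining the fact table to one dimension.
sales_by_product = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

The fact table sits at the center with the dimensions around it, which is where the "star" name comes from.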
Stage 4: Install Front-end Tools
• Reporting tools
• Data mining tools
• GIS
• Etc.
Stage 5: Test and Deploy
• Usability tests
• Software installation
• User training
• Performance tweaking based on usage
Components of Data Warehouse
[Figure: components of a data warehouse, including transformation tools]
Data Warehouse Attributes
• A DWH provides a mechanism for separating
operational and informational processing.
• A DWH is designed to help resolve inconsistencies in
data formats, semantics and usage across multiple
operational systems.
• DWH procedures include aggregating and
summarizing data to make it more relevant and
useful for users.
Cont…
• The data content of the warehouse is a subset
of all data in an organization.
• Automating data extraction, at the required
frequency of updates, needs to be the
warehouse's responsibility.
Data Warehouse Examples
• Credit card usage information.
• Advertising medium information.
• College applicant information.
• Store sales information by product, region and time
period.
• Medical insurance claim information by city, age,
occupation and time of policies.
Benefits of Data Warehouse
• Understand business trends and make better
forecasting decisions.
• Bring better products to market in a more
timely manner.
• Analyze daily sales information and make
quick decisions that can significantly affect
your company’s performance.
Cont…
• Locating the right information.
• Presentation of information (reports, graphs)
• Testing of hypotheses
• Discovery of information
• Sharing the analysis.
Performance
How to improve the performance of a DWH
application?
• Summarization
• Denormalization
• Partitioning
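The three techniques above are summarization, denormalization (storing redundant copies of data so queries avoid joins), and partitioning. They can be sketched on a toy data set; the rows and the month-based partition key are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows: (date, product_id, amount).
sales = [
    ("2024-01-05", 1, 10.0),
    ("2024-01-20", 2, 40.0),
    ("2024-02-03", 1, 7.5),
]
products = {1: "Pen", 2: "Book"}

# Denormalization: copy the product name into each row so queries skip the join.
denormalized = [(day, pid, products[pid], amount) for day, pid, amount in sales]

# Partitioning: group rows by month so a query scans only the months it needs.
partitions = defaultdict(list)
for row in denormalized:
    partitions[row[0][:7]].append(row)

# Summarization: pre-compute monthly totals once instead of on every query.
monthly_totals = {month: sum(r[3] for r in rows) for month, rows in partitions.items()}
```

Each technique trades extra storage or load-time work for faster queries.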
Challenges of the DWH
• Technical Challenges 42%
• Data Management 40%
• Hardware, software staffing 32%
• Selling to management 26%
• Training users 16%
• Managing expectations 11%
• Managing change 8%
Future of the DWH
• Petabyte systems (1 PB = 1024 TB).
• Database size grows from very large
database (VLDB) to extremely large
database (ELDB).
• Integration and manipulation of non-textual
(multimedia) and textual data.
• Web-enabled applications grow.
Cont…
• Building and running ever larger data
warehouse systems.
• Handling vast quantities of multi-format data.
• Distributed databases will be used.
• Cross database integrity.
• Use of middleware and multiple tiers.
Building A Data Warehouse
• The builders of Data warehouse should take a
broad view of the anticipated use of the
warehouse.
– The design should support ad-hoc querying
– An appropriate schema should be chosen that
reflects the anticipated usage.
Building A Data Warehouse
• The design of a data warehouse involves the
following steps.
– Acquisition of data for the warehouse.
– Ensuring that Data Storage meets the query
requirements efficiently.
– Giving full consideration to the environment in
which the data warehouse resides.
Building A Data Warehouse
• Acquisition of data for the warehouse
– The data must be extracted from multiple,
heterogeneous sources.
– Data must be formatted for consistency within the
warehouse.
– The data must be cleaned to ensure validity.
• Difficult to automate cleaning process.
• Backflushing: upgrading the source data with the cleaned data.
Building A Data Warehouse
• Acquisition of data for the warehouse (contd.)
– The data must be fitted into the data model of the
warehouse.
– The data must be loaded into the warehouse.
• Proper design for refresh policy should be considered.
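The acquisition steps (extract from heterogeneous sources, format for consistency, clean, fit to the model, load) can be sketched as a small pipeline. Everything below is illustrative: the two sources, the field names, and the cleaning rule (reject rows whose amount is not a number) are assumptions.

```python
def extract():
    """Extract: hypothetical rows from two heterogeneous sources."""
    source_a = [{"id": 1, "city": "pune", "amount": "100.50"},
                {"id": 2, "city": "Nagpur", "amount": "n/a"}]
    source_b = [(3, " MUMBAI ", 75)]
    return source_a, source_b


def transform(source_a, source_b):
    """Format rows consistently, then clean out rows with invalid amounts."""
    formatted = [(r["id"], r["city"], r["amount"]) for r in source_a] + list(source_b)
    cleaned, rejected = [], []
    for row_id, city, amount in formatted:
        try:
            cleaned.append((row_id, city.strip().title(), float(amount)))
        except ValueError:
            rejected.append(row_id)  # candidates for manual cleaning
    return cleaned, rejected


def load(warehouse, rows):
    """Load: append the cleaned rows to the warehouse table."""
    warehouse.extend(rows)


warehouse = []
cleaned, rejected = transform(*extract())
load(warehouse, cleaned)
```

The `rejected` list reflects the point that cleaning is hard to automate fully: some rows need human attention before they can be loaded.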
Building A Data Warehouse
• Storing the data according to the data model of
the warehouse
• Creating and maintaining required data
structures
• Creating and maintaining appropriate access
paths
• Providing for time-variant data as new data are
added
• Supporting the updating of warehouse data.
• Refreshing the data
• Purging data
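Refreshing and purging can be sketched as two small operations on time-stamped rows. The retention window used here is a hypothetical policy, not one stated in the slides.

```python
from datetime import date, timedelta


def refresh(warehouse, new_rows):
    """Refresh: append newly arrived time-stamped rows."""
    warehouse.extend(new_rows)


def purge(warehouse, today, keep_days):
    """Purge: keep only rows dated within the last `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    warehouse[:] = [row for row in warehouse if row[0] >= cutoff]


# Rows are (date, label); the values are illustration data.
store = [(date(2023, 1, 1), "a"), (date(2024, 1, 1), "b")]
refresh(store, [(date(2024, 6, 1), "c")])
purge(store, today=date(2024, 6, 30), keep_days=365)
```

After the purge, only the rows from the last 365 days remain in `store`.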
Building A Data Warehouse
• Usage projections
• The fit of the data model
• Characteristics of available resources
• Design of the metadata component
• Modular component design
• Design for manageability and change
• Considerations of distributed and parallel
architecture
– Distributed vs. federated warehouses
Generic Warehouse Architecture
Issues in Data Warehousing
• Warehouse Design
• Extraction
– Wrappers, monitors (change detectors)
• Integration
– Cleansing & merging
• Warehousing specification & Maintenance
• Optimizations
• Miscellaneous (e.g., evolution)
Question Bank
Q1. What is DWM? Give its architecture. (8M)(S16)
Q2. What are the components of a data warehouse? (5M)(S18), (9M)(S17)
Q3. How is data acquired or collected in a data warehouse?
Q4. Give the conceptual view of a data warehouse.
Q5. What are the advantages and disadvantages of a data warehouse?
Q6. Why do we need a separate data warehouse?
Q7. What do you mean by a subject-oriented, integrated, non-volatile and time-variant
collection of data in data warehousing?
Q8. Differentiate between operational data store and data warehouse. (4M)(S16)
Q9. Discuss the characteristics of DWH.
Q10. Explain the building blocks of a data warehouse.
Q11. Explain the 3-tier architecture of a data warehouse with a neat diagram. (6M)(S19),
(7M)(W17), (7M)(W16)
Q12. Explain the life cycle of a data warehouse with a neat sketch. (7M)(W17), (5M)(S18),
(6M)(S17), (6M)(W16)
Data Mart
• Smaller, local data warehouses are called
data marts.
• A data mart is a subset of a data warehouse that
supports the requirements of a
particular department or business
function.
• There are two kinds of data marts:
dependent and independent.
Dependent Data Mart
[Figure: operational systems, flat files, and external data feed the data warehouse, which feeds the Marketing, Sales, Finance, and Human Resources data marts]
Cont…
• In a dependent data mart, the data can be
derived from an enterprise wide data
warehouse.
• A dependent data mart is one whose source is
a data warehouse.
• All dependent data marts are fed by the same
source: the data warehouse.
Independent Data Mart
[Figure: operational systems, flat files, and external data feed a Sales or Marketing data mart directly]
Cont…
• A data mart that is derived independently
from operational data is called
an independent data mart.
• In an independent data mart, data can be
collected directly from sources.
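The difference between the two kinds of data mart can be sketched as two functions: one draws from the warehouse, the other directly from source data. The row layout, department names, and function names are hypothetical.

```python
# Hypothetical enterprise warehouse rows: (department, metric, value).
warehouse = [
    ("sales", "revenue", 500.0),
    ("finance", "expenses", 320.0),
    ("sales", "orders", 42.0),
]


def dependent_mart(warehouse_rows, department):
    """Dependent data mart: a departmental subset drawn from the warehouse."""
    return [row for row in warehouse_rows if row[0] == department]


def independent_mart(source_rows, department):
    """Independent data mart: built directly from source data, no warehouse."""
    return [(department, metric, float(value)) for metric, value in source_rows]


sales_mart = dependent_mart(warehouse, "sales")
marketing_mart = independent_mart([("leads", "17")], "marketing")
```

Note that the independent mart has to do its own formatting (here, converting text to a number), work that a dependent mart inherits from the warehouse.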
Reasons for Creating a Data Mart
• To give users more flexible access to the
data they need to analyze most often.
• To provide data in a form that matches the
collective view of a group of users
Cont…
• To improve end-user response time due to the
reduction in the volume of data to be accessed.
• The cost of implementing data marts is far less than
that required to establish a data warehouse.
• To provide appropriately structured data as
dictated by the requirements of the end-user
access tools.
[Table: Data Warehouse vs. Data Mart]
Q3. Explain the concept of a data mart. How does it differ from a data warehouse?
Q6. Differentiate between data warehouse and data mart. (3M)(W17), (4M)(S16),
(4M)(W16), (4M)(S19)