Overview of Data Warehousing

A data warehouse centralizes and consolidates large amounts of data from various sources to support business intelligence activities, enabling better decision-making and improved data quality. It consists of components like external sources, staging areas, data warehouses, data marts, and data mining, and can be constructed using top-down or bottom-up approaches. The architecture typically follows a three-tier model, comprising data sources, an OLAP engine, and front-end tools, facilitating efficient data handling and analysis.

Uploaded by

sivanisri1319


2. Overview of Data Warehousing

2.1. Define Data Warehousing
A data warehouse is a system that centralizes and consolidates large
amounts of data from various sources to support business intelligence (BI)
activities, especially analytics and reporting. It acts as a single source of
truth for an organization, holding both current and historical data, and is
designed to facilitate complex queries and analysis.
2.2. State the Importance of Data Warehousing
Data warehousing is crucial for businesses as it enables them to consolidate,
analyze, and utilize vast amounts of data from various sources to make informed decisions and
improve overall business performance.

• Centralized Data Storage: A data warehouse stores all the important data from different sources in one place. This makes data easy to access and manage.
• Better Decision Making: It helps managers and analysts make better decisions by providing accurate and organized data.
• Improves Data Quality and Consistency: Duplicate and incorrect data are removed, making the information more reliable.
• Faster Query Performance: Data warehouses are designed for quick searching and reporting, so users get faster results.
• Historical Data Analysis: It stores past data, which helps in comparing current and previous performance over time.
• Supports Business Intelligence Tools: Data warehouses work with BI tools to generate charts, graphs, and reports for business analysis.
• Time-saving for Reporting: Users don’t need to collect data manually from different places; everything is already available in the warehouse.
• Helps in Trend Analysis and Forecasting: Businesses can identify trends and predict future outcomes by analyzing stored data.
2.3. Differences between Database and Data Warehouse

Feature | Database | Data Warehouse
Purpose | Stores current data for day-to-day operations | Stores historical data for analysis and reporting
Data Type | Transactional data (e.g., sales, payments) | Analytical data (e.g., trends, summaries)
Users | Used by clerks, database admins, and developers | Used by data analysts and decision-makers
Data Update | Frequently updated with real-time data | Data is updated periodically (e.g., daily, weekly)
Design | Optimized for read/write operations (OLTP) | Optimized for read-heavy operations (OLAP)
Normalization | Highly normalized (to avoid redundancy) | Often denormalized (to improve query speed)
Examples | MySQL, Oracle DB, PostgreSQL | Amazon Redshift, Google BigQuery, Snowflake

2.4. Explain the Data warehouse architecture

Data Warehouse Architecture

A Data Warehouse is a system that combines data from multiple sources, organizes it under a
single architecture, and helps organizations make better decisions. It simplifies data handling,
storage, and reporting, making analysis more efficient. Data Warehouse Architecture uses a
structured framework to manage and store data effectively.

There are two common approaches to constructing a data warehouse:

1. Top-Down Approach: This method starts with designing the overall data warehouse architecture
first and then creating individual data marts.
2. Bottom-Up Approach: In this method, data marts are built first to meet specific business needs, and
later integrated into a central data warehouse.

Before diving deep into these approaches, we will first discuss the components of data
warehouse architecture.

Components of Data Warehouse Architecture

A data warehouse architecture consists of several key components that work together to store,
manage, and analyze data.

• External Sources: External sources are where data originates. These sources provide a variety of data types, such as:
   o Structured data (databases, spreadsheets)
   o Semi-structured data (XML, JSON)
   o Unstructured data (emails, images)

• Staging Area: The staging area is a temporary space where raw data from external sources is validated and prepared before entering the data warehouse. This process ensures that the data is consistent and usable. To handle this preparation effectively, ETL (Extract, Transform, Load) tools are used:
   o Extract (E): Pulls raw data from external sources.
   o Transform (T): Converts raw data into a standard, uniform format.
   o Load (L): Loads the transformed data into the data warehouse for further processing.

• Data Warehouse: The data warehouse acts as the central repository for storing cleansed and organized data. It contains metadata and raw data. The data warehouse serves as the foundation for advanced analysis, reporting, and decision-making.

• Data Marts: A data mart is a subset of a data warehouse that stores data for a specific team or purpose, like sales or marketing. It helps users quickly access the information they need for their work.

• Data Mining: Data mining is the process of analyzing large datasets stored in the data warehouse to uncover meaningful patterns, trends, and insights. The insights gained can support decision-making, identify hidden opportunities, and improve operational efficiency.

Top-Down Approach

The Top-Down Approach, introduced by Bill Inmon, is a method for designing data
warehouses that starts by building a centralized, company-wide data warehouse. This central
repository acts as the single source of truth for managing and analyzing data across the
organization. It ensures data consistency and provides a strong foundation for decision-
making.
Working of Top-Down Approach

1. Central Data Warehouse: The process begins with creating a comprehensive data
warehouse where data from various sources is collected, integrated, and stored. This involves
the ETL (Extract, Transform, Load) process to clean and transform the data.

2. Specialized Data Marts: Once the central warehouse is established, smaller, department-
specific data marts (e.g., for finance or marketing) are built. These data marts pull
information from the main data warehouse, ensuring consistency across departments.

This structured approach allows organizations to maintain a high level of data integrity and
provides a robust framework for data analysis and reporting.
Bottom-Up Approach
The Bottom-Up Approach, popularized by Ralph Kimball, takes a more flexible and
incremental path to designing data warehouses. Instead of starting with a central data
warehouse, it begins by building small, department-specific data marts that cater to the
immediate needs of individual teams, such as sales or finance. These data marts are later
integrated to form a larger, unified data warehouse.

Fig (B): Bottom-up approach
Working of Bottom-Up Approach

1. Department-Specific Data Marts: The process starts with creating data marts for individual
departments or specific business functions. These data marts are designed to meet immediate
data analysis and reporting needs, allowing departments to gain quick insights.
2. Integration into a Data Warehouse: Over time, these data marts are connected and
consolidated to create a unified data warehouse. The integration ensures consistency and
provides a comprehensive view of the organization’s data.
✅ Advantages of Data Warehouse:

1. Improved Decision Making

o Helps managers make better decisions by providing accurate, consistent data.
2. Faster Query Performance
o Large amounts of data can be searched and analyzed quickly.
3. Data Integration
o Combines data from different sources (e.g., sales, marketing, finance) into one
place.
4. Historical Analysis
o Stores historical data, useful for trend analysis over time.
5. High Data Quality
o Data is cleaned, organized, and validated before storage.
6. Support for Business Intelligence (BI)
o Useful for reporting, dashboards, data mining, and analytics.

❌ Disadvantages of Data Warehouse:

1. High Cost
o Expensive to build and maintain (hardware, software, skilled staff).
2. Complex Implementation
o Needs careful planning and time to implement properly.
3. Data Loading Time
o Loading huge amounts of data (ETL process) can be time-consuming.
4. Not Suitable for Real-Time Data
o Usually works with batch data, not real-time updates.
5. Difficult to Change
o Once built, making changes or adding new data sources is complex.

2.5. Explain Three-Tier Data Warehouse Architecture

Data warehousing is essential for businesses looking to make informed
decisions based on large amounts of information. The architecture of a
data warehouse is key to its effectiveness, influencing how easily data can
be accessed and used. The Three/Multi-Tier Data Warehouse
Architecture is widely adopted due to its clear and organized framework.
This architecture divides data handling into three main layers:
• Bottom Tier (Data Sources and Data Storage)
• Middle Tier (OLAP Engine)
• Top Tier (Front-End Tools)
Three/Multi-tier Architecture of Data Warehouse

Bottom Tier

The Bottom Tier serves as the foundation of the data warehouse architecture. It is
primarily responsible for data collection and storage. This tier typically utilizes a
warehouse database server, often an RDBMS (Relational Database Management
System), to house data extracted from various operational and external sources. The
core activities in this tier involve the ETL process, which stands for Extract,
Transform, and Load.

ETL Process

The ETL process is vital for ensuring that data is clean, consistent, and optimized for
quick retrieval. The steps involved are:
1. Extract: Data is gathered from various sources, including relational databases, flat files, and
web services.
2. Transform: Data is cleaned and transformed to align with business logic and analysis
needs.
3. Load: The transformed data is loaded into a structured repository, often housed in an
RDBMS or a multidimensional database system.

Common Challenges in Bottom Tier

• Data Quality: Inconsistent data can lead to errors and unreliable analytics.
• Data Compatibility: Different data formats and structures complicate integration.
• Scalability: Growing data volumes must still be handled efficiently.

Solutions
• Implement Robust ETL Tools: Use established ETL tools such as Informatica or Microsoft SSIS, or a streaming platform such as Confluent.
• Standardize Data Formats: Standardizing data at the point of entry minimizes compatibility issues.
• Continuous Data Quality Management: Regularly check and clean data to maintain high quality.
• Scalability Planning: Design data storage solutions that can expand as data volume grows.

Middle Tier

The Middle Tier is where the OLAP (Online Analytical Processing) server resides.
This tier acts as the processing layer that manages and enables complex analytical
queries on data stored in the bottom tier. It serves as a mediator between the data
repository and the end-user interface.

OLAP Models

OLAP technology is designed for high-speed analytical processing and comes in three categories:
• ROLAP (Relational OLAP): Uses a relational database to manage warehouse data, ideal for large data volumes.
• MOLAP (Multidimensional OLAP): Stores data in a multidimensional cube, making it efficient for complex analytical queries.
• HOLAP (Hybrid OLAP): Combines relational and multidimensional processing paradigms.

Common Challenges in Middle Tier

• Data Latency: Delays in data availability can impact decision-making.
• Query Performance: Large data volumes can slow down query performance.
• Data Integration: Combining data from different sources with varying formats can be challenging.

Solutions
• Real-Time Data Processing: Implement real-time processing and incremental loading techniques to reduce data latency.
• Query Optimization Techniques: Utilize indexing and partitioning to improve query performance.
• Standardization and Advanced Integration Tools: Standardize data formats and implement robust middleware solutions.
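
The indexing technique mentioned under query optimization can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for the warehouse database; the table name sales, the index name idx_sales_region, and the sample rows are hypothetical. SQLite has no native table partitioning, so only indexing is shown.

```python
import sqlite3

def plan_for_region_query(conn):
    """Return SQLite's query plan text for an aggregate filtered by region."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sales WHERE region = ?",
        ("north",),
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the plan description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 7.5)],  # made-up sample rows
)

plan_before = plan_for_region_query(conn)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
plan_after = plan_for_region_query(conn)   # the plan now refers to the index

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows the query switching from a table scan to an index lookup. The exact plan wording differs between SQLite versions.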

Top Tier

The Top Tier comprises the front-end client layer, essential for interacting with the
data stored and processed in the lower tiers. This layer includes various business
intelligence (BI) tools designed to facilitate easy access and manipulation of data for
reporting, analysis, and decision-making.

Popular BI Tools
• IBM Cognos: Comprehensive reporting capabilities.
• Microsoft BI Platform: Integrates well with existing Microsoft products.
• SAP BW: Manages large datasets and integrates with other SAP products.
• Crystal Reports: Known for powerful reporting features.
• SAS Business Intelligence: Provides advanced analytics.
• Pentaho: Versatile tool for data integration and visualization.

Common Challenges in Top Tier

• Usability Issues: Complex tools can hinder user adoption.
• Integration Difficulties: Ensuring seamless integration with other tiers can be challenging.

Solutions
• User Training and Support: Offer comprehensive training sessions for users.
• Choosing Integrative Tools: Select tools that easily integrate with existing systems.

2.6. State the Importance of Operational Data Stores

An Operational Data Store (ODS) is crucial for businesses because
it provides a real-time, consolidated view of operational data, enabling faster, more informed
decision-making.

1. Real-Time Information:
o ODS gives current (up-to-date) data for daily business activities.
2. Quick Decision Making:
o Managers and employees can make decisions quickly using the latest data.
3. Combines Data from Different Places:
o It collects data from many systems (like billing, sales, customer service) into
one place.
4. Improves Work Efficiency:
o Staff don’t need to check many systems—ODS gives a single view of
everything.
5. Reduces Pressure on Main Systems:
o It handles reporting and queries, so the main systems stay fast and free for
daily work.
6. Prepares Data for Analysis:
o It helps clean and organize data before sending it to the data warehouse for
deeper analysis.
2.7. Define ETL and ELT
1. ETL – Extract, Transform, Load

Definition:
ETL is a data integration process where data is first Extracted from source systems, then
Transformed (cleaned, formatted), and finally Loaded into a data warehouse or database.

Steps:

1. Extract – Get data from various sources (e.g., databases, files).

2. Transform – Clean, format, and apply business rules to the data.
3. Load – Store the transformed data into the target system (usually a data warehouse).
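
The three ETL steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production pipeline: the CSV text, the orders table, and the cleaning rules (trim names, reject unparsable amounts, drop duplicate order IDs) are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (names and values are made up).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20
2,BOB,20
3,carol,not-a-number
"""

def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, parse numbers, drop invalid and duplicate rows."""
    seen, clean = set(), []
    for row in rows:
        try:
            record = (
                int(row["order_id"]),
                row["customer"].strip().title(),
                float(row["amount"]),
            )
        except ValueError:
            continue  # reject rows that fail validation
        if record[0] not in seen:
            seen.add(record[0])
            clean.append(record)
    return clean

def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(conn, text=RAW_CSV):
    """Run the three steps in order and return what ended up in the target."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()
```

Calling run_etl(sqlite3.connect(":memory:")) loads only the two valid, de-duplicated orders. The point of the sketch is the ordering: the data is cleaned before it reaches the target.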

Used When:

• Data needs heavy transformation before loading
• Traditional data warehouses are used

2. ELT – Extract, Load, Transform

Definition:
ELT is a variation where data is first Extracted, then Loaded directly into the target system,
and then Transformed inside the system (e.g., using SQL in a data warehouse or data lake).

Steps:

1. Extract – Pull data from source systems.

2. Load – Directly load raw data into the target system.
3. Transform – Perform transformation within the target (like a cloud warehouse).
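
For contrast, the ELT steps above might look like the following sketch. An in-memory SQLite database is assumed as a stand-in for a cloud warehouse, and the table names raw_orders and orders and the sample rows are hypothetical. The raw rows are loaded untouched, and the cleaning is done afterwards with SQL inside the target.

```python
import sqlite3

def run_elt(conn):
    """Load raw rows first, then transform them inside the target with SQL."""
    # Extract + Load: land the raw rows unchanged in a staging table.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [("1", " alice ", "10.50"), ("2", "BOB", "20"), ("2", "BOB", "20")],
    )
    # Transform: cast, clean, and deduplicate using the target's own SQL engine.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT DISTINCT
            CAST(order_id AS INTEGER) AS order_id,
            LOWER(TRIM(customer))     AS customer,
            CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()
```

The raw_orders table stays available after the transformation, which is one reason ELT suits flexible reprocessing: a different transformation can be rerun later over the same raw data.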

Used When:

• The target system is powerful (e.g., cloud-based like Snowflake, BigQuery)
• Big data volumes and flexible processing are involved

2.8. List Types of Data Warehouses

There are three main types of data warehouses:

1. Enterprise Data Warehouse (EDW):
• A centralized repository for the entire organization.
• Stores all historical data from various departments.
• Supports decision-making across the whole enterprise.
• Example: A large company consolidating data from sales, HR, finance, etc., into one centralized location.

2. Operational Data Store (ODS):
• Designed for real-time or near real-time reporting.
• Contains current, up-to-date data rather than historical data.
• Supports daily operations and immediate decision-making, not long-term analysis.
• Example: A bank utilizing an ODS to display real-time customer balances.

3. Data Mart:
• Concentrates on a specific department or business unit, such as sales or HR.
• Easier to manage and provides faster access to relevant data.
• Example: A retail company maintaining a separate data mart specifically for sales analysis.

2.9. Explain Data Warehousing Model

A data warehousing model refers to the structured design and organization
of data within a data warehouse to facilitate efficient storage, retrieval, and
analysis for business intelligence and decision-making. It aims to integrate
data from various sources into a cohesive and consistent framework,
optimizing it for analytical queries rather than transactional processing.

Key Components of Data Warehousing Models:

• Enterprise Data Warehouse (EDW): This is a centralized, comprehensive data repository that integrates data from across the entire organization. It aims to provide a unified, holistic view of business operations and typically contains highly detailed, historical data.
• Data Marts: These are smaller, subject-oriented subsets of an EDW, designed to serve the specific analytical needs of a particular department or business function (e.g., sales, marketing, finance).
   o Dependent Data Marts: Built by extracting data from an existing EDW.
   o Independent Data Marts: Created directly from operational data sources, without relying on a central EDW.
• Virtual Data Warehouse: This is not a physical repository but rather a logical view of operational data, providing a quick overview without the need for extensive data extraction and loading. It uses middleware to integrate data from disparate sources on demand.

Common Data Modeling Techniques:

• Dimensional Modeling (Kimball Methodology): This is a widely adopted approach for data warehousing, focusing on creating a user-friendly, performance-optimized structure for analytical queries.
   o Fact Tables: Store quantitative measures (metrics) and foreign keys to dimension tables.
   o Dimension Tables: Store descriptive attributes related to the facts (e.g., time, product, customer).
   o Star Schema: A simple dimensional model where a central fact table is directly connected to multiple dimension tables.
   o Snowflake Schema: An extension of the star schema where dimension tables are further normalized into sub-dimensions.
• Normalized Modeling (Inmon Methodology): This approach emphasizes data integrity and minimizes redundancy by adhering to database normalization rules (e.g., 3NF). It creates a highly normalized structure, similar to a transactional database, but optimized for reporting.
• Data Vault Modeling: A more recent approach designed for agile data warehouse development, emphasizing scalability and integration. It organizes data into:
   o Hubs: Represent core business entities.
   o Links: Represent relationships between hubs.
   o Satellites: Store descriptive attributes about hubs or links, allowing for historical tracking of changes.

The choice of data warehousing model and modeling technique depends
on factors such as organizational size, data volume, analytical
requirements, and development methodology.
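
To make the star schema described under dimensional modeling concrete, here is a small sketch using Python's sqlite3 module. The table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are hypothetical and chosen only for illustration. One fact table holds the measures and foreign keys, and each dimension table holds descriptive attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    revenue    REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date    VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales  VALUES (1, 1, 2, 2000.0), (1, 2, 1, 1000.0), (2, 1, 3, 600.0);
""")

def revenue_by_category(conn):
    """Join the fact table to one dimension and aggregate a measure."""
    return conn.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()

print(revenue_by_category(conn))
```

A snowflake schema would go one step further and move category out of dim_product into its own table, at the cost of an extra join in queries like this one.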

2.10. Explain Data Warehouse Design Approaches

Data Warehouse Design

Data warehouse design is the process of designing the structure, schema, and flow of data in a data
warehouse to support efficient data analysis and reporting. It involves organizing
data, modeling relationships, and integrating data from multiple sources.

Main Data Warehouse Design Approaches

There are three major approaches to data warehouse design:

1. Top-Down Approach (Inmon’s Approach)

• Definition: Begins with a centralized enterprise data warehouse (EDW) from which data marts are created.
• Steps:
   o Build a normalized EDW (3NF).
   o Load data into the EDW via ETL processes.
   o Create data marts using dimensional modeling.
   o Use data marts for analysis/reporting.
• Characteristics: Central repository, integrated data, normalized EDW.
• Advantages: Consistent, integrated data; good for long-term strategy; avoids redundancy.
• Disadvantages: High initial cost and time; complex implementation; longer time to show results.

2. Bottom-Up Approach (Kimball’s Approach)

• Definition: Starts with data marts for individual departments, which are later integrated into a data warehouse.
• Steps:
   o Identify business processes (e.g., sales, orders).
   o Create dimensional models (star/snowflake schema).
   o Build data marts for each process.
   o Integrate data marts using a bus architecture.
• Characteristics: Fast and incremental; uses denormalized models; focuses on user needs.
• Advantages: Quick results; early ROI; easy to understand and maintain; lower initial cost.
• Disadvantages: Risk of data silos; integration challenges; potential redundancy.

3. Hybrid Approach

• Definition: Combines both top-down and bottom-up methods, starting with important data marts while planning for an enterprise-wide structure.
• Characteristics: Uses dimensional modeling with an enterprise vision; balances speed and consistency.
• Advantages: Quick implementation; maintains consistency and integration; scalable and flexible.
• Disadvantages: Complex design and management; requires careful planning and governance.

Design Modeling Techniques

• Dimensional Modeling (used in Kimball's method): Includes Star Schema and Snowflake Schema.
• ER Modeling (Entity-Relationship) (used in Inmon’s method): Focuses on a normalized structure.

2.11. Define the Terms Metadata and Data Mart

Metadata
Metadata is data about data: it provides information about other data, such
as its structure, content, and usage.
• Metadata helps in understanding, managing, and utilizing data effectively.
• It can describe various aspects of data, including its creation, modification, and purpose.
• Metadata helps in finding, organizing, and categorizing data.
• Examples include file names, creation dates, author information, and data types.
• In data warehousing, metadata acts as a roadmap, helping users understand the
structure and content of the data warehouse.
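
As a small illustration of metadata acting as a roadmap, the sketch below reads the table and column descriptions that SQLite keeps about its own contents. SQLite is assumed here only as a convenient stand-in for a warehouse catalog; the customers table and the describe_tables helper are hypothetical.

```python
import sqlite3

def describe_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} from SQLite's catalog."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
)
catalog = describe_tables(conn)
print(catalog)
```

Nothing in catalog is customer data. It only describes how the customer data is structured, which is exactly the role of metadata.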

Data Mart
A data mart is a subset of a data warehouse focused on
a specific subject or business line, designed for efficient analysis and
reporting within that area.
• A data mart is a focused, subject-oriented database derived from a larger data warehouse.
• It contains a specific set of data relevant to a particular business unit or function.
• Data marts are designed for faster and more efficient analysis of specific business areas.
• For instance, a marketing department might have a data mart containing customer data, campaign results, and marketing metrics.
• They are smaller and more manageable than a full data warehouse, making them easier to
implement and maintain.
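
The idea of a data mart as a subject-specific subset can be sketched with a SQL view over a warehouse table. This is a simplification: real data marts are often separate physical stores, and the names warehouse_facts and marketing_mart, as well as the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
    ('marketing', 'campaign_clicks', 1200),
    ('marketing', 'leads', 85),
    ('finance',   'invoices_paid', 430);

-- The mart exposes only the marketing rows of the warehouse table.
CREATE VIEW marketing_mart AS
    SELECT metric, value FROM warehouse_facts WHERE department = 'marketing';
""")

mart_rows = conn.execute(
    "SELECT metric, value FROM marketing_mart ORDER BY metric"
).fetchall()
print(mart_rows)
```

Marketing users query marketing_mart and never see the finance rows, while the data itself still comes from the warehouse, as in a dependent data mart.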

2.12. Define OLAP

OLAP Overview
• Purpose: OLAP (Online Analytical Processing) enables users to analyze large datasets from multiple perspectives, primarily for business intelligence and decision support.
• Data Model: Utilizes a multidimensional data model, often represented as an "OLAP cube," to facilitate efficient data analysis.

Key Features
1. Multidimensional Analysis: Users can analyze data across various dimensions (e.g., time, product, region) simultaneously.
2. Data Cubes: The OLAP cube organizes data in a multidimensional structure, allowing for operations like "drilling down" (viewing detailed data) and "rolling up" (viewing summarized data).
3. Speed and Efficiency: Designed for fast query processing, enabling quick retrieval and analysis of large datasets.
4. Business Intelligence: Integral to BI systems, providing tools for reporting, analysis, and informed decision-making.

Comparison with OLTP

• OLAP vs. OLTP: OLAP focuses on historical data analysis and trend identification, while OLTP (Online Transaction Processing) is centered on real-time transaction processing (e.g., recording sales).

Applications
• Commonly used in financial reporting, sales forecasting, budgeting, and market analysis.

2.13. List the Characteristics of OLAP

Online Analytical Processing (OLAP) systems are designed specifically for analytical
purposes and possess several distinct characteristics that set them apart from
transactional systems. Here are the key features:

1. Multidimensional Data Analysis:

OLAP systems organize data into multidimensional structures, often referred to as
"cubes." This allows users to analyze data across various dimensions such as time,
product, geography, and customer. Users can view data from multiple perspectives
and perform complex analytical queries, including drill-down, roll-up, slice, dice, and
pivot operations.

2. High Performance for Analytical Queries:

OLAP systems are optimized for the fast retrieval and aggregation of large volumes
of data, providing quick responses to complex analytical queries. This performance
is achieved through pre-calculated aggregations and optimized data storage
structures.

3. Support for Complex Calculations and Business Logic:

OLAP systems facilitate the execution of complex calculations, aggregations, and
business rules. This capability enables users to derive insights and perform activities
such as forecasting, budgeting, and planning.

4. Historical Data Analysis:

OLAP systems are designed to store and analyze historical data, allowing users to
identify trends, patterns, and anomalies over time. This feature is crucial for strategic
decision-making.

5. User-Friendly Interface:

OLAP tools typically provide intuitive and interactive interfaces, often integrated with
popular business intelligence tools like dashboards and reporting applications. This
accessibility makes them usable for business users without extensive technical
knowledge.

6. Client/Server Architecture:
OLAP systems often employ a client/server architecture. In this setup, the OLAP
server processes and stores the multidimensional data, while client applications
provide the user interface for data interaction and analysis.

7. Data Integration from Multiple Sources:

OLAP systems can integrate data from various operational systems and data
sources, consolidating it into a unified view for comprehensive analysis.
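
The cube operations named under multidimensional data analysis (roll-up, drill-down, slice, dice) can be sketched in plain Python over a handful of fact rows. The data, the dimension names, and the helper functions aggregate, slice_, and dice are hypothetical; a real OLAP server would also pre-compute and store the aggregates rather than recompute them on every call.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2024, "Q1", "North", "Laptop", 100),
    (2024, "Q1", "South", "Laptop", 80),
    (2024, "Q2", "North", "Desk", 50),
    (2024, "Q2", "South", "Laptop", 70),
    (2025, "Q1", "North", "Desk", 60),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}

def aggregate(facts, dims):
    """Sum sales grouped by the given dimensions.

    Grouping by fewer dimensions is a roll-up; adding dimensions is a drill-down.
    """
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[DIMS[d]] for d in dims)] += row[4]
    return dict(totals)

def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in facts if row[DIMS[dim]] == value]

def dice(facts, **allowed):
    """Dice: keep rows whose dimensions fall inside the allowed value sets."""
    return [
        row for row in facts
        if all(row[DIMS[d]] in values for d, values in allowed.items())
    ]

by_year = aggregate(FACTS, ["year"])                     # roll-up to year level
by_year_quarter = aggregate(FACTS, ["year", "quarter"])  # drill-down to quarters
north_only = slice_(FACTS, "region", "North")
north_laptops_2024 = dice(FACTS, year={2024}, region={"North"}, product={"Laptop"})
```

A pivot corresponds to changing the order of the dimensions passed to aggregate, for example ["region", "year"] instead of ["year", "region"], which rotates the view without changing the totals.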

2.14. Differentiate between OLTP and OLAP

Criteria | OLAP | OLTP
Purpose | OLAP helps you analyze large volumes of data to support decision-making. | OLTP helps you manage and process real-time transactions.
Data source | OLAP uses historical and aggregated data from multiple sources. | OLTP uses real-time and transactional data from a single source.
Data structure | OLAP uses multidimensional (cubes) or relational databases. | OLTP uses relational databases.
Data model | OLAP uses star schema, snowflake schema, or other analytical models. | OLTP uses normalized or denormalized models.
Volume of data | OLAP has large storage requirements. Think terabytes (TB) and petabytes (PB). | OLTP has comparatively smaller storage requirements. Think gigabytes (GB).
Response time | OLAP has longer response times, typically in seconds or minutes. | OLTP has shorter response times, typically in milliseconds.
Example applications | OLAP is good for analyzing trends, predicting customer behavior, and identifying profitability. | OLTP is good for processing payments, customer data management, and order processing.
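
The difference in workload can be seen in a short sketch. It assumes an in-memory SQLite database, and the payments table and sample values are hypothetical. In practice the two workloads would run on separate systems; they share one table here only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, customer TEXT, month TEXT, amount REAL)"
)

def record_payment(conn, customer, month, amount):
    """OLTP-style work: one small write, committed as a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO payments (customer, month, amount) VALUES (?, ?, ?)",
            (customer, month, amount),
        )

def monthly_totals(conn):
    """OLAP-style work: a read-only aggregate over many rows."""
    return conn.execute(
        "SELECT month, SUM(amount) FROM payments GROUP BY month ORDER BY month"
    ).fetchall()

for args in [("alice", "2024-01", 30.0), ("bob", "2024-01", 20.0), ("alice", "2024-02", 45.0)]:
    record_payment(conn, *args)

print(monthly_totals(conn))
```

record_payment touches a single row and must finish quickly, while monthly_totals reads every row to produce a summary, which is why the two kinds of systems are tuned differently.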

2.15. List the Types of OLAP

There are three primary types of OLAP (Online Analytical Processing)
systems: MOLAP (Multidimensional OLAP), ROLAP (Relational OLAP),
and HOLAP (Hybrid OLAP). Each type offers different approaches to
storing and analyzing data, with varying strengths and weaknesses.

1. MOLAP (Multidimensional OLAP):
• Data Storage: MOLAP stores data in a multidimensional cube structure, optimized for fast data analysis.
• Performance: It excels in speed and efficiency for complex queries due to pre-calculated aggregations.
• Scalability: MOLAP can struggle with very large datasets.
• Examples: Microsoft Analysis Services, IBM Cognos TM1, and Essbase.

2. ROLAP (Relational OLAP):
• Data Storage: ROLAP stores data in relational databases, leveraging SQL for analysis.
• Performance: ROLAP can handle large volumes of data but may be slower than MOLAP for complex queries.
• Scalability: ROLAP scales well with large datasets and is suitable for complex queries against massive datasets.
• Examples: Uses existing relational database systems.

3. HOLAP (Hybrid OLAP):
• Data Storage: HOLAP combines the strengths of both MOLAP and ROLAP, storing some data in a multidimensional structure and other data in a relational database.
• Performance: HOLAP offers a balance between speed and scalability.
• Scalability: HOLAP is designed to handle large volumes of data while also providing faster query performance for specific aggregated data.
• Examples: A combination of MOLAP and ROLAP approaches.
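
The storage difference between ROLAP and MOLAP can be sketched as follows. This is a toy model, not how any particular OLAP product is implemented: the sales rows are hypothetical, SQLite stands in for the relational store, a Python dictionary stands in for the cube, and "*" is an assumed marker meaning "all values of this dimension".

```python
import sqlite3
from collections import defaultdict

SALES = [("North", "Laptop", 100.0), ("North", "Desk", 50.0), ("South", "Laptop", 80.0)]

# ROLAP-style: keep detail rows in a relational table and aggregate with SQL at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)

def rolap_total(region):
    """Compute the regional total on demand from the detail rows."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]

# MOLAP-style: pre-compute every aggregate once, then answer queries by lookup.
cube = defaultdict(float)
for region, product, amount in SALES:
    for key in [(region, product), (region, "*"), ("*", product), ("*", "*")]:
        cube[key] += amount

def molap_total(region):
    """Read the regional total straight from the pre-computed cube."""
    return cube[(region, "*")]

print(rolap_total("North"), molap_total("North"))
```

Both functions return the same total. The cube answers by lookup but grows with every combination of dimension values, which reflects the scalability concern noted for MOLAP. A HOLAP design would keep summary cells like these in the cube and leave the detail rows in the relational table.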

2.16. Differentiate between Data Mining and Data Warehouse

Aspect | Data Warehousing | Data Mining
Purpose | Data warehousing focuses on storing and organizing large volumes of structured data. | Data mining focuses on analyzing data to uncover patterns, trends, and insights.
Process | It involves collecting, cleaning, and integrating data from multiple sources. | It involves using algorithms and techniques to analyze and interpret the data.
Functionality | It supports reporting, querying, and data analysis tools for business intelligence. | It helps predict outcomes and make data-driven decisions based on the insights.
Data State | Stores historical and current data in a structured and consistent format. | Works on stored data to identify relationships and predict future patterns.
Users | Used by data analysts and business intelligence teams for reporting and monitoring. | Used by data scientists and decision-makers for predictive and descriptive insights.
Output | Provides organized data for easier access and use. | Generates actionable insights from analyzing the data.
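
As a very small illustration of the data mining side, the sketch below counts which items are bought together, a simplified form of frequent-pattern discovery. The transactions, the frequent_pairs helper, and the min_support threshold are hypothetical. In practice the baskets would be read from warehouse tables, and dedicated algorithms would be used for large datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, as they might be read from a warehouse sales table.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(TRANSACTIONS, min_support=2)
print(pairs)
```

The warehouse supplies the clean, consolidated transactions; the mining step is what turns them into a pattern such as "bread and milk are often bought together".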

