Chapter 1

Introduction to Data Warehousing
Amol D. Vibhute (PhD)
Assistant Professor

Email ID:- [email protected]


Course Goal:
• Understand the fundamental concepts and basic principles of Data
warehousing and data mining
– Introduction to Data warehouse
– Data Warehouse Modelling,
– Extraction-Transform-Load (ETL),
– Data Mining,
• Classification, Clustering and Association.

– Introduction to data mining tool.

Thursday, January 2, 2025 Dr. Amol 2


Reference Books:
• Books Recommended:
1. Data Warehousing, by BPB Publications
2. Data Warehousing, by Sinha, Amitesh
3. Data Warehousing Design and Development Perspectives, by Krishna, S. Jaya
4. Data Mining: Introductory and Advanced Topics, by Dunham, Margaret H.
5. Data Mining: Methods and Technique, by Ali, Abm. Shawka
6. Data Mining: Concepts and Techniques, by Han, Jiawei / Kamber, Micheline
7. Journal of Computer Science, G.G. Books & Periodicals



Tentative topics:
• Introduction to Data warehouse
• Data Warehouse Modelling,
• Extraction-Transform-Load (ETL),
• Data Mining,
• Classification,
• Clustering and Association,
• Introduction to Data Mining tool
• Application of DW and DM



Roadmap of Chapter:
• Introduction and Background,
• What is Data Warehousing?
• Need for data warehousing,
• Role of DW,
• DW characteristics,
• Data Warehouse Architecture and Components,
• Data Marts,
• Application of DW.
Introduction:
• Data warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful
business insights. A data warehouse is a database that is kept separate from the organization's operational database.
• It is typically used to connect and analyze business data from heterogeneous sources. It is a combination of
technologies and components that supports the strategic use of data.
• It is the electronic storage of a large amount of business information, designed for query and analysis rather than
transaction processing.
• It is a process of transforming data into information and making it available to users in a timely manner.
• A data warehouse helps users understand and improve their organization's performance.
• The need to warehouse data evolved as computer systems became more complex and had to handle increasing
amounts of information.
• A data warehouse is not updated frequently. It holds consolidated historical data, which helps the organization
analyze its business.
• A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
• Data warehouse systems also help integrate a diversity of application systems.
Cont.…
• Subject oriented, integrated, time-variant, nonvolatile collection of data in support of
management’s decision making process,
• Different from RDBMS, transaction processing system, and file systems,
• Kept separate from organization’s live database,
• No frequent updates in a DW,
• Contains consolidated historical data,
• Provides architecture and tools for business executives to systematically organize,
understand and use data to make strategic decisions,
• Data warehousing – process of constructing and using DW.



History of Data warehouse:
• 1960 (end of paperwork; tapes are used): Dartmouth and General Mills, in a joint
research project, develop the terms "dimensions" and "facts".
• 1970 (RAM): ACNielsen and IRI introduce dimensional data marts for retail sales.
• 1975: Online Transaction Processing (OLTP).
• 1983: Teradata Corporation introduces a database management system
specifically designed for decision support.
• Late 1980s: data warehousing starts when IBM workers Paul Murphy and Barry
Devlin develop the Business Data Warehouse.
• 1990: the real concept is given by Bill Inmon.
– He is regarded as the father of the data warehouse.
– He has written on a variety of topics covering the building, usage, and maintenance of the
warehouse and the Corporate Information Factory.
– According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant,
and non-volatile collection of data. This data helps analysts make informed decisions
in an organization.



Similar names of Data warehouse:

Source: https://www.guru99.com/data-warehousing.html
Need and Role of Data warehouse:
• An ordinary database typically stores megabytes to gigabytes of data, and does so for a specific purpose.
• To store data at terabyte scale, storage shifts to a data warehouse.
• Besides this, a transactional database does not lend itself to analytics.
• To perform analytics effectively, an organization keeps a central data warehouse so that it can
closely study its business by organizing, understanding, and using its historical data to
make strategic decisions and analyze trends.



Data Warehouse Features/characteristics:
• The key features of a data warehouse are discussed below −
– Subject-oriented: A data warehouse is subject-oriented because it provides information around a
subject rather than around the organization's ongoing operations. These subjects can be products, customers,
suppliers, sales, revenue, etc. A data warehouse does not focus on ongoing operations; rather, it
focuses on the modeling and analysis of data for decision making.
– Integrated: A data warehouse is constructed by integrating data from heterogeneous sources such as
relational databases, flat files, etc. This integration enables effective analysis of the data.
– Time-variant: The data collected in a data warehouse is identified with a particular time period. The
data in a data warehouse provides information from a historical point of view.
– Non-volatile: Non-volatile means that previous data is not erased when new data is added. A data
warehouse is kept separate from the operational database, and therefore frequent changes in the operational
database are not reflected in the data warehouse.
• Note: A data warehouse does not require transaction processing, recovery, and
concurrency controls, because it is physically stored separately from the operational
database.
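
The time-variant and non-volatile properties above can be illustrated with a minimal Python sketch. The `PriceRecord` and `PriceHistory` names and the sample prices are hypothetical and chosen only for illustration; real warehouses implement this idea with tables rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PriceRecord:
    product: str       # subject-oriented: the row is organized around a product
    price: float
    valid_from: date   # time-variant: every row carries its time period


class PriceHistory:
    """Append-only store: rows are added, existing rows are never changed."""

    def __init__(self) -> None:
        self._rows: List[PriceRecord] = []

    def load(self, record: PriceRecord) -> None:
        # Non-volatile: there is deliberately no update or delete operation.
        self._rows.append(record)

    def price_on(self, product: str, day: date) -> Optional[float]:
        """Return the price that was in effect for `product` on `day`."""
        candidates = [r for r in self._rows
                      if r.product == product and r.valid_from <= day]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).price


history = PriceHistory()
history.load(PriceRecord("pen", 1.00, date(2024, 1, 1)))
history.load(PriceRecord("pen", 1.20, date(2024, 7, 1)))
```

Because the older row is kept when the newer one is loaded, `history.price_on("pen", date(2024, 3, 1))` can still answer a question about the past, which is what a historical point of view requires.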



Components of Data warehouse:
• The four components of a data warehouse are:
• Load manager:
– The load manager is also called the front component. It performs all the operations associated with the extraction
and loading of data into the warehouse. These operations include the transformations that prepare the data for entry into
the data warehouse.
• Warehouse manager:
– The warehouse manager performs operations associated with the management of the data in the warehouse. These
include analyzing data to ensure consistency, creating indexes and views, generating denormalizations
and aggregations, transforming and merging source data, and archiving and backing up data.
• Query manager:
– The query manager is also known as the backend component. It performs all the operations related to the
management of user queries. It directs queries to the appropriate tables and schedules the
execution of queries.
• End-user access tools:
– These are categorized into five groups: 1. Data reporting tools, 2. Query tools, 3. Application development tools,
4. EIS tools, 5. OLAP tools and data mining tools.



Who needs Data warehouse?
• A data warehouse (DWH) is needed by many types of users, such as:
– Decision makers who rely on massive amounts of data.
– Users who use customized, complex processes to obtain information from multiple data
sources.
– People who want simple technology for accessing the data.
– People who want a systematic approach to making decisions.
– Users who need fast performance on a huge amount of data, which is a necessity for
reports, grids, or charts.
– Users who want to discover "hidden patterns" in data flows and groupings; for them, a data
warehouse is the first step.



How does a Data warehouse work?
• Works as a central repository where information arrives from one or more data sources.
• Data flows into a data warehouse from the transactional system and other relational
databases.
– Data may be: Structured, Semi-structured, and Unstructured data.
• The data is processed, transformed, and ingested so that users can
access the processed data in the Data Warehouse through Business
Intelligence tools, SQL clients, and spreadsheets.
• A data warehouse merges information coming from different sources into
one comprehensive database.
– By merging all of this information in one place, an organization can analyze its
customers more holistically. This helps to ensure that it has considered all the
information available.
– Data warehousing makes data mining possible. Data mining is looking for patterns in
the data that may lead to higher sales and profits.
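
As a rough illustration of merging information from different sources into one place, the sketch below combines rows from a relational-style source with a flat-file export. The source names, column names, and sample values are assumptions made for this example only.

```python
import csv
import io

# Source 1: rows as they might come from a transactional (relational) system.
crm_rows = [
    {"customer_id": 1, "name": "Asha", "city": "Pune"},
    {"customer_id": 2, "name": "Ravi", "city": "Mumbai"},
]

# Source 2: a flat-file export that uses a different key name and text values.
orders_csv = "cust,amount\n1,250.0\n1,100.0\n2,75.5\n"


def integrate(customers, orders_text):
    """Merge both sources into one consolidated record per customer."""
    totals = {}
    for row in csv.DictReader(io.StringIO(orders_text)):
        cust_id = int(row["cust"])  # unify the key name and its type
        totals[cust_id] = totals.get(cust_id, 0.0) + float(row["amount"])
    return [
        {**customer, "total_spent": totals.get(customer["customer_id"], 0.0)}
        for customer in customers
    ]


warehouse_customers = integrate(crm_rows, orders_csv)
```

Each entry of `warehouse_customers` now holds both the profile and the spending total of a customer, so an analyst can look at the customer as a whole instead of querying two systems.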



Types/models of Data Warehouse:
• Three main types of Data Warehouses (DWH) are:
– Enterprise Data Warehouse (EDW):
• A centralized warehouse that provides decision support services across
the enterprise.
– Operational Data Store (ODS):
• A data store that is refreshed in real time. Hence, it is widely preferred
for routine activities such as storing employee records.
– Data Mart:
• A subset of the data warehouse, specially designed for a particular
line of business, such as sales or finance.



Data Warehouse Architecture and Components:
• A data warehouse architecture defines the overall arrangement of data
communication, processing, and presentation that exists for end-user computing within the
enterprise.
• Each data warehouse is different, but all are characterized by standard vital components.
• Data warehouses and their architectures vary depending on the specifics of an
organization's situation.
• Three common architectures are:
– Basic
– With Staging Area
– With Staging Area and Data Marts



Data Warehouse Architecture: Basic
• Operational System:
– In data warehousing, an operational system is a system
that processes the day-to-day transactions of an
organization.
• Flat Files:
– A flat file system is a system of files in which transactional
data is stored, and every file in the system must have a
different name.
• Meta Data:
– A set of data that defines and gives information about other
data.
– Meta data is used in a data warehouse for a variety of purposes,
including:
– Meta data summarizes necessary information about data,
which can make finding and working with particular instances of
data easier. For example, author, date created,
date modified, and file size are examples of very basic
document metadata.
– Meta data is used to direct a query to the most appropriate
data source.
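
One way to picture metadata directing a query to the most appropriate data source is a small catalog lookup. The sketch below is only an illustration: the `CATALOG` entries, table names, and row counts are hypothetical, and the selection rule (smallest table that is detailed enough) is an assumption, not a rule stated in these slides.

```python
# Hypothetical metadata catalog: one entry per table in the warehouse.
CATALOG = [
    {"table": "sales_detail", "grain": "day", "rows": 1_000_000},
    {"table": "sales_by_month", "grain": "month", "rows": 36},
    {"table": "sales_by_year", "grain": "year", "rows": 3},
]

# A finer grain can answer a coarser question, but never the reverse.
GRAIN_ORDER = {"day": 0, "month": 1, "year": 2}


def choose_table(requested_grain):
    """Pick the smallest table that is detailed enough for the request."""
    usable = [entry for entry in CATALOG
              if GRAIN_ORDER[entry["grain"]] <= GRAIN_ORDER[requested_grain]]
    return min(usable, key=lambda entry: entry["rows"])["table"]
```

With this catalog, a yearly report is routed to `sales_by_year`, while a daily report can only be served by `sales_detail`.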



Data Warehouse Architecture: Basic
• Lightly and highly summarized data:
– This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated) data
generated by the warehouse manager.
– The goal of the summarized information is to speed up query performance. The summarized records are updated
continuously as new information is loaded into the warehouse.
• End-user access tools:
– The principal purpose of a data warehouse is to provide information to business managers for strategic
decision-making. These users interact with the warehouse using end-user access tools.
• Examples of end-user access tools include:
• Reporting and Query Tools
• Application Development Tools
• Executive Information Systems Tools
• Online Analytical Processing Tools
• Data Mining Tools
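
The lightly and highly summarized data described on this slide can be sketched with Python's built-in `sqlite3` module. The table names, columns, and sample figures are assumptions for illustration; here the monthly table plays the role of the lightly summarized data and the per-region table the highly summarized data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", "North", 100.0),
        ("2024-01-20", "North", 50.0),
        ("2024-02-03", "North", 70.0),
        ("2024-01-11", "South", 30.0),
    ],
)

# Lightly summarized: totals per region per month.
conn.execute("""
    CREATE TABLE sales_by_region_month AS
    SELECT region, substr(sale_date, 1, 7) AS sale_month, SUM(amount) AS total
    FROM sales
    GROUP BY region, substr(sale_date, 1, 7)
""")

# Highly summarized: totals per region, built from the lighter summary.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(total) AS total
    FROM sales_by_region_month
    GROUP BY region
""")

monthly = conn.execute(
    "SELECT region, sale_month, total FROM sales_by_region_month "
    "ORDER BY region, sale_month"
).fetchall()
overall = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

A report that only needs regional totals can read the two rows of `sales_by_region` instead of scanning every sale, which is how summaries speed up queries.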



Data Warehouse Architecture: With Staging Area
• We must clean and process operational information before putting it into the warehouse.
• We can do this programmatically, although most data warehouses use a staging area (a place
where data is processed before entering the warehouse).
• A staging area simplifies data cleansing and consolidation for operational data coming
from multiple source systems, especially for enterprise data warehouses where all
relevant data of an enterprise is consolidated.
• The data warehouse staging area is a temporary location where records from source
systems are copied.



Data Warehouse Architecture: With Staging Area
and Data Marts
• We may want to customize our warehouse's architecture for multiple groups within our
organization.
• We can do this by adding data marts. A data mart is a segment of a data warehouse
that can provide information for reporting and analysis on a section, unit, department, or
operation in the company, e.g., sales, payroll, production, etc.
• The figure illustrates an example where purchasing, sales, and stocks are separated. In
this example, a financial analyst wants to analyze historical data for purchases and sales
or mine historical information to make predictions about customer behavior.
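
A data mart as a segment of the warehouse can be sketched as a table that copies one subject area out of a larger one. This is a minimal illustration using `sqlite3`; the `warehouse_facts` table, the `build_data_mart` helper, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (subject TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "pen", 120.0),
        ("sales", "notebook", 80.0),
        ("purchasing", "paper", 60.0),
        ("stocks", "pen", 15.0),
    ],
)


def build_data_mart(connection, subject):
    """Copy one subject area out of the warehouse into its own table."""
    if not subject.isidentifier():
        raise ValueError("subject must be a simple identifier")
    mart_name = f"{subject}_mart"
    connection.execute(f"CREATE TABLE {mart_name} (item TEXT, amount REAL)")
    connection.execute(
        f"INSERT INTO {mart_name} "
        "SELECT item, amount FROM warehouse_facts WHERE subject = ?",
        (subject,),
    )
    return mart_name


sales_mart = build_data_mart(conn, "sales")
sales_rows = conn.execute(
    f"SELECT item, amount FROM {sales_mart} ORDER BY item"
).fetchall()
```

The sales group then works only with `sales_mart`, which contains its own rows and nothing from purchasing or stocks.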



Cont.…
• Inmon’s approach (Top-down):
– In Inmon's approach, a centralized repository for enterprise information is designed first, according to
a normalized data model in which atomic data is stored in tables that are grouped by subject area
and related through joins. After the enterprise data warehouse is built, the data stored there is used to structure
data marts.
– Inmon's approach is preferable when you need to:
• Get a single source of truth while ensuring data consistency, accuracy, and reliability
• Quickly develop data marts without duplicating the effort of extracting data from the original sources, cleansing it, etc.
– However, one of the major constraints of this method is that setup and implementation take more time and
resources than with Kimball's approach.



Cont.…
• Kimball’s approach (Bottom-up):
– Kimball’s approach suggests that dimensional data marts should be created first, then if required, a company
may proceed with creating a logical enterprise data warehouse.
– The advocates of this approach point out that since dimensional data marts require minimal normalization, such
data warehouse projects take less time and resources. On the other hand, we may find duplicate data in tables
and have to repeat ETL activities, as each data mart is created independently.
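
A dimensional data mart of the kind Kimball's approach starts from is commonly laid out as a star schema: one fact table surrounded by dimension tables. The sketch below is an assumed, minimal example using `sqlite3`; the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER
    );
    -- The fact table holds the measures plus keys into the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'lamp', 'furniture');
    INSERT INTO dim_date VALUES (10, 2024, 1), (11, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 10, 5, 5.0), (1, 11, 3, 3.0), (2, 10, 1, 20.0);
""")

# A typical dimensional query: aggregate a measure by a dimension attribute.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Note that `category` is stored directly in `dim_product` rather than in a separate normalized table; this light denormalization is what keeps the design quick to build, at the cost of possible duplication across independently built marts.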



Cont.…
• Top-down approach (Bill Inmon):
– Advantages of the top-down approach:
• Since the data marts are created from the data warehouse, it provides a consistent dimensional view of the data
marts.
• This model is considered the most robust against business changes, which is why big organizations
prefer to follow this approach.
• Creating a data mart from the data warehouse is easy.
– Disadvantages of the top-down approach:
• The cost and time of design and maintenance are very high.
• Bottom-up approach (Kimball):
– Advantages of the bottom-up approach:
• As the data marts are created first, reports are generated quickly.
• More data marts can be accommodated, and in this way the data warehouse can be extended.
• The cost and time of designing this model are comparatively low.
– Disadvantage of the bottom-up approach:
• This model is not as strong as the top-down approach, because the dimensional view of the data marts is not as
consistent as it is in the top-down approach.



Types of Data Warehouse Architectures:
• Single-Tier Architecture
– Single-tier architecture is rarely used in practice. Its purpose is to minimize the amount of data stored; to
reach this goal, it removes data redundancies.

– The figure shows that the only layer physically available is the source layer. In this method, data warehouses are
virtual. This means that the data warehouse is implemented as a multidimensional view of operational data
created by specific middleware, or an intermediate processing layer.
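
A virtual warehouse of this kind can be approximated with a database view: the analytical result is computed from the operational table on demand and nothing is stored twice. The sketch below uses `sqlite3`; the `orders` table, the `customer_totals` view, and the sample rows are hypothetical, and a plain SQL view stands in for the middleware layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The only physical layer: an operational (source) table.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", 40.0), (2, "Ravi", 10.0)],
)

# The "warehouse" is a view: it stores no data of its own.
conn.execute("""
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
""")


def totals():
    """Read the virtual warehouse; the view is evaluated at query time."""
    return dict(conn.execute("SELECT customer, total FROM customer_totals"))


before = totals()
conn.execute("INSERT INTO orders VALUES (3, 'Asha', 5.0)")
after = totals()
```

The view immediately reflects the new order, which shows both sides of the design: no redundant storage, but every analytical query runs against the operational data itself.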



Cont.…
• Two-Tier Architecture (also known as the client-server model):
• Although it is typically called a two-layer architecture to highlight the separation between
physically available sources and data warehouses, it in fact consists of four
successive data flow stages:
– Source layer: A data warehouse system uses heterogeneous sources of data.
That data is initially stored in corporate relational databases or legacy
databases, or it may come from an information system outside the corporate
walls.
– Data staging: The data stored in the sources should be extracted, cleansed to
remove inconsistencies and fill gaps, and integrated to merge heterogeneous
sources into one standard schema. The so-called Extraction, Transformation,
and Loading (ETL) tools can combine heterogeneous schemata and extract,
transform, cleanse, validate, filter, and load source data into a data warehouse.
– Data warehouse layer: Information is saved to one logically centralized
repository: the data warehouse. The data warehouse can be accessed
directly, but it can also be used as a source for creating data marts, which
partially replicate data warehouse contents and are designed for specific
enterprise departments. Metadata repositories store information on sources,
access procedures, data staging, users, data mart schemas, and so on.
– Analysis: In this layer, integrated data is efficiently and flexibly accessed to
issue reports, dynamically analyze information, and simulate hypothetical
business scenarios. It should feature aggregate information navigators,
complex query optimizers, and user-friendly GUIs.
Cont.…
• Three-Tier Architecture: Top, middle and bottom
• The three-tier architecture consists of the source layer (containing multiple
source systems), the reconciled layer, and the data warehouse layer
(containing both data warehouses and data marts). The reconciled layer
sits between the source data and the data warehouse.

• The main advantage of the reconciled layer is that it creates a standard
reference data model for the whole enterprise. At the same time, it separates
the problems of source data extraction and integration from those of data
warehouse population. In some cases, the reconciled layer is also used directly
to better accomplish some operational tasks, such as producing daily
reports that cannot be satisfactorily prepared using the corporate
applications, or periodically generating data flows to feed external processes
so that they benefit from cleaning and integration.

• This architecture is especially useful for extensive, enterprise-wide
systems. A disadvantage of this structure is the extra file storage space
used by the redundant reconciled layer. It also moves the
analytical tools a little further away from being real-time.



3-tier DW Architecture:



ETL Process:
• Data extraction gathers data from multiple, heterogeneous, and external
sources,
• Data cleaning detects errors in the data and rectifies them,
• Data transformation converts data from legacy format to warehouse format,
• Load sorts, summarizes, consolidates, computes views, checks integrity,
builds indices and partitions,
• Refresh propagates updates from data sources to DW.
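
The stages listed above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a production ETL tool: the raw rows, the legacy `DD/MM/YYYY` date format, the rule that an empty `spend` becomes 0.0, and the `customers` table are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a legacy source.
RAW_ROWS = [
    {"id": "1", "name": " Asha ", "joined": "02/01/2024", "spend": "120.50"},
    {"id": "2", "name": "Ravi", "joined": "15/03/2024", "spend": ""},
    {"id": "2", "name": "Ravi", "joined": "15/03/2024", "spend": ""},  # duplicate
    {"id": "x", "name": "???", "joined": "n/a", "spend": "oops"},      # bad row
]


def extract():
    """Extraction: gather the raw rows from the source."""
    return list(RAW_ROWS)


def clean(rows):
    """Cleaning: drop rows with invalid ids and repeated ids."""
    seen, good = set(), []
    for row in rows:
        if not row["id"].isdigit() or row["id"] in seen:
            continue
        seen.add(row["id"])
        good.append(row)
    return good


def transform(rows):
    """Transformation: convert legacy text fields to the warehouse format."""
    records = []
    for row in rows:
        day, month, year = row["joined"].split("/")
        records.append((
            int(row["id"]),
            row["name"].strip(),
            f"{year}-{month}-{day}",        # DD/MM/YYYY -> ISO date
            float(row["spend"] or 0.0),     # assumption: empty spend means 0.0
        ))
    return records


def load(conn, records):
    """Load: write the records, build an index, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, joined TEXT, spend REAL)"
    )
    # INSERT OR REPLACE lets a later refresh re-run the load safely.
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?)", records)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_customers_joined ON customers (joined)")
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


conn = sqlite3.connect(":memory:")
loaded = load(conn, transform(clean(extract())))
rows = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()
```

Keeping each stage as its own function mirrors the list above and makes it easy to test or replace one stage, for example swapping `extract` for a reader of a real source system.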



Applications of Data Warehouse:
• Airline:
– In airline systems, it is used for operational purposes such as crew assignment, analysis of route profitability, frequent flyer program
promotions, etc.
• Banking:
– It is widely used in the banking sector to manage available resources effectively. Some banks also use it for market
research and for performance analysis of products and operations.
• Healthcare:
– The healthcare sector also uses data warehouses to strategize and predict outcomes, generate patients' treatment reports, and share data
with tie-in insurance companies, medical aid services, etc.
• Public sector:
– In the public sector, the data warehouse is used for intelligence gathering. It helps government agencies maintain and analyze tax records and health policy records for every
individual.
• Investment and insurance sector:
– In this sector, warehouses are primarily used to analyze data patterns and customer trends, and to track market movements.
• Retail chain:
– In retail chains, the data warehouse is widely used for distribution and marketing. It also helps to track items, customer buying patterns, and
promotions, and is used for determining pricing policy.
• Telecommunication:
– A data warehouse is used in this sector for product promotions, sales decisions, and distribution decisions.
• Hospitality industry:
– This industry uses warehouse services to design and evaluate advertising and promotion campaigns that
target clients based on their feedback and travel patterns.



Advantages of Data Warehouse (DWH):
• A data warehouse allows business users to quickly access critical data from a number of sources
in a single place. It therefore saves users the time of retrieving data from multiple sources.
• A data warehouse provides consistent information on various cross-functional activities. It
also supports ad hoc reporting and querying.
• A data warehouse helps to integrate many sources of data, which reduces stress on the
production system.
• A data warehouse helps to reduce total turnaround time for analysis and reporting.
• Restructuring and integration make the data easier for users to work with for reporting and analysis.
• A data warehouse stores a large amount of historical data. This helps users analyze
different time periods and trends to make future predictions.



Disadvantages of Data Warehouse (DWH):
• It is not an ideal option for unstructured data.
• Creating and implementing a data warehouse is a time-consuming affair.
• A data warehouse can become outdated relatively quickly.
• It is difficult to make changes in data types and ranges, data source schemas, indexes, and
queries.
• A data warehouse may seem easy, but it is actually too complex for average users.
• Despite best efforts at project management, the scope of a data warehousing project tends to
increase.
• Sometimes warehouse users will develop different business rules.
• Organizations need to spend a lot of their resources on training and implementation.



Thank You !!!

