
STADVDB Slides 02 - Summarizing Volumes of Data

Module 01 covers the foundational knowledge necessary for understanding Online Analytical Processing (OLAP) and its role in business intelligence. It contrasts OLAP with Online Transaction Processing (OLTP), emphasizing the differences in data handling and query complexity. The module also introduces concepts such as data warehouses, dimensional modeling, and the ETL process, which are essential for effective data analysis and decision-making.

Uploaded by

joe goldberg

SUMMARIZING DATA MODULE 01

Do you have the pre-requisite knowledge?

Relational Model

Basic SELECT

Functions in SQL

JOINing tables
We work with LOTS
of DIFFERENT systems…
From Small Businesses to Large Enterprises, multiple systems are in place:
• Applicants
• Students
• Human Resource
• Enrollment
• Libraries
• AnimoSpace
Usage of INDEPENDENT systems
leads to ISLANDS of DATA …
with possibly different formats
For example…
How can these
ISLANDS OF DATA
be utilized for
BUSINESS INTELLIGENCE
for planning and
decision-making activities?
In this module, you’ll learn about…
• Online Analytical Processing (OLAP)
• Revisiting SQL
TOPIC ONLINE ANALYTICAL PROCESSING
What did your readings
tell you about…

• OLAP?
• Comparison with
Relational models?
• Data Warehouse?
• Analytics?
• Related terms?

https://www.cleverism.com/what-is-olap/
[Figure: Data Analysis, Operations, Reports, Decision-Making]
Online Analytical Processing
(OLAP)
• Fast multidimensional analysis of large volumes of data for business intelligence and
decision support
• Extracts data from multiple relational datasets and reorganizes it into a
multidimensional format to enable fast processing and analysis

https://www.ibm.com/cloud/learn/olap

• A data discovery tool


• Enables users to perform multidimensional analysis of data from different
perspectives or points of view
https://www.cleverism.com/what-is-olap/
OLTP vs OLAP
OLTP
• Online transaction processing
• Large volume of short transaction operations (INSERT, UPDATE, DELETE) on the database
• Fast query processing using basic SELECT statements
• Maintains data integrity within a multi-user, multi-access environment
• Queries contain detailed and current data
• Give some examples
OLAP
• Online analytical processing
• Low volume of very complex queries that involve data aggregation
• Generated analytical reports can aid in business intelligence and decision making
• Modern terms – data mining, data warehousing, data analytics, business intelligence
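
The contrast can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the orders table, its columns, and the sample values are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical single-table example; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many short write transactions, each touching one row.
for region, amount in [("North", 120.0), ("South", 80.0), ("North", 50.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )

# OLTP-style read: a basic SELECT that fetches one current, detailed record.
detail = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style read: a single query that scans and aggregates many rows.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()

print(detail)
print(summary)
```

The first query answers "what is in this order?", while the second answers "how is each region doing?", which is the kind of question a report for decision-making would ask.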
Transactional DB vs Analytical DB

| Operational Database | Analytical Database, e.g., Data Warehouse |
| --- | --- |
| Day-to-day transaction processing | Historical analytical processing |
| Used by operational users (clerks, DBAs, DB professionals) | Used by knowledge workers (analysts, managers, executives) |
| Used to run the business | Used to analyze the business |
| Narrow, planned and simple updates and queries | Broad, ad hoc, complex queries and analysis |
| Focuses on Data In (insert, modify, retrieve) | Focuses on Information Out (read only) |
| Based on Entity Relationship and Relational Models | Based on Star, Snowflake and Constellation Schema |
| Primitive and highly detailed, flat relational view of data | Summarized and consolidated, multidimensional view of data |
| DB size: 100MB to 100GB | DB size: 100GB to 100TB |
| Number of users: thousands | Number of users: hundreds |

http://www.tutorialspoint.com/dwh/dwh_overview.htm
The OLAP Process

[Figure: application-oriented heterogeneous data is integrated into a unified view of data]

Image courtesy of: https://smartboost.com/blog/how-to-use-online-analytical-processing-olap-in-marketing/

Connolly & Begg, 2015


Data Warehouse
Data Warehouse
• Refers to a data repository that is maintained separately from an organization’s
operational databases
• A subject-oriented, integrated, time-variant, and nonvolatile collection of data in
support of management’s decision-making process
• Generalize and consolidate data in multidimensional space
• Provide OLAP tools for business executives to systematically organize, understand,
and use their multidimensional data of varied granularities for generalization and
data mining in strategic decision-making activities

Han, Kamber & Pei, 2012


Dimensions of Data
• Business data have multiple dimensions
• Dimensions
• The entities with respect to which an enterprise preserves the records
• Example: Sales data dimensions
• Location – region, country, province, city, store
• Time – year, quarter, month, week, day
• Product – type (clothing, food, devices), brand, price

https://www.ibm.com/cloud/learn/olap
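
Each dimension has levels (e.g., year, quarter, month), and the same facts can be summarized at any of them. A minimal sketch in plain Python, assuming a hypothetical list of sales records; the field names, the helper total_by, and the amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales records carrying levels of the Time and Location dimensions.
sales = [
    {"year": 2023, "quarter": "Q1", "month": "Jan", "city": "Manila", "amount": 100},
    {"year": 2023, "quarter": "Q1", "month": "Feb", "city": "Cebu", "amount": 150},
    {"year": 2023, "quarter": "Q2", "month": "Apr", "city": "Manila", "amount": 200},
]

def total_by(rows, *levels):
    """Sum the amounts, grouped by the chosen dimension levels."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in levels)
        totals[key] += row["amount"]
    return dict(totals)

# Same facts, viewed along Time only, then along Location and Time together.
by_quarter = total_by(sales, "year", "quarter")
by_city_quarter = total_by(sales, "city", "quarter")

print(by_quarter)
print(by_city_quarter)
```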
Dimensionality Modeling
• Logical design technique
• Aims to present the data in a standard, intuitive form that allows for
high-performance access
• Every dimensional model (DM) is composed of:
• One (1) Fact table with a composite primary key
• Two (2) or more Dimension tables, each with a simple primary key that references one of
the components of the composite key in the Fact table

Connolly & Begg, 2015


Dimensional Model
• Fact table
• Contains tuples of recorded factual data
• Facts
• Generated by events that occurred in the past
• Are unlikely to change, regardless of how they are analyzed

• Dimensional table:
• Contains tuples of attributes describing reference data
• Attributes are used as the constraints in data warehouse queries
• Dimensions
• The entities with respect to which an enterprise preserves the records (TutorialsPoint.com)

Connolly & Begg, 2015


Dimensional Model
• Star schema
• A logical structure that has a Fact table in the center, surrounded by
denormalized Dimension tables
• Can be used to speed up query performance by denormalizing reference information
into a single dimension table
• Excellent for ad hoc queries, but bad for OLTP

Connolly & Begg, 2015
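
A star schema can be sketched with Python's built-in sqlite3 module. The table names (dim_product, dim_store, dim_period, fact_sales), columns, and rows below are assumptions made for illustration, not a schema taken from the cited texts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each has a simple primary key and descriptive attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_period  (period_key  INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: composite primary key built from the dimension keys.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    period_key  INTEGER REFERENCES dim_period(period_key),
    units_sold  INTEGER,
    revenue     REAL,
    PRIMARY KEY (product_key, store_key, period_key)
);

INSERT INTO dim_product VALUES (1, 'T-shirt', 'Clothing'), (2, 'Rice', 'Food');
INSERT INTO dim_store   VALUES (1, 'Manila', 'NCR'), (2, 'Cebu City', 'Central Visayas');
INSERT INTO dim_period  VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales  VALUES (1, 1, 1, 10, 500.0), (2, 1, 1, 20, 400.0), (1, 2, 2, 5, 250.0);
""")

# Ad hoc query: one join from the fact table to a denormalized dimension.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(revenue_by_category)
```

Because category is stored directly in dim_product, the query needs only one join per dimension it touches.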


Star Schema - Components
• Fact tables contain factual or quantitative data
• Dimension tables contain descriptions about the subjects of the business
• Dimension tables are denormalized to maximize performance
• 1:N relationship between dimension tables and fact tables
Hoffer, Ramesh & Topi, 2018
Star Schema - Example
Fact table provides statistics for
sales broken down by product,
period and store dimensions

Hoffer, Ramesh & Topi, 2018


Star Schema - Example

Hoffer, Ramesh & Topi, 2018


Dimensional Model
• Snowflake schema
• Variant of the star schema
• Some dimension tables are normalized to form a hierarchy

Connolly & Begg, 2015
https://www.sciencedirect.com/topics/computer-science/star-schema
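
A minimal snowflake sketch using Python's built-in sqlite3 module, assuming a hypothetical product dimension whose category attribute is normalized into its own table. All table names, columns, and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category level is split out of the product dimension (normalized).
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue     REAL
);

INSERT INTO dim_category VALUES (1, 'Clothing'), (2, 'Food');
INSERT INTO dim_product  VALUES (1, 'T-shirt', 1), (2, 'Jeans', 1), (3, 'Rice', 2);
INSERT INTO fact_sales   VALUES (1, 500.0), (2, 300.0), (3, 400.0);
""")

# Reaching the category level now takes two joins: fact -> product -> category.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(rows)
```

The trade-off shown here is less redundancy in the dimension (each category name is stored once) in exchange for an extra join at query time.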


Surrogate Keys
• All natural keys are replaced with surrogate keys (non-intelligent and non-business
related)
• Every join between fact and dimension tables is based on surrogate keys, not
natural keys
• Business keys may change over time. Surrogate keys:
• Allow the data in the warehouse to have some independence from the data
used and produced by the OLTP systems
• Help keep track of non-key attribute values for a given production key
• Are simpler and shorter
• Can be same length and format for all keys
Connolly & Begg, 2015; Hoffer, Ramesh & Topi, 2018
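
One way surrogate keys might be assigned during loading is sketched below. The CustomerDimension class, its key_for method, and the business key "CUST-0042" are hypothetical; the sketch also assumes that a changed non-key attribute should produce a new dimension row, which is only one of several possible policies.

```python
import itertools

class CustomerDimension:
    """Hypothetical dimension loader that hands out surrogate keys."""

    def __init__(self):
        self._next_key = itertools.count(1)  # meaningless sequential integers
        self.rows = []       # (surrogate_key, business_key, city)
        self._current = {}   # business_key -> index of its latest row

    def key_for(self, business_key, city):
        """Return the surrogate key for this business key and attribute value."""
        idx = self._current.get(business_key)
        if idx is not None and self.rows[idx][2] == city:
            return self.rows[idx][0]  # unchanged: reuse the existing key
        # New customer, or a changed attribute: add a row with a fresh key,
        # so earlier facts keep pointing at the earlier attribute values.
        surrogate = next(self._next_key)
        self.rows.append((surrogate, business_key, city))
        self._current[business_key] = len(self.rows) - 1
        return surrogate

dim = CustomerDimension()
k1 = dim.key_for("CUST-0042", "Manila")
k2 = dim.key_for("CUST-0042", "Manila")  # same values, same key
k3 = dim.key_for("CUST-0042", "Cebu")    # attribute changed, new key

print(k1, k2, k3)
print(dim.rows)
```

The fact table would store k1 or k3, never "CUST-0042", so the warehouse does not depend on how the OLTP system formats or reassigns its own keys.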
Granularity of the Fact Table
• What level of detail do you want?
• Transactional grain - finest level
• Aggregated grain - more summarized

• Finer grains
• Better market analysis capability
• More dimension tables, more rows in fact table

• In Web-based commerce, finest granularity is a click

Hoffer, Ramesh & Topi, 2018
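
The difference between grains can be sketched by rolling transaction-level rows up to a monthly grain. The sample transactions below are hypothetical.

```python
from collections import Counter

# Transactional grain (hypothetical data): one row per individual sale.
transactions = [
    ("2023-01-05", "T-shirt", 2),
    ("2023-01-17", "T-shirt", 1),
    ("2023-01-20", "Rice", 4),
    ("2023-02-02", "T-shirt", 3),
]

# Aggregated grain: one row per (month, product).
monthly = Counter()
for date, product, units in transactions:
    month = date[:7]  # "YYYY-MM" prefix of the ISO date string
    monthly[(month, product)] += units

print(len(transactions), "rows at transactional grain")
print(len(monthly), "rows at monthly grain")
print(dict(monthly))
```

The monthly table is smaller, but it can no longer answer questions about individual days or individual sales, which is the cost of a coarser grain.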


Size of the Fact Table
• Depends on the number of dimensions and the grain of the fact table
• Number of rows = product of the number of possible values for each dimension associated with the fact table
• Given the following values:

• Total rows calculated as follows (assuming only half of the total products have recorded sales for a given month):

Hoffer, Ramesh & Topi, 2018
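
The row-count rule can be sketched as a small calculation. The dimension sizes below are assumed for illustration only, and estimate_fact_rows is a hypothetical helper; the fraction argument models the "only half of the products have sales" adjustment.

```python
from math import prod

def estimate_fact_rows(dimension_sizes, active_fraction=None):
    """Product of the value counts per dimension, scaled by any active fraction."""
    active_fraction = active_fraction or {}
    total = prod(
        size * active_fraction.get(name, 1.0)
        for name, size in dimension_sizes.items()
    )
    return round(total)

# Assumed sizes: 1,000 stores, 10,000 products, 24 monthly periods.
sizes = {"store": 1_000, "product": 10_000, "month": 24}

all_products = estimate_fact_rows(sizes)
half_products = estimate_fact_rows(sizes, {"product": 0.5})

print(all_products)   # every product sells in every store every month
print(half_products)  # only half of the products sell in a given month
```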


ETL
Non-volatile Data
• Data in the Warehouse
• Comes from multiple heterogeneous sources
• Not updated in real-time but is refreshed from operational systems on a regular basis

• New data is always added as a supplement to the database, rather than a replacement
• Update-driven approach

• Integrated data is available for direct querying and analysis

Connolly & Begg, 2015


Update Driven Approach: When to Gather Data?

• Source Driven: Data sources transmit new information to warehouse, either continuously or periodically
• Destination Driven: Warehouse periodically requests new information from data sources

Keeping warehouse exactly synchronized with data sources is too expensive.

Silberschatz, Korth & Sudarshan, 2019
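
The two approaches can be sketched as push versus pull. The Source and Warehouse classes below are hypothetical stand-ins written for this illustration, not part of any real warehouse product.

```python
class Source:
    """Hypothetical operational data source."""

    def __init__(self):
        self.records = []
        self.subscribers = []  # callbacks used only in the source-driven case

    def add(self, record):
        self.records.append(record)
        for callback in self.subscribers:
            callback(record)   # source-driven: the source pushes new data

    def records_since(self, position):
        return self.records[position:]


class Warehouse:
    """Hypothetical warehouse that only ever appends."""

    def __init__(self):
        self.rows = []

    def receive(self, record):
        self.rows.append(record)

    def pull_from(self, source, position):
        """Destination-driven: ask the source for anything new since position."""
        new = source.records_since(position)
        self.rows.extend(new)
        return position + len(new)


# Source-driven: the warehouse sees each record as soon as it is produced.
push_source, wh_push = Source(), Warehouse()
push_source.subscribers.append(wh_push.receive)
push_source.add("a")
push_source.add("b")

# Destination-driven: the warehouse is stale until its next periodic request.
pull_source, wh_pull = Source(), Warehouse()
pull_source.add("a")
pull_source.add("b")
rows_before_pull = list(wh_pull.rows)
position = wh_pull.pull_from(pull_source, 0)
pull_source.add("c")
position = wh_pull.pull_from(pull_source, position)

print(wh_push.rows, rows_before_pull, wh_pull.rows, position)
```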


Considerations in Building a DW
• The design of the DW should support ad-hoc querying
• Acquisition of data for the warehouse
• Data must be extracted from multiple heterogeneous sources
• Data must be formatted for consistency within the warehouse
• Data must be cleaned to ensure validity
• Data must be fitted into the data model of the warehouse
• Data must be loaded into the warehouse
• Ensures data storage meets the query requirements efficiently
• Gives full consideration to the environment in which the data resides
Elmasri & Navathe, 2016
ETL Manager
Extraction
• Targets one or more internal data sources, e.g., OLTP databases, personal databases and spreadsheets, Enterprise Resource Planning (ERP) files, web usage log files
• May include external sources from suppliers and customers
Transformation
• Applies a series of rules or functions to the extracted data to prepare them for analysis
• May involve data summations, data encoding, data merging, data splitting, data calculations, and creation of surrogate keys
Loading
• Additional constraints defined in the database schema can be activated, e.g., uniqueness, referential integrity, and mandatory fields

Hands-on Activity 01

Connolly & Begg, 2015
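
The three ETL steps can be sketched end to end with Python's standard library. Everything specific here is an assumption made for illustration: the two sample sources, their field names, the cleaning rule (drop records with no amount), and the sales table.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different formats and field names.
csv_source = "id,name,amount\nA1,alice,10.5\nA2,bob,\n"
api_source = [{"customer": "Carol", "total": "7"}]

def extract():
    """Extraction: read each source and map it to a common record shape."""
    for row in csv.DictReader(io.StringIO(csv_source)):
        yield {"name": row["name"], "amount": row["amount"]}
    for row in api_source:
        yield {"name": row["customer"], "amount": row["total"]}

def transform(records):
    """Transformation: clean, format consistently, and assign surrogate keys."""
    valid = (r for r in records if r["amount"] not in ("", None))
    for surrogate_key, rec in enumerate(valid, start=1):
        yield (surrogate_key, rec["name"].title(), float(rec["amount"]))

def load(rows):
    """Loading: insert into a table whose constraints are enforced on load."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "sales_key INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL UNIQUE, "
        "amount REAL NOT NULL)"
    )
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

conn = load(transform(extract()))
loaded = conn.execute("SELECT * FROM sales ORDER BY sales_key").fetchall()
print(loaded)
```

The record for "bob" is dropped during transformation because its amount is missing; a real pipeline might instead route such rows to an error table for review.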


We can now perform
ANALYSIS across
HETEROGENEOUS data sources
without disrupting
TRANSACTIONAL performance
Learning Activities
• Take Ex 02: Advanced SQL Self-Assessment
• Perform Hands-on Activity: H 01: ETL Tool
• Take Ex 03: OLAP Self-Assessment


References
Chapter 32: Data Warehouse Design
Connolly, T. & Begg, C. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition. Harlow, Essex: Addison-Wesley
Chapter 29: Overview of Data Warehousing and OLAP
Elmasri, R. & Navathe, S. (2016). Fundamentals of Database Systems, 7th Edition. Boston: Pearson/Addison Wesley
Chapter 9: Data Warehousing
Hoffer, J., Ramesh, V. & Topi, H. (2018). Modern Database Management, 12th Edition. Upper Saddle River, N.J.: Pearson/Prentice Hall
Chapter 11: Data Analytics
Silberschatz, A., Korth, H. & Sudarshan, S. (2019). Database System Concepts, 7th Edition. McGraw-Hill Book Co.
Chapter 04: Data Warehousing and OLAP
Han, J., Kamber, M. & Pei, J. (2012). Data Mining, 3rd Edition. The Morgan Kaufmann Series in Data Management Systems, ScienceDirect (DLSU Institutional Access: https://www.sciencedirect.com/book/9780123814791/data-mining-concepts-and-techniques)

Online
www.tutorialspoint.com/dwh/index.htm
TutorialsPoint.com, Data Warehousing Tutorial
Miscellaneous References
• https://www.cleverism.com/what-is-olap/
• https://www.guru99.com/online-analytical-processing.html
• https://www.ibm.com/cloud/learn/olap
• https://www.stitchdata.com/resources/oltp-vs-olap/
• https://www.commbox.io/how-data-analysis-and-reports-can-improve-customer-service/
