ch3

Module 3 covers data dimension models, including the Star, Snowflake, and Fact Constellation schemas, which organize data for analysis in data warehousing. It discusses the advantages and disadvantages of dimensional modeling, the roles of fact and dimension tables, and the concept of slowly changing dimensions. Additionally, it contrasts OLAP and OLTP systems, highlighting their purposes, data structures, and performance characteristics.

Uploaded by

shbhamare123
Copyright
© © All Rights Reserved

Module 3

Data Dimension Models


Content
• Data dimension models
• The Star schema; The Snowflake schema; Fact Constellation schema or
families of star, Fact tables and dimension tables; the fact-less fact table,
• Updates to dimension tables: slowly changing dimensions, type 1
changes, type 2 changes, type 3 changes, Large dimension tables,
rapidly changing or large slowly changing dimensions, Junk Dimensions.
• Introduction to OLAP and OLTP
• Business intelligence: Effective and timely decisions, Data, information
and knowledge, The role of mathematical models, Business intelligence
architectures.
• Dimensional data modeling is a technique used in data warehousing
to organize and structure data in a way that makes it easy to analyze
and understand. In a dimensional data model, data is organized into
dimensions and facts.
• By providing a simple and intuitive structure for the data, the
dimensional model makes it easy for users to access and understand
the data they need to make informed business decisions
Advantages of Dimensional Data Modeling
• Simplified Data Access: Dimensional data modeling enables users to easily access
data through simple queries, reducing the time and effort required to retrieve
and analyze data.
• Enhanced Query Performance: The simple structure of dimensional data
modeling allows for faster query performance, particularly when compared to
fully normalized relational data models, because queries need fewer joins.
• Increased Flexibility: Dimensional data modeling allows for more flexible data
analysis, as users can quickly and easily explore relationships between data.
• Improved Data Quality: Dimensional data modeling can improve data quality by
reducing redundancy and inconsistencies in the data.
• Easy to Understand: Dimensional data modeling uses simple, intuitive structures
that are easy to understand, even for non-technical users.
Disadvantages of Dimensional Data Modeling
• Limited Complexity: Dimensional data modeling may not be suitable for
very complex data relationships, as it relies on simple structures to
organize data.
• Limited Integration: Dimensional data modeling may not integrate well
with other data models, particularly those that rely on normalization
techniques.
• Limited Scalability: Dimensional data modeling may not be as scalable
as other data modeling techniques, particularly for very large datasets.
• Limited History Tracking: Dimensional data modeling may not be able to
track changes to historical data, as it typically focuses on current data.
Elements of Dimensional Modeling
• Fact
• It is a collection of associated data items, consisting of measures and context data. It typically
represents business items or business transactions.
• Dimensions
• It is a collection of data which describe one business dimension. Dimensions decide the
contextual background for the facts, and they are the framework over which OLAP is
performed.
• Measure
• It is a numeric attribute of a fact, representing the performance or behavior of the business
relative to the dimensions.
• There are two basic models which are used in dimensional modeling:
• Star Model
• Snowflake Model
Elements of Dimensional Modeling
• Fact Table
• Fact tables are used to store facts or measures in the business. Facts are the
numeric data elements that are of interest to the company.
• Characteristics of the Fact table
• The fact table includes numerical values of what we measure. For example, a
fact value of 20 might mean that 20 widgets have been sold.
• Each fact table includes the keys to associated dimension tables. These are
known as foreign keys in the fact table.
• Fact tables typically include a small number of columns.
• When it is compared to dimension tables, fact tables have a large number of
rows.
• Dimension Table
• Dimension tables establish the context of the facts. Dimensional tables store fields that describe the facts.
• Characteristics of the Dimension table
• Dimension tables contain the details about the facts. This, for example, enables business analysts to
understand the data and their reports better.
• The dimension tables include descriptive data about the numerical values in the fact table. That is, they contain
the attributes of the facts. For example, the dimension tables for a marketing analysis function might include
attributes such as time, marketing region, and product type.
• Since the record in a dimension table is denormalized, it usually has a large number of columns. The dimension
tables include significantly fewer rows of information than the fact table.
• The attributes in a dimension table are used as row and column headings in a document or query results display.
• Example: A store summary in a fact table can be viewed by city and state. An item summary can be viewed by brand, color,
etc. Customer information can be viewed by name and address.
Example
Fact Table Example

In this example, the Customer ID column in the fact table is the foreign key that joins
with the dimension table. By following the links, we can see that row 2 of the fact
table records the fact that customer 3, Gaurav, bought two items on day 8.
Dimension Table
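The lookup described in the example above can be sketched in a few lines of Python. This is only an illustration: the first fact row and the name "Customer A" are made-up placeholders, and only customer 3 (Gaurav), the quantity 2, and day 8 come from the text.

```python
# Dimension table: customer_id -> descriptive attributes.
# "Customer A" is a hypothetical placeholder row.
customer_dim = {
    1: {"name": "Customer A"},
    3: {"name": "Gaurav"},
}

# Fact table: each row holds a foreign key (customer_id), a day,
# and a numeric measure (quantity).
sales_fact = [
    {"customer_id": 1, "day": 5, "quantity": 1},   # hypothetical row 1
    {"customer_id": 3, "day": 8, "quantity": 2},   # row 2 from the example
]

def describe(fact_row):
    """Follow the foreign key from a fact row to its dimension row."""
    customer = customer_dim[fact_row["customer_id"]]
    return (f"customer {fact_row['customer_id']}, {customer['name']}, "
            f"bought {fact_row['quantity']} item(s) on day {fact_row['day']}")

row2 = describe(sales_fact[1])
print(row2)
```

The fact row itself stores only keys and numbers; the readable name comes from the dimension table.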
Multi-Dimensional Data Model
• A multidimensional model views data in the form of a data-cube. A data cube enables data
to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.
• The dimensions are the perspectives or entities concerning which an organization keeps
records. For example, a shop may create a sales data warehouse to keep records of the
store's sales for the dimensions time, item, and location. These dimensions allow the store to
keep track of things, for example, monthly sales of items and the locations at which the
items were sold. Each dimension has a table related to it, called a dimensional table, which
describes the dimension further. For example, a dimensional table for an item may contain
the attributes item_name, brand, and type.
• A multidimensional data model is organized around a central theme, for example, sales. This
theme is represented by a fact table. Facts are numerical measures. The fact table contains
the names of the facts, or measures, as well as keys to each of the related dimensional tables.
Star Schema
• Star Schema: The star schema is a type of multidimensional model
used for data warehouses. It contains fact tables and dimension tables.
Each dimension table joins directly to the fact table through a single
foreign key, so queries need fewer joins. With the fact table at the center
and the dimension tables around it, the schema forms a star.
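As a minimal sketch of a star schema, the following uses Python's built-in sqlite3 module with the time, item, and location dimensions from the sales example. All table names, column names, and rows here are illustrative assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_item     (item_id INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);

-- Fact table: one foreign key per dimension plus the numeric measure.
CREATE TABLE fact_sales (
    item_id     INTEGER REFERENCES dim_item(item_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    units_sold  INTEGER
);

INSERT INTO dim_item VALUES (1, 'Widget', 'Acme', 'Hardware'), (2, 'Gadget', 'Acme', 'Hardware');
INSERT INTO dim_location VALUES (1, 'Pune', 'Maharashtra'), (2, 'Mumbai', 'Maharashtra');
INSERT INTO dim_time VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
INSERT INTO fact_sales VALUES (1, 1, 1, 20), (1, 2, 1, 5), (2, 1, 2, 7), (1, 1, 2, 3);
""")

# Monthly sales per item: every dimension is one join away from the fact table.
monthly_sales = conn.execute("""
    SELECT t.month, i.item_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_item i ON i.item_id = f.item_id
    GROUP BY t.time_id, t.month, i.item_name
    ORDER BY t.time_id, i.item_name
""").fetchall()
print(monthly_sales)
```

Each dimension is denormalized into a single table, which is why the query never needs more than one join per dimension.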
Snowflake Schema
• Snowflake Schema: The snowflake schema is also a type of
multidimensional model used for data warehouses. It contains fact
tables, dimension tables, and sub-dimension tables. The sub-dimension
tables branch out from the dimension tables, so the schema forms a
snowflake.
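A small sketch of the snowflake idea, again with sqlite3: the brand attribute is moved out of the item dimension into its own sub-dimension table. The table names, column names, and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension table: brand is normalized out of the item dimension.
CREATE TABLE dim_brand (brand_id INTEGER PRIMARY KEY, brand_name TEXT);

-- Dimension table: points to the sub-dimension instead of repeating the brand name.
CREATE TABLE dim_item (
    item_id   INTEGER PRIMARY KEY,
    item_name TEXT,
    brand_id  INTEGER REFERENCES dim_brand(brand_id)
);

CREATE TABLE fact_sales (
    item_id    INTEGER REFERENCES dim_item(item_id),
    units_sold INTEGER
);

INSERT INTO dim_brand VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_item VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'Gizmo', 2);
INSERT INTO fact_sales VALUES (1, 20), (2, 7), (3, 4), (1, 3);
""")

# Reaching the brand name takes two joins: fact -> item -> brand.
sales_by_brand = conn.execute("""
    SELECT b.brand_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_item  i ON i.item_id  = f.item_id
    JOIN dim_brand b ON b.brand_id = i.brand_id
    GROUP BY b.brand_name
    ORDER BY b.brand_name
""").fetchall()
print(sales_by_brand)
```

The trade-off is visible in the query: the brand name is stored once, but reaching it costs an extra join.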
Which Schema to choose?
• The answer depends on your specific needs and requirements.
• If you’re looking for a simple, efficient cloud data warehouse solution,
a star schema might be the best option.
• But if you need more flexibility to accommodate changing data
requirements, a snowflake schema may be a better choice.
• No matter which schema you choose, ThoughtSpot can help you get
the most out of your data. Most BI tools require a specific schema
design to be used; ThoughtSpot has no such restrictions.
Activity
• What is ThoughtSpot?
• How is it different from Tableau?
Concept of Data Cube
• A data cube is a data structure that,
contrary to tables and
spreadsheets, can store data in
more than 2 dimensions. Data cubes are
mainly used for fast retrieval of
aggregated data.
• The key elements of a data cube
are dimensions, attributes, facts
and measures.
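To make the cube idea concrete, here is a minimal sketch that stores one measure per combination of three dimensions and aggregates it along any subset of them. The dimension names, cell values, and the `roll_up` helper are hypothetical; real OLAP engines precompute and index such aggregates.

```python
from collections import defaultdict

DIMENSIONS = ("time", "item", "location")

# Each cell is addressed by one member per dimension and holds a measure (units sold).
cube = {
    ("Q1", "Widget", "Pune"): 20,
    ("Q1", "Widget", "Mumbai"): 5,
    ("Q1", "Gadget", "Pune"): 7,
    ("Q2", "Widget", "Pune"): 3,
}

def roll_up(cube, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[tuple(coords[p] for p in positions)] += value
    return dict(totals)

sales_by_time = roll_up(cube, ["time"])
sales_by_item_location = roll_up(cube, ["item", "location"])
print(sales_by_time)
print(sales_by_item_location)
```

The same cells answer both questions; only the set of dimensions kept in the result changes.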
Fact Constellation
• Fact Constellation can be referred to as a collection of multiple fact
tables which share dimension tables. Hence, it can even be referred
to as a collection of stars which is also called a galaxy.
• This particular type of schema is usually used for sophisticated
applications.
• A typical example of this schema is a Sales scenario, which is shown
below.
Examples
• In reference to the example given below, there are two fact tables and
both of them share the Product and Data dimension tables.
• Therefore, the data warehouse model is a combination of two Star
Schemas.
Example

• Placement is a fact table having attributes: (Stud_roll, Company_id, TPO_id) with facts: (Number of students
eligible, Number of students placed).
• Workshop is a fact table having attributes: (Stud_roll, Institute_id, TPO_id) with facts: (Number of students
selected, Number of students attended the workshop).
• Company is a dimension table having attributes: (Company_id, Name, Offer_package).
• Student is a dimension table having attributes: (Student_roll, Name, CGPA).
• TPO is a dimension table having attributes: (TPO_id, Name, Age).
• Training Institute is a dimension table having attributes: (Institute_id, Name, Full_course_fee).
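The placement and workshop example above can be sketched with sqlite3 as two fact tables that share the Student and TPO dimensions. The table and key names follow the bullets; the measure column names (such as `students_placed`) and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (attributes as listed in the example).
CREATE TABLE Student (Student_roll INTEGER PRIMARY KEY, Name TEXT, CGPA REAL);
CREATE TABLE TPO (TPO_id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
CREATE TABLE Company (Company_id INTEGER PRIMARY KEY, Name TEXT, Offer_package REAL);
CREATE TABLE Training_Institute (Institute_id INTEGER PRIMARY KEY, Name TEXT, Full_course_fee REAL);

-- Two fact tables that share the Student and TPO dimensions.
CREATE TABLE Placement (
    Stud_roll  INTEGER REFERENCES Student(Student_roll),
    Company_id INTEGER REFERENCES Company(Company_id),
    TPO_id     INTEGER REFERENCES TPO(TPO_id),
    students_eligible INTEGER,
    students_placed   INTEGER
);
CREATE TABLE Workshop (
    Stud_roll    INTEGER REFERENCES Student(Student_roll),
    Institute_id INTEGER REFERENCES Training_Institute(Institute_id),
    TPO_id       INTEGER REFERENCES TPO(TPO_id),
    students_selected INTEGER,
    students_attended INTEGER
);

INSERT INTO Student VALUES (101, 'Student A', 8.1), (102, 'Student B', 7.4), (103, 'Student C', 9.0);
INSERT INTO TPO VALUES (1, 'TPO A', 40), (2, 'TPO B', 45);
INSERT INTO Company VALUES (1, 'Company X', 6.5);
INSERT INTO Training_Institute VALUES (1, 'Institute Y', 15000);
INSERT INTO Placement VALUES (101, 1, 1, 1, 1), (102, 1, 1, 1, 1), (103, 1, 2, 1, 0);
INSERT INTO Workshop VALUES (101, 1, 1, 1, 1), (102, 1, 2, 1, 1), (103, 1, 2, 1, 1);
""")

# Aggregate each fact table separately, then line the results up on the shared dimension.
per_tpo = conn.execute("""
    SELECT t.Name,
           (SELECT COALESCE(SUM(p.students_placed), 0)
              FROM Placement p WHERE p.TPO_id = t.TPO_id),
           (SELECT COALESCE(SUM(w.students_attended), 0)
              FROM Workshop w WHERE w.TPO_id = t.TPO_id)
    FROM TPO t
    ORDER BY t.TPO_id
""").fetchall()
print(per_tpo)
```

Each fact table is summed on its own before the results meet on the shared TPO dimension; joining the two fact tables directly would multiply rows and inflate the totals.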
Fact Constellation
• Advantages of Fact Constellation Schema Data Warehouses
• Provides a flexible schema
• Different fact tables are explicitly assigned to the dimensions
• Disadvantages of Fact Constellation Schema Data Warehouses
• A Fact Constellation solution is difficult to maintain
• The schema is complex because of the number of aggregations involved
Factless Fact tables
• A factless fact table contains only keys; it holds no measures.
• Factless fact tables are only used to establish relationships between
elements of different dimensions.
• They are also useful for describing events and coverage, meaning the
tables record that something has happened or, for coverage, make it
possible to find what has not happened. They often represent
many-to-many relationships.
• The only thing they contain is a concatenated key. They still represent a
focal event that is identified by the combination of conditions referenced
in the dimension tables.
Example
Types of Factless Fact Tables
• There are two types of factless fact tables:
• 1. Event Tracking Tables –
• Use a factless fact table to track events of interest to the organization.
For example, attendance at a cultural event can be tracked by creating
a fact table that contains the following foreign keys (i.e. links to
dimension tables): event identifier, speaker/entertainment identifier,
participant identifier, event type, and date.
Example

• A factless fact table that records each time a student attends a course can answer questions such as: Which class has the maximum attendance?
• Or: What is the average attendance for a given course?
• All such questions are answered with COUNT() and GROUP BY queries.
• So we first count the rows and then apply other aggregate functions such as AVG, MAX, and MIN to the counts.
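The count-then-aggregate pattern for the attendance example can be sketched as follows. The table name `fact_attendance`, its key columns, and the rows are illustrative assumptions; the table holds only keys, with one row per attendance event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Factless fact table: only foreign keys, no measure column.
CREATE TABLE fact_attendance (student_id INTEGER, course_id INTEGER, date_id INTEGER);
INSERT INTO fact_attendance VALUES
    (1, 10, 1), (2, 10, 1), (3, 10, 1),   -- course 10, day 1: three students
    (1, 10, 2), (2, 10, 2),               -- course 10, day 2: two students
    (1, 20, 1);                           -- course 20, day 1: one student
""")

# Step 1: COUNT with GROUP BY turns event rows into an attendance figure per class.
per_class = conn.execute("""
    SELECT course_id, date_id, COUNT(*) AS attendees
    FROM fact_attendance
    GROUP BY course_id, date_id
    ORDER BY course_id, date_id
""").fetchall()

# Step 2: apply further aggregates to the counts.
busiest_class = max(per_class, key=lambda row: row[2])
avg_per_course = conn.execute("""
    SELECT course_id, AVG(attendees)
    FROM (SELECT course_id, date_id, COUNT(*) AS attendees
          FROM fact_attendance
          GROUP BY course_id, date_id)
    GROUP BY course_id
    ORDER BY course_id
""").fetchall()
print(per_class, busiest_class, avg_per_course)
```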
• 2. Coverage Tables –
• The second type of factless fact table is called a coverage table by
Ralph Kimball. It is used to support negative analysis reports. For example, to
create a report showing that a store did not sell a product for a certain period
of time, you need a fact table that captures all possible
combinations. Then you can find out what is missing.
• Common examples of factless fact table:
• Visitors to the office.
• People who clicked on a web page.
• Tracking student attendance or registration events.
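The negative analysis that coverage tables support can be sketched as a set difference: everything that could have been sold minus everything that was sold. The table names, columns, and rows below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Coverage table: every store/product combination that was on offer (keys only).
CREATE TABLE coverage (store_id INTEGER, product_id INTEGER);
-- Ordinary sales fact table for the same period.
CREATE TABLE fact_sales (store_id INTEGER, product_id INTEGER, units_sold INTEGER);

INSERT INTO coverage VALUES (1, 100), (1, 200), (2, 100), (2, 200);
INSERT INTO fact_sales VALUES (1, 100, 5), (2, 100, 3), (2, 200, 1);
""")

# Combinations present in coverage but absent from sales are what did not sell.
not_sold = conn.execute("""
    SELECT store_id, product_id FROM coverage
    EXCEPT
    SELECT store_id, product_id FROM fact_sales
    ORDER BY store_id, product_id
""").fetchall()
print(not_sold)
```

Without the coverage table there would be no row anywhere recording that store 1 offered product 200, so the gap could not be reported.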
Update to Dimension Table :

• Every day, more and more sales take place, so more and more rows
are added to the fact table.
• Updates to existing rows in the fact table happen very rarely.
• Dimension tables are more stable as compared to the fact tables.
• Dimension table changes due to the change in attributes themselves,
but not because of an increase in the number of rows.
Slowly Changing Dimensions

• Dimensions are generally constant over time; when they do change, they change
slowly. The customer ID of a record remains the same, but the marital status or
location of the customer may change over time.
• In an OLTP system, whenever such a change in attribute values happens, the new
values replace the old ones by overwriting them.
• But in a data warehouse, overwriting attributes is often not the solution, as historical
data is usually required for analysis.
• Such attribute changes are handled in 3 different ways:
• Type 1 Changes
• Type 2 Changes
• Type 3 Changes
• Type 1 – The old value is overwritten with the new value. No history
is maintained.
• Type 2 – A new row is added for the new value, so the current and the
historical records are kept in the same table.
• Type 3 – The current value and the previous value are kept in the same
record, in separate columns.
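The three change types can be compared side by side in a short sketch that models a customer dimension as a list of dictionaries. The column names (`surrogate_key`, `valid_from`, `valid_to`, `is_current`, `previous_city`) are common conventions assumed here for illustration, not something the slides prescribe.

```python
from datetime import date

def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city

def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and append a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r["customer_id"] == customer_id and r["is_current"])
    current["is_current"] = False
    current["valid_to"] = change_date
    rows.append({
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })

def apply_type3(rows, customer_id, new_city):
    """Type 3: keep the previous value in an extra column of the same row."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city

# The same change (customer 3 moves from Pune to Mumbai) under each type.
type1_rows = [{"customer_id": 3, "city": "Pune"}]
apply_type1(type1_rows, 3, "Mumbai")

type2_rows = [{"surrogate_key": 1, "customer_id": 3, "city": "Pune",
               "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True}]
apply_type2(type2_rows, 3, "Mumbai", date(2024, 6, 1))

type3_rows = [{"customer_id": 3, "city": "Pune", "previous_city": None}]
apply_type3(type3_rows, 3, "Mumbai")
```

After the change, the Type 1 table has one row with no trace of Pune, the Type 2 table has two rows for the same customer ID, and the Type 3 table has one row that remembers only the most recent previous city.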
Large Dimension Tables :

• Large dimension tables are very deep and wide.
• Deep means it has a very large number of rows and wide means it may have many attributes
or columns.
• To handle large dimensions, one can take out some mini dimensions from a large dimension
as per the interest. These mini-dimensions can be represented in the form of a star schema.
• For example, the above-mentioned order analysis star schema is one of the mini-dimensions
of a manufacturing company in which the marketing department of the company is
interested.
• Customer and product dimensions are generally large.
• Large dimensions are generally slow and inefficient due to their size. They tend to have
multiple hierarchies to perform various OLAP operations like drill down or roll-up.
Rapidly Changing or Large Slowly Changing
Dimensions

• In type 2 changes, a new row is created with the new value of the changed attribute. This
preserves the history, that is, the old values of attributes.
• If some attribute changes again, another new dimension table row is
created with the new value.
• This is feasible if the dimension changes infrequently, like once or twice a year. For
example, the product dimension, which has thousands of rows, changes rarely, so it is
manageable.
• In the case of a customer dimension, where the number of rows is in the millions, type 2
changes are still feasible and not very difficult as long as changes are infrequent. If the
customer dimension changes rapidly, then type 2 changes are problematic and difficult.
• If the dimension table is rapidly changing and large, then break that dimension table into
one or more smaller dimension tables.
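One way to break up such a dimension, sketched below under stated assumptions, is to keep the stable attributes in the customer dimension and move the fast-changing attributes into a small separate dimension with one row per combination of values. The slides do not specify how to split the table; the attribute choice (marital status and location) and all names here are hypothetical.

```python
# Stable attributes stay in the large customer dimension: one row per customer.
customer_dim = {3: {"name": "Gaurav"}}

# Small dimension for the volatile attributes: one row per distinct combination,
# keyed by (marital_status, location) and mapped to a generated id.
demographics_dim = {}

def demographics_key(marital_status, location):
    """Return the id for this combination, creating the row on first use."""
    combo = (marital_status, location)
    if combo not in demographics_dim:
        demographics_dim[combo] = len(demographics_dim) + 1
    return demographics_dim[combo]

fact_sales = []

def record_sale(customer_id, marital_status, location, amount):
    """Each fact row points to both the customer and the demographics in force at sale time."""
    fact_sales.append({
        "customer_id": customer_id,
        "demographics_id": demographics_key(marital_status, location),
        "amount": amount,
    })

record_sale(3, "single", "Pune", 100)
record_sale(3, "married", "Pune", 150)   # attribute changed: no new customer row needed
record_sale(3, "married", "Pune", 80)

demographics_ids = [row["demographics_id"] for row in fact_sales]
```

The customer dimension stays at one row per customer however often the volatile attributes change, while the history of those attributes is carried by the fact rows.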
Star Schema V/S Fact Constellation
schema
• Examples For Slow changing dimension <SQL Tables>
• https://www.sqlshack.com/implementing-slowly-changing-dimensions-scds-in-data-warehouses/
Introduction to OLAP and OLTP
• The primary purpose of online analytical processing (OLAP) is to
analyze aggregated data, while the primary purpose of online
transaction processing (OLTP) is to process database transactions.
• You use OLAP systems to generate reports, perform complex data
analysis, and identify trends.
• In contrast, you use OLTP systems to process orders, update inventory,
and manage customer accounts.
• Other major differences include data formatting, data architecture,
performance, and requirements. We’ll also discuss an example of
when an organization might use OLAP or OLTP.
OLAP V/S OLTP
Data formatting
• OLAP systems use multidimensional data models, so you can view the same
data from different angles.
• OLAP databases store data in a cube format, where each dimension
represents a different data attribute.
• Each cell in the cube represents a value or measure for the intersection of the
dimensions.
• In contrast, OLTP systems are unidimensional and focus on one data aspect.
• They use a relational database to organize data into tables. Each row in the
table represents an entity instance, and each column represents an entity
attribute.
OLAP V/S OLTP- Database
Architecture
• OLAP database architecture prioritizes data read over data write operations. You
can quickly and efficiently perform complex queries on large volumes of data.
Availability is a low-priority concern as the primary use case is analytics.

• On the other hand, OLTP database architecture prioritizes data write operations. It’s
optimized for write-heavy workloads and can update high-frequency, high-volume
transactional data without compromising data integrity.

• For instance, if two customers purchase the same item at the same time, the OLTP
system can adjust stock levels accurately. And the system will prioritize the
chronologically first customer if the item is the last one in stock. Availability is a high
priority and is typically achieved through multiple data backups.
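The last-item-in-stock case can be sketched with a single guarded update inside a transaction. This is a simplified illustration using sqlite3 on one connection, so the two purchases run one after the other rather than truly concurrently; the `inventory` table and `purchase` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, stock INTEGER CHECK (stock >= 0))"
)
conn.execute("INSERT INTO inventory VALUES (1, 1)")  # exactly one unit left
conn.commit()

def purchase(conn, item_id):
    """Decrement stock atomically; return True only if a unit was available."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE inventory SET stock = stock - 1 WHERE item_id = ? AND stock > 0",
            (item_id,),
        )
        return cursor.rowcount == 1

first_customer = purchase(conn, 1)    # arrives first and gets the last unit
second_customer = purchase(conn, 1)   # finds no stock left
remaining = conn.execute("SELECT stock FROM inventory WHERE item_id = 1").fetchone()[0]
print(first_customer, second_customer, remaining)
```

Checking and decrementing in one statement means the stock level can never go negative, whichever request the database processes first.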
Performance

• OLAP processing times can vary from minutes to hours depending on
the type and volume of data being analyzed.
• To update an OLAP database, you periodically process data in large
batches then upload the batch to the system all at once.
• Data update frequency also varies between systems, from daily to
weekly or even monthly.
• In contrast, you measure OLTP processing times in milliseconds or
less. OLTP databases manage database updates in real time. Updates
are fast, short, and triggered by you or your users.
• Stream processing is often used over batch processing.
When to use OLTP and OLAP
• Let's consider a large retail company that operates hundreds of stores across the country. The
company has a massive database that tracks sales, inventory, customer data, and other key
metrics.

• The company uses OLTP to process transactions in real time, update inventory levels, and manage
customer accounts. Each store is connected to the central database, which updates the inventory
levels in real time as products are sold. The company also uses OLTP to manage customer accounts
—for example, to track loyalty points, manage payment information, and process returns.

• In addition, the company uses OLAP to analyze the data collected by OLTP. The company’s business
analysts can use OLAP to generate reports on sales trends, inventory levels, customer
demographics, and other key metrics. They perform complex queries on large volumes of historical
data to identify patterns and trends that can inform business decisions. They identify popular
products in a given time period and use the information to optimize inventory budgets.
OLTP Examples

• An example of an OLTP system is an ATM center: the person who
authenticates first receives the cash first, on the condition that the
amount to be withdrawn is available in the ATM. The uses of the
OLTP system are described below.
• An ATM center is an OLTP application.
• OLTP enforces the ACID properties during data transactions via the application.
• It is also used for online banking, online airline ticket booking, sending a text
message, and adding a book to a shopping cart.
OLAP Examples

• Any type of data warehouse system is an OLAP system. The uses of
the OLAP system are described below.
• Spotify analyzes the songs users play to build a personalized
homepage of songs and playlists.
• Netflix movie recommendation system.
OLAP / OLTP
AWS - Redshift Database
Activity
• How can AWS support your OLAP and OLTP requirements?
• Explore how to use Amazon Redshift to connect to web application.
Business Intelligence
Effective and timely decisions, Data, information and knowledge, The role of
mathematical models, Business intelligence architectures.
What is BI
• Business intelligence (BI) is software that ingests business data and
presents it in user-friendly views such as reports, dashboards, charts
and graphs.
• 4 Concepts of Business Intelligence
• The four key concepts of business intelligence (BI) are
• data collection (Data collection involves gathering relevant information from
various sources)
• analysis,
• visualization,
• decision-making.
Need of BI
• Business intelligence greatly enhances how a company approaches its decision-making by
using data to answer questions of the company's past and present.
• It can be used by teams across an organization to track key metrics and organize on goals.
• Business intelligence helps organizations become data-driven enterprises, improve
performance and gain competitive advantage.
• They can: Improve ROI by understanding the business and intelligently allocating
resources to meet strategic objectives.
• Unravel customer behavior, preferences and trends, and use the insights to better target prospects
or tailor products to changing market needs.
• Monitor business operations and fix or make improvements on an ongoing basis, fueled by data
insights.
• Improve supply chain management by monitoring activity up and down the line and communicating
results with partners and suppliers.
How BI works?
• BI platforms traditionally rely on data warehouses for their baseline
information.
• A data warehouse aggregates data from multiple data sources into one
central system to support business analytics and reporting.
• Business intelligence software queries the warehouse and presents the results
to the user in the form of reports, charts and maps.
• Data warehouses can include an online analytical processing (OLAP) engine to
support multidimensional queries.
• For example: What are sales for our eastern region versus our western region this year,
compared to last year?
• Newer business intelligence solutions can extract and ingest raw data directly using
technology such as Hadoop.
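The regional comparison question above can be sketched as one query over a small warehouse table. The schema, the years, and the sales figures are invented for illustration; a real BI tool would generate a similar query against the warehouse or an OLAP engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE fact_sales (region_id INTEGER REFERENCES dim_region(region_id),
                         year INTEGER, amount REAL);
INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales VALUES
    (1, 2023, 100), (1, 2024, 120), (1, 2024, 30),
    (2, 2023, 90),  (2, 2024, 80);
""")

THIS_YEAR = 2024  # assumed reporting year

# One row per region, with this year's and last year's sales side by side.
region_comparison = conn.execute("""
    SELECT r.region_name,
           SUM(CASE WHEN f.year = :y     THEN f.amount ELSE 0 END) AS this_year,
           SUM(CASE WHEN f.year = :y - 1 THEN f.amount ELSE 0 END) AS last_year
    FROM fact_sales f
    JOIN dim_region r ON r.region_id = f.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""", {"y": THIS_YEAR}).fetchall()
print(region_comparison)
```

The query slices the same facts along two dimensions at once, region and year, which is the kind of multidimensional question an OLAP engine is built to answer.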
Decision making AND BI
• The best BI software supports this decision-making process by:

• Connecting to a wide variety of different data systems and data sets including
databases and spreadsheets.
• Providing deep analysis, helping users uncover hidden relationships and
patterns in their data.
• Presenting answers in informative and compelling data visualizations like
reports, maps, charts and graphs.
• Enabling side-by-side comparisons of data under different scenarios.
• Providing drill-down, drill-up and drill-through features, enabling users to
investigate different levels of data.
Latest in BI
• Advanced BI and analytics systems may also integrate artificial
intelligence (AI) and machine learning to automate and streamline
complex tasks. These capabilities further accelerate the ability of
enterprises to analyze their data and gain insights at a deep level.
• Consider, for example, how IBM Cognos Analytics brings together data
analysis and visual tools to support map creation for reports. The
system uses AI to automatically identify geographical information. It
can then refine visualizations by adding geospatial mapping of the
entire globe, an individual neighborhood or anything in between.
BI Architecture
• A business intelligence architecture is a framework for the various
technologies an organization deploys to run business intelligence and
analytics applications.
• It includes the IT systems and software tools that are used to collect,
integrate, store and analyze BI data and then present information on business
operations and trends to corporate executives and other business users.
• The underlying BI architecture is a key element in the execution of a
successful business intelligence program that uses data analysis and reporting
to help an organization track business performance, optimize business
processes, identify new revenue opportunities, improve strategic planning
and make more informed business decisions.
Benefits of BI architecture

• In the absence of a BI architecture, businesses and enterprises are at risk of making costly errors while striving to optimize
their data utilization.
• A well-articulated BI framework can offer organizations the following key benefits:
• Technology benchmarks. A BI architecture articulates the technology standards and data management and business analytics
practices that support an organization's BI efforts, as well as the specific platforms and tools deployed.
• Improved decision-making. Enterprises benefit from an effective BI architecture by using the insights generated by
business intelligence tools to make data-driven decisions that help increase revenue and profits.
• Technology blueprint. A BI framework serves as a technology blueprint for collecting, organizing and managing BI data and
then making the data available for analysis, data visualization and reporting. A strong BI architecture automates reporting and
incorporates policies to govern the use of the technology components.
• Enhanced coordination. Putting such a framework in place enables a BI team to work in a coordinated and disciplined way to
build an enterprise BI program that meets the organization's data analytics needs. The BI architecture also helps BI and data
managers create an efficient process for handling and managing the business data that's pulled into the environment.
• Time savings. By automating the process of collecting and analyzing data, BI helps organizations save time on manual and
repetitive tasks, freeing up their teams to focus on more high-value projects.
• Scalability. An effective BI infrastructure is easily scalable, enabling businesses to change and expand as necessary.
• Improved customer service. Business intelligence enhances customer understanding and service delivery by helping track
customer satisfaction and facilitate timely improvements. For example, an e-commerce store can use BI to track order delivery
times and optimize shipping for better customer satisfaction.
Types of Business Architecture
• Self-Service Business Intelligence:
• The self-service business intelligence architecture is a way of data
analysis by nontechnical staff. Moreover, it can be done by using
simple BI tools and interfaces.
• It is an approach that enables users to access the data with ease. In
addition, self-service business intelligence tools allow users to
visualize and check the data without IT teams.
• The primary purpose is to make informed decisions that can result in
positive business outcomes such as higher revenue and better
customer satisfaction.
Types of Business Architecture
• Modern Business Intelligence Architecture:
• Modern business intelligence architecture is a way to address the business reality
with appropriate tools. Furthermore, you can fulfill the needs of business users
with easy-to-use features and a BI environment.
• Modern business intelligence focuses on data analysis and reporting to share data
and optimize business results. Furthermore, modern BI solutions are more
flexible.
• The design of these tools is rapidly changing the business market by supporting
data popularity. Every business understands the objectives and needs of accurate
analysis.
• Thus, the modern BI architecture is designed to achieve business goals at all
levels.
Top 5 business intelligence tools

1. Microsoft Power BI
• One of the most popular BI tools is Power BI from software giant Microsoft. This downloadable
software allows you to run analytics on the cloud or in a reporting server. This interactive tool
syncs with sources such as Facebook, Oracle, and more to generate reports and dashboards in
minutes. It includes built-in AI capabilities, Excel integration, and data connectors. It also offers
end-to-end data encryption and real-time access monitoring.

2. Tableau
• Tableau is known for its user-friendly data visualisation capabilities, but it can do more than make
pretty charts. Their offering includes live visual analytics, an interface that allows users to drag
and drop buttons to spot trends in data quickly. The tool supports data sources such as Microsoft
Excel, Box, PDF files, Google Analytics, and more. Its versatility extends to being able to connect
with most databases.
3. Qlik
• Qlik is a BI tool that emphasises a self-service approach. It supports various analytics use cases, from guided
apps and dashboards to custom and embedded analytics. This tool also offers a user-friendly interface
optimised for touchscreens, sophisticated AI, and high-performance cloud platforms. Its associative
exploration capability, Search & Conversational Analytics, allows users to ask questions and uncover
actionable insights, which helps increase data literacy for those new to using BI tools.
4. Dundas BI
• Dundas BI is a browser-based BI tool that's been around for 25 years. Like Tableau, Dundas BI features a
drag-and-drop function that allows users to analyse data independently without involving their IT team. The
tool is known for its simplicity and flexibility through interactive dashboards, reports, and visual analytics.
Since its inception as a data visualisation tool in 1992, it has evolved into an end-to-end analytics platform
that can compete with today's new BI tools.
5. Sisense
• Sisense is a user-friendly BI tool that focuses on being simplified and streamlined. With this tool, you can
export data from sources like Google Analytics, Salesforce, and more. Its in-chip technology allows for faster
data processing compared to other tools.
Activity
• Explore any of the BI tools and perform at least 2 functionalities.
Courses Suggested
• https://www.coursera.org/learn/business-intelligence-tools
• https://onlinecourses.swayam2.ac.in/cec19_cs01/preview
