ch3
In this example, the Customer ID column in the fact table is the foreign key that joins
with the dimension table. By following the links, we can see that row 2 of the fact
table records the fact that customer 3, Gaurav, bought two items on day 8.
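The link between the two tables can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (customer_dim, sales_fact, row_no, items_bought) are assumptions, and every row except fact row 2 and customer 3 (Gaurav) is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact (
        row_no       INTEGER PRIMARY KEY,
        customer_id  INTEGER REFERENCES customer_dim(customer_id),
        day          INTEGER,
        items_bought INTEGER
    );
    -- Only customer 3 (Gaurav) and fact row 2 come from the text; the rest is hypothetical.
    INSERT INTO customer_dim VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Gaurav');
    INSERT INTO sales_fact VALUES (1, 1, 7, 1), (2, 3, 8, 2), (3, 2, 9, 5);
""")

# Follow the foreign key from fact row 2 to the matching dimension row.
row = conn.execute("""
    SELECT c.name, f.items_bought, f.day
    FROM sales_fact AS f
    JOIN customer_dim AS c ON c.customer_id = f.customer_id
    WHERE f.row_no = 2
""").fetchone()
print(row)  # ('Gaurav', 2, 8)
```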
Dimension Table
Multi-Dimensional Data Model
• A multidimensional model views data in the form of a data-cube. A data cube enables data
to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.
• The dimensions are the perspectives or entities concerning which an organization keeps
records. For example, a shop may create a sales data warehouse to keep records of the
store's sales for the dimensions time, item, and location. These dimensions allow the store to
keep track of things, for example, monthly sales of items and the locations at which the
items were sold. Each dimension has a table related to it, called a dimensional table, which
describes the dimension further. For example, a dimensional table for an item may contain
the attributes item_name, brand, and type.
• A multidimensional data model is organized around a central theme, for example, sales. This
theme is represented by a fact table. Facts are numerical measures. The fact table contains
the names of the facts, or measures, as well as keys to each of the related dimensional tables.
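As a rough sketch of this model, a cube can be represented as a mapping from one coordinate per dimension (time, item, location) to a measure. The rows and the units_sold measure below are hypothetical; the sketch derives the "monthly sales of items" view by summing over the location dimension.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, item, location, units_sold).
facts = [
    ("Jan", "pen", "Pune", 10),
    ("Jan", "pen", "Delhi", 4),
    ("Jan", "book", "Pune", 3),
    ("Feb", "pen", "Pune", 7),
]

# The cube maps one coordinate per dimension to the measure.
cube = defaultdict(int)
for month, item, location, units in facts:
    cube[(month, item, location)] += units

# Monthly sales per item: sum the measure over the location dimension.
monthly_item_sales = defaultdict(int)
for (month, item, _location), units in cube.items():
    monthly_item_sales[(month, item)] += units

print(dict(monthly_item_sales))
```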
Star Schema
• Star Schema: The star schema is a type of multidimensional model
used for data warehouses. It contains fact tables and the dimension
tables they reference. Queries need fewer foreign-key joins than in a
snowflake schema. Drawn as a diagram, the fact table and its
surrounding dimension tables form a star.
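A minimal star schema can be sketched with sqlite3, reusing the time, item, and location dimensions mentioned earlier. All table names, column names, and rows are illustrative assumptions. Each dimension is a single table, so any attribute is one join away from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE item_dim     (item_id INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
    CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT);
    -- The fact table holds one foreign key per dimension plus the measure.
    CREATE TABLE sales_fact (
        time_id     INTEGER REFERENCES time_dim(time_id),
        item_id     INTEGER REFERENCES item_dim(item_id),
        location_id INTEGER REFERENCES location_dim(location_id),
        units_sold  INTEGER
    );
    INSERT INTO time_dim VALUES (1, 'Jan'), (2, 'Feb');
    INSERT INTO item_dim VALUES (1, 'pen', 'Acme', 'stationery'), (2, 'lamp', 'Brite', 'electrical');
    INSERT INTO location_dim VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO sales_fact VALUES (1, 1, 1, 10), (1, 2, 1, 2), (2, 1, 2, 5), (2, 1, 1, 4);
""")

# Brand is stored directly in item_dim, so a single join reaches it.
units_by_brand = conn.execute("""
    SELECT i.brand, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_id = f.item_id
    GROUP BY i.brand
    ORDER BY i.brand
""").fetchall()
print(units_by_brand)  # [('Acme', 19), ('Brite', 2)]
```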
Snowflake Schema
• Snowflake Schema: The snowflake schema is also a type of
multidimensional model used for data warehouses. It contains
fact tables, dimension tables, and sub-dimension tables. Drawn as a
diagram, the fact tables, dimension tables, and sub-dimension tables
form a snowflake shape.
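A snowflake variant can be sketched by moving an attribute of a dimension into its own sub-dimension table. Here a hypothetical brand_dim is split out of item_dim; all names and rows are assumptions. The same brand-level question now needs two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sub-dimension table: brand details are normalized out of item_dim.
    CREATE TABLE brand_dim (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
    CREATE TABLE item_dim (
        item_id   INTEGER PRIMARY KEY,
        item_name TEXT,
        brand_id  INTEGER REFERENCES brand_dim(brand_id)
    );
    CREATE TABLE sales_fact (
        item_id    INTEGER REFERENCES item_dim(item_id),
        units_sold INTEGER
    );
    INSERT INTO brand_dim VALUES (1, 'Acme'), (2, 'Brite');
    INSERT INTO item_dim VALUES (1, 'pen', 1), (2, 'pencil', 1), (3, 'lamp', 2);
    INSERT INTO sales_fact VALUES (1, 10), (2, 6), (3, 2);
""")

# Fact -> dimension -> sub-dimension: two joins to reach the brand name.
snowflake_units_by_brand = conn.execute("""
    SELECT b.brand_name, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim  AS i ON i.item_id  = f.item_id
    JOIN brand_dim AS b ON b.brand_id = i.brand_id
    GROUP BY b.brand_name
    ORDER BY b.brand_name
""").fetchall()
print(snowflake_units_by_brand)  # [('Acme', 16), ('Brite', 2)]
```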
Which Schema to choose?
• The answer depends on your specific needs and requirements.
• If you’re looking for a simple, efficient cloud data warehouse solution,
a star schema might be the best option.
• But if you need more flexibility to accommodate changing data
requirements, a snowflake schema may be a better choice.
• No matter which schema you choose, ThoughtSpot can help you get
the most out of your data. Most BI tools require a specific schema
design, whereas ThoughtSpot has no such restriction.
Activity
• What is ThoughtSpot?
• How is it different from Tableau?
Concept of Data Cube
• A data cube is a data structure that, contrary to tables and
spreadsheets, can store data in more than 2 dimensions. Data cubes
are mainly used for fast retrieval of aggregated data.
• The key elements of a data cube
are dimensions, attributes, facts
and measures.
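One way to see why a cube gives fast retrieval of aggregated data is to precompute the summed measure for every subset of dimensions, so that a later lookup is a dictionary access rather than a scan. The sketch below does this for three hypothetical dimensions; the function name build_cuboids and the sample rows are assumptions, not a standard API.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location")

# Hypothetical fact rows; "units" is the measure.
rows = [
    {"time": "Q1", "item": "pen", "location": "Pune", "units": 10},
    {"time": "Q1", "item": "book", "location": "Pune", "units": 3},
    {"time": "Q2", "item": "pen", "location": "Delhi", "units": 7},
]

def build_cuboids(rows, dimensions, measure):
    """Precompute the summed measure for every subset of the dimensions."""
    cuboids = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for r in rows:
                key = tuple(r[d] for d in dims)
                totals[key] += r[measure]
            cuboids[dims] = dict(totals)
    return cuboids

cuboids = build_cuboids(rows, DIMENSIONS, "units")
print(cuboids[("item",)])  # {('pen',): 17, ('book',): 3}
print(cuboids[()])         # {(): 20}, the grand total
```

With three dimensions there are 2^3 = 8 such aggregates; real OLAP engines usually materialize only some of them.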
Fact Constellation
• Fact Constellation can be referred to as a collection of multiple fact
tables which share dimension tables. Hence, it can even be referred
to as a collection of stars which is also called a galaxy.
• This particular type of schema is usually used for sophisticated
applications.
• A typical example of this schema is a sales scenario.
Examples
• In this example, there are two fact tables, and
both of them share the Product and Date dimension tables.
• Therefore, the data warehouse model is a combination of two Star
Schemas.
Example
• Placement is a fact table having attributes: (Stud_roll, Company_id, TPO_id) with facts: (Number of students
eligible, Number of students placed).
• Workshop is a fact table having attributes: (Stud_roll, Institute_id, TPO_id) with facts: (Number of students
selected, Number of students attended the workshop).
• Company is a dimension table having attributes: (Company_id, Name, Offer_package).
• Student is a dimension table having attributes: (Student_roll, Name, CGPA).
• TPO is a dimension table having attributes: (TPO_id, Name, Age).
• Training Institute is a dimension table having attributes: (Institute_id, Name, Full_course_fee).
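The placement example can be sketched in sqlite3 as two fact tables that share the Student and TPO dimensions. The key and attribute names follow the bullets above; the measure column names (students_eligible, students_placed, students_selected, students_attended) and all rows are hypothetical. Each fact table is aggregated separately and then lined up through the shared TPO dimension, which avoids double counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables
    CREATE TABLE Student (Student_roll INTEGER PRIMARY KEY, Name TEXT, CGPA REAL);
    CREATE TABLE TPO (TPO_id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE Company (Company_id INTEGER PRIMARY KEY, Name TEXT, Offer_package REAL);
    CREATE TABLE Training_Institute (Institute_id INTEGER PRIMARY KEY, Name TEXT, Full_course_fee REAL);

    -- Two fact tables sharing the Student and TPO dimensions
    CREATE TABLE Placement (
        Stud_roll  INTEGER REFERENCES Student(Student_roll),
        Company_id INTEGER REFERENCES Company(Company_id),
        TPO_id     INTEGER REFERENCES TPO(TPO_id),
        students_eligible INTEGER, students_placed INTEGER
    );
    CREATE TABLE Workshop (
        Stud_roll    INTEGER REFERENCES Student(Student_roll),
        Institute_id INTEGER REFERENCES Training_Institute(Institute_id),
        TPO_id       INTEGER REFERENCES TPO(TPO_id),
        students_selected INTEGER, students_attended INTEGER
    );

    INSERT INTO Student VALUES (1, 'S1', 8.1), (2, 'S2', 7.4);
    INSERT INTO TPO VALUES (10, 'T1', 40), (11, 'T2', 35);
    INSERT INTO Company VALUES (100, 'C1', 6.5);
    INSERT INTO Training_Institute VALUES (200, 'I1', 15000);
    INSERT INTO Placement VALUES (1, 100, 10, 1, 1), (2, 100, 10, 1, 0), (2, 100, 11, 1, 1);
    INSERT INTO Workshop VALUES (1, 200, 10, 1, 1), (2, 200, 11, 1, 1), (1, 200, 11, 1, 1);
""")

# One row per TPO: students placed (Placement) and students attended (Workshop).
per_tpo = conn.execute("""
    SELECT t.Name,
           (SELECT COALESCE(SUM(p.students_placed), 0)
              FROM Placement AS p WHERE p.TPO_id = t.TPO_id),
           (SELECT COALESCE(SUM(w.students_attended), 0)
              FROM Workshop AS w WHERE w.TPO_id = t.TPO_id)
    FROM TPO AS t
    ORDER BY t.TPO_id
""").fetchall()
print(per_tpo)  # [('T1', 1, 1), ('T2', 1, 2)]
```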
Fact Constellation
• Advantage of Fact Constellation Schema Data Warehouses
• A factless fact table that records each time a student attends a course can answer questions such as: which class has the maximum attendance?
• Or what is the average attendance of a given course?
• All such questions are answered with COUNT() and GROUP BY queries.
• So we first count and then apply other aggregate functions such as AVG, MAX, and MIN.
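The attendance questions can be sketched with sqlite3 as follows. The table attendance_fact, its columns, and its rows are hypothetical. Note that the table stores only dimension keys and no numeric measure, so every answer starts from COUNT with GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Factless fact table: one row per attendance event, keys only.
    CREATE TABLE attendance_fact (student_id INTEGER, course_id TEXT, day INTEGER);
    INSERT INTO attendance_fact VALUES
        (1, 'DB', 1), (2, 'DB', 1), (3, 'DB', 1),
        (1, 'DB', 2), (2, 'DB', 2),
        (1, 'ML', 1);
""")

# Attendance per course: COUNT with GROUP BY.
per_course = conn.execute("""
    SELECT course_id, COUNT(*) FROM attendance_fact
    GROUP BY course_id ORDER BY course_id
""").fetchall()

# Average daily attendance of one course: count first, then apply AVG.
avg_daily_db = conn.execute("""
    SELECT AVG(n) FROM (
        SELECT COUNT(*) AS n FROM attendance_fact
        WHERE course_id = 'DB' GROUP BY day
    )
""").fetchone()[0]

# Course with the maximum attendance.
busiest = conn.execute("""
    SELECT course_id, COUNT(*) AS n FROM attendance_fact
    GROUP BY course_id ORDER BY n DESC LIMIT 1
""").fetchone()

print(per_course, avg_daily_db, busiest)
```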
• 2. Coverage Tables –
• The second type of factless fact table is called a coverage table by
Ralph Kimball. It is used to support negative analysis reports. For example, to
create a report that a store did not sell a product for a certain period
of time, you should have a fact table to capture all possible
combinations. Then you can find out what is missing.
• Common examples of factless fact tables:
• Visitors to the office.
• Clicks on web pages by users.
• Tracking student attendance or registration events.
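The negative analysis supported by a coverage table can be sketched as a set difference: the coverage table holds all possible combinations, and whatever does not appear among the sales is what was not sold. The store and product names below are hypothetical.

```python
from itertools import product

stores = ["S1", "S2"]
products = ["pen", "book", "lamp"]

# Coverage table: every possible (store, product) combination.
coverage = set(product(stores, products))

# Combinations that actually had sales in the period (hypothetical).
sales = {("S1", "pen"), ("S1", "book"), ("S2", "pen")}

# What is missing from the sales is what was not sold.
not_sold = sorted(coverage - sales)
print(not_sold)  # [('S1', 'lamp'), ('S2', 'book'), ('S2', 'lamp')]
```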
Updates to Dimension Tables
• Every day, more and more sales take place, so more and more rows
are added to the fact table.
• Existing rows in the fact table are very rarely updated.
• Dimension tables are more stable as compared to the fact tables.
• Dimension table changes due to the change in attributes themselves,
but not because of an increase in the number of rows.
Slowly Changing Dimensions
• Dimensions are generally constant over time, but if not constant, then they change
slowly. The customer ID of the record remains the same but the marital status or
location of the customer may change over time.
• In the OLTP system, whenever such a change in attribute values happens, the new
values replace the old values by overwriting them.
• But in a data warehouse, overwriting of attributes is not the solution as historical
data for analysis is always required.
• Such attribute changes can be handled in three different ways:
• Type 1 Changes
• Type 2 Changes
• Type 3 Changes
• Type 1 – This model involves overwriting the old current value with
the new current value. No history is maintained.
• Type 2 – The current and the historical records are kept and
maintained in the same file or table.
• Type 3 – The current data and historical data are kept in the same
record.
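The three types can be sketched on a customer dimension whose city attribute changes. This is a simplified illustration, not a full implementation: the column names (valid_from, valid_to, is_current, previous_city) are assumptions, and a real Type 2 table would usually also carry a surrogate key for each row version.

```python
def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city

def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and add a new row to the same table."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = change_date
    rows.append({"customer_id": customer_id, "city": new_city,
                 "valid_from": change_date, "valid_to": None, "is_current": True})

def apply_type3(rows, customer_id, new_city):
    """Type 3: keep the previous value in an extra column of the same record."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city

t1 = [{"customer_id": 3, "city": "Pune"}]
apply_type1(t1, 3, "Delhi")

t2 = [{"customer_id": 3, "city": "Pune", "valid_from": "2020-01-01",
       "valid_to": None, "is_current": True}]
apply_type2(t2, 3, "Delhi", "2024-06-01")

t3 = [{"customer_id": 3, "city": "Pune", "previous_city": None}]
apply_type3(t3, 3, "Delhi")

print(t1)  # one row, no history
print(t2)  # two rows: the closed historical row and the current row
print(t3)  # one row holding both the current and the previous city
```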
Large Dimension Tables
• In Type 2 changes, a new row is created with the new value of the changed attribute. This
preserves the history or old values of attributes.
• If there is a change again in some attribute, then again a new dimension table row is
created with the new value.
• This is feasible if the dimension changes infrequently, like once or twice a year. For
example, the product dimension, which has rows in thousands, changes rarely so it is
manageable.
• In the case of a customer dimension, where the number of rows is in the millions, Type 2
changes are still feasible and not very difficult if the changes are infrequent. If the customer
dimension changes rapidly, then Type 2 changes are problematic and difficult.
• If the dimension table is rapidly changing and large, then break that dimension table into
one or more smaller dimension tables.
Star Schema V/S Fact Constellation Schema
• Examples for slowly changing dimensions (SQL tables):
• https://www.sqlshack.com/implementing-slowly-changing-dimensions-scds-in-data-warehouses/
Introduction to OLAP and OLTP
• The primary purpose of online analytical processing (OLAP) is to
analyze aggregated data, while the primary purpose of online
transaction processing (OLTP) is to process database transactions.
• You use OLAP systems to generate reports, perform complex data
analysis, and identify trends.
• In contrast, you use OLTP systems to process orders, update inventory,
and manage customer accounts.
• Other major differences include data formatting, data architecture,
performance, and requirements. We’ll also discuss an example of
when an organization might use OLAP or OLTP.
OLAP V/S OLTP
Data formatting
• OLAP systems use multidimensional data models, so you can view the same
data from different angles.
• OLAP databases store data in a cube format, where each dimension
represents a different data attribute.
• Each cell in the cube represents a value or measure for the intersection of the
dimensions.
• In contrast, OLTP systems are unidimensional and focus on one data aspect.
• They use a relational database to organize data into tables. Each row in the
table represents an entity instance, and each column represents an entity
attribute.
OLAP V/S OLTP: Database Architecture
• OLAP database architecture prioritizes data read over data write operations. You
can quickly and efficiently perform complex queries on large volumes of data.
Availability is a low-priority concern as the primary use case is analytics.
• On the other hand, OLTP database architecture prioritizes data write operations. It’s
optimized for write-heavy workloads and can update high-frequency, high-volume
transactional data without compromising data integrity.
• For instance, if two customers purchase the same item at the same time, the OLTP
system can adjust stock levels accurately. The system will prioritize the
chronologically first customer if the item is the last one in stock. Availability is a high
priority and is typically achieved through multiple data backups.
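The last-item-in-stock case can be sketched with sqlite3. This is a single-connection illustration in which the two purchases arrive one after the other; a production OLTP system would rely on the database's locking and isolation guarantees under real concurrency. The stock table and the purchase function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))")
conn.execute("INSERT INTO stock VALUES ('lamp', 1)")  # only one unit left
conn.commit()

def purchase(conn, item):
    """Try to buy one unit; return True if the order succeeded."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE stock SET qty = qty - 1 WHERE item = ? AND qty > 0", (item,)
        )
        # The conditional UPDATE touches a row only while stock remains.
        return cur.rowcount == 1

first = purchase(conn, "lamp")   # chronologically first customer
second = purchase(conn, "lamp")  # arrives after the last unit is gone
remaining = conn.execute("SELECT qty FROM stock WHERE item = 'lamp'").fetchone()[0]
print(first, second, remaining)  # True False 0
```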
Performance
• Consider a retail company with many stores. The company uses OLTP to process transactions in real time, update inventory levels, and manage
customer accounts. Each store is connected to the central database, which updates the inventory
levels in real time as products are sold. The company also uses OLTP to manage customer accounts
—for example, to track loyalty points, manage payment information, and process returns.
• In addition, the company uses OLAP to analyze the data collected by OLTP. The company’s business
analysts can use OLAP to generate reports on sales trends, inventory levels, customer
demographics, and other key metrics. They perform complex queries on large volumes of historical
data to identify patterns and trends that can inform business decisions. They identify popular
products in a given time period and use the information to optimize inventory budgets.
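An analytical query of the kind described here, finding the most popular products in a given period, can be sketched with sqlite3. The sales_history table, its rows, and the top_products helper are hypothetical stand-ins for the historical data collected by the OLTP system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_history (sale_date TEXT, store TEXT, product TEXT, units INTEGER);
    INSERT INTO sales_history VALUES
        ('2024-01-05', 'North', 'lamp', 3),
        ('2024-01-20', 'South', 'pen', 12),
        ('2024-02-02', 'North', 'pen', 8),
        ('2024-02-14', 'South', 'lamp', 1),
        ('2024-03-01', 'North', 'book', 20);
""")

def top_products(conn, start, end, limit=2):
    """Return the best-selling products between two ISO dates (inclusive)."""
    return conn.execute("""
        SELECT product, SUM(units) AS total
        FROM sales_history
        WHERE sale_date BETWEEN ? AND ?
        GROUP BY product
        ORDER BY total DESC
        LIMIT ?
    """, (start, end, limit)).fetchall()

popular = top_products(conn, "2024-01-01", "2024-02-29")
print(popular)  # [('pen', 20), ('lamp', 4)]
```

Unlike the transactional updates, this query only reads data and aggregates over many rows at once.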
OLTP Examples
• Connecting to a wide variety of different data systems and data sets including
databases and spreadsheets.
• Providing deep analysis, helping users uncover hidden relationships and
patterns in their data.
• Presenting answers in informative and compelling data visualizations like
reports, maps, charts and graphs.
• Enabling side-by-side comparisons of data under different scenarios.
• Providing drill-down, drill-up and drill-through features, enabling users to
investigate different levels of data.
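Drill-down and drill-up can be sketched as aggregating the same data at different levels of a hierarchy. The year, quarter, month hierarchy and the sales amounts below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (year, quarter, month, sales_amount).
sales = [
    ("2024", "Q1", "Jan", 100),
    ("2024", "Q1", "Feb", 80),
    ("2024", "Q2", "Apr", 50),
]

def aggregate(rows, level):
    """Sum sales by the first `level` columns of the (year, quarter, month) hierarchy."""
    totals = defaultdict(int)
    for *path, amount in rows:
        totals[tuple(path[:level])] += amount
    return dict(totals)

by_year = aggregate(sales, 1)     # drill-up: coarsest view
by_quarter = aggregate(sales, 2)  # drill-down one level
by_month = aggregate(sales, 3)    # drill-down to the finest level

print(by_year)     # {('2024',): 230}
print(by_quarter)  # {('2024', 'Q1'): 180, ('2024', 'Q2'): 50}
```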
Latest in BI
• Advanced BI and analytics systems may also integrate artificial
intelligence (AI) and machine learning to automate and streamline
complex tasks. These capabilities further accelerate the ability of
enterprises to analyze their data and gain insights at a deep level.
• Consider, for example, how IBM Cognos Analytics brings together data
analysis and visual tools to support map creation for reports. The
system uses AI to automatically identify geographical information. It
can then refine visualizations by adding geospatial mapping of the
entire globe, an individual neighborhood or anything in between.
BI Architecture
• A business intelligence architecture is a framework for the various
technologies an organization deploys to run business intelligence and
analytics applications.
• It includes the IT systems and software tools that are used to collect,
integrate, store and analyze BI data and then present information on business
operations and trends to corporate executives and other business users.
• The underlying BI architecture is a key element in the execution of a
successful business intelligence program that uses data analysis and reporting
to help an organization track business performance, optimize business
processes, identify new revenue opportunities, improve strategic planning
and make more informed business decisions.
Benefits of BI architecture
• In the absence of a BI architecture, businesses and enterprises are at risk of making costly errors while striving to optimize
their data utilization.
• A well-articulated BI framework can offer organizations the following key benefits:
• Technology benchmarks. A BI architecture articulates the technology standards and data management and business analytics
practices that support an organization's BI efforts, as well as the specific platforms and tools deployed.
• Improved decision-making. Enterprises benefit from an effective BI architecture by using the insights generated by
business intelligence tools to make data-driven decisions that help increase revenue and profits.
• Technology blueprint. A BI framework serves as a technology blueprint for collecting, organizing and managing BI data and
then making the data available for analysis, data visualization and reporting. A strong BI architecture automates reporting and
incorporates policies to govern the use of the technology components.
• Enhanced coordination. Putting such a framework in place enables a BI team to work in a coordinated and disciplined way to
build an enterprise BI program that meets the organization's data analytics needs. The BI architecture also helps BI and data
managers create an efficient process for handling and managing the business data that's pulled into the environment.
• Time savings. By automating the process of collecting and analyzing data, BI helps organizations save time on manual and
repetitive tasks, freeing up their teams to focus on more high-value projects.
• Scalability. An effective BI infrastructure is easily scalable, enabling businesses to change and expand as necessary.
• Improved customer service. Business intelligence enhances customer understanding and service delivery by helping track
customer satisfaction and facilitate timely improvements. For example, an e-commerce store can use BI to track order delivery
times and optimize shipping for better customer satisfaction.
Types of Business Intelligence Architecture
• Self-Service Business Intelligence:
• The self-service business intelligence architecture is a way for
nontechnical staff to analyze data using simple BI tools and
interfaces.
• It is an approach that enables users to access the data with ease. In
addition, self-service business intelligence tools allow users to
visualize and check the data without involving IT teams.
• The primary purpose is to make informed decisions that can result in
positive business outcomes such as higher revenue and better
customer satisfaction.
Types of Business Intelligence Architecture
• Modern Business Intelligence Architecture:
• Modern business intelligence architecture is a way to address the business reality
with appropriate tools. Furthermore, you can fulfill the needs of business users
with easy-to-use features and a BI environment.
• Modern business intelligence focuses on data analysis and reporting to share data
and optimize business results. Furthermore, modern BI solutions are more
flexible.
• The design of these tools is rapidly changing the business market by supporting
data popularity. Every business understands the objectives and needs of accurate
analysis.
• Thus, the modern BI architecture is designed to achieve business goals at all
levels.
Top 5 business intelligence tools
1. Microsoft Power BI
• One of the most popular BI tools is Power BI from software giant Microsoft. This downloadable
software allows you to run analytics on the cloud or in a reporting server. This interactive tool
syncs with sources such as Facebook, Oracle, and more to generate reports and dashboards in
minutes. It includes built-in AI capabilities, Excel integration, and data connectors. It also offers
end-to-end data encryption and real-time access monitoring.
2. Tableau
• Tableau is known for its user-friendly data visualisation capabilities, but it can do more than make
pretty charts. Their offering includes live visual analytics, an interface that allows users to drag
and drop buttons to spot trends in data quickly. The tool supports data sources such as Microsoft
Excel, Box, PDF files, Google Analytics, and more. Its versatility extends to being able to connect
with most databases.
3. Qlik
• Qlik is a BI tool that emphasises a self-service approach. It supports various analytics use cases, from guided
apps and dashboards to custom and embedded analytics. This tool also offers a user-friendly interface
optimised for touchscreens, sophisticated AI, and high-performance cloud platforms. Its associative
exploration capability, Search & Conversational Analytics, allows users to ask questions and uncover
actionable insights, which helps increase data literacy for those new to using BI tools.
4. Dundas BI
• Dundas BI is a browser-based BI tool that's been around for 25 years. Like Tableau, Dundas BI features a
drag-and-drop function that allows users to analyse data independently without involving their IT team. The
tool is known for its simplicity and flexibility through interactive dashboards, reports, and visual analytics.
Since its inception as a data visualisation tool in 1992, it has evolved into an end-to-end analytics platform
that can compete with today's new BI tools.
5. Sisense
• Sisense is a user-friendly BI tool that focuses on being simplified and streamlined. With this tool, you can
export data from sources like Google Analytics, Salesforce, and more. Its in-chip technology allows for faster
data processing compared to other tools.
Activity
• Explore any one of the BI tools and try out at least two of its features.
Courses Suggested
• https://www.coursera.org/learn/business-intelligence-tools
• https://onlinecourses.swayam2.ac.in/cec19_cs01/preview