
Data Cube

The document discusses Data Cubes, which are multidimensional matrices used to organize and analyze data across various dimensions such as time, item, and location. It explains the structure of data cubes, including fact and dimension tables, and outlines operations like roll-up, drill-down, slice, dice, and pivot that facilitate data analysis. Additionally, it compares OLAP (Online Analytical Processing) with OLTP (Online Transaction Processing) and describes different types of OLAP systems, their advantages, and disadvantages.

Uploaded by

mish9976.pc
Copyright
© All Rights Reserved

Data Cube

• Data is grouped or combined in multidimensional matrices called data cubes.
• The data cube method has a few alternative names or variants, such as "multidimensional databases," "materialized views," and "OLAP (On-Line Analytical Processing)."
• For instance, All Electronics can create a sales data warehouse to keep records of the store’s sales with respect to the dimensions time, item, branch, and location.
• These dimensions enable the store to keep track of things like the monthly sales of items and the branches and locations at which the items were sold.
• Each dimension may have a table associated with it, called a dimension table, which further describes the dimension.
• For instance, a dimension table for an item
can include the attributes item name, brand,
and type.
• Dimension tables can be specified by users or experts, or automatically generated and adjusted based on data distributions.
• A multidimensional data model is generally organized around a central theme, such as sales.
• This theme is represented by a fact table.
• Facts are numerical measures.
• Examples of facts for a sales data warehouse include dollars sold (sales amount in dollars), units sold (number of units sold), and the amount budgeted.
• The fact table includes the names of the facts or
measures and keys to each of the associated
dimension tables.
• A data cube is generated from a subset of
attributes in the database.
• Data cubes can be sparse, because not every cell of the cube has corresponding data in the database.
• For example, a relation with the schema sales(part, supplier, customer, sale-price) can be materialized into a set of eight views, one for each subset of the three grouping attributes part, supplier, and customer.
• psc denotes the view consisting of the aggregate function values computed by grouping on the three attributes part, supplier, and customer.
• p denotes the view consisting of the aggregate function values computed by grouping on part alone, and so on.
• A data cube enables data to be modeled and
viewed in multiple dimensions.
• A multidimensional data model is organized
around a central theme, like sales and
transactions.
• A fact table represents this theme.
• Facts are numerical measures.
• Dimensions are the perspectives or entities with respect to which an organization keeps records; together with the facts, they define a data cube.
• Facts are generally quantities, which are used
for analyzing the relationship between
dimensions.
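The eight views of the sales relation can be enumerated mechanically: with three grouping attributes there are 2^3 = 8 subsets, and each subset yields one group-by. The Python sketch below assumes SUM as the aggregate function and uses a few made-up rows; the names `SALES`, `DIMS`, and `all_views` are illustrative only, and "()" is our own label for the empty group-by (the grand total).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows of sales(part, supplier, customer, sale-price); values are made up.
SALES = [
    ("bolt", "acme", "c1", 10),
    ("bolt", "acme", "c2", 15),
    ("nut", "acme", "c1", 5),
    ("nut", "zenith", "c2", 7),
]
DIMS = ("part", "supplier", "customer")


def all_views(rows, dims=DIMS):
    """Return {view_name: {group_key: SUM(sale-price)}} for every subset of dims."""
    views = {}
    for k in range(len(dims), -1, -1):
        for subset in combinations(range(len(dims)), k):
            # Name each view by the initials of its grouping attributes (psc, ps, ..., c);
            # "()" labels the empty group-by, i.e. the grand total.
            name = "".join(dims[i][0] for i in subset) or "()"
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[i] for i in subset)] += row[-1]
            views[name] = dict(totals)
    return views


views = all_views(SALES)
print(len(views))   # 8
print(views["p"])   # {('bolt',): 25, ('nut',): 12}
print(views["()"])  # {(): 37}
```

Materializing every view up front trades storage for query speed, which is the motivation behind precomputed data cubes.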
Example:
In the 2-D representation, we look at the All Electronics sales data for items sold per quarter in the city of Vancouver. The measure displayed is dollars sold (in thousands).
3-Dimensional Cuboids
• Suppose we would like to view the sales data with a third dimension.
• For example, suppose we would like to view the data according to time and item, as well as location, for the cities Chicago, New York, Toronto, and Vancouver.
• The measure displayed is dollars sold (in thousands).
• The 3-D data of the table are represented as a series of 2-D tables.
• We may also want to view the sales data with an additional fourth dimension, such as supplier.
• In data warehousing, data cubes are n-dimensional.
• The cuboid which holds the lowest level of
summarization is called a base cuboid.
• For example, the 4-D cuboid in the figure is
the base cuboid for the given time, item,
location, and supplier dimensions.
• 4-D data cube representation of sales data, according to
the dimensions time, item, location, and supplier. The
measure displayed is dollars sold (in thousands).
• The topmost 0-D cuboid, which holds the highest level of
summarization, is known as the apex cuboid.
• In this example, this is the total sales, or dollars sold,
summarized over all four dimensions.
• The lattice of cuboids forms a data cube.
• The figure shows the lattice of cuboids forming a 4-D data cube for the dimensions time, item, location, and supplier.
• Each cuboid represents a different degree of
summarization.
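Ignoring concept hierarchies, a cube over n dimensions has 2^n cuboids, one for each subset of the dimensions. A small Python sketch (the function name `cuboid_lattice` is our own) lists the lattice for the four dimensions above, from the 4-D base cuboid down to the 0-D apex cuboid:

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dims):
    """Group every cuboid (a subset of dims) by how many dimensions it keeps."""
    return {k: list(combinations(dims, k)) for k in range(len(dims), -1, -1)}


lattice = cuboid_lattice(DIMENSIONS)
for k, cuboids in lattice.items():
    print(f"{k}-D: {len(cuboids)} cuboid(s)")
# 4-D: 1 (the base cuboid), 3-D: 4, 2-D: 6, 1-D: 4, 0-D: 1 (the apex cuboid)

base, apex = lattice[4][0], lattice[0][0]
print(base)  # ('time', 'item', 'location', 'supplier')
print(apex)  # ()
print(sum(len(c) for c in lattice.values()))  # 16, i.e. 2 ** 4
```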
OLAP
• Online Analytical Processing Server (OLAP) is
based on the multidimensional data model.
• It allows managers and analysts to gain insight into the information through fast, consistent, and interactive access to information.
• OLAP applications are used by a variety of functions within an organization:
– Finance and accounting
– Sales and Marketing
– Production
Working of OLAP
• OLAP has a very simple concept.
• It pre-calculates most of the queries that are
typically very hard to execute over tabular
databases, namely aggregation, joining, and
grouping.
• These queries are calculated during a process that is
usually called 'building' or 'processing' of the OLAP
cube.
• This process typically runs overnight, so by the time end users get to work, the data has been updated.
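The build-then-query idea can be sketched in Python as follows. This is a simplified illustration, not how any particular OLAP server works: the detail rows are hypothetical, `*` is our own placeholder for "all values of this dimension", and only one level of aggregation per dimension is precomputed.

```python
from collections import defaultdict

# Hypothetical detail rows: (item, quarter, dollars_sold).
ROWS = [
    ("Mobile", "Q1", 100),
    ("Mobile", "Q2", 150),
    ("Modem", "Q1", 200),
    ("Modem", "Q2", 250),
]
ALL = "*"  # placeholder meaning "aggregated over this dimension"


def build_cube(rows):
    """'Processing' step: scan the detail once and store every aggregate."""
    cube = defaultdict(int)
    for item, quarter, dollars in rows:
        for key in ((item, quarter), (item, ALL), (ALL, quarter), (ALL, ALL)):
            cube[key] += dollars
    return dict(cube)


def query(cube, item=ALL, quarter=ALL):
    """Query time: a dictionary lookup, with no grouping or re-scanning."""
    return cube.get((item, quarter), 0)


cube = build_cube(ROWS)              # e.g. run as a nightly batch job
print(query(cube, "Mobile", "Q1"))   # 100
print(query(cube, item="Mobile"))    # 250
print(query(cube, quarter="Q1"))     # 300
print(query(cube))                   # 700
```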
OLAP Guidelines
Difference between OLTP and OLAP
• OLTP (On-Line Transaction Processing) is characterized by a large number of short online transactions (INSERT, UPDATE, and DELETE).
• The main emphasis of OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second.
• An OLTP database holds accurate and current data, and the schema used to store transactional databases is the entity model (usually 3NF).
• OLAP (On-Line Analytical Processing) is characterized by a relatively low volume of transactions.
• Queries are often very complex and involve aggregations.
• Response time is the effectiveness measure.
• OLAP applications are widely used by data mining techniques.
• An OLAP database holds aggregated, historical data, stored in multidimensional schemas (usually a star schema).
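The contrast in workloads can be illustrated with Python's built-in `sqlite3` module. For brevity, a single in-memory table plays both roles here; in practice, as described above, OLTP and OLAP usually run against different databases and schemas. The table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, city TEXT, amount INTEGER)"
)

# OLTP-style work: many short transactions, each touching a single row.
orders = [
    ("Mobile", "Toronto", 100),
    ("Modem", "Toronto", 40),
    ("Mobile", "Vancouver", 80),
    ("Modem", "Vancouver", 60),
]
for item, city, amount in orders:
    with conn:  # commits this one-row transaction on success
        conn.execute(
            "INSERT INTO sales (item, city, amount) VALUES (?, ?, ?)",
            (item, city, amount),
        )
with conn:
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (120, 1))

# OLAP-style work: one read-only query that scans and aggregates many rows.
summary = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
print(summary)  # [('Toronto', 160), ('Vancouver', 140)]
```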
OLAP Operations in the Multidimensional Data Model
• In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies.
• This organization gives users the flexibility to view data from different perspectives.
• A number of OLAP data cube operations exist to materialize these different views, allowing interactive querying and analysis of the data at hand.
• Hence, OLAP provides a user-friendly environment for interactive data analysis.
Basic analytical operations of OLAP
Four types of analytical OLAP operations are:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-Up
• The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
• Roll-up is like zooming-out on the data cubes.
Example
• Consider the following cube, which counts, for each week, the days on which each temperature was recorded:
Temp     64  65  68  69  70  71  72  75  80  81  83  85
Week 1    1   0   1   0   1   0   0   0   0   0   1   0
Week 2    0   0   0   1   0   0   1   2   0   1   0   0
• Suppose we want to group the temperatures into levels: cool (64-69), mild (70-75), and hot (80-85).
• To do this, we group the columns and add up the values according to the concept hierarchy. This operation is known as roll-up.
• This gives the following cube:
Temperature  cool  mild  hot
Week 1          2     1    1
Week 2          1     3    1

The roll-up operation groups the information by levels of temperature.
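The roll-up above can be reproduced with a short Python sketch. It assumes the week-level table is stored as one list of counts per week, aligned with the list of temperatures, and encodes the concept hierarchy (cool, mild, hot) as a function; the names `TEMPS`, `WEEKS`, `level`, and `roll_up` are illustrative.

```python
# Week-level data from the table: days per week on which each temperature occurred.
TEMPS = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
WEEKS = {
    "Week 1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "Week 2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}


def level(temp):
    """Concept hierarchy: map a raw temperature to its level (ranges from the text)."""
    if 64 <= temp <= 69:
        return "cool"
    if 70 <= temp <= 75:
        return "mild"
    if 80 <= temp <= 85:
        return "hot"
    raise ValueError(f"no level defined for {temp}")


def roll_up(weeks, temps):
    """Climb the temperature hierarchy: add up the counts that share a level."""
    result = {}
    for week, counts in weeks.items():
        totals = {"cool": 0, "mild": 0, "hot": 0}
        for temp, count in zip(temps, counts):
            totals[level(temp)] += count
        result[week] = totals
    return result


rolled = roll_up(WEEKS, TEMPS)
print(rolled)
# {'Week 1': {'cool': 2, 'mild': 1, 'hot': 1}, 'Week 2': {'cool': 1, 'mild': 3, 'hot': 1}}
```

Note that rolling up never changes the total per week: Week 1 still sums to 4 days and Week 2 to 5.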
Drill-Down

• The drill-down operation (also called roll-down) is the reverse of roll-up.
• Drill-down is like zooming in on the data cube. It navigates from less detailed data to more detailed data.
• Drill-down can be performed by
either stepping down a concept hierarchy for
a dimension or adding additional dimensions.
• The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy defined as day, month, quarter, and year.
• Drill-down occurs by descending the time hierarchy from the level of quarter to the more detailed level of month.
• Because a drill-down adds more detail to the given data, it can also be performed by adding a new dimension to a cube.
• For example, a drill-down on the central cube of the figure can occur by introducing an additional dimension, such as customer group.
Example
• Drill-down adds more detail to the given data:
Temperature  cool  mild  hot
Day 1           0     0    0
Day 2           0     0    0
Day 3           0     0    1
Day 4           0     1    0
Day 5           1     0    0
Day 6           0     0    0
Day 7           1     0    0
Day 8           0     0    0
Day 9           1     0    0
Day 10          0     1    0
Day 11          0     1    0
Day 12          0     1    0
Day 13          0     0    1
Day 14          0     0    0
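A Python sketch of drill-down on this example: the day-level rows are kept, and drilling into a week returns the rows beneath it. The mapping of days 1 to 7 to Week 1 and days 8 to 14 to Week 2 is an assumption, since the text does not say which days belong to which week; the names `DAYS`, `week_of`, and `drill_down` are illustrative.

```python
# Day-level data from the example table: day -> (cool, mild, hot).
DAYS = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0),
    5: (1, 0, 0), 6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0),
    9: (1, 0, 0), 10: (0, 1, 0), 11: (0, 1, 0), 12: (0, 1, 0),
    13: (0, 0, 1), 14: (0, 0, 0),
}
LEVELS = ("cool", "mild", "hot")


def week_of(day):
    # Assumption: days 1-7 form Week 1 and days 8-14 form Week 2.
    return (day - 1) // 7 + 1


def drill_down(week):
    """Return the more detailed day-level rows that lie under one week."""
    return {
        day: dict(zip(LEVELS, counts))
        for day, counts in DAYS.items()
        if week_of(day) == week
    }


week2 = drill_down(2)
print(sorted(week2))  # [8, 9, 10, 11, 12, 13, 14]
# Summing the detail back up gives the week-level (rolled-up) row again.
print({lvl: sum(row[lvl] for row in week2.values()) for lvl in LEVELS})
# {'cool': 1, 'mild': 3, 'hot': 1}
```

Drill-down only works if the finer-grained data is still available; the week-level table alone cannot be split back into days.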
Slice

• A slice is a subset of a cube corresponding to a single value for one or more members of a dimension.
• For example, a slice operation is executed when the user wants a selection on one dimension of a three-dimensional cube, resulting in a two-dimensional slice.
• So, the slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
• For example, if we make the selection temperature = cool, we obtain the following cube:
Temperature  cool
Day 1           0
Day 2           0
Day 3           0
Day 4           0
Day 5           1
Day 6           0
Day 7           1
Day 8           0
Day 9           1
Day 10          0
Day 11          0
Day 12          0
Day 13          0
Day 14          0
Here, the slice is performed on the dimension "time" using the criterion time = "Q1".
It forms a new subcube by performing the selection on that one dimension.
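Using the day-level cube from the drill-down example, a slice fixes one value of the temperature dimension and keeps the time dimension. In this Python sketch the cube is a dictionary from day number to `(cool, mild, hot)` counts; that layout and the name `slice_cube` are our own choices.

```python
# Day-level cube from the drill-down example: day -> (cool, mild, hot).
DAYS = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0),
    5: (1, 0, 0), 6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0),
    9: (1, 0, 0), 10: (0, 1, 0), 11: (0, 1, 0), 12: (0, 1, 0),
    13: (0, 0, 1), 14: (0, 0, 0),
}
LEVELS = ("cool", "mild", "hot")


def slice_cube(cube, level):
    """Fix temperature = level; only the time (day) dimension remains."""
    idx = LEVELS.index(level)
    return {day: counts[idx] for day, counts in cube.items()}


cool = slice_cube(DAYS, "cool")
print([day for day, n in cool.items() if n])  # [5, 7, 9]
```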
Dice

• The dice operation defines a subcube by performing a selection on two or more dimensions.
• For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the original cube gives the following subcube (still two-dimensional):
Temperature  cool  hot
Day 3           0    1
Day 4           0    0

The dice operation on the cube based on the following selection criteria involves three dimensions:
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
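A dice applies selections on two dimensions at once. The Python sketch below uses the same illustrative dictionary layout (day number to `(cool, mild, hot)` counts) and reproduces the subcube for days 3 and 4 with levels cool and hot; `dice` is an illustrative name, not a library function.

```python
# Day-level cube from the drill-down example: day -> (cool, mild, hot).
DAYS = {
    1: (0, 0, 0), 2: (0, 0, 0), 3: (0, 0, 1), 4: (0, 1, 0),
    5: (1, 0, 0), 6: (0, 0, 0), 7: (1, 0, 0), 8: (0, 0, 0),
    9: (1, 0, 0), 10: (0, 1, 0), 11: (0, 1, 0), 12: (0, 1, 0),
    13: (0, 0, 1), 14: (0, 0, 0),
}
LEVELS = ("cool", "mild", "hot")


def dice(cube, days, levels):
    """Select on two dimensions: day in `days` AND temperature level in `levels`."""
    return {
        day: {lvl: counts[LEVELS.index(lvl)] for lvl in levels}
        for day, counts in cube.items()
        if day in days
    }


sub = dice(DAYS, days={3, 4}, levels=("cool", "hot"))
print(sub)  # {3: {'cool': 0, 'hot': 1}, 4: {'cool': 0, 'hot': 0}}
```

A slice is the special case where only one dimension is restricted, to a single value.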
Pivot

• The pivot operation is also called rotation.
• Pivot is a visualization operation that rotates the data axes in view to provide an alternative presentation of the data.
• It may involve swapping the rows and columns or moving one of the row dimensions into the column dimensions.
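Because pivot only changes the presentation, it can be sketched as swapping the row and column keys of a nested dictionary. This Python example (the function name `pivot` is illustrative) rotates the diced subcube so that temperature levels become rows and days become columns:

```python
# The diced subcube from the example: rows = days, columns = temperature levels.
SUBCUBE = {
    "Day 3": {"cool": 0, "hot": 1},
    "Day 4": {"cool": 0, "hot": 0},
}


def pivot(table):
    """Swap rows and columns: table[row][col] becomes result[col][row]."""
    result = {}
    for row, cols in table.items():
        for col, value in cols.items():
            result.setdefault(col, {})[row] = value
    return result


rotated = pivot(SUBCUBE)
print(rotated)  # {'cool': {'Day 3': 0, 'Day 4': 0}, 'hot': {'Day 3': 1, 'Day 4': 0}}
print(pivot(rotated) == SUBCUBE)  # True: rotating twice restores the original view
```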
Types of OLAP
• Relational OLAP (ROLAP):
ROLAP is an extended RDBMS together with a multidimensional data mapping, which lets it perform standard relational operations on multidimensional data.

• Multidimensional OLAP (MOLAP)
MOLAP implements operations directly on multidimensional data.

• Hybrid Online Analytical Processing (HOLAP)
In the HOLAP approach, the aggregated totals are stored in a multidimensional database while the detailed data is stored in the relational database.
This offers both the data efficiency of the ROLAP model and the performance of the MOLAP model.
• Desktop OLAP (DOLAP)
In Desktop OLAP, a user downloads part of the data from the database to their desktop and analyzes it locally.
DOLAP is relatively cheap to deploy, as it offers very few functions compared to other OLAP systems.

• Web OLAP (WOLAP)
Web OLAP is an OLAP system accessible via a web browser. WOLAP has a three-tier architecture consisting of three components: client, middleware, and database server.
• Mobile OLAP:
Mobile OLAP helps users access and analyze OLAP data using their mobile devices.
• Spatial OLAP (SOLAP):
SOLAP is designed to facilitate the management of both spatial and non-spatial data in a Geographic Information System (GIS).
ROLAP

• ROLAP works with data that exist in a relational database.
• Fact and dimension tables are stored as relational tables.
• It also allows multidimensional analysis of data and is the fastest-growing type of OLAP.
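The ROLAP idea of keeping facts and dimensions in relational tables can be sketched with Python's `sqlite3` module. The star schema below (`sales_fact`, `item_dim`, `location_dim`) and its rows are hypothetical; a roll-up to the city level becomes an ordinary join with `GROUP BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales_fact (
        item_key INTEGER REFERENCES item_dim (item_key),
        location_key INTEGER REFERENCES location_dim (location_key),
        dollars_sold INTEGER
    );
    INSERT INTO item_dim VALUES (1, 'Mobile', 'phone'), (2, 'Modem', 'network');
    INSERT INTO location_dim VALUES (1, 'Toronto'), (2, 'Vancouver');
    INSERT INTO sales_fact VALUES (1, 1, 100), (2, 1, 40), (1, 2, 80), (2, 2, 70);
""")

# A multidimensional request ("dollars sold per city") is answered with
# a join between the fact table and a dimension table, plus GROUP BY.
rows = conn.execute("""
    SELECT l.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
print(rows)  # [('Toronto', 140), ('Vancouver', 150)]
```

Every aggregate is computed by SQL at query time, which is the source of both ROLAP's scalability and its slower queries described below.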
Advantages of ROLAP model:
• High data efficiency: it offers high data efficiency because query performance and the access language are optimized particularly for multidimensional data analysis.
• Scalability: this type of OLAP system scales to large volumes of data, even when the data is steadily increasing.
Drawbacks of ROLAP model:
• Demand for higher resources: ROLAP needs high utilization of manpower, software, and hardware resources.
• Aggregate data limitations: ROLAP tools use SQL for all calculations of aggregate data, so computations that are hard to express in SQL are difficult to handle.
• Slow query performance: query performance in this model is slow when compared with MOLAP.
MOLAP
• MOLAP uses array-based multidimensional storage engines to display multidimensional views of data. Basically, it uses an OLAP cube.
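To contrast with ROLAP, the following Python sketch stores a tiny cube as a dense 3-D array, which is the general idea behind array-based storage. It is a simplification under our own assumptions (in-memory nested lists, hypothetical members for time, item, and city), not the design of any particular MOLAP engine. Note that empty cells are still allocated, which is why sparse cubes are a concern for this approach.

```python
# Hypothetical dimension members; each member is mapped to a fixed array position.
TIME = ["Q1", "Q2"]
ITEM = ["Mobile", "Modem"]
CITY = ["Toronto", "Vancouver"]
INDEX = [{member: pos for pos, member in enumerate(dim)} for dim in (TIME, ITEM, CITY)]

# Dense 3-D array of dollars_sold: every cell is allocated up front, even empty ones.
cube = [[[0 for _ in CITY] for _ in ITEM] for _ in TIME]


def position(time, item, city):
    """Translate dimension members into array coordinates (no joins involved)."""
    return tuple(index[m] for index, m in zip(INDEX, (time, item, city)))


def put(time, item, city, value):
    t, i, c = position(time, item, city)
    cube[t][i][c] = value


def get(time, item, city):
    t, i, c = position(time, item, city)
    return cube[t][i][c]


put("Q1", "Mobile", "Toronto", 100)
put("Q1", "Mobile", "Vancouver", 80)
put("Q2", "Modem", "Toronto", 40)

print(get("Q1", "Mobile", "Vancouver"))  # 80
# Aggregating over the city axis is a sum along one array dimension.
q1, mobile = INDEX[0]["Q1"], INDEX[1]["Mobile"]
print(sum(cube[q1][mobile]))  # 180
cells = len(TIME) * len(ITEM) * len(CITY)
filled = sum(1 for plane in cube for row in plane for v in row if v)
print(cells, filled)  # 8 3
```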
Hybrid OLAP
• Hybrid OLAP is a mixture of ROLAP and MOLAP. It offers the fast computation of MOLAP and the higher scalability of ROLAP. HOLAP uses two databases:
• Aggregated or computed data is stored in a multidimensional OLAP cube.
• Detailed information is stored in a relational
database.
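A minimal Python sketch of the HOLAP split, assuming an in-memory SQLite table for the detailed data and a plain dictionary standing in for the multidimensional store of aggregates. The table, rows, and the functions `total_sales` and `detail_rows` are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Relational side: detailed fact rows stay in a SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, city TEXT, item TEXT, dollars_sold INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Q1", "Toronto", "Mobile", 100),
        ("Q1", "Toronto", "Modem", 40),
        ("Q1", "Vancouver", "Mobile", 80),
        ("Q2", "Toronto", "Mobile", 90),
    ],
)

# Multidimensional side: totals per (quarter, city), precomputed once.
agg = defaultdict(int)
for quarter, city, total in conn.execute(
    "SELECT quarter, city, SUM(dollars_sold) FROM sales GROUP BY quarter, city"
):
    agg[(quarter, city)] = total


def total_sales(quarter, city):
    """Aggregate query: answered from the precomputed store."""
    return agg.get((quarter, city), 0)


def detail_rows(quarter, city):
    """Detail query: answered from the relational table."""
    return conn.execute(
        "SELECT item, dollars_sold FROM sales WHERE quarter = ? AND city = ? ORDER BY item",
        (quarter, city),
    ).fetchall()


print(total_sales("Q1", "Toronto"))  # 140
print(detail_rows("Q1", "Toronto"))  # [('Mobile', 100), ('Modem', 40)]
```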
Benefits of Hybrid OLAP:
• This kind of OLAP helps economize disk space, and it also remains compact, which helps avoid issues related to access speed and convenience.
• HOLAP uses cube technology, which allows faster performance for all types of data.
• ROLAP data is updated instantly, and HOLAP users have access to this real-time data. MOLAP brings cleaning and conversion of data, thereby improving data relevance. This gives the best of both worlds.
Drawbacks of Hybrid OLAP:
• Greater complexity: the major drawback of HOLAP systems is that they support both ROLAP and MOLAP tools and applications, which makes them very complicated.
• Potential overlaps: there is a higher chance of overlap, especially in functionality.
Advantages of OLAP
• OLAP is a platform for all types of business needs, including planning, budgeting, reporting, and analysis.
• Information and calculations are consistent in an OLAP cube. This
is a crucial benefit.
• Quickly create and analyze “What if” scenarios.
• Easily search OLAP database for broad or specific terms.
• OLAP provides the building blocks for business modeling tools, data mining tools, and performance reporting tools.
• It allows users to slice and dice cube data by various dimensions, measures, and filters.
• It is good for analyzing time series.
• Finding clusters and outliers is easy with OLAP.
• It is a powerful visual online analytical processing system that provides fast response times.
Disadvantages of OLAP
• OLAP requires organizing data into a star or
snowflake schema. These schemas are
complicated to implement and administer.
• You cannot have a large number of dimensions in a single OLAP cube.
• Transactional data cannot be accessed with an OLAP system.
• Any modification in an OLAP cube needs a full update of the cube. This is a time-consuming process.
