Data warehousing is a process that involves collecting and managing data from various sources in a structured manner to support decision-making. It includes features such as being subject-oriented, integrated, time-variant, and non-volatile, allowing for efficient reporting and analysis. The document also distinguishes between Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP), highlighting their different purposes and characteristics.

Uploaded by

popww654
DATA WAREHOUSING

Introduction:
• Our capabilities for both generating and collecting data have been increasing rapidly
over the last several decades.
• Contributing factors include the widespread use of bar codes for most commercial
products, the computerization of many business, scientific, and government transactions,
and advances in data collection tools ranging from scanned text and image platforms to
satellite remote sensing systems.
• Popular use of the World Wide Web as a global information system has flooded us with
a tremendous amount of data and information.
• This explosive growth in stored data has generated an urgent need for new techniques
and automated tools that can intelligently assist us in transforming the vast amounts of data
into useful information and knowledge.
• Management of data is one of the important objectives of computer science.
• To be managed efficiently, data needs to be stored in a suitable architecture.
• Data warehousing helps in this respect by storing data in multiple dimensions.
• Definition:
1. A Data Warehouse is a repository of information collected from multiple
sources, stored under a unified schema, and which usually resides at a
single site.
2. A Data Warehouse is a repository of subjectively selected and adapted
operational data which can answer any ad hoc, complex, statistical or
analytical queries.
3. A Data Warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management’s decision
making process.
• Data Warehouse refers to a database that is maintained separately
from an organization’s operational databases.
• Data Warehouse systems allow for the integration of a variety of
application systems.
• They support information processing by providing a solid platform
of consolidated historical data for analysis.
• Data Warehouse is a repository of an organization’s electronically
stored data.
• Data Warehouses are designed to facilitate reporting & analysis.
• Features:
1. Subject Oriented:
• Data is arranged and optimized to provide answers to questions from
diverse functional areas.
• A DW is organized around major subjects like customer, supplier, product and
sales.
• It focuses on the modeling and analysis of data for decision makers and not
on the day-to-day operations and transaction processing of an organization.
• A DW typically provides a simple and concise view of particular subject
issues by excluding data that are not useful in the decision support process.
• For example, to learn more about your company's sales data, you can build
a warehouse that concentrates on sales. Using this warehouse, you can
answer questions like "Who was our best customer for this item last year?"
2. Integrated:
• A DW is constructed by integrating multiple, heterogeneous data
sources such as relational databases, flat files, and on-line
transaction records.
• The integration must resolve problems such as naming conflicts and
inconsistencies among units of measure.
• Data cleaning and data integration techniques are applied to
ensure consistency in naming conventions, encoding structures,
attribute measures, etc. among different data sources.
• E.g., hotel prices recorded in different currencies are converted to a
common form when the data is moved to the warehouse.
3. Time Variant:
• The time horizon for the data warehouse is significantly longer
than that of operational systems.
• Operational database: current value data.
• Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years).
• Every important element in the data warehouse contains time
either explicitly or implicitly.
4. Nonvolatile:
• Nonvolatile means that, once entered into the warehouse, data should not
change.
• This is logical because the purpose of a warehouse is to enable you to
analyze what has occurred.
• A DW is a physically separate store of data transformed from the operational
environment.
• Because operational updates of data do not occur in the data warehouse
environment, it does not require transaction processing, recovery,
and concurrency control mechanisms.
• It requires only two operations in data accessing:
• Initial loading of data
• Access of data
5. Accessible:
 The primary purpose of data warehouse is to provide readily accessible
information to end users.

6. Process Oriented:
 It is important to view data warehousing as a process for delivery of
information.
 The maintenance of DW is ongoing and iterative in nature.
Characteristics:
• Smaller number of users.
• Instant response is less important (it matters only when composing reports interactively).
• Read-only access by users.
• Most data access will be targeted at a small partition of the data: the last
month or quarter.
• Database access less frequent but executing large and complicated queries
that access many rows per table.
• Irregular workload of primarily long-running, complex read-only transactions
instead of a high, constant transaction rate.
• Load from operational data store will only insert new records, existing ones
do not get changed (updated).
• Bulk load from operational data store, no single-record inserts (at most once
daily).
• Database design partly de-normalized and redundant for better performance,
using a star or snowflake schema. Database design is data-driven, not
workflow-driven.
• Large storage capacity for historical data .
• May also contain aggregate data.
Benefits of data warehousing
Some of the benefits that a data warehouse provides are as follows:
• A data warehouse provides a common data model for all data of interest
regardless of the data's source.
• DW makes it easier to report and analyze information than it would be if
multiple data models were used to retrieve information such as sales invoices,
order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and
resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users
so that, even if the source system data is purged over time, the information in
the warehouse can be stored safely for extended periods of time.
• Because they are separate from operational systems, data warehouses provide
retrieval of data without slowing down operational systems.
• Data warehouses can work in conjunction with and, hence, enhance the value of
operational business applications, notably customer relationship management
(CRM) systems.
• Data warehouses facilitate decision support system applications such as trend
reports (e.g., the items with the most sales in a particular area within the last
two years), exception reports, and reports that show actual performance versus
goals.
Data Warehousing:
• Data warehousing is a process of constructing and using
data warehouses.
• The classic definition of the data warehouse focuses on
data storage.
• However, the means to retrieve and analyze data, to
extract, transform and load data, and to manage the
data dictionary are also considered essential
components of a data warehousing system.
• Many references to data warehousing use this broader
context.
• Thus, an expanded definition of data warehousing
includes business intelligence tools, tools to extract,
transform, and load data into the repository, and tools to
manage and retrieve metadata.
Extract, Transform, and Load (ETL) is a process in data
warehousing that involves:
• extracting data from outside sources,
• transforming it to fit business needs
• loading it into the end target, i.e. the data warehouse.
1) Extract:
– The first part of an ETL process is to extract the data from the source
systems.
– Most data warehousing projects consolidate data from different source
systems.
– Each separate system may also use a different data organization
format.
– Common data source formats are relational databases and flat files.
– Extraction converts the data into a format for transformation
processing.
• An intrinsic part of extraction is parsing the extracted data
to check whether it meets an expected pattern or
structure. If not, the data may be rejected entirely.
2) Transform:
• The transform stage applies a series of rules or functions to the extracted
data.
• Some data sources will require very little or even no manipulation of data.
• In other cases, one or more of the following transformation types may be
required to meet the business and technical needs of the end target:
– Selecting only certain columns to load (or selecting null columns not to
load).
– Translating coded values (e.g., if the source system stores 1 for male and 2
for female, but the warehouse stores M for male and F for female).
– Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to "M").
– Deriving a new calculated value (e.g., sale_amount = qty * unit_price)
– Filtering
– Sorting
– Joining together data from multiple sources.
– Aggregation.
– Transposing or pivoting (turning multiple columns into multiple rows or vice
versa)
– Splitting a column into multiple columns (e.g., putting a comma-separated
list specified as a string in one column as individual values in different
columns)
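A few of the transformation types listed above can be sketched in a handful of lines of Python. This is only an illustrative sketch: the row layout, the column names (`id`, `gender`, `qty`, `unit_price`, `notes`) and the filtering rule are assumptions made for the example, not part of any particular ETL tool.

```python
# Hypothetical extracted rows; the column names are assumptions for illustration.
source_rows = [
    {"id": 1, "gender": 1, "qty": 3, "unit_price": 2.5, "notes": "x"},
    {"id": 2, "gender": 2, "qty": 1, "unit_price": 10.0, "notes": "y"},
    {"id": 3, "gender": 1, "qty": 0, "unit_price": 4.0, "notes": "z"},
]

# Translating coded values: the source stores 1/2, the warehouse stores M/F.
GENDER_CODES = {1: "M", 2: "F"}


def transform(rows):
    """Apply column selection, code translation, derivation, filtering, sorting."""
    result = []
    for row in rows:
        # Filtering: skip rows that record no sale (an assumed business rule).
        if row["qty"] <= 0:
            continue
        result.append({
            # Selecting only certain columns: "notes" is not carried over.
            "id": row["id"],
            "gender": GENDER_CODES[row["gender"]],
            # Deriving a new calculated value.
            "sale_amount": row["qty"] * row["unit_price"],
        })
    # Sorting by the derived value, largest sale first.
    return sorted(result, key=lambda r: r["sale_amount"], reverse=True)


transformed = transform(source_rows)
for row in transformed:
    print(row)
```

Each rule is kept as a separate, commented step so that it maps back to one item of the list; a real pipeline would usually drive such rules from metadata rather than hard-code them.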
3) Load:
• The load phase loads the data into the end target, usually
being the data warehouse.
• Depending on the requirements of the organization, this
process ranges widely. Some data warehouses might
weekly overwrite existing information with cumulative,
updated data, while other DW (or even other parts of the
same DW) might add new data in a historized form, e.g.
hourly.
• As the load phase interacts with a database, the
constraints defined in the database schema as well as in
triggers activated upon data load apply (e.g. uniqueness,
referential integrity, mandatory fields), which also
contribute to the overall data quality performance of the
ETL process.
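The effect of schema constraints during the load phase can be illustrated with Python's built-in `sqlite3` module. The table `sales`, its columns, and the sample batch below are hypothetical; the sketch only shows how uniqueness and mandatory-field constraints reject bad records while the rest of the batch is loaded.

```python
import sqlite3

# In-memory database standing in for the warehouse (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,  -- uniqueness
        customer    TEXT NOT NULL,        -- mandatory field
        sale_amount REAL NOT NULL
    )
    """
)

# Hypothetical batch: one duplicate key and one missing mandatory field.
batch = [
    (1, "alice", 7.5),
    (2, "bob", 10.0),
    (2, "carol", 3.0),   # violates the primary key
    (3, None, 4.0),      # violates NOT NULL
]

loaded = 0
rejected = []
for record in batch:
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", record)
        loaded += 1
    except sqlite3.IntegrityError as exc:
        # The database, not the ETL code, decides that the record is invalid.
        rejected.append((record, str(exc)))
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(loaded, "loaded;", len(rejected), "rejected")
```

Collecting rejected records instead of aborting the whole batch is one possible design; some warehouses prefer to fail the entire load so that a batch is either fully present or absent.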
OLAP & OLTP
OLTP: Online transaction processing.
• OLTP refers to a class of systems that facilitate and manage
transaction-oriented applications, typically for data entry and
retrieval transaction processing.
• OLTP is used to refer to processing in which the system responds
immediately to user requests.
• The major task of OLTP is to perform online transaction and query
processing.
• OLTP systems cover most of the day-to-day operations of an organization
such as purchasing, inventory, manufacturing, banking, payroll,
registration and accounting.
• An automatic teller machine (ATM) for a bank is an example of a
commercial transaction processing application.
• Benefits
• Online Transaction Processing has two key benefits:
simplicity and efficiency.
• Reduced paper trails and the faster, more accurate
forecasts for revenues and expenses are both examples
of how OLTP makes things simpler for businesses.
• It also provides a concrete foundation for a stable
organization because of the timely updating.
• OLTP is considered efficient because it vastly broadens the
consumer base for an organization and makes the individual
processes faster.
• Disadvantages
• It is a great tool for any organization, but in using OLTP,
there are a few things to be wary of: the security issues
and economic costs.
• One of the benefits of OLTP also contributes to a
potential problem: the worldwide availability that this
system provides to companies makes their databases that
much more susceptible to intruders and hackers.
• The economic cost includes the potential for server failures,
which can cause delays or even wipe out large
amounts of data.
• OLAP: Online Analytical Processing.
• Online Analytical Processing, or OLAP, is an approach to quickly
provide answers to analytical queries that are multi-dimensional
in nature.
• OLAP organizes and presents data in various formats in order to
accommodate the diverse needs of the different users.
• It serves users or knowledge workers in the role of data analysis
and decision making.
• The typical applications of OLAP are in business reporting for
sales, marketing, management reporting, business process
management (BPM), budgeting and forecasting, financial
reporting and similar areas.
• Databases configured for OLAP employ a multidimensional data
model, allowing for complex analytical and ad-hoc queries with a
rapid execution time.
The major distinguishing features between OLTP & OLAP are:
• Users & System Orientation:
OLTP: is customer-oriented and is used for transaction processing.
OLAP: is market-oriented and is used for data analysis.

• Data contents:
OLTP: manages current data that are typically too detailed to be easily
used for decision making.
OLAP: manages large amounts of historical data and provides
facilities for summarization & aggregation.

• Database Design:
OLTP: adopts an Entity Relationship data model.
OLAP: adopts either a star or snowflake model.

• View:
OLTP: focuses mainly on the current data within an enterprise or
department.
OLAP: focuses on historical data.

• Access Patterns:
OLTP: consists of short, atomic transactions and requires concurrency
control and recovery mechanisms.
OLAP: consists mostly of read-only operations.
Feature            OLTP                             OLAP
-----------------  -------------------------------  --------------------------------
characteristic     operational processing           information processing
orientation        transaction                      analysis
users              clerk, IT professional           knowledge worker
function           day-to-day operations            decision support
DB design          application-oriented, ER based   subject-oriented, star/snowflake
data               current, up-to-date              historical
view               detailed                         summarized
focus              data in                          information out
usage              repetitive                       ad hoc
access             read/write                       lots of scans
unit of work       short, simple transaction        complex query
records accessed   tens                             millions
number of users    thousands                        hundreds
DB size            100 MB to GB                     100 GB to TB
Multidimensional Data Model:
• A data model is a way to describe data and to issue queries
against it.
• DW & OLAP tools are based on a multi-dimensional data
model.
• This model views data in the form of a data cube.

Data Cube:
• Data cube allows data to be modeled and viewed in multiple
dimensions.
• It is defined by dimensions and facts.
Dimensions:
• Dimensions are the perspectives or entities with respect to which an
organization wants to keep records.
• E.g., a sales data warehouse may keep records of the
store’s sales with respect to the dimensions time, item, branch, and
location.
• Each dimension may have a table associated with it, called a
dimension table, which further describes the dimension.
• E.g., a dimension table for item may contain the attributes
item_name, brand, type, etc.
Facts:
• A multidimensional data model is organized around a
central theme, e.g., sales.
• This theme is represented by a fact table.
• Facts are numerical measures.
• They are the quantities by which we want to analyze
relationships between dimensions.
• E.g., facts for a sales DW include dollars_sold, units_sold,
amt_budgeted.
• The fact table contains the names of the facts, or
measures, as well as keys to each of the related
dimension tables.
2 D View
location = “Vancouver”

                 item (type)
Time (quarter)   home ent.   computer   phone   security
Q1               605         825        14      400
Q2               680         925        31      512
Q3               812         1023       30      501
Q4               927         1038       38      580
3 D View
location = “Chicago”
                 item (type)
Time (quarter)   home ent.   computer   phone   security
Q1               605         825        14      400
Q2               680         925        31      512
Q3               812         1023       30      501
Q4               927         1038       38      580

location = “New York”
                 item (type)
Time (quarter)   home ent.   computer   phone   security
Q1               1087        968        38      872
Q2               1130        1024       41      925
Q3               1034        1048       45      1002
Q4               1142        1091       54      984

location = “Toronto”
                 item (type)
Time (quarter)   home ent.   computer   phone   security
Q1               818         746        43      591
Q2               894         769        52      682
Q3               940         795        58      728
Q4               978         864        59      784

location = “Vancouver”
                 item (type)
Time (quarter)   home ent.   computer   phone   security
Q1               605         825        14      400
Q2               680         925        31      512
Q3               812         1023       30      501
Q4               927         1038       38      580
A 3D data cube representation of the data according to the
dimensions time, item and location. The measure displayed is
dollars_sold.
[Figure: 3-D data cube with axes location (Chicago, New York, Toronto, Vancouver),
time (Q1 to Q4), and items (Home Ent, Comp, Phone, Security)]
• In data warehousing literature, a data cube such
as each of the above is referred to as a cuboid.
• The cuboid that holds the lowest level of
summarization is called the base cuboid.
• The topmost 0-D cuboid, which holds the
highest level of summarization, is called the apex
cuboid.
Cube: A Lattice of Cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location;
item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier;
item,location,supplier
4-D (base) cuboid: time, item, location, supplier
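The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions, so n dimensions give 2^n cuboids when concept hierarchies are ignored. The following is a minimal Python sketch; the function name `cuboid_lattice` is made up for this illustration.

```python
from itertools import combinations

# The four dimensions used in the lattice above.
dimensions = ("time", "item", "location", "supplier")


def cuboid_lattice(dims):
    """Map each level k to the list of k-D cuboids (subsets of size k)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboid_lattice(dimensions)
total = sum(len(cuboids) for cuboids in lattice.values())

for level, cuboids in lattice.items():
    print(f"{level}-D cuboids: {cuboids}")
print("total cuboids:", total)
```

Level 0 holds the single empty combination, which plays the role of the apex cuboid "all", and level 4 holds the base cuboid.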
Schemas for Multidimensional Database
•Database schema consists of a set of entities and the relationships
between them.
•The entity relationship data model is commonly used in the design
of relational database.
•Such data model is appropriate for on-line transaction processing.
•A data warehouse requires a concise, subject oriented schema that
facilitates on-line data analysis.
•The most popular data model for a data warehouse is a
multidimensional model.
•Such a model can exist in the form of a star schema, snowflake
schema or a fact constellation schema.
Star Schema:
• It is the most common modeling paradigm.
• In a star schema, the DW contains:
• A large central table (fact table) containing the bulk of the data
with no redundancy.
• Facts are numerical measures.
• A set of smaller attendant tables (dimension tables), one
for each dimension.
• Dimensions are the perspectives or entities with respect to
which an organization wants to keep records.
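Example of Star Schema
[Figure: star schema with the Sales Fact Table at the center and four dimension tables around it]
time dimension table: time_key, day, day_of_the_week, month, quarter, year
item dimension table: item_key, item_name, brand, type, supplier_type
branch dimension table: branch_key, branch_name, branch_type
location dimension table: location_key, street, city, province_or_state, country
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales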
• Sales are considered along four dimensions, i.e. time, item, branch, and
location.
• The schema contains a central fact table for sales that contains
keys to each of the four dimensions, along with three measures:
dollars_sold, units_sold, and avg_sales.
• In a star schema each dimension is represented by only one table,
and each table contains a set of attributes.
• E.g., the location dimension table contains the attribute set
{location_key, street, city, province_or_state, country}.
• This constraint may introduce some redundancy.
• E.g., “Vancouver” and “Victoria” are both cities in the Canadian
province of British Columbia.
• Entries for such cities in the location dimension table will create
redundancy among the attributes province_or_state and country.
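The star schema described above can be tried out with Python's built-in `sqlite3` module. This is a reduced sketch: the table names carry a `_dim` or `_fact` suffix chosen for this example, only a few attributes per dimension are kept, and the inserted rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per dimension (attributes trimmed for brevity).
    CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
    CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
    CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT,
                               province_or_state TEXT, country TEXT);

    -- Central fact table: one key per dimension plus the measures.
    CREATE TABLE sales_fact (
        time_key     INTEGER REFERENCES time_dim(time_key),
        item_key     INTEGER REFERENCES item_dim(item_key),
        branch_key   INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        dollars_sold REAL,
        units_sold   INTEGER
    );

    -- Invented sample rows.
    INSERT INTO time_dim   VALUES (1, 'Q1', 2024);
    INSERT INTO item_dim   VALUES (1, 'laptop', 'computer');
    INSERT INTO branch_dim VALUES (1, 'B1');
    -- Both cities repeat the same province and country: the redundancy noted above.
    INSERT INTO location_dim VALUES (1, 'Vancouver', 'British Columbia', 'Canada');
    INSERT INTO location_dim VALUES (2, 'Victoria',  'British Columbia', 'Canada');
    INSERT INTO sales_fact VALUES (1, 1, 1, 1, 605.0, 10);
    INSERT INTO sales_fact VALUES (1, 1, 1, 2, 100.0, 2);
    """
)

# A typical star join: the fact table joined to one dimension, then aggregated.
totals = conn.execute(
    """
    SELECT l.province_or_state, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY l.province_or_state
    """
).fetchall()
print(totals)
```

Because every dimension is a single table, any analysis needs at most one join per dimension, which is the main attraction of the star layout.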
Snowflake schema:
• It is a variant of the star schema model.
• Here dimension tables are normalized thereby further splitting the
data into additional tables.
• The resulting schema graph forms a shape similar to a snowflake.
• The dimension tables of the snowflake model may be kept in
normalized form to reduce redundancies.
• Such tables are easy to maintain.
• A snowflake structure can reduce the effectiveness of browsing, since
more joins will be needed to execute a query.
• System performance may be adversely impacted.
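Example of Snowflake Schema
[Figure: snowflake schema with the Sales Fact Table at the center; the item and location dimensions are normalized]
time dimension table: time_key, day, day_of_the_week, month, quarter, year
item dimension table: item_key, item_name, brand, type, supplier_key
supplier dimension table: supplier_key, supplier_type
branch dimension table: branch_key, branch_name, branch_type
location dimension table: location_key, street, city_key
city dimension table: city_key, city, province_or_state, country
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales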
• The sales fact table is identical to that of the star
schema.
• The main difference between these two schemas is in
the definition of dimension tables.
• The single dimension table for item in the star schema is
normalized in the snowflake schema, resulting in new item
and supplier tables.
Fact Constellation:
• Sophisticated applications may require multiple fact
tables to share dimension tables.
• This kind of schema can be viewed as a collection of
stars, and hence is called a galaxy schema or a fact
constellation.
Example of Fact Constellation
[Figure: fact constellation with two fact tables, Sales and Shipping, sharing dimension tables]
time dimension table: time_key, day, day_of_the_week, month, quarter, year
item dimension table: item_key, item_name, brand, type, supplier_type
branch dimension table: branch_key, branch_name, branch_type
location dimension table: location_key, street, city, province_or_state, country
shipper dimension table: shipper_key, shipper_name, location_key, shipper_type
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
Measures: dollars_cost, units_shipped
Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of
low level concepts to higher level, more general concepts.
• Consider a concept hierarchy for the dimension location.
• City values for location include Vancouver, Toronto, New York, and
Chicago.
• Each city can be mapped to the province or state to which it belongs.
• E.g., Vancouver can be mapped to British Columbia, and
Chicago to Illinois.
• The provinces and states can in turn be mapped to the country to which they
belong, such as Canada or USA.
• These mappings form a concept hierarchy for the dimension
location, mapping a set of low-level concepts (i.e. cities) to higher-
level, more general concepts (i.e. countries).
• Location
[Figure: a concept hierarchy for location with the levels all, country, province or state, and city]
all
  Canada
    British Columbia: Vancouver, Victoria
    Ontario: Ottawa, Toronto
  USA
    New York: New York, Buffalo
    Illinois: Chicago

[Figure: hierarchy for location (street, city, province or state, country)
and a lattice for time (day, week, month, quarter, year)]
OLAP Operations
- A multidimensional data model allows data to be
stored in multiple dimensions.
- Each dimension contains multiple levels of
abstraction defined by concept hierarchies.
- This lets users view data from different
perspectives.
- There are a number of OLAP operations to materialize
these different views.
[Figure: the central sales data cube with axes location (Chicago, New York, Toronto, Vancouver),
time (Q1 to Q4), and items (Home Ent, Comp, Phone, Security)]
Roll Up:
• It is also called the drill-up operation.
• It performs aggregation on a data cube, either by
climbing up a concept hierarchy for a dimension
or by dimension reduction.
Roll Up
[Figure: roll-up on location from cities to countries; the location axis of the cube now
shows USA and Canada, with time (Q1 to Q4) and items (Home Ent, Comp, Phone, Security) unchanged]
• The roll-up operation shown aggregates the data by
ascending the location hierarchy from the level of city to
the level of country.
• Rather than grouping the data by city, the resulting cube
groups the data by country.
• When roll-up is performed by dimension reduction, one
or more dimensions are removed from the given cube.
• E.g., consider a sales data cube containing the two dimensions
location and time. Roll-up may be performed by removing
the time dimension.
• This results in an aggregation of the total sales by
location.
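Both forms of roll-up can be sketched in plain Python over a small two-dimensional cube. The cube is represented here as a dictionary keyed by (city, quarter); that representation, the helper names, and the choice of the home entertainment figures from the tables above are assumptions made for this illustration.

```python
from collections import defaultdict

# Concept hierarchy for location: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "New York": "USA",
    "Chicago": "USA",
}

# (city, quarter) -> dollars_sold for home entertainment.
cube = {
    ("Vancouver", "Q1"): 605, ("Vancouver", "Q2"): 680,
    ("Toronto", "Q1"): 818, ("Toronto", "Q2"): 894,
    ("New York", "Q1"): 1087, ("New York", "Q2"): 1130,
}


def roll_up_location(cells):
    """Roll up by climbing the location hierarchy from city to country."""
    rolled = defaultdict(int)
    for (city, quarter), value in cells.items():
        rolled[(CITY_TO_COUNTRY[city], quarter)] += value
    return dict(rolled)


def roll_up_drop_time(cells):
    """Roll up by dimension reduction: remove the time dimension."""
    rolled = defaultdict(int)
    for (city, _quarter), value in cells.items():
        rolled[city] += value
    return dict(rolled)


by_country = roll_up_location(cube)
by_city = roll_up_drop_time(cube)
print(by_country)
print(by_city)
```

Drill-down is the inverse direction and cannot be computed from the rolled-up result alone; it needs the more detailed cells to be available.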
Drill Down:
• It is the reverse of roll up.
• It navigates from less detailed data to more detailed
data.
• It can be done either by stepping down concept
hierarchy or introducing additional dimensions.
Drill Down
[Figure: drill-down on time from quarters to months; the time axis of the cube now
shows Jan to Dec, with location and items (Home Ent, Comp, Phone, Security) unchanged]
• Drill down occurs by descending the time hierarchy from
the level of quarter to more detailed level of month.
• The resulting data cube details the total sales per month
rather than summarized by quarters.
• Since drill down adds more detail to the given data it can
also be performed by adding new dimensions to a cube.
• E.g., an additional dimension such as customer_type can be
introduced.
Slice :
• The slice operation performs a selection on one dimension of the
given cube resulting in a sub cube.
• The following figure shows a slice operation where the sales data are
selected from the central cube for the dimension time using
the criterion time = “Q1”.

[Figure: slice for time = “Q1”; a 2-D table of location (Chicago, New York, Toronto, Vancouver)
by item, in which the Vancouver row reads 605, 825, 14, 440]
Dice:
• The dice operation defines a sub cube by performing a selection on two or more
dimensions.
• The following figure shows a dice operation on the central cube based on the following
selection criteria, which involve 3 dimensions:
• (location = “Toronto” or “Vancouver”) and (time = “Q1” or “Q2”) and (item =
“home entertainment” or “computer”).
[Figure: dice result; a sub cube with location (Toronto, Vancouver), time (Q1, Q2),
and item (type) (Home Ent, Comp)]
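Slice and dice are both selections on the cube, differing only in how many dimensions are constrained. A minimal Python sketch follows, assuming the cube is stored as a dictionary keyed by (location, time, item); the `select` helper is a name invented here, and the values are taken from the tables earlier in this document.

```python
ITEMS = ("home entertainment", "computer", "phone", "security")

# (location, quarter) -> dollars_sold per item type, in ITEMS order.
rows = {
    ("Vancouver", "Q1"): (605, 825, 14, 400),
    ("Vancouver", "Q2"): (680, 925, 31, 512),
    ("Toronto", "Q1"): (818, 746, 43, 591),
    ("Toronto", "Q2"): (894, 769, 52, 682),
    ("New York", "Q1"): (1087, 968, 38, 872),
    ("New York", "Q3"): (1034, 1048, 45, 1002),
}

# Flatten into cells keyed by (location, time, item).
cube = {
    (location, quarter, item): value
    for (location, quarter), values in rows.items()
    for item, value in zip(ITEMS, values)
}


def select(cells, locations=None, times=None, items=None):
    """Keep the cells whose coordinates fall inside every given set."""
    def allowed(value, choices):
        return choices is None or value in choices

    return {
        key: value
        for key, value in cells.items()
        if allowed(key[0], locations)
        and allowed(key[1], times)
        and allowed(key[2], items)
    }


# Slice: a selection on one dimension.
q1_slice = select(cube, times={"Q1"})

# Dice: a selection on two or more dimensions.
diced = select(
    cube,
    locations={"Toronto", "Vancouver"},
    times={"Q1", "Q2"},
    items={"home entertainment", "computer"},
)
print(len(q1_slice), len(diced))
```

Using one helper for both operations is a design choice of this sketch; it makes visible that a slice is simply a dice with a single constrained dimension.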
Pivot:
• Pivot is also called rotate.
• It is a visualization operation that rotates the data axes in
view in order to provide an alternative presentation of
the data.
• The following figure shows a pivot operation where the item
and location axes in a 2-D slice are rotated.
[Figure: pivoted 2-D slice with item (type) as rows (Home Ent, Computer, Phone, Security)
and location (cities) as columns (Chicago, New York, Toronto, Vancouver);
the values shown are 605, 825, 14, and 400]
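For a 2-D slice, pivoting amounts to swapping the row and column axes. The following Python sketch assumes the slice is held as a nested dictionary (location, then item); the `pivot` function is a hypothetical helper written for this example, using the Q1 figures for Vancouver and Toronto from the tables above.

```python
# 2-D slice for time = "Q1": rows are locations, columns are item types.
slice_2d = {
    "Vancouver": {"home entertainment": 605, "computer": 825, "phone": 14, "security": 400},
    "Toronto": {"home entertainment": 818, "computer": 746, "phone": 43, "security": 591},
}


def pivot(table):
    """Rotate the axes of a 2-D table: rows become columns and vice versa."""
    rotated = {}
    for row_label, columns in table.items():
        for column_label, value in columns.items():
            rotated.setdefault(column_label, {})[row_label] = value
    return rotated


by_item = pivot(slice_2d)
for item, per_location in by_item.items():
    print(item, per_location)
```

No value changes during a pivot; only the presentation does, which is why pivoting twice returns the original table.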
Other OLAP operations:
• Drill across:- Executes queries involving more than one
fact table.
• Drill through:- Makes use of relational SQL facilities to
drill through the bottom level of a data cube down to its
back-end relational tables.
• Ranking the top N or bottom N items in lists.
• Computing moving averages, growth rates, interests,
internal rates of returns, depreciation, currency
conversions, and statistical functions.
• OLAP offers analytical modeling capabilities including a
calculation engine for deriving ratios, variance.
• It can generate summarizations, aggregations etc.
• OLAP supports functional models for forecasting, trend
analysis, and statistical analysis.
Data warehouse Architecture

A data warehouse provides business analysts with:

1. Competitive advantage:
It presents relevant information from which to measure performance
and make critical adjustments to win over competitors.
2. Enhanced business productivity:
It is able to quickly and effectively gather information that accurately
describes the organization.
3. Customer relationship management:
It provides a consistent view of customers and items across all lines of
business, all departments, and all markets.
4. Cost reduction:
It tracks trends, patterns, and exceptions over long periods of time.
A three tier Data Warehouse Architecture

[Figure: three-tier data warehouse architecture]
Bottom Tier (data sources and data storage): operational databases and external sources
are extracted, transformed, loaded, and refreshed into the data warehouse and data marts,
alongside a monitor & integrator and a metadata repository.
Middle Tier (OLAP engine): OLAP server(s) that serve the stored data.
Top Tier (front-end tools): analysis, query, reports, and data mining.
Bottom Tier:
• The bottom tier is a warehouse database server.
• It is almost always a relational database system.
• Data from operational databases and external sources are extracted
using application program interfaces known as gateways.
• A gateway allows client programs to generate SQL code to be
executed at a server.
• Examples of gateways are ODBC (Open Database Connectivity), OLE DB (Object
Linking and Embedding, Database) by Microsoft, and JDBC (Java Database
Connectivity).
Middle Tier:
• The middle tier is an OLAP server.
• It is typically implemented using either:
1. ROLAP: an extended relational DBMS that maps operations on
multidimensional (MD) data to standard relational operations.
2. MOLAP: a special-purpose server that directly implements
multidimensional data and operations.

Top Tier:
• The top tier is a client which contains query and reporting tools,
analysis tools, and data mining tools (e.g., for trend analysis and
prediction).
Data Warehouse Models:
• There are three data warehouse models:

1. Enterprise warehouse:
• It collects all of the information about subjects spanning the entire
organization.
• It provides corporate-wide data integration from one or more
operational systems or external sources.
• It is cross functional in scope.
• It contains detailed data as well as summarized data.
• It requires extensive business modeling and may take years to
design and build.
2. Data Marts:
• It contains a subset of corporate-wide data that is of value to
a specific group of users.
• The scope is limited to specific selected subjects.
• E.g. Marketing data mart may confine its subjects to customer,
item, sales etc.
• The data in data mart is summarized.
• Depending on the source of data, data marts can be categorized
as:
A. Independent:
• These are sourced from data captured from one or more
operational systems or external sources, or from data generated
locally within a particular department.
B. Dependent:
• These are sourced directly from enterprise data warehouse.
OLAP SERVER

• Online Analytical Processing (OLAP) is a category of software tools that provides analysis of
data stored in a data warehouse.
• OLAP tools enable users to analyze different dimensions of multidimensional data.
• For example, they provide time series and trend analysis views.
• The chief component of OLAP is the OLAP server, which sits between a client and a
database management system (DBMS).
• The OLAP server understands how data is organized in the database and has
special functions for analyzing the data.
• An OLAP server is a high-capacity, multi-user data manipulation engine specifically
designed to support and operate on multi-dimensional data structures.
• A multi-dimensional structure is arranged so that every data item
is located and accessed based on the intersection of the dimension
members which define that item.
• An OLAP server presents business users with multidimensional data
from a data warehouse or data marts, without concern for
how or where the data are stored.
• The design of the server and the structure of the data are
optimized for rapid ad-hoc information retrieval in any orientation,
as well as for fast, flexible calculation and transformation of raw
data based on formulaic relationships.
• The OLAP Server may either physically stage the processed multi-
dimensional information to deliver consistent and rapid response
times to end users, or it may populate its data structures in real-
time from relational or other databases, or offer a choice of both.
Relational OLAP (ROLAP):
• ROLAP servers are intermediate servers that stand between a
relational back-end server and client front-end tools.
• They use a relational DBMS to store and manage
warehouse data, and OLAP middleware to support
the missing pieces.
• This methodology relies on manipulating the data
stored in the relational database to give the
appearance of traditional OLAP's slicing and dicing
functionality.
• In essence, each action of slicing and dicing is
equivalent to adding a "WHERE" clause to the SQL
statement.
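The correspondence between slicing or dicing and a WHERE clause can be demonstrated with Python's built-in `sqlite3` module. The sketch below is not how any specific ROLAP product generates SQL; the table `sale` and its columns `prod_id`, `date_key`, and `amount` are names chosen for this example, loaded with the three rows of the small sale relation shown in the ROLAP server figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prod_id TEXT, date_key INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [("p1", 1, 62), ("p2", 1, 19), ("p1", 2, 48)],
)

# Slice: fix one dimension (date_key = 1) with a WHERE clause.
slice_rows = conn.execute(
    """
    SELECT prod_id, SUM(amount)
    FROM sale
    WHERE date_key = ?
    GROUP BY prod_id
    ORDER BY prod_id
    """,
    (1,),
).fetchall()

# Dice: constrain two dimensions by adding further conditions.
dice_total = conn.execute(
    """
    SELECT SUM(amount)
    FROM sale
    WHERE date_key IN (1, 2) AND prod_id = ?
    """,
    ("p1",),
).fetchone()[0]

print(slice_rows, dice_total)
```

Each extra constraint from the front-end tool becomes one more condition in the generated query, which is why ROLAP performance depends directly on the relational engine.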
ROLAP Server
• Relational OLAP Server
[Figure: tools connect to a ROLAP server (with utilities), which sits on top of a relational DBMS]
Example relation sale:
prodId   date   sum
p1       1      62
p2       1      19
p1       2      48
Advantages:
1. Can handle large amounts of data:
The data size limitation of ROLAP technology is the limitation
on data size of the underlying relational database. In other words,
ROLAP itself places no limitation on data amount.
2. Can leverage functionalities inherent in the relational database:
Often, the relational database already comes with a host of
functionalities. ROLAP technologies, since they sit on top of the
relational database, can therefore leverage these functionalities.
Disadvantages:
1. Performance can be slow:
Because each ROLAP report is essentially a SQL query (or
multiple SQL queries) in the relational database, the query time can
be long if the underlying data size is large.
2. Limited by SQL functionalities:
Because ROLAP technology mainly relies on generating SQL
statements to query the relational database, and SQL statements
do not fit all needs (for example, it is difficult to perform complex
calculations using SQL), ROLAP technologies are therefore
traditionally limited by what SQL can do.
Multidimensional OLAP (MOLAP):
• This is the more traditional way of OLAP analysis.
• These servers support multidimensional views of data through
array based multidimensional storage engines.
• They map multidimensional views directly to data cube array
structure.
MOLAP Server
• Multi-Dimensional OLAP Server
[Figure: M.D. tools connect to a multi-dimensional server (with utilities); the example Sales cube
has the dimensions Product (milk, soda, eggs, soap), Date (1 to 4), and City (A, B)]
Advantages:
1. Excellent performance:
MOLAP cubes are built for fast data retrieval and are optimal
for slicing and dicing operations.
2. Can perform complex calculations:
All calculations have been pre-generated when the cube is
created. Hence, complex calculations are not only doable, but
they return quickly.
Disadvantages:
1. Limited in the amount of data it can handle:
Because all calculations are performed when the cube is
built, it is not possible to include a large amount of data in the
cube itself. A cube can still be derived from a large amount of
data, but in that case only summary-level information will
be included in the cube itself.
2. Requires additional investment:
Cube technology is often proprietary and does not already
exist in the organization. Therefore, to adopt MOLAP
technology, chances are that additional investments in human and
capital resources are needed.
Hybrid OLAP (HOLAP):
• Combines ROLAP and MOLAP technology.
• Benefits from greater scalability of ROLAP and the faster
computation of MOLAP.
• E.g. A HOLAP server may allow large volumes of detail data to be
stored in a relational database, while aggregations are kept in a
separate MOLAP store.
