Q. What is data warehousing?
(4m)
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of
data that is used to support strategic decision making.
Q. Differentiate between OLTP and OLAP.
Operational database (Online Transaction Processing [OLTP]): insert, update, and delete
operations run continuously.
Data warehouse (Online Analytical Processing [OLAP]): data is mostly read and analysed
(queried, aggregated) rather than updated.
Feature              OLTP                                 OLAP
Users                Clerk, IT professional               Knowledge worker
Function             Day-to-day operations                Decision support
DB design            Application-oriented                 Subject-oriented
Data                 Current, up-to-date, detailed,       Historical, summarized,
                     flat relational, isolated            multidimensional, integrated,
                                                          consolidated
Usage                Repetitive                           Ad hoc
Access               Read/write; index/hash               Lots of scans
Unit of work         Short, simple transaction            Complex query
#Records accessed    Tens                                 Millions
#Users               Thousands                            Hundreds
DB size              100 MB - GB                          100 GB - TB
Metric               Transaction throughput               Query throughput, response time
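To make the "Unit of work" and "Access" rows concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, its columns, and its data are hypothetical, and both workloads run against one in-memory database purely for illustration; in practice the OLTP system and the warehouse are separate databases.

```python
import sqlite3

# Hypothetical "accounts" table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts (id, branch, balance) VALUES (?, ?, ?)",
    [(1, "North", 500.0), (2, "North", 300.0), (3, "South", 200.0)],
)
conn.commit()

# OLTP-style unit of work: a short, simple read/write transaction that touches
# a few records located through the primary-key index.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 3")

# OLAP-style query: read-only, scans all rows and aggregates them.
summary = conn.execute(
    "SELECT branch, SUM(balance), COUNT(*) FROM accounts GROUP BY branch ORDER BY branch"
).fetchall()
print(summary)  # [('North', 700.0, 2), ('South', 300.0, 1)]
```

The OLTP statements each change one row by key; the OLAP query touches every row, which at warehouse scale (millions of records) is why throughput is measured per query rather than per transaction.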
Data warehouse back-end tools and utilities
Data extraction
o Get data from multiple, heterogeneous, and external sources
Data cleaning
o Detect errors in the data and rectify them when possible
Data transformation
o Convert data from legacy or host format to warehouse format
Load
o Sort, summarize, consolidate, compute views, check integrity, and build indices
and partitions
Refresh
o Propagate the updates from the data source to the warehouse
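The extract, clean, transform, load, and refresh steps can be sketched as a tiny in-memory pipeline. Everything here is hypothetical: the two sources, their field names, the date formats, and the dictionary standing in for the warehouse. A real warehouse would use dedicated ETL tooling and a database, not Python lists.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sources: A uses DD/MM/YYYY dates and string amounts, B uses ISO dates.
source_a = [
    {"store": "Delhi", "date": "05/01/2024", "amount": "120.50"},
    {"store": " delhi", "date": "05/01/2024", "amount": "30"},  # inconsistent name
    {"store": "Delhi", "date": "06/01/2024", "amount": ""},     # missing amount
]
source_b = [{"store": "Mumbai", "date": "2024-01-05", "amount": 80.0}]


def extract():
    """Extraction: pull rows from every source and tag each with its origin."""
    return [dict(r, src="A") for r in source_a] + [dict(r, src="B") for r in source_b]


def clean(rows):
    """Cleaning: rectify fixable errors (store names), drop unfixable rows."""
    cleaned = []
    for r in rows:
        if r["amount"] in ("", None):
            continue  # error detected, but no way to rectify it here
        cleaned.append(dict(r, store=r["store"].strip().title()))
    return cleaned


def transform(rows):
    """Transformation: convert each source's format into one warehouse format."""
    out = []
    for r in rows:
        fmt = "%d/%m/%Y" if r["src"] == "A" else "%Y-%m-%d"
        out.append({
            "store": r["store"],
            "day": datetime.strptime(r["date"], fmt).date(),
            "amount": float(r["amount"]),
        })
    return out


def load(rows, warehouse):
    """Load: sort, then consolidate amounts into a (store, day) summary."""
    for r in sorted(rows, key=lambda r: (r["store"], r["day"])):
        warehouse[(r["store"], r["day"])] += r["amount"]
    return warehouse


def refresh(new_rows, warehouse):
    """Refresh: propagate only new source rows into the existing warehouse."""
    return load(transform(clean(new_rows)), warehouse)


warehouse = load(transform(clean(extract())), defaultdict(float))
refresh([{"store": "Mumbai", "date": "2024-01-05", "amount": 20.0, "src": "B"}], warehouse)
for key, total in sorted(warehouse.items()):
    print(key, total)
```

Refresh pushes only the new rows; re-running the full load over all sources would double-count, which is one reason real refresh processes track what has already been propagated.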
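Normalization
Min-max normalization: linearly maps a value v of attribute A from [min_A, max_A] to a new
range [new_min_A, new_max_A]:
$v' = \frac{v - min_A}{max_A - min_A}(new\_max_A - new\_min_A) + new\_min_A$
where v = value to change.
Z-score (zero-mean) normalization: normalizes v using the mean $\bar{A}$ and standard
deviation $\sigma_A$ of attribute A:
$v' = \frac{v - \bar{A}}{\sigma_A}$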
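Min-max and z-score normalization can be sketched in a few lines of Python. The income values are made up. The sketch assumes the population standard deviation (`statistics.pstdev`) for z-score; some texts use the sample standard deviation instead, which gives slightly different numbers.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min_A, max_A] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max is undefined when all values are equal")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


incomes = [200, 300, 400, 600, 1000]  # hypothetical attribute values
print(min_max(incomes))  # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score(incomes))
```

For example, with min = 200 and max = 1000, the value 600 maps to (600 - 200) / 800 = 0.5 on [0, 1]. After z-score normalization the values have mean 0, so they sum to (approximately) zero.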
Types of OLAP servers: ROLAP, MOLAP, HOLAP
ROLAP: Relational Online Analytical Processing
MOLAP: Multidimensional Online Analytical Processing
HOLAP: Hybrid Online Analytical Processing
S.No  ROLAP                                        MOLAP
1.    Relational Online Analytical Processing      Multidimensional Online Analytical Processing
2.    Used for large data volumes                  Used for limited data volumes
3.    Access is slow                               Access is fast
4.    Data is stored in relational tables          Data is stored in multidimensional arrays
5.    Data is fetched from the data warehouse      Data is fetched from MDDBs
                                                   (multidimensional databases)
6.    Complex SQL queries are used                 Sparse-matrix techniques are used
7.    A static multidimensional view of data       A dynamic multidimensional view of data
      is created                                   is created
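The storage difference (rows 4 to 6) can be sketched with the same tiny fact set held two ways. The city/quarter/units data is hypothetical. The ROLAP side stores rows in a relational table and aggregates with SQL; the MOLAP side stores a dense multidimensional array addressed by dimension positions. Real MOLAP engines compress empty cells with sparse-matrix techniques, which this dense list-of-lists does not do.

```python
import sqlite3

# Hypothetical facts: (city, quarter, units sold).
facts = [("Delhi", "Q1", 10), ("Delhi", "Q2", 15), ("Mumbai", "Q1", 7)]

# ROLAP style: facts live in a relational table; aggregates come from SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, quarter TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap_total = conn.execute(
    "SELECT city, SUM(units) FROM sales GROUP BY city ORDER BY city"
).fetchall()

# MOLAP style: facts live in an array indexed by dimension member positions.
cities = ["Delhi", "Mumbai"]
quarters = ["Q1", "Q2"]
cube = [[0] * len(quarters) for _ in cities]
for city, quarter, units in facts:
    cube[cities.index(city)][quarters.index(quarter)] += units
# cube[1][1] (Mumbai, Q2) is an empty cell: this is where sparsity comes from.

molap_total = [(c, sum(cube[i])) for i, c in enumerate(cities)]
print(rolap_total)  # [('Delhi', 25), ('Mumbai', 7)]
print(molap_total)  # [('Delhi', 25), ('Mumbai', 7)]
```

Both give the same answer; the array version reaches a cell by direct indexing, which is why MOLAP access is fast, while its size grows with the product of the dimension sizes, which is why it suits limited data volumes.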
Basis                          ROLAP                                 MOLAP
Storage location for summary   Relational database                   Multidimensional database
  aggregations
Processing time                Very slow                             Fast
Storage space requirement      Large (compared to MOLAP and HOLAP)   Medium (compared to ROLAP and HOLAP)
Storage location for detail    Relational database                   Multidimensional database
  data
Latency                        Low (compared to MOLAP and HOLAP)     High (compared to ROLAP and HOLAP)
Query response time            Slow (compared to MOLAP and HOLAP)    Fast (compared to ROLAP and HOLAP)
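Multidimensional data model (data cube, cuboids)
Schemas for the multidimensional model:
1. Star schema:
a fact table in the middle connected to a set of dimension tables
2. Snowflake schema:
a refinement of the star schema in which some dimension tables are normalized into
smaller tables
3. Fact constellation:
multiple fact tables share dimension tables (also called a galaxy schema)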
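A star schema can be sketched in SQLite: one fact table holding foreign keys and measures, surrounded by dimension tables. The table names, columns, and rows below are hypothetical. The query is a typical "star join" that joins the fact table to the dimensions it needs and groups by their attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical): one row per member of each dimension.
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- Fact table in the middle: a foreign key to each dimension plus measures.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        units_sold INTEGER,
        dollars_sold REAL
    );

    INSERT INTO dim_time VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
    INSERT INTO dim_item VALUES (1, 'TV', 'Acme'), (2, 'Phone', 'Acme');
    INSERT INTO dim_location VALUES (1, 'Delhi', 'India'), (2, 'Mumbai', 'India');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 5, 2500.0), (1, 2, 2, 10, 3000.0), (2, 1, 2, 2, 1000.0);
""")

# Star join: fact table joined to the dimensions it needs, grouped by their attributes.
rows = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_time t ON f.time_key = t.time_key
    JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()
print(rows)  # [('Q1', 'Delhi', 2500.0), ('Q1', 'Mumbai', 3000.0), ('Q2', 'Mumbai', 1000.0)]
```

In a snowflake variant, `dim_location` might be split into a city table that references a separate country table; a fact constellation would add a second fact table (say, shipments) sharing `dim_time` and `dim_item`.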
OLAP operations
Roll-up
Performs aggregation on a data cube, either by climbing up a concept hierarchy for a
dimension or by dimension reduction.
Drill-down
The reverse of roll-up: navigates from less detailed to more detailed data, either by
stepping down a concept hierarchy for a dimension or by introducing additional dimensions.
Pivot (rotate), also called transpose
A visualization operation that rotates the data axes in view to provide an alternative
presentation of the data.
Slice
Performs a selection on one dimension of the given cube, resulting in a sub-cube.
Dice
Defines a sub-cube by performing a selection on two or more dimensions.
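The five OLAP operations can be sketched on a toy cube stored as a Python dictionary that maps (city, quarter, item) to units sold. The data and the city-to-country concept hierarchy are hypothetical, and dimensions are addressed by tuple position. An OLAP server would run these operations on much larger, indexed or precomputed cubes.

```python
from collections import defaultdict

# Hypothetical base cube: (city, quarter, item) -> units sold.
cube = {
    ("Delhi", "Q1", "TV"): 5,
    ("Delhi", "Q2", "TV"): 3,
    ("Mumbai", "Q1", "Phone"): 10,
    ("Mumbai", "Q1", "TV"): 6,
    ("Toronto", "Q1", "TV"): 4,
}
# Hypothetical concept hierarchy for location: city < country.
city_to_country = {"Delhi": "India", "Mumbai": "India", "Toronto": "Canada"}


def roll_up_location(cube):
    """Roll-up by climbing the location hierarchy from city to country."""
    out = defaultdict(int)
    for (city, quarter, item), units in cube.items():
        out[(city_to_country[city], quarter, item)] += units
    return dict(out)


def roll_up_drop(cube, axis):
    """Roll-up by dimension reduction: sum out the dimension at position `axis`."""
    out = defaultdict(int)
    for key, units in cube.items():
        out[key[:axis] + key[axis + 1:]] += units
    return dict(out)


def drill_down_location(cube, country):
    """Drill-down from country to city: return the finer city-level cells
    behind one country's totals (needs the detailed base cube)."""
    return {k: v for k, v in cube.items() if city_to_country[k[0]] == country}


def slice_(cube, axis, value):
    """Slice: select one value on one dimension; that dimension is dropped."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}


def dice(cube, conditions):
    """Dice: keep cells satisfying a selection on several dimensions.
    `conditions` maps a dimension position to its set of allowed values."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in allowed for axis, allowed in conditions.items())}


def pivot(view):
    """Pivot a 2-D view {(row, col): value} by swapping its two axes."""
    return {(c, r): v for (r, c), v in view.items()}


by_country = roll_up_location(cube)
print(by_country[("India", "Q1", "TV")])  # 11
city_by_quarter = roll_up_drop(cube, 2)  # sum out the item dimension
print(pivot(city_by_quarter)[("Q1", "Mumbai")])  # 16
print(dice(cube, {0: {"Delhi", "Mumbai"}, 1: {"Q1"}}))
```

Drill-down is written here as a filter back into the base cube, which only works because the detailed data was kept; when a finer level was never stored, drilling down is not possible. The trailing underscore in `slice_` avoids shadowing Python's built-in `slice`.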
Data warehouse information flow
Inflow: processes that bring data into the warehouse
Cleaning: removing inconsistencies, adding missing fields, and cross-checking for data
integrity.
Transforming: adding date/time stamp fields, summarizing detailed data, and deriving new
fields to store calculated data.
Upflow: processes that add value to the data in the warehouse through
Summarizing
o Select (choose), project, join, and group data
o Summarize: identify trends, clustering, sampling
Packaging
o Convert summarized data into usable forms: spreadsheets, documents, charts, graphs,
databases, animations, etc.
Distribution
o Distribute data to the appropriate groups to increase its availability and accessibility
Bitmap Indices
A special type of index used for attributes with a small number of distinct values (low
cardinality), e.g. Gender: M, F.
Each distinct value gets its own bit vector with one bit per record; bit i is 1 if record i
has that value.
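A bitmap index can be sketched in Python using plain integers as arbitrary-length bit vectors. The gender and city columns are hypothetical sample data. The point of the sketch is that a multi-condition query becomes a bitwise AND of the relevant bit vectors, and a count becomes a count of set bits.

```python
# Hypothetical records; the row id is the list position.
genders = ["M", "F", "F", "M", "F"]
cities = ["Delhi", "Delhi", "Mumbai", "Mumbai", "Delhi"]


def build_bitmap_index(column):
    """One bit vector per distinct value; bit i is set if row i has that value.
    Python ints act as arbitrary-length bit vectors."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_from_bits(bits):
    """Decode a bit vector back into the row ids whose bits are set."""
    return [row for row in range(bits.bit_length()) if (bits >> row) & 1]


gender_idx = build_bitmap_index(genders)  # {'M': 0b01001, 'F': 0b10110}
city_idx = build_bitmap_index(cities)

# WHERE gender = 'F' AND city = 'Delhi' becomes one bitwise AND.
match = gender_idx["F"] & city_idx["Delhi"]
print(rows_from_bits(match))  # [1, 4]
print(bin(match).count("1"))  # 2 matching rows
```

With only two genders, the whole index is two bit vectors of one bit per row, which is why bitmap indices pay off for low-cardinality attributes and become wasteful when an attribute has many distinct values.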