Overview of Data Warehousing and OLAP

This document provides an overview of data warehousing and OLAP. It discusses the key characteristics of a data warehouse, including being subject oriented, integrated, and non-volatile. It also describes the general architecture of a data warehouse, including extracting data from various sources, cleaning and transforming the data, and storing it in a multidimensional structure for analysis. Finally, it compares different types of data warehouse models like virtual warehouses, data marts, and enterprise warehouses.

Uploaded by

Shiva Parvanda

Overview of Data Warehousing and OLAP

Payal Wankhede, Aswini Chinnenahalli Siddareddy, Divyasri Gundala, Varasri Boddupalli, Juhi
Parvanda

Oakland University, Database Systems I, Winter 2017

Abstract-Data warehouses and OLAP are important elements in decision support. A data warehouse is a system for reporting and data analysis. This paper gives an overview of data warehousing and OLAP. It discusses in detail data warehouse architecture, characteristics, data modelling and functionalities. It also compares views with data warehouses and lists the difficulties of implementing data warehouses. The paper also covers ideas from different research papers to enrich its content.

Index Terms-Data Warehouse, Data modelling, Views, OLAP & OLTP systems.

I. INTRODUCTION
A data warehouse is a single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a way that they can understand and use in a business context.
● A data warehouse is a database that is kept separate from the organization's operational database, so that changes made in the operational database do not affect the data warehouse and vice versa.
● An operational database undergoes frequent updates, whereas a data warehouse is not updated frequently.
● It holds consolidated historical data, which helps the organization to analyze its business.
● A data warehouse helps executives to organize, understand, and use their data to take strategic decisions.
● Data warehouse systems help in the integration of a diversity of application systems.

Characteristics of a data warehouse

Subject Oriented-A data warehouse is subject oriented because it provides data that gives information about a particular subject instead of about a company's ongoing operations.

Integrated-A data warehouse is integrated because it provides data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole for further processing.

Non-volatile-Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

Time Variant-Historical data is kept in a data warehouse. The focus on change over time is what is meant by the term time variant.

Online Analytical Processing Server (OLAP)
● An Online Analytical Processing (OLAP) server is based on the multidimensional data model.
● It allows managers and analysts to gain insight into the information through fast, consistent, and interactive access to information.

II. CHARACTERISTICS OF DATA WAREHOUSES

● Multidimensional conceptual view.
● Unlimited dimensions and aggregation levels.
● Unrestricted cross-dimensional operations.
● Dynamic sparse matrix handling.
● Client/server architecture.
● Multiuser support.
● Accessibility.
● Transparency.
● Intuitive data manipulation.
● Inductive and deductive analysis.
● Flexible distributed reporting.

Because they encompass large volumes[1].
General architecture of a data warehouse

Fig 1: Data Warehouse architecture

As we can see, the first layer is the Data Source layer, which refers to various data stores in multiple formats such as relational databases, Excel files and others. These stores can consist of different types of data: operational data, including business data like Sales, Customer, Finance, Product and others; web server logs; Internet research data; and data relating to third parties, such as census and survey data.

The next step is Extract, where the data from the data sources is extracted and put into the warehouse staging area. The extracted data is minimally cleaned with no major transformations.

Then comes the Staging area, which is divided into two stages: data cleaning and data ordering. As the name suggests, this layer takes care of data processing methods, i.e. cleaning (removing data redundancy, filtering bad data) and ordering (allowing proper integration) of data. Overall, this stage allows application of business intelligence logic to transform transactional data into analytical data. It is the most time-consuming phase in the whole data warehouse (DWH) architecture and is the chief process between the data source and the presentation layer of the DWH.

Finally, we have the Data Presentation layer, which is the target data warehouse: the place where the successfully cleaned, integrated, transformed and ordered data is stored in a multi-dimensional environment. Now the data is available for analysis and query purposes. The information is also available to end users in the form of data marts.

Data Warehouse Models
From the perspective of data warehouse architecture, we have the following data warehouse models:

● Virtual Warehouse
● Data mart
● Enterprise Warehouse

Virtual Warehouse
● A set of views over operational databases is known as a virtual warehouse.
● It is easy to build a virtual warehouse.
● Building a virtual warehouse requires excess capacity on operational database servers.

Data Mart
A data mart contains a subset of organization-wide data. This subset of data is valuable to specific groups of an organization.

Points to remember about data marts:
● The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.
● Data marts are small in size.
● Data marts are customized by department.
● The source of a data mart is a departmentally structured data warehouse.
● Data marts are flexible.

Enterprise Warehouse
● An enterprise warehouse collects all the information and the subjects spanning an entire organization.
● It provides enterprise-wide data integration.
● The data is integrated from operational systems and external information providers.
● This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or beyond.

III. DATA MODELLING FOR DATA WAREHOUSES

“A data model is a graphical view of data created for analysis and design purposes”. There are three levels of data modelling: the conceptual data model, the logical data model and the physical data model. In the figures below we can see the conceptual, logical, and physical levels of a single data model.
A. Conceptual data model
This model helps in identifying the highest-level relationships among the different entities.

Fig 2: A conceptual data model

B. Logical data model
This is a fully-attributed data model which describes data in full detail and is independent of DBMS, technology, data storage or organizational constraints.

Fig 3: A logical data model

C. Physical data model
This is a fully-attributed data model which denotes how the model will be built in the database and is dependent on a specific version of a data persistence technology.

Fig 4: A physical data model

We can see the complexity grow from conceptual to logical to physical. This is the reason we always start with the conceptual model, so that we can better understand the entities in our data and how they relate to each other. Next, we move on to the logical model to understand the details of the data, and finally we look at the physical model to know how to implement our data model in a database.

TABLE I
COMPARISON OF DIFFERENT LEVELS OF DATA MODEL

D. Benefits of using data model in data warehouse
● Reduced cost
● Quicker time to market
● Clearer scope
● Faster performance
● Better documentation
● Fewer application errors

E. Dimensional model
Data warehouses combine several different data sources in multidimensional structures in support of the decision-making process. Traditional databases generally deal with two-dimensional data, similar to a spreadsheet. Compared with the two-dimensional model, the multi-dimensional data storage model is much more efficient in query performance. Examples of dimensions used in a corporate data warehouse include fiscal periods, product categories and geographic regions.
The figures below show examples of the two-dimensional and multidimensional models.

Fig 5: A two-dimensional matrix model
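As a rough illustration of the difference between a two-dimensional table and a multidimensional structure, the sketch below stores the same sales figures as a cube in which every value is addressed by one member from each dimension. The product codes, regions, quarters and revenue figures are invented for illustration and are not taken from the figures in this paper; `build_cube` and `two_dimensional_view` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, fiscal quarter, revenue).
ROWS = [
    ("P123", "East", "Q1", 100),
    ("P123", "West", "Q1", 80),
    ("P124", "East", "Q1", 50),
    ("P123", "East", "Q2", 120),
    ("P124", "West", "Q2", 70),
]


def build_cube(rows):
    """Index each revenue value by its three dimension members."""
    cube = defaultdict(int)
    for product, region, quarter, revenue in rows:
        cube[(product, region, quarter)] += revenue
    return cube


def two_dimensional_view(cube, quarter):
    """Fix one quarter to obtain a spreadsheet-like product-by-region matrix."""
    matrix = defaultdict(dict)
    for (product, region, q), revenue in cube.items():
        if q == quarter:
            matrix[product][region] = revenue
    return {product: dict(regions) for product, regions in matrix.items()}


cube = build_cube(ROWS)
q1_matrix = two_dimensional_view(cube, "Q1")
```

A single cell is read with all three indexes at once, for example `cube[("P123", "East", "Q2")]`, while `q1_matrix` is the flat, spreadsheet-style table for one quarter only.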
Fig 6: A three-dimensional data cube model

We can change the data cube from one dimensional orientation to another and analyse data in multiple views using operations such as pivot, roll-up, drill-down and slice.

The “Pivot” operation
The pivot operation is mainly used to change the data cube from one dimensional orientation to another. Pivot is also called rotation.

Fig 7: Pivot operation

In the above figure we pivoted the data cube to show regional sales revenue as rows, fiscal quarter revenue totals as columns and company products in the third dimension. By doing so, it presents data in terms of region vs. fiscal quarter, product by product.

The “Roll-up” operation
The two basic hierarchical operations to view data at multiple combinations are the roll-up and drill-down operations. Roll-up refers to the process of grouping data into larger units.

Fig 8: The roll-up operation

In the above figure the roll-up operation aggregates data for all products numbered 123, 124, 125 and so on into 1XX, etc.

The “Drill-down” operation
“Drill-down” refers to the process of breaking data down into increased detail.

Fig 9: Drill-down operation

In the above example, after applying the drill-down operation, products are divided into styles and regions into sub-regions, which helps us to look into more details.

The “Slice” operation
The slice operation is mainly used to refer to a two-dimensional view of a three- or higher-dimensional data cube.
The term “Slice & Dice” implies breaking a body of data into smaller chunks or views to examine it from multiple angles or viewpoints.

E. Advantage of multi dimensional over relational database
Relational model
A relational model mainly uses a two-dimensional structure of rows and columns to store data in tables. Tables can be linked by common key values.

Fig 10: A relational model

From the above table we can easily see that the relational table structure can only tell us that there are three fields (student, name and result) and that there are nine records, nothing more than that.

Multidimensional model
The concept of dimensional data modelling was developed by Ralph Kimball and is comprised of fact and dimension tables. In a multidimensional database model, the data is presented to the user in such a way that the multi-dimensional array represents a higher-level organization where each individual data value is contained within a cell accessible by multiple indexes. In the fact table, each tuple is a recorded fact, and a dimension table consists of tuples of attributes of the dimension. A fact table is a collective view of transaction data, whereas each dimension table represents the “master data” that those transactions belong to.

Fig 11: A multi dimensional model

We can see from the above figure that there is no need to have result as a dimension, because the exam results are held within the cells of the database structure.
The multi-dimensional model helps in removing the duplication found in the relational table, where each student name is repeated three times, once for each exam. In the multi-dimensional view, by contrast, the student name and the exam become dimensions, or in effect indexes into that data.

F. Multidimensional Schemas
The two common multidimensional schemas are the star schema and the snowflake schema. A star schema consists of a fact table with a single table for each dimension. A snowflake schema is a variation of the star schema in which the dimension tables from a star schema are organized into a hierarchy by normalizing them. In the following figures we can see examples of star and snowflake schemas.

Fig 12: A star schema with fact and dimensional tables

Fig 13: A snowflake schema.

A fact constellation is a set of fact tables that share some dimension tables. However, fact constellations limit the possible queries for the warehouse.

Fig 14: A fact constellation.

In the above figure we can see that the Product dimension table is shared by two fact tables. Data warehouses utilize indexing to support high-performance access. Bitmap indexes are widely used in data warehousing environments, where users primarily query the data rather than update it. Master Data Management (MDM) refers to the process of creating and managing data that an organization must have as a single master copy, called the master data. The purpose of MDM is to define standards, processes, policies and governance issues related to the critical data entities of the organization.

G. A CMCD framework for data warehouse modelling

Many researchers are working on data warehousing concepts. Several surveys show that a significant percentage of data warehouse projects fail to meet their business needs. The main reason is the lack of any formal relationship between the contextual requirements analysis and the data warehouse conceptual modeling phase: during the requirements analysis phase, designers obtain only a partial view of the problem, which creates a gap between IT specialists and business analysts. It is therefore important to consider business goals during the requirements analysis phase. The research paper [2] provides a CMCD framework (from conceptual model to conceptual design) to automatically extract a data warehouse multidimensional model from goal modeling frameworks and business process diagrams. This helps the designer to reduce the risk of inaccuracies between contextual business requirements analysis and DW modeling design, leading to a more successful decision support project.
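To make the star schema and the roll-up and drill-down operations from this section more concrete, the following minimal sketch builds one fact table and two dimension tables in an in-memory SQLite database using Python's standard `sqlite3` module. All table names, column names and values are assumptions made for this example; they are not taken from the schemas shown in the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive "master data".
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE region_dim  (region_id  INTEGER PRIMARY KEY, name TEXT);
-- The fact table holds one row per recorded fact, keyed by the dimensions.
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim(product_id),
    region_id  INTEGER REFERENCES region_dim(region_id),
    quarter    TEXT,
    revenue    INTEGER
);
INSERT INTO product_dim VALUES
    (123, 'Widget A', '1XX'), (124, 'Widget B', '1XX'), (201, 'Gadget', '2XX');
INSERT INTO region_dim VALUES (1, 'East'), (2, 'West');
INSERT INTO sales_fact VALUES
    (123, 1, 'Q1', 100), (124, 1, 'Q1', 50), (201, 2, 'Q1', 70), (123, 2, 'Q2', 80);
""")


def revenue_by_category(connection):
    """Roll-up: aggregate product-level facts to the coarser category level."""
    rows = connection.execute(
        """
        SELECT p.category, SUM(f.revenue)
        FROM sales_fact AS f JOIN product_dim AS p USING (product_id)
        GROUP BY p.category ORDER BY p.category
        """
    )
    return rows.fetchall()


def revenue_by_product(connection, category):
    """Drill-down: list the individual products inside one category."""
    rows = connection.execute(
        """
        SELECT p.name, SUM(f.revenue)
        FROM sales_fact AS f JOIN product_dim AS p USING (product_id)
        WHERE p.category = ?
        GROUP BY p.name ORDER BY p.name
        """,
        (category,),
    )
    return rows.fetchall()
```

Calling `revenue_by_category(conn)` summarizes products 123 and 124 under the invented category `1XX`, in the spirit of the roll-up example above, and `revenue_by_product(conn, "1XX")` goes back down to the individual products.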
IV. BUILDING A DATA WAREHOUSE

In developing a data warehouse, builders should take a broad view of the anticipated use of the warehouse. There is no way to anticipate every possible query or analysis during the design phase. However, the design should specifically support ad hoc querying; that is, accessing data with any combination of values for the attributes that would be meaningful in the dimension or fact tables.

Major Building Blocks of Data warehouse
A. Extraction Transformation and Loading
Data Extraction
The process of extracting data from distributed applications in business and departmental units across the organization and importing it into the data warehouse is called ETL. The initial step of the procedure is the extraction of data from operational information sources. These data sources are normally databases; however, sometimes information is stored in flat files or XML documents.

Data Extraction Strategies
● Full Extraction
● Partial Extraction with update notification
● Partial Extraction without update notification

Data Transformation
The transformation procedure requires conversion and standardization of information. This procedure can be automated with ETL software. This software applies functions and a series of rules to the extracted data. The series of rules configured in the ETL software ensures that the data is in the correct format and error free. This process of transformation is also called cleansing.

Data extracted to the server is raw data and cannot be used as it is; it should be cleansed, mapped and transformed. The transformation tasks to be performed are selection, matching, data cleansing or consolidation. The process of returning cleaned data to the source is called backflushing.

Fig 15: Backflushing

Data Loading: Data loading fetches the prepared data, applies it to the data warehouse and stores it in the database.

Types of Loading

Initial Load: Populates all the data warehouse tables for the first time.

Incremental Load: Applies ongoing changes as necessary in a periodic manner.

Refresh Data: Completely erases the contents of one or more tables and reloads them with fresh data.

B. Storing data in Data warehouse
● Storing the data according to the data model of the warehouse.
● Creating and maintaining required data structures.
● Creating and maintaining appropriate access paths. Initially constructing the warehouse is simple, but the sheer volume of data in the warehouse generally makes it impossible to reload the warehouse entirely.
● Providing for time-variant data as new data are added. Data may come from different systems, language areas and time zones.
● Supporting the updating of warehouse data.
● Refreshing the data. Alternatives include selective (partial) refreshing of data and separate warehouse versions (which requires double storage capacity for the warehouse).
● Purging data. Data may need to be purged periodically.
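The transformation step and the three types of loading described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the warehouse table is modelled as a plain dictionary keyed by record id, the record fields (`id`, `customer`, `amount`) are invented, and the cleansing rules in `transform` are illustrative only, not the rules of any particular ETL tool.

```python
def transform(record):
    """Cleanse one extracted record: trim text and standardize types."""
    return {
        "id": int(record["id"]),
        "customer": record["customer"].strip().title(),
        "amount": round(float(record["amount"]), 2),
    }


def initial_load(source):
    """Initial load: populate an empty warehouse table for the first time."""
    return {row["id"]: row for row in map(transform, source)}


def incremental_load(warehouse, changes):
    """Incremental load: apply only new or changed records to existing data."""
    for row in map(transform, changes):
        warehouse[row["id"]] = row
    return warehouse


def refresh_load(warehouse, source):
    """Refresh: erase the table contents and reload them from fresh data."""
    warehouse.clear()
    warehouse.update(initial_load(source))
    return warehouse


# Hypothetical raw records as they might arrive from an operational source.
raw = [
    {"id": "1", "customer": " alice ", "amount": "10.50"},
    {"id": "2", "customer": "BOB", "amount": "7"},
]
warehouse = initial_load(raw)
incremental_load(warehouse, [{"id": "3", "customer": "carol", "amount": "3.333"}])
```

The incremental load touches only record 3, whereas `refresh_load(warehouse, raw)` would discard record 3 and rebuild the table from the two source records; this is why refreshing becomes costly as the warehouse grows.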
C. Data Warehouse Design Considerations
Usage projections: Prior to the design of the warehouse, anticipate who will use it and how they will use it.
The fit of the data model: The data comes from various operational sources, which should be represented in the data model.
Modular component design: Modular design is a practical necessity to allow the warehouse to evolve with the organization and its information environment.
Design for manageability and change: A well-built data warehouse should be designed for maintainability, enabling the warehouse managers to plan, change, manage and provide optimal support to users.
Distributed warehouse: A distributed data warehouse deals with issues related to distributed databases. A distributed architecture can provide benefits particularly important to warehouse performance, such as improved load balancing, scalability of performance and higher availability.
Federated warehouse: A federated warehouse is a set of autonomous data warehouses, each with its own repository.
Metadata component: The metadata repository is a key data warehouse component which includes both technical and business data.
Technical Data: Technical data covers details of acquisition, processing, storage structures, data descriptions, warehouse operations, maintenance and access support.
Business Data: Business data includes the relevant business rules and organizational details supporting the warehouse.

Recent Research

Design considerations required for an implementation of a data warehouse in an educational institution [D1]

Diagnosis: Different management frameworks can be used for this purpose, such as The Open Group Architecture Framework (TOGAF).

Information need analysis: Analysis of the information supply of the organization, its objective information needs, and its business models, processes and strategies.

Selection of methodology: The design methodology should be chosen based on the requirements of the project.

Setting up the technological infrastructure: Choosing the BI tools, databases, reporting and data mining software. A feasibility analysis should be performed.

Data warehouse design: The conceptual, logical and physical design of the DW must be modeled.

TABLE II
COMPARISON OF TWO APPROACHES

                             Inmon               Kimball
Building a data warehouse    Time consuming      Takes less time
Maintenance                  Easy                Difficult
Cost                         High initial cost   Low initial cost
Time                         High initial time   Shorter time for initial setup
Skill requirement            Specialist team     Generalist team
Integration requirements     Enterprise-wide     Individual business areas

Fig 16: Bill Inmon's top-down approach

Fig 17: Ralph Kimball's bottom-up approach

V. TYPICAL FUNCTIONALITY OF A DATA WAREHOUSE
A. Functionality of a Data warehouse
A data warehouse makes it easy to run complex, data-intensive and ad hoc queries. Data warehouses must provide far greater and more efficient query support than is demanded of transactional databases. The data warehouse access component supports enhanced spreadsheet functionality, efficient query processing, structured queries, ad hoc queries, data mining, and materialized views. In particular, enhanced spreadsheet functionality offers preprogrammed functionalities such as the following [1]:
Roll-up (Drill-up): Roll-up uses an aggregate function to summarize the data along a dimension hierarchy to obtain measures at a coarser granularity [12].
Drill-down: Drill-down moves from a more general level to a more detailed level in a hierarchy.
Pivot (Rotation): Pivot rotates the axes of a cube to provide an alternative presentation of data.
Slice: Slice removes a dimension from a cube. The dimension is dropped by fixing a single value in the level; the other dimensions remain unchanged. A cube of n-1 dimensions is thus obtained from a cube of n dimensions [12].
Dice: Dice selects two or more dimensions from a given cube and provides a new sub-cube by keeping the cells of the cube that satisfy a Boolean condition Φ, where Φ is a Boolean condition over dimension levels, attributes, and measures [12].
Sorting: Sorting returns a cube where the members of a dimension have been sorted in either ascending or descending order.
Selection: Selection is used to filter data by value or range.
Derived (computed) attributes: Attributes are computed by operations on stored and derived values.

Other functionalities that a data warehouse provides include intersection and union of indexes, SQL extensions for aggregation, advanced join methods and intelligent scanning.

B. Data warehouse usage:
A data warehouse is used in the following three ways [6].
Information Processing: A data warehouse allows processing of the data stored in it. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
Analytical Processing: A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill-down, drill-up, and pivoting.
Data Mining: Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. These mining results can be presented using visualization tools.

C. Data warehouse Applications:

Fig 18: Applications of Data warehouse[7]

Data warehouses have deep-rooted applications in every industry which uses historical data for prediction, statistical analysis, and decision making [7].
Below are some of the data warehouse applications across different industries.
● Retail Industry: Retailers use data warehouses to track items and to analyze sales and customer buying trends.
● Manufacturing and Distribution Industry: This industry is one of the most important sources of income for any state. Here data warehouses are used to predict market changes, analyze current business trends, maintain inventory and ultimately take better decisions.
● Banking Industry: Most banks also use warehouses to manage their available resources, for market research, performance analysis of each product, interchange and exchange rates, and to develop marketing programs.
● Sports Industry: Nowadays the sports industry also uses warehouses to analyze game strategies, to find winning player combinations, etc.
● Government: Governments use data warehouses to maintain and analyze tax records, health policy records and their respective providers, and their entire criminal law database is also connected to the state's data warehouse. Criminal activity is predicted from the patterns and trends that result from the analysis of historical data associated with past criminals [7].

D. Data warehouse Tools
A few of the most popular data warehouse tools available in the market are Amazon Redshift, Teradata, IBM InfoSphere, Oracle 12c etc. These tools are used by most competitive companies for providing insights and analytics to users and for decision-making purposes. These tools offer high scalability, high performance, and optimization in data warehousing[8].
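As a sketch of the cube operations listed under the functionality of a data warehouse (roll-up, slice, dice and pivot), the code below applies them to a tiny in-memory cube. It is a simplified illustration, not how any of the tools named above implement these operations: the cube is assumed to be a dictionary keyed by (product, region, quarter), the data is invented, and the roll-up hierarchy is supplied by the caller as a function from a member to its parent.

```python
from collections import defaultdict

DIMENSIONS = ("product", "region", "quarter")

# Hypothetical cube: each cell holds a revenue measure.
CUBE = {
    ("P123", "East", "Q1"): 100,
    ("P123", "West", "Q1"): 80,
    ("P124", "East", "Q1"): 50,
    ("P124", "East", "Q2"): 60,
}


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop it (n -> n-1)."""
    i = DIMENSIONS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice(cube, condition):
    """Dice: keep the cells whose coordinates and measure satisfy a condition."""
    return {
        k: v for k, v in cube.items() if condition(dict(zip(DIMENSIONS, k)), v)
    }


def roll_up(cube, dim, parent_of):
    """Roll-up: sum the measure after mapping each member to its parent level."""
    i = DIMENSIONS.index(dim)
    coarse = defaultdict(int)
    for k, v in cube.items():
        coarse[k[:i] + (parent_of(k[i]),) + k[i + 1:]] += v
    return dict(coarse)


def pivot(cube, order):
    """Pivot: rotate the axes by re-keying every cell in a new dimension order."""
    idx = [DIMENSIONS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}


q1 = slice_cube(CUBE, "quarter", "Q1")
east_big = dice(CUBE, lambda cell, revenue: cell["region"] == "East" and revenue >= 60)
by_family = roll_up(CUBE, "product", lambda code: code[:2] + "XX")
rotated = pivot(CUBE, ("region", "quarter", "product"))
```

Drill-down is the inverse of `roll_up`: it requires the finer-grained cells to be available, which is why the detailed cube is kept and coarser views are derived from it.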

E. Data warehouse and OLAP

The terms data warehouse and OLAP are often used interchangeably, but they are two different components of a decision support system. Below is a brief summary of the differences between a data warehouse and an OLAP system.
Data warehouse:
A data warehouse is a database composed of historical data of the organization. Data in a data warehouse is organized in summarized, aggregated, subject-oriented, non-volatile patterns to support end user analysis.
OLAP (Online Analytical Processing)
OLAP is the technology that enables a data warehouse to be used effectively for online analysis, providing rapid responses to iterative complex analytical queries. OLAP's multidimensional data model and data aggregation techniques help organize and summarize large amounts of data in the data warehouse.

F. Types of OLAP systems

There are different types of OLAP systems, and these are generally distinguished by a letter tagged onto the front of the acronym “OLAP”. Among those types, ROLAP (Relational On-line Analytical Processing), MOLAP (Multidimensional On-line Analytical Processing) and HOLAP (Hybrid On-line Analytical Processing, which is a combination of ROLAP and MOLAP) are the big players. Other types represent little more than marketing programs on the part of the vendors to distinguish themselves.[9]

Fig 19: Types of OLAP systems

The table below shows the comparison of the three major OLAP systems, i.e. MOLAP, ROLAP and HOLAP.

TABLE III
COMPARISON OF MOLAP, ROLAP, HOLAP SYSTEMS

MOLAP: This is the classical form of OLAP system, which stores detailed-level data and aggregated data in an array-based multidimensional format. As it stores the data in a compressed format, the storage requirements are low and the query performance is high.
ROLAP: In this system the data is stored in relational database format, so the space requirements are high and query performance is usually low.
HOLAP: This system stores detailed-level data in relational database format and aggregated data in multidimensional format. Space requirements and query performance are medium in this system.

G. OLAP Tools

IBM Cognos, MicroStrategy, Palo OLAP Server, Apache Kylin, icCube and Mondrian are among the most popular OLAP tools used by organizations [10]. These OLAP tools enable users to analyze different dimensions of multidimensional data and distribute the business-specific insights.

H. OLAP Vs. OLTP

TABLE IV
OLAP Vs. OLTP [11]

VI. DATA WAREHOUSE Vs. VIEWS

A data warehouse consists of data extracted from different data sources, whereas views are just virtual tables defined over data extracted from one or more tables in a database.

Views and data warehouses are similar in that they both hold read-only extracts from databases. Thus, data in neither can be edited or updated. Many people believe that data warehouses are extensions of views, but in reality views provide only a subset of the functions and capabilities of data warehouses.[1]

However, data warehouses are different from views in the following ways:
● Data warehouses exist as persistent storage instead of being materialized on demand, whereas views are virtual tables and do not occupy space on disk or in memory.
● Data warehouses are not just relational, but rather multi-dimensional with multiple levels of aggregation, whereas views are relational.
● Data warehouses can be indexed for optimal performance. Views cannot be indexed directly.
● Data warehouses provide specific support of functionality; views cannot.
● Data warehouses deal with large volumes of integrated data that is generally contained in more than one database, whereas views are an extract of a database.
● Data warehouses bring in data periodically from multiple sources via a complex ETL process, whereas views are an extract from a database through a predefined query. [1]

VII. DIFFICULTIES OF IMPLEMENTING DATA WAREHOUSE

Challenges that need to be considered before building a data warehouse are as follows:

1. Operational Challenge
a. Construction:
● The lead time for building a data warehouse is huge.
● It can potentially take years to build, implement and efficiently maintain a data warehouse. [1]

b. Administration:
● The administration of a data warehouse is proportional to the size and complexity of the warehouse.
● When the source databases are updated, the warehouse's schema and components must be able to handle these changes. [1]

c. Quality Control:
● Both quality and consistency of data as well as data management are major concerns.
● Melding data from heterogeneous and disparate sources is a major challenge given differences in naming, domain definitions, identification numbers, and the like. [1]

2. Quality Assurance
● End users of a data warehouse who rely on Big Data reporting will expect 100% accuracy in data.
● This makes testing a higher priority, which consequently requires a lot of resources. [13]

3. Performance
● The initial overall design must be carefully thought out to provide a stable foundation from which to start. [13]

4. User Acceptance
● People are not keen to change their daily routine, especially if the new process is not intuitive. [13]

5. Cost
● All the above-mentioned factors ultimately increase computational cost.

RECENT RESEARCH:
Literature Review of Issues in Data Warehousing and OLTP, OLAP Technology

Issues and problems discussed:
● Storing historical data: A data warehouse contains very old data in its repository. Maintaining such a volume of data is difficult.
● Storing transactional data: There is a large number of transactions per day in any organization. Managing each day's transaction data is an overhead.
● Mismatch in data types: Merging incoming data from different data sources leads to data type mismatch issues.
● Costing problem: Managing data, security, and resources leads to a large computational cost.
● Representation of data to user: Dashboards and reports generated for data analysis from the data warehouse have to be modified to make them more user (non-technical) friendly.
● Data Profiling: Data profiling is all about the pattern and format matching of data with stored data and shown data.
● Real time data feed and access: This issue mostly concerns storing current data that has not been analyzed properly, which can make it difficult to forecast information.
● Data backtracking: Backtracking all data in such a huge data warehouse creates issues for any changes in the data and for backtracking that data in the repository. [14]

VIII. SECURITY REQUIREMENTS IN DATA WAREHOUSE

For effective performance it is important to determine the security requirements at an early design stage. It is very difficult to implement security once the data warehouse has gone live. The following questions should be considered during the design phase:
● Will the new data sources require new security and/or audit restrictions to be implemented?
● Will new users be added who have restricted access to data that is already generally available?

In some scenarios it becomes difficult to forecast future user activities.

The following activities are affected by security measures:

1. User access: We first need to classify the data and then classify the users on the basis of the data they can access. It follows this path:
a. Data Classification
⬇️
b. User Classification
Data can be classified according to its sensitivity and function. Users can be classified as per the hierarchy of users or their roles.

2. Network Requirements:
a. Is it necessary to encrypt data before transferring it

3. Data movement:
● There exist potential security implications while moving the data.
● When the data is loaded into the data warehouse, the following questions are raised:
● Where will the data file be stored?
● Who has access to that disk space?

4. Documentation: The security requirements need to be properly documented. This will be treated as a part of the justification. This document can contain all the information gathered from:
● Data classification
● User classification
● Network requirements
● Data movement and storage requirements
● All auditable actions [15]

IX. FUTURE OF DATA WAREHOUSE
● Enterprises will need to build “operational data warehouses” to combine data from multiple sources in real time and go beyond dashboards and reports to actually use their data in day-to-day operations.
● The data warehouse will become an integrated enterprise processing engine, fusing multi-structured data and analytics, while incorporating multiple procedural and scripting languages to ensure multiple user communities have immediate access to relevant insight.
● Processing data and analytics in the cloud will become an essential requirement. [16]

X. SUMMARY

The paper first discusses the introduction of data warehousing, which includes the explanation of the terminology used. Then data warehouse characteristic
to the data warehouse? features and modelling in data warehousing is
The process of encryption and decryption will discussed. It then focuses on building steps of data
increase overheads. It would require more processing warehousing and different phases. Then it shows the
power and processing time. comparison between database views and data
warehousing. Paper is concluded with highlighting
b. Are there restrictions on which network routes the major challenges in implementing data warehousing,
data can take? security and future of data warehousing.
It is necessary to identify restrictions at the design
XI. REFERENCES
phase to reduce overheads in future.
[1] Fundamentals of Database Systems (7th edition) by Ramez Elmasri, Shamkant B. Navathe, 2016.
[2] H. Chakiri, M. E. Mohajir and N. Assem, “CMCD: A data warehouse modeling framework based on goals and business process models,” in 2017. DOI: 10.1109/AFRCON.2017.8095605.
[3] M. Blaha, “Data Models Have Many Benefits. Here Are 10 of Them: DATAVERSITY”, DATAVERSITY, 2018. [Online]. Available: http://www.dataversity.net/data-models-many-benefits-10/
[4] J. Collins, “Comparison of Relational and Multidimensional Database Structures”, Alphadevx.com, 2018. [Online]. Available: http://www.alphadevx.com/a/36-Comparison-of-Relational-and-Multi-Dimensional-Database-Structures
[5] Data warehouse design for educational data mining, Dec 01, 2016. Oswaldo Moscoso-Zea, Andres-Sampedro, Sergio Luján-Mora. https://ieeexplore.ieee.org/document/7760754/
[6] “Types of Data warehouse” https://www.tutorialspoint.com/dwh/dwh_quick_guide.htm
[7] “12 applications of Data warehouse” http://whatisdbms.com/12-applications-of-data-warehouse/
[8] “Top 10 popular Data warehouse tools and testing technologies” http://www.softwaretestinghelp.com/data-warehouse-tools/
[9] K. Dhanasree, C. Shoba Bindu, “A Survey on OLAP”, in 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC).
[10] “Top 10 Best OLAP tools” http://www.softwaretestinghelp.com/best-olap-tools/
[11] “OLAP vs OLTP” https://www.tutorialspoint.com/dwh/dwh_olap.htm
[12] Helena Galhardas, “OLAP Operations” https://fenix.tecnico.ulisboa.pt/downloadFile/282093452036070/14_OLAPOperations.pdf
[13] 7 Challenges to Consider when Building a Data Warehouse, Austin Wentzlaff, July 27, 2014.
[14] ‘Literature Review of Issues in Data Warehousing OLTP, OLAP Technology’, Astha Varshney & Pravin Matkewar, Imperial Journal of Interdisciplinary Research, Vol-2, Issue-9, 2016.
[15] https://www.tutorialspoint.com/dwh/dwh_security.htm
[16] The Future of Data Warehousing: 7 Industry Experts Share Their Predictions, Matt Satell, Nov 5, 2014.

XII. BIBLIOGRAPHY

[1] Payal Wankhede: Worked on the introduction to data warehouses, the characteristics of a data warehouse, the general architecture of a data warehouse, and the different types of data warehouse models.

[2] Aswini Chinnenahalli Siddareddy: Worked on data model features, the advantages of the multidimensional model over the relational model, operations in the dimensional model, and multidimensional schemas. For this paper, conducted a study on data modelling approaches and introduced the CMCD approach for data modelling in a data warehouse.

[3] Divyasri Gundala: Worked on the major building blocks of extracting, transforming, and loading data into a data warehouse from various operational sources, and on approaches to design considerations and storing a data warehouse. Additionally studied research on data warehouse design considerations for an educational institution, comparing and analysing the Kimball and Inmon approaches.

[4] Varasri Boddupalli: Worked on the typical functionality of a data warehouse and presented each functionality with a real-time data cube example. Additional research was done on topics related to data warehousing and OLAP, such as usage, applications, and tools of data warehouses and OLAP, and the types of OLAP systems and the comparison between them. Also included the differences between OLAP and OLTP systems.

[5] Juhi Parvanda: Worked on the comparison between views and data warehouses and the difficulties in implementing data warehouses. Also reviewed and presented a related research paper on the challenges of data warehousing. Additionally, presented the security and the future of the data warehouse.