Data Warehouse

07. Conceptual Design


Munawar, PhD
Agenda
01 Conceptual Design?
02 Conceptual Design Process
03 Example of Conceptual Design
04 Q & A
Conceptual Design
Conceptual Design

• Conceptual Design
– Transforms data requirements into a conceptual model
– The conceptual model describes data entities, relationships, constraints, etc. at a high level
• Does not contain any implementation details
• Independent of the software and hardware used
Conceptual Design
• It is intended to derive an implementation-independent and expressive conceptual schema for a data mart (DM) or DW.
• It accommodates a high degree of abstraction in representing the process and architecture of a DW in all the facets involved, so that the design stays independent of any implementation.
• It gives a closer idea of how a user perceives an application domain (Saxena and Agarwal, 2014).
• It enables developers to describe the requirements for DW development from a user's perspective.
Conceptual Model
• Highest conceptual grouping of ideas
– Data tends to naturally cluster with data from the same or
similar categories relevant to the organization
• The major relationships between subjects have been
defined
– Least amount of detail
Conceptual Model
• Conceptual design
– Entity-Relationship (ER) Modeling
• Entities – "things" in the real world
– E.g. Car, Account, Product
• Attributes – property of an entity, entity type, or relationship type
– E.g. color of a car, balance of an account, price of a product
• Relationships – between entities there can be relationships, which also can have attributes
– E.g. Person owns Car
[Figure: conceptual design notations (ER diagram, UML, …); example entities Car, Account, Product; attribute Color of Car; relationship Person owns Car]
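The three ER building blocks above can be written down as plain data structures. The following is a minimal sketch, not a standard ER library: the class names `EntityType` and `RelationshipType`, and the attributes `name` and `since`, are assumptions made for illustration. Only Person, Car, color and "owns" come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """A kind of real-world "thing", e.g. Car or Person."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class RelationshipType:
    """A named link between two entity types; it may carry attributes too."""
    name: str
    source: EntityType
    target: EntityType
    attributes: list[str] = field(default_factory=list)

    def describe(self) -> str:
        # Reads the relationship as a short sentence, e.g. "Person owns Car".
        return f"{self.source.name} {self.name} {self.target.name}"


person = EntityType("Person", ["name"])                   # "name" is an assumed attribute
car = EntityType("Car", ["color"])                        # "color" is from the slide
owns = RelationshipType("owns", person, car, ["since"])   # "since" is an assumed attribute

print(owns.describe())
```

Nothing here depends on a database or tool, which is the point of a conceptual model: it names the things, their properties and their links, and leaves implementation for later.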
Conceptual Model
[Figure: ER diagram of a university schema. Entity types: Student, Professor, Lecture, Lecture instance, Course of Study. Relationships (1:N and N:N): attends, teaches, instantiates, enrolls, prereq., part of. Attributes shown: registration number, name, department, id, title, credits, time, day of week, room, semester, curriculum semester.]
Conceptual Model
• Conceptual design is usually done using the Unified Modeling Language (UML)
– Class Diagram, Component Diagram, Object Diagram, Package Diagram…
– For Data Modeling only Class Diagrams are used
• Entity type becomes class
• Relationships become associations
[Figure: UML class box with three compartments: CLASS NAME; attribute 1 : domain … attribute n : domain; operation 1 … operation m]
Conceptual Design Process
Conceptual Design Process
DW Development Phase: Conceptual Design (Multidimensional Modeling)
Purpose: to abstract the users' requests into information structures that act as the bridge connecting the real world and the machine world
• Input: fact; preliminary workload
• Processes: identifying the fact of interest; identifying the dimension hierarchies; identifying measures; identifying aggregations
• Quality drivers: comprehensiveness; currency; speed
• Tools: ME/R; ER model; DFM; UML class diagram
• Deliverables: dimensional schema
The dimensional schema is designed to store data in a way that:
– Emphasizes understandability
– Enhances query performance
– Accommodates change
Basic Concept …
• A fact is a collection of data items that relate to business transactions or represent business items. A fact consists of measures and context data.
• A dimension is a collection of data related to one business dimension. Dimensions define the contextual background for the facts, and they also define the parameters used to perform OLAP.
Basic Concept …
• A measure is a numerical attribute of a fact. It represents the performance or behavior of the business relative to the dimensions. An essential decision when defining a measure is its lowest level of detail (sometimes called the grain), because the grain determines the types of analysis that can be performed.
• Aggregations are pre-calculated summaries of data derived from the most granular fact table. They are used when an analysis has to compute fact-table metrics across several dimensions and many rows of each dimension. Aggregate fact tables can improve query performance, at the cost of some additional storage space.
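To make the relationship between facts, measures and aggregations concrete, here is a minimal sketch in plain Python. The sample rows, the column names (`product`, `store`, `date`, `revenue`) and the helper `build_aggregate` are all assumptions made for illustration; a real DW would keep these in fact and aggregate tables.

```python
from collections import defaultdict

# Each fact row holds dimension values (the context) plus one numeric measure.
sales_facts = [
    {"product": "pen", "store": "north", "date": "2024-01-05", "revenue": 10.0},
    {"product": "pen", "store": "south", "date": "2024-01-05", "revenue": 4.0},
    {"product": "ink", "store": "north", "date": "2024-01-06", "revenue": 6.0},
]


def build_aggregate(facts, dimensions, measure):
    """Pre-compute sum(measure) for every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Aggregate at the coarser grain "product": store and date are summed away.
revenue_by_product = build_aggregate(sales_facts, ["product"], "revenue")
print(revenue_by_product)
```

A query for revenue per product can now read `revenue_by_product` instead of scanning every row in `sales_facts`. The aggregate is an extra structure that has to be stored and kept in sync with the base facts.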
Modeling for DW
• For DW the models have to offer support for multidimensional data
• In the relational model the classical goal is to
– Remove redundancy
– Allow efficient retrieval of individual records
• In the case of DW
– Redundancy is necessary to speed up queries
– OLAP queries usually involve multiple records (range queries) and aggregates
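The contrast above can be sketched with two in-memory layouts. The data, the integer date encoding (`YYYYMMDD`) and the function `revenue_in_range` are assumptions made for illustration; the sketch only shows why a redundant column lets a range query run without a lookup into a second structure.

```python
# Normalized: the category is stored once per product; queries need a lookup (join).
products = {
    1: {"name": "pen", "category": "stationery"},
    2: {"name": "ink", "category": "stationery"},
    3: {"name": "mug", "category": "kitchen"},
}
sales_normalized = [(1, 20240105, 10.0), (2, 20240106, 6.0), (3, 20240210, 8.0)]

# Denormalized: the category is repeated in every sales row (redundant on purpose).
sales_denormalized = [
    (pid, day, amount, products[pid]["category"])
    for pid, day, amount in sales_normalized
]


def revenue_in_range(rows, category, first_day, last_day):
    """OLAP-style range query: total revenue of one category over a date range."""
    return sum(
        amount
        for _, day, amount, cat in rows
        if cat == category and first_day <= day <= last_day
    )


january_stationery = revenue_in_range(sales_denormalized, "stationery", 20240101, 20240131)
print(january_stationery)
```

The query touches many rows and returns one aggregate, which is the typical OLAP access pattern. Retrieving one individual record is what the normalized layout is tuned for.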
Tools…
• Entity/relationship-based (E/R-based)
• Object-oriented
• Ad hoc models (Sen and Sinha, 2005).
Benefit of ER Extensions
• E/R has been tested for considerable time (years);
• E/R is a commonly used tool by many designers;
• a variety of application domains can be flexibly adapted by
E/R;
• substantial research results have been derived for E/R (Sapia
et al, 1999; Tryfona et al, 1999).
Benefits of OO Model
• the static and dynamic properties of information systems can be
better represented with these models;
• requirements and constraints can be expressed in a powerful
mechanism;
• data modelling is currently dominated by object-oriented
approaches;
• UML, in particular, is a standard and is extendable (Lujan-Mora et
al, 2002).
Multidimensional Conceptual Model
• Modeling business queries
– Define the purpose of the DW and decide on the subject(s)
– Identify questions of interest
• Who bought the products? (customers and their structure)
• Who sold the product? (sales organization)
• What was sold? (product structure)
• When was it sold? (time structure)
[Figure: Business Model at the center, connected to Time, Customers, Employees and Products]
Benefits of Ad Hoc Models
• notational economy is more effectively achieved;
• specific multidimensional modelling can be appropriately emphasised, thereby making ad hoc models the most intuitive representations and the most readable by non-expert users (Rizzi, 2009).
Sales Fact Model
• ME/R (Sapia et al, 1999)
• UML class diagram (Lujan-Mora et al, 2002)
• fact schema (Husemann et al, 2000)
• dimensional fact model (DFM) (Golfarelli, 2010)
Components of Conceptual Model
• Components of conceptual design for DW
– Facts: a fact is a focus of interest for decision-making, e.g. sales, shipments
– Measures: attributes that describe facts from different points of view, e.g. each sale is measured by its revenue
– Dimensions: discrete attributes which determine the granularity adopted to represent facts, e.g. product, store, date
– Hierarchies: are made up of dimension attributes
• Determine how facts may be aggregated and selected, e.g. day – month – quarter – year
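The day – month – quarter – year hierarchy can be sketched as a function that maps a day to its value at each coarser level, plus a roll-up that sums a measure at the chosen level. The function names `level_value` and `roll_up` and the sample revenue figures are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def level_value(day: date, level: str):
    """Map a day to its value at one level of the time hierarchy."""
    if level == "day":
        return day
    if level == "month":
        return (day.year, day.month)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "year":
        return day.year
    raise ValueError(f"unknown level: {level}")


def roll_up(facts, level):
    """Sum the revenue measure of (day, revenue) facts at the given level."""
    totals = defaultdict(float)
    for day, revenue in facts:
        totals[level_value(day, level)] += revenue
    return dict(totals)


facts = [
    (date(2024, 1, 5), 10.0),
    (date(2024, 2, 7), 5.0),
    (date(2024, 4, 1), 2.5),
]

print(roll_up(facts, "quarter"))
print(roll_up(facts, "year"))
```

Here the date dimension fixes the granularity (one fact per day), revenue is the measure, and the hierarchy decides which coarser groupings are available for aggregation and selection.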
Facts Identification
Fact identification is the most difficult task in the DW design
process and is commonly done manually. Some techniques that
can be used to find facts are as follows:
• A fact table of a star schema can be derived from the many-to-
many relationships in an E/R model that contains numeric and
additive non-key facts (Kimball, 1997).
• Candidate measures can be found in business queries for data
items that indicate business performance (Ballard et al, 1998).
• The most frequently updated entities can be identified as facts
(Golfarelli et al, 1998).
• Fact properties can be summarised (or aggregated); thus, they
are usually found in numerical data (Tryfona et al, 1999).
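The first technique in the list above (many-to-many relationships that carry numeric attributes) lends itself to a simple screening pass over an E/R model. The sketch below is only a rough filter under stated assumptions: the `Relationship` class, the cardinality strings and the sample model are hypothetical, and the code cannot judge whether an attribute is truly additive, so the result is a list of candidates for manual review.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    name: str
    cardinality: str               # "1:1", "1:N" or "N:M"
    numeric_attributes: list[str]  # non-key numeric attributes on the relationship


def candidate_facts(relationships):
    """Return the names of many-to-many relationships that carry numeric attributes."""
    return [
        r.name
        for r in relationships
        if r.cardinality == "N:M" and r.numeric_attributes
    ]


# Hypothetical E/R model used only to exercise the heuristic.
model = [
    Relationship("order_line", "N:M", ["quantity", "amount"]),
    Relationship("lives_in", "1:N", []),
    Relationship("tagged_with", "N:M", []),
]

print(candidate_facts(model))
```

This automates only the mechanical part of the search. Deciding which candidates are real facts of interest remains a manual design decision.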
Dimensional Rule Mapping

UML diagram components → Snowflake schema
• UML classes → Fact and dimension tables
• Aggregation classes → Dimension hierarchies
• Class attributes → Measures/dimension attributes
• Generalisations → Aggregation levels
Identification of Dimension Hierarchies
Data in dimensions should be organised into hierarchical levels (Agrawal et al, 1997).
A hierarchy defines the navigation path for drilling up and drilling down.
• A simple hierarchy contains precisely one linear aggregation path within a dimension (e.g. the path day → month → year in the dimension time).
• A multiple hierarchy consists of at least two different aggregation paths in a dimension (e.g. the paths account → customer and account → organisation in the dimension account).
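The difference between a simple and a multiple hierarchy can be checked mechanically if a dimension is written as a mapping from each level to the levels it rolls up to. This is a minimal sketch assuming the hierarchy has no cycles; the names `aggregation_paths`, `is_simple_hierarchy`, `time_dim` and `account_dim` are made up for illustration.

```python
def aggregation_paths(rolls_up_to, start):
    """List every aggregation path from `start` up to a top level."""
    parents = rolls_up_to.get(start, [])
    if not parents:
        # A level with no parent ends the path.
        return [[start]]
    return [
        [start] + rest
        for parent in parents
        for rest in aggregation_paths(rolls_up_to, parent)
    ]


def is_simple_hierarchy(rolls_up_to, start):
    """A simple hierarchy has exactly one aggregation path."""
    return len(aggregation_paths(rolls_up_to, start)) == 1


# The two example dimensions, written as level -> parent levels.
time_dim = {"day": ["month"], "month": ["year"]}
account_dim = {"account": ["customer", "organisation"]}

print(aggregation_paths(time_dim, "day"))
print(aggregation_paths(account_dim, "account"))
```

Each returned path is one drill-up route. Reading a path from right to left gives the matching drill-down route.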
Measures Identification
• Measures are properties of a fact, collected about business operations, on which calculations (e.g. sum, count, average, minimum, maximum) can be made (Kimball et al, 2008).
• Measures are organized by dimensions.
Identify Aggregation
• Aggregation is the central means to summarize and condense the information contained in the various sources (Cabot et al, 2010).
• Summarisability (the guaranteed correctness of aggregation results) is an essential requirement for OLAP queries. Consequently, any multidimensional schema should be arranged so that summarisability holds up to the highest possible level. Furthermore, if summarisability is violated along certain aggregation paths, the schema should state this constraint explicitly.
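One common way summarisability breaks is a child level member that rolls up to more than one parent, so its measure is counted twice. The sketch below shows that effect and a basic check for it. It covers only one-parent-per-child mappings (strictness and completeness), not every summarisability condition, and the account and customer identifiers are invented for illustration.

```python
def roll_up_total(amounts, parents_of):
    """Sum child amounts into each parent, then add up all parent totals."""
    parent_totals = {}
    for child, amount in amounts.items():
        for parent in parents_of.get(child, []):
            parent_totals[parent] = parent_totals.get(parent, 0) + amount
    return sum(parent_totals.values())


def has_strict_complete_mapping(amounts, parents_of):
    """True when every child rolls up to exactly one parent."""
    return all(len(parents_of.get(child, [])) == 1 for child in amounts)


# Hypothetical balances per account, rolled up from account to customer.
amounts = {"a1": 100, "a2": 50, "a3": 25}
strict = {"a1": ["c1"], "a2": ["c1"], "a3": ["c2"]}
shared = {"a1": ["c1", "c2"], "a2": ["c1"], "a3": ["c2"]}  # a1 is a joint account

print(roll_up_total(amounts, strict), has_strict_complete_mapping(amounts, strict))
print(roll_up_total(amounts, shared), has_strict_complete_mapping(amounts, shared))
```

With the `shared` mapping the grand total over customers no longer equals the sum of the account balances. A schema should document exactly this kind of constraint for the affected aggregation path.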
Example of Conceptual Design
Sample
To illustrate the proposed model, a case study was conducted on the student admission process, specifically the part related to marketing activities. A private university in Jakarta intends to build a monitoring system for student admissions. A series of related marketing activities has been carried out to increase student enrolment. The expected benefit of implementing a DM is better decision making in the admission system.
DFM for Student Admission
[Figures: three slides showing the DFM for student admission; the diagram content is not present in the extracted text]
Q & A?
THANK YOU
Munawar, PhD
