Conference Proceeding
3 authors:
Christian Schieder
Ostbayerische Technische Hochschule Amberg-Weiden
All content following this page was uploaded by Christian Schieder on 21 November 2020.
provided in section four. Part five focuses on the essential conversion steps from conceptual towards logical data models. The last section discusses the prototype and gives an outlook on future research work.

2. Multidimensional Data Modeling

The ontological distinction between schema and instance layer is a fundamental aspect in modeling multidimensional data structures. This section elaborates on the conceptual elements used to describe multidimensional data schemata [3, 20]: dimensions, hierarchies, measures, and attributes. They are used to turn business data into actionable information. Indeed, a simple spreadsheet's structure is a kind of multidimensional data structure: a two-dimensional one.

Dimensions represent the essential components of a multidimensional schema. They are a partially ordered set of dimension elements (also called dimension members or dimension positions), which represent the dimension's individual values, that is, the dimension's instance. For example, an analysis of a company's turnover might be parameterized by the dimensions product, organizational unit and time. Each single product sold by our company is an element of the product dimension.

Dimension elements might be grouped by means of hierarchically ordered levels, so-called dimension hierarchies. Within the schema description, dimension elements are condensed into generalized abstract dimension levels (or hierarchy levels). In the product dimension example, this leads to an aggregation of single products into product subcategories and/or product categories. The elements of the examined levels are connected by parent-child relationships. Usually this results in tree structures with a root node (top level element), several leaf nodes (leaf level elements) and multiple nodes in between on the particular levels. Deviations from this ideal structure, such as parallel hierarchies or unbalanced tree structures, will occur in practical use.

The semantics of measures are determined by the semantics of their descriptive dimensions. By spanning a spatial structure of orthogonal dimensions and defining cells at the dimension element intersections, a multidimensional matrix is created. This is often referred to as a data cube or hypercube. The contents of these cells, the so-called facts, are precise numerical values of the modeled business measures. Within the research community there is neither a distinct nor a generally accepted definition of the terms measure, measurement and fact. According to our understanding, measures describe the schema of facts. Within the given example, the measure turnover is determined by the dimensions product, organizational unit and time. Its data type should be currency. Picking out individual dimension members such as product x, organizational unit y and time period z allows indexing a single turnover value, the fact t.

Two different alternatives exist to model measures. On the one hand it is possible to create one data cube for each measure. On the other hand, one cube can contain several measures by introducing a measure dimension, with each member representing one measure.

For the remaining part of the paper it is essential that measures are not isolated from each other. Often, they are linked via calculation rules. These rules are used for the dynamic calculation of dependent facts from the externally given independent facts. Complex dependencies within systems of financial control can be created in this way.

Attributes are an integral part: business attributes in particular give a deeper insight into multidimensional data structures. There are two distinct options. Attaching attributes to a dimension implies that every single element within this dimension inherits these attributes, for example each product and each product group. Applying an attribute to a certain hierarchy level opens up the possibility of attributes that are valid for this individual level only; each product might have a weight whereas a product category will not have a weight.

3. Related Work

This section categorizes some of the related work. Due to numerous studies in this area this can be a short overview only and makes no claim to be complete.

3.1. Modeling Frameworks

Several methodologies are intended to represent multidimensional data models on a conceptual level. They can be categorized into three different types [1]: extensions to the Entity-Relationship model, extensions to the UML, and ad hoc models. Each one is appropriate to represent basic multidimensional concepts but they differ significantly in their ability to represent more sophisticated concepts such as irregular hierarchies.

Entity-Relationship based models extend the basic ER notation by means of multidimensional concepts. The ME/R model [4] extends the ER model by adding three elements: a fact relationship set, a dimension
Proceedings of the 42nd Hawaii International Conference on System Sciences - 2009
level set and a rolls-up relationship set. Using this approach, only static data structures can be described; dynamic and functional aspects are not covered. The approach in [5] uses dimensions, fact relationships, cardinalities in Martin notation, hierarchies and analysis criteria as well as an xor operator to express hierarchy anomalies.

Adaptations of UML are exemplified in [6] and [7] as well as the customizations of the model driven architecture (MDA) such as [8]. The approach in [6] uses stereotypes to typify classes as cubes, dimension levels, measures and so on; they are displayed with the same icons as in the ADAPT notation. [7] proposes a UML profile for multidimensional data modeling. The MDA approach presented in [8] tries to offer a framework for all the relevant data warehouse components, for example ETL processes, data sources and repositories. The authors extend the UML as well as the Common Warehouse Metamodel (CWM) and use the Query/View/Transform (QVT) language for establishing transformations between different models.

Ad hoc approaches raise the level of abstraction by directly using domain concepts. Therefore, ad hoc approaches can be seen as Domain Specific Languages (DSLs) [18]. They are especially useful if the modelers themselves are not software developers; language visualization and ease of use are emphasized. Modelers do not have to cope with stereotyped classes or variations of entity types; they can intuitively use multidimensional modeling concepts like cubes and dimensions.

Therefore ad hoc approaches differ from ER or UML approaches by not adapting a certain notation to new fields of application, but developing a new visual language in order to support modelers on a higher level of abstraction.

ADAPT ranks among these ad hoc approaches. There are several other methodologies with a formal foundation such as the Dimensional Fact model [9]. The model consists of some fact schemes whose basic elements are facts, measures, attributes, dimensions and hierarchies. They are accompanied by several other features such as attributes or the additivity of attributes along dimensions.

We have chosen ADAPT for the purpose of this paper because it allows the creation of semantically rich models due to several different model elements. Furthermore, ADAPT is relatively easy to learn and mostly self-explanatory. In contrast to other ad hoc modeling techniques a stencil for Microsoft Visio has been made available free of charge, thus facilitating the application of the modeling language in practice.

3.2. Model Transformation

Some of the above-named papers already include possibilities to transform conceptual models into logical representations. Several approaches concentrate explicitly on transforming conceptual models into logical ones like [5]; and sometimes the other way round, which is actually more relevant, as seen in [10]. These approaches are often discussed under the term schema evolution. It has gained much interest in both research and practice. Therefore, an online bibliography concentrating on this area is available [17]. Currently 418 papers on schema evolution are listed.

In most cases, the methodologies concentrate on a star schema or one of its variants. The prototype presented in section 5 tries to offer an open architecture which can be extended in order to support more than one target schema.

A new level of abstraction has been introduced by the concepts of model management, as commented in section 6. Currently, the cited online bibliography [17] contains 61 papers on this approach.

4. ADAPT foundations

The ADAPT notation emerged during the 1990s in the course of an attempt to create a graphical, business oriented representation of OLAP data models [2]. Due to its pragmatic roots, the notation lacks any formal foundation. Further, by the time of the publication of this paper, the modeling language had largely been ignored in scientific publishing on conceptual modeling. To overcome these deficiencies we will introduce the basic building blocks of the modeling language, demonstrate their usage and subsequently provide a formal foundation by presenting a UML-based metamodel.

4.1. Modeling with ADAPT

The notation provides a variety of symbols which are depicted in figure 1. Each one of them represents a conceptual object of an OLAP application.

A common issue of modeling in analytical contexts is the necessity to not just model the schema of a multidimensional problem (such as hierarchy levels), but to model specific instances within this schema where appropriate (such as individual dimension members). Dynamic aspects, especially calculation models of derived measures, are of great importance. ADAPT closes both gaps by offering special symbols, as described within the following paragraphs.
The basic elements of this notation are Hypercube (or Cube for short) and Dimension. A Hypercube is the basic unit of storage for business data in a multidimensional database; physically, it is an n-dimensional array. A dimension is the representation of an axis of such a cube. Detailed modeling of dimensions is done via the symbols Hierarchy, Level, Member, Attribute, Scope, and Model.

[...] measure dimension. Dependencies between measures or scenarios can be indicated by using the Model element. Models help to represent calculation rules within systems of financial control, for example the DuPont-System. To improve diagram clarity it can be useful to depict one cube per measure and use the model element to show calculations between cubes.
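To make the relation between cubes and calculation models more tangible, the following minimal Python sketch treats each cube as a mapping from one member per dimension to a fact and derives a dependent cube from two independent ones. It is an illustration only and not part of ADAPT or of the prototype: the member names, the helper apply_model and the rule "turnover = units * price" are assumptions chosen for the example.

```python
from itertools import product as cartesian

# Hypothetical dimension members; none of these names come from the paper.
products = ["bike", "helmet"]
org_units = ["north", "south"]
periods = ["2008-Q1", "2008-Q2"]

# One cube per measure: each cell is addressed by one member per dimension.
units = {cell: 10 for cell in cartesian(products, org_units, periods)}
price = {cell: 25.0 for cell in cartesian(products, org_units, periods)}
units[("bike", "north", "2008-Q1")] = 4


def apply_model(rule, *inputs):
    """Derive a dependent cube cell by cell from independent input cubes."""
    cells = set(inputs[0])
    # All input cubes must be defined over the same cells.
    assert all(set(cube) == cells for cube in inputs)
    return {cell: rule(*(cube[cell] for cube in inputs)) for cell in cells}


# Calculation rule assumed for illustration: turnover = units * price.
turnover = apply_model(lambda u, p: u * p, units, price)
```

Keeping one cube per measure, as suggested in the text, lets the calculation rule be expressed as a function between cubes rather than between members of a measure dimension.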
indicates partial division of the population into subsets; dimension members may be found in more than one subset. In our example each product exclusively appears in one class. Therefore we apply the Fully Exclusive operator.

[Figure 5: ADAPT diagram of the Product dimension. Labels recoverable from the figure: Product; Category; Supplier; Product category; Product subcategory; Supplier; Product; A class product; B class product; C class product; Description English; Description Spanish; Package type; Package size; Weight.]

Figure 5. Sample product dimension

By using the Attribute element we can depict additional information on the characteristics of dimension elements. Attributes can be assigned to all elements of a dimension, elements of a certain level or members of a dimension scope. In our example we use attributes to associate language-dependent descriptions to every single element of the dimension at all hierarchy levels. However, package type, package size and weight only make sense on single products.

Figure 5 shows the entire representation of the product dimension in ADAPT as described in our example case above. Our exposition is limited to the application of the most important elements. For a more detailed explanation see [2].

4.2. ADAPT Metamodel

This section introduces the ADAPT metamodel, which is depicted in figure 6. It has been created following the design science approach [19]. Furthermore, ADAPT can be seen as a Domain Specific Language, and the development of metamodels is a typical approach in implementing DSLs [18]. We make use of the UML notation whereby classes represent modeling objects as well as operators; stereotyped associations represent the different connector types.

Within the first step of the development process, we selected the most important elements of ADAPT which support the basic ideas of multidimensional modeling presented above. These elements form the classes within the metamodel. The second step consisted of establishing relationships between the modeling elements according to [2].

The navigation directions of the metamodel's associations indicate the direction in which the arrow heads of each connection symbol should point. For example, the association between Cube and Dimension is navigable from Cube towards Dimension and stereotyped with connector. This implies the usage of a connector (see figure 2) between Cube and Dimension with the arrow pointing towards the dimension. Please note that the arrows of Strict Precedence and Loose Precedence point to the parent level when connecting hierarchy levels with each other.
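The navigation rule can be illustrated with a small Python sketch in which every association stores its arrow tail (source) and arrow head (target). This is a simplified stand-in for the metamodel, not the authors' implementation: the classes Element, Association and Diagram, the method parents_of, and the choice of strict precedence between the product levels are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    kind: str  # metamodel class, e.g. "Cube", "Dimension", "Level"


@dataclass
class Association:
    source: Element   # arrow tail
    target: Element   # arrow head
    stereotype: str   # e.g. "connector", "strict precedence"


@dataclass
class Diagram:
    associations: list = field(default_factory=list)

    def link(self, source, target, stereotype):
        self.associations.append(Association(source, target, stereotype))

    def parents_of(self, level):
        """Follow precedence arrows from a level to its parent level(s)."""
        return [a.target.name for a in self.associations
                if a.source is level and a.stereotype.endswith("precedence")]


sales = Element("Sales", "Cube")
product_dim = Element("Product", "Dimension")
category = Element("Product category", "Level")
subcategory = Element("Product subcategory", "Level")
product = Element("Product", "Level")

diagram = Diagram()
diagram.link(sales, product_dim, "connector")             # Cube -> Dimension
diagram.link(subcategory, category, "strict precedence")  # child -> parent
diagram.link(product, subcategory, "strict precedence")   # child -> parent
```

Because precedence arrows run from child to parent, walking upwards in a hierarchy only requires following outgoing associations of a level.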
In order to increase clarity, the classes do not contain any attributes. Every class should have at least a name as well as containers for storing the connected elements. In case of Cube this would include a collection of corresponding dimensions. A name can be omitted only at Subset Operator and its child classes.

A dimension is linked with one or more members, hierarchies or attributes via connectors. The XOR constraint expresses that either individual dimension members or hierarchies exist.

There might be a connector or a strict precedence between hierarchies and the uppermost hierarchy level. Using strict precedence claims that there should be an artificial overall hierarchy level. Strict precedence and loose precedence as well as self precedence connect single hierarchy levels. Strict or loose coupling allows more than one parent level; in case of a recursive relationship exactly one parent and one child level exist. Subset operators divide a hierarchy level into different dimension views. A connector establishes the connection between level and subset operator; more than one subset operator can be connected to one hierarchy level. The subset itself consists of one or more scopes.

Attributes are, as already stated, attached to dimensions. Alternatively, they might be linked to hierarchy levels and scopes via connectors.

Calculation models have at least two inbound members and create one calculated value. A member is allowed to take part in more than one calculation model. Calculation rules between different cubes should not be considered here.

The proposed metamodel does not support all aspects of the ADAPT notation. For example, scopes are bound to hierarchy levels via subset operators. The authors of [2] see the possibility to link scopes directly with dimensions in order to categorize individually modeled members. It is also not intended to limit the range of attributes' possible values.

The use of identifiers should be limited. Ideally, each identifier is distinct within its object type. For instance, each dimension and each hierarchy level require different names. The same does not apply to elements with different semantics, for example both a dimension and a hierarchy can be identically named.

5. Modeling Tool Prototype

The presented prototype should be easily extendable by means of multiple database systems that can be accessed via the modeling tool. The main problem behind this thought is the existence of various target platforms which differ in the way they store multidimensional data: ROLAP, MOLAP and HOLAP. Therefore, a way of accessing multiple transformation algorithms must be found, and each algorithm has to be parameterized. The right-hand side of figure 7 summarizes this approach. In order to concentrate on this requirement we use the concept of reutilization by not inventing a new modeling language or a new editor but using ADAPT and an existing ADAPT stencil for Microsoft Visio.

5.1. Architecture concept

The prototype is based on a three-layer architecture as shown in figure 7. An editor for ADAPT is situated within the graphical or presentation layer. In our case this will be Microsoft Visio because of an already well-proven stencil. Starting with Visio 2003, it is possible to save diagrams in an XML format, DataDiagramML (formerly known as XML for Visio).

The abstraction layer acts as an intermediary between graphical modeling and transformations.

[Figure 7: three-layer architecture of the prototype. Labels recoverable from the figure: multidimensional problem; ServiceLoader API; transformations (T) towards SQL-DDL and further targets (...).]
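The requirement stated at the start of this section, several interchangeable transformation algorithms that each accept parameters, can be sketched as a small plug-in registry. The Python sketch below is only an analogy to illustrate the idea; a plain dictionary stands in for whatever discovery mechanism the tool uses, and the names provider, transform, to_sql_ddl and the parameter dimension_prefix are hypothetical.

```python
from typing import Callable, Dict

# A transformation takes a model plus parameters and returns target artifacts.
Transformation = Callable[[dict, dict], list]

_REGISTRY: Dict[str, Transformation] = {}


def provider(target: str):
    """Register a transformation algorithm under a target-platform name."""
    def register(func: Transformation) -> Transformation:
        _REGISTRY[target] = func
        return func
    return register


@provider("sql-ddl")
def to_sql_ddl(model: dict, params: dict) -> list:
    # The table body stays elided ("...") because only the naming is shown.
    prefix = params.get("dimension_prefix", "dim_")
    return [f"CREATE TABLE {prefix}{name.lower()} (...);"
            for name in model["dimensions"]]


def transform(target: str, model: dict, **params) -> list:
    """Look up the algorithm by name; the calling tool stays unchanged."""
    if target not in _REGISTRY:
        raise KeyError(f"no transformation registered for {target!r}")
    return _REGISTRY[target](model, params)
```

Adding a further target platform then means registering one more function; the code that calls transform does not need to change.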
ADAPT diagrams are represented as directed typed graphs in order to reduce the loss of information between diagrams and internal storage. A typed graph $G$ can be defined as the following tuple: $G = (N, E, t_N, t_E, s, t)$ with a finite set of nodes $N$, a finite set of edges $E$, an assignment of a type to each node $t_N: N \to \mathcal{N}$ as well as to each edge $t_E: E \to \mathcal{E}$, and the assignment of source and target for each edge $s, t: E \to N$ [10]. The instances of $N$, $E$, $s$ and $t$ arise from the ADAPT diagram whereas $\mathcal{E}$ and $\mathcal{N}$ reflect the modeling elements of the notation itself.

The third layer provides transformational functionalities. Therefore, one of the most important requirements is extensibility: new transformation algorithms can be added without modifying the source code of the modeling toolset.

In version 6 of the Java Standard Edition (SE), the ServiceLoader API has been made public [16]. It helps to find, load and use so-called service providers. Within the context of this API a service is a collection of interfaces and classes that provide access to specific program functionality. A service provider reflects the actual implementation. In the case of the prototype each transformation algorithm represents a service provider. They are defined by the service provider interface (SPI), which consists of a set of public interfaces and abstract classes. Only the classes and methods contained within the SPI are visible to the actual application.

To create an SPI it is necessary to find a generic interface which offers capabilities for creating multidimensional modeling constructs. The following operations are suitable for our purpose:
• create database <database>: This operation creates a new database as a container for all other structures and quantitative values.
• create dimension <dimension>: Dimensions are essential for multidimensional data structures. Each dimension has to receive a distinct name.
• create measure <measure>: As a next step, measures have to be instantiated within the database.
• create cube <name> <var1>, <var2>, ..., <varm> <dim1>, <dim2>, ..., <dimn>: The fourth method assembles data cubes from the given dimensions and variables.
A reference to a partial graph is passed to each method.

5.2. Transformation Examples

The prototype is evaluated by a case study, which has been simplified for the purpose of this paper. A second production cube containing four dimensions (Product, Plant, Time and Measures) extends the example given in section 4.1. The dimensions product and time are shared by both cubes. Further research will include evaluation in real-world business applications. At the moment there are negotiations on how to integrate our software in a large project within the telecommunication industry sector.

5.2.1. Galaxy Schema. Within a galaxy schema, cubes and dimensions correspond to fact and dimension relations, respectively. They are connected via foreign key relationships [3].

Table 1. Prefixes for identifying relations and their attributes

Model element                                              Prefix
Fact table                                                 fact_
Dimension table                                            dim_
Primary key                                                pk_
Foreign key                                                fk_
Variable / Measure                                         m_
Hierarchy level                                            level_
Parent level of recursive relationships (self precedence)  parent_level_
Attribute                                                  attr_
Dimension scope (connected via Exclusive operator)         subset_
Dimension scope (connected via Partially operator)         scope_

In order to identify the modeling constructs in the generated schema, prefixes which precede each identifier (such as relations or columns) are introduced. A summary is given in table 1.

A second assumption concerns the data types of each individual modeling element. Table 2 outlines them. Our future task is to use transformation parameters to implement a solution which allows a flexible setting of data types.

Table 2. Data types for modeling elements within a galaxy schema

Model element                                     Data type
Primary key                                       INTEGER
Foreign key (has to be the same as primary key)   INTEGER
Fact                                              DOUBLE
Hierarchy level                                   VARCHAR(255)
Attribute                                         VARCHAR(255)
Dimension Scope                                   VARCHAR(255)

Each dimension corresponds to a relation dim_<dimension name> with a serially-numbered primary key. Attributes of the dimension itself correspond to a separate column attr_<attribute
name>. If there are dimension members, only one single column is generated: <dimension name>. The members are added later on via INSERT statements.

[Figure 8: the conceptual ADAPT model (cubes Production and Sales with dimensions such as Product, Plant, Organizational Unit, Time and Measures) is transformed (T) into the relations fact_production (fk_product, fk_plant, fk_time, m_units), fact_sales (fk_organizationalunit, fk_customer, fk_product, fk_time, m_units, m_turnover), dim_customer (pk_customer), dim_plant and dim_time, and then into SQL DDL code such as CREATE TABLE dim_product(...); CREATE TABLE dim_time(...); CREATE TABLE fact_sales(...); CREATE TABLE fact_production(...);]

Figure 8. Transformation steps from conceptual ADAPT towards SQL DDL

According to the metamodel, hierarchies are permitted only if there are no dimension members. Hierarchy levels are represented in a column level_<level name>; the same holds for attributes of levels: attr_<attribute name>. Levels in parallel hierarchies are represented by one column for each level. Subsets connected via Fully Exclusive or Partially Exclusive result in one column per subset operator. The column's name is composed of the names of the scopes: subset_<name scope 1_name scope 2_ ... _name scope n>. An alternative would be the creation of a column with a SET data type containing one entry for each scope. Overlapping operators (Fully Overlapping and Partially Overlapping) result in one column for each scope. This allows an element to be contained in different scopes. Recursive connections [...] foreign key relationships to dimension tables (fk_<dimension name>). For each measure a column fact_<fact name> is created. An alternative would be the creation of a measure dimension; each measure would map onto one dimension element.

[...] assumptions in table 3 apply to data types.

The Netbeans Metadata Repository (MDR) is an implementation of a Meta Object Facility (MOF) compliant repository. It is appropriate for generating interfaces according to the Java Metadata Interface [...]

[Table 3 (fragment), model element and data type: Attribute: String; Identifying Attribute: Integer; Fact: Float.]

A schema has to be instantiated as a container for storing the model elements of the whole ADAPT diagram. Each cube and each dimension map onto the corresponding classes of the CWM. A CubeDimensionAssociation connects cubes and dimensions. The mapping of a dimension is organized as follows: for each dimension level an object has to be generated and has to be published to the corresponding dimension. Hierarchies are represented as instances of the class LevelBasedHierarchy. Additional semantics can be given to hierarchies by using the class ValueBasedHierarchy. It is useful for hierarchies that have some kind of topological order within the individual levels. For reasons of complexity, the current implementation only supports instances of LevelBasedHierarchy.

Despite the relatively low depth of CWM, some arguments can be found to build a conceptual metadata
schema on the basis of CWM. Some parts of the metadata can be stored according to the standard, which leads to an easier exchange of metadata via XMI. Standard-conformant extensions of the reference model offer the possibility to adapt it to enterprise-specific needs. These extensions can be easily published and used by other CWM users. The usage of XMI offers the advantage of checking files against a document type definition or an XML schema in order to discover transmission errors.

[Figure 9: the conceptual ADAPT model (cubes Production and Sales with their dimensions) is transformed (T) into a logical CWM object model with instances such as :Schema, Production: Cube, Sales: Cube, Product: Dimension, Plant: Dimension, Organizational Unit: Dimension, Customer: Dimension, :CubeDimensionAssociation, :LevelBasedHierarchy, :HierarchyLevelAssociation and :Level, and then into XMI code: an XMI document with XMI.header and XMI.content, the latter holding a CWMOLAP:Schema with CWMOLAP:Cube and CWMOLAP:Dimension elements.]

Figure 9. Transformation steps from conceptual ADAPT towards XMI

6. Discussion and Future Work

The prototype shows the possibility of finding a generic interface for algorithmic transformation of conceptual data models based on the ADAPT notation into different logical representations. A suitable way of parameterization has to be implemented.

For an aggregation of the Java tool and Visio into one implementation the features of Eclipse Graphical Editing Framework (GEF) have to be evaluated. The research prototype GraMMi [12] offers interesting thoughts about a repository-based graphical modeling toolset. Another attractive proposition, developing multidimensional data models collaboratively within a web browser, also needs to be investigated.

In this context some consideration has to be given to a quality evaluation of the data models. [11] proposes a two-dimensional approach: the first dimension is the classification concept (syntax, [...] includes measures (absolute and relative measures). Further research work needs to be checked and adapted to ADAPT diagrams.

Within the transformation layer an optimization of the generated schemata has to be implemented. Much improvement can be made within a star or galaxy schema. Some suggestions are given in [5].

The prototype makes use of object-at-a-time operations. This can be understood as the transformation of the original problem into an object-oriented one and manipulating the models and mappings in that representation. To raise the level of [...] and mapping-at-a-time operators. They are expected to improve programmers' productivity for metadata applications [13].

Systems supporting such functionality are discussed under the term model management systems (MMS). They support the creation, compilation, reuse, evolution, and execution of mappings between schemas represented in a wide range of metamodels [14]. Such an MMS is not a user-oriented tool. In fact it is a reusable component that can be integrated into specific applications relating to data programmability problems.

The most relevant operators for our problem at the current state of research are generation as well as execution of mappings. New transformations could be generated quickly, in the best case even without writing any Java code; a graphical definition would be ideal.

An important topic on our research agenda is a reverse engineering approach for existing data warehouses, which is not currently implemented and will require further research. In this context, visualization of data models is not restricted to the ADAPT notation; sometimes different representations are more useful. Fact sheets that can be viewed with Excel showing a description or the calculation of a measure are one example.
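As a closing illustration of the transformation idea discussed in this paper, the following Python sketch walks a small typed graph (section 5.1) and emits galaxy-schema DDL using the prefixes of table 1 and the data types of table 2. It is a simplified sketch under stated assumptions, not the prototype's Java code: levels are attached directly to their dimension, measures directly to their cube, measure columns use the m_ prefix of table 1, and the names TypedGraph, ident and galaxy_ddl are hypothetical.

```python
from dataclasses import dataclass

# Prefixes follow table 1, data types follow table 2.
PREFIX = {"fact": "fact_", "dim": "dim_", "pk": "pk_", "fk": "fk_",
          "measure": "m_", "level": "level_"}
TYPE = {"pk": "INTEGER", "fk": "INTEGER", "measure": "DOUBLE",
        "level": "VARCHAR(255)"}


@dataclass
class TypedGraph:
    node_type: dict  # t_N: node name -> type of modeling element
    edges: list      # (edge type, source, target) triples: t_E, s and t

    def targets(self, source, kind):
        return [t for _etype, s, t in self.edges
                if s == source and self.node_type[t] == kind]


def ident(name):
    return name.lower().replace(" ", "")


def galaxy_ddl(graph):
    statements = []
    # Dimension tables first so that foreign keys can reference them.
    nodes = sorted(graph.node_type.items(), key=lambda item: item[1] == "Cube")
    for node, kind in nodes:
        if kind == "Dimension":
            cols = [f"{PREFIX['pk']}{ident(node)} {TYPE['pk']} PRIMARY KEY"]
            cols += [f"{PREFIX['level']}{ident(lvl)} {TYPE['level']}"
                     for lvl in graph.targets(node, "Level")]
            table = PREFIX["dim"] + ident(node)
        elif kind == "Cube":
            cols = [f"{PREFIX['fk']}{ident(d)} {TYPE['fk']} REFERENCES "
                    f"{PREFIX['dim']}{ident(d)}({PREFIX['pk']}{ident(d)})"
                    for d in graph.targets(node, "Dimension")]
            cols += [f"{PREFIX['measure']}{ident(m)} {TYPE['measure']}"
                     for m in graph.targets(node, "Measure")]
            table = PREFIX["fact"] + ident(node)
        else:
            continue  # levels and measures become columns, not tables
        statements.append(f"CREATE TABLE {table} ({', '.join(cols)});")
    return statements


# Tiny excerpt of the production example; node and edge types are assumed.
graph = TypedGraph(
    node_type={"Time": "Dimension", "Year": "Level", "Month": "Level",
               "Production": "Cube", "Units": "Measure"},
    edges=[("connector", "Time", "Year"), ("connector", "Time", "Month"),
           ("connector", "Production", "Time"),
           ("connector", "Production", "Units")],
)
ddl = galaxy_ddl(graph)
```

Keeping prefixes and data types in lookup tables is one way to approach the flexible, parameterized setting of data types that the paper names as future work.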