A workload-driven logical design approach for NoSQL document databases

Conference Paper · December 2015

DOI: 10.1145/2837185.2837218


All content following this page was uploaded by Ronaldo Mello on 02 February 2019.


A Workload-Driven Logical Design Approach
for NoSQL Document Databases
Claudio de Lima
Postgraduate Program in Computer Science (PPGCC)
Informatics and Statistics Department (INE)
Federal University of Santa Catarina (UFSC)
Florianópolis/SC, Brazil 88040-900
[email protected]

Ronaldo dos Santos Mello
Postgraduate Program in Computer Science (PPGCC)
P.P. in Methods and Management in Evaluation (PPGMGA)
Informatics and Statistics Department (INE)
Federal University of Santa Catarina (UFSC)
Florianópolis/SC, Brazil 88040-900
[email protected]

ABSTRACT
NoSQL databases are designed to manage large volumes of data. Although they do not require a default schema associated with the data, they are categorized by data models. Because of this, data organization in NoSQL databases requires significant design decisions, since these decisions affect quality requirements such as scalability, consistency and performance. In traditional database design, in the logical modeling phase, a conceptual schema is transformed into a schema with lower abstraction that is suitable to the target database data model. In this context, the contribution of this paper is an approach for the logical design of NoSQL document databases. Our approach consists of a process that converts a conceptual model into efficient logical representations for a NoSQL document database. Workload information is considered to determine an optimized logical schema, providing better access performance for the application. We evaluate our approach through a case study in the e-commerce domain and demonstrate that the NoSQL logical structure generated by our approach reduces the number of items accessed by the application queries.

Categories and Subject Descriptors
H.2.1 [Database Management]: Logical Design.

General Terms
Algorithms, Performance, Design.

Keywords
NoSQL document, logical design, conceptual schema, workload.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
iiWAS '15, December 10-12, 2015, Brussels, Belgium
© 2015 ACM. ISBN 978-1-4503-3491-4/15/12…$15.00
DOI: http://dx.doi.org/10.1145/2837185.2837218

1. INTRODUCTION
The immense amount of data generated daily by applications from several domains, such as Web data management, social networks, sensor networks and educational evaluation, brings several challenges for data management in the cloud, including how to handle and store these data. NoSQL Databases (DBs) are designed to manage large volumes of data, commonly referred to as Big Data, and a large number of read and write operations [5].

Although NoSQL DBs do not require a default schema associated with the data, they are categorized by data models (key-value, document, columnar and graph-based) [16], demonstrating that their data show some degree of structuring. The importance of a model associated with the data is related to the definition of better strategies for persistence and manipulation of such data in the target DB. In addition, data organization in NoSQL DBs requires significant design decisions because it affects quality requirements such as scalability, consistency and performance [4].

In this context of data modeling, conceptual schemas and ontologies are crucial to define data semantics, providing access to the data with higher accuracy. Traditional DB design is a process consisting of three data modeling phases [1, 9]: conceptual, logical and physical. In the conceptual modeling phase, a schema with the information of a domain is represented in a high-level abstraction model. Next, in the logical modeling phase, the conceptual schema is transformed into a schema with lower abstraction but suitable to the target DB data model. This logical design phase, specifically for NoSQL DBs, is the focus of this paper.

Methodologies that support the logical design of NoSQL DBs are a topic little explored in the database literature. Therefore, this paper aims to contribute to this problem by proposing a methodology for the logical design of NoSQL document DBs. This methodology consists of a process that converts a conceptual model into suitable and efficient logical representations for a NoSQL document DB. Document-oriented DBs are an appropriate category for Web applications or applications that deal with Big Data because they provide semistructured data storage, dynamic query execution, horizontal scalability and high availability [15]. For these reasons, we chose this NoSQL DB category.

Our conversion approach for generating NoSQL document logical schemas from conceptual schemas considers the expected workload of the application. Workload information is given by the designer in terms of the amount of data instances estimated for the NoSQL DB, as well as the main operations that will be performed over these data. This information is used to determine an optimized logical structuring for the NoSQL DB schema, contributing, in general, to better access performance for the application. We evaluate our approach through an experimental evaluation in the e-commerce domain, where existing datasets were redesigned by our approach in order to compare the number
of accesses generated by queries over the redesigned schema as well as over the workload-based schema generated by our methodology. We demonstrate that the NoSQL logical structure generated by our approach reduces the data accessing overhead.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 presents the document NoSQL logical model considered by our approach and some definitions regarding workload information. Section 4 provides an overview of our approach, including the conversion algorithms for mapping conceptual constructs to suitable structures in the NoSQL document logical model considering the workload of an application. Section 5 presents the experimental evaluation and Section 6 is dedicated to the conclusion.

2. RELATED WORK
This section presents work related to NoSQL DB modeling and includes approaches dealing with other non-relational schemas (XML and Object-Oriented (OO)) in order to identify contributions to the logical modeling of NoSQL DBs, since the literature specific to NoSQL is quite limited. XML and OO models, like NoSQL data models, are complex data models whose similarities in terms of mapping strategies, such as the treatment of multivalued and nested attributes, can be adapted to the logical design of NoSQL DBs. This section also presents a brief comparison of these approaches in order to identify the modeling levels (conceptual, logical and physical) addressed by each of them.

The work of [4] presents an approach to NoSQL DB design which explores the commonalities of some NoSQL DB categories. The proposal introduces a data model (NoAM – NoSQL Abstract Model) for the logical level, and demonstrates how data modeled in NoAM can be implemented in some NoSQL DBs. NoAM is based on the concept of aggregates, which is a term of Domain-Driven Design (DDD) [10]. DDD is a widely adopted OO design approach in which an aggregate is a collection of related objects, organized in a nested way, which can be treated as a unit [20]. The approach of [4] suggests support for scalability, consistency and performance, and has four phases: (i) aggregate design: the classes of aggregated objects needed for the application are identified (conducted by use cases and functional requirements); (ii) aggregate partitioning: aggregates are divided into smaller data elements (conducted by use cases and performance requirements); (iii) high-level NoSQL DB design: aggregates are mapped to the NoAM model according to the identified partitions; and (iv) implementation: the NoAM schema is converted to the schema of the target NoSQL DB.

The work of [14] uses IDEF1X (Integration DEFinition for Information Modeling), a data modeling language for the development of semantic data models, in the conceptual modeling phase to represent the application domain, and also to represent the aggregate-based NoSQL logical model obtained through a conversion process between these models. This proposal provides support for the analysis of different modeling strategies, like schema partitioning into smaller and independent aggregates in the SOA (Service Oriented Architecture) context.

Besides these specific approaches for NoSQL DB modeling, the literature presents several design methodologies for XML DBs [2, 6, 8, 12, 21], as well as conversion processes from conceptual modeling to OO logical representations [3, 7, 11, 18, 19]. Table 1 shows a comparison of related work that relates the modeling levels addressed by each proposal.

Regarding specific design methodologies for NoSQL DBs, we observe that they propose logical schemas using the concept of aggregates at the logical level. According to the work of [20], aggregates are suitable to represent key-value, document, and columnar NoSQL data models. The study of [4] explores the commonalities of some categories of NoSQL DBs, and although it acts in the three design phases, it neither considers all the conceptual constructs nor formalizes conversion processes between conceptual modeling and logical representations in the NoAM model. The proposal of [14] presents the conversion of a conceptual model into a logical model, but it does not address the physical modeling phase and does not formalize conversion processes between conceptual and logical representations.

Table 1. Comparison of related work

# | Database Design: Conceptual | Database Design: Logical | Database Design: Physical
NoSQL Logical Model
[4] | UML | NoAM – aggregate-based | specific elements of NoSQL DBs categories
[14] | IDEF1X | IDEF1X – aggregate-based | -
XML Logical Model
[21] | EER | XML logical model | DTD / XML Schema
[6] | EER | - | DTD
[12] | EER | - | DTD
[8] | EER | hierarchical structures | XML Schema
[2] | UML | UML + stereotypes | XML Schema
OO Logical Model
[18] | ER | OO logical schema | DDL O2
[3] | ER | F-Logic | DDL ONTOS
[7] | EER | OO schema | -
[19] | ER | OO schema | -
[11] | EER | OMT based | -

Considering design methodologies for XML DBs, we observe that the work of [21] considers the three DB design phases as well as all EER conceptual model constructs to generate an equivalent XML structure. Information regarding the estimated load for the DBs is used to define optimizations on the XML structure generated by the transformation process. Regarding approaches for OO logical modeling, we observe that most of them consider the ER model in the conceptual modeling phase. However, there is no consensus with respect to the logical model and, in particular, the way binary relationships and generalization types are converted.

Unlike related work, our proposal covers all typical conceptual constructs, details the conversion algorithms between conceptual schemas and logical representations for the NoSQL document DB category, and additionally considers the estimated DB workload to perform optimizations in the logical structure. It is detailed in the following sections.

3. FUNDAMENTALS
Our approach provides the conversion of conceptual schemas into NoSQL document logical schemas. It starts with a conceptual schema and workload information given by the application designer, as shown in Figure 1. The workload information is estimated over a conceptual schema and is also used as input for the Logical Design phase in order to generate appropriate logical structures. The mapping of the conceptual schema to a NoSQL
document logical schema is governed by a set of rules that convert each conceptual construct into an equivalent representation in the NoSQL document logical model.

Our logical model is an abstract model to represent NoSQL document implementation models. In the Implementation Design phase, a NoSQL document logical schema is translated to the common implementation model for NoSQL documents, i.e., the JSON¹ specification. Even though the Implementation Design level is considered by our approach, this paper focuses on generating optimized NoSQL document structures from a conceptual schema at the Logical Design level.

¹ A lightweight data-interchange format (json.org).

Figure 1: An overview of the proposed approach.

Our input conceptual schema is defined by the Extended Entity-Relationship (EER) model [1], a classical and suitable model for representing data concerning an application domain. Other conceptual models could be considered, like UML. We adopt EER because it contains the essential constructs for conceptual modeling.

Some definitions regarding workload information and the logical model defined by our approach are presented in the following.

3.1 Workload Information
Workload information corresponds to the data load expected for a NoSQL-based application. This information allows our conversion process to choose an optimized NoSQL document structure to represent a conceptual schema. According to Batini et al. [1], we may concentrate on the 20% most frequent operations that will be performed by the application. This assumption is rooted in the so-called 20-80 rule, which says that 20% of the operations produce 80% of the application load. Our workload analysis identifies the concepts frequently accessed by transactions and is based on the workload modeling methodology defined in [1, 21, 22], as follows.

Definition 1. Volume of Data. Given an EER schema Ɛ, the volume of data of Ɛ is defined by V = {N(t), Avg(Ƭ, r)}, where N(t) is the average number of occurrences of a conceptual type t ∈ Ɛ and, given an n-tuple Ƭ = (t_1, ..., t_n) (n > 1) of entity types associated through a relationship type r, Avg(Ƭ, r) is the average cardinality among the entities in Ƭ through r.

Figure 2 shows an example of an EER schema augmented with volume of data. The average number of instances N expected for the conceptual types is represented in the type shape and the average cardinality (Avg) is presented on the associations. We omitted Avg parameters for the sake of clarity. Thus, we have N(A) = 250, N(R1) = 500 and so on. For the average cardinality, we have Avg(A,B,R1) = 2, Avg(B,A,R1) = 1 and so on. The cardinality is interpreted as follows: the average volume of instances of A related to B through R1 is 2 and it appears next to B. In the same way, the value of Avg(B,A,R1) appears next to A.

Figure 2. An EER schema with volume of data information.

Definition 2. Application Load. Consider an EER schema Ɛ = {t_1, ..., t_m} and a set of operations O = {o_1, ..., o_n} over Ɛ such that each o_i ∈ O is applied over a list of types T = (t_1, ..., t_p) with T ⊆ Ɛ. The application load on Ɛ is defined by a set of operations and is composed of two functions: (i) f(o_i) is the average frequency of o_i in a period of time; and (ii) v(o_i, t_j) is the volume of instances of t_j accessed by o_i. This volume is given for each t_j ∈ T, respecting the access order imposed by o_i. v(o_i, t_j) is defined as f(o_i) when j = 1; otherwise, it is defined as v(o_i, t_k) × ω, where t_k is the type accessed by o_i before t_j, and ω is 1 if t_j is an entity type or Avg(t′, ..., t″, t_k, t_j) if t_j is a relationship type, where t′, ..., t″ are the types associated to t_k in the relationship determined by t_j.

An operation is an elementary interaction with the application, which includes retrieval or updating operations. Table 2 shows an example of a set of operations estimated as the application load. Operation O1, for example, has an average frequency of 900 times a day. The entity and relationship types C, R2 and B are accessed, in this sequence, by O1. Note that the initial concept C is accessed 900 times by O1. Following the navigation sequence, the average number of accessed instances of the concepts R2 and B is obtained by multiplying 900 by 20, considering that Avg(C,B,R2) = 20. Analogously, we have Avg(A,B,R1) = 2 for operation O2.

Table 2. Operations for the schema of Figure 2

Operation | Frequency per day | Concept accessed | Access volume
O1 | 900 | C | 900
O1 | 900 | R2 | 18000
O1 | 900 | B | 18000
O2 | 300 | A | 300
O2 | 300 | R1 | 600
O2 | 300 | B | 600

Given the volume of data and the application load, we now define the total frequency of access (operation access) on a type in an EER schema.

Definition 3. General Access Frequency (GAF). Given an EER schema Ɛ, O = {o_1, ..., o_n} is the set of operations such that each o_i ∈ O is applied over a list of types T ⊆ Ɛ. The GAF of a type t ∈ Ɛ, where n (n ≥ 0) represents the number of operations in which t is accessed, is defined as follows:

$$GAF(t) = \sum_{i=1}^{n} v(o_i, t) \qquad (1)$$

The GAF of the concept B is GAF(B) = 18600, considering the 18000 instances accessed by O1 and the 600 accessed by O2. In order to evaluate the GAF measure, we may consider a Minimal Access Frequency (MAF), which is given by the designer as an input of our process. MAF is a value that represents the minimal frequency for accesses involving operations, and values below it are considered insignificant frequencies. We introduce an example as follows. Suppose the designer assumes that the set of considered operations in Table 2 represents 80% of the
application load and the MAF should represent 0.9%. Thus, if the sum of the GAF of all schema concepts is 38400 accesses, this sum corresponds to 80% of a total load of 48000 accesses, and the MAF is 0.9% of that total, i.e., 432 accesses. Given this minimal value, we can evaluate whether the GAF of a concept is relevant for the workload. If GAF(B) = 18600, for example, we say that B is a concept frequently accessed by transactions because its GAF is higher than the MAF (432).

3.2 NoSQL Document Logical Model
We propose a NoSQL document logical model to represent the document data model. Our conversion approach generates NoSQL schemas defined by this logical model in the Logical Design phase. The NoSQL document logical model is an abstract representation for NoSQL document models and consists of an adaptation of the aggregate approach [10], which is a widely adopted OO approach. In this context, an aggregate represents a collection of related objects, organized in a nested way, which can be treated as a unit. Such a notion is suitable to NoSQL documents given that they are hierarchical data structures that consist of nested data collections and scalar values [20]. Besides, the choice of an aggregate-based logical representation is justified by the fact that aggregates support typical NoSQL database requirements, like scalability and consistency, as they provide a natural unit for sharding and atomic manipulation of data in distributed environments [10, 13].

A NoSQL document logical schema is composed of collections, blocks and attributes. A schema has one or more collections, and each collection has a root block. All updates to a collection pass through its respective root block, which enforces the business rules. The root block is the only block accessible from outside the collection. The main concepts of the logical model are defined as follows.

Definition 4. NoSQL document logical schema. A NoSQL document logical schema NDS_i is a set of collections C_i, where each collection has a unique name in the schema.

Definition 5. Collection. A collection c_j is a non-empty set of blocks B_j, where c_j has a root block rb_j ∈ B_j.

Definition 6. Root Block. A root block rb_k consists of an attribute a_k that uniquely identifies rb_k in a collection c_k, and a non-empty set of attributes A_k and/or blocks B_k.

Definition 7. Block. A block b_x is a set of attributes A_x, or a set of inner blocks B_x, that supports disjointness constraints for the inner blocks b_x ∈ B_x.

Definition 8. Attribute. An attribute a_y of a block b_y is a tuple (c_y, v_y), where c_y uniquely identifies a_y in b_y, and v_y is the value of a_y.

Definition 9. Hierarchical Relationship. A hierarchical relationship hr_m ∈ sb_m, where sb_m is a source block, is defined between sb_m and an inner (target) block tb_m ∈ sb_m.

Definition 10. Reference Relationship. A reference relationship rr_n is represented by a reference attribute ra_n ∈ sb_n or a set of reference attributes RA_n ⊆ sb_n, where sb_n identifies a source block. The reference attribute ra_n refers to the identifier attribute a_o of a target collection's root block rb_o.

Figure 3 presents a NoSQL document logical schema in accordance with our proposed logical model. There are three types of attributes: normal, identifier and reference attributes. A normal attribute models a block property and does not impose restrictions, like participation in the Partner block. An identifier attribute is an attribute that is part of a root block, like ID_code in the Person collection. A reference attribute is an attribute that refers to a block identifier, like contributor_REF, which refers to the Contributor collection.

Figure 3: Example of a NoSQL document logical schema.

The logical model supports two types of relationships: hierarchical relationships and reference relationships. A hierarchical relationship defines the minimum and maximum occurrences of a target concept in a source concept. For instance, the Person (root) block may have zero or one Student block. The default minimum and maximum occurrence for target concepts is 1. The disjointness constraint on generalization hierarchies is represented by the curly bracket (brace) symbol ("{"), graphically aligned to the left of the target inner blocks. Figure 3 shows an example for the inner blocks Student and Employee. A reference relationship is represented by a reference attribute (or a set of reference attributes), which refers to the identifier of another collection, like contributor_REF, which refers to the Contributor collection.

As the NoSQL document logical model is an abstract representation for NoSQL document models, a NoSQL document logical schema can be easily converted to JSON, the common storage format for NoSQL documents. Only a few decisions must be made regarding the translation of identifier and reference attributes. Such a conversion is out of the scope of this paper.

4. THE CONVERSION PROCESS
Our conversion process is based on conversion rules for mapping EER constructs into equivalent NoSQL document constructs in the logical model described in Section 3.2. Algorithm 1 presents the overall process, which comprises two main steps: conversion of generalization types and conversion of relationship types.

Algorithm 1 EER-NoSQL
Input: An EER schema Ɛ with load data information;
       The Minimal Access Frequency MAF of Ɛ
Output: A NoSQL document logical schema NF

H ← convertHierarchies(Ɛ, MAF);
R ← convertRelationships(Ɛ, MAF, H);
NF ← listOfCollections(R);
defineRootBlockIDs(NF).

Generalization types are converted first, followed by the conversion of the relationships. The blocks generated by the function convertHierarchies are maintained by the function convertRelationships. After the conversion of relationship types, the remaining root blocks are finally defined as the schema's collections. At the end of the process, a list of collections is returned, after the definition of identifier attributes for the collections' root blocks when necessary. Load information is considered during the conversion
of the relationship types in order to generate well-structured NoSQL document logical schemas.

The next sections detail the rules for converting generalization hierarchies and relationship types as well as their respective functions, as presented in Algorithm 1.

4.1 Hierarchy Types Conversion
A generalization hierarchy in the EER model defines a subset relationship between a generic entity, namely the superclass, and one or more specialized entities, namely the subclasses. The disjointness and completeness constraints that are set on the subclasses establish four possible constraints on generalization types: total and disjoint (t, d); partial and disjoint (p, d); total and overlapping (t, o); and partial and overlapping (p, o) [1].

Categories or union types of the EER model can be considered restricted cases of multiple inheritance [1]. Thus, their conversion strategies are similar to the strategies for processing generalization types. For the sake of space, we omit these strategies. In this section, we define alternative rules to convert a generalization hierarchy from an EER schema to a NoSQL document logical schema. We also detail the function convertHierarchies of Algorithm 1, which selects the suitable rule to be applied to each occurrence of an EER generalization type.

4.1.1. Conversion Rules
Three alternatives are provided to convert generalization types, inspired by the relational logical design methodology [1]. The alternatives differ in the size of the NoSQL document schema that each one generates and in the constraints on generalization types they are able to support.

The conversion strategy defined by Rule 1 generates only one block from a generalization hierarchy. The block represents the superclass and its attributes, as well as the attributes of its subclasses. The subclasses' attributes are defined as optional in the content model of the superclass block. When applying this rule, we assume that the subclasses' attributes will act as discriminating attributes to identify an instance of a subclass in the NoSQL documents. Each subclass previously converted (marked) becomes an optional inner block of the block generated by this rule.

Rule 1. Generalization Focused on Superclass. The conversion of a generalization type G proceeds as follows:
1. given an entity E_sp defined as the superclass of G, generate a block b_sp. The attributes of E_sp become attributes of b_sp;
2. given the set of E_sp subclasses {E_sb1, E_sb2, ..., E_sbn}, for each E_sbi ∈ {E_sb1, E_sb2, ..., E_sbn} do:
   if (E_sbi was not converted) then define the attributes of E_sbi as optional attributes in b_sp
   else given b_sbi the block that represents E_sbi, generate a hierarchical relationship from b_sp to b_sbi where the occurrence of b_sbi in b_sp is defined as [0..1].

The main restriction to the application of Rule 1 occurs when one of the subclasses is defined as a referenced block. A referenced block is an entity that was previously processed and defined as referenced by another block. This restriction guarantees that the referenced block will be a root block, preventing it from being later converted into an inner block of another block.

The alternative defined by Rule 2 generates NoSQL document blocks only for the subclasses, and the superclass attributes are reproduced in each subclass block.

In the alternative defined by Rule 3, the superclass and subclasses are explicitly represented by blocks. Hierarchical relationships are established between the superclass and subclass blocks to represent the relationship. The generalization constraints are represented by the minimum and maximum occurrences of the subclasses' blocks in the superclass block. In cases where a subclass of a generalization type has already been converted, the relationship with the superclass block is established by a reference relationship between the block previously created to represent the converted subclass and the superclass block. In this case, the superclass block is defined as a referenced block.

Rule 2. Generalization Focused on Subclasses. The conversion of a generalization type G proceeds as follows:
given an entity E_sp defined as the superclass of a generalization type and {E_sb1, E_sb2, ..., E_sbn} the set of subclasses of E_sp, for each E_sbi ∈ {E_sb1, E_sb2, ..., E_sbn} do: generate a block b_sbi and define the attributes of E_sbi and E_sp as attributes of b_sbi.

Rule 3. Generalization Focused on Hierarchy. The conversion of a generalization type G proceeds as follows:
1. given an entity E_sp defined as the superclass of G, generate a block b_sp and if (G is a disjoint generalization) then generate a disjointness constraint. The attributes of E_sp are defined as attributes of b_sp;
2. given the set of E_sp subclasses {E_sb1, E_sb2, ..., E_sbn}, for each E_sbi ∈ {E_sb1, E_sb2, ..., E_sbn} do:
   if (E_sbi was not converted) then generate a block b_sbi and a hierarchical relationship from b_sp to b_sbi where the occurrence of b_sbi in b_sp is defined as ([0-1],[1]), depending on the completeness constraint of G (total or partial)
   else given b_sbi the block that represents E_sbi, generate a reference attribute ra_sbi in b_sbi which refers to the b_sp identifier, and define b_sp as a referenced block.

Function 1 convertHierarchies
Input: An EER schema with load data information Ɛ;
       The Minimal Access Frequency of Ɛ (MAF)
Output: A set of blocks H' of a NoSQL logical schema

H ← the list of generalization types of Ɛ;
H' ← sort H so that the generalization types at the bottom of the hierarchy with superclasses that have highest GAF appear first;
for each h_i ∈ H' (1 ≤ i ≤ n) with superclass E_sp and subclasses {E_sb1, ..., E_sbn} do
   if ( converted subclasses in h_i) AND (all subclasses in h_i have GAF < MAF) AND (∄ subclasses with more than one superclass) AND (∄ subclasses defined as referenced block) then
      Apply Rule 1 and mark as converted all the subclasses of h_i
   else if (GAF(E_sp) < MAF) AND (∄ subclasses with more than one superclass) then
      Apply Rule 2 and mark E_sp as converted
   else
      Apply Rule 3 and mark as converted all the subclasses of h_i
   end if
end for
return H'
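
To make the GAF and MAF tests used by Function 1 more concrete, the following Python sketch recomputes the workload figures of Table 2 from Definitions 2 and 3 and then applies the frequency-based part of the rule choice. It is only an illustration under stated assumptions: the names `Operation`, `access_volumes`, `general_access_frequency`, `minimal_access_frequency` and `choose_rule` are hypothetical, relationships are assumed to be binary and placed between two entity types in each access path, the MAF is assumed to be a share of the total load implied by the coverage of the listed operations (which reproduces the 432 accesses of the example), and the first precondition of Function 1, on previously converted subclasses, is left out of the sketch. The GAF values of the Person hierarchy are invented for the example.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Operation:
    name: str
    frequency: int   # f(o): average executions per period (here, per day)
    path: list       # types accessed by the operation, in navigation order


def access_volumes(op, relationships, avg):
    """Return {type: v(op, type)} following Definition 2.

    Assumption of this sketch: relationships are binary and sit between two
    entity types in the path, so Avg is looked up with the key
    (previous type, next type, relationship).
    """
    volumes = {}
    previous = None
    for j, t in enumerate(op.path):
        if j == 0:
            volume = op.frequency              # v(o, t_1) = f(o)
        elif t in relationships:
            omega = avg[(op.path[j - 1], op.path[j + 1], t)]
            volume = previous * omega          # omega = Avg(...) for relationships
        else:
            volume = previous                  # omega = 1 for entity types
        volumes[t] = volumes.get(t, 0) + volume
        previous = volume
    return volumes


def general_access_frequency(operations, relationships, avg):
    """GAF(t): sum of v(o, t) over the operations that access t (Definition 3)."""
    gaf = {}
    for op in operations:
        for t, volume in access_volumes(op, relationships, avg).items():
            gaf[t] = gaf.get(t, 0) + volume
    return gaf


def minimal_access_frequency(total_gaf, share, coverage):
    """MAF as `share` of the total load, of which `total_gaf` covers `coverage`."""
    return total_gaf / coverage * share


def choose_rule(gaf, maf, superclass, subclasses, multi_parent=(), referenced=()):
    """Frequency-based part of the rule choice in Function 1 (returns 1, 2 or 3)."""
    no_multi_parent = not any(s in multi_parent for s in subclasses)
    no_referenced = not any(s in referenced for s in subclasses)
    if all(gaf.get(s, 0) < maf for s in subclasses) and no_multi_parent and no_referenced:
        return 1
    if gaf.get(superclass, 0) < maf and no_multi_parent:
        return 2
    return 3


# Workload of Table 2 over the schema of Figure 2.
RELATIONSHIPS = {"R1", "R2"}
AVG = {("C", "B", "R2"): 20, ("A", "B", "R1"): 2}
OPERATIONS = [
    Operation("O1", 900, ["C", "R2", "B"]),
    Operation("O2", 300, ["A", "R1", "B"]),
]

GAF = general_access_frequency(OPERATIONS, RELATIONSHIPS, AVG)
MAF = minimal_access_frequency(sum(GAF.values()), Fraction(9, 1000), Fraction(8, 10))
print(GAF["B"], sum(GAF.values()), MAF)  # 18600 38400 432

# Hypothetical GAF values for a Person/Student/Employee hierarchy (not from the paper).
HIERARCHY_GAF = {"Person": 1000, "Student": 100, "Employee": 50}
print(choose_rule(HIERARCHY_GAF, MAF, "Person", ["Student", "Employee"]))  # 1
```

Exact fractions are used for the MAF so that the comparison GAF < MAF is not affected by floating-point rounding. With the invented values above, both subclasses fall below the MAF, so the sketch selects Rule 1; raising the GAF of Student above 432 would move the choice to Rule 3, since the superclass is frequently accessed as well.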
4.1.2. Conversion Function
The function convertHierarchies (Function 1) is responsible for choosing the appropriate rule for converting each generalization type of a conceptual schema. A generalization type is converted by analyzing the load data and the constraints of the generalization hierarchy. The function establishes the order in which the entities involved in a generalization hierarchy must be converted. A bottom-up conversion is performed when there is a multiple-level hierarchy, i.e., the entities are converted from the bottom to the top of the hierarchy. Besides, when there is a multiple-inheritance case, the superclass with the highest General Access Frequency (GAF) has the highest priority. It means that the superclass that is most frequently accessed becomes the parent block of the block that represents the subclass with more than one superclass. In this case, the remaining superclasses are referenced by reference attributes as defined in Rule 3. In fact, generalization types involved in multiple-inheritance cases are always converted by Rule 3.
Once the conversion order of the generalization types is established, we apply the conversion rules for generalization types (Rule 1, 2 or 3) and verify the preconditions of each one, so that the rules that generate the smallest NoSQL document logical fragment are verified first, as illustrated in Function 1. For Rule 1 and Rule 2, we verify whether the GAF of the entities that will be omitted is lower than the Minimal Access Frequency (MAF). If the GAF is higher than MAF, these entities participate in frequent operations and the distinction between superclass and subclasses must be preserved. The last option to convert generalization types is Rule 3.

4.2 Relationship Types Conversion

A relationship type is a common conceptual construct which establishes a correspondence among two or more entities [1]. The cardinality of a relationship type is the main constraint considered in the conversion to a NoSQL document logical structure. Our rules for converting EER relationship types also derive from the logical design of traditional data models. In this section, we present these rules and their constraints, as well as the function that controls their execution.

4.2.1. Conversion Rules
We define three conversion rules that deal with specific constraints for relationship types. Rule 4 is applied only to 1:1 relationships, Rule 5 addresses 1:N relationships, and Rule 6 is applied to relationships with cardinality N:N, n-ary ones with n > 2, or cases where 1:1 and 1:N relationships cannot be treated by Rules 4 and 5, respectively.
Rule 4 generates only one block to represent the relationship type and its related entities.

Rule 4. Relationship Modeled as One Block. The conversion of a relationship type R proceeds as follows: given a 1:1 relationship type R which relates the entities E1 and E2, generate a block bE1 and define the attributes of E1, E2 and R as attributes in bE1.

Rule 5 generates a block for each related entity, where one of them is converted to a nested block of the other one, and the relationship attributes are appended to the nested block. To guarantee that a referenced block remains a root block, Rule 5 cannot be applied to relationship types in which the entity with participation (1,1) was previously defined as a referenced block.
Finally, Rule 6 generates independent blocks for each entity of a relationship type, and reference relationships are established among the generated blocks.

Rule 5. Relationship Modeled as a Hierarchy. Given a 1:N relationship type R which relates the entities E1 and E2, the conversion of R proceeds as follows:
1. generate a block bE1 for representing E1 and define the attributes of E1 as attributes in bE1;
2. generate a block bE2 for representing E2 as a nested block of bE1. The occurrence of bE2 in bE1 depends on the participation of E1 in R (optional or mandatory);
3. define the attributes of E2 and R as attributes in bE2.

Rule 6. Relationship Modeled as References. Given a relationship type R and the set of entities {E1, E2, .., En} related by R, the conversion of R proceeds as follows:
1. for each Ei ∈ R (1 ≤ i ≤ n) do: generate a block bEi and define the attributes of Ei as attributes in bEi;
2. if (R is a binary relationship without attributes) AND (the participation of E1 in R is defined as ([0-1],1)) then generate a reference attribute in bE1 referring to the identifier of bE2, and define bE2 as a referenced block
   else
   3. generate a block bR as a nested block of bE1 and define the attributes of R as attributes in bR;
   4. for each Ei ∈ R (1 ≤ i ≤ n) do: generate a reference attribute in bR referring to the identifier of bEi, and define bEi as a referenced block.

Function 2 convertRelationships
Input:  An EER Schema with load data information Ɛ;
        The Minimal Access Frequency of Ɛ (MAF);
        A set of blocks H of a NoSQL logical schema generated by convertHierarchies
Output: A set of root blocks R’ of a NoSQL logical schema
R ← the list of relationship types of Ɛ;
R’ ← sort R so that the relationship types with the highest GAF appear first;
for each ri ∈ R’ do
    ri ← the first unconverted relationship of R’;
    ES ← the set of entities {E1, .., En} related by ri;
    if (ri is binary) AND (∃ an unconverted entity Ei ∈ ES with participation (1,1) in ri) then
        E2 is Ei and E1 is the other entity of ri
    else
        E1 is the entity that has the highest GAF in ri
    end if
    if (ri is 1:1) AND (the participation of E2 is (1,1)) AND (E2 is unconverted) then
        Apply Rule 4 (H) and mark E2 as converted
    else if (ri is binary) AND (the participation of E2 is (1,1)) AND (E2 is unconverted) AND (E2 is not defined as referenced block) AND (E1 ≠ E2) then
        Apply Rule 5 (H) and mark E2 as converted
    else
        Apply Rule 6 (H)
    end if
    Mark ri as converted
end for
return R’
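The rule selection performed inside the loop of Function 2 can be sketched in Python as follows. This is one possible reading of the pseudocode, not the authors' implementation: the `Relationship` class, the `choose_rule` function, the entity names and the GAF values are all hypothetical, and the sketch assumes that E2 stays undefined (so Rule 6 is chosen) whenever no unconverted entity with participation (1,1) exists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Relationship:
    name: str
    entities: List[str]                 # entities related by the relationship
    cardinality: str                    # "1:1", "1:N" or "N:N"
    mandatory_one: Set[str] = field(default_factory=set)  # participation (1,1)


def choose_rule(rel: Relationship, gaf: Dict[str, int],
                converted: Set[str],
                referenced: Set[str]) -> Tuple[int, str, Optional[str]]:
    """Return (rule number, E1, E2) for one relationship type."""
    e1: Optional[str] = None
    e2: Optional[str] = None
    if len(rel.entities) == 2:          # binary relationship
        for i, entity in enumerate(rel.entities):
            if entity in rel.mandatory_one and entity not in converted:
                e2, e1 = entity, rel.entities[1 - i]
                break
    if e2 is None:
        # Assumption: without a nesting candidate, only Rule 6 can apply and
        # the top entity is the one with the highest GAF.
        e1 = max(rel.entities, key=lambda e: gaf.get(e, 0))
        return 6, e1, None
    if rel.cardinality == "1:1":
        return 4, e1, e2                # one block for E1, E2 and R
    if e2 not in referenced and e1 != e2:
        return 5, e1, e2                # E2 nested in E1
    return 6, e1, e2                    # independent blocks plus references


# Hypothetical schema fragment and GAF values, for illustration only.
gaf = {"User": 900, "Profile": 300, "Blog": 500, "Post": 800, "Tag": 100}
rels = [
    Relationship("has_profile", ["User", "Profile"], "1:1", {"Profile"}),
    Relationship("publishes", ["Blog", "Post"], "1:N", {"Post"}),
    Relationship("tags", ["Post", "Tag"], "N:N"),
]
decisions = [choose_rule(r, gaf, set(), set()) for r in rels]
print(decisions)
```

A full conversion would also sort the relationship types by GAF and update the converted and referenced sets after each rule application; both steps are left out here to keep the sketch short.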
4.2.2. Conversion Function
The function convertRelationships (Function 2) controls the execution of the conversion rules for relationship types of an EER schema. It orders the relationship types so that relationships with the highest GAF appear first. This order is established to give priority to the relationships that have the largest impact on the application workload. Then, if there is more than one nesting possibility for an entity type given all the relationship types in which it participates, we process this entity by considering the relationship with the highest GAF first. This order ensures that relationship types involving associative entities are converted after the internal relationship types of these associative entities.
For converting a relationship type, we first determine which entity will be at the top of the hierarchy in the NoSQL document logical schema. If the relationship type is binary and there is an entity with participation (1,1) in the relationship, the top entity is the other entity of the relationship type. In the other cases, the top entity is the entity type with the highest GAF in the relationship, i.e., we assume that the relationship type is more frequently accessed through this entity for the considered operations.
In the next section, we evaluate our approach with a case study in the e-commerce domain.

5. EXPERIMENTAL EVALUATION
We evaluate our approach with an experiment in the e-commerce domain. Our intention here is to validate our conversion methodology, exemplify the usage of our process and show its positive effects, in terms of processing time, when the application workload is considered. In fact, we show here that our method can improve query performance on NoSQL documents by reducing the number of accesses to the NoSQL database. Experimental settings and results are presented in the following.

5.1 Experiment Settings
We reverse-engineered a real e-commerce application dataset. The resulting conceptual schema is presented in Figure 4. Then, we applied our conversion process twice over the conceptual schema obtained from this reverse engineering process. The first time, we did not consider workload information, and the generated schema was called the conventional schema. The second time, we applied our complete conversion process, and the generated schema was called the optimized schema.
For the generation of the conventional schema, small changes in the convertHierarchies and convertRelationships functions were required. In the function convertHierarchies, instead of considering GAF and MAF, we check for the existence of relationships involving subclasses and superclasses before applying Rules 1 and 2, respectively. Rule 1 assumes that the explicit distinction between subclasses is irrelevant for most instances of the superclass. The existence of relationships involving subclasses is the main constraint for the application of this rule. Rule 2 is not considered for cases where the superclass's relationships must be converted into relationships with each one of the subclasses.
For the function convertRelationships, we modify the ordering of the relationship types: instead of comparing GAF to establish the order, we use the concept of fully functional closures [17], which determines the list of entities that can be reached from a starting entity through relationship pathways determined by participation (1,1) of entities in the relationships. After identifying the fully functional closures of the entities of the conceptual schema, a list of entities (EL) is generated. These entities must be ordered in EL so that the entities participating in more fully functional closures appear first in the list. Then, the remaining entities are added to the end of EL, so that the entities that have a higher number of relationships appear first. The final list obtained for our case study is EL = {Customer, Order, Product, Category, Carrier, Item, Supplier, CreditCard, Payment, Bill, Person}.
It is important to notice that the generation of the conventional and optimized schemas has the same goal, which is to generate compact and redundancy-free schemas and define appropriate representations in the NoSQL document logical model. The main difference between the conversion processes is that the optimized schema is generated by considering workload information to select the appropriate conversion rules.
The number of instances from the original e-commerce application dataset was used to measure the volume of data in our case study, i.e., the average number of instances of the entities and relationships as well as the average cardinality of the entities in each relationship type. We omit most of the attributes to simplify the schema readability. The volume of data for the conceptual schema of the application is also shown in Figure 4.
We also obtained the main operations that comprise the application workload. They were provided by an expert user of the application, considering the concepts (entity and relationship types) defined in the conceptual schema. The third column of Table 3 presents the operation load in terms of access frequency for each concept of the conceptual schema.
The GAF of each concept involved in the operations on the conceptual schema was also measured. The values are shown in Table 4. We omit the GAF of the concepts that are not accessed by the considered operations. The conventional and optimized NoSQL document logical schemas generated by our approach are shown in Figure 5 and Figure 6, respectively.
The EER schema, the volume of data, a set of operations and their average frequencies are given as input to our conversion process. The volume of data is included in the conceptual schema and the operations are shown in Table 3. The application load was measured over the conceptual schema according to Definition 2, while GAF was generated according to Definition 3.
In order to obtain the MAF measure, we considered that the set of operations produces 80% of the load. Thus, the total volume generated on the conceptual schema (651,990 daily accesses) represents 80% of the total accesses that can be performed by the application over the conceptual types. We assume that 1.15% is given as a parameter and denotes MAF as a percentage. This percentage is applied to the total volume (651,990 / 0.8 ≈ 814,988 daily accesses), and we obtain 9,372 accesses as the MAF value.
In the following, we measure and compare the access frequency generated by each concept by executing the operations on the conventional and optimized schemas. The access frequency generated by the schemas is shown in the last two columns of Table 3. In order to evaluate the effects of the query processing on these schemas, the operations were performed on compliant NoSQL documents generated and stored in the NoSQL document-oriented database MongoDB (a document-oriented database, mongodb.org). We developed a Java application to produce JSON documents defined by each collection for both schemas. For each schema, we generated a set of documents with the same volume of data defined by Figure 4. The tests were carried out on a Core i7 2.40 GHz processor with 8 GB of memory, 1 TB of disk and Windows 8.1 Pro. The MongoDB-shell query specifications were defined according to the structure of the schemas and the sequence of accesses for the operations as presented in Table 3. We use the trial version of the NoSQL Manager for MongoDB Professional tool to execute the queries.
The next section presents and discusses the experiment results.
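The MAF value used in the experiment, and one of the GAF values reported in Table 4, can be reproduced with a few lines of Python. This is only a worked check of the arithmetic: the variable names are illustrative, and the sketch assumes that the GAF of a concept is the sum of its access frequencies on the conceptual schema over all operations, which agrees with the figures in Tables 3 and 4 (Definition 3 itself is not restated here).

```python
# Access frequencies of Order on the conceptual schema, per operation (Table 3).
order_accesses = {"O1": 1_500, "O2": 900, "O3": 13_185, "O5": 144_100}
gaf_order = sum(order_accesses.values())

measured_load = 651_990    # daily accesses generated by the six operations
load_share = 0.80          # the operations are assumed to produce 80% of the load
maf_percentage = 0.0115    # MAF parameter (1.15%)

total_load = measured_load / load_share   # estimated total daily accesses
maf = maf_percentage * total_load         # about 9372.36, reported as 9372

print(gaf_order, round(total_load, 1), round(maf))
```

With these numbers, Order (GAF 159,685) lies far above the MAF, while Payment (GAF 1,200 in Table 4) lies below it.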

Figure 4. EER schema for an e-commerce application.

Figure 5. The conventional NoSQL document logical schema.
Figure 6. The optimized NoSQL document logical schema.
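To make the practical difference between a referenced block and a nested block concrete, the following Python sketch contrasts the two representations for Payment and counts block accesses in the way Table 3 does for operation O2. The field names, the toy data and the full scan without an index are illustrative assumptions; they are not the authors' MongoDB documents or queries.

```python
# Conventional style: Order holds a reference to a separate Payment block.
orders_ref = [
    {"_id": 1, "payment_ref": 10},
    {"_id": 2, "payment_ref": 11},
]
payments = [
    {"_id": 10, "amount": 50.0},
    {"_id": 11, "amount": 75.0},
    {"_id": 12, "amount": 20.0},
]

# Optimized style: the Payment content is nested in the Order block.
orders_nested = [
    {"_id": 1, "payment": {"amount": 50.0}},
    {"_id": 2, "payment": {"amount": 75.0}},
]


def amounts_by_value_join(orders, payment_blocks):
    """Resolve each reference by comparing it with every Payment instance."""
    amounts, accesses = [], 0
    for order in orders:
        for payment in payment_blocks:   # full scan: no index is assumed
            accesses += 1
            if payment["_id"] == order["payment_ref"]:
                amounts.append(payment["amount"])
    return amounts, accesses


def amounts_nested(orders):
    """Read the Payment content directly from each Order block."""
    return [order["payment"]["amount"] for order in orders], len(orders)


join_amounts, join_accesses = amounts_by_value_join(orders_ref, payments)
nested_amounts, nested_accesses = amounts_nested(orders_nested)
print(join_accesses, nested_accesses)    # 6 and 2 for this toy data

# Same counting model with the O2 figures: 900 reference values compared
# with 785,000 Payment instances, versus 900 direct accesses.
o2_conventional = 900 * 785_000
o2_optimized = 900
```

Both functions return the same amounts; only the number of accessed blocks differs, which is the quantity compared in the last two columns of Table 3.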
Table 3. Operations on schemas (access frequency per concept)

#    Concept       Conceptual     Conventional       Optimized
                   schema         logical schema     logical schema
O1   Order              1,500              1,500            1,500
     request            1,500                  -                -
     Customer           1,500              1,500            1,500
     composite          2,475                  -                -
     Item               2,475              2,475            2,475
     reference          2,475                  -                -
     Product            2,475          2,227,500        2,227,500
     Subtotal:         14,400          2,232,975        2,232,975
O2   Order                900                900              900
     request              900                  -                -
     Customer             900                900              900
     commitment           900                900                -
     Payment              900        706,500,000              900
     Subtotal:          4,500        706,502,700            2,700
O3   Customer             450                450              450
     request           13,185                  -                -
     Order             13,185             13,185           13,185
     delivery          13,185             13,185           13,185
     Carrier           13,185            210,960          210,960
     Subtotal:         53,190            237,780          237,780
O4   Customer             300                300              300
     commitment           300              8,790                -
     Payment              300      6,900,150,000            8,790
     Subtotal:            900      6,900,159,090            9,090
O5   Product              100                100              100
     reference        144,100                  -                -
     Item             144,100        129,700,000      129,700,000
     composite        144,100                  -                -
     Order            144,100        129,700,000      129,700,000
     Subtotal:        576,500        259,400,100      259,400,100
O6   Supplier             100                100              100
     furnishing           600             90,000           90,000
     Product              600             90,000           90,000
     catalog              600                  -           90,000
     Category             600          5,400,000        5,400,000
     Subtotal:          2,500          5,580,100        5,670,100
     Total:           651,990      7,874,112,745      267,552,745

Table 4. GAF of concepts of Figure 4

Concept        GAF        Concept        GAF
Order      159,685        composite  146,575
Item       146,575        reference  146,575
Carrier     13,185        request     15,585
Product      3,175        delivery    13,185
Customer     3,150        commitment   1,200
Payment      1,200        catalog        600
Category       600        furnishing     600
Supplier       100

5.2 Result Analysis
On analyzing the last line of Table 3, we verify that the Total Access Frequency is considerably higher for the conventional schema than for the optimized one. Basically, this increase is due to extra blocks and reference relationships generated in the conventional schema to represent some conceptual relationships in the NoSQL document schemas. The main reason for the block reduction in the optimized schema is that Rule 1 was applied to convert the union type involving the subclass Payment. This rule was considered because the GAF of the superclasses is lower than the assumed MAF. Besides, for the relationship commitment, the associative entity Sale was represented in the Order block content, as the GAF of the Order entity (159,685) is higher than that of the Payment entity (1,200). Because of this, the function convertRelationships chooses Order as the E1 entity and Rule 4 was processed.
In short, the main difference between the produced logical schemas is the representation of the Payment union type in the optimized schema, which was nested in the Order block. Thus, the optimized schema has fewer collections and reference attributes than the conventional one.
These different representations generate different access frequencies for the operations O2 and O4, as shown in Table 3. In O2, the conventional schema generates 706,500,000 accesses on the block Payment because it is necessary to compare the 900 values of the reference attribute in commitment with all the instances of the Payment block (785,000). This does not occur when performing O2 on the optimized schema because the Payment block is represented in the Order block. In this case, only 900 accesses are necessary to reach the Payment content in O2.
In practice, the impact of these different structures is evaluated by measuring the query processing time on both schemas in the MongoDB NoSQL document DB. The operations were performed on the compliant NoSQL documents, as stated before, and the results are presented in Figure 7. The results are presented in two ways: (i) one execution, and (ii) accumulated execution.

Figure 7. Operation processing time in seconds.

Table 5. Operations processing total time in seconds

Execution      Conventional    Optimized
One                   12.95        12.02
Accumulated         5095.25      4404.20

Figure 7 presents the response time in seconds for the execution of each of the six operations on JSON documents conforming to the conventional and optimized schemas. For each schema, it shows the time spent in a single run (one execution) of a query and the daily system time spent (accumulated execution) to run a query considering its daily frequency. For example, for the conventional schema, a single query O1 shows the response time
of 1.525 seconds. Considering the frequency of 1,500 times a day, the daily system occupation time to perform this accumulated operation is 2,287.5 seconds (1.525 * 1,500).
The results show that the optimized schema produced query processing times close to those of the conventional schema for some operations. However, the cost of performing O2 on the conventional schema is notably higher because it is necessary to retrieve the Payment of an Order block through value joins on a reference relationship. This result demonstrates the positive effect of avoiding value joins. The daily total accumulated execution of operations performed on NoSQL documents, as shown in Table 5, demonstrates that the optimized schema produces a better response time. It reinforces the relevance of considering workload information.

6. CONCLUSION
Indeed, NoSQL DBs are suitable solutions for data management on the Web as well as in the cloud, and an associated data model allows the definition of better strategies for persistence and manipulation of such data in the target DB. In this context, the aggregate-based logical representation is a trend in related work, providing support for scalability and consistency, as aggregates are a natural unit for sharding and atomic manipulation of data in distributed environments.
This paper presents an approach for the logical design of NoSQL document DB schemas based on a conceptual schema and workload information. NoSQL document DBs are an appropriate category for Web and cloud applications that provide dynamic query execution, horizontal scalability and high availability. In our proposal, the estimated volume of data and workload information are considered to generate optimized NoSQL document structures in terms of the main application operations and their frequency.
We evaluate our approach through an experimental evaluation for an e-commerce application domain, with the data stored in the MongoDB NoSQL document DB. The results demonstrate that our workload-based conversion process improves query performance on NoSQL documents by reducing the number of DB accesses.
As future work, we intend to evaluate the application of our process over a larger volume of data in a distributed environment, as well as to consider the NoSQL DB physical design, including the definition of indexes.
We also consider comparing our approach with a baseline in order to evaluate application performance for the logical schemas generated by each one of them. This depends on the availability of detailed conversion algorithms in related work, which were not found at the time of writing this paper.

7. REFERENCES
[1] Batini, C., Ceri, S. and Navathe, S. B. 1992. Conceptual Database Design: An Entity-Relationship Approach. Benjamin/Cummings, 1992.
[2] Bird, L., Goodchild, A. and Halpin, T. 2000. Object Role Modeling and XML-Schema. In ER 2000, pages 661-705, 2000.
[3] Biskup, J., Menzel, R. and Polle, T. 1995. Transforming an Entity-Relationship Schema into Object-Oriented Database Schemas. In ADBIS 1995, pages 109-136, 1995.
[4] Bugiotti, F., Cabibbo, L., Atzeni, P. and Torlone, R. 2014. Database Design for NoSQL Systems. In ER 2014, pages 223-231, 2014.
[5] Cattell, R. 2010. Scalable SQL and NoSQL Data Stores. SIGMOD Record, volume 39 (4), pages 12-27, 2010.
[6] Choi, M., Lim, J. and Joo, K. 2003. Developing a Unified Design Methodology based on Extended Entity-Relationship Model for XML. In ICCS 2003, pages 920-929, 2003.
[7] Elmasri, R., James, S. and Kouramajian, V. 1993. Automatic Class and Method Generation for Object-Oriented Databases. In DOOD 1993, Springer LNCS 760, pages 395-414, 1993.
[8] Elmasri, R., Wu, Y., Hojabri, B., Li, C. and Fu, J. 2002. Conceptual Modeling for Customized XML Schemas. In ER 2002, pages 429-443, 2002.
[9] Elmasri, R., Navathe, S. B. 2011. Fundamentals of Database Systems. Pearson Addison Wesley, 2011.
[10] Evans, E. 2003. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley, 2003.
[11] Fong, J. 1995. Mapping Extended Entity-Relationship Model to Object Modeling Technique. SIGMOD Record, volume 24 (3), pages 18-22, 1995.
[12] Fong, J., Fong, A., Wong, H. K. and Yu, P. 2006. Translating Relational Schema with Constraints into XML Schema. In International Journal of Software Engineering and Knowledge Engineering, volume 16, pages 201-244, 2006.
[13] Helland, P. 2007. Life beyond distributed transactions: an apostate’s opinion. In CIDR 2007, pages 132-141, 2007.
[14] Jovanovic, V., Benson, S. 2013. Aggregate Data Modeling Style. In SAIS 2013, pages 70-75, 2013.
[15] Kaur, K. and Rani, R. 2013. Modeling and Querying Data in NoSQL Databases. IEEE. In International Conference on Big Data, pages 1-7, 2013.
[16] McMurtry, D., Oakley, A., Sharp, J., Subramanian, M., Zhang, H. 2013. Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence. Microsoft, 2013. Available at: <http://www.microsoft.com/en-us/download/details.aspx?id=40327>. Accessed in March 2014.
[17] Mok, W. Y., E. D. W. and Rani, R. 2006. Generating compact redundancy-free xml documents from conceptual-model hypergraphs. In IEEE Transactions on Knowledge and Data Engineering, volume 18, pages 1082-1096, 2006.
[18] Nachouki, J., Chastang, M.P. and Briand, H. 1991. From Entity-Relationship Diagram to an Object-Oriented Database. In ER 1991, pages 459-482, 1991.
[19] Narasimhan, B., Navathe, S. and Jayaraman, S. 1993. On Mapping ER and Relational Models onto OO Schemas. In ER 1993, pages 402-413, 1993.
[20] Sadalage, P. J. and Fowler, M. J. 2013. NoSQL Distilled. Addison-Wesley, 2013.
[21] Schroeder, R. and Mello, R. S. 2008. Improving Query Performance on XML Documents: A Workload-Driven Design Approach. In DocEng 2008, pages 177-186, 2008.
[22] Schroeder, R., Duarte, D. and Mello, R. S. 2011. A workload-aware approach for optimizing the XML schema design trade-off. In iiWAS 2011, pages 12-19, ACM, New York, NY, 2011.

1 page
Statement of Purpose Jemima
No ratings yet
Statement of Purpose Jemima
4 pages
Lab06 Insert Document
No ratings yet
Lab06 Insert Document
2 pages
WWW Oracle Com Database What Is Database
No ratings yet
WWW Oracle Com Database What Is Database
3 pages
Gamechangers Adopting Scylladb
No ratings yet
Gamechangers Adopting Scylladb
11 pages
White-paper-KronoGraph
No ratings yet
White-paper-KronoGraph
13 pages
Little Riak Book
No ratings yet
Little Riak Book
105 pages
DBMS Journal Guidelines
No ratings yet
DBMS Journal Guidelines
7 pages
Be Computer Engineering Semester 5 2023 November Database Management Systems Dms Pattern 2019
No ratings yet
Be Computer Engineering Semester 5 2023 November Database Management Systems Dms Pattern 2019
2 pages
Data Analyst Interview Questions
No ratings yet
Data Analyst Interview Questions
6 pages
Data Science: Executive PG Programme in
No ratings yet
Data Science: Executive PG Programme in
32 pages
DP 200 - LA Practice Test
No ratings yet
DP 200 - LA Practice Test
27 pages
Excessive Privileges: Atabase Hreats
No ratings yet
Excessive Privileges: Atabase Hreats
6 pages
MODULE 3 (3)
No ratings yet
MODULE 3 (3)
14 pages
Cloud OnBoard Fundamentals Deck
100% (1)
Cloud OnBoard Fundamentals Deck
196 pages
M4 - T-GCPFCI-B - Core Infrastructure v5.1.0 - ILT
No ratings yet
M4 - T-GCPFCI-B - Core Infrastructure v5.1.0 - ILT
61 pages
Unit 2 BDA
No ratings yet
Unit 2 BDA
32 pages

A Workload-Driven Logical Design Approach For NoSQL Document Databases

Conference Paper · December 2015

Cláudio Lima, Ronaldo Mello

ABSTRACT challenges for data management in the cloud, including how to

3.2 NoSQL Document Logical Model

4.2 Relationship Types Conversion

Function 2 convertRelationships

Figure 4. EER schema for an e-commerce application.

Table 4. GAF of concepts of Figure 4
