2 Data Models

This document provides an overview of data modeling and different data models. It discusses key concepts of data modeling including data abstraction and database models. It describes different types of data models such as object-based models, physical models, and record-based models. Specifically, it covers hierarchical, network, and relational database models. The document emphasizes that data modeling is important as it provides a blueprint for building a database and ensures all required data is represented accurately.

DATA MODELING

BY MR. D SINYANGWE

Zambia ICT College


• One fundamental characteristic of the
database approach is data abstraction.
• Data abstraction refers to the suppression of
details of data organization and storage, and
the highlighting of the essential features for
an improved understanding of data.
• The database approach supports data abstraction
so that different users can perceive data at their
preferred level of detail.
DATABASE MODEL
• A database model, or data model, is an
integrated collection of concepts for
describing and manipulating data,
relationships between data, and constraints on
the data.
• A data model, a collection of concepts that can
be used to describe the structure of a
database, provides the necessary means to
achieve this abstraction.
• A model is a representation of real-world
objects and events and their associations.
• It should provide the basic concepts and
notations that will allow database designers
and end-users to communicate
unambiguously and give them an accurate
understanding of the organizational data.
• A data model can be thought of as comprising
three components:
• A structural part, consisting of a set of rules
according to which databases can be
constructed;
• A manipulative part, defining the types of
operations that are allowed on the data;
• A set of integrity constraints, which ensures
that data is accurate.
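The three components can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The student table, its columns, and the age rule are assumptions made up for this example; they are not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structural part: rules for how the data is organized (hypothetical table).
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER CHECK (age >= 16)  -- an integrity constraint
    )
    """
)

# Manipulative part: the operations that are allowed on the data.
conn.execute("INSERT INTO student (student_id, name, age) VALUES (1, 'Mary', 20)")
rows = conn.execute("SELECT name, age FROM student").fetchall()

# Integrity constraints: data that breaks the rules is rejected.
try:
    conn.execute("INSERT INTO student (student_id, name, age) VALUES (2, 'John', 12)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here the CREATE TABLE statement plays the structural role, INSERT and SELECT the manipulative role, and the CHECK clause the integrity role.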
Why is Data Modeling Important?
• Data modeling is probably the most
labor-intensive and time-consuming part of the
development process. Why bother, especially
if you are pressed for time?
• It is a blueprint
• The goal of the data model is to make sure
that all data objects required by the
database are completely and accurately
represented. Because the data model uses
easily understood notations and natural
language, it can be reviewed and verified as
correct by the end-users.
• The data model is also detailed enough for
the database developers to use as a
"blueprint" for building the physical database.
• Therefore, a data model is a plan for building a
database. To be effective, it must be simple
enough to communicate to the end user the
data structure required by the database, yet
detailed enough for the database designer to
use to create the physical structure.
Classification of Data models
• Object Based Data Models
• Physical Data Models
• Record Based Data Models
Object Based Data Models
• OOP is based on encapsulation and
inheritance.
• A program in general consists of data and
code that operates on the data.
• Object-oriented programming encapsulates in
an object some data and the functions that
operate on that data.
• Class: An entity that has a well-defined role in
the application domain, as well as state,
behavior, and identity
• Object: a particular instance of a class
• Extensibility refers to the ability to extend an
existing system without introducing changes
to it.
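The ideas of class, object, encapsulation, inheritance, and extensibility can be sketched in a few lines of Python. The Account and SavingsAccount classes and the 5 percent rate are hypothetical names and values chosen only for illustration.

```python
class Account:
    """A class: it has state (a balance), behavior (deposit), and identity."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance  # data kept behind methods (encapsulation)

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance


class SavingsAccount(Account):
    """Inherits data and behavior from Account and adds to it."""

    def __init__(self, owner, balance=0, rate_percent=5):
        super().__init__(owner, balance)
        self.rate_percent = rate_percent

    def add_interest(self):
        # Integer arithmetic keeps the example free of rounding issues.
        self.deposit(self._balance * self.rate_percent // 100)


acct = SavingsAccount("Mary", 100)  # an object: one instance of a class
acct.deposit(100)
acct.add_interest()
```

SavingsAccount extends the system without any change to Account, which is the sense of extensibility described above.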
Benefits of object-oriented model
• The programmer need only understand OO
concepts as opposed to the combination of
OO concepts and relational database storage.
• Objects can inherit property settings from
other objects.
• Much of the application program process is
automated.
• It is theoretically easier to manage objects.
• OO data model is more compatible with OO
programming tools.
Drawbacks of object-oriented model
• Users must learn OO concepts because the OO
database does not work with traditional
programming methods.
• Standards have not been completely
established for the evolving database model.
• Stability is a concern because OO databases
have not been around for long.
Physical Data Models
• Physical data models describe how data is
stored as files in the computer by representing
information such as record formats, record
orderings, and access paths. An access path is
a structure that makes the search for
particular database records efficient.
Record Based Data Models
• In a record based model, the database consists
of a number of fixed format records. Each
record type defines a fixed number of fields,
typically of a fixed length.
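A fixed format record can be sketched with Python's struct module. The record layout below (a 4-byte integer id, a 20-byte name, and a 10-byte city) is an assumption made for this example.

```python
import struct

# Hypothetical record type: id (4-byte int), name (20 bytes), city (10 bytes).
RECORD = struct.Struct("<i20s10s")


def pack_record(rec_id, name, city):
    # Short strings are padded with zero bytes up to the fixed field length.
    return RECORD.pack(rec_id, name.encode("ascii"), city.encode("ascii"))


def unpack_record(raw):
    rec_id, name, city = RECORD.unpack(raw)
    return (
        rec_id,
        name.rstrip(b"\x00").decode("ascii"),
        city.rstrip(b"\x00").decode("ascii"),
    )


data = pack_record(1, "Mary", "Ndola") + pack_record(2, "John", "Lusaka")

# Every record has the same size, so record i starts at byte i * RECORD.size.
second = unpack_record(data[1 * RECORD.size : 2 * RECORD.size])
```

Because the field count and field lengths are fixed, the position of any record can be computed directly from its index.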
Types of record based models
• Flat-file database model
• Hierarchical database model
• Network database model
• Relational database model
Flat-file database model
• Before vendors started developing database
management systems that run on a computer,
many companies that were using computers
stored their data in flat files on a host
computer.
• A flat-file database consists of one or more
readable files, normally stored in a text
format. Information in these files is stored as
fields, the fields having either a constant
length or a variable length separated by some
character (delimiter).
• The main problem with a flat-file database
system is that not only do you have to
understand the structure of the files, but also
you must know exactly where data is
physically stored.
• Managing data relationships is very difficult.
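As a rough illustration, a delimited flat file might be read as follows. The file contents, the "|" delimiter, and the field positions are assumptions made for this sketch.

```python
import csv
import io

# Hypothetical flat file: one line per record, fields separated by "|".
flat_file = io.StringIO(
    "B001|Mary|Ndola\n"
    "B002|John|Lusaka\n"
    "B003|Mary|Ndola\n"
)

# The program has to know where each field sits within a line.
BRANCH_NO, MANAGER, CITY = 0, 1, 2

records = list(csv.reader(flat_file, delimiter="|"))
ndola_branches = [rec[BRANCH_NO] for rec in records if rec[CITY] == "Ndola"]
```

Note that the manager name and city are repeated on several lines, and that the meaning of each position lives only in the program, not in the file.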
Drawbacks of a flat-file database
• Flat files do not promote a structure in which
data can easily be related.
• It is difficult to manage data effectively and to
ensure accuracy.
• It is usually necessary to store redundant data,
which causes more work to accurately maintain
the data.
• The physical location of the data field within the
file must be known.
• A program must be developed to manage the
data.
The Hierarchical Model
• Hierarchical data is a collection of data where
each item has a single parent and zero or
more children (with the exception of the root
item, which has no parent).
• Each child table has a single parent table, and
each parent table can have multiple child
tables.
• Child tables are completely dependent on
parent tables; therefore, a child table can exist
only if its parent table does.
• The result of this structure is that the
hierarchical database model supports one-to-
one or one-to-many relationships.
• In a hierarchical data model, data is organized
into a tree-like structure. The tree structure
allows repeating information using
parent/child relationships.
One of the problems with this hierarchical layout
is that redundant book title information would
have to be stored in the Inventory table because
there is no direct association between Authors
and Bookstores.
• A node that has no children is called a leaf.
Nodes that are children of the same parent
are called siblings. For any node, there is a
single path, called the hierarchical path, from
the root to that node.
• In this model, each node or segment contains
one or more fields or data items that
represent attributes describing an entity.
• There are three types of fields in the
segments:
• A sequence field (or key field) is a field that
identifies and provides access to segments in
a database. In a root segment, the sequence
field also uniquely identifies the record. In
dependent segments, the sequence field can
provide unique identification, but this is not
required.
• A search field is used to search through the
database for particular values.
• An undefined field. All fields other than
sequence fields and search fields do not have
to be defined.
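The tree terminology above (root, leaf, siblings, hierarchical path) can be sketched with a small Python class. The Segment class and the segment names are hypothetical and only loosely follow the book example mentioned earlier.

```python
class Segment:
    """A node (segment) with a single parent and zero or more children."""

    def __init__(self, key, parent=None):
        self.key = key  # plays the role of the sequence (key) field
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_leaf(self):
        return not self.children

    def siblings(self):
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]

    def hierarchical_path(self):
        # Walk up to the root, then reverse to get the root-to-node path.
        node, path = self, []
        while node is not None:
            path.append(node.key)
            node = node.parent
        return list(reversed(path))


root = Segment("Publisher")
authors = Segment("Authors", root)
bookstores = Segment("Bookstores", root)
titles = Segment("Titles", authors)
```

Each segment stores exactly one parent reference, which is why there is a single path from the root to any node.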
Advantages of the Hierarchical Model
• Simplicity
• Security
• Database Integrity
• Efficiency
Disadvantages of the Hierarchical Model
• Complexity of Implementation
• Difficulty in Management
• Complexity of Programming
• Poor Portability
• Operational Anomalies
Network Database Model
• Improvements were made to the hierarchical
database model in order to derive the network
model. As in the hierarchical model, tables are
related to one another.
• One of the main advantages of the network
model is the capability of parent tables to share
relationships with child tables. This means that a
child table can have multiple parent tables.
• Additionally, a user can access data by starting
with any table in the structure, navigating either
up or down in the tree. The user is not required
to access a root table first to get to child tables.
• A record is in many respects similar to an
entity in the E-R model.
• Each record is a collection of fields
(attributes), each of which contains only one
data value.
• A link is an association between precisely two
records. Thus, a link can be viewed as a
restricted (binary) form of relationship in the
sense of the E-R model.
The Network Database Model was created for
three main purposes. These purposes include:
• representing a complex data relationship
more effectively
• improving database performance
• imposing a database standard.
Key terms related to the network data model
• A node represents an object of interest.
• A link represents a relationship between two nodes. A
link may be directed (that is, have a direction) or
undirected (that is, not have a direction).
• A path is an alternating sequence of nodes and links,
beginning and ending with nodes, and usually with no
nodes or links appearing more than once.
(Repeating nodes and links within a path are
permitted, but are rare in most network applications.)
• A network is a set of nodes and links. A network is
directed if the links that it contains are directed, and a
network is undirected if the links that it contains are
undirected.
• A logical network contains connectivity
information but no geometric information. A
logical network can be treated as a directed
graph or undirected graph, depending on the
application.
• Cost is a non-negative numeric attribute that
can be associated with links or nodes for
computing the minimum cost path, which is
the path that has the minimum total cost from
a start node to an end node
• Reachable nodes are all nodes that can be
reached from a given node. Reaching nodes
are all nodes that can reach a given node.
• The degree of a node is the number of links to
(that is, incident upon) the node.
• Network constraints are restrictions defined
on network analysis computations.
• A spanning tree of a connected graph is a tree
(that is, a graph with no cycles) that connects
all nodes of the graph.
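Several of these terms can be made concrete with a short sketch. The four-node directed network and its costs are invented for this example, and Dijkstra's algorithm is used here as one common way to find a minimum cost path when costs are non-negative; the slides do not name a specific algorithm.

```python
import heapq

# Hypothetical directed network: links[node] = list of (neighbor, cost).
links = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}


def reachable(start):
    """All nodes that can be reached from start by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt, _ in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def min_cost_path(start, end):
    """Dijkstra's algorithm; assumes every cost is non-negative."""
    queue = [(0, start, [start])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == end:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, link_cost in links[node]:
            if nxt not in done:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return None


def degree(node):
    """Number of links incident upon the node (incoming plus outgoing)."""
    outgoing = len(links[node])
    incoming = sum(1 for src in links for nxt, _ in links[src] if nxt == node)
    return outgoing + incoming
```

In this network the cheapest route from A to D goes through C and B, even though the direct link from A to B exists.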
Link Restriction
• Only one-to-many links can be used. Many-to-many
links are disallowed to simplify the
implementation.
• A binary many-to-many link is resolved using
many-to-one links: create a new record type
Rlink (referred to as a dummy record type) and
link it to each of the two original record types.
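The dummy record idea can be sketched as follows. The Student and Course record types and their values are assumptions for illustration; only the name Rlink comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    name: str


@dataclass(frozen=True)
class Course:
    title: str


@dataclass(frozen=True)
class Rlink:
    """Dummy record: each instance points to exactly one Student and one Course."""
    student: Student
    course: Course


mary, john = Student("Mary"), Student("John")
databases, networks = Course("Databases"), Course("Networks")

# The many-to-many link is replaced by many Rlink records.
rlinks = [Rlink(mary, databases), Rlink(mary, networks), Rlink(john, databases)]


def courses_of(student):
    return [r.course.title for r in rlinks if r.student == student]


def students_of(course):
    return [r.student.name for r in rlinks if r.course == course]
```

Each Rlink takes part in two many-to-one links, one to a student and one to a course, so no direct many-to-many link is needed.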
Benefits of network database model
• Data is accessed very quickly.
• Users can access data starting with any table.
• It is easier to model more complex databases.
• It is easier to develop complex queries to
retrieve data.
Drawbacks of network database model
• The structure of the database is not easily
modified.
• Changes to the database structure definitely
affect application programs that access the
database.
• The user has to understand the structure of
the database.
Relational Database Model
• The relational database model relaxes the
restrictions of a hierarchical structure without
completely abandoning the hierarchy of data.
• Another benefit of the relational database model
is that any tables can be linked together,
regardless of their hierarchical position.
• Any table can be accessed directly without having
to access all parent objects. It is based on the
concept of mathematical relations.
• Data and relationships are represented as tables,
each of which has a number of columns with
unique names. The primary unit of storage in a
database is a table, or group of related data.
• In the relational model, all data is logically structured
within relations (tables).
• It is based on the mathematical concept of a relation, which
is physically represented as a table.
• Each relation has a name and is made up of named
attributes (columns) of data. Each tuple (row) contains one
value per attribute.
• Three different types of table relationships are allowed:
one-to-one, one-to-many, and many-to-many. Different
relationships should be allowed to exist between tables in a
database.
• Table relationships are defined by referential integrity,
which is enforced through primary key and foreign key
constraints. Referential integrity is the use of these
constraints to validate data entered into a table and manage
the relationship between parent and child tables.
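A minimal sketch of referential integrity, using Python's built-in sqlite3 module. The branch and staff tables and their rows are assumptions made for this example. SQLite only enforces foreign keys when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent table with a primary key.
conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT NOT NULL)")

# Child table whose foreign key refers to the parent (one-to-many).
conn.execute(
    """
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        branch_no TEXT NOT NULL REFERENCES branch (branch_no)
    )
    """
)

conn.execute("INSERT INTO branch VALUES ('B005', 'Ndola')")
conn.execute("INSERT INTO staff VALUES ('S01', 'Mary', 'B005')")  # parent row exists

try:
    conn.execute("INSERT INTO staff VALUES ('S02', 'John', 'B999')")  # no such branch
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Tables are linked at query time by joining on the key columns.
rows = conn.execute(
    "SELECT s.name, b.city FROM staff AS s JOIN branch AS b ON s.branch_no = b.branch_no"
).fetchall()
```

The child row that refers to a missing parent is refused, so the staff table can never point at a branch that does not exist.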
Benefits of the relational model are as follows:
• Data is accessed very quickly.
• The database structure is easy to change.
• The data is represented logically, therefore users
need not understand how the data is stored.
• It is easy to develop complex queries to retrieve
data.
• It is easy to implement data integrity.
• Data is generally more accurate.
• It is easy to develop and modify application
programs.
• A standard language (SQL) has been developed.
Drawbacks of the relational database model are
as follows:
• Different groups of information, or tables,
must be joined in many cases to retrieve data.
• Users must be familiar with the relationships
between tables.
• Users must learn SQL.
Terminology for relational data model
• Relational database is a collection of normalized relations
with distinct relation names.
• Relation is a table with columns and rows. An RDBMS
requires only that the database be perceived by the user as
tables. However, this perception applies only to the logical
structure of the database.
• Table is a physical representation of a relation. A table is
the primary object used to store data in a relational database.
When data is queried and accessed for modification, it is
usually found in a table. A table is defined by columns. One
occurrence of all columns in a table is called a row of data.
• Attribute is a named column of a relation.
Relations are used to hold information about
objects to be represented in the database.
• Domain is the set of allowable values for one or
more attributes. Every attribute in a relation is
defined on a domain. The domain concept is
important, because it allows the user to define in
a central place the meaning and the source of
values that attributes can hold.
• The degree of a relation is the number of
attributes it contains. A relation with only one
attribute would have degree one and be called
a unary relation or one-tuple.
• The cardinality of a relation is the number of
tuples it contains.
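Attribute, domain, degree, and cardinality can be illustrated with plain Python data. Apart from the first address, which appears later in these notes, the rows and the enumerated city domain are invented for this sketch.

```python
# Hypothetical Branch relation: attribute names plus a set of tuples.
attributes = ("branchNo", "street", "city")

# A domain is the set of allowable values; only the city domain is listed here.
city_domain = {"Ndola", "Lusaka", "Kitwe"}

tuples = {
    ("B005", "22 Dorcus House", "Ndola"),
    ("B007", "16 Cairo Road", "Lusaka"),
    ("B003", "8 Freedom Way", "Kitwe"),
}

degree = len(attributes)    # number of attributes (columns)
cardinality = len(tuples)   # number of tuples (rows)

# Every value of the city attribute must come from its domain.
city_index = attributes.index("city")
in_domain = all(t[city_index] in city_domain for t in tuples)
```

Adding or deleting a row changes the cardinality, while the degree changes only if the structure of the relation itself is altered.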
FORMAL TERMS    ALTERNATIVE 1    ALTERNATIVE 2
Relation        Table            File
Tuple           Row              Record
Attribute       Column           Field

• Constraint: A constraint is an object used to place rules on
data. Constraints are used to control the allowed data in a
column. Constraints can be created at the column level or the
table level and are also used to enforce referential integrity
(parent and child table relationships).
• Index: An index is an object that is used to speed the
process of data retrieval on a table.
• Trigger: A trigger is a stored unit of programming code in
the database that is fired based on an event that occurs in
the database. When a trigger is fired, data might be
modified based on other data that is accessed or modified.
Triggers are useful for maintaining redundant data.
• Procedure: A procedure is a program that is stored in the
database. A procedure is executed at the database level.
Procedures are typically used to manage data and for batch
processing.
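An index and a trigger can be sketched with Python's built-in sqlite3 module. The tables, the staff_count column, and the object names are assumptions made for this example. SQLite has no stored procedures, so that object is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE branch (
        branch_no   TEXT PRIMARY KEY,
        staff_count INTEGER NOT NULL DEFAULT 0   -- redundant, derived value
    );
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        branch_no TEXT NOT NULL
    );

    -- Index: speeds up retrieval of staff rows by branch.
    CREATE INDEX idx_staff_branch ON staff (branch_no);

    -- Trigger: fired by an insert event, keeps the redundant count current.
    CREATE TRIGGER staff_added AFTER INSERT ON staff
    BEGIN
        UPDATE branch SET staff_count = staff_count + 1
        WHERE branch_no = NEW.branch_no;
    END;

    INSERT INTO branch (branch_no) VALUES ('B005');
    INSERT INTO staff VALUES ('S01', 'Mary', 'B005');
    INSERT INTO staff VALUES ('S02', 'John', 'B005');
    """
)

staff_count = conn.execute(
    "SELECT staff_count FROM branch WHERE branch_no = 'B005'"
).fetchone()[0]
```

The application never updates staff_count itself; the trigger maintains the redundant value whenever a staff row is inserted.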
Mathematical Relations
• Suppose we have two sets, D1 and D2, where D1 = {2, 4} and
D2 = {1, 3, 5}.
• The Cartesian product of these two sets, written as D1 × D2, is
the set of all ordered pairs such that the first element is a
member of D1 and the second element is a member of D2.
D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}
• Any subset of this Cartesian product is a relation. For example, we
could produce a relation R consisting of the ordered pairs in which the
second element is 1.
R = {(2, 1), (4, 1)}
• This could be written as
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}
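The same sets can be built in Python as a quick check. This is only an illustrative sketch; the variable names mirror the symbols used above.

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product: every ordered pair (x, y) with x in D1 and y in D2.
cartesian = set(product(D1, D2))

# A relation is any subset; here, the pairs whose second element is 1.
R = {(x, y) for (x, y) in cartesian if y == 1}
```

The set comprehension reads almost the same as the set-builder description of R: take pairs from D1 × D2 and keep those that satisfy the condition.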
Database Relations
• A named relation defined by a set of
attribute and domain name pairs is called a
relation schema.
• Let A1, A2, . . . , An be attributes with domains
D1, D2, . . . , Dn.
• Then the set {A1:D1, A2:D2, . . . , An:Dn} is a
relation schema.
• A relation R defined by a relation schema S is a
set of mappings from the attribute names to
their corresponding domains.
• Thus, relation R is a set of n-tuples:
R = {(A1:d1, A2:d2, . . . , An:dn) | d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn}
• When we write out a relation as a table, we
list the attribute names as column headings
and write out the tuples as rows having the
form (d1, d2, . . . , dn), where each value is
taken from the appropriate domain.
• We can think of a relation in the relational
model as any subset of the Cartesian product
of the domains of the attributes.
• A table is simply a physical representation of
such a relation.
• One of the four-tuples is:
{(B005, 22 Dorcus House, Ndola, NDL 02)}
or, more correctly:
{(branchNo: B005, street: 22 Dorcus House,
city: Ndola, postcode: NDL 02)}
• This is referred to as a relation instance. The branch
table is a convenient way of writing out all the four-
tuples that form the relation at a specific moment in
time.
• It is important to distinguish between the description
of the database and the database itself.
• The data in the database at a particular moment in
time is called a database state or snapshot. It is also
called the current set of occurrences or instances in
the database. This may change every now and then.
• The description of a database is called the database
schema, which is specified during database design and
is not expected to change frequently.
Properties of Relations
• The relation has a name that is distinct from all other
relation names in the relational schema;
• Each cell of the relation contains exactly one
atomic (single) value;
• Each attribute has a distinct name;
• Each tuple is distinct; there are no duplicate tuples;
• The order of attributes has no significance;
• The order of tuples has no significance,
theoretically. (However, in practice, the order
may affect the efficiency of accessing tuples.)
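Some of these properties follow directly if a relation is modeled as a set of attribute-to-value mappings. The sketch below assumes that representation; the make_tuple helper and the second branch row are hypothetical.

```python
def make_tuple(**values):
    """A tuple as a set of (attribute, value) pairs, so attribute order is irrelevant."""
    return frozenset(values.items())


branch = {
    make_tuple(branchNo="B005", street="22 Dorcus House", city="Ndola", postcode="NDL 02"),
    # Same tuple with attributes listed in a different order: it is a duplicate.
    make_tuple(city="Ndola", postcode="NDL 02", branchNo="B005", street="22 Dorcus House"),
    make_tuple(branchNo="B007", street="16 Cairo Road", city="Lusaka", postcode="LSK 01"),
}

same_tuple = make_tuple(a=1, b=2) == make_tuple(b=2, a=1)
```

Because the relation is a set, the duplicate collapses into one tuple and there is no first or last row.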
