Data Model
Problem domain - a clearly defined area within the real-world environment, with a well-defined scope and boundaries, that will be
systematically addressed.
Data modeling - is an iterative, progressive process. You start with a simple understanding of the problem domain, and as
your understanding increases, so does the level of detail of the data model.
Data model - is a relatively simple representation, usually graphical, of more complex real-world data
structures.
- is an abstraction of a more complex real-world object or event.
- A model’s main function is to help you understand the complexities of the real-world environment.
- A data model is effectively a “blueprint” with all the instructions to build a database that will
meet all end-user requirements.
- This blueprint is narrative and graphical in nature, meaning that it contains both text descriptions in plain,
unambiguous language and clear, useful diagrams depicting the main data elements.
- Data models facilitate interaction among the designer, the applications programmer, and the end user.
Data constitutes the most basic information employed by a system. Applications are created to manage data and to help transform data
into information, but data is viewed in different ways by different people.
Data Model Basic Building Blocks
An entity is a person, place, thing, or event about which data will be collected and stored. An entity
represents a particular type of object in the real world, which means an entity is “distinguishable”—that is,
each entity occurrence is unique and distinct.
(Ex: A CUSTOMER entity would have many distinguishable customer occurrences, such as John Smith, Pedro
Dinamita, and Tom Strickland. Entities may be physical objects, such as customers or products, but entities
may also be abstractions, such as flight routes or musical concerts.)
Data models use three types of relationships: one-to-many, many-to-many, and one-to-one.
Database designers usually use the shorthand notations 1:M or 1..*, M:N or *..*, and 1:1 or 1..1, respectively.
(Although the M:N notation is a standard label for the many-to-many relationship, the label M:M may also be
used.) The following examples illustrate the distinctions among the three relationships.
One-to-many (1:M or 1..*) relationship. A painter creates many different paintings, but each is painted by only
one painter. Thus, the painter (the “one”) is related to the paintings (the “many”).
Therefore, database designers label the relationship “PAINTER paints PAINTING” as 1:M.
Note that entity names are often capitalized as a convention, so they are easily identified.
Similarly, a customer (the “one”) may generate many invoices, but each invoice (the “many”) is generated by
only a single customer. The “CUSTOMER generates INVOICE” relationship would also be labeled 1:M.
Many-to-many (M:N or *..*) relationship. An employee may learn many job skills, and each job skill may be
learned by many employees. Database designers label the relationship “EMPLOYEE learns SKILL” as M:N.
Similarly, a student can take many classes and each class can be taken by many students, thus yielding the
M:N label for the relationship expressed by “STUDENT takes CLASS.”
One-to-one (1:1 or 1..1) relationship. A retail company’s management structure may require that each of its
stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a
single store.
Therefore, the relationship “EMPLOYEE manages STORE” is labeled 1:1.
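As a rough illustration of the three connectivities above, the Python sketch below classifies a relationship from sample occurrence pairs. The helper name classify_connectivity and the sample data are assumptions made for this example. In practice the connectivity comes from the business rules; sample data can only show whether it is consistent with them.

```python
from collections import defaultdict


def classify_connectivity(pairs):
    """Classify a relationship from (left, right) occurrence pairs.

    Hypothetical helper: it only reports what the sample data shows.
    """
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does any left occurrence relate to more than one right occurrence?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    # Does any right occurrence relate to more than one left occurrence?
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "M:N"
    if left_has_many:
        return "1:M"
    if right_has_many:
        return "M:1"
    return "1:1"


# Made-up sample occurrences for the three examples above.
paints = [("Painter A", "Painting 1"), ("Painter A", "Painting 2"),
          ("Painter B", "Painting 3")]
learns = [("Employee 1", "Skill X"), ("Employee 1", "Skill Y"),
          ("Employee 2", "Skill X")]
manages = [("Employee 1", "Store 10"), ("Employee 2", "Store 20")]

print(classify_connectivity(paints))   # 1:M
print(classify_connectivity(learns))   # M:N
print(classify_connectivity(manages))  # 1:1
```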
A constraint is a restriction placed on the data. Constraints are important because they help to ensure data
integrity. Constraints are normally expressed in the form of rules:
• An employee’s salary must have values that are between 6,000 and 350,000.
• A student’s GPA must be between 0.00 and 4.00.
• Each class must have one and only one teacher.
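The three rules above could be checked in application code roughly as follows. This is a minimal sketch: the function names are assumptions for the example, and a real DBMS would normally enforce such rules in the table definitions instead.

```python
def salary_is_valid(salary):
    """An employee's salary must be between 6,000 and 350,000."""
    return 6_000 <= salary <= 350_000


def gpa_is_valid(gpa):
    """A student's GPA must be between 0.00 and 4.00."""
    return 0.00 <= gpa <= 4.00


def class_has_one_teacher(teachers):
    """Each class must have one and only one teacher."""
    return len(teachers) == 1


print(salary_is_valid(5_000))                 # False: below the minimum
print(gpa_is_valid(3.25))                     # True
print(class_has_one_teacher(["T1", "T2"]))    # False: two teachers
```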
Business Rules
A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a
specific organization. In a sense, business rules are misnamed: they apply to any organization, large or
small—a business, a government unit, a religious group, or a research laboratory—that stores and uses data
to generate information.
Business rules derived from a detailed description of an organization’s operations help to create and enforce
actions within that organization’s environment.
Properly written business rules are used to define entities, attributes, relationships, and constraints.
Any time you see relationship statements such as “an agent can serve many customers, and each customer can be served by only one
agent,” business rules are at work.
Examples of business rules are as follows:
• A customer may generate many invoices.
• An invoice is generated by only one customer.
• A training session cannot be scheduled for fewer than 10 employees or for more than 30
employees.
Note that those business rules establish entities, relationships, and constraints.
For example, the first two business rules establish two entities (CUSTOMER and INVOICE) and a 1:M relationship between those two
entities.
The third business rule establishes a constraint (no fewer than 10 people and no more than 30 people) and two entities (EMPLOYEE and
TRAINING), and also implies a relationship between EMPLOYEE and TRAINING.
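One way to see how the three sample business rules turn into entities, a relationship, and a constraint is the small Python sketch below. The class, function, and constant names are assumptions made for this example.

```python
from dataclasses import dataclass, field

# Constraint taken from the third business rule.
MIN_TRAINEES = 10
MAX_TRAINEES = 30


@dataclass(eq=False)
class Customer:
    name: str
    # Rule 1: a customer may generate many invoices (the "many" side).
    invoices: list = field(default_factory=list)


@dataclass(eq=False)
class Invoice:
    number: int
    # Rule 2: an invoice is generated by only one customer (the "one" side).
    customer: Customer


def generate_invoice(customer, number):
    """Create an invoice and link it to its single customer."""
    invoice = Invoice(number=number, customer=customer)
    customer.invoices.append(invoice)
    return invoice


def can_schedule_training(employee_count):
    """Rule 3: a session needs at least 10 and at most 30 employees."""
    return MIN_TRAINEES <= employee_count <= MAX_TRAINEES


customer = Customer("John Smith")
first = generate_invoice(customer, 1001)
second = generate_invoice(customer, 1002)

print(len(customer.invoices))        # 2
print(first.customer is customer)    # True
print(can_schedule_training(9))      # False
```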
The main sources of business rules are company managers, policy makers, department managers, and
written documentation such as a company’s procedures, standards, and operations manuals.
A faster and more direct source of business rules is direct interviews with end users.
The process of identifying and documenting business rules is essential to database design for several
reasons:
• It helps to standardize the company’s view of data.
• It can be a communication tool between users and designers.
• It allows the designer to understand the nature, role, and scope of the data.
• It allows the designer to understand business processes.
• It allows the designer to develop appropriate relationship participation rules and constraints and to
create an accurate data model.
The Entity Relationship Model (ER or ERM)
The ER model is a data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with
the help of ER diagrams.
The entity relationship model (ERM) forms the basis of an entity relationship diagram (ERD).
Peter Chen first introduced the ER data model in 1976; the graphical representation of entities and their relationships in a database
structure quickly became popular because it complemented the relational data model concepts.
The relational data model and ERM combined to provide the foundation for tightly structured database design.
ER models are normally represented in an entity relationship diagram (ERD), which uses graphical representations to model database
components.
The ER model uses the term connectivity to label the relationship types.
The name of the relationship is usually an active or passive verb.
For example, a PAINTER paints many PAINTINGs,
an EMPLOYEE learns many SKILLs, and
an EMPLOYEE manages a STORE.
Crow’s Foot notation - A representation of the entity relationship diagram that uses a three-pronged
symbol to represent the “many” sides of the relationship.
Class diagram notation - The set of symbols used in the creation of class diagrams.
- Class diagrams are part of the Unified Modeling Language (UML).
Connectivity and Cardinality
Cardinality expresses the minimum and maximum number of entity occurrences associated with one
occurrence of the related entity.
In the ERD, cardinality is indicated by placing the appropriate numbers beside the entities, using the format
(x,y).
The first value represents the minimum number of associated entities, while the second value
represents the maximum number of associated entities.
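The (x,y) format can be read as a simple range check on the number of related occurrences. The sketch below is only an illustration: the function name, the sample data, and the (1,3) cardinality are assumptions, not values from these notes.

```python
def cardinality_violations(related, minimum, maximum):
    """Return occurrences whose related count falls outside (minimum, maximum)."""
    return {
        occurrence: len(items)
        for occurrence, items in related.items()
        if not (minimum <= len(items) <= maximum)
    }


# Hypothetical rule: each PAINTER is associated with (1,3) PAINTINGs.
painter_paintings = {
    "Painter A": ["Painting 1", "Painting 2"],
    "Painter B": [],
    "Painter C": ["Painting 3", "Painting 4", "Painting 5", "Painting 6"],
}

violations = cardinality_violations(painter_paintings, 1, 3)
print(violations)  # {'Painter B': 0, 'Painter C': 4}
```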
Developing an ER Diagram
The UML notation uses names on both sides of the relationship.
For example, to read the relationship between PAINTER and PAINTING, note the following:
• A PAINTER “paints” one to many PAINTINGs, as indicated by the 1..* symbol.
• A PAINTING is “painted by” one and only one PAINTER, as indicated by the 1..1 symbol.
The Object-Oriented Data Model (OODM)
Semantic data model - The first of a series of data models that models both data and their relationships in
a single structure known as an object.
Class - A collection of similar objects with shared structure (attributes) and behavior (methods).
- A class encapsulates an object’s data representation and a method’s implementation.
Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which
each class has only one parent.
Class hierarchy - The organization of classes in a hierarchical tree in which each parent class is a superclass and each child
class is a subclass.
For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class.
Method - In the object-oriented data model, a named set of instructions to perform an action.
- Methods represent real-world actions.
Object-oriented data models are typically depicted using Unified Modeling Language (UML) class diagrams.
UML is a language based on OO concepts that describes a set of diagrams and symbols you can use to
graphically model a system.
Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods of the
classes above it.
For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses from the class PERSON.
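The PERSON, CUSTOMER, and EMPLOYEE example maps directly onto classes in an object-oriented language. In the Python sketch below the attributes (name, credit_limit, salary) and the describe method are assumptions chosen for illustration.

```python
class Person:
    """Superclass: holds what all persons have in common."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        # A method defined once here is inherited by every subclass.
        return f"{type(self).__name__}: {self.name}"


class Customer(Person):
    """Subclass: inherits name and describe(), adds credit_limit."""

    def __init__(self, name, credit_limit):
        super().__init__(name)
        self.credit_limit = credit_limit


class Employee(Person):
    """Subclass: inherits name and describe(), adds salary."""

    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary


customer = Customer("John Smith", 1_000)
employee = Employee("Tom Strickland", 50_000)

print(customer.describe())  # Customer: John Smith
print(employee.describe())  # Employee: Tom Strickland
```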
• The object representation of the INVOICE includes all related objects within the same object box.
Note that the connectivities (1 and M) indicate the relationship of the related objects to the INVOICE.
For example, the “1” next to the CUSTOMER object indicates that each INVOICE is related to only one CUSTOMER. The “M” next to the LINE
object indicates that each INVOICE contains many LINEs.
• The UML class diagram uses three separate object classes (CUSTOMER, INVOICE, and LINE) and two relationships to
represent this simple invoicing problem.
Note that the relationship connectivities are represented by the 1..1, 0..*, and 1..* symbols, and that the relationships are named at both
ends to represent the different “roles” that the objects play in the relationship.
Indexes
An index is an ordered arrangement of keys and pointers. Each key points to the location of the data identified by the key.
Indexes play an important role in DBMSs for the implementation of primary keys.
When you define a table’s primary key, the DBMS automatically creates a unique
index on the primary key column(s) you declared.
An index key can have multiple pointer values associated with it. For example, in an index on PAINTER_NUM, painter number
123 points to three rows (1, 2, and 4) in the PAINTING table.
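Conceptually, such an index behaves like a lookup table from key values to row locations. The Python sketch below builds one for a made-up PAINTING table. Only painter 123 and its rows 1, 2, and 4 come from the example above; the other rows, the titles, and the build_index helper are assumptions.

```python
from collections import defaultdict

# Row number -> row contents (sample data, partly made up).
painting_rows = {
    1: {"PAINTER_NUM": 123, "TITLE": "Painting A"},
    2: {"PAINTER_NUM": 123, "TITLE": "Painting B"},
    3: {"PAINTER_NUM": 126, "TITLE": "Painting C"},
    4: {"PAINTER_NUM": 123, "TITLE": "Painting D"},
    5: {"PAINTER_NUM": 126, "TITLE": "Painting E"},
}


def build_index(rows, key_column):
    """Map each key value to the ordered list of row numbers holding it."""
    index = defaultdict(list)
    for row_number in sorted(rows):
        index[rows[row_number][key_column]].append(row_number)
    return dict(index)


painter_index = build_index(painting_rows, "PAINTER_NUM")
print(painter_index[123])  # [1, 2, 4]
print(painter_index[126])  # [3, 5]
```

A unique index, such as the one created for a primary key, would hold exactly one pointer per key.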
The Extended Entity Relationship Model (EERM)
The extended entity relationship model (EERM), sometimes referred to as the enhanced entity relationship
model, is the result of adding more semantic constructs to the original ER model.
EER diagram (EERD) - The entity relationship diagram resulting from the application of extended entity
relationship concepts that provide additional semantic content in the ER model.
entity supertype - In a generalization or specialization hierarchy, a generic entity type that contains the
common characteristics of entity subtypes.
entity subtype - In a generalization or specialization hierarchy, a subset of an entity supertype. The entity
supertype contains the common characteristics and the subtypes contain the unique characteristics of
each entity.
Two criteria help the designer determine when to use subtypes and supertypes:
• There must be different, identifiable kinds or types of the entity in the user’s
environment.
• The different kinds or types of instances should each have one or more attributes that
are unique to that kind or type of instance.
specialization hierarchy - A hierarchy based on the top-down process of identifying lower-level, more
specific entity subtypes from a higher-level entity supertype.
Specialization is based on grouping unique characteristics and relationships of the subtypes.
A subtype discriminator is the attribute in the supertype entity that determines to
which subtype the supertype occurrence is related.
Disjoint subtypes, also known as nonoverlapping subtypes, are subtypes that contain a unique subset of
the supertype entity set; in other words, each entity instance of the supertype can appear in only one of the
subtypes.
Overlapping subtypes are subtypes that contain nonunique subsets of the supertype entity set; that is, each
entity instance of the supertype may appear in more than one subtype.
Completeness constraint - specifies whether each entity supertype occurrence must also be a member of
at least one subtype. The completeness constraint can be partial or total.
Partial completeness means that not every supertype occurrence is a member of a subtype; some
supertype occurrences may not be members of any subtype.
Total completeness means that every supertype occurrence must be a member of at least one
subtype.
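The disjoint and completeness constraints can both be checked with simple set membership. The sketch below is a minimal illustration: the check_subtypes helper, the EMPLOYEE supertype with PILOT and MECHANIC subtypes, and the ID values are all assumptions made for the example.

```python
def check_subtypes(supertype_ids, subtypes, disjoint, total):
    """Return a list of constraint violations (empty if none)."""
    # Record which subtypes each supertype occurrence belongs to.
    membership = {}
    for subtype_name, ids in subtypes.items():
        for entity_id in ids:
            membership.setdefault(entity_id, []).append(subtype_name)

    problems = []
    if disjoint:
        # Disjoint: an occurrence may appear in only one subtype.
        for entity_id, names in sorted(membership.items()):
            if len(names) > 1:
                problems.append(f"{entity_id} appears in more than one subtype: {names}")
    if total:
        # Total completeness: every occurrence must be in at least one subtype.
        for entity_id in sorted(supertype_ids):
            if entity_id not in membership:
                problems.append(f"{entity_id} is not a member of any subtype")
    return problems


employee_ids = {1, 2, 3, 4}
subtypes = {"PILOT": {1, 2}, "MECHANIC": {2, 3}}

strict = check_subtypes(employee_ids, subtypes, disjoint=True, total=True)
relaxed = check_subtypes(employee_ids, subtypes, disjoint=False, total=False)
for problem in strict:
    print(problem)
```

With overlapping subtypes and partial completeness (the relaxed call), the same data produces no violations.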
Generalization - is the bottom-up process of identifying a higher-level, more generic entity supertype from
lower-level entity subtypes.
- Generalization is based on grouping the common characteristics and relationships of the subtypes.
For example, you might identify multiple types of musical instruments: piano, violin, and guitar.
Using the generalization approach, you could identify a “string instrument” entity supertype to hold the common characteristics of the
multiple subtypes.
Introduction to Structured Query Language (SQL)
SQL, which is pronounced S-Q-L or sequel, is composed of commands that enable users to create database
and table structures, perform various types of data manipulation and data administration, and query the
database to extract useful information.
All relational DBMS software supports SQL, and many software vendors have developed extensions to
the basic SQL command set.
SQL functions fit into several broad categories:
• Data manipulation language (DML). SQL includes commands to insert, update, delete, and retrieve data
within the database tables.
• Data definition language (DDL). SQL includes commands to create database objects such as tables,
indexes, and views, as well as commands to define access rights to those database objects.
• Transaction control language (TCL). The DML commands in SQL are executed within the context of a
transaction, which is a logical unit of work composed of one or more SQL statements, as defined by business
rules.
• Data control language (DCL). Data control commands are used to control access to data objects, such as
giving one user permission to only view the PRODUCT table and giving another user permission to change
the data in the PRODUCT table.
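The categories above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the column names (P_CODE, P_DESCRIPT, P_PRICE) and the sample row are assumptions, and DCL is left out because SQLite has no user accounts and does not support GRANT or REVOKE.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# DDL: create a table structure.
conn.execute(
    "CREATE TABLE PRODUCT ("
    " P_CODE TEXT PRIMARY KEY,"
    " P_DESCRIPT TEXT NOT NULL,"
    " P_PRICE REAL NOT NULL)"
)

# DML + TCL: insert a row, then make it permanent with a commit.
conn.execute("INSERT INTO PRODUCT VALUES (?, ?, ?)", ("P100", "Hammer", 9.95))
conn.commit()

# DML + TCL: change the row, then undo the change with a rollback.
conn.execute("UPDATE PRODUCT SET P_PRICE = 0 WHERE P_CODE = 'P100'")
conn.rollback()

# DML (query): the rolled-back update left the committed price in place.
price = conn.execute(
    "SELECT P_PRICE FROM PRODUCT WHERE P_CODE = 'P100'"
).fetchone()[0]
print(price)  # 9.95

conn.close()
```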
to be continued...