This document provides an overview of data modeling and different data models. It discusses key concepts of data modeling including data abstraction and database models. It describes different types of data models such as object-based models, physical models, and record-based models. Specifically, it covers hierarchical, network, and relational database models. The document emphasizes that data modeling is important as it provides a blueprint for building a database and ensures all required data is represented accurately.
• One fundamental characteristic of the database approach is data abstraction.
• Data abstraction refers to the suppression of details of data organization and storage, and the highlighting of the essential features for an improved understanding of data.
• The database approach supports data abstraction so that different users can perceive data at their preferred level of detail.

DATABASE MODEL
• A database model, or data model, is an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data.
• A data model, a collection of concepts that can be used to describe the structure of a database, provides the necessary means to achieve this abstraction.
• A model is a representation of real-world objects and events and their associations.
• It should provide the basic concepts and notations that will allow database designers and end-users to communicate unambiguously and give them an accurate understanding of the organizational data.
• A data model can be thought of as comprising three components:
  • A structural part, consisting of a set of rules according to which databases can be constructed;
  • A manipulative part, defining the types of operations that are allowed on the data;
  • A set of integrity constraints, which ensures that the data is accurate.

Why is Data Modeling Important?
• Data modeling is probably the most labor-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time?
• It is a blueprint.
• The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end-users.
• The data model is also detailed enough for the database developers to use as a "blueprint" for building the physical database.
• Therefore, a data model is a plan for building a database.
To be effective, it must be simple enough to communicate to the end user the data structure required by the database, yet detailed enough for the database designer to use to create the physical structure.

Classification of Data Models
• Object Based Data Models
• Physical Data Models
• Record Based Data Models

Object Based Data Models
• Object-oriented programming (OOP) is based on encapsulation and inheritance.
• A program in general consists of data and code that operates on the data.
• Object-oriented programming encapsulates in an object some data and the functions that operate on that data.
• Class: An entity that has a well-defined role in the application domain, as well as state, behavior, and identity.
• Object: A particular instance of a class.
• Extensibility refers to the ability to extend an existing system without introducing changes to it.

Benefits of object-oriented model
• The programmer need only understand OO concepts, as opposed to the combination of OO concepts and relational database storage.
• Objects can inherit property settings from other objects.
• Much of the application program process is automated.
• It is theoretically easier to manage objects.
• The OO data model is more compatible with OO programming tools.

Drawbacks of object-oriented model
• Users must learn OO concepts because the OO database does not work with traditional programming methods.
• Standards have not been completely established for the evolving database model.
• Stability is a concern because OO databases have not been around for long.

Physical Data Models
• Physical data models describe how data is stored as files in the computer by representing information such as record formats, record orderings, and access paths. An access path is a structure that makes the search for particular database records efficient.

Record Based Data Models
• In a record based model, the database consists of a number of fixed-format records. Each record type defines a fixed number of fields, typically of a fixed length.

Types of record based models
• Flat-file database model
• Hierarchical database model
• Network database model
• Relational database model

Flat-file database model
• Before vendors started developing database management systems, many companies that were using computers stored their data in flat files on a host computer.
• A flat-file database consists of one or more readable files, normally stored in a text format. Information in these files is stored as fields, the fields having either a constant length or a variable length separated by some character (delimiter).
• The main problem with a flat-file database system is that not only do you have to understand the structure of the files, but you must also know exactly where data is physically stored.
• Managing data relationships is very difficult.

Drawbacks of a flat-file database
• Flat files do not promote a structure in which data can easily be related.
• It is difficult to manage data effectively and to ensure accuracy.
• It is usually necessary to store redundant data, which causes more work to accurately maintain the data.
• The physical location of the data field within the file must be known.
• A program must be developed to manage the data.

The Hierarchical Model
• Hierarchical data is a collection of data where each item has a single parent and zero or more children (with the exception of the root item, which has no parent).
• Each child table has a single parent table, and each parent table can have multiple child tables.
• Child tables are completely dependent on parent tables; therefore, a child table can exist only if its parent table does.
• The result of this structure is that the hierarchical database model supports one-to-one or one-to-many relationships.
• In a hierarchical data model, data is organized into a tree-like structure.
The tree structure allows repeating information using parent/child relationships. One of the problems with a hierarchical layout is that data may have to be stored redundantly: for example, redundant book title information would have to be stored in an Inventory table when there is no direct association between Authors and Bookstores.
• A node that has no children is called a leaf. Nodes that are children of the same parent are called siblings. For any node, there is a single path, called the hierarchical path, from the root to that node.
• In this model, each node or segment contains one or more fields or data items that represent attributes describing an entity.
• There are three types of fields in the segments:
  • A sequence field (or key field) is a field that identifies and provides access to segments in a database. In a root segment, the sequence field also uniquely identifies the record. In dependent segments, the sequence field can provide unique identification, but this is not required.
  • A search field is used to search through the database for particular values.
  • An undefined field is any other field. Fields other than sequence fields and search fields do not have to be defined.

Advantages of the Hierarchical Model
• Simplicity
• Security
• Database Integrity
• Efficiency

Disadvantages of the Hierarchical Model
• Complexity of Implementation
• Difficulty in Management
• Complexity of Programming
• Poor Portability
• Operational Anomalies

Network Database Model
• Improvements were made to the hierarchical database model in order to derive the network model. As in the hierarchical model, tables are related to one another.
• One of the main advantages of the network model is the capability of parent tables to share relationships with child tables. This means that a child table can have multiple parent tables.
• Additionally, a user can access data by starting with any table in the structure, navigating either up or down in the structure. The user is not required to access a root table first to get to child tables.
• A record is in many respects similar to an entity in the E-R model.
• Each record is a collection of fields (attributes), each of which contains only one data value.
• A link is an association between precisely two records. Thus, a link can be viewed as a restricted (binary) form of relationship in the sense of the E-R model.

The network database model was created for three main purposes:
• representing a complex data relationship more effectively;
• improving database performance;
• imposing a database standard.

Key terms related to the network data model
• A node represents an object of interest.
• A link represents a relationship between two nodes. A link may be directed (that is, have a direction) or undirected (that is, not have a direction).
• A path is an alternating sequence of nodes and links, beginning and ending with nodes, and usually with no nodes or links appearing more than once. (Repeating nodes and links within a path are permitted, but are rare in most network applications.)
• A network is a set of nodes and links. A network is directed if the links that it contains are directed, and a network is undirected if the links that it contains are undirected.
• A logical network contains connectivity information but no geometric information. A logical network can be treated as a directed graph or undirected graph, depending on the application.
• Cost is a non-negative numeric attribute that can be associated with links or nodes for computing the minimum cost path, which is the path that has the minimum total cost from a start node to an end node.
• Reachable nodes are all nodes that can be reached from a given node. Reaching nodes are all nodes that can reach a given node.
• The degree of a node is the number of links to (that is, incident upon) the node.
• Network constraints are restrictions defined on network analysis computations.
• A spanning tree of a connected graph is a tree (that is, a graph with no cycles) that connects all nodes of the graph.

Link Restriction
• Only one-to-many links can be used. Many-to-many links are disallowed to simplify the implementation.
• A binary many-to-many link is resolved using many-to-one links by creating a new record type, Rlink (referred to as a dummy record type), that is connected to each of the two original record types by a many-to-one link.

Benefits of network database model
• Data is accessed very quickly.
• Users can access data starting with any table.
• It is easier to model more complex databases.
• It is easier to develop complex queries to retrieve data.

Drawbacks of network database model
• The structure of the database is not easily modified.
• Changes to the database structure definitely affect application programs that access the database.
• The user has to understand the structure of the database.

Relational Database Model
• The relational database model relaxes the restrictions of a hierarchical structure without completely abandoning the hierarchy of data.
• Another benefit of the relational database model is that any tables can be linked together, regardless of their hierarchical position.
• Any table can be accessed directly without having to access all parent objects.
• Data and relationships are represented as tables, each of which has a number of columns, each with a unique name. The primary unit of storage in a database is a table, or group of related data.
• In the relational model, all data is logically structured within relations (tables).
• It is based on the mathematical concept of a relation, which is physically represented as a table.
• Each relation has a name and is made up of named attributes (columns) of data. Each tuple (row) contains one value per attribute.
• Three different types of table relationships are allowed: one-to-one, one-to-many, and many-to-many. Different relationships should be allowed to exist between tables in a database.
• Table relationships are defined by referential integrity, which relies on the use of primary key and foreign key constraints.
Referential integrity is the use of these constraints to validate data entered into a table and manage the relationship between parent and child tables.

Benefits of the relational model are as follows:
• Data is accessed very quickly.
• The database structure is easy to change.
• The data is represented logically; therefore, users need not understand how the data is stored.
• It is easy to develop complex queries to retrieve data.
• It is easy to implement data integrity.
• Data is generally more accurate.
• It is easy to develop and modify application programs.
• A standard language (SQL) has been developed.

Drawbacks of the relational database model are as follows:
• Different groups of information, or tables, must be joined in many cases to retrieve data.
• Users must be familiar with the relationships between tables.
• Users must learn SQL.

Terminology for relational data model
• Relational database is a collection of normalized relations with distinct relation names.
• Relation is a table with columns and rows. An RDBMS requires only that the database be perceived by the user as tables. However, this perception applies only to the logical structure of the database.
• Table is a physical representation of a relation. A table is the primary object used to store data in a relational database. When data is queried and accessed for modification, it is usually found in a table. A table is defined by columns. One occurrence of all columns in a table is called a row of data.
• Attribute is a named column of a relation. Relations are used to hold information about objects to be represented in the database.
• Domain is the set of allowable values for one or more attributes. Every attribute in a relation is defined on a domain. The domain concept is important because it allows the user to define in a central place the meaning and the source of values that attributes can hold.
• The degree of a relation is the number of attributes it contains.
A relation with only one attribute would have degree one and be called a unary relation or one-tuple.
• The cardinality of a relation is the number of tuples it contains.

FORMAL TERMS    ALTERNATIVE 1    ALTERNATIVE 2
Relation        Table            File
Tuple           Row              Record
Attribute       Column           Field
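
The terms relation, tuple, attribute, degree, and cardinality defined above can be illustrated with a minimal Python sketch. The `staff` relation, its attribute names, and its rows are hypothetical and invented purely for illustration; the helper names `degree`, `cardinality`, and `as_records` are likewise assumptions, not part of any database product.

```python
# Hypothetical relation: the attribute names and rows below are invented
# purely for illustration and do not come from the text.
attributes = ("staffNo", "name", "position")

# A relation is modeled as a set of tuples: duplicates collapse and
# the order of rows carries no meaning.
staff = {
    ("S01", "Mary", "Manager"),
    ("S02", "John", "Assistant"),
    ("S03", "Ruth", "Assistant"),
}


def degree(attrs):
    """Degree: the number of attributes (columns) of the relation."""
    return len(attrs)


def cardinality(rows):
    """Cardinality: the number of tuples (rows) in the relation."""
    return len(rows)


def as_records(attrs, rows):
    """View each tuple as a record mapping field names to values.

    Rows are sorted only to make the output deterministic.
    """
    return [dict(zip(attrs, row)) for row in sorted(rows)]


print(degree(attributes))
print(cardinality(staff))
print(as_records(attributes, staff)[0])
```

The same data can be read in all three vocabularies of the table: as a relation with tuples and attributes, as a table with rows and columns, or as a file with records and fields.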
• Constraint: A constraint is an object used to place rules on data. Constraints are used to control the allowed data in a column. Constraints are created at the column level and are also used to enforce referential integrity (parent and child table relationships).
• Index: An index is an object that is used to speed the process of data retrieval on a table.
• Trigger: A trigger is a stored unit of programming code in the database that is fired based on an event that occurs in the database. When a trigger is fired, data might be modified based on other data that is accessed or modified. Triggers are useful for maintaining redundant data.
• Procedure: A procedure is a program that is stored in the database. A procedure is executed at the database level. Procedures are typically used to manage data and for batch processing.

Mathematical Relations
• Suppose we have two sets, D1 and D2, where D1 = {2, 4} and D2 = {1, 3, 5}.
• The Cartesian product of these two sets, written as D1 × D2, is the set of all ordered pairs such that the first element is a member of D1 and the second element is a member of D2.
  D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}
• Any subset of this Cartesian product is a relation. For example, we could produce a relation R consisting of the ordered pairs in which the second element is 1.
  R = {(2, 1), (4, 1)}
• This could be written as
  R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}

Database Relations
• A relation schema is a named relation defined by a set of attribute and domain name pairs.
• Let A1, A2, . . . , An be attributes with domains D1, D2, . . . , Dn.
• Then the set {A1:D1, A2:D2, . . . , An:Dn} is a relation schema.
• A relation R defined by a relation schema S is a set of mappings from the attribute names to their corresponding domains.
• Thus, relation R is a set of n-tuples:
  {(A1:d1, A2:d2, . . . , An:dn) | d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn}
• When we write out a relation as a table, we list the attribute names as column headings and write out the tuples as rows having the form (d1, d2, . . . , dn), where each value is taken from the appropriate domain.
• We can think of a relation in the relational model as any subset of the Cartesian product of the domains of the attributes.
• A table is simply a physical representation of such a relation.
• For example, in a Branch relation with the attributes branchNo, street, city, and postcode, one of the four-tuples is:
  {(B005, 22 Dorcus House, Ndola, NDL 02)}
  Or more correctly:
  {(branchNo:B005, street:22 Dorcus House, city:Ndola, postcode:NDL 02)}
• This is referred to as a relation instance. The Branch table is a convenient way of writing out all the four-tuples that form the relation at a specific moment in time.
• It is important to distinguish between the description of the database and the database itself.
• The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database. The state may change from time to time.
• The description of a database is called the database schema, which is specified during database design and is not expected to change frequently.

Properties of Relations
• The relation has a name that is distinct from all other relation names in the relational schema;
• Each cell of the relation contains exactly one atomic (single) value;
• Each attribute has a distinct name;
• Each tuple is distinct; there are no duplicate tuples;
• The order of attributes has no significance;
• The order of tuples has no significance, theoretically. (However, in practice, the order may affect the efficiency of accessing tuples.)
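
The definition of a relation as a subset of a Cartesian product, and two of the properties listed above (no duplicate tuples, no significance to tuple order), can be checked with a short Python sketch. It reuses the sets D1 and D2 from the text; the helper name `is_relation` is an assumption introduced here for illustration.

```python
from itertools import product

# The two sets from the text.
D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product D1 x D2: every ordered pair (x, y) with x in D1, y in D2.
cartesian = set(product(D1, D2))

# A relation is any subset of the product; here, pairs whose second element is 1.
R = {(x, y) for (x, y) in cartesian if y == 1}


def is_relation(tuples, domains):
    """Return True if every tuple takes each value from the matching domain."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in tuples
    )


# Python sets discard duplicates and ignore order, mirroring the properties
# that tuples are distinct and that their order has no significance.
same_relation = {(2, 1), (4, 1)} == {(4, 1), (2, 1), (2, 1)}

print(sorted(cartesian))
print(sorted(R))
print(is_relation(R, [D1, D2]), same_relation)
```

Using a set of tuples is a deliberate modeling choice: it makes the "no duplicates" and "order is insignificant" properties hold by construction, whereas a list of rows would have to enforce them separately.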