Data Modelling
Data modeling is the formalization and documentation of existing processes and events that
occur during application software design and development. Data modeling techniques and tools
capture and translate complex system designs into easily understood representations of the data
flows and processes, creating a blueprint for construction and/or re-engineering.
A data model can be thought of as a diagram or flowchart that illustrates the relationships
between data. Although capturing all the possible relationships in a data model can be very
time-intensive, it's an important step and shouldn't be rushed. Well-documented models allow
stakeholders to identify errors and make changes before any programming code has been written.
Data modelers often use multiple models to view the same data and ensure that all processes,
entities, relationships and data flows have been identified. There are several different approaches
to data modeling, including:
Enterprise Data Modeling - similar to conceptual data modeling, but addresses the unique
requirements of a specific business.
Logical Data Modeling - illustrates the specific entities, attributes and relationships involved
in a business function. Serves as the basis for the creation of the physical data model.
Below we show the conceptual, logical, and physical versions of a single data model.
We can see that the complexity increases from conceptual to logical to physical. This is why we
always start with the conceptual data model (so we understand at a high level what the
different entities in our data are and how they relate to one another), then move on to the logical data
model (so we understand the details of our data without worrying about how they will actually
be implemented), and finally the physical data model (so we know exactly how to implement our
data model in the database of choice). In a data warehousing project, the conceptual
data model and the logical data model are sometimes considered a single deliverable.
Conceptual schema: describes the semantics of a domain (the scope of the model). For
example, it may be a model of the interest area of an organization or of an industry. This
consists of entity classes, representing kinds of things of significance in the domain, and
relationship assertions about associations between pairs of entity classes. A conceptual
schema specifies the kinds of facts or propositions that can be expressed using the model.
In that sense, it defines the allowed expressions in an artificial "language" with a scope
that is limited by the scope of the model. Simply described, a conceptual schema is the
first step in organizing the data requirements.
Logical schema: describes the structure of some domain of information. This consists of
descriptions of (for example) tables, columns, object-oriented classes, and XML tags.
The logical schema and conceptual schema are sometimes implemented as one and the
same.[3]
Physical schema: describes the physical means used to store data. This is concerned with
partitions, CPUs, tablespaces, and the like.
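As a rough illustration of the three schemas described above, the sketch below expresses one small model at each level. The Customer/Order domain, all names, and the choice of SQLite are assumptions made for this example only; they do not come from the text.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual schema (hypothetical domain): entity classes plus
# relationship assertions between pairs of entity classes.
ENTITY_CLASSES = {"Customer", "Order"}
RELATIONSHIPS = {("Customer", "places", "Order")}


def is_expressible(subject: str, verb: str, obj: str) -> bool:
    """A fact can be stated only if the conceptual schema asserts that relationship."""
    return (subject, verb, obj) in RELATIONSHIPS


# Logical schema: entities gain attributes and keys, with no storage details yet.
@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    total: float


# Physical schema: concrete DDL for one chosen database (SQLite, as an assumption).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the physical DDL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHYSICAL_DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """List the tables that the DDL created, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Each level adds detail to the one before it: the conceptual part only says what can be talked about, the logical part says what each entity carries, and the physical part commits to a specific database.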
According to ANSI, this approach allows the three perspectives to be relatively independent of
each other. Storage technology can change without affecting either the logical or the conceptual
schema. The table/column structure can change without (necessarily) affecting the conceptual
schema. In each case, of course, the structures must remain consistent across all schemas of the
same data model.
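The independence of the physical level can be sketched with a small experiment: a storage-level change (here, adding an index) leaves the logical structure and the answers to queries untouched. The table, column, and index names are hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical structure: one table with two columns (hypothetical example).
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, dept) VALUES (?, ?)",
    [(1, "sales"), (2, "ops"), (3, "sales")],
)

QUERY = "SELECT emp_id FROM employee WHERE dept = ? ORDER BY emp_id"


def sales_ids(connection: sqlite3.Connection) -> list:
    """Run the same logical query regardless of how the data is stored."""
    return [row[0] for row in connection.execute(QUERY, ("sales",))]


before = sales_ids(conn)

# Physical change only: an index affects how rows are located, not what they are.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

after = sales_ids(conn)
```

The query text and its result are the same before and after the index is created, which is the kind of separation the three-schema approach is meant to provide.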
The process of designing a database involves producing the previously described three types of
schemas - conceptual, logical, and physical. The database design documented in these schemas
is converted through a Data Definition Language, which can then be used to generate a
database. A fully attributed data model contains detailed attributes (descriptions) for every entity
within it. The term "database design" can describe many different parts of the design of an
overall database system. Principally, and most correctly, it can be thought of as the logical design
of the base data structures used to store the data. In the relational model these are the tables and
views. In an object database the entities and relationships map directly to object classes and
named relationships. However, the term "database design" could also be used to apply to the
overall process of designing, not just the base data structures, but also the forms and queries used
as part of the overall database application within the Database Management System or DBMS.
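The step from a documented design to a generated database can be sketched as follows. This is a minimal illustration, not a description of any particular tool: the dictionary format for the logical model, the helper names to_ddl and generate_database, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3

# Hypothetical logical model: table name -> list of (column, type, constraint).
LOGICAL_MODEL = {
    "product": [
        ("product_id", "INTEGER", "PRIMARY KEY"),
        ("name", "TEXT", "NOT NULL"),
        ("price", "REAL", "NOT NULL"),
    ],
}


def to_ddl(model: dict) -> list:
    """Translate the logical model into CREATE TABLE statements (DDL)."""
    statements = []
    for table, columns in model.items():
        # Join the non-empty parts of each column definition with spaces.
        cols = ", ".join(" ".join(part for part in col if part) for col in columns)
        statements.append(f"CREATE TABLE {table} ({cols});")
    return statements


def generate_database(model: dict) -> sqlite3.Connection:
    """Run the generated DDL against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for statement in to_ddl(model):
        conn.execute(statement)
    return conn
```

Calling generate_database(LOGICAL_MODEL) yields a database whose product table has the three columns listed in the model. A real design tool would also handle views, foreign keys, and vendor-specific types.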
In the process, system interfaces account for 25% to 70% of the development and support costs
of current systems. The primary reason for this cost is that these systems do not share a common
data model. If data models are developed on a system-by-system basis, then not only is the same
analysis repeated in overlapping areas, but further analysis must be performed to create the
interfaces between them. Most systems within an organization contain the same basic data,
redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can
minimize rework with minimal modifications for the purposes of different systems within the
organization.[1]