DDM Assignment
by
Govind Devanand Menon – 12 A
Jowil Mehta – 14 A
Jugaan J Jacob – 15 A
Kashvi Singh – 16 A
Sahil Noor – 18 A
Submitted to
Dr. Jitendra Kumar Verma,
Assistant Professor
1. The Context of Database Management
a) What is the Database Approach, and how does it differ from
Traditional File Processing Systems? Discuss the advantages of the
Database Approach.
Database Approach vs. Traditional File Processing:
The traditional file processing system organizes data into separate files for each
application. This results in:
Data redundancy and inconsistency: The same data is stored in multiple locations,
which increases storage needs and leads to discrepancies.
Limited data sharing: Each application owns its data files, restricting access to others.
Lengthy development times and excessive maintenance: Each new application often
starts from scratch, consuming time and resources.
Advantages of the Database Approach:
1. Reduced Redundancy: Each data item is stored in one place, lowering storage costs and avoiding discrepancies.
2. Greater Flexibility: New applications can draw on the existing shared database rather than redefining their own files.
3. Improved Data Consistency and Quality: Single source of truth reduces
inconsistency.
4. Improved Data Sharing: Multiple users and apps can access the same data.
7. Improved Decision Support: Unified and reliable data enhances reporting and
analysis.
1. Data Modeling and Design Tools: Used to define database structure graphically
and automatically generate database code.
3. DBMS (Database Management System): The core software that manages the
creation, access, update, and deletion of data.
6. User Interface: Menus, forms, and commands that allow users to interact with the
system.
7. Data and Database Administrators: Manage data resources and physical database
design.
9. End Users: Individuals who use the database directly to retrieve, enter, or
manipulate data.
Contribution to Effectiveness:
Integration: Centralizing metadata in the repository ensures consistency.
Access Control: The DBMS ensures data security and user access rights.
Often includes specific attributes and relationships not visible at the enterprise level.
Enterprise models provide a holistic data strategy and ensure integration across
systems.
1. 1960s – Flat File Systems: Early systems lacked integration; data was stored in
isolated files.
Relational Databases: Still dominant for structured data (e.g., MySQL, Oracle).
Multitier Architectures: Used in enterprise and web systems; separate layers for
user interface, business logic, and data storage.
Data Warehouses: Designed for analytical processing (OLAP), using star schemas
or cubes.
2. Database Analysis
a) Define the key constructs of the E-R Model. How are entities,
attributes, and relationships modeled using this approach?
The Entity-Relationship (E-R) Model is a logical representation of the data for an
organization and is typically visualized through E-R diagrams.
Key Constructs:
1. Entity:
Can be a person, place, object, event, or concept.
2. Attribute:
Types include:
Simple vs. Composite (e.g., a name with first, middle, and last)
3. Relationship:
Common approaches:
Example: To track changes in product line assignments over time, create an
associative entity like ASSIGNMENT with start and end dates.
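As a sketch of how such an associative entity could be carried into tables (the SALESPERSON and PRODUCT_LINE names and sample data here are assumptions for illustration, using Python's built-in sqlite3):

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALESPERSON (SalespersonID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PRODUCT_LINE (ProductLineID INTEGER PRIMARY KEY, LineName TEXT);
-- Associative entity with its own attributes: the assignment period.
CREATE TABLE ASSIGNMENT (
    SalespersonID INTEGER REFERENCES SALESPERSON(SalespersonID),
    ProductLineID INTEGER REFERENCES PRODUCT_LINE(ProductLineID),
    StartDate TEXT,
    EndDate TEXT,
    PRIMARY KEY (SalespersonID, ProductLineID, StartDate)
);
""")
conn.execute("INSERT INTO SALESPERSON VALUES (1, 'Rao')")
conn.execute("INSERT INTO PRODUCT_LINE VALUES (10, 'Basic')")
# The same pair can recur with different periods, so history is kept.
conn.execute("INSERT INTO ASSIGNMENT VALUES (1, 10, '2023-01-01', '2023-06-30')")
conn.execute("INSERT INTO ASSIGNMENT VALUES (1, 10, '2024-01-01', NULL)")
print(conn.execute("SELECT COUNT(*) FROM ASSIGNMENT").fetchone()[0])  # 2
```

Because StartDate is part of the key, each historical assignment of the same salesperson to the same line is a distinct row.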
These are distinct semantic relationships even though they link the same entity types.
All share common attributes: Employee Number, Name, Address, Date Hired.
Modeling Approach:
Define a supertype: EMPLOYEE
Specifying Constraints:
1. Completeness Constraints:
Total Specialization Rule (double line): Every instance of the supertype must
be a member of at least one subtype.
Partial Specialization Rule (single line): Some instances of the supertype may
not belong to any subtype.
2. Disjointness Constraints:
3. Subtype Discriminator:
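One common way to implement a supertype/subtype hierarchy with a discriminator is a shared-key design; the column and sample values below (e.g., EmployeeType codes 'H', 'S', 'C') are assumptions for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMPLOYEE supertype holds the shared attributes; EmployeeType is the
# subtype discriminator. Each subtype table shares the supertype key.
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EmployeeNumber INTEGER PRIMARY KEY,
    Name TEXT,
    Address TEXT,
    DateHired TEXT,
    EmployeeType TEXT CHECK (EmployeeType IN ('H', 'S', 'C'))
);
CREATE TABLE HOURLY_EMPLOYEE (
    EmployeeNumber INTEGER PRIMARY KEY REFERENCES EMPLOYEE(EmployeeNumber),
    HourlyRate REAL
);
""")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Lee', 'Delhi', '2022-03-01', 'H')")
conn.execute("INSERT INTO HOURLY_EMPLOYEE VALUES (1, 18.5)")
row = conn.execute("""
    SELECT e.Name, h.HourlyRate
    FROM EMPLOYEE e JOIN HOURLY_EMPLOYEE h
      ON e.EmployeeNumber = h.EmployeeNumber
    WHERE e.EmployeeType = 'H'
""").fetchone()
print(row)  # ('Lee', 18.5)
```

The discriminator lets queries pick out one subtype without scanning every subtype table.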
3. Database Design
a) Discuss the logical database design and its relationship with the relational data model. How does normalization contribute to well-structured relations?
Logical Database Design:
Logical database design is the process of transforming the conceptual data model (e.g., E-R or EER diagrams) into a logical model that can be implemented using a DBMS. It focuses on defining data structures in a way that ensures consistency, integrity, and ease of access, independent of physical considerations.
The relational model is the most commonly used logical model and represents data
in tables (relations) consisting of rows and columns.
During logical design, each entity becomes a table, relationships are expressed via
foreign keys, and attributes become table columns.
Logical design ensures that these tables are normalized, promoting data quality and
eliminating redundancy.
Normalization's Role:
It occurs in stages called normal forms (1NF to 5NF), each with specific
requirements:
1NF: Eliminate repeating groups.
Example:
Assume a relation:
EMPLOYEE2(EmpID, Name, DeptName, Salary, CourseTitle, CompletionDate)
Here, an employee might have multiple course entries, causing redundancy in storing
Name, DeptName, and Salary.
Decompose into:
EMPLOYEE1(EmpID, Name, DeptName, Salary)
EMP_COURSE(EmpID, CourseTitle, CompletionDate)
This decomposition removes redundancy and anomalies. Now, updating a salary happens
in one place only, preventing inconsistencies.
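The decomposition can be sketched with Python's built-in sqlite3 (the sample employee and course data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE1 (
    EmpID INTEGER PRIMARY KEY, Name TEXT, DeptName TEXT, Salary REAL
);
CREATE TABLE EMP_COURSE (
    EmpID INTEGER REFERENCES EMPLOYEE1(EmpID),
    CourseTitle TEXT,
    CompletionDate TEXT,
    PRIMARY KEY (EmpID, CourseTitle)
);
""")
conn.execute("INSERT INTO EMPLOYEE1 VALUES (100, 'Simpson', 'Marketing', 48000)")
# Multiple courses per employee no longer repeat Name/DeptName/Salary.
conn.execute("INSERT INTO EMP_COURSE VALUES (100, 'SPSS', '2023-06-19')")
conn.execute("INSERT INTO EMP_COURSE VALUES (100, 'Surveys', '2023-10-07')")
# The salary now lives in exactly one row, so one UPDATE is enough.
conn.execute("UPDATE EMPLOYEE1 SET Salary = 52000 WHERE EmpID = 100")
print(conn.execute("SELECT Salary FROM EMPLOYEE1 WHERE EmpID = 100").fetchone()[0])
```

With the unnormalized EMPLOYEE2 relation, the same salary change would have to touch every course row for that employee.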
Simplifies enforcement of integrity constraints.
This step translates logical data definitions into specifications for actual storage and
performance optimization. It includes:
Supports laws like SOX (Sarbanes-Oxley) and Basel II, which demand:
Field-level constraints.
Audit logs.
Performed during system analysis to estimate current and future data size and
access frequency.
Informs:
Storage needs
Indexing strategies
Partitioning plans
Performance expectations
SQL became a standard through ANSI (1986) and ISO (1987). Since then, it has
evolved through several versions, notably SQL:1992, SQL:1999, SQL:2003,
SQL:2008, and SQL:2011, which introduced features like analytic functions, XML
support, and new data types.
CREATE DATABASE – to create a database
CREATE TABLE – to define tables with columns and constraints
Inserting Data:
Updating Data:
UPDATE Product_T
SET ProductStandardPrice = 775
WHERE ProductID = 7;
Can also use SET column = NULL, or use subqueries for complex updates.
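A minimal runnable sketch of these updates with Python's built-in sqlite3 (the Product_T contents here are assumed for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, "
             "ProductStandardPrice REAL)")
conn.execute("INSERT INTO Product_T VALUES (7, 800)")
# Update with a literal value.
conn.execute("UPDATE Product_T SET ProductStandardPrice = 775 WHERE ProductID = 7")
# Setting a column to NULL removes its value for the matching rows.
conn.execute("UPDATE Product_T SET ProductStandardPrice = NULL WHERE ProductID = 7")
print(conn.execute("SELECT ProductStandardPrice FROM Product_T "
                   "WHERE ProductID = 7").fetchone()[0])  # None
```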
Deleting Data:
Use caution to avoid referential integrity violations when related data exists in other
tables.
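The referential-integrity risk can be demonstrated with sqlite3, which blocks the delete once foreign keys are enabled (the Customer_T/Order_T names are assumptions for this sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite3 requires opting in
conn.execute("CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, "
             "CustomerID INTEGER REFERENCES Customer_T(CustomerID))")
conn.execute("INSERT INTO Customer_T VALUES (1)")
conn.execute("INSERT INTO Order_T VALUES (1001, 1)")
try:
    # This customer still has an order, so the delete is rejected.
    conn.execute("DELETE FROM Customer_T WHERE CustomerID = 1")
except sqlite3.IntegrityError as e:
    print("blocked:", e)
```

Deleting or reassigning the dependent Order_T rows first would allow the parent row to be removed safely.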
Equi-Join:
Natural Join, Outer Join, and Self Join are also commonly used for different
relationship needs.
Subqueries:
A subquery is a SELECT query nested inside another SQL statement.
Non-correlated subquery:
SELECT ProductID
FROM Product_T
WHERE ProductStandardPrice >
      (SELECT AVG(ProductStandardPrice) FROM Product_T);

Correlated subquery: the inner query references a column of the row currently being processed by the outer query, so it is re-evaluated for each outer row.
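A runnable correlated-subquery sketch with sqlite3, finding products priced above the average of their own product line (the ProductLineID column and the sample prices are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, "
             "ProductLineID INTEGER, ProductStandardPrice REAL)")
conn.executemany("INSERT INTO Product_T VALUES (?, ?, ?)",
                 [(1, 10, 175), (2, 10, 225), (3, 20, 650), (4, 20, 750)])
rows = conn.execute("""
    SELECT ProductID
    FROM Product_T AS p
    WHERE ProductStandardPrice >
          (SELECT AVG(ProductStandardPrice)
           FROM Product_T
           WHERE ProductLineID = p.ProductLineID)  -- refers to outer row
""").fetchall()
print(rows)  # [(2,), (4,)]
```

Line 10 averages 200 and line 20 averages 700, so only products 2 and 4 qualify.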
Views:
A view is a virtual table created by a SELECT query.
Materialized Views store query results physically and are refreshed periodically,
often used in data warehousing.
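A minimal view sketch with sqlite3, which stores only the defining query, not the data (the table contents and view name are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, "
             "ProductStandardPrice REAL)")
conn.executemany("INSERT INTO Product_T VALUES (?, ?)",
                 [(1, 175), (2, 775)])
conn.execute("""
    CREATE VIEW ExpensiveProduct_V AS
    SELECT ProductID, ProductStandardPrice
    FROM Product_T
    WHERE ProductStandardPrice > 500
""")
# The view reflects base-table changes automatically on each query.
conn.execute("INSERT INTO Product_T VALUES (3, 900)")
print(conn.execute("SELECT COUNT(*) FROM ExpensiveProduct_V").fetchone()[0])  # 2
```

A materialized view, by contrast, would keep the earlier result until explicitly refreshed.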
There are typically three main application logic components in these systems:
3. Storage Logic – manages data storage and retrieval from physical devices (typically
hosted on the server).
Application logic is often split between client and server using a process called
application partitioning.
For example, in a fat client, most of the processing occurs on the client side. In a
thin client, the server handles most of the processing.
Two-Tier Architecture:
Consists of a client and a database server.
The client manages UI, application logic, and interacts directly with the database.
Advantages:
Disadvantages:
Three-Tier Architecture:
Includes client, application server, and database server.
Advantages:
Flexibility – code changes in the application layer don’t affect the database or client
directly.
Disadvantages:
3. Tier flexibility: Any of the client, application, or database layers can be hosted in
the cloud.
Service Models:
Platform-as-a-Service (PaaS): Provides tools like app servers and DBMS (e.g.,
SQL Azure).
4. Stored in native XML databases – optimal for pure XML documents.
Displaying XML:
Applications:
Web services.
Standardized data exchange using XML-based languages like XBRL and SPL.
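A small sketch of XML as a self-describing exchange format, parsed with Python's standard xml.etree library (the element and attribute names here are invented for illustration):

```python
import xml.etree.ElementTree as ET

# A hypothetical order document as it might travel between systems.
doc = """<order id="1001">
  <customer>Value Furniture</customer>
  <line product="Computer Desk" qty="2"/>
</order>"""
root = ET.fromstring(doc)
# Tags and attributes carry the metadata along with the data itself.
print(root.tag, root.attrib["id"])      # order 1001
print(root.find("customer").text)       # Value Furniture
print(root.find("line").attrib["qty"])  # 2
```

Because the structure travels with the data, a receiving application can validate and transform it (e.g., via XSLT for display) without prior agreement on a binary format.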
Significance:
Centralizes enterprise data from diverse sources to create a consistent,
organization-wide view.
Architecture:
The three-layer architecture of a data warehouse includes:
2. Reconciled Data Layer – Cleaned and integrated data stored in the Enterprise Data
Warehouse (EDW).
3. Derived Data Layer – Summarized or aggregated data stored in data marts for user-
specific applications.
Key Components:
ETL (Extract, Transform, Load): Processes that prepare operational data for analysis.
Enterprise Data Model: Guides the design of integrated data and supports
warehouse evolution.
NoSQL Databases:
“NoSQL” means “Not Only SQL.” These are non-relational databases optimized for:
Categories include:
Handle semi-structured data like JSON/XML from web and mobile sources.
Enable horizontal scaling in cloud environments.
Provide flexibility for storing web logs, IoT data, and social media streams.
Hadoop:
An open-source batch-processing framework based on the MapReduce algorithm,
designed for processing huge datasets across many computers.
Key components:
Often used alongside traditional warehouses to create data lakes for broader
analytics.
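The MapReduce idea behind Hadoop can be sketched in pure Python as a word count, the classic example: map emits (key, value) pairs, the pairs are grouped by key, and reduce aggregates each group. Real Hadoop distributes these phases across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # Group by key and sum the values for each key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["big data big insight", "big cluster"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(pairs)
print(counts["big"])  # 3
```

Because map calls are independent per line and reduce calls are independent per key, both phases parallelize naturally across nodes.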
Healthcare: Supports preventive care via data from wearables and genomics.
Social Implications:
1. Privacy vs. Collective Benefits:
2. Ethics and Regulation:
Need for frameworks ensuring ethical data use and individual rights protection.
Impact on Decision-Making:
Enables evidence-based, data-driven strategies.
Poor data quality is a leading reason for failure in business initiatives and can reduce
productivity by up to 20%.
Data Governance:
A high-level organizational process that oversees data quality, integration,
architecture, and compliance.
Requires sponsorship from senior leadership and participation from data stewards
across departments.
Establishes data access rules, quality metrics, and regulatory policies (e.g., SOX,
HIPAA).
Characteristics of Quality Data:
1. Identity Uniqueness
2. Accuracy
3. Consistency
4. Completeness
5. Timeliness
6. Currency
7. Conformance
8. Referential Integrity
Analytics involves using data mining, machine learning, and statistical tools for
descriptive, predictive, and prescriptive analysis.
NoSQL Databases:
Designed for flexibility and scalability.
Types include:
Useful for semi-structured and unstructured data, especially from web and IoT
sources.
Hadoop:
An open-source framework for distributed processing of large datasets.
Components:
HDFS (storage)
MapReduce (processing)
Pig/Hive (querying)
These platforms support real-time analytics, data science, and machine learning
integration.
Data Administrator – Responsibilities:
Database Administrator – Responsibilities:
Data Security:
Protection against unauthorized access, data breaches, and malicious activity.
Techniques:
Authorization:
Specifies who can access what data under which conditions.
Based on user roles, with fine-grained controls at table, column, or operation levels.
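The role-based idea can be sketched as a permission matrix checked on every request (role names, table names, and the granted operations below are invented for illustration):

```python
# Permissions are granted to roles, users hold roles, and each request
# is checked against this matrix before it touches the database.
PERMISSIONS = {
    ("clerk",   "Order_T"): {"SELECT", "INSERT"},
    ("analyst", "Order_T"): {"SELECT"},
    ("dba",     "Order_T"): {"SELECT", "INSERT", "UPDATE", "DELETE"},
}

def is_authorized(role: str, table: str, operation: str) -> bool:
    # Deny by default: an unknown (role, table) pair grants nothing.
    return operation in PERMISSIONS.get((role, table), set())

print(is_authorized("clerk", "Order_T", "INSERT"))   # True
print(is_authorized("analyst", "Order_T", "DELETE")) # False
```

In a real DBMS the same matrix is maintained with GRANT/REVOKE statements, and fine-grained schemes extend the key to individual columns or row predicates.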
IT Change Management:
Involves planning and tracking changes to infrastructure and databases.
Essential for regulatory compliance (e.g., SOX), ensuring that all changes are
authorized, tested, and documented.
2. Data Sharing Across Units: Enables coordinated business operations and decisions
that span departments or geographic regions.
3. Reduced Communication Costs: Storing data closer to its usage point reduces
transmission costs and response times.
6. Support for OLTP and OLAP: Balances operational and analytical processing
needs across systems.
Disadvantages:
Software complexity due to coordination needs
choice of a distributed strategy.
Data Replication:
Definition: The process of storing copies of data at multiple sites in a distributed database
system.
Types:
Advantages:
1. Reliability: Sites remain operational even if others fail.
Disadvantages:
1. High storage requirements
4. Horizontally/vertically partitioned
Factors Influencing Strategy Choice:
Organizational needs and autonomy
Key Concepts:
Class: Represents an abstract definition of an entity (e.g., Car, Employee).
Inheritance: A subclass inherits attributes and methods from its superclass (e.g., Car
and Truck from Vehicle).
Polymorphism: A method behaves differently based on the object class calling it.
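These three concepts can be shown in a few lines of Python (the Vehicle/Car/Truck classes and the describe method are illustrative, following the examples named above):

```python
class Vehicle:
    def __init__(self, make):
        self.make = make            # attribute shared via inheritance

    def describe(self):
        return f"{self.make} vehicle"

class Car(Vehicle):
    def describe(self):             # overrides the superclass method
        return f"{self.make} car"

class Truck(Vehicle):
    def describe(self):
        return f"{self.make} truck"

# Polymorphism: the same call dispatches to each object's own method.
fleet = [Car("Tata"), Truck("Volvo")]
print([v.describe() for v in fleet])  # ['Tata car', 'Volvo truck']
```

The caller never tests which subclass it holds; the runtime dispatch picks the right behavior.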
Behavior: not supported in the relational model; supported via methods/operations in the object-oriented model.
Modeling: the object-oriented model is more expressive and powerful for modeling complex systems involving both data and operations.