
DDM Group Assignment

Submitted in partial fulfilment of the requirements for the award of the degree of
Bachelor of Business Administration

by
Govind Devanand Menon – 12 A
Jowil Mehta – 14 A
Jugaan J Jacob – 15 A
Kashvi Singh – 16 A
Sahil Noor – 18 A

Submitted to
Dr. Jitendra Kumar Verma,
Assistant Professor

Indian Institute of Foreign Trade,


Kakinada Campus,
Andhra Pradesh- 533003
May 2025

1. The Context of Database Management
a) What is the Database Approach, and how does it differ from
Traditional File Processing Systems? Discuss the advantages of the
Database Approach.
Database Approach vs. Traditional File Processing:

The traditional file processing system organizes data into separate files for each
application. This results in:

Program-data dependence: File formats are hard-coded into programs, so changing a


file's structure requires modifying every related program.

Data redundancy and inconsistency: The same data is stored in multiple locations,
which increases storage needs and leads to discrepancies.

Limited data sharing: Each application owns its data files, restricting access to others.

Lengthy development times and excessive maintenance: Each new application often
starts from scratch, consuming time and resources.

In contrast, the database approach centralizes data management using a Database


Management System (DBMS). It enables data to be defined once and shared across
applications. This allows:

Centralized control of data,

Reduced redundancy,

Better data integrity,

Greater flexibility,

Improved productivity of application development.

Advantages of the Database Approach:

1. Program-Data Independence: Metadata are stored separately from programs,


allowing structural changes without altering application code.

2. Planned Data Redundancy: Redundancy is minimized or controlled to improve


consistency.

3. Improved Data Consistency and Quality: Single source of truth reduces
inconsistency.

4. Improved Data Sharing: Multiple users and apps can access the same data.

5. Increased Developer Productivity: Tools and reusable components speed up


development.

6. Enforcement of Standards: Naming conventions, data definitions, and business


rules are standardized.

7. Improved Decision Support: Unified and reliable data enhances reporting and
analysis.

b) Explain the components of the database environment. How do


these components contribute to the effectiveness of a Database
Management System?
A database environment consists of nine key components that work together to manage and
utilize data effectively:

1. Data Modeling and Design Tools: Used to define database structure graphically
and automatically generate database code.

2. Repository: A centralized metadata storage containing data definitions,


relationships, screen formats, etc.

3. DBMS (Database Management System): The core software that manages the
creation, access, update, and deletion of data.

4. Database: A logically related collection of data designed to meet organizational


needs.

5. Application Programs: Facilitate data manipulation and user interaction.

6. User Interface: Menus, forms, and commands that allow users to interact with the
system.

7. Data and Database Administrators: Manage data resources and physical database
design.

8. System Developers: Design and implement applications using the database.

9. End Users: Individuals who use the database directly to retrieve, enter, or
manipulate data.

Contribution to Effectiveness:

Integration: Centralizing metadata in the repository ensures consistency.

Access Control: The DBMS ensures data security and user access rights.

Collaboration: Shared data promotes coordinated decision-making.

Scalability: Clear roles and tools streamline maintenance and growth.

c) Compare and contrast enterprise and project-level data models.


Why are both essential in the context of the database development
process?
Enterprise Data Model:

High-level, organization-wide model.

Defines broad entities and business rules.

Created during strategic planning.

Aims to establish a unified view of the organization's data needs.

Project-Level Data Model:

More detailed and application-specific.

Focuses on particular databases or systems.

Developed during the analysis and design phases of SDLC.

Often includes specific attributes and relationships not visible at the enterprise level.

Why Both Are Essential:

Enterprise models provide a holistic data strategy and ensure integration across
systems.

Project-level models translate enterprise goals into functional systems and


databases.

Together, they ensure consistency, completeness, and alignment of data initiatives


across an organization.

d) Trace the evolution of database technologies and systems. Discuss


different database architectures and their applications in contemporary
scenarios.
Evolution of Database Technologies:

1. 1960s – Flat File Systems: Early systems lacked integration; data was stored in
isolated files.

2. 1970s – Hierarchical and Network Models: Introduced structured formats; useful


but lacked flexibility.

3. 1980s – Relational Model: Proposed by E.F. Codd in 1970 and widely commercialized in the
1980s, relational databases (e.g., SQL-based) provided better data independence and ease of use.

4. 1990s – Object-Oriented and Object-Relational Models: Designed to manage


complex data such as multimedia or CAD files.

5. 2000s and beyond – Data Warehousing, Multidimensional Databases, and


NoSQL: Support for analytics, decision-making, and large-scale unstructured data.

Contemporary Database Architectures:

Relational Databases: Still dominant for structured data (e.g., MySQL, Oracle).

Multitier Architectures: Used in enterprise and web systems; separate layers for
user interface, business logic, and data storage.

Web-Enabled Databases: Support dynamic websites through client-server and


cloud architectures.

Data Warehouses: Designed for analytical processing (OLAP), using star schemas
or cubes.

NoSQL Databases: Handle large-scale unstructured or semi-structured data; used in


big data and real-time applications.

These architectures address diverse needs—from real-time web transactions to long-term


strategic data analysis.

2. Database Analysis
a) Define the key constructs of the E-R Model. How are entities,
attributes, and relationships modeled using this approach?
The Entity-Relationship (E-R) Model is a logical representation of the data for an
organization and is typically visualized through E-R diagrams.

Key Constructs:
1. Entity:

A thing in the user environment that is distinguishable from other things.

Can be a person, place, object, event, or concept.

Entity Type: A category of entities sharing the same properties (e.g.,


EMPLOYEE).

Entity Instance: A single occurrence of an entity type.

2. Attribute:

A property or characteristic of an entity or a relationship.

Types include:

Required vs. Optional

Simple vs. Composite (e.g., a name with first, middle, and last)

Single-valued vs. Multivalued

Stored vs. Derived (e.g., Age derived from Birth Date)

Identifier: Uniquely distinguishes entity instances.

3. Relationship:

An association between entities.

Relationship Type vs. Relationship Instance

Degree: Number of entities involved (Unary, Binary, Ternary)

Cardinality: Minimum and maximum number of entity instances involved


(e.g., One-to-Many)

May have attributes of their own (modeled via associative entities).

b) Describe the process of modeling time-dependent data and handling


multiple relationships between entity types in the context of the E-R
Model.

Modeling Time-Dependent Data:


Time-sensitive data may require tracking of historical changes.

Common approaches:

Time Stamps on attributes or relationships.

Associative Entities to represent time-bound relationships.

Example: To track changes in product line assignments over time, create an
associative entity like ASSIGNMENT with start and end dates.

It’s crucial when organizations need to support auditing, versioning, or regulatory


compliance.
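A minimal relational sketch of such an associative entity, with illustrative (assumed) table and column names, shows how the time stamps become ordinary date columns:

CREATE TABLE Assignment_T (
    ProductID      INTEGER NOT NULL,   -- identifies the PRODUCT instance (assumed key)
    ProductLineID  INTEGER NOT NULL,   -- identifies the PRODUCT LINE instance (assumed key)
    StartDate      DATE    NOT NULL,   -- time stamp: when the assignment began
    EndDate        DATE,               -- NULL while the assignment is still current
    PRIMARY KEY (ProductID, ProductLineID, StartDate)
);

Keeping StartDate in the primary key lets the same product be reassigned to the same line at different times, which is exactly the history needed for auditing and versioning.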

Handling Multiple Relationships:


It’s possible to have more than one relationship between the same two entity types.

Example: Between EMPLOYEE and DEPARTMENT:

1. Has Workers: A department has many employees.

2. Is Managed By: A department is managed by one employee.

These are distinct semantic relationships even though they link the same entity types.

Proper naming and cardinality specification are key for clarity.

c) Provide an example of E-R modeling, including the representation of


supertypes and subtypes. Explain how constraints are specified in
supertypes/subtypes relationships.

EMPLOYEE Supertype and Subtypes


Suppose a company has three types of employees:

HOURLY EMPLOYEE (Hourly Rate)

SALARIED EMPLOYEE (Annual Salary, Stock Option)

CONSULTANT (Contract Number, Billing Rate)

All share common attributes: Employee Number, Name, Address, Date Hired.

Modeling Approach:
Define a supertype: EMPLOYEE

Define three subtypes: HOURLY, SALARIED, CONSULTANT

Common attributes go in the supertype; specific ones go in subtypes.

Specifying Constraints:
1. Completeness Constraints:

Total Specialization Rule (double line): Every instance of the supertype must
be a member of at least one subtype.

Partial Specialization Rule (single line): Some instances of the supertype may
not belong to any subtype.

2. Disjointness Constraints:

Disjoint Rule: An instance can be a member of only one subtype.

Overlap Rule: An instance may belong to multiple subtypes.

3. Subtype Discriminator:

An attribute in the supertype used to determine the subtype (e.g., "Employee Type" with values "H", "S", "C").

3. Database Design
a) Discuss the logical database design and its relationship with the
relational data model. How does normalization contribute to well-
structured relations?
Logical Database Design:

Logical database design is the process of transforming the conceptual data model (e.g., E-
R or EER diagrams) into a logical model that can be implemented using a DBMS. It
focuses on defining data structures in a way that ensures consistency, integrity, and ease
of access, independent of physical considerations.

Relationship to the Relational Data Model:

The relational model is the most commonly used logical model and represents data
in tables (relations) consisting of rows and columns.

During logical design, each entity becomes a table, relationships are expressed via
foreign keys, and attributes become table columns.

Logical design ensures that these tables are normalized, promoting data quality and
eliminating redundancy.

Normalization's Role:

Normalization is the process of structuring relations to minimize redundancy and


avoid anomalies (e.g., insertion, update, and deletion anomalies).

It occurs in stages called normal forms (1NF to 5NF), each with specific
requirements:

1NF: Eliminate repeating groups.

2NF: Remove partial dependencies.

3NF: Remove transitive dependencies.

Boyce-Codd NF: Resolve remaining functional dependency anomalies.

A well-structured relation supports efficient data modification and ensures


consistency.

b) Explain the concept of normalization with examples. How does


normalization help in eliminating data redundancy?
Concept of Normalization:

Normalization is the systematic decomposition of complex, redundant relations into


simpler, well-structured ones. It is based on analyzing functional dependencies, which
define how one attribute's value determines another.

Example:
Assume a relation:

EMPLOYEE2(EmpID, Name, DeptName, Salary, CourseTitle, CompletionDate)

Here, an employee might have multiple course entries, causing redundancy in storing
Name, DeptName, and Salary.

Through normalization, we split this into:

EMPLOYEE1(EmpID, Name, DeptName, Salary)

EMP_COURSE(EmpID, CourseTitle, CompletionDate)

This decomposition removes redundancy and anomalies. Now, updating a salary happens
in one place only, preventing inconsistencies.
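A sketch of the normalized structure as SQL DDL (data types are assumed for illustration):

CREATE TABLE Employee1_T (
    EmpID     INTEGER PRIMARY KEY,
    Name      VARCHAR(50) NOT NULL,
    DeptName  VARCHAR(30),
    Salary    DECIMAL(10,2)
);

CREATE TABLE EmpCourse_T (
    EmpID           INTEGER REFERENCES Employee1_T(EmpID),
    CourseTitle     VARCHAR(60),
    CompletionDate  DATE,
    PRIMARY KEY (EmpID, CourseTitle)   -- one row per employee/course combination
);

-- A salary change now touches exactly one row:
UPDATE Employee1_T
SET Salary = 62000
WHERE EmpID = 100;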

How It Eliminates Redundancy:

Removes duplicate data by creating separate relations.

Improves storage efficiency.

Simplifies enforcement of integrity constraints.

Reduces maintenance overhead by isolating changes to relevant tables.

c) Describe the physical database design process. Why is it crucial


for regulatory compliance, and how is data volume analysis
incorporated into the process?
Physical Database Design Process:

This step translates logical data definitions into specifications for actual storage and
performance optimization. It includes:

1. Choosing data types for fields.

2. Deciding file organizations (heap, sequential, indexed, hashed).

3. Defining indexes and clustering strategies.

4. Partitioning data (horizontal and vertical).

5. Designing physical files and storage locations.

6. Creating strategies for performance tuning, backup, and recovery.
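As a small illustration of steps 2 and 3, index choices are typically expressed in SQL; the index names and columns below are assumptions, and partitioning clauses vary by DBMS:

-- Secondary index to speed lookups on a frequently searched column
CREATE INDEX CustomerName_IDX ON Customer_T (CustomerName);

-- Composite (and unique) index supporting a common join/filter pattern
CREATE UNIQUE INDEX OrderLine_IDX ON OrderLine_T (OrderID, ProductID);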

Importance for Regulatory Compliance:

Acts as the foundation for data integrity, security, and auditability.

Supports laws like SOX (Sarbanes-Oxley) and Basel II, which demand:

Accuracy in financial data.

Evidence of internal controls.

Audit trails and change tracking.

Compliance requires that controls be enforced consistently through physical design


mechanisms like:

Field-level constraints.

Triggers and stored procedures.

Audit logs.

Data Volume Analysis:

Performed during system analysis to estimate current and future data size and
access frequency.

Informs:

Storage needs

Indexing strategies

Partitioning plans

Performance expectations

Helps prioritize optimizations (e.g., focus on high-access tables)

4. Database Implementation: Hands-on SQL


a) Introduce SQL and its origins. Discuss SQL data types and the
process of defining a database in SQL.

Introduction and Origins of SQL:


SQL (Structured Query Language) originated in the early 1970s from IBM's System
R project at the San Jose Research Laboratory.

Initially called SEQUEL, the language was renamed to SQL.

The first commercial implementation was Oracle in 1979.

SQL became a standard through ANSI (1986) and ISO (1987). Since then, it has
evolved through several versions, notably SQL:1992, SQL:1999, SQL:2003,
SQL:2008, and SQL:2011, which introduced features like analytic functions, XML
support, and new data types.

SQL Data Types:


SQL supports several categories of data types:

String: CHAR, VARCHAR, TEXT

Numeric: INTEGER, NUMERIC, DECIMAL, FLOAT

Date/Time: DATE, TIME, TIMESTAMP, including time zone variants

Large objects: BLOB, CLOB

Boolean: BOOLEAN (in newer standards)

SQL:2008 also added BIGINT, MULTISET, and XML.

Defining a Database in SQL:


This is done using Data Definition Language (DDL) commands:

CREATE DATABASE to create a database

CREATE TABLE to define tables with columns and constraints

ALTER TABLE to modify table structure

DROP TABLE to remove a table

Example:

CREATE TABLE Customer_T (
    CustomerID INTEGER PRIMARY KEY,
    CustomerName VARCHAR(25) NOT NULL,
    CustomerAddress VARCHAR(50)
);

Constraints such as NOT NULL, DEFAULT, UNIQUE, CHECK, and REFERENCES (foreign key) ensure data integrity.
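A brief sketch combining these constraints in one table definition (the Order_T columns are assumed for illustration, and DEFAULT CURRENT_DATE may differ slightly across DBMS products):

CREATE TABLE Order_T (
    OrderID     INTEGER PRIMARY KEY,
    OrderDate   DATE DEFAULT CURRENT_DATE,           -- DEFAULT supplies a value when none is given
    CustomerID  INTEGER NOT NULL
                REFERENCES Customer_T(CustomerID),   -- foreign key back to Customer_T
    OrderTotal  DECIMAL(10,2) CHECK (OrderTotal >= 0),
    OrderNumber VARCHAR(12) UNIQUE
);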

b) Explain the basics of inserting, updating, and deleting data using


SQL commands.
These are part of Data Manipulation Language (DML) in SQL.

Inserting Data:

INSERT INTO Customer_T
VALUES (001, 'Contemporary Casuals', '1355 S. Himes Blvd.', 'Gainesville', 'FL', '32601');

Can also specify only certain columns:

INSERT INTO Product_T (ProductID, ProductDescription, ProductFinish)
VALUES (1, 'End Table', 'Cherry');

Can insert from another table:

INSERT INTO CaCustomer_T
SELECT * FROM Customer_T
WHERE CustomerState = 'CA';

Updating Data:

UPDATE Product_T
SET ProductStandardPrice = 775
WHERE ProductID = 7;

Can also use SET column = NULL, or use subqueries for complex updates.
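For instance, a hedged sketch of a subquery-driven update, assuming Product_T carries a ProductLineID column and a ProductLine_T table exists:

UPDATE Product_T
SET ProductStandardPrice = ProductStandardPrice * 1.10   -- 10% price increase
WHERE ProductLineID IN
      (SELECT ProductLineID
       FROM ProductLine_T
       WHERE ProductLineName = 'Cherry Tree');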

Deleting Data:

DELETE FROM Customer_T
WHERE CustomerState = 'HI';

To delete all rows:

DELETE FROM Customer_T;

Use caution to avoid referential integrity violations when related data exists in other
tables.

c) Explore advanced SQL topics, including processing multiple


tables, subqueries, and the use of views.

Processing Multiple Tables:


Joins allow retrieval of data across related tables:

Equi-Join:

SELECT C.CustomerName, O.OrderDate
FROM Customer_T C, Order_T O
WHERE C.CustomerID = O.CustomerID;

Natural Join, Outer Join, and Self Join are also commonly used for different
relationship needs.

Subqueries:
A subquery is a SELECT query nested inside another SQL statement.

Non-correlated subquery:

SELECT ProductID
FROM Product_T
WHERE ProductStandardPrice > (SELECT AVG(ProductStandardPrice) FROM Product_T);

Correlated subquery:

Depends on values from the outer query, evaluated per row.
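A sketch of a correlated subquery, assuming Order_T and OrderLine_T tables; because the inner query refers to the outer row (O.OrderID), it is re-evaluated for each order:

SELECT O.OrderID, O.OrderDate
FROM Order_T O
WHERE EXISTS
      (SELECT *
       FROM OrderLine_T OL
       WHERE OL.OrderID = O.OrderID          -- reference to the outer query's row
         AND OL.OrderedQuantity > 10);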

Views:
A view is a virtual table created by a SELECT query.

CREATE VIEW ExpensiveProducts AS
SELECT * FROM Product_T
WHERE ProductStandardPrice > 1000;
Views can simplify complex queries, enforce security, and support abstraction.

Materialized Views store query results physically and are refreshed periodically,
often used in data warehousing.
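Materialized-view syntax is vendor-specific; as an illustrative, Oracle-style sketch (object names assumed):

CREATE MATERIALIZED VIEW ProductSales_MV
  BUILD IMMEDIATE                 -- populate the view immediately
  REFRESH COMPLETE ON DEMAND      -- recompute only when explicitly refreshed
AS
  SELECT ProductID, SUM(OrderedQuantity) AS TotalQuantity
  FROM OrderLine_T
  GROUP BY ProductID;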

5. Database Implementation: Database


Applications
a) Discuss client/server architectures. What is the role of application
logic in client/server systems?
Client/server architecture is a networked computing model where tasks and workloads
are distributed between clients (user-facing systems) and servers (centralized processors
providing services).

There are typically three main application logic components in these systems:

1. Presentation Logic – manages user interface and display.

2. Processing Logic – handles application-specific tasks like business rules, data


processing, and validation.

3. Storage Logic – manages data storage and retrieval from physical devices (typically
hosted on the server).

Role of Application Logic:

Application logic is often split between client and server using a process called
application partitioning.

This partitioning can be optimized for performance, scalability, and


interoperability.

For example, in a fat client, most of the processing occurs on the client side. In a
thin client, the server handles most of the processing.

b) Compare and contrast two-tier and three-tier client/server


environments. What are the advantages and disadvantages of each?

Two-Tier Architecture:
Consists of a client and a database server.

The client manages UI, application logic, and interacts directly with the database.

Common in small or departmental applications.

Advantages:

Simpler and less costly to implement.

Low network load since only required data is transferred.

Disadvantages:

Scalability limitations – struggles with many users.

Business logic is tied closely to client applications, making updates harder.

Less secure and harder to manage centrally.

Three-Tier Architecture:
Includes client, application server, and database server.

The application server holds most or all of the business logic.

Advantages:

Scalability – can support more users.

Flexibility – code changes in the application layer don’t affect the database or client
directly.

Improved performance and reuse of application logic.

Better security and centralized control.

Disadvantages:

More complex to implement.

Higher initial costs and training requirements.

Requires middleware and potentially more specialized skills.

c) How does cloud computing impact database application


development? Discuss the role of XML in storing and displaying
data.
Cloud computing provides on-demand access to shared computing resources, including
databases, platforms, and applications.

Implications for Developers:

1. Simplified deployment: Developers no longer manage hardware/software setups.

2. Faster provisioning: Applications can be spun up quickly across different


platforms.

3. Tier flexibility: Any of the client, application, or database layers can be hosted in
the cloud.

4. Cost-efficiency: Ideal for organizations with limited IT budgets.

Service Models:

Infrastructure-as-a-Service (IaaS): Cloud providers manage hardware (e.g., Azure,


Rackspace).

Platform-as-a-Service (PaaS): Provides tools like app servers and DBMS (e.g.,
SQL Azure).

Software-as-a-Service (SaaS): Full apps hosted in the cloud (e.g., Salesforce).

Role of XML in Storing and Displaying Data:


Storing XML:

1. Shredded into relational tables.

2. Stored as BLOB/CLOB (not searchable).

3. Stored in XML columns with validation schemas (XSD).

4. Stored in native XML databases – optimal for pure XML documents.

Displaying XML:

XSLT (Extensible Stylesheet Language Transformations) is used to render XML


as HTML or other formats.

XPath and XQuery enable querying and transforming XML data.

Applications:

Web services.

Mobile-responsive design using HTML5 + XML.

Standardized data exchange using XML-based languages like XBRL and SPL.

6. Data Warehousing and Integration with Big


Data and Analytics
a) Define data warehousing and its significance in modern
organizations. Explain the architecture and components of a data
warehouse.
A data warehouse is a subject-oriented, integrated, time-variant, and non-
updateable collection of data used in support of management decision-making processes and
business intelligence.

Significance:
Centralizes enterprise data from diverse sources to create a consistent,
organization-wide view.

Supports trend analysis, forecasting, and decision making.

Helps overcome data fragmentation from siloed systems.

Architecture:
The three-layer architecture of a data warehouse includes:

1. Operational Data Layer – Data from systems of record.

2. Reconciled Data Layer – Cleaned and integrated data stored in the Enterprise Data
Warehouse (EDW).

3. Derived Data Layer – Summarized or aggregated data stored in data marts for user-
specific applications.

Key Components:
ETL (Extract, Transform, Load): Processes that prepare operational data for analysis.

Metadata: Describes the structure, operations, and content of the warehouse.

Enterprise Data Model: Guides the design of integrated data and supports
warehouse evolution.
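As an illustration of the derived data layer, a simplified star-schema sketch in SQL, with all table and column names assumed; one fact table references two dimension tables, and an OLAP-style query then aggregates against it:

CREATE TABLE Date_Dim (
    DateKey       INTEGER PRIMARY KEY,
    CalendarDate  DATE,
    MonthNumber   INTEGER,
    YearNumber    INTEGER
);

CREATE TABLE Product_Dim (
    ProductKey          INTEGER PRIMARY KEY,
    ProductDescription  VARCHAR(50)
);

CREATE TABLE Sales_Fact (
    DateKey      INTEGER REFERENCES Date_Dim(DateKey),
    ProductKey   INTEGER REFERENCES Product_Dim(ProductKey),
    SalesAmount  DECIMAL(12,2),
    PRIMARY KEY (DateKey, ProductKey)
);

-- Typical analytical query against the star schema
SELECT d.YearNumber, p.ProductDescription, SUM(f.SalesAmount) AS TotalSales
FROM Sales_Fact f
JOIN Date_Dim d    ON d.DateKey = f.DateKey
JOIN Product_Dim p ON p.ProductKey = f.ProductKey
GROUP BY d.YearNumber, p.ProductDescription;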

b) Explore the characteristics of big data. Provide an overview of


NoSQL databases and Hadoop. How do they contribute to data
warehousing?

Characteristics of Big Data:

1. Volume – Massive data sizes (terabytes to petabytes).

2. Variety – Structured, semi-structured, and unstructured data.

3. Velocity – Real-time or near-real-time data generation.

4. Veracity – Uncertainty in data quality.

5. Value – Potential insights from analysis.

NoSQL Databases:
“NoSQL” means “Not Only SQL.” These are non-relational databases optimized for:

High scalability and performance

Schema flexibility for diverse and evolving data structures

Categories include:

Key-Value Stores (e.g., Redis)

Document Stores (e.g., MongoDB)

Wide-Column Stores (e.g., Cassandra)

Graph Databases (e.g., Neo4j).

Contribution to Data Warehousing:

Handle semi-structured data like JSON/XML from web and mobile sources.

Enable horizontal scaling in cloud environments.

Provide flexibility for storing web logs, IoT data, and social media streams.

Hadoop:
An open-source batch-processing framework based on the MapReduce algorithm,
designed for processing huge datasets across many computers.

Key components:

HDFS: Distributed file system for scalable storage.

MapReduce: Parallel processing algorithm.

Hive/Pig: Tools for querying and managing large datasets.

Contribution to Data Warehousing:

Processes massive volumes of raw data before it enters the warehouse.

Often used alongside traditional warehouses to create data lakes for broader
analytics.

c) Discuss the impact of big data analytics on applications and social


implications. How does the integration of big data and analytics shape
organizational decision-making?
Big data analytics is transforming sectors such as:

Business: Enables personalization and predictive marketing.

E-Government: Drives policy decisions using social media sentiment.

Healthcare: Supports preventive care via data from wearables and genomics.

Security: Helps detect fraud, terrorism threats, and cybercrime.

Social Implications:
1. Privacy vs. Collective Benefits:

Raises concerns about data ownership, consent, and surveillance.

High-profile cases (e.g., Snowden revelations) fuel the debate on government


access to personal data.

2. Ethics and Regulation:

Need for frameworks ensuring ethical data use and individual rights protection.

Organizations must consider the legal ramifications of storing and analyzing


personal data.

Impact on Decision-Making:
Enables evidence-based, data-driven strategies.

Facilitates real-time insights, adaptive business models, and operational


optimization.

Encourages the integration of advanced predictive and prescriptive analytics for


competitive advantage.

7. Advanced Database Topics


a) Discuss the importance of data quality and integration. Explain
the concepts of data governance, data quality characteristics, and
strategies for improvement.

Importance of Data Quality and Integration:


High-quality data is critical to decision-making, operations, regulatory compliance,
and analytics.

Poor data quality is a leading reason for failure in business initiatives and can reduce
productivity by up to 20%.

Integration is needed to merge disparate data sources (e.g., across departments,


legacy systems, and external feeds) into a unified and consistent view.

Data Governance:
A high-level organizational process that oversees data quality, integration,
architecture, and compliance.

Requires sponsorship from senior leadership and participation from data stewards
across departments.

Establishes data access rules, quality metrics, and regulatory policies (e.g., SOX,
HIPAA).

Characteristics of Quality Data:
1. Identity Uniqueness

2. Accuracy

3. Consistency

4. Completeness

5. Timeliness

6. Currency

7. Conformance

8. Referential Integrity.

Strategies for Data Quality Improvement:


1. Data Stewardship Programs

2. Data Profiling and Audits

3. Improved Data Capture Processes

4. ETL (Extract, Transform, Load) Cleansing

5. Adoption of Master Data Management (MDM)

6. Application of Total Quality Management (TQM) principles

7. Use of metadata repositories for transparency.
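As a small example of data profiling (strategy 2), simple SQL queries can surface missing and duplicate values; the Customer_T table and its PostalCode column are assumed here:

-- Count rows and missing postal codes (assumed column)
SELECT COUNT(*) AS TotalRows,
       SUM(CASE WHEN PostalCode IS NULL THEN 1 ELSE 0 END) AS MissingPostalCodes
FROM Customer_T;

-- Flag potential duplicate customer records
SELECT CustomerName, COUNT(*) AS Occurrences
FROM Customer_T
GROUP BY CustomerName
HAVING COUNT(*) > 1;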

b) Explore big data analytics, including NoSQL databases, Hadoop,


and integrated analytics platforms. How do these technologies
contribute to advanced database functionalities?
Big data refers to datasets that are high in volume, velocity, and variety, making
them hard to manage with traditional tools.

Analytics involves using data mining, machine learning, and statistical tools for
descriptive, predictive, and prescriptive analysis.

NoSQL Databases:

Designed for flexibility and scalability.

Types include:

Key-value stores (e.g., Redis)

Document stores (e.g., MongoDB)

Wide-column stores (e.g., Cassandra)

Graph databases (e.g., Neo4j)

Useful for semi-structured and unstructured data, especially from web and IoT
sources.

Hadoop:
An open-source framework for distributed processing of large datasets.

Components:

HDFS (storage)

MapReduce (processing)

Pig/Hive (querying)

HBase (NoSQL storage)

Enables schema-on-read and scales horizontally for cost-effective storage and


compute.

Integrated Analytics Platforms:


Combine data ingestion, storage, and advanced analytics in one solution.

Examples: IBM Big Data Platform, HP Haven, Teradata Aster.

These platforms support real-time analytics, data science, and machine learning
integration.

c) Describe the role of data and database administration. Discuss


topics such as data security, authorization, and IT change
management.

Data Administration (DA):


Focuses on policy, planning, and coordination.

Responsibilities:

Set data standards and definitions.

Manage metadata and corporate data dictionaries.

Resolve data ownership and usage disputes.

Database Administration (DBA):


More technical and operational.

Responsibilities:

Physical/logical database design.

Performance tuning, backup, and recovery.

Enforcing security, integrity, and availability.

Installing/upgrading DBMS software.

Data Security:
Protection against unauthorized access, data breaches, and malicious activity.

Techniques:

Views and authorization rules

Encryption and authentication

Auditing and logging

Role-based access controls.

Authorization:
Specifies who can access what data under which conditions.

Based on user roles, with fine-grained controls at table, column, or operation levels.
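In SQL, views together with GRANT and REVOKE are the basic authorization mechanisms; a sketch with assumed role and view names:

-- Expose only non-sensitive columns through a view
CREATE VIEW CustomerContact_V AS
SELECT CustomerID, CustomerName, CustomerAddress
FROM Customer_T;

GRANT SELECT ON CustomerContact_V TO sales_role;        -- read-only access for one role
GRANT SELECT, UPDATE ON Product_T TO product_manager;   -- broader rights for another
REVOKE UPDATE ON Product_T FROM product_manager;        -- rights can later be withdrawn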

IT Change Management:
Involves planning and tracking changes to infrastructure and databases.

Essential for regulatory compliance (e.g., SOX), ensuring that all changes are
authorized, tested, and documented.

8. Distributed Database and Object-Oriented Data


Modeling
a) Discuss the business reasons for distributed databases. Discuss the
advantages and disadvantages of distributed databases over centralized
databases.

Business Reasons for Distributed Databases:


1. Geographic Distribution of Business Units: Many organizations are spread across
different locations, which encourages localized control of data.

2. Data Sharing Across Units: Enables coordinated business operations and decisions
that span departments or geographic regions.

3. Reduced Communication Costs: Storing data closer to its usage point reduces
transmission costs and response times.

4. System Reliability and Recovery: Replication across nodes ensures data


availability during failures.

5. Vendor and Application Diversity: Supports environments using software from


multiple vendors.

6. Support for OLTP and OLAP: Balances operational and analytical processing
needs across systems.

Advantages Over Centralized Databases:


Increased reliability and availability

Local control for better administration

Modular growth by adding local systems

Lower communication costs

Faster response time for local queries.

Disadvantages:
Software complexity due to coordination needs

Processing overhead from inter-site communication

Data integrity challenges across distributed sites

Potentially slower responses if queries require remote data access.

b) Discuss data replication, its types, advantages, and disadvantages.


Explore distributed database strategies and factors influencing the

choice of a distributed strategy.

Data Replication:
Definition: The process of storing copies of data at multiple sites in a distributed database
system.

Types:

1. Full replication: Entire database is copied at each site.

2. Partial replication: Only frequently accessed or critical parts are replicated.

3. Synchronous replication: Updates are applied across all replicas simultaneously.

4. Asynchronous replication: Updates are applied with a delay, tolerating temporary


inconsistencies.

Advantages:
1. Reliability: Sites remain operational even if others fail.

2. Fast response time: Local queries don’t rely on remote communication.

3. Reduced complexity of distributed transactions

4. Node decoupling: Local operation even with limited connectivity.

5. Reduced network traffic during peak hours.

Disadvantages:
1. High storage requirements

2. Complex data integrity maintenance

3. Update complexity across multiple copies.

Distributed Database Strategies:


1. Centralized

2. Replicated with periodic snapshots

3. Replicated with near-real-time synchronization

4. Horizontally/vertically partitioned

5. Non-integrated independent databases.
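As a sketch of the horizontally partitioned strategy, each site could hold a region-specific fragment of the same schema; the Region column is assumed, and CREATE TABLE ... AS SELECT is not supported identically by every DBMS:

-- Fragment stored at the eastern site
CREATE TABLE Customer_East_T AS
SELECT * FROM Customer_T WHERE Region = 'EAST';

-- Fragment stored at the western site
CREATE TABLE Customer_West_T AS
SELECT * FROM Customer_T WHERE Region = 'WEST';

-- A view can reassemble the fragments for global queries
CREATE VIEW Customer_All_V AS
SELECT * FROM Customer_East_T
UNION ALL
SELECT * FROM Customer_West_T;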

Factors Influencing Strategy Choice:
Organizational needs and autonomy

Clustering of data access

Scalability and expansion potential

Node capabilities and technology constraints

Need for reliable service and critical uptime.

c) Introduce object-oriented data modeling. Compare it with EER


data modeling and discuss key concepts such as classes, objects,
associations, and inheritance.

Object-Oriented Data Modeling (OODM):


Models both data and behavior of real-world entities.

Central elements: Classes, Objects, Encapsulation, Inheritance, and


Polymorphism.

Captured using UML (Unified Modeling Language) diagrams.

Key Concepts:
Class: Represents an abstract definition of an entity (e.g., Car, Employee).

Object: A specific instance of a class, holding attribute values and behaviors.

Association: Defines relationships between classes (like foreign keys in relational


models).

Inheritance: A subclass inherits attributes and methods from its superclass (e.g., Car
and Truck from Vehicle).

Polymorphism: A method behaves differently based on the object class calling it.

Comparison with EER Model:


Feature              EER Model                   Object-Oriented Model
Focus                Data only                   Data + Behavior
Reuse                Limited                     High (via inheritance)
Relationships        Captured via ER notation    Supports associations, aggregation, and composition
Behavior Modeling    Not supported               Supported via methods/operations
Representation       Static diagrams             Dynamic with class & object diagrams

The object-oriented model is more expressive and powerful for modeling complex
systems involving both data and operations.
