UNIT 5-Distributed Data Bases Part-1
The object-oriented paradigm combines both data and behavior into a single entity called an
object.
This is different from traditional relational databases, where data (in tables) and behavior (in
queries and applications) are separate.
Key features of the object-oriented paradigm include:
Encapsulation:
In object-oriented systems, data (attributes) and behavior (methods or functions) are bundled
together into an object. This allows for controlled access to the data, usually through
well-defined methods (e.g., getters and setters).
The principle of encapsulation ensures that the object's internal state is hidden and only
accessible through the object's methods.
Inheritance:
•Inheritance allows new objects (or classes) to be based on existing ones, inheriting their
attributes and methods. This promotes code reuse and abstraction.
•A subclass can extend or override the behavior of a superclass, allowing for a hierarchical
structure of objects.
Polymorphism:
Polymorphism allows objects of different types to be treated as objects of a common
supertype. For instance, the same method might perform different tasks depending on the type of
object it is called on. This allows more flexibility in programming and object interactions.
Abstraction:
Abstraction simplifies complex systems by focusing on the essential characteristics of an
object, hiding the unnecessary details.
In databases, this helps by allowing users to interact with objects without worrying about the
underlying implementation.
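The four features above can be illustrated with a minimal Python sketch. The class names Person, Student and GraduateStudent, the advisor attribute and the sample values are assumptions chosen for illustration; they are not part of any particular database system.

```python
from abc import ABC, abstractmethod


class Person(ABC):
    """Abstraction: callers use describe() without knowing how it is built."""

    def __init__(self, name: str) -> None:
        self._name = name  # encapsulated state, reached only through methods

    @property
    def name(self) -> str:
        return self._name

    @abstractmethod
    def describe(self) -> str:
        """Return a one-line description; each subclass supplies its own."""


class Student(Person):
    def __init__(self, name: str, student_id: int) -> None:
        super().__init__(name)  # inheritance: reuse Person's initialisation
        self._student_id = student_id
        self._courses = []

    def enroll(self, course: str) -> None:
        # Controlled access: the course list changes only through this method.
        if course not in self._courses:
            self._courses.append(course)

    @property
    def courses(self) -> tuple:
        return tuple(self._courses)  # read-only copy of the internal list

    def describe(self) -> str:
        return f"Student {self.name}, {len(self._courses)} course(s)"


class GraduateStudent(Student):
    def __init__(self, name: str, student_id: int, advisor: str) -> None:
        super().__init__(name, student_id)
        self._advisor = advisor

    def describe(self) -> str:  # override: same call, different behavior
        return f"Graduate student {self.name}, advised by {self._advisor}"


asha = Student("Asha", 1)
asha.enroll("Databases")
ravi = GraduateStudent("Ravi", 2, "Dr. Rao")

# Polymorphism: both objects are handled through the common Person interface.
descriptions = [person.describe() for person in (asha, ravi)]
print(descriptions)
```

Person cannot be instantiated directly because describe() is abstract, which is one way to make callers depend only on the essential interface.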
The Object Model
The object model defines how objects are structured and interact in a database system. It
specifies how the data is organized and how operations can be performed on it.
Object Identity: Every object in an object-oriented database has a unique identity, even if
two objects have the same values for their attributes. This identity ensures that each object
can be distinctly referenced and accessed, regardless of its attribute values.
Attributes: Objects in the database have attributes that describe their state or properties. For
example, a "Student" object may have attributes like name, ID, age, and grades.
Methods: Objects have methods (or functions) that define behaviors or operations that can be
performed on the object. For example, a "Student" object might have a method like
enroll(course) to register a student in a course.
Classes: A class is a blueprint or template for creating objects. It defines the structure
(attributes) and behavior (methods) that objects of that class will have.
When an object is created, it is an instance of a class. Classes can be arranged in a hierarchical
structure, allowing inheritance. For example, Graduate Student and Undergraduate Student
could both be subclasses of the base Student class.
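As a rough illustration of the object model, the sketch below gives every Student a system-assigned object identifier (OID) that is independent of its attribute values. The counter-based OID generator, the same_state helper and the thesis_title attribute are assumptions made for this example; a real object database assigns and manages identity internally.

```python
import itertools
from dataclasses import dataclass, field

_oid_counter = itertools.count(1)  # hypothetical stand-in for a system OID generator


@dataclass(eq=False)  # eq=False: equality falls back to identity, not attribute values
class Student:
    name: str
    age: int
    grades: list = field(default_factory=list)
    oid: int = field(default_factory=lambda: next(_oid_counter), init=False)

    def enroll(self, course: str) -> str:
        return f"{self.name} enrolled in {course}"

    def same_state(self, other: "Student") -> bool:
        """Compare attribute values only, ignoring identity."""
        return (self.name, self.age, self.grades) == (other.name, other.age, other.grades)


@dataclass(eq=False)
class GraduateStudent(Student):  # subclass of the base Student class
    thesis_title: str = ""


first = Student("Asha", 20)
second = Student("Asha", 20)
grad = GraduateStudent("Ravi", 24, thesis_title="Query optimisation")

print(first.same_state(second))  # the two objects hold equal attribute values
print(first == second)           # yet they remain two distinct objects
print(first.oid, second.oid, grad.oid)
```

The two Student objects have the same state but different identities, which is the distinction the Object Identity item describes.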
The object model described above also applies to distributed database systems, but
additional challenges arise because data is spread across multiple sites or nodes. The key
concerns in distributed object modeling include:
Location Transparency: The system should handle object location and access transparently. It must manage the
distribution of objects across sites while providing a unified view of the data to the user.
Replication: Storing copies of an object at multiple sites keeps the object
available even if a site goes down. However, maintaining consistency across replicas can be
complex.
Synchronization: Distributed systems need mechanisms to ensure that updates to objects are synchronized across
sites, so that the data stays consistent even under concurrent access.
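One way to picture the unified view of data is a directory that maps each object identifier to the site holding it, so that callers never name a site. The sketch below is a simplified, in-memory illustration; the Site and ObjectDirectory classes and the site names are hypothetical.

```python
class Site:
    """A hypothetical storage site that keeps objects keyed by object id."""

    def __init__(self, name):
        self.name = name
        self.objects = {}


class ObjectDirectory:
    """Records which site holds each object so that callers never have to."""

    def __init__(self, sites):
        self._sites = {site.name: site for site in sites}
        self._location = {}  # object id -> site name

    def store(self, oid, obj, site_name):
        self._sites[site_name].objects[oid] = obj
        self._location[oid] = site_name

    def fetch(self, oid):
        # The caller passes only the object id, never a site name.
        site = self._sites[self._location[oid]]
        return site.objects[oid]

    def move(self, oid, new_site_name):
        # Relocating an object does not change how callers fetch it.
        obj = self._sites[self._location[oid]].objects.pop(oid)
        self.store(oid, obj, new_site_name)


directory = ObjectDirectory([Site("site_a"), Site("site_b")])
directory.store(101, {"name": "Asha"}, "site_a")
before = directory.fetch(101)
directory.move(101, "site_b")
after = directory.fetch(101)
print(before == after)
```

A real system would also have to distribute or replicate the directory itself; that is left out here to keep the idea visible.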
Object Distribution Design
Object distribution design focuses on how to store and manage objects (complex data types) in a
distributed database system, especially when data is spread across multiple sites. It deals with the
following key aspects:
Minimizing Communication Costs: Data should be distributed in a way that minimizes the need
for communication between distant sites when querying or modifying objects.
Efficient Access: Objects should be placed in locations that allow efficient access based on typical
usage patterns. This involves considering both spatial and temporal locality of access.
Load Balancing: The distribution should ensure that no node becomes a bottleneck, which would
degrade performance. Each site should ideally handle a balanced amount of data and queries.
Data Integrity and Consistency: Ensuring that distributed objects maintain consistency when
accessed by multiple users or applications is critical. It includes handling replication and
concurrency control across distributed nodes.
Object Fragmentation
One of the first steps in object distribution is deciding how objects are fragmented.
Fragmentation involves dividing an object into smaller parts that can be stored on different
nodes. This can be done in two main ways:
•Horizontal Fragmentation: The instances of a class are divided into groups based on specific
attribute values, and each group can be stored on a different node.
• Example: "Student" objects could be fragmented by department (e.g., one
fragment for Computer Science students, another for Electrical Engineering students).
•Vertical Fragmentation: The object is divided by attribute, with each
fragment containing a subset of the object's attributes.
• Example: A "Student" object's name and ID could be kept in one fragment and its grades in another.
The decision to use horizontal or vertical fragmentation depends on factors such as the
query access patterns and data characteristics. In practice, both techniques are often
combined.
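Both kinds of fragmentation can be sketched on a small list of Student records. The sample data and the function names below are assumptions for illustration. The vertical fragments each keep the id attribute so that the original objects can be reassembled.

```python
students = [
    {"id": 1, "name": "Asha", "department": "CS", "grades": [9.1, 8.4]},
    {"id": 2, "name": "Ravi", "department": "EE", "grades": [7.8]},
    {"id": 3, "name": "Meena", "department": "CS", "grades": [8.9]},
]


def horizontal_fragments(objects, key):
    """Group whole objects by the value of one attribute."""
    fragments = {}
    for obj in objects:
        fragments.setdefault(obj[key], []).append(obj)
    return fragments


def vertical_fragments(objects, attribute_groups, id_attr="id"):
    """Split each object by attribute; every fragment also carries the id."""
    return [
        [{id_attr: obj[id_attr], **{a: obj[a] for a in group}} for obj in objects]
        for group in attribute_groups
    ]


def reassemble(fragments, id_attr="id"):
    """Join vertical fragments back together on the shared id."""
    merged = {}
    for fragment in fragments:
        for part in fragment:
            merged.setdefault(part[id_attr], {}).update(part)
    return list(merged.values())


by_department = horizontal_fragments(students, "department")
vertical = vertical_fragments(students, [["name", "department"], ["grades"]])

print(sorted(by_department))             # one fragment per department
print(reassemble(vertical) == students)  # vertical fragments lose no data
```

Combining the two, as the notes mention, would mean applying vertical_fragments to each horizontal fragment (or the reverse).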
Object Replication
Replication involves storing multiple copies of an object or its fragments across different nodes
to improve fault tolerance and accessibility.
•Advantages of Replication:
• Improved Availability: If one node fails, replicas ensure that the object is still available
elsewhere.
• Performance Improvement: Queries can be answered from the closest replica, reducing
access time and network load.
•Challenges:
• Consistency: The replicated copies must remain consistent, which can be difficult when
multiple nodes modify the object concurrently.
• Synchronization: Updates to an object or its fragments need to be synchronized across
replicas.
In a distributed system, replication can be full replication (storing identical copies on all
nodes) or partial replication (replicating only selected objects or fragments, each on a subset of the nodes).
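The trade-off between availability and consistency can be seen in a small sketch of one simple scheme, read-one/write-all (ROWA): a read is served by any available replica, while a write must reach every replica. ROWA is chosen here only as an illustration and is not the only replication protocol; the Replica and ReplicatedStore classes are hypothetical.

```python
class Replica:
    """One copy of the data at a given site."""

    def __init__(self, site):
        self.site = site
        self.up = True
        self.data = {}


class ReplicatedStore:
    """Read-one/write-all over a fixed set of replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, oid, value):
        # Write-all: refuse the update unless every replica can apply it,
        # so that no replica is left holding a stale copy.
        if not all(replica.up for replica in self.replicas):
            raise RuntimeError("write rejected: a replica is unavailable")
        for replica in self.replicas:
            replica.data[oid] = value

    def read(self, oid):
        # Read-one: any available replica can answer.
        for replica in self.replicas:
            if replica.up:
                return replica.data[oid]
        raise RuntimeError("read failed: no replica is available")


site_a, site_b = Replica("site_a"), Replica("site_b")
store = ReplicatedStore([site_a, site_b])
store.write("p1", {"price": 100})

site_a.up = False               # simulate a node failure
value = store.read("p1")        # still served, by site_b

try:
    store.write("p1", {"price": 120})
    write_accepted = True
except RuntimeError:
    write_accepted = False
print(value, write_accepted)
```

Reads survive the failure, but writes are blocked until the failed replica returns, which shows why keeping replicas consistent is the harder half of replication.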
Object Distribution Strategies
The design of a distributed object system involves choosing an appropriate distribution strategy.
There are different strategies based on the type of object access and the objectives of the system.
•Placement-Based Distribution:
• Objects are placed on sites based on the expected access patterns (e.g., placing frequently
accessed objects closer to the users).
• Heuristic techniques can be applied to predict where objects will be most frequently
accessed.
•Data-Centric Distribution:
• Objects are distributed across nodes based on the attributes of the objects, and the system
places objects where they are likely to be used together.
•Access-Centric Distribution:
• In this approach, the distribution is based on the usage patterns (queries). If certain objects
are frequently accessed together, they are placed close to each other.
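As a rough sketch of how access patterns can drive these strategies, the code below derives two things from a log: the site that accesses each object most often (a simple placement heuristic) and how often pairs of objects appear in the same query (the input an access-centric strategy would need). The log formats, site names and function names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def place_by_access(access_log):
    """access_log: iterable of (site, object id) pairs.

    Returns {object id: site that accessed it most often}.
    """
    counts = defaultdict(Counter)
    for site, oid in access_log:
        counts[oid][site] += 1
    return {oid: sites.most_common(1)[0][0] for oid, sites in counts.items()}


def co_access_counts(queries):
    """queries: iterable of lists of object ids touched by one query.

    Returns a Counter of how often each pair of objects is used together.
    """
    pairs = Counter()
    for oids in queries:
        for a, b in combinations(sorted(set(oids)), 2):
            pairs[(a, b)] += 1
    return pairs


log = [("delhi", "p1"), ("delhi", "p1"), ("mumbai", "p1"), ("mumbai", "p2")]
placement = place_by_access(log)
pairs = co_access_counts([["p1", "p2"], ["p2", "p1", "p3"]])

print(placement)
print(pairs.most_common(1))
```

Pairs with high counts are candidates for being stored on the same node; how to cluster them and assign clusters to sites is a separate design decision not shown here.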
Design Considerations
When designing the distribution of objects across nodes, several factors need to be considered:
•Access Locality: Objects that are accessed together should be placed on the same node or close to
each other to minimize communication costs.
•Object Size: Large objects may need to be fragmented to improve access speed, while smaller
objects can be replicated.
•Network Topology: The physical network architecture influences how objects are distributed. It
should minimize communication delays between nodes.
•Query Processing: The distribution should support efficient query processing. This may involve
determining how objects are accessed by various queries and adjusting distribution accordingly.
•Fault Tolerance: The system must be designed to handle node failures gracefully. This requires
replication and mechanisms to handle recovery.
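Several of these factors can be compared with a simple cost estimate. The sketch below counts only the bytes that must be shipped from remote sites for a given placement and query workload; the cost model, the function name and the sample numbers are assumptions for illustration, and a real design would also weigh latency, load and failure behavior.

```python
def communication_cost(placement, queries, sizes):
    """Estimate the bytes shipped between sites.

    placement: {object id: site that stores it}
    queries:   list of (issuing site, [object ids], frequency)
    sizes:     {object id: size in bytes}
    """
    total = 0
    for site, oids, frequency in queries:
        remote = [oid for oid in oids if placement[oid] != site]
        total += frequency * sum(sizes[oid] for oid in remote)
    return total


sizes = {"p1": 100, "p2": 50}
queries = [
    ("site_a", ["p1", "p2"], 10),  # frequent query that uses both objects
    ("site_b", ["p2"], 2),         # occasional query on p2 alone
]

split = communication_cost({"p1": "site_a", "p2": "site_b"}, queries, sizes)
together = communication_cost({"p1": "site_a", "p2": "site_a"}, queries, sizes)
print(split, together)
```

Under this workload, keeping both objects at site_a is cheaper because the frequent query becomes fully local, which is the access-locality argument in numbers.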
Example of Object Distribution
Consider a Product object that users in different regions search by category.
•Fragmentation: Product objects can be fragmented by category (e.g., electronics,
clothing) to reduce the amount of data transferred when a user searches for a specific
category.
•Replication: The Product object might be replicated across different nodes to ensure
availability, with replicas maintained in geographically distributed data centers to improve
access time for users from different regions.
Object Distribution and Query Processing
Efficient query processing in distributed object systems is essential for minimizing the overhead
caused by the distribution:
•Migration of Objects: Objects may be migrated between nodes to optimize query performance.
If a query often accesses an object located on a different node, the system might migrate the
object to the node that is processing the query.
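A migration rule of this kind can be sketched with a simple threshold: once a site has made a given number of remote accesses to an object, the object is moved to that site. The threshold policy, the MigrationPolicy class and the site names are assumptions for illustration; real systems would also weigh the cost of the move itself.

```python
from collections import Counter


class MigrationPolicy:
    """Move an object to a site after repeated remote accesses from it."""

    def __init__(self, placement, threshold=3):
        self.placement = dict(placement)   # object id -> current site
        self.threshold = threshold
        self._remote_accesses = Counter()  # (object id, site) -> count

    def record_access(self, oid, site):
        """Record one access; return True if it triggered a migration."""
        if self.placement[oid] == site:
            return False  # local access, nothing to do
        self._remote_accesses[(oid, site)] += 1
        if self._remote_accesses[(oid, site)] < self.threshold:
            return False
        self.placement[oid] = site
        # Reset the counters for this object once it has moved.
        for key in [k for k in self._remote_accesses if k[0] == oid]:
            del self._remote_accesses[key]
        return True


policy = MigrationPolicy({"p1": "site_a"}, threshold=3)
results = [policy.record_access("p1", "site_b") for _ in range(3)]
print(results, policy.placement["p1"])
```

Resetting the counters after a move keeps two sites with similar access rates from triggering a migration on every access.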