Chapter 1 OO Database
Object-oriented databases aim to integrate data and programs, and they achieve this within a
suitable methodological framework that borrows much from abstract data types.
With an object database, we can define arbitrarily complex data types, just like their
programming-language counterparts. Sets, lists, bags (multisets), and other containers are used, among other
things, to represent the result of a query that returns several objects.
An object typically has two components: state (value) and behavior (operations). It can have a
complex data structure as well as specific operations defined by the programmer. Objects in an
object-oriented programming language (OOPL) exist only during program execution; therefore,
they are called transient objects. An OO
database can extend the existence of objects so that they are stored permanently in a database,
and hence the objects become persistent objects that exist beyond program termination and can
be retrieved later and shared by other programs.
In other words, OO databases store persistent objects permanently in secondary storage, and
allow the sharing of these objects among multiple programs and applications. This requires the
incorporation of other well-known features of database management systems, such as indexing
mechanisms to locate objects efficiently, concurrency control to allow object sharing among
concurrent programs, and recovery from failures. An OO database system will typically interface
with one or more OO programming languages to provide persistent and shared object
capabilities.
Every object stored in an OO database is assigned a unique object identifier (OID). The main
property required of an OID is that it be immutable; that is, the OID value of a
particular object should not change. This preserves the identity of the real-world object being
represented. Hence, an object data management system (ODMS) must have some mechanism for generating OIDs and preserving
the immutability property. It is also desirable that each OID be used only once; that is, even if an
object is removed from the database, its OID should not be assigned to another object. These two
properties imply that the OID should not depend on any attribute values of the object, since the
value of an attribute may be changed or corrected. We can compare this with the relational
model, where each relation must have a primary key attribute whose value identifies each tuple
uniquely. In the relational model, if the value of the primary key is changed, the tuple will have a
new identity, even though it may still represent the same real-world object. Alternatively, a
real-world object may have different names for key attributes in different relations, making it difficult
to ascertain that the keys represent the same real-world object (for example, the object identifier
may be represented as Emp_id in one relation and as Ssn in another).
It is inappropriate to base the OID on the physical address of the object in storage, since the
physical address can change after a physical reorganization of the database. However, some early
ODMSs have used the physical address as the OID to increase the efficiency of object retrieval.
If the physical address of the object changes, an indirect pointer can be placed at the former
address, which gives the new physical location of the object. It is more common to use long
integers as OIDs and then to use some form of hash table to map the OID value to the current
physical address of the object in storage.
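The long-integer-plus-hash-table approach can be sketched in a few lines of Python. This is a minimal illustration, not any particular ODMS's design: it assumes a Python dict stands in for the hash table, treats a physical address as an opaque value such as a (page, slot) pair, and the class ObjectTable and its methods are hypothetical names. It shows the two properties discussed above: OIDs never depend on attribute values, and a deleted object's OID is never reused.

```python
import itertools


class ObjectTable:
    """Hypothetical OID manager: long-integer OIDs mapped to physical addresses."""

    def __init__(self):
        self._counter = itertools.count(1)  # OIDs are issued in increasing order
        self._address_of = {}               # hash table: OID -> current physical address

    def allocate(self, address):
        """Create an entry for a new object and return its OID.

        The OID comes from the counter alone, never from attribute values,
        so later updates to the object's attributes cannot change it.
        """
        oid = next(self._counter)
        self._address_of[oid] = address
        return oid

    def locate(self, oid):
        """Map an OID to the object's current physical address."""
        return self._address_of[oid]  # raises KeyError for a deleted object

    def relocate(self, oid, new_address):
        """Record a move after physical reorganization; the OID is unchanged."""
        if oid not in self._address_of:
            raise KeyError(oid)
        self._address_of[oid] = new_address

    def delete(self, oid):
        """Remove the object. The counter is never rewound, so this OID is
        never assigned to another object."""
        del self._address_of[oid]
```

Because every lookup goes through the table, a reorganization only rewrites one table entry; references held elsewhere in the database keep the same OID.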
Another feature of an ODMS, and of object databases (ODBs) in general, is that objects and literals may have a type
structure of arbitrary complexity in order to contain all of the necessary information that
describes the object or literal. In contrast, in traditional database systems, information about a
complex object is often scattered over many relations or records, leading to loss of direct
correspondence between a real-world object and its database representation. In ODBs, a complex
type may be constructed from other types by nesting of type constructors. The three most basic
constructors are atom, struct (or tuple), and collection.
1. Atom constructor: This includes the basic built-in data types of the object model, which
are similar to the basic types in many programming languages: integers, strings, floating
point numbers, enumerated types, Booleans, and so on. They are called single-valued or
atomic types, since each value of the type is considered an atomic (indivisible) single
value.
2. Struct (or tuple) constructor: This can create standard structured types, such as the
tuples (record types) in the basic relational model. A structured type is made up of several
components, and is sometimes referred to as a compound or composite type. More
accurately, the struct constructor is not considered a type, but rather a type generator,
because many different structured types can be created. For example, two different
structured types that can be created are:
struct Name<FirstName: string, MiddleInitial: char, LastName: string>, and
struct CollegeDegree<Major: string, Degree: string, Year: date>
To create complex nested type structures in the object model, the collection type constructors are
needed. Notice that the type constructors atom and struct are the only ones available in the
original (basic) relational model.
3. Collection (or multivalued) type constructors: These include the set(T), list(T), bag(T),
array(T), and dictionary(K,T) type constructors. These allow part of an object or literal
value to include a collection of other objects or values when needed. These constructors
are also considered to be type generators because many different types can be created.
For example, set(string), set(integer), and set(Employee) are three different types that can
be created from the set type constructor. All the elements in a particular collection value
must be of the same type; for example, every element of a set(string) value must be a string.
The dictionary constructor creates a collection of key-value pairs (K, V), where the value of
a key K can be used to retrieve the corresponding value V. The main characteristic of a collection
type is that its objects or values will be a collection of objects or values of the same type that may
be unordered (such as a set or a bag) or ordered (such as a list or an array).
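Python's built-in types give a rough analogue of these constructors, which helps make the nesting concrete. In this sketch, atoms map to str and datetime.date; struct maps to a dataclass; set(T), list(T), bag(T), and dictionary(K,T) map to set, list, collections.Counter, and dict. These mappings are assumptions for illustration only: Python has no exact bag or array constructor (a list stands in for both list and array), and its type annotations are not enforced at runtime the way an ODMS enforces element types. The Person type and its phones, skills, and ids components are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


# struct constructor: record types built from atomic components
@dataclass(frozen=True)
class Name:
    first_name: str      # atom: string
    middle_initial: str  # atom: char (a one-character str in Python)
    last_name: str


@dataclass(frozen=True)
class CollegeDegree:
    major: str
    degree: str
    year: date


# Nesting constructors: a struct whose components include structs and collections
@dataclass
class Person:
    name: Name                                                  # struct inside a struct
    degrees: list[CollegeDegree] = field(default_factory=list)  # list(T): ordered
    phones: set[str] = field(default_factory=set)               # set(T): unordered, no duplicates
    skills: Counter = field(default_factory=Counter)            # bag(T): unordered, duplicates counted
    ids: dict[str, str] = field(default_factory=dict)           # dictionary(K, T): key -> value
```

Adding the same phone number twice leaves one element in the set, while adding the same skill twice to the Counter records a count of two, which is exactly the set versus bag distinction.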
An object definition language (ODL) that incorporates the preceding type constructors can be
used to define the object types for a particular database application. The type constructors can be
used to define the data structures for an OO database schema. Figure 1.1 shows how we may
declare EMPLOYEE and DEPARTMENT types. In Figure 1.1, the attributes that refer to other
objects—such as Dept of EMPLOYEE or Projects of DEPARTMENT—are basically OIDs
that serve as references to other objects to represent relationships among the objects. For
example, the attribute Dept of EMPLOYEE is of type DEPARTMENT, and hence is used to
refer to a specific DEPARTMENT object (the DEPARTMENT object where the employee
works). The value of such an attribute would be an OID for a specific DEPARTMENT object. A
binary relationship can be represented in one direction, or it can have an inverse reference.
The latter representation makes it easy to traverse the relationship in both directions. For
example, in Figure 1.1 the attribute Employees of DEPARTMENT has as its value a set of
references (that is, a set of OIDs) to objects of type EMPLOYEE; these are the employees who
work for the DEPARTMENT. The inverse is the reference attribute Dept of EMPLOYEE.
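define type EMPLOYEE
tuple ( Fname: string;
Minit: char;
Lname: string;
Ssn: string;
Birth_date: DATE;
Address: string;
Sex: char;
Salary: float;
Supervisor: EMPLOYEE;
Dept: DEPARTMENT; );
define type DATE
tuple ( Year: integer;
Month: integer;
Day: integer; );
define type DEPARTMENT
tuple ( Dname: string;
Dnumber: integer;
Mgr: tuple ( Manager: EMPLOYEE;
Start_date: DATE; );
Locations: set(string);
Employees: set(EMPLOYEE);
Projects: set(PROJECT); );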
Figure 1.1 Specifying the object types EMPLOYEE, DATE, and DEPARTMENT using type
constructors.
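The Dept/Employees pair described above is a reference attribute together with its inverse. A minimal Python sketch, assuming plain integers serve as OIDs and a dict named store stands in for the database (both hypothetical), shows why an inverse reference must be maintained in both directions whenever the relationship changes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Employee:
    oid: int
    ssn: str
    dept: Optional[int] = None  # reference: the OID of a DEPARTMENT object


@dataclass
class Department:
    oid: int
    dnumber: int
    # Inverse of Employee.dept: the OIDs of the employees who work here.
    employees: Set[int] = field(default_factory=set)


# Hypothetical object store standing in for the database: OID -> object.
store: Dict[int, object] = {}


def assign(emp_oid: int, dept_oid: int) -> None:
    """Update both directions of the Dept/Employees relationship together."""
    emp = store[emp_oid]
    if emp.dept is not None:
        store[emp.dept].employees.discard(emp_oid)  # drop the stale inverse reference
    emp.dept = dept_oid
    store[dept_oid].employees.add(emp_oid)
```

Storing OIDs rather than embedded copies means both sides point at the same stored objects, so the relationship can be traversed from either end.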
Under encapsulation, the external users of an object are made aware only of the interface of its operations, which
defines the name and arguments (parameters) of each operation. The implementation is hidden
from the external users; it includes the definition of any hidden internal data structures of the
object and the implementation of the operations that access these structures. The interface part of
an operation is sometimes called the signature, and the operation implementation is sometimes
called the method.
For database applications, the requirement that all objects be completely encapsulated is too
stringent. One way to relax this requirement is to divide the structure of an object into visible and
hidden attributes (instance variables). Visible attributes can be seen by and are directly
accessible to the database users and programmers via the query language. The hidden attributes
of an object are completely encapsulated and can be accessed only through predefined
operations. Most ODMSs employ high-level query languages for accessing visible attributes.
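The split between an operation's signature and its method, and between visible and hidden attributes, can be approximated in Python with an abstract base class. This is a sketch under stated assumptions: the operations give_raise and annual_pay are hypothetical (they do not appear in the figures), and Python does not truly hide attributes; the leading underscore on _salary is only a convention signalling "use the operations".

```python
from abc import ABC, abstractmethod


class EmployeeOps(ABC):
    """Interface: operation names and signatures only, as seen by external users."""

    @abstractmethod
    def give_raise(self, percent: float) -> None: ...

    @abstractmethod
    def annual_pay(self) -> float: ...


class Employee(EmployeeOps):
    """Implementation: the methods plus visible and hidden attributes."""

    def __init__(self, fname: str, ssn: str, monthly_salary: float):
        self.fname = fname             # visible: directly readable, e.g. by queries
        self.ssn = ssn                 # visible
        self._salary = monthly_salary  # hidden by convention: reach it via operations

    def give_raise(self, percent: float) -> None:
        if percent < 0:
            raise ValueError("a raise cannot be negative")
        self._salary *= 1 + percent / 100

    def annual_pay(self) -> float:
        return self._salary * 12
```

Instantiating EmployeeOps directly raises TypeError, because only the interface exists there; the concrete class supplies the methods.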
The term class is often used to refer to a type definition, along with the definitions of the
operations for that type. Figure 1.2 shows how the type definitions in Figure 1.1 can be extended
with operations to define classes. A number of operations are declared for each class, and the
signature (interface) of each operation is included in the class definition.
define class EMPLOYEE
type tuple ( Fname: string;
Minit: char;
Lname: string;
Ssn: string;
Birth_date: DATE;
Address: string;
Sex: char;
Salary: float;
Supervisor: EMPLOYEE;
Dept: DEPARTMENT; );
operations age: integer;
create_emp: EMPLOYEE;
destroy_emp: boolean;
end EMPLOYEE;
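define class DEPARTMENT
type tuple ( Dname: string;
Dnumber: integer;
Mgr: tuple ( Manager: EMPLOYEE;
Start_date: DATE; );
Locations: set(string);
Employees: set(EMPLOYEE);
Projects: set(PROJECT); );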
operations no_of_emps: integer;
create_dept: DEPARTMENT;
destroy_dept: boolean;
assign_emp(e: EMPLOYEE): boolean;
(* adds an employee to the department *)
remove_emp(e: EMPLOYEE): boolean;
(* removes an employee from the department *)
end DEPARTMENT;
Figure 1.2 Adding operations to the definitions of EMPLOYEE and DEPARTMENT.
A method (implementation) for each operation must be defined elsewhere using a programming
language. Typical operations include the object constructor operation (often called new), which
is used to create a new object, and the destructor operation, which is used to destroy (delete) an
object. A number of object modifier operations can also be declared to modify the states
(values) of various attributes of an object. Additional operations can retrieve information about
the object.
An operation is typically applied to an object by using the dot notation. For example, if d is a
reference to a DEPARTMENT object, we can invoke an operation such as no_of_emps by
writing d.no_of_emps. Similarly, by writing d.destroy_dept, the object referenced by d is
destroyed (deleted). The only exception is the constructor operation, which is not applied to an existing object; instead, it returns a reference to
a new DEPARTMENT object. Hence, it is customary in some OO models to have a default name
for the constructor operation that is the name of the class itself, although this was not used in
Figure 1.2. The dot notation is also used to refer to attributes of an object—for example, by
writing d.Dnumber or d.Mgr_Start_date.
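Python happens to follow the convention mentioned above: the constructor is invoked through the class name itself. The sketch below renders some DEPARTMENT operations from Figure 1.2 as Python methods. Simplifying assumptions: employees are identified by SSN strings instead of EMPLOYEE objects, the boolean results mean "the change was made", and destroy_dept only marks the in-memory object, since no real database is involved. Note that Python requires parentheses, so d.no_of_emps is written d.no_of_emps().

```python
class Department:
    """Some operations of the DEPARTMENT class, held in memory only."""

    def __init__(self, dnumber: int):
        # Constructor: called as Department(5); plays the role of create_dept.
        self.dnumber = dnumber
        self.employees: set[str] = set()  # assumption: employees stored by SSN
        self.destroyed = False

    def no_of_emps(self) -> int:
        return len(self.employees)

    def assign_emp(self, ssn: str) -> bool:
        """Adds an employee to the department; False if already assigned."""
        if ssn in self.employees:
            return False
        self.employees.add(ssn)
        return True

    def remove_emp(self, ssn: str) -> bool:
        """Removes an employee from the department; False if not present."""
        if ssn not in self.employees:
            return False
        self.employees.remove(ssn)
        return True

    def destroy_dept(self) -> bool:
        # A real ODMS would delete the stored object; this sketch only marks it.
        self.employees.clear()
        self.destroyed = True
        return True
```

For example, d = Department(5) followed by d.assign_emp("123456789") and d.no_of_emps() uses dot notation exactly as the text describes, and d.dnumber reads an attribute the same way.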
Naming an object is one way to make it persistent, but it is not practical to give names to all objects in a large database that includes
thousands of objects, so most objects are made persistent by using a second mechanism, called
reachability. The reachability mechanism works by making the object reachable from some
other persistent object. An object B is said to be reachable from an object A if a sequence of
references in the database leads from object A to object B.
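Since reachability is defined as a chain of references, the set of persistent objects is simply everything a graph traversal can reach from the named objects. The sketch below uses a breadth-first search; that choice is ours, and any complete traversal gives the same set. It assumes integer OIDs, a dict from persistent names to OIDs, and a dict from each OID to the OIDs its attributes reference; the function name persistent_oids is hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def persistent_oids(named_roots: Dict[str, int],
                    refs: Dict[int, Iterable[int]]) -> Set[int]:
    """Return the OIDs of all objects reachable from the named persistent objects.

    named_roots maps each persistent name to an OID; refs maps an OID to the
    OIDs its attributes reference. Objects outside the result are transient.
    """
    reached: Set[int] = set()
    frontier = deque(named_roots.values())
    while frontier:
        oid = frontier.popleft()
        if oid in reached:  # reference cycles are visited only once
            continue
        reached.add(oid)
        frontier.extend(refs.get(oid, ()))  # follow each reference one step further
    return reached
```

With refs = {1: [2, 3], 2: [4], 3: [], 4: [2], 5: [6], 6: []} and a single named root {"ALL_DEPARTMENTS": 1}, objects 1 to 4 are persistent, while 5 and 6, which no chain of references reaches, are not.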
If we first create a named persistent object N, whose state is a set (or possibly a bag) of objects
of some class C, we can make objects of C persistent by adding them to the set, thus making
them reachable from N. Hence, N is a named object that defines a persistent collection of
objects of class C. For example, we can define a class DEPARTMENT_SET (see Figure 1.3)
whose objects are of type set(DEPARTMENT). We can create an object of type
DEPARTMENT_SET, and give it a persistent name ALL_DEPARTMENTS, as shown in Figure
1.3. Any DEPARTMENT object that is added to the set of ALL_DEPARTMENTS by using the
add_dept operation becomes persistent by virtue of its being reachable from
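ALL_DEPARTMENTS. Notice the difference between traditional database models and ODBs in
this respect. In a relational database, creating a table such as DEPARTMENT declares both the
type and a persistent set of all its tuples; in an ODB, the class declaration of DEPARTMENT
specifies only the type and operations, and a persistent collection such as ALL_DEPARTMENTS
must be defined separately.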
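The ALL_DEPARTMENTS example can be sketched as a named collection object. In this illustration, which is not Figure 1.3 itself, a dict called persistent_names plays the role of the database's name directory (a hypothetical stand-in), DepartmentSet offers the add_dept operation from the text, and is_persistent checks only one level of reachability: membership in some named DepartmentSet.

```python
class Department:
    def __init__(self, dnumber: int):
        self.dnumber = dnumber


class DepartmentSet:
    """Objects of this class have type set(DEPARTMENT)."""

    def __init__(self):
        self._members: set[Department] = set()

    def add_dept(self, d: Department) -> None:
        self._members.add(d)

    def __contains__(self, d: Department) -> bool:
        return d in self._members


# Hypothetical name directory: persistent name -> named object.
persistent_names: dict[str, object] = {}
persistent_names["ALL_DEPARTMENTS"] = DepartmentSet()


def is_persistent(d: Department) -> bool:
    """True if d is held by some named DepartmentSet (one level of reachability)."""
    return any(isinstance(obj, DepartmentSet) and d in obj
               for obj in persistent_names.values())
```

Declaring the Department class creates no persistent collection at all; a Department object becomes persistent only once add_dept places it in the named set.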
The concept of subtype is useful when the designer or user must create a new type that is similar
but not identical to an already defined type. The subtype then inherits all the functions of the
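predefined type, which is referred to as the supertype. For example, suppose that a type PERSON
has already been defined with the functions
PERSON: Name, Address, Birth_date, Age, Ssn
and that we want to define two new types EMPLOYEE and STUDENT as follows: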
EMPLOYEE: Name, Address, Birth_date, Age, Ssn, Salary, Hire_date, Seniority
STUDENT: Name, Address, Birth_date, Age, Ssn, Major, Gpa
Since both STUDENT and EMPLOYEE include all the functions defined for PERSON plus
some additional functions of their own, we can declare them to be subtypes of PERSON. Each
will inherit the previously defined functions of PERSON—namely, Name, Address, Birth_date,
Age, and Ssn. For STUDENT, it is only necessary to define the new (local) functions Major and
Gpa, which are not inherited. Presumably, Major can be defined as a stored attribute, whereas
Gpa may be implemented as an operation that calculates the student’s grade point average by
accessing the Grade values that are stored as hidden attributes within each STUDENT object.
For EMPLOYEE, the Salary and Hire_date functions may be stored attributes,
whereas Seniority may be an operation that calculates Seniority from the value of Hire_date.
Therefore, we can declare EMPLOYEE and STUDENT as follows:
EMPLOYEE subtype-of PERSON: Salary, Hire_date, Seniority
STUDENT subtype-of PERSON: Major, Gpa
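The STUDENT and EMPLOYEE subtypes of PERSON described above can be approximated with Python subclasses. This is a sketch, not an ODMS declaration: Age and Seniority are modelled as methods that take today's date as a parameter (an assumption, since the text does not say how the current date is obtained), grades are a hidden list of grade points fed through a hypothetical record_grade operation, and Gpa is their plain average.

```python
from dataclasses import dataclass, field
from datetime import date


def _whole_years(start: date, today: date) -> int:
    """Full years elapsed from start to today."""
    before_anniversary = (today.month, today.day) < (start.month, start.day)
    return today.year - start.year - int(before_anniversary)


@dataclass
class Person:                     # supertype
    name: str
    address: str
    birth_date: date
    ssn: str

    def age(self, today: date) -> int:   # derived function, not stored
        return _whole_years(self.birth_date, today)


@dataclass
class Student(Person):            # STUDENT subtype-of PERSON
    major: str                    # local stored attribute
    _grades: list = field(default_factory=list, repr=False)  # hidden attribute

    def record_grade(self, points: float) -> None:
        self._grades.append(points)

    def gpa(self) -> float:       # local operation over the hidden grades
        return sum(self._grades) / len(self._grades) if self._grades else 0.0


@dataclass
class Employee(Person):           # EMPLOYEE subtype-of PERSON
    salary: float                 # local stored attributes
    hire_date: date

    def seniority(self, today: date) -> int:  # computed from hire_date
        return _whole_years(self.hire_date, today)
```

Both subclasses inherit name, address, birth_date, ssn, and age from Person without redefining them; each declares only its local functions.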
An extent is a named persistent object whose value is a persistent collection of objects of the
same type that are stored permanently in the database. The objects
can be accessed and shared by multiple programs. It is also possible to create a transient
collection, which exists temporarily during the execution of a program but is not kept when the
program terminates. For example, a transient collection may be created in a program to hold the
result of a query that selects some objects from a persistent collection and copies those objects
into the transient collection. The program can then manipulate the objects in the transient
collection, and once the program terminates, the transient collection ceases to exist. In general,
numerous collections—transient or persistent—may contain objects of the same type. The
inheritance model discussed in this section is very simple.