RAC Presentation Oracle10gR2
Kishore A
What is all the hype about grid computing? Grid computing is intended to allow businesses to move away from the idea of many individual servers, each dedicated to a small number of applications. When configured this way, applications often either fail to fully utilize a server's available hardware resources such as memory, CPU, and disk, or fall short of those resources during peak usage. Grid computing addresses these problems by providing an adaptive software infrastructure that makes efficient use of low-cost servers and modular storage, balancing workloads more effectively and providing capacity on demand by scaling out with small servers in small increments.
Oracle10g - RAC
WHAT IS ENTERPRISE GRID COMPUTING? Implement one from many. Grid computing coordinates the use of clusters of machines to create a single logical entity, such as a database or an application server. By distributing work across many servers, grid computing delivers availability, scalability, and performance using low-cost components. Because a single logical entity is implemented across many machines, companies can add or remove capacity in small increments, online. With the ability to add capacity on demand to a particular function, companies gain more flexibility for adapting to peak loads, thus improving overall hardware utilization.
Oracle Database 10g Oracle Database 10g builds on the success of Oracle9i Database and adds many new grid-specific capabilities. Oracle Database 10g is based on Real Application Clusters, introduced in Oracle9i. There are more than 500 production customers running Oracle's clustering technology, helping to prove the validity of Oracle's grid infrastructure.
Real Application Clusters Oracle Real Application Clusters (RAC) enables a single database to run across multiple clustered nodes in a grid, pooling the processing resources of several standard machines. In Oracle 10g, the database can immediately begin balancing workload across a new node as it is reprovisioned from one database to another, and can relinquish a machine when it is no longer needed: this is capacity on demand. Other databases cannot grow and shrink while running and, therefore, cannot utilize hardware as efficiently. Servers can be added to and dropped from an Oracle cluster with no downtime.
[Architecture diagram: Node 1 through Node n, each running its own operating system and an Oracle instance (Instance 1 through Instance n). Each instance's SGA contains the Global Resource Directory, dictionary cache, library cache, log buffer, and buffer cache, and each instance runs the background processes LCK0, LMS0, LGWR, SMON, DBW0, and PMON. All nodes share the redo/archive logs of all instances, the database and control files, and the OCR and voting disks.]
Global Resource Directory A RAC database system has two important services: the Global Cache Service (GCS) and the Global Enqueue Service (GES). These are essentially collections of background processes. Together they manage the entire Cache Fusion process, resource transfers, and resource escalations among the instances.
GES and GCS jointly maintain a Global Resource Directory (GRD) to record information about the resources and the enqueues. The GRD is held in memory and is distributed across all the instances; each instance manages a portion of the directory. This distributed nature is a key point for the fault tolerance of RAC. The GRD is the internal database that records and stores the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance's cache, the GRD is updated. The following resource information is available in the GRD:
* Data Block Identifiers (DBA)
* Location of the most current version
* Modes of the data blocks: (N)ull, (S)hared, (X)exclusive
* Roles of the data blocks (local or global) held by each instance
* Buffer caches on multiple nodes in the cluster
The background processes registered in an instance can be listed with: select name, description from v$bgprocess where paddr <> '00'. The ones specific to a RAC instance are the DIAG, LCK, LMON, LMDn, and LMSn processes.
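As a minimal sketch (assuming sqlplus is on the PATH, OS authentication as SYSDBA is available, and ORACLE_SID points at one of the RAC instances used later in this guide), the query above can be run from the shell like this:

# List background processes registered in this instance, including the
# RAC-specific ones (DIAG, LCK0, LMON, LMD0, LMSn).
export ORACLE_SID=orcl1       # instance name used later in this guide

sqlplus -s / as sysdba <<'SQL'
set pagesize 100 linesize 120
column name format a8
column description format a70
select name, description
  from v$bgprocess
 where paddr <> '00'
 order by name;
SQL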
The diagnosability daemon (DIAG) is responsible for capturing information on process failures in a RAC environment and writing out trace information for failure analysis. The information produced by DIAG is most useful when working with Oracle Support to troubleshoot the cause of a failure. Only a single DIAG process is needed for each instance.
The lock process (LCK) manages requests that are not Cache Fusion requests, such as row cache requests and library cache requests. Only a single LCK process is allowed for each instance. LCK maintains a list of lock elements and uses this list to validate locks during instance recovery.
The global enqueue service daemon (LMD) is a lock agent process that coordinates enqueue manager service requests. The requests are for global cache service enqueues that control access to global enqueues and resources. The LMD process also handles deadlock detection and remote enqueue requests.
LMON is the global enqueue service monitor. It is responsible for the reconfiguration of lock resources when an instance joins or leaves the cluster, and for dynamic lock remastering. LMON generates a trace file whenever a reconfiguration occurs (as opposed to a remastering of a subset of locks). It is the responsibility of LMON to check for the death of instances clusterwide and to initiate reconfiguration as quickly as possible.
The LMS process (or global cache service process) is in charge of shipping blocks between instances for Cache Fusion requests. In the event of a consistent-read request, the LMS process first rolls the block back, creating the consistent read (CR) image of the block, and then ships that version of the block across the interconnect to the foreground process making the request on the remote instance. In addition, LMS must interact with the LMD process to retrieve lock requests placed by LMD. An instance may dynamically start additional LMS processes as the workload requires.
To manage the RAC database and its instances, Oracle provides a utility called the Server Control Utility (SRVCTL). It replaces the earlier opsctl utility used with Oracle Parallel Server. The Server Control Utility is a single point of control between the Oracle Intelligent Agent and each node in the RAC system. SRVCTL communicates with the Global Services Daemon (GSD), which resides on each of the nodes. SRVCTL gathers information from the database and instances and acts as an intermediary between nodes and the Oracle Intelligent Agent. When you use SRVCTL to perform configuration operations on your cluster, SRVCTL stores the configuration data in the Server Management (SRVM) configuration repository. SRVM includes all the components of Enterprise Manager such as the Intelligent Agent, the Server Control Utility (SRVCTL), and the Global Services Daemon. Thus, SRVCTL is one of the SRVM instance management tools.
For SRVCTL to function, the Global Services Daemon (GSD) must be running on the node. SRVCTL performs two main types of administrative tasks: cluster database tasks and cluster database configuration tasks.
SRVCTL cluster database tasks include:
- Starting and stopping cluster databases.
- Starting and stopping cluster database instances.
- Starting and stopping listeners associated with a cluster database instance.
- Obtaining the status of a cluster database instance.
- Obtaining the status of listeners associated with a cluster database.
SRVCTL cluster database configuration tasks include:
- Adding and deleting cluster database configuration information.
- Adding an instance to, or deleting an instance from, a cluster database.
- Renaming an instance within a cluster database configuration.
- Moving instances in a cluster database configuration.
- Setting and unsetting the environment variables for an instance in a cluster database configuration.
- Setting and unsetting the environment variables for an entire cluster in a cluster database configuration.
A few representative commands are sketched below.
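A minimal sketch of typical SRVCTL calls, assuming the cluster database created later in this guide (database orcl, instances orcl1/orcl2 on nodes linux1/linux2); adjust names for your environment:

# SRVCTL examples (run as the oracle user)
srvctl status database -d orcl              # status of all instances and services
srvctl status instance -d orcl -i orcl2     # status of one instance

srvctl stop instance -d orcl -i orcl2       # stop a single instance
srvctl start instance -d orcl -i orcl2      # start it again

srvctl stop database -d orcl                # stop every instance of the database
srvctl start database -d orcl

srvctl config database -d orcl              # show the stored configuration
srvctl status nodeapps -n linux1            # VIP, GSD, ONS, and listener on a node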
RAW Partitions, Cluster File System and Automatic Storage Management (ASM)
Raw partitions are a set of unformatted devices on a shared disk subsystem. A raw partition is a disk device that does not have a file system set up on it; it is a portion of the physical disk that is accessed at the lowest possible level. The application that uses a raw device is responsible for managing its own I/O to the device, with no operating system buffering. Traditionally, raw partitions were required for Oracle Parallel Server (OPS), and they provided high performance by bypassing file system overhead. Raw partitions were used when setting up databases for performance gains and for concurrent access by multiple nodes in the cluster without system-level buffering.
RAW Partitions, Cluster File System and Automatic Storage Management (ASM) Oracle 9i RAC and 10g now support both cluster file systems and raw devices to store the shared data. In addition, 10g RAC supports shared storage resources from an ASM instance: you can create data files out of the disk resources located in the ASM instance. The ASM resources are sharable and are accessed by all the instances in the cluster.
RAW Devices
Raw devices have been in use for a very long time. They were the primary storage structures for the data files of Oracle Parallel Server, and they remain in use in the RAC versions 9i and 10g. Raw devices are difficult to manage and administer, but they provide high-performing shared storage structures. When you use raw devices for data files, redo log files, and control files, you may have to use local file systems or some sort of network-attached file system for writing the archive log files, handling utl_file_dir files, and supporting external tables.
On Raw Devices: data files, redo log files, control files, voting disk.
On Local File System: archive log files, Oracle Home files, CRS Home files, alert log and trace files.
RAW Devices
Advantages Raw partitions have several advantages:
- They are not subject to any operating system locking.
- The operating system buffer cache is bypassed, giving performance gains and reduced memory consumption.
- They can easily be shared by multiple systems.
- The application or database system has full control over how access is handled internally.
- Historically, support for asynchronous I/O on UNIX systems was generally limited to raw partitions.
RAW Devices
Issues and Difficulties There are many administrative inconveniences and drawbacks:
- The unit of allocation to the database is the entire raw partition; a raw partition cannot be used for multiple tablespaces. A raw partition is not the same as a file system on which many files can be created.
- Administrators have to create raw partitions with specific sizes. When databases grow, raw partitions cannot be extended; extra partitions must be added to support a growing tablespace.
- There may be limitations on the total number of raw partitions that can be used on the system.
- There are no database operations that occur on an individual data file, so there is no logical benefit to having a tablespace consist of many data files, except for tablespaces larger than the maximum size Oracle can support in a single file.
- Standard file manipulation commands cannot be used on raw partitions and, thus, on the data files stored in them.
RAW Devices
Raw partitions cannot be used for writing the archive logs. Administrators need to keep track of the raw volumes with their cryptic naming conventions. However, by using the symbolic links, we can reduce the hassles associated with names. For example, a cryptic name like /dev/rdsk/c8t4d5s4 or a name like /dev/sd/sd001 is an administrative challenge. To alleviate this, administrators often rely on symbolic links to provide logical names that make sense. This, however, substitutes one complexity for another. In a clustered environment like Linux clusters, it is not guaranteed that the physical devices will have the same device names on different nodes or across reboots of a single node. To solve this problem, manual intervention is needed, which will increase administration overhead.
CFS offers a very good shared storage facility for building the RAC database. A cluster file system provides a shared file system that is mounted on all the cluster nodes simultaneously. When you implement the RAC database with commercial CFS products such as Veritas CFS or PolyServe Matrix Server, you will be able to store all kinds of database files, including a shared Oracle Home and CRS Home. However, the capabilities of CFS products are not all the same. For example, Oracle Cluster File System (OCFS), used in Linux RAC implementations, has limitations: it is not a general-purpose file system and cannot be used for a shared Oracle Home.
ASM is the new star on the block. ASM provides a vertical integration of the file system and volume manager for Oracle database files. It can spread database files across all available storage for optimal performance and resource utilization, enables simple and non-intrusive resource allocation, and provides automatic rebalancing. When you use ASM for shared files, you get almost the same performance as with raw partitions. The ASM-controlled disk devices are part of an ASM instance, which can be shared by the RAC database instances, similar to the way raw devices supporting the RAC database had to be shared by multiple nodes. The shared devices are presented to multiple nodes in the cluster and serve as input to the ASM instance. There is an ASM instance supporting each RAC instance on its respective node.
ASM is intended for Oracle-specific files: data files, redo log files, and archived log files.
Shared Disk Storage Oracle RAC relies on a shared disk architecture. The database files, online redo logs, and control files for the database must be accessible to each node in the cluster. The shared disks also store the Oracle Cluster Registry and Voting Disk. There are a variety of ways to configure shared storage, including direct-attached disks (typically SCSI over copper or fiber), Storage Area Networks (SAN), and Network Attached Storage (NAS). Private Network Each cluster node is connected to all other nodes via a private high-speed network, also known as the cluster interconnect or high-speed interconnect (HSI). This network is used by Oracle's Cache Fusion technology to effectively combine the physical memory (RAM) of each host into a single cache. Oracle Cache Fusion allows data stored in the cache of one Oracle instance to be accessed by any other instance by transferring it across the private network.
The private network is typically built with Gigabit Ethernet, but for high-volume environments, many vendors offer proprietary low-latency, high-bandwidth solutions specifically designed for Oracle RAC. Linux also offers a means of bonding multiple physical NICs into a single virtual NIC to provide increased bandwidth and availability. Public Network To maintain high availability, each cluster node is assigned a virtual IP address (VIP). In the event of host failure, the failed node's IP address can be reassigned to a surviving node to allow applications to continue accessing the database through the same IP address.
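A minimal sketch of such NIC bonding on RHEL 4 for the interconnect follows; the interface names, IP address, and active-backup mode are illustrative assumptions, not part of the original setup:

# Hypothetical active-backup bond for the private interconnect (run as root).
cat >> /etc/modprobe.conf <<'EOF'
alias bond0 bonding
options bond0 miimon=100 mode=1
EOF
# mode=1 is active-backup; miimon=100 checks link state every 100 ms.

cat > /etc/sysconfig/network-scripts/ifcfg-bond0 <<'EOF'
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.2.100
NETMASK=255.255.255.0
EOF

for nic in eth1 eth2; do
  cat > /etc/sysconfig/network-scripts/ifcfg-$nic <<EOF
DEVICE=$nic
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
EOF
done

service network restart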
It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen. The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately. This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.
The Oracle CRS contains all the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, CRS sends messages (via a special ping operation) to all nodes configured in the cluster, often called the "heartbeat." If the heartbeat fails for any of the nodes, CRS checks the CRS configuration files (on the shared disk) to distinguish between a real node failure and a network failure. CRS maintains two files: the Oracle Cluster Registry (OCR) and the Voting Disk. The OCR and the Voting Disk must reside on shared disks.
The Voting Disk is used by the Oracle cluster manager in various layers. The Cluster Manager and Node Monitor accept registration of Oracle instances to the cluster and send ping messages to the Cluster Managers (Node Monitors) on the other RAC nodes. If this heartbeat fails, oracm uses a quorum file or a quorum partition on the shared disk to distinguish between a node failure and a network failure. So if a node stops sending ping messages but continues writing to the quorum file or partition, the other Cluster Managers can recognize it as a network failure. Hence the availability of the Voting Disk is critical for the operation of the Oracle Cluster Manager. The shared volumes created for the OCR and the voting disk should be configured using RAID to protect against media failure. This requires the use of an external cluster volume manager, cluster file system, or storage hardware that provides RAID protection. The Oracle Cluster Registry (OCR) is used to store the cluster configuration information, among other things. The OCR needs to be accessible from all nodes in the cluster.
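A minimal sketch of checking the OCR and Voting Disk from the command line once Clusterware is installed (assuming the Clusterware bin directory is on the PATH):

# Check Clusterware metadata files (run as root or the oracle user).
crsctl query css votedisk     # lists the configured voting disk(s)
ocrcheck                      # reports the OCR location, size, and integrity
crsctl check crs              # verifies that the CSS, CRS, and EVM daemons are healthy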
Cache Fusion One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first before the requesting node could read it. In RAC, data is passed along with locks. Every time an instance wants to update a block, it has to obtain a lock on it to make sure no other instance in the cluster is updating the same block; Oracle uses a data block ping mechanism to obtain the status of a specific block before reading it from disk. Cache Fusion resolves data block read/read, read/write, and write/write conflicts among Oracle database nodes through high-performance interconnect networks, bypassing the much slower physical disk operations used in previous releases. Using the Oracle 9i RAC Cache Fusion feature, close to linear scalability of database performance can be achieved when adding nodes to the cluster, enabling better database capacity utilization.
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Contents
- Introduction
- Oracle RAC 10g Overview
- Shared-Storage Overview
- FireWire Technology
- Hardware & Costs
- Install the Linux Operating System
- Network Configuration
- Obtain & Install FireWire Modules
- Create "oracle" User and Directories
- Create Partitions on the Shared FireWire Storage Device
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Configure the Linux Servers for Oracle
- Configure the hangcheck-timer Kernel Module
- Configure RAC Nodes for Remote Access
- All Startup Commands for Each RAC Node
- Check RPM Packages for Oracle 10g Release 2
- Install & Configure Oracle Cluster File System (OCFS2)
- Install & Configure Automatic Storage Management (ASMLib 2.0)
- Download Oracle 10g RAC Software
- Install Oracle 10g Clusterware Software
- Install Oracle 10g Database Software
- Install Oracle 10g Companion CD Software
- Create TNS Listener Process
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Create the Oracle Cluster Database
- Verify TNS Networking Files
- Create / Alter Tablespaces
- Verify the RAC Cluster & Database Configuration
- Starting / Stopping the Cluster
- Transparent Application Failover (TAF)
- Conclusion
- Acknowledgements
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Download
- Red Hat Enterprise Linux 4
- Oracle Cluster File System Release 2 (1.2.3-1) - Single Processor / SMP / Hugemem
- Oracle Cluster File System Release 2 Tools (1.2.1-1) - Tools / Console
- Oracle Database 10g Release 2 EE, Clusterware, Companion CD (10.2.0.1.0)
- Precompiled RHEL4 FireWire Modules (2.6.9-22.EL)
- ASMLib 2.0 Driver (2.6.9-22.EL / 2.0.3-1) - Single Processor / SMP / Hugemem
- ASMLib 2.0 Library and Tools (2.0.3-1) - Driver Support Files / Userspace Library
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Introduction One of the most efficient ways to become familiar with Oracle Real Application Clusters (RAC) 10g technology is to have access to an actual Oracle RAC 10g cluster. There's no better way to understand its benefits (including fault tolerance, security, load balancing, and scalability) than to experience them directly. The Oracle Clusterware software will be installed to /u01/app/oracle/product/crs on each of the nodes that make up the RAC cluster. However, the Clusterware software requires that two of its files, the Oracle Cluster Registry (OCR) file and the Voting Disk file, be shared by all nodes in the cluster. These two files will be installed on shared storage using OCFS2. It is possible (but not recommended by Oracle) to use RAW devices for these files; however, it is not possible to use ASM for these two Clusterware files. The Oracle Database 10g Release 2 software will be installed into a separate Oracle Home, namely /u01/app/oracle/product/10.2.0/db_1, on each of the nodes that make up the RAC cluster. All the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to different partitions of the shared drive being managed by ASM. (The Oracle database files could just as easily be stored on OCFS2; using ASM, however, makes the article that much more interesting!)
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. It provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out, and at the same time, because all nodes access the same database, the failure of one instance will not cause the loss of access to the database. At the heart of Oracle RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files, and parameter files for all nodes in the cluster. The data disks must be globally available to allow all nodes to access the database. Each node has its own redo log and control files, but the other nodes must be able to access them in order to recover that node in the event of a system failure. One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first before the requesting node could read it. In RAC, data is passed along with locks.
3. Shared-Storage Overview
Fibre Channel is one of the most popular solutions for shared storage. As mentioned previously, Fibre Channel is a high-speed serial-transfer interface used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by Fibre Channel include SCSI and IP. Fibre Channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fibre Channel, however, is very expensive: the switch alone can cost as much as US$1,000, and high-end drives can reach prices of US$300. Overall, a typical Fibre Channel setup (including cards for the servers) costs roughly US$5,000. A less expensive alternative to Fibre Channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget at around US$1,000 to US$2,000 for a two-node cluster. Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage, but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and appropriate read/write block sizes.
4. FireWire Technology
Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length), and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras, and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for a theoretical bit rate of 1,600 Mbps and then up to a staggering 3,200 Mbps; that's 3.2 gigabits per second. This speed will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.
Disk Interface                                  Speed
Serial                                          115 kb/s (.115 Mb/s)
Parallel (standard)                             115 KB/s (.115 MB/s)
USB 1.1                                         12 Mb/s (1.5 MB/s)
Parallel (ECP/EPP)                              3.0 MB/s
IDE                                             3.3 - 16.7 MB/s
ATA                                             3.3 - 66.6 MB/s
SCSI-1                                          5 MB/s
SCSI-2 (Fast SCSI / Fast Narrow SCSI)           10 MB/s
Fast Wide SCSI (Wide SCSI)                      20 MB/s
Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow)    20 MB/s
Ultra IDE                                       33 MB/s
Wide Ultra SCSI (Fast Wide 20)                  40 MB/s
Ultra2 SCSI                                     40 MB/s
IEEE1394(b)                                     100 - 400 Mb/s (12.5 - 50 MB/s)
USB 2.x                                         480 Mb/s (60 MB/s)
Wide Ultra2 SCSI                                80 MB/s
Ultra3 SCSI                                     80 MB/s
Wide Ultra3 SCSI                                160 MB/s
FC-AL Fibre Channel                             100 - 400 MB/s
1. Oracle Cluster Registry (OCR) File - /u02/oradata/orcl/OCRFile (OCFS2)
2. CRS Voting Disk - /u02/oradata/orcl/CSSFile (OCFS2)
3. Oracle Database files - ASM
5. Software Requirements At the software level, each node in a RAC cluster needs:
1. An operating system
2. Oracle Clusterware software
3. Oracle RAC software, and optionally
4. An Oracle Automatic Storage Management (ASM) instance
ASM is a new feature in Oracle Database 10g that provides the services of a filesystem, logical volume manager, and software RAID in a platform-independent manner. Oracle ASM can stripe and mirror your disks, allow disks to be added or removed while the database is under load, and automatically balance I/O to remove "hot spots." It also supports direct and asynchronous I/O and implements the Oracle Data Manager API (simplified I/O system call interface) introduced in Oracle9i. Oracle ASM is not a general-purpose filesystem and can be used only for Oracle data files, redo logs, control files, and the RMAN Flash Recovery Area. Files in ASM can be created and named automatically by the database (by use of the Oracle Managed Files feature) or manually by the DBA. Because the files stored in ASM are not accessible to the operating system, the only way to perform backup and recovery operations on databases that use ASM files is through Recovery Manager (RMAN). ASM is implemented as a separate Oracle instance that must be up if other databases are to be able to access it. Memory requirements for ASM are light: only 64MB for most systems. In Oracle RAC environments, an ASM instance must be running on each cluster node.
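A minimal sketch of inspecting an ASM instance once it is running; the instance name +ASM1 and disk group name ORCL_DATA1 are the ones used later in this guide, and the sqlplus call assumes OS authentication as SYSDBA:

# Query the ASM instance for its disk groups and member disks.
export ORACLE_SID=+ASM1          # ASM instance on this node

sqlplus -s / as sysdba <<'SQL'
set linesize 120 pagesize 100
column name  format a15
column path  format a20
column state format a10

-- Disk groups known to this ASM instance
select name, state, type, total_mb, free_mb from v$asm_diskgroup;

-- Disks behind the ORCL_DATA1 disk group
select d.path, d.total_mb
  from v$asm_disk d, v$asm_diskgroup g
 where g.group_number = d.group_number
   and g.name = 'ORCL_DATA1';
SQL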
This article was designed to work with the Red Hat Enterprise Linux 4 (AS/ES) operating environment. You will need three IP addresses for each server: one for the private network, one for the public network, and one for the virtual IP address. Use the operating system's network configuration tools to assign the private and public network addresses. Do not assign the virtual IP address using the operating system's network configuration tools; this will be done by the Oracle Virtual IP Configuration Assistant (VIPCA) during the Oracle RAC software installation.
Linux1
eth0:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0
eth1:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0
Linux2
eth0:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0
eth1:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0
Server 1 (linux1)
Device   IP Address      Subnet          Purpose
eth0     192.168.1.100   255.255.255.0   Public network
eth1     192.168.2.100   255.255.255.0   Private interconnect
/etc/hosts
127.0.0.1       localhost loopback
# Public Network - (eth0)
192.168.1.100   linux1
192.168.1.101   linux2
# Private Interconnect - (eth1)
192.168.2.100   linux1-priv
192.168.2.101   linux2-priv
# Public Virtual IP (VIP) addresses - (eth0)
192.168.1.200   linux1-vip
192.168.1.201   linux2-vip
Server 2 (linux2)
Device   IP Address      Subnet          Purpose
eth0     192.168.1.101   255.255.255.0   Public network
eth1     192.168.2.101   255.255.255.0   Private interconnect
/etc/hosts
127.0.0.1       localhost loopback
# Public Network - (eth0)
192.168.1.100   linux1
192.168.1.101   linux2
# Private Interconnect - (eth1)
192.168.2.100   linux1-priv
192.168.2.101   linux2-priv
# Public Virtual IP (VIP) addresses - (eth0)
192.168.1.200   linux1-vip
192.168.1.201   linux2-vip
Note that the virtual IP addresses only need to be defined in the /etc/hosts file on both nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. This is the host name/IP address that will be configured in the clients' tnsnames.ora file (more details later).
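A minimal sketch of activating and verifying the node applications on one node (node and VIP values are the ones used in this guide; the grep check is just illustrative):

# Start the node applications (VIP, GSD, ONS, listener) on linux1 and
# confirm the virtual IP is now plumbed onto the public interface.
srvctl start nodeapps -n linux1
srvctl status nodeapps -n linux1

# The VIP should appear as a secondary address (e.g. eth0:1) on the public NIC.
/sbin/ifconfig | grep -B1 "192.168.1.200"   # 192.168.1.200 is linux1-vip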
Adjusting Network Settings Oracle now uses UDP as the default protocol on Linux for inter-process communication, such as Cache Fusion buffer transfers between the instances. It is strongly suggested to adjust the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB. The receive buffers are used by TCP and UDP to hold received data until it is read by the application. The receive buffer cannot overflow because the peer is not allowed to send data beyond the buffer size window; datagrams that don't fit in the socket receive buffer are discarded, which could cause the sender to overwhelm the receiver.
To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is read during the boot process:
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
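A minimal sketch of applying these settings on the running system without waiting for a reboot (standard sysctl usage, run as root on each node):

# Apply the kernel parameters immediately.
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
sysctl -w net.core.rmem_max=262144
sysctl -w net.core.wmem_max=262144

# Or simply re-read /etc/sysctl.conf after editing it:
sysctl -p

# Verify the current values:
sysctl net.core.rmem_default net.core.wmem_default net.core.rmem_max net.core.wmem_max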
http://oss.oracle.com/projects/firewire/dist/files/RedHat/RHEL4/i386/oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm
Install the supporting FireWire modules, as root, by running either of the following:
# rpm -ivh oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm      (single processor)
  - OR -
# rpm -ivh oracle-firewire-modules-2.6.9-22.ELsmp-1286-1.i686.rpm   (multiple processors)
Add module options: add the following lines to /etc/modprobe.conf:
Connect the FireWire drive to each machine and boot into the new kernel: after both machines are powered down, connect each of them to the back of the FireWire drive, power on the FireWire drive, and finally power on each Linux server, making sure to boot each machine into the new kernel.
Check for the FireWire controller (e.g. with lspci):
01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
Second, check that the modules are loaded:
# lsmod | egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod     13744  0
sbp2       19724  0
scsi_mod  106664  3  [sg sd_mod sbp2]
ohci1394   28008  0  (unused)
ieee1394   62884  0  [sbp2 ohci1394]
Perform the following procedure on all nodes in the cluster! I will be using the Oracle Cluster File System (OCFS) to store the files that must be shared for Oracle Cluster Ready Services (CRS). When using OCFS, the UID of the UNIX user oracle and the GID of the UNIX group dba must be identical on all machines in the cluster. If either the UID or the GID differs, the files on the OCFS file system will show up as "unowned" or may even be owned by a different user. For this article, I will use 175 for the oracle UID and 115 for the dba GID.
Create Group and User for Oracle Let's continue our example by creating the Unix dba group and oracle user account along with the appropriate directories:
# mkdir -p /u01/app
# groupadd -g 115 dba
# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" -p oracle oracle
# chown -R oracle:dba /u01
# passwd oracle
# su oracle
Note: When setting the Oracle environment variables for each RAC node, be sure to assign each RAC node a unique Oracle SID. For this example, I used:
linux1: ORACLE_SID=orcl1
linux2: ORACLE_SID=orcl2
Now, let's create the mount point for the Oracle Cluster File System (OCFS) that will be used to store the files for Oracle Cluster Ready Services (CRS). These commands need to be run as the root user account:
$ su
# mkdir -p /u02/oradata/orcl
# chown -R oracle:dba /u02
Oracle Cluster File System (OCFS) version 2 OCFS version 1 is a great alternative to raw devices. Not only is it easier to administer and maintain, it overcomes the limit of 255 raw devices. However, it is not a general-purpose cluster filesystem. It may only be used to store the following types of files: Oracle data files, online redo logs, archived redo logs, control files, spfiles, and the CRS shared files (Oracle Cluster Registry and CRS voting disk).
Create the following partitions on only one node in the cluster! The next step is to create the required partitions on the FireWire (shared) drive. As I mentioned previously, we will use OCFS to store the two files to be shared for CRS. We will then use ASM for all physical database files (data/index files, online redo log files, control files, SPFILE, and archived redo log files). The following table lists the individual partitions that will be created on the FireWire (shared) drive and what files will be contained on them.
Most of the configuration procedures in this section should be performed on all nodes in the cluster! Creating the OCFS2 filesystem, however, should be executed on only one node in the cluster. It is now time to install OCFS2. OCFS2 is a cluster filesystem that allows all nodes in a cluster to concurrently access a device via the standard filesystem interface. This allows for easy management of applications that need to run across a cluster. OCFS Release 1 was released in 2002 to enable Oracle RAC users to run the clustered database without having to deal with RAW devices. The filesystem was designed to store database related files, such as data files, control files, redo logs, archive logs, etc. OCFS Release 2 (OCFS2), in contrast, has been designed as a general-purpose cluster filesystem. With it, one can store not only database related files on a shared disk, but also store Oracle binaries and configuration files (shared Oracle Home) making management of RAC even easier.
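A minimal sketch of formatting and mounting the OCFS2 volume used for the shared Clusterware files; the device name /dev/sda1 and the block/cluster sizes are illustrative assumptions, while the label and mount point match the ones used in this guide:

# Format the shared partition with OCFS2 (run on ONE node only).
# -b block size, -C cluster size, -N max node slots, -L volume label.
mkfs.ocfs2 -b 4K -C 32K -N 4 -L oradatafiles /dev/sda1

# Mount it on EVERY node; datavolume,nointr are required when Oracle
# data/CRS files live on the volume.
mount -t ocfs2 -o datavolume,nointr -L oradatafiles /u02/oradata/orcl

# Optionally make the mount persistent across reboots:
echo "/dev/sda1  /u02/oradata/orcl  ocfs2  _netdev,datavolume,nointr  0 0" >> /etc/fstab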
Installing OCFS We will be installing the OCFS2 packages onto two single-processor machines.
17. Install and Configure Automatic Storage Management and Disks Downloading the ASMLib Packages Installing ASMLib Packages
Edit the file /etc/sysconfig/rawdevices as follows:
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw2 /dev/sda2
/dev/raw/raw3 /dev/sda3
/dev/raw/raw4 /dev/sda4
The raw device bindings will be created on each reboot. You would then want to change ownership of all raw devices to the "oracle" user account:
# chown oracle:dba /dev/raw/raw2; chmod 660 /dev/raw/raw2
# chown oracle:dba /dev/raw/raw3; chmod 660 /dev/raw/raw3
# chown oracle:dba /dev/raw/raw4; chmod 660 /dev/raw/raw4
The last step is to reboot the server to bind the devices, or simply restart the rawdevices service:
# service rawdevices restart
17. Install and Configure Automatic Storage Management and Disks Creating ASM Disks for Oracle
Install ASMLib 2.0 Packages This installation needs to be performed on all nodes as the root user account:
$ su
# rpm -Uvh oracleasm-2.6.9-22.EL-2.0.3-1.i686.rpm \
      oracleasmlib-2.0.2-1.i386.rpm \
      oracleasm-support-2.0.3-1.i386.rpm
Preparing...               ########################################### [100%]
  1:oracleasm-support      ########################################### [ 33%]
  2:oracleasm-2.6.9-22.EL  ########################################### [ 67%]
  3:oracleasmlib           ########################################### [100%]
$ su
# /etc/init.d/oracleasm createdisk VOL1 /dev/sda2
Marking disk "/dev/sda2" as an ASM disk                    [  OK  ]
# /etc/init.d/oracleasm createdisk VOL2 /dev/sda3
Marking disk "/dev/sda3" as an ASM disk                    [  OK  ]
# /etc/init.d/oracleasm createdisk VOL3 /dev/sda4
Marking disk "/dev/sda4" as an ASM disk                    [  OK  ]
If you do receive a failure, verify what was created by listing all ASM disks:
# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
17. Install and Configure Automatic Storage Management and Disks On all other nodes in the cluster, you must perform a scandisks to recognize the new volumes:
# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks                              [  OK  ]
We can now test that the ASM disks were successfully created by using the following command on all nodes as the root user account:
# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
18. Download Oracle RAC 10g Release 2 Software The following download procedures only need to be performed on one node in the cluster! The next logical step is to install Oracle Clusterware Release 2 (10.2.0.1.0), Oracle Database 10g Release 2 (10.2.0.1.0), and finally the Oracle Database 10g Companion CD Release 2 (10.2.0.1.0) for Linux x86. However, you must first download and extract the required Oracle software packages from OTN. You will be downloading and extracting the required software from Oracle to only one of the Linux nodes in the cluster, namely linux1, and you will perform all installs from this machine. The Oracle installer will copy the required software packages to all other nodes in the RAC configuration set up in Section 13. Log in to one of the nodes in the Linux RAC cluster as the oracle user account. In this example, you will be downloading the required Oracle software to linux1 and saving it to /u01/app/oracle/orainstall.
Perform the following installation procedures on only one node in the cluster! The Oracle Clusterware software will be installed to all other nodes in the cluster by the Oracle Universal Installer. You are now ready to install the "cluster" part of the environment: the Oracle Clusterware. In the previous section, you downloaded and extracted the install files for Oracle Clusterware to linux1 in the directory /u01/app/oracle/orainstall/clusterware. This is the only node from which you need to perform the install. During the installation of Oracle Clusterware, you will be asked for the nodes to configure in the RAC cluster. Once the actual installation starts, it will copy the required software to all nodes using the remote access configured in Section 13 ("Configure RAC Nodes for Remote Access"). So, what exactly is Oracle Clusterware responsible for? It contains all of the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, Oracle Clusterware will send messages (via a special ping operation) to all nodes configured in the cluster, often called the "heartbeat." If the heartbeat fails for any of the nodes, it checks the Oracle Clusterware configuration files (on the shared disk) to distinguish between a real node failure and a network failure. After installing Oracle Clusterware, the Oracle Universal Installer (OUI) used to install the Oracle 10g database software (next section) will automatically recognize these nodes. Like the Oracle Clusterware install performed in this section, the Oracle Database 10g software needs to be run from only one node.
20. Install Oracle Database 10g Release 2 Software Perform the following installation procedures on only one node in the cluster! The Oracle database software will be installed to all other nodes in the cluster by the Oracle Universal Installer.
After successfully installing the Oracle Clusterware software, the next step is to install Oracle Database 10g Release 2 (10.2.0.1.0) with RAC.
Installing Oracle Database 10g Software Install the Oracle Database 10g software with the following:
$ cd ~oracle
$ /u01/app/oracle/orainstall/db/Disk1/runInstaller -ignoreSysPrereqs
21. Create the TNS Listener Process The Oracle TNS listener process should now be running on all nodes in the RAC cluster:
$ hostname
linux1
$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX1
=====================
$ hostname
linux2
$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX2
22. Create the Oracle Cluster Database The database creation process should only be performed from one node in the cluster! We will use the DBCA to create the clustered database.
Creating the Clustered Database To start the database creation process, run the following:
# xhost +
access control disabled, clients can connect from any host
# su - oracle
$ dbca &
24. Creating/Altering Tablespaces
SQL> create tablespace indx datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize unlimited
  3  extent management local autoallocate
  4  segment space management auto;

SQL> alter database datafile '+ORCL_DATA1/orcl/datafile/system.259.1' resize 800m;

SQL> alter database datafile '+ORCL_DATA1/orcl/datafile/sysaux.261.1' resize 500m;

SQL> alter tablespace undotbs1 add datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter tablespace undotbs2 add datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter database tempfile
25. Verify the RAC Cluster/Database Configuration The following RAC verification checks should be performed on all nodes in the cluster! For this guide, we will perform these checks only from linux1.
Status of all instances and services
$ srvctl status database -d orcl
Instance orcl1 is running on node linux1
Instance orcl2 is running on node linux2
Status of a single instance
$ srvctl status instance -d orcl -i orcl2
Instance orcl2 is running on node linux2
Status of a named service globally across the database
$ srvctl status service -d orcl -s orcltest
Service orcltest is running on instance(s) orcl2, orcl1
25. Verify the RAC Cluster/Database Configuration
Display all services for the specified cluster database
$ srvctl config service -d orcl
orcltest PREF: orcl2 orcl1 AVAIL:
Display the configuration for node applications - (VIP, GSD, ONS, Listener)
$ srvctl config nodeapps -n linux1 -a -g -s -l
VIP exists.: /linux1-vip/192.168.1.200/255.255.255.0/eth0:eth1
GSD exists.
ONS daemon exists.
Listener exists.
Display the configuration for the ASM instance(s)
$ srvctl config asm -n linux1
+ASM1 /u01/app/oracle/product/10.1.0/db_1
25. Verify the RAC Cluster/Database Configuration
All running instances in the cluster
SELECT inst_id
     , instance_number inst_no
     , instance_name inst_name
     , parallel
     , status
     , database_status db_status
     , active_state state
     , host_name host
  FROM gv$instance
 ORDER BY inst_id;

INST_ID  INST_NO  INST_NAME  PAR  STATUS  DB_STATUS  STATE   HOST
-------  -------  ---------  ---  ------  ---------  ------  ------
      1        1  orcl1      YES  OPEN    ACTIVE     NORMAL  linux1
      2        2  orcl2      YES  OPEN    ACTIVE     NORMAL  linux2
25. Verify the RAC Cluster/Database Configuration
All data files which are in the disk group
select name from v$datafile
union
select member from v$logfile
union
select name from v$controlfile
union
select name from v$tempfile;

All ASM disks that belong to the 'ORCL_DATA1' disk group
SELECT path
  FROM v$asm_disk
 WHERE group_number IN (select group_number
                          from v$asm_diskgroup
                         where name = 'ORCL_DATA1');

PATH
----------------------------------
ORCL:VOL1
ORCL:VOL2
ORCL:VOL3
27. Managing Transparent Application Failover SQL Query to Check the Session's Failover Information The following SQL query can be used to check a session's failover type, failover method, and if a failover has occurred. We will be using this query throughout this example.
COLUMN instance_name   FORMAT a13
COLUMN host_name       FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over     FORMAT a11

SELECT instance_name
     , host_name
     , NULL AS failover_type
     , NULL AS failover_method
     , NULL AS failed_over
  FROM v$instance
UNION
SELECT NULL
     , NULL
     , failover_type
     , failover_method
     , failed_over
  FROM v$session
 WHERE username = 'SYSTEM';
Additional Information
Tnsnames.ora example.
A typical tnsnames.ora file configured to use TAF would look similar to:
ORCLTEST =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcltest.idevelopment.info)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )
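A minimal sketch of exercising TAF with this entry; the SYSTEM connection and the choice of which instance to abort are illustrative assumptions:

# Session 1 (client): connect through the TAF-enabled service and run the
# failover query from the earlier slide to note the current instance.
#   sqlplus system@orcltest
#
# Session 2 (oracle user on a RAC node): abort the instance that session 1
# is using to simulate a failure, then restart it afterwards.
srvctl status database -d orcl
srvctl stop instance -d orcl -i orcl1 -o abort   # simulate failure of orcl1
# Re-running the failover query in the still-open client session should now
# show FAILED_OVER = YES and the surviving instance/host.
srvctl start instance -d orcl -i orcl1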
Contact Information
Kishore A [email protected]