RAC Presentation Oracle10gR2
Kishore A
What is all the hype about grid computing? Grid computing is intended to allow businesses to move away from the idea of many individual servers, each dedicated to a small number of applications. When configured this way, applications often either fail to fully utilize a server's available hardware resources such as memory, CPU, and disk, or fall short of those resources during peak usage. Grid computing addresses these problems by providing an adaptive software infrastructure that makes efficient use of low-cost servers and modular storage, balancing workloads more effectively and providing capacity on demand by scaling out with small servers in small increments.
Oracle10g - RAC
WHAT IS ENTERPRISE GRID COMPUTING? Implement one from many. Grid computing coordinates the use of clusters of machines to create a single logical entity, such as a database or an application server. By distributing work across many servers, grid computing delivers availability, scalability, and performance using low-cost components. Because a single logical entity is implemented across many machines, companies can add or remove capacity in small increments, online. With the ability to add capacity on demand to a particular function, companies gain more flexibility for adapting to peak loads, thus improving overall hardware utilization.
Oracle Database 10g Oracle Database 10g builds on the success of Oracle9i Database and adds many new grid-specific capabilities. Oracle Database 10g is based on Real Application Clusters, introduced in Oracle9i. There are more than 500 production customers running Oracle's clustering technology, helping to prove the validity of Oracle's grid infrastructure.
Real Application Clusters Oracle Real Application Clusters (RAC) enables a single database to run across multiple clustered nodes in a grid, pooling the processing resources of several standard machines. In Oracle 10g, the database can immediately begin balancing workload across a new node as it is reprovisioned from one database to another, and can relinquish a machine when it is no longer needed: this is capacity on demand. Other databases cannot grow and shrink while running and, therefore, cannot utilize hardware as efficiently. Servers can be added to and dropped from an Oracle cluster with no downtime.
[Architecture diagram: Node 1 through Node n, each running its own operating system and an Oracle instance (Instance 1 through Instance n). Each instance's SGA contains the Global Resource Directory, dictionary cache, library cache, log buffer, and buffer cache, and each instance runs the background processes LCK0, LMS0, LGWR, SMON, DBW0, and PMON. All nodes share the redo/archive logs of all instances, the database and control files, and the OCR and voting disks.]
Global Resource Directory A RAC database system has two important services: the Global Cache Service (GCS) and the Global Enqueue Service (GES). These are essentially collections of background processes. Together they manage the entire Cache Fusion process, resource transfers, and resource escalations among the instances.
GES and GCS jointly maintain a Global Resource Directory (GRD) to record information about the resources and the enqueues. The GRD is held in memory and is distributed across all the instances; each instance manages a portion of the directory. This distributed nature is a key point for the fault tolerance of RAC. The GRD is the internal database that records and stores the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance's cache, the GRD is updated. The following resource information is available in the GRD:
* Data Block Identifiers (DBA)
* Location of the most current version
* Modes of the data blocks: (N)ull, (S)hared, (X)exclusive
* Roles of the data blocks (local or global) held by each instance
* Buffer caches on multiple nodes in the cluster
The background processes registered in an instance can be listed with: select name, description from v$bgprocess where paddr <> '00'. The ones specific to a RAC instance are the DIAG, LCK, LMON, LMDn, and LMSn processes.
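As a minimal sketch (assuming sqlplus is on the PATH, OS authentication as SYSDBA is available, and ORACLE_SID points at one of the RAC instances used later in this guide), the query above can be run from the shell like this:

# List background processes registered in this instance, including the
# RAC-specific ones (DIAG, LCK0, LMON, LMD0, LMSn).
export ORACLE_SID=orcl1       # instance name used later in this guide

sqlplus -s / as sysdba <<'SQL'
set pagesize 100 linesize 120
column name format a8
column description format a70
select name, description
  from v$bgprocess
 where paddr <> '00'
 order by name;
SQL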
The diagnosability daemon (DIAG) is responsible for capturing information on process failures in a RAC environment and writing out trace information for failure analysis. The information produced by DIAG is most useful when working with Oracle Support to troubleshoot the cause of a failure. Only a single DIAG process is needed for each instance.
The lock process (LCK) manages requests that are not Cache Fusion requests, such as row cache requests and library cache requests. Only a single LCK process is allowed for each instance. LCK maintains a list of lock elements and uses this list to validate locks during instance recovery.
The global enqueue service daemon (LMD) is a lock agent process that coordinates enqueue manager service requests. The requests are for global cache service enqueues that control access to global enqueues and resources. The LMD process also handles deadlock detection and remote enqueue requests.
LMON is the global enqueue service monitor. It is responsible for the reconfiguration of lock resources when an instance joins or leaves the cluster, and for dynamic lock remastering. LMON generates a trace file whenever a reconfiguration occurs (as opposed to a remastering of a subset of locks). It is the responsibility of LMON to check for the death of instances clusterwide and to initiate reconfiguration as quickly as possible.
The LMS process (or global cache service process) is in charge of shipping blocks between instances for Cache Fusion requests. In the event of a consistent-read request, the LMS process first rolls the block back, creating the consistent read (CR) image of the block, and then ships that version of the block across the interconnect to the foreground process making the request on the remote instance. In addition, LMS must interact with the LMD process to retrieve lock requests placed by LMD. An instance may dynamically start additional LMS processes as the workload requires.
To manage the RAC database and its instances, Oracle provides a utility called the Server Control Utility (SRVCTL). It replaces the earlier opsctl utility used with Oracle Parallel Server. The Server Control Utility is a single point of control between the Oracle Intelligent Agent and each node in the RAC system. SRVCTL communicates with the Global Services Daemon (GSD), which resides on each of the nodes. SRVCTL gathers information from the database and instances and acts as an intermediary between nodes and the Oracle Intelligent Agent. When you use SRVCTL to perform configuration operations on your cluster, SRVCTL stores the configuration data in the Server Management (SRVM) configuration repository. SRVM includes all the components of Enterprise Manager such as the Intelligent Agent, the Server Control Utility (SRVCTL), and the Global Services Daemon. Thus, SRVCTL is one of the SRVM instance management tools.
For SRVCTL to function, the Global Services Daemon (GSD) must be running on the node. SRVCTL performs two main types of administrative tasks: cluster database tasks and cluster database configuration tasks.
SRVCTL cluster database tasks include:
- Starting and stopping cluster databases.
- Starting and stopping cluster database instances.
- Starting and stopping listeners associated with a cluster database instance.
- Obtaining the status of a cluster database instance.
- Obtaining the status of listeners associated with a cluster database.
SRVCTL cluster database configuration tasks include:
- Adding and deleting cluster database configuration information.
- Adding an instance to, or deleting an instance from, a cluster database.
- Renaming an instance within a cluster database configuration.
- Moving instances in a cluster database configuration.
- Setting and unsetting the environment variables for an instance in a cluster database configuration.
- Setting and unsetting the environment variables for an entire cluster in a cluster database configuration.
A few representative commands are sketched below.
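A minimal sketch of typical SRVCTL calls, assuming the cluster database created later in this guide (database orcl, instances orcl1/orcl2 on nodes linux1/linux2); adjust names for your environment:

# SRVCTL examples (run as the oracle user)
srvctl status database -d orcl              # status of all instances and services
srvctl status instance -d orcl -i orcl2     # status of one instance

srvctl stop instance -d orcl -i orcl2       # stop a single instance
srvctl start instance -d orcl -i orcl2      # start it again

srvctl stop database -d orcl                # stop every instance of the database
srvctl start database -d orcl

srvctl config database -d orcl              # show the stored configuration
srvctl status nodeapps -n linux1            # VIP, GSD, ONS, and listener on a node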
RAW Partitions, Cluster File System and Automatic Storage Management (ASM)
Raw partitions are a set of unformatted devices on a shared disk subsystem. A raw partition is a disk device that does not have a file system set up on it; it is a portion of the physical disk that is accessed at the lowest possible level. The application that uses a raw device is responsible for managing its own I/O to the device, with no operating system buffering. Traditionally, raw partitions were required for Oracle Parallel Server (OPS), and they provided high performance by bypassing file system overhead. Raw partitions were used when setting up databases for performance gains and for concurrent access by multiple nodes in the cluster without system-level buffering.
RAW Partitions, Cluster File System and Automatic Storage Management (ASM) Oracle 9i RAC and 10g now support both cluster file systems and raw devices to store the shared data. In addition, 10g RAC supports shared storage resources from an ASM instance: you can create data files out of the disk resources located in the ASM instance. The ASM resources are sharable and are accessed by all the instances in the cluster.
RAW Devices
Raw devices have been in use for a very long time. They were the primary storage structures for the data files of Oracle Parallel Server, and they remain in use in the RAC versions 9i and 10g. Raw devices are difficult to manage and administer, but they provide high-performing shared storage structures. When you use raw devices for data files, redo log files, and control files, you may have to use local file systems or some sort of network-attached file system for writing the archive log files, handling utl_file_dir files, and supporting external tables.
On Raw Devices: data files, redo log files, control files, voting disk.
On Local File System: archive log files, Oracle Home files, CRS Home files, alert log and trace files.
RAW Devices
Advantages Raw partitions have several advantages:
- They are not subject to any operating system locking.
- The operating system buffer cache is bypassed, giving performance gains and reduced memory consumption.
- They can easily be shared by multiple systems.
- The application or database system has full control over how access is handled internally.
- Historically, support for asynchronous I/O on UNIX systems was generally limited to raw partitions.
RAW Devices
Issues and Difficulties There are many administrative inconveniences and drawbacks:
- The unit of allocation to the database is the entire raw partition; a raw partition cannot be used for multiple tablespaces. A raw partition is not the same as a file system on which many files can be created.
- Administrators have to create raw partitions with specific sizes. When databases grow, raw partitions cannot be extended; extra partitions must be added to support a growing tablespace.
- There may be limitations on the total number of raw partitions that can be used on the system.
- There are no database operations that occur on an individual data file, so there is no logical benefit to having a tablespace consist of many data files, except for tablespaces larger than the maximum size Oracle can support in a single file.
- Standard file manipulation commands cannot be used on raw partitions and, thus, on the data files stored in them.
RAW Devices
Raw partitions cannot be used for writing the archive logs. Administrators need to keep track of the raw volumes with their cryptic naming conventions. However, by using the symbolic links, we can reduce the hassles associated with names. For example, a cryptic name like /dev/rdsk/c8t4d5s4 or a name like /dev/sd/sd001 is an administrative challenge. To alleviate this, administrators often rely on symbolic links to provide logical names that make sense. This, however, substitutes one complexity for another. In a clustered environment like Linux clusters, it is not guaranteed that the physical devices will have the same device names on different nodes or across reboots of a single node. To solve this problem, manual intervention is needed, which will increase administration overhead.
CFS offers a very good shared storage facility for building the RAC database. A cluster file system provides a shared file system that is mounted on all the cluster nodes simultaneously. When you implement the RAC database with commercial CFS products such as Veritas CFS or PolyServe Matrix Server, you will be able to store all kinds of database files, including a shared Oracle Home and CRS Home. However, the capabilities of CFS products are not all the same. For example, Oracle Cluster File System (OCFS), used in Linux RAC implementations, has limitations: it is not a general-purpose file system and cannot be used for a shared Oracle Home.
ASM is the new star on the block. ASM provides a vertical integration of the file system and volume manager for Oracle database files. It can spread database files across all available storage for optimal performance and resource utilization, enables simple and non-intrusive resource allocation, and provides automatic rebalancing. When you use ASM for shared files, you get almost the same performance as with raw partitions. The ASM-controlled disk devices are part of an ASM instance, which can be shared by the RAC database instances, similar to the way raw devices supporting the RAC database had to be shared by multiple nodes. The shared devices are presented to multiple nodes in the cluster and serve as input to the ASM instance. There is an ASM instance supporting each RAC instance on its respective node.
ASM is intended for Oracle-specific files: data files, redo log files, and archived log files.
Shared Disk Storage Oracle RAC relies on a shared disk architecture. The database files, online redo logs, and control files for the database must be accessible to each node in the cluster. The shared disks also store the Oracle Cluster Registry and Voting Disk. There are a variety of ways to configure shared storage, including direct-attached disks (typically SCSI over copper or fiber), Storage Area Networks (SAN), and Network Attached Storage (NAS). Private Network Each cluster node is connected to all other nodes via a private high-speed network, also known as the cluster interconnect or high-speed interconnect (HSI). This network is used by Oracle's Cache Fusion technology to effectively combine the physical memory (RAM) of each host into a single cache. Oracle Cache Fusion allows data stored in the cache of one Oracle instance to be accessed by any other instance by transferring it across the private network.
The private network is typically built with Gigabit Ethernet, but for high-volume environments, many vendors offer proprietary low-latency, high-bandwidth solutions specifically designed for Oracle RAC. Linux also offers a means of bonding multiple physical NICs into a single virtual NIC to provide increased bandwidth and availability. Public Network To maintain high availability, each cluster node is assigned a virtual IP address (VIP). In the event of host failure, the failed node's IP address can be reassigned to a surviving node to allow applications to continue accessing the database through the same IP address.
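A minimal sketch of such NIC bonding on RHEL 4 for the interconnect follows; the interface names, IP address, and active-backup mode are illustrative assumptions, not part of the original setup:

# Hypothetical active-backup bond for the private interconnect (run as root).
cat >> /etc/modprobe.conf <<'EOF'
alias bond0 bonding
options bond0 miimon=100 mode=1
EOF
# mode=1 is active-backup; miimon=100 checks link state every 100 ms.

cat > /etc/sysconfig/network-scripts/ifcfg-bond0 <<'EOF'
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.2.100
NETMASK=255.255.255.0
EOF

for nic in eth1 eth2; do
  cat > /etc/sysconfig/network-scripts/ifcfg-$nic <<EOF
DEVICE=$nic
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
EOF
done

service network restart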
It's all about availability of the application. When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen. The new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately. This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.
The Oracle CRS contains all the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, CRS sends messages (via a special ping operation) to all nodes configured in the cluster, often called the "heartbeat." If the heartbeat fails for any of the nodes, CRS checks the CRS configuration files (on the shared disk) to distinguish between a real node failure and a network failure. CRS maintains two files: the Oracle Cluster Registry (OCR) and the Voting Disk. The OCR and the Voting Disk must reside on shared disks.
The Voting Disk is used by the Oracle cluster manager in various layers. The Cluster Manager and Node Monitor accept registration of Oracle instances to the cluster and send ping messages to the Cluster Managers (Node Monitors) on the other RAC nodes. If this heartbeat fails, oracm uses a quorum file or a quorum partition on the shared disk to distinguish between a node failure and a network failure. So if a node stops sending ping messages but continues writing to the quorum file or partition, the other Cluster Managers can recognize it as a network failure. Hence the availability of the Voting Disk is critical for the operation of the Oracle Cluster Manager. The shared volumes created for the OCR and the voting disk should be configured using RAID to protect against media failure. This requires the use of an external cluster volume manager, cluster file system, or storage hardware that provides RAID protection. The Oracle Cluster Registry (OCR) is used to store the cluster configuration information, among other things. The OCR needs to be accessible from all nodes in the cluster.
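A minimal sketch of checking the OCR and Voting Disk from the command line once Clusterware is installed (assuming the Clusterware bin directory is on the PATH):

# Check Clusterware metadata files (run as root or the oracle user).
crsctl query css votedisk     # lists the configured voting disk(s)
ocrcheck                      # reports the OCR location, size, and integrity
crsctl check crs              # verifies that the CSS, CRS, and EVM daemons are healthy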
Cache Fusion One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first before the requesting node could read it. In RAC, data is passed along with locks. Every time an instance wants to update a block, it has to obtain a lock on it to make sure no other instance in the cluster is updating the same block; Oracle uses a data block ping mechanism to obtain the status of a specific block before reading it from disk. Cache Fusion resolves data block read/read, read/write, and write/write conflicts among Oracle database nodes through high-performance interconnect networks, bypassing the much slower physical disk operations used in previous releases. Using the Oracle 9i RAC Cache Fusion feature, close to linear scalability of database performance can be achieved when adding nodes to the cluster, enabling better database capacity utilization.
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Contents
- Introduction
- Oracle RAC 10g Overview
- Shared-Storage Overview
- FireWire Technology
- Hardware & Costs
- Install the Linux Operating System
- Network Configuration
- Obtain & Install FireWire Modules
- Create "oracle" User and Directories
- Create Partitions on the Shared FireWire Storage Device
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Configure the Linux Servers for Oracle
- Configure the hangcheck-timer Kernel Module
- Configure RAC Nodes for Remote Access
- All Startup Commands for Each RAC Node
- Check RPM Packages for Oracle 10g Release 2
- Install & Configure Oracle Cluster File System (OCFS2)
- Install & Configure Automatic Storage Management (ASMLib 2.0)
- Download Oracle 10g RAC Software
- Install Oracle 10g Clusterware Software
- Install Oracle 10g Database Software
- Install Oracle 10g Companion CD Software
- Create TNS Listener Process
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
- Create the Oracle Cluster Database
- Verify TNS Networking Files
- Create / Alter Tablespaces
- Verify the RAC Cluster & Database Configuration
- Starting / Stopping the Cluster
- Transparent Application Failover (TAF)
- Conclusion
- Acknowledgements
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Download
- Red Hat Enterprise Linux 4
- Oracle Cluster File System Release 2 (1.2.3-1) - Single Processor / SMP / Hugemem
- Oracle Cluster File System Release 2 Tools (1.2.1-1) - Tools / Console
- Oracle Database 10g Release 2 EE, Clusterware, Companion CD (10.2.0.1.0)
- Precompiled RHEL4 FireWire Modules (2.6.9-22.EL)
- ASMLib 2.0 Driver (2.6.9-22.EL / 2.0.3-1) - Single Processor / SMP / Hugemem
- ASMLib 2.0 Library and Tools (2.0.3-1) - Driver Support Files / Userspace Library
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Introduction One of the most efficient ways to become familiar with Oracle Real Application Clusters (RAC) 10g technology is to have access to an actual Oracle RAC 10g cluster. There's no better way to understand its benefits (including fault tolerance, security, load balancing, and scalability) than to experience them directly. The Oracle Clusterware software will be installed to /u01/app/oracle/product/crs on each of the nodes that make up the RAC cluster. However, the Clusterware software requires that two of its files, the Oracle Cluster Registry (OCR) file and the Voting Disk file, be shared by all nodes in the cluster. These two files will be installed on shared storage using OCFS2. It is possible (but not recommended by Oracle) to use RAW devices for these files; however, it is not possible to use ASM for these two Clusterware files. The Oracle Database 10g Release 2 software will be installed into a separate Oracle Home, namely /u01/app/oracle/product/10.2.0/db_1, on each of the nodes that make up the RAC cluster. All the Oracle physical database files (data, online redo logs, control files, archived redo logs) will be installed to different partitions of the shared drive being managed by ASM. (The Oracle database files could just as easily be stored on OCFS2; using ASM, however, makes the article that much more interesting!)
Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. It provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out, and at the same time, because all nodes access the same database, the failure of one instance will not cause the loss of access to the database. At the heart of Oracle RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files, and parameter files for all nodes in the cluster. The data disks must be globally available to allow all nodes to access the database. Each node has its own redo log and control files, but the other nodes must be able to access them in order to recover that node in the event of a system failure. One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first before the requesting node could read it. In RAC, data is passed along with locks.
3. Shared-Storage Overview
Fibre Channel is one of the most popular solutions for shared storage. As mentioned previously, Fibre Channel is a high-speed serial-transfer interface used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by Fibre Channel include SCSI and IP. Fibre Channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fibre Channel, however, is very expensive: the switch alone can cost as much as US$1,000, and high-end drives can reach prices of US$300. Overall, a typical Fibre Channel setup (including cards for the servers) costs roughly US$5,000. A less expensive alternative to Fibre Channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget at around US$1,000 to US$2,000 for a two-node cluster. Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage, but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and appropriate read/write block sizes.
4. FireWire Technology
Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length), and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras, and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for a theoretical bit rate of 1,600 Mbps and then up to a staggering 3,200 Mbps; that's 3.2 gigabits per second. This speed will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.
Disk Interface                                  Speed
Serial                                          115 kb/s (.115 Mb/s)
Parallel (standard)                             115 KB/s (.115 MB/s)
USB 1.1                                         12 Mb/s (1.5 MB/s)
Parallel (ECP/EPP)                              3.0 MB/s
IDE                                             3.3 - 16.7 MB/s
ATA                                             3.3 - 66.6 MB/s
SCSI-1                                          5 MB/s
SCSI-2 (Fast SCSI / Fast Narrow SCSI)           10 MB/s
Fast Wide SCSI (Wide SCSI)                      20 MB/s
Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow)    20 MB/s
Ultra IDE                                       33 MB/s
Wide Ultra SCSI (Fast Wide 20)                  40 MB/s
Ultra2 SCSI                                     40 MB/s
IEEE1394(b)                                     100 - 400 Mb/s (12.5 - 50 MB/s)
USB 2.x                                         480 Mb/s (60 MB/s)
Wide Ultra2 SCSI                                80 MB/s
Ultra3 SCSI                                     80 MB/s
Wide Ultra3 SCSI                                160 MB/s
FC-AL Fibre Channel                             100 - 400 MB/s
1. Oracle Cluster Registry (OCR) File - /u02/oradata/orcl/OCRFile (OCFS2)
2. CRS Voting Disk - /u02/oradata/orcl/CSSFile (OCFS2)
3. Oracle Database files - ASM
5. Software Requirements At the software level, each node in a RAC cluster needs:
1. An operating system
2. Oracle Clusterware software
3. Oracle RAC software, and optionally
4. An Oracle Automatic Storage Management (ASM) instance
ASM is a new feature in Oracle Database 10g that provides the services of a filesystem, logical volume manager, and software RAID in a platform-independent manner. Oracle ASM can stripe and mirror your disks, allow disks to be added or removed while the database is under load, and automatically balance I/O to remove "hot spots." It also supports direct and asynchronous I/O and implements the Oracle Data Manager API (simplified I/O system call interface) introduced in Oracle9i. Oracle ASM is not a general-purpose filesystem and can be used only for Oracle data files, redo logs, control files, and the RMAN Flash Recovery Area. Files in ASM can be created and named automatically by the database (by use of the Oracle Managed Files feature) or manually by the DBA. Because the files stored in ASM are not accessible to the operating system, the only way to perform backup and recovery operations on databases that use ASM files is through Recovery Manager (RMAN). ASM is implemented as a separate Oracle instance that must be up if other databases are to be able to access it. Memory requirements for ASM are light: only 64MB for most systems. In Oracle RAC environments, an ASM instance must be running on each cluster node.
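A minimal sketch of inspecting an ASM instance once it is running; the instance name +ASM1 and disk group name ORCL_DATA1 are the ones used later in this guide, and the sqlplus call assumes OS authentication as SYSDBA:

# Query the ASM instance for its disk groups and member disks.
export ORACLE_SID=+ASM1          # ASM instance on this node

sqlplus -s / as sysdba <<'SQL'
set linesize 120 pagesize 100
column name  format a15
column path  format a20
column state format a10

-- Disk groups known to this ASM instance
select name, state, type, total_mb, free_mb from v$asm_diskgroup;

-- Disks behind the ORCL_DATA1 disk group
select d.path, d.total_mb
  from v$asm_disk d, v$asm_diskgroup g
 where g.group_number = d.group_number
   and g.name = 'ORCL_DATA1';
SQL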
This article was designed to work with the Red Hat Enterprise Linux 4 (AS/ES) operating environment. You will need three IP addresses for each server: one for the private network, one for the public network, and one for the virtual IP address. Use the operating system's network configuration tools to assign the private and public network addresses. Do not assign the virtual IP address using the operating system's network configuration tools; this will be done by the Oracle Virtual IP Configuration Assistant (VIPCA) during the Oracle RAC software installation.
Linux1
eth0:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.1.100
- Netmask: 255.255.255.0
eth1:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.2.100
- Netmask: 255.255.255.0
Linux2
eth0:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.1.101
- Netmask: 255.255.255.0
eth1:
- Turn off (uncheck) the option [Configure using DHCP]
- Leave [Activate on boot] checked
- IP Address: 192.168.2.101
- Netmask: 255.255.255.0
Server 1 (linux1)
Device   IP Address      Subnet          Purpose
eth0     192.168.1.100   255.255.255.0   Public network
eth1     192.168.2.100   255.255.255.0   Private interconnect
/etc/hosts
127.0.0.1       localhost loopback
# Public Network - (eth0)
192.168.1.100   linux1
192.168.1.101   linux2
# Private Interconnect - (eth1)
192.168.2.100   linux1-priv
192.168.2.101   linux2-priv
# Public Virtual IP (VIP) addresses - (eth0)
192.168.1.200   linux1-vip
192.168.1.201   linux2-vip
Server 2 (linux2)
Device   IP Address      Subnet          Purpose
eth0     192.168.1.101   255.255.255.0   Public network
eth1     192.168.2.101   255.255.255.0   Private interconnect
/etc/hosts
127.0.0.1       localhost loopback
# Public Network - (eth0)
192.168.1.100   linux1
192.168.1.101   linux2
# Private Interconnect - (eth1)
192.168.2.100   linux1-priv
192.168.2.101   linux2-priv
# Public Virtual IP (VIP) addresses - (eth0)
192.168.1.200   linux1-vip
192.168.1.201   linux2-vip
Note that the virtual IP addresses only need to be defined in the /etc/hosts file on both nodes. The public virtual IP addresses will be configured automatically by Oracle when you run the Oracle Universal Installer, which starts Oracle's Virtual Internet Protocol Configuration Assistant (VIPCA). All virtual IP addresses will be activated when the srvctl start nodeapps -n <node_name> command is run. This is the host name/IP address that will be configured in the clients' tnsnames.ora file (more details later).
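A minimal sketch of activating and verifying the node applications on one node (node and VIP values are the ones used in this guide; the grep check is just illustrative):

# Start the node applications (VIP, GSD, ONS, listener) on linux1 and
# confirm the virtual IP is now plumbed onto the public interface.
srvctl start nodeapps -n linux1
srvctl status nodeapps -n linux1

# The VIP should appear as a secondary address (e.g. eth0:1) on the public NIC.
/sbin/ifconfig | grep -B1 "192.168.1.200"   # 192.168.1.200 is linux1-vip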
Adjusting Network Settings Oracle now uses UDP as the default protocol on Linux for inter-process communication, such as Cache Fusion buffer transfers between the instances. It is strongly suggested to adjust the default and maximum send buffer size (SO_SNDBUF socket option) to 256 KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256 KB. The receive buffers are used by TCP and UDP to hold received data until it is read by the application. The receive buffer cannot overflow because the peer is not allowed to send data beyond the buffer size window; datagrams that don't fit in the socket receive buffer are discarded, which could cause the sender to overwhelm the receiver.
To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is read during the boot process:
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
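A minimal sketch of applying these settings on the running system without waiting for a reboot (standard sysctl usage, run as root on each node):

# Apply the kernel parameters immediately.
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
sysctl -w net.core.rmem_max=262144
sysctl -w net.core.wmem_max=262144

# Or simply re-read /etc/sysctl.conf after editing it:
sysctl -p

# Verify the current values:
sysctl net.core.rmem_default net.core.wmem_default net.core.rmem_max net.core.wmem_max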
http://oss.oracle.com/projects/firewire/dist/files/RedHat/RHEL4/i386/oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm
Install the supporting FireWire modules, as root, by running either of the following:
# rpm -ivh oracle-firewire-modules-2.6.9-22.EL-1286-1.i686.rpm      (single processor)
  - OR -
# rpm -ivh oracle-firewire-modules-2.6.9-22.ELsmp-1286-1.i686.rpm   (multiple processors)
Add module options: add the following lines to /etc/modprobe.conf:
Connect the FireWire drive to each machine and boot into the new kernel: after both machines are powered down, connect each of them to the back of the FireWire drive, power on the FireWire drive, and finally power on each Linux server, making sure to boot each machine into the new kernel.
Check for the FireWire controller (e.g. with lspci):
01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
Second, check that the modules are loaded:
# lsmod | egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod     13744  0
sbp2       19724  0
scsi_mod  106664  3  [sg sd_mod sbp2]
ohci1394   28008  0  (unused)
ieee1394   62884  0  [sbp2 ohci1394]
Perform the following procedure on all nodes in the cluster! I will be using the Oracle Cluster File System (OCFS) to store the files that must be shared for Oracle Cluster Ready Services (CRS). When using OCFS, the UID of the UNIX user oracle and the GID of the UNIX group dba must be identical on all machines in the cluster. If either the UID or the GID differs, the files on the OCFS file system will show up as "unowned" or may even be owned by a different user. For this article, I will use 175 for the oracle UID and 115 for the dba GID.
Create Group and User for Oracle Let's continue our example by creating the Unix dba group and oracle user account along with the appropriate directories:
# mkdir -p /u01/app
# groupadd -g 115 dba
# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" -p oracle oracle
# chown -R oracle:dba /u01
# passwd oracle
# su oracle
Note: When setting the Oracle environment variables for each RAC node, be sure to assign each RAC node a unique Oracle SID. For this example, I used:
linux1: ORACLE_SID=orcl1
linux2: ORACLE_SID=orcl2
Now, let's create the mount point for the Oracle Cluster File System (OCFS) that will be used to store the files for Oracle Cluster Ready Services (CRS). These commands need to be run as the root user account:
$ su
# mkdir -p /u02/oradata/orcl
# chown -R oracle:dba /u02
Oracle Cluster File System (OCFS) version 2 OCFS version 1 is a great alternative to raw devices. Not only is it easier to administer and maintain, it overcomes the limit of 255 raw devices. However, it is not a general-purpose cluster filesystem. It may only be used to store the following types of files: Oracle data files, online redo logs, archived redo logs, control files, spfiles, and the CRS shared files (Oracle Cluster Registry and CRS voting disk).
Create the following partitions on only one node in the cluster! The next step is to create the required partitions on the FireWire (shared) drive. As I mentioned previously, we will use OCFS to store the two files to be shared for CRS. We will then use ASM for all physical database files (data/index files, online redo log files, control files, SPFILE, and archived redo log files). The following table lists the individual partitions that will be created on the FireWire (shared) drive and what files will be contained on them.
Most of the configuration procedures in this section should be performed on all nodes in the cluster! Creating the OCFS2 filesystem, however, should be executed on only one node in the cluster. It is now time to install OCFS2. OCFS2 is a cluster filesystem that allows all nodes in a cluster to concurrently access a device via the standard filesystem interface. This allows for easy management of applications that need to run across a cluster. OCFS Release 1 was released in 2002 to enable Oracle RAC users to run the clustered database without having to deal with RAW devices. The filesystem was designed to store database related files, such as data files, control files, redo logs, archive logs, etc. OCFS Release 2 (OCFS2), in contrast, has been designed as a general-purpose cluster filesystem. With it, one can store not only database related files on a shared disk, but also store Oracle binaries and configuration files (shared Oracle Home) making management of RAC even easier.
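A minimal sketch of formatting and mounting the OCFS2 volume used for the shared Clusterware files; the device name /dev/sda1 and the block/cluster sizes are illustrative assumptions, while the label and mount point match the ones used in this guide:

# Format the shared partition with OCFS2 (run on ONE node only).
# -b block size, -C cluster size, -N max node slots, -L volume label.
mkfs.ocfs2 -b 4K -C 32K -N 4 -L oradatafiles /dev/sda1

# Mount it on EVERY node; datavolume,nointr are required when Oracle
# data/CRS files live on the volume.
mount -t ocfs2 -o datavolume,nointr -L oradatafiles /u02/oradata/orcl

# Optionally make the mount persistent across reboots:
echo "/dev/sda1  /u02/oradata/orcl  ocfs2  _netdev,datavolume,nointr  0 0" >> /etc/fstab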
Installing OCFS We will be installing the OCFS2 packages onto two single-processor machines.
17. Install and Configure Automatic Storage Management and Disks Downloading the ASMLib Packages Installing ASMLib Packages
Edit the file /etc/sysconfig/rawdevices as follows:
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
/dev/raw/raw2 /dev/sda2
/dev/raw/raw3 /dev/sda3
/dev/raw/raw4 /dev/sda4
The raw device bindings will be created on each reboot. You would then want to change ownership of all raw devices to the "oracle" user account:
# chown oracle:dba /dev/raw/raw2; chmod 660 /dev/raw/raw2
# chown oracle:dba /dev/raw/raw3; chmod 660 /dev/raw/raw3
# chown oracle:dba /dev/raw/raw4; chmod 660 /dev/raw/raw4
The last step is to reboot the server to bind the devices, or simply restart the rawdevices service:
# service rawdevices restart
17. Install and Configure Automatic Storage Management and Disks Creating ASM Disks for Oracle
Install ASMLib 2.0 Packages This installation needs to be performed on all nodes as the root user account:
$ su
# rpm -Uvh oracleasm-2.6.9-22.EL-2.0.3-1.i686.rpm \
      oracleasmlib-2.0.2-1.i386.rpm \
      oracleasm-support-2.0.3-1.i386.rpm
Preparing...               ########################################### [100%]
  1:oracleasm-support      ########################################### [ 33%]
  2:oracleasm-2.6.9-22.EL  ########################################### [ 67%]
  3:oracleasmlib           ########################################### [100%]
$ su
# /etc/init.d/oracleasm createdisk VOL1 /dev/sda2
Marking disk "/dev/sda2" as an ASM disk                    [  OK  ]
# /etc/init.d/oracleasm createdisk VOL2 /dev/sda3
Marking disk "/dev/sda3" as an ASM disk                    [  OK  ]
# /etc/init.d/oracleasm createdisk VOL3 /dev/sda4
Marking disk "/dev/sda4" as an ASM disk                    [  OK  ]
If you do receive a failure, verify what was created by listing all ASM disks:
# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
17. Install and Configure Automatic Storage Management and Disks On all other nodes in the cluster, you must perform a scandisks to recognize the new volumes:
# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks                              [  OK  ]
We can now test that the ASM disks were successfully created by using the following command on all nodes as the root user account:
# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
18. Download Oracle RAC 10g Release 2 Software The following download procedures only need to be performed on one node in the cluster! The next logical step is to install Oracle Clusterware Release 2 (10.2.0.1.0), Oracle Database 10g Release 2 (10.2.0.1.0), and finally the Oracle Database 10g Companion CD Release 2 (10.2.0.1.0) for Linux x86. However, you must first download and extract the required Oracle software packages from OTN. You will be downloading and extracting the required software from Oracle to only one of the Linux nodes in the cluster, namely linux1, and you will perform all installs from this machine. The Oracle installer will copy the required software packages to all other nodes in the RAC configuration set up in Section 13. Log in to one of the nodes in the Linux RAC cluster as the oracle user account. In this example, you will be downloading the required Oracle software to linux1 and saving it to /u01/app/oracle/orainstall.
Perform the following installation procedures on only one node in the cluster! The Oracle Clusterware software will be installed to all other nodes in the cluster by the Oracle Universal Installer. You are now ready to install the "cluster" part of the environment: the Oracle Clusterware. In the previous section, you downloaded and extracted the install files for Oracle Clusterware to linux1 in the directory /u01/app/oracle/orainstall/clusterware. This is the only node from which you need to perform the install. During the installation of Oracle Clusterware, you will be asked for the nodes to configure in the RAC cluster. Once the actual installation starts, it will copy the required software to all nodes using the remote access configured in Section 13 ("Configure RAC Nodes for Remote Access"). So, what exactly is Oracle Clusterware responsible for? It contains all of the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, Oracle Clusterware will send messages (via a special ping operation) to all nodes configured in the cluster, often called the "heartbeat." If the heartbeat fails for any of the nodes, it checks the Oracle Clusterware configuration files (on the shared disk) to distinguish between a real node failure and a network failure. After installing Oracle Clusterware, the Oracle Universal Installer (OUI) used to install the Oracle 10g database software (next section) will automatically recognize these nodes. Like the Oracle Clusterware install performed in this section, the Oracle Database 10g software needs to be run from only one node.
20. Install Oracle Database 10g Release 2 Software Perform the following installation procedures on only one node in the cluster! The Oracle database software will be installed to all other nodes in the cluster by the Oracle Universal Installer.
After successfully installing the Oracle Clusterware software, the next step is to install Oracle Database 10g Release 2 (10.2.0.1.0) with RAC.
Installing Oracle Database 10g Software Install the Oracle Database 10g software with the following:
$ cd ~oracle
$ /u01/app/oracle/orainstall/db/Disk1/runInstaller -ignoreSysPrereqs
21. Create the TNS Listener Process The Oracle TNS listener process should now be running on all nodes in the RAC cluster:
$ hostname
linux1
$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX1
=====================
$ hostname
linux2
$ ps -ef | grep lsnr | grep -v 'grep' | grep -v 'ocfs' | awk '{print $9}'
LISTENER_LINUX2
22. Create the Oracle Cluster Database The database creation process should only be performed from one node in the cluster! We will use the DBCA to create the clustered database.
Creating the Clustered Database To start the database creation process, run the following:
# xhost +
access control disabled, clients can connect from any host
# su - oracle
$ dbca &
24. Creating/Altering Tablespaces
SQL> create tablespace indx datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize unlimited
  3  extent management local autoallocate
  4  segment space management auto;

SQL> alter database datafile '+ORCL_DATA1/orcl/datafile/system.259.1' resize 800m;

SQL> alter database datafile '+ORCL_DATA1/orcl/datafile/sysaux.261.1' resize 500m;

SQL> alter tablespace undotbs1 add datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter tablespace undotbs2 add datafile '+ORCL_DATA1' size 1024m
  2  autoextend on next 50m maxsize 2048m;

SQL> alter database tempfile
25. Verify the RAC Cluster/Database Configuration The following RAC verification checks should be performed on all nodes in the cluster! For this guide, we will perform these checks only from linux1.
Status of all instances and services
$ srvctl status database -d orcl
Instance orcl1 is running on node linux1
Instance orcl2 is running on node linux2
Status of a single instance
$ srvctl status instance -d orcl -i orcl2
Instance orcl2 is running on node linux2
Status of a named service globally across the database
$ srvctl status service -d orcl -s orcltest
Service orcltest is running on instance(s) orcl2, orcl1
25. Verify the RAC Cluster/Database Configuration
Display all services for the specified cluster database
$ srvctl config service -d orcl
orcltest PREF: orcl2 orcl1 AVAIL:
Display the configuration for node applications - (VIP, GSD, ONS, Listener)
$ srvctl config nodeapps -n linux1 -a -g -s -l
VIP exists.: /linux1-vip/192.168.1.200/255.255.255.0/eth0:eth1
GSD exists.
ONS daemon exists.
Listener exists.
Display the configuration for the ASM instance(s)
$ srvctl config asm -n linux1
+ASM1 /u01/app/oracle/product/10.1.0/db_1
25. Verify the RAC Cluster/Database Configuration
All running instances in the cluster
SELECT inst_id
     , instance_number inst_no
     , instance_name inst_name
     , parallel
     , status
     , database_status db_status
     , active_state state
     , host_name host
  FROM gv$instance
 ORDER BY inst_id;

INST_ID  INST_NO  INST_NAME  PAR  STATUS  DB_STATUS  STATE   HOST
-------  -------  ---------  ---  ------  ---------  ------  ------
      1        1  orcl1      YES  OPEN    ACTIVE     NORMAL  linux1
      2        2  orcl2      YES  OPEN    ACTIVE     NORMAL  linux2
25. Verify the RAC Cluster/Database Configuration
All data files which are in the disk group
select name from v$datafile
union
select member from v$logfile
union
select name from v$controlfile
union
select name from v$tempfile;

All ASM disks that belong to the 'ORCL_DATA1' disk group
SELECT path
  FROM v$asm_disk
 WHERE group_number IN (select group_number
                          from v$asm_diskgroup
                         where name = 'ORCL_DATA1');

PATH
----------------------------------
ORCL:VOL1
ORCL:VOL2
ORCL:VOL3
27. Managing Transparent Application Failover SQL Query to Check the Session's Failover Information The following SQL query can be used to check a session's failover type, failover method, and if a failover has occurred. We will be using this query throughout this example.
COLUMN instance_name   FORMAT a13
COLUMN host_name       FORMAT a9
COLUMN failover_method FORMAT a15
COLUMN failed_over     FORMAT a11

SELECT instance_name
     , host_name
     , NULL AS failover_type
     , NULL AS failover_method
     , NULL AS failed_over
  FROM v$instance
UNION
SELECT NULL
     , NULL
     , failover_type
     , failover_method
     , failed_over
  FROM v$session
 WHERE username = 'SYSTEM';
Additional Information
Tnsnames.ora example.
A typical tnsnames.ora file configured to use TAF would look similar to:
ORCLTEST =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcltest.idevelopment.info)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )
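A minimal sketch of exercising TAF with this entry; the SYSTEM connection and the choice of which instance to abort are illustrative assumptions:

# Session 1 (client): connect through the TAF-enabled service and run the
# failover query from the earlier slide to note the current instance.
#   sqlplus system@orcltest
#
# Session 2 (oracle user on a RAC node): abort the instance that session 1
# is using to simulate a failure, then restart it afterwards.
srvctl status database -d orcl
srvctl stop instance -d orcl -i orcl1 -o abort   # simulate failure of orcl1
# Re-running the failover query in the still-open client session should now
# show FAILED_OVER = YES and the surviving instance/host.
srvctl start instance -d orcl -i orcl1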
Contact Information
Kishore A [email protected]