Welcome: Classroom Session

A cluster is a group of servers and other resources that act like a single system and enable high availability and, in some cases, load balancing and parallel processing. A cluster links commodity hardware with intelligent software to provide application failover and control. VERITAS NetBackup offers a single console for all your backup and recovery operations.

Uploaded by

Venu Gopal
Copyright
© Attribution Non-Commercial (BY-NC)

Welcome

Classroom Session

By the end of this session, we will be familiar with:

Cluster Terminology
Cluster Communication
VCS Architecture
Maintaining the Cluster Configuration
Troubleshooting
Software / Hardware Requirements and Recommendations

A look at what we have

VERITAS Cluster Server


VCS is the industry's leading cross-platform clustering solution for minimizing application downtime

VERITAS Volume Manager


VxVM provides easy-to-use online disk storage management for computing environments and SANs through RAID (Redundant Array of Independent Disks).

VERITAS NetBackup
Veritas NetBackup offers a single console for all your backup and recovery operations.

What is a Cluster?
A cluster is a group of servers and other resources that act like a single system and enable high availability and, in some cases, load balancing and parallel processing. Each system, or node, runs its own OS, and the nodes cooperate at the software level to form a cluster. A cluster links commodity hardware with intelligent software to provide application failover and control. A cluster consists of the following:

Two or more systems (nodes)
A dedicated private network between all the nodes
Shared storage that can be accessed by all the nodes of the cluster

What is a VCS Cluster?

A VCS cluster is composed of a set of systems that provide scalability and high availability for specified applications.
VCS monitors and controls the applications in a cluster and can restart or move them in response to faults.
Each cluster has a unique ID.
A cluster can have 1 to 32 member systems, or nodes.
Each node within a single VCS cluster must run the same operating system.
Nodes without common storage cannot fail over an application that stores data to disk.
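The membership rules above can be expressed as a small validation sketch. It is a minimal illustration assuming a hypothetical in-memory description of each node (`Node`, with its OS and the shared disk groups it can reach); none of these names belong to VCS, which enforces these rules itself.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory description of a node; not a VCS data structure.
@dataclass
class Node:
    name: str
    os: str
    storage: set = field(default_factory=set)  # shared disk groups this node can reach


def validate_cluster(cluster_id, nodes):
    """Return a list of rule violations; an empty list means the layout passes."""
    problems = []
    if not 1 <= len(nodes) <= 32:
        problems.append(f"cluster {cluster_id}: {len(nodes)} nodes, expected 1 to 32")
    if len({n.os for n in nodes}) > 1:
        problems.append(f"cluster {cluster_id}: nodes run different operating systems")
    return problems


def can_fail_over(app_storage, target):
    """An application that stores data on disk can only move to a node that sees that storage."""
    return app_storage in target.storage


node1 = Node("system1", "Solaris 8", {"oradg"})
node2 = Node("system2", "Solaris 8", {"oradg"})
print(validate_cluster(1, [node1, node2]))  # []
print(can_fail_over("oradg", node2))        # True
```

The disk group name `oradg` and the OS strings are placeholders chosen for the example.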

Cluster Terminology
Cluster: name of your HA environment
Nodes: physical systems that make up the cluster

Service group: a virtual container that holds all the hardware and software resources required to run the managed application

Resources: cluster components (e.g. NICs, IPs, disk groups, volumes, mounts, processes)

Attributes: parameter values that define the resources
Agents: multi-threaded processes that provide the logic to manage resources
Dependencies: links between resources or service groups
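Dependencies determine the order in which resources are brought online and taken offline: a resource starts only after the resources it depends on, and stops before them. The sketch below models that ordering with Python's standard `graphlib` (3.9+); the group `ORAGRP` and its resource names are hypothetical, loosely based on the resource types listed above, and this is not how VCS computes the order internally.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical service group: each resource maps to the resources it depends on.
ORAGRP = {
    "oracle": {"mount", "ip"},
    "mount": {"volume"},
    "volume": {"diskgroup"},
    "ip": {"nic"},
    "diskgroup": set(),
    "nic": set(),
}


def online_order(deps):
    """Children first: each resource comes after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps):
    """Reverse of the online order: parents stop before the resources they need."""
    return online_order(deps)[::-1]


print(online_order(ORAGRP))  # one valid order; 'oracle' is always last
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies form a loop, which matches the fact that a circular dependency has no valid start order.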

Switchover and Failover


Failover and switchover are the processes of bringing up application services on a different node in a cluster. Client systems access a virtual IP address that moves with the service, so client systems are unaware of which server they are using. A virtual IP address is an address brought up in addition to the base address of a system in the cluster.

Switchover: a switchover is an orderly shutdown of an application and its supporting resources on one server and a controlled startup on another server.
Failover: a failover is similar to a switchover, except that an orderly shutdown of the applications on the original node may not be possible, so the services are simply started on another node.
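To make the difference concrete, here is a toy two-node model. The node names and the VIP (taken from the 192.0.2.0/24 documentation range) are hypothetical, and in a real cluster these steps are carried out by VCS agents rather than by code like this. What it shows is that clients keep using the same VIP while its owner changes.

```python
class ServiceGroup:
    """Toy two-node failover group; not a VCS API."""

    def __init__(self, vip, nodes):
        self.vip = vip              # the address clients always connect to
        self.nodes = list(nodes)
        self.owner = self.nodes[0]  # node currently hosting the VIP and application
        self.log = []

    def _bring_up(self, node):
        self.log.append(f"online {self.vip} on {node}")
        self.owner = node

    def switchover(self, target):
        # Orderly: shut down on the current owner, then start on the target.
        self.log.append(f"offline {self.vip} on {self.owner}")
        self._bring_up(target)

    def failover(self):
        # The owner has failed, so no orderly shutdown is possible;
        # start the services on the first surviving node instead.
        failed = self.owner
        target = next(n for n in self.nodes if n != failed)
        self.log.append(f"{failed} faulted")
        self._bring_up(target)


group = ServiceGroup("192.0.2.10", ["system1", "system2"])
group.switchover("system2")    # planned maintenance on system1
group.failover()               # system2 then crashes
print(group.owner, group.vip)  # system1 192.0.2.10
```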

Let's look in depth

The following diagram illustrates the regular and virtual IP addresses used in the cluster configuration. Administration of the nodes uses IP1/IP2 and IP3/IP4 respectively. External clients reach the cluster over the public network through the cluster IP address, VIP1.



The cluster includes two machines, each running the Application and the Veritas software. The Veritas Cluster Server (VCS) software consolidates the Application and exposes it as a single entity with a single virtual IP (VIP) address for the entire cluster; the VIP is used for communication with external entities. The cluster software distinguishes an active and a standby machine: the active machine "owns" the virtual IP address and all network connections, while the standby machine stays passive until a failover occurs. At failover, the IP address is passed from the failing server to the backup server, which becomes active and re-establishes all network connections, and the Application reconnects to the activated (backup) node. Each node in the cluster also has an IP address for administration of the node and the cluster.
Heartbeat network: used by Veritas Cluster Server to perform cluster monitoring and control.

Cluster Communication
Cluster membership is defined in the primary cluster configuration file, which is simply an ASCII file that the administrator edits. Cluster communication is any network path between the cluster nodes.

[Diagram: cluster communication stack, split into user space and kernel space]

VCS agents track the state of all resources and service groups in the cluster.


HAD (High Availability Daemon) polls the various agents on the node and, if there is a change, reports it to GAB.
GAB (Group Membership Services/Atomic Broadcast) has two jobs.
First, it tracks which systems are part of the cluster. Cluster membership is defined by systems sharing the same cluster ID and a pair of redundant Ethernet LLT cables. GAB's second job is to transmit resource status changes to all nodes in the cluster. The atomic broadcast portion of the name implies (correctly, as it turns out) that all systems in the cluster are notified of any change. If a failure occurs during the update, the status change is rolled back, ensuring that, upon recovery, all nodes have the same status information. It's the same paradigm as a database commit, if that's familiar to you.
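The all-or-nothing behaviour can be illustrated with a conceptual model. This is a simplified snapshot-and-restore sketch written for this explanation, not GAB's actual kernel protocol; the `NodeState` class and `atomic_broadcast` function are hypothetical, and a node whose `alive` flag is False stands in for a failure during the update.

```python
class NodeState:
    """One node's view of resource status (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.status = {}    # resource -> state as this node sees it
        self.alive = True   # False simulates a failure during an update


def atomic_broadcast(nodes, resource, new_state):
    """Apply a status change on every node, or roll it back on all of them."""
    snapshot = {n.name: dict(n.status) for n in nodes}  # state before the update
    for n in nodes:
        if not n.alive:
            for m in nodes:                             # undo partial updates
                m.status = snapshot[m.name]
            return False
        n.status[resource] = new_state
    return True


a, b, c = NodeState("a"), NodeState("b"), NodeState("c")
atomic_broadcast([a, b, c], "oracle", "ONLINE")
c.alive = False
atomic_broadcast([a, b, c], "oracle", "FAULTED")  # fails on c, rolled back
print(a.status, b.status)  # {'oracle': 'ONLINE'} {'oracle': 'ONLINE'}
```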


LLT (Low Latency Transport) is responsible for transmitting the heartbeat signals that GAB uses to maintain cluster membership. A cluster can have between 2 and 8 LLT links. LLT links can be designated as low or high priority.

High-priority links:

Send a heartbeat every 0.5 seconds
Carry cluster status information
Should be configured over dedicated network links

Low-priority links:

Send a heartbeat every second
Do not carry cluster status information
Can be configured on public networks
Are automatically promoted to high-priority links if all other high-priority links have failed

LLT is the lowest protocol in the VCS communications chain, so everything else relies on it. If LLT isn't happy, ain't nothing happening, so let's make LLT happy.
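The priority and promotion rules above can be sketched as follows. The interface names (`qfe0`, `hme0`, ...) are hypothetical, and the functions only model which links would carry cluster status and how often each sends heartbeats; LLT itself runs in the kernel and is configured through /etc/llttab.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str            # interface name; illustrative only
    high_priority: bool
    up: bool = True


def heartbeat_interval(link):
    """Seconds between heartbeats for a link, per the figures on this slide."""
    return 0.5 if link.high_priority else 1.0


def status_links(links):
    """Links carrying cluster status: the live high-priority links, or, once
    all of those have failed, the live low-priority links (promotion)."""
    live = [lk for lk in links if lk.up]
    high = [lk for lk in live if lk.high_priority]
    return high if high else live


links = [Link("qfe0", True), Link("qfe1", True), Link("hme0", False)]
links[0].up = links[1].up = False              # both private links fail
print([lk.name for lk in status_links(links)])  # ['hme0']
```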


VCS Architecture

Agents monitor resources on each system and provide status to HAD on the local system
HAD on each system sends status information to GAB
GAB broadcasts configuration information to all cluster members
LLT transports all cluster communications to all cluster nodes
HAD on each node takes corrective action, such as failover, when necessary

Maintaining the Cluster Configuration


In the failover configuration used here, Veritas Cluster does NOT have both boxes up at once servicing requests; it offers only a hot standby system. This keeps the service running (after a short transfer period) if a machine fails or system maintenance needs to be done.

Cluster Startup
Here is what the cluster does at startup:
The node checks whether the other node has already started; if so, it stays OFFLINE.
If no other machine is running, it checks communication (gabconfig). This may need system admin intervention if the cluster requires both nodes to be available (/sbin/gabconfig -c -x).
Once communication between the machines is open, or gabconfig has been started, it sets up the network (NIC and IP address) and starts the cluster server.
It then brings up the volume manager, the file system, and then the Application (Oracle).
If any of the critical processes fail, the whole system is faulted. The most common reason for failure is expired licenses, so check licenses with vxlicense -p before doing any work.
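The startup sequence can be summarized as a small decision function. This sketch is a simplification under stated assumptions: `peer_running` and `gab_seeded` are hypothetical inputs standing in for what the node actually detects, and `start` is a caller-supplied stand-in for bringing one step online, returning False on failure.

```python
# Order in which the slide says resources come up once communication is open.
STARTUP_STEPS = ["nic", "ip", "volume manager", "file system", "oracle"]


def start_node(peer_running, gab_seeded, start):
    """Return the resulting state of the service on this node.

    peer_running / gab_seeded are hypothetical inputs; start(step) is a
    caller-supplied stand-in that returns False if that step fails.
    """
    if peer_running:
        return "OFFLINE"              # the other node already runs the service
    if not gab_seeded:
        return "WAITING_FOR_SEED"     # admin may need /sbin/gabconfig -c -x
    for step in STARTUP_STEPS:
        if not start(step):
            return f"FAULTED at {step}"  # any critical failure faults the node
    return "ONLINE"


print(start_node(False, True, lambda step: step != "file system"))  # FAULTED at file system
```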



File Locations (Logs, Conf, Executables)
Log location: /var/VRTSvcs/log
There are several logs in this directory:

hashadow-log_A: hashadow checks whether the HA cluster daemon (had) is up and restarts it if needed; this is the log of that process
engine.log_A: the primary log, usually what you will be reading for debugging
Oracle_A: Oracle process log (related to the cluster only)
Sqlnet_A: sqlnet process log (related to the cluster only)
IP_A: related to the shared IP
Volume_A: related to Volume Manager
Mount_A: related to mounting the actual file systems
DiskGroup_A: related to Volume Manager/Cluster Server
NIC_A: related to the actual network device

Look at the most recent ones for debugging purposes (ls -ltr).
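`ls -ltr` lists files oldest first, so the newest logs end up at the bottom. A minimal equivalent, assuming the log directory from the slide and that modification time is a good proxy for "most recent", might look like this; `newest_logs` is a hypothetical helper, not a VCS tool.

```python
import os


def newest_logs(log_dir="/var/VRTSvcs/log", count=5):
    """Names of the `count` most recently modified files, oldest first (like ls -ltr | tail)."""
    entries = [e for e in os.scandir(log_dir) if e.is_file()]
    entries.sort(key=lambda e: e.stat().st_mtime)
    return [e.name for e in entries[max(len(entries) - count, 0):]]


# Usage on a cluster node (directory from the slide):
# for name in newest_logs():
#     print(name)
```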



Conf Files:
LLT conf: /etc/llttab [should NOT need to access this]
GAB conf: /etc/gabtab
If it contains /sbin/gabconfig -c -n2, you will need to run /sbin/gabconfig -c -x if both systems were down and only one system comes up. (-n2 makes GAB wait for two systems before seeding the cluster; -x seeds it manually.)
Cluster conf: /etc/VRTSvcs/conf/config/main.cf
This file has the exact details of what the cluster contains.
Most executables are in /opt/VRTSvcs/bin or /sbin.
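The /etc/gabtab rule can be checked with a short sketch. It assumes the file contains a `gabconfig ... -nN` line like the one on the slide and skips comment lines; `gab_seed_count` and `needs_manual_seed` are hypothetical helpers, and the number of systems currently up is supplied by the caller.

```python
import re


def gab_seed_count(gabtab_text):
    """Return N from a line such as '/sbin/gabconfig -c -n2', or None if absent."""
    for line in gabtab_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        match = re.search(r"gabconfig\b.*?-n\s*(\d+)", line)
        if match:
            return int(match.group(1))
    return None


def needs_manual_seed(gabtab_text, systems_up):
    """True when fewer systems are up than GAB waits for, i.e. the case where
    /sbin/gabconfig -c -x has to be run by hand."""
    n = gab_seed_count(gabtab_text)
    return n is not None and systems_up < n


print(needs_manual_seed("/sbin/gabconfig -c -n2\n", systems_up=1))  # True
```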



Check Veritas Licenses for File System, Volume Manager AND Cluster

vxlicense -p
If any licenses are not valid or have expired, get them FIXED before continuing! All licenses should say "No expiration". If ANY license has an actual expiration date, the test failed; permanent licenses do NOT have an expiration date. Non-essential licenses may be moved; however, a senior admin should do this.

Hand-check SystemList and AutoStartList

On either machine:
grep SystemList /etc/VRTSvcs/conf/config/main.cf
You should get: SystemList = { system1, system2 }
grep AutoStartList /etc/VRTSvcs/conf/config/main.cf
You should get: AutoStartList = { system1, system2 }
Each list should contain both machines. If not, many of the next tests will fail. If your lists do NOT contain both systems, you will probably need to modify them with the commands that follow.
more /etc/VRTSvcs/conf/config/main.cf (see if it is reasonable; it is likely that the systems aren't fully set up)
haconf -makerw (makes the configuration writable)
hagrp -modify oragrp SystemList system1 0 system2 1
hagrp -modify oragrp AutoStartList system1 system2
haconf -dump -makero (writes the configuration out and makes it read-only again)
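The same hand check can be scripted. This sketch, an assumption-laden helper rather than a supported tool, reads the text of main.cf and accepts both the bare form shown above and the `system = priority` form that main.cf uses when priorities are set. It only looks at the first matching attribute, so a main.cf with several service groups would need to be split per group first.

```python
import re


def parse_list(main_cf, attr):
    """System names from 'attr = { ... }'; accepts both 'system1, system2'
    and 'system1 = 0, system2 = 1' (priorities are dropped)."""
    match = re.search(rf"\b{attr}\s*=\s*\{{([^}}]*)\}}", main_cf)
    if not match:
        return []
    names = (item.split("=")[0].strip() for item in match.group(1).split(","))
    return [name for name in names if name]


def missing_systems(main_cf, expected):
    """Map each list attribute to the expected systems it lacks."""
    report = {}
    for attr in ("SystemList", "AutoStartList"):
        absent = sorted(set(expected) - set(parse_list(main_cf, attr)))
        if absent:
            report[attr] = absent
    return report


sample = """
group oragrp (
    SystemList = { system1 = 0, system2 = 1 }
    AutoStartList = { system1 }
    )
"""
print(missing_systems(sample, ["system1", "system2"]))  # {'AutoStartList': ['system2']}
```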



Verify Cluster is Running
Check licenses first: vxlicense -p
Then verify that Veritas is up and running:
hastatus -summary
If this command is not found, add /opt/VRTSvcs/bin to the PATH variable in root's /.profile (vi /.profile). If /.profile does not already exist, use this one:
PATH=/usr/bin:/usr/sbin:/usr/ucb:/usr/local/bin:/opt/VRTSvcs/bin:/sbin:$PATH
export PATH
Load it into the current shell:
. /.profile
If you changed /.profile, re-verify that the command now runs:
hastatus -summary
If your systems do not show the expected status, try these debugging steps:
If NO systems are up, run hastart on both systems and run hastatus -summary again.
If only one system is shown, start the other system with hastart. Note: with the way we configure systems here, one system should ALWAYS be OFFLINE. (If we ran Oracle Parallel Server, this could change, but currently we run the standard Oracle server.)
If both systems are up but OFFLINE, hastart did NOT correct the problem, and the Oracle file systems are not running on either system, the cluster needs to be reset. (This happens under strange network situations with GE Access.) [You ran hastart and that wasn't enough to get the full cluster to work.]
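The debugging steps above amount to a small decision table. The sketch below encodes it under an assumed input format: a dict mapping each system that `hastatus -summary` shows to the state of the failover group on it. Parsing real `hastatus` output is left out, since its layout is not reproduced in these notes, and the final branch (group ONLINE on several systems) is an extra case added for completeness, not a step from the slide.

```python
def diagnose(states, expected_systems=2):
    """states: system name -> state of the failover group on it, for every
    system hastatus shows (hypothetical input format)."""
    online = [name for name, state in states.items() if state == "ONLINE"]
    if not states:
        return "run hastart on both systems, then hastatus -summary"
    if len(states) < expected_systems:
        return "start the other system with hastart"
    if len(online) == 1:
        return "OK: one system ONLINE, the rest OFFLINE"
    if not online:
        return "run hastart; if still OFFLINE everywhere, reset the cluster"
    # Not covered by the slide: unexpected for a failover (non-parallel) group.
    return "ONLINE on several systems: investigate"


print(diagnose({"system1": "ONLINE", "system2": "OFFLINE"}))
```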

VCS Troubleshooting

VCS Software Requirements


The following VERITAS software components are qualified configurations: VERITAS Volume Manager 3.2 or later, VERITAS File System 3.4 or later, VERITAS Cluster Server 2.0 or later, and DB Edition for DB2 for Solaris 1.0 or later. While VERITAS Cluster Server does not require a volume manager, the use of VERITAS Volume Manager is strongly recommended for ease of installation, configuration, and management.

System packages, mandatory:
SUNWbash: GNU Bourne-Again shell (bash)
SUNWgzip: GNU Zip (gzip) compression utility
SUNWzip: Info-Zip (zip) compression utility
SUNWlibCx: Sun WorkShop Bundled 64-bit libC
sudo (superuser do) package

System packages, optional:
SUNWadmap: system administration applications
SUNWadmc: system administration core libraries
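The qualified minimum versions can be compared mechanically. This sketch assumes plain dotted numeric version strings (e.g. "3.5"); strings with suffixes such as maintenance-pack tags would need extra parsing. How you obtain the installed versions on a given system is outside the sketch, and DB Edition for DB2 is omitted for brevity.

```python
# Minimum qualified versions from the list above (DB Edition omitted).
MINIMUMS = {
    "VERITAS Volume Manager": "3.2",
    "VERITAS File System": "3.4",
    "VERITAS Cluster Server": "2.0",
}


def version_tuple(version):
    """'3.2.1' -> (3, 2, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def below_minimum(installed):
    """Products that are missing or older than the qualified minimum."""
    return [
        product
        for product, minimum in MINIMUMS.items()
        if product not in installed
        or version_tuple(installed[product]) < version_tuple(minimum)
    ]


print(below_minimum({"VERITAS Volume Manager": "3.5",
                     "VERITAS File System": "3.3",
                     "VERITAS Cluster Server": "2.0"}))  # ['VERITAS File System']
```

Comparing tuples of integers avoids the string-comparison trap where "3.10" would sort before "3.4".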

VCS Hardware Requirements


Following is a list of hardware currently supported by VERITAS Cluster Server:

For server nodes:


Any SPARC/Solaris server from Sun Microsystems running Solaris 2.6 or later with a minimum of 128 MB RAM.

For disk storage:


EMC Symmetrix, IBM Enterprise Storage Server, HDS 7700 and 9xxx, Sun T3, Sun A5000, Sun A1000, Sun D1000 and any other disk storage supported by VCS 2.0 or later; your VERITAS representative can confirm which disk subsystems are supported or you can refer to VCS documentation. Typical environments will require mirrored private disks (in each cluster node) for the DB2 binaries and shared disks between nodes for the DB2 data.

For network interconnects:


For the public network connections, any network connection supporting IP-based addressing. For the heartbeat connections (internal to the cluster), redundant heartbeat connections are required; this requirement can be met through two additional Ethernet controllers per server, or through one additional Ethernet controller per server plus one shared GABdisk per cluster.
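As a quick planning aid, the redundancy rule can be checked per server. In this sketch the inputs (a count of additional Ethernet controllers per hypothetical server name, plus the number of shared GABdisks in the cluster) are assumptions made for illustration; the function reports which servers fall short.

```python
def servers_lacking_heartbeat_redundancy(extra_nics, shared_gabdisks):
    """extra_nics: server name -> number of additional Ethernet controllers.

    A server passes with two extra controllers, or with one extra controller
    when the cluster also has a shared GABdisk.
    """
    return sorted(
        server
        for server, count in extra_nics.items()
        if not (count >= 2 or (count >= 1 and shared_gabdisks >= 1))
    )


print(servers_lacking_heartbeat_redundancy({"system1": 2, "system2": 1}, shared_gabdisks=0))
# ['system2']
```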

We hope you are now all familiar with VCS concepts and ready for the lab session.
