Wednesday, February 3, 2010

Oracle RAC troubleshooting

Here is a node eviction example.

Oracle RAC startup..

The alert log at instance startup, just after the "System parameters with non-default values" section:

....

Cluster communication is configured to use the following interface(s) for this instance
10.1.1.1
Sun May 24 10:50:06 2009
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=14582
DIAG started with pid=3, OS id=14594
PSP0 started with pid=4, OS id=14605
LMON started with pid=5, OS id=14616
LMD0 started with pid=6, OS id=14629
LMS0 started with pid=7, OS id=14635
LMS1 started with pid=8, OS id=14649
MMAN started with pid=9, OS id=14663
DBW0 started with pid=10, OS id=14668
LGWR started with pid=11, OS id=14670
CKPT started with pid=12, OS id=14672
SMON started with pid=13, OS id=14674
RECO started with pid=14, OS id=14676
CJQ0 started with pid=15, OS id=14680
MMON started with pid=16, OS id=14687
MMNL started with pid=17, OS id=14689
Sun May 24 10:50:07 2009
lmon registered with NM - instance id 1 (internal mem no 0)
Sun May 24 10:50:08 2009
Reconfiguration started (old inc 0, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 1
* domain 0 valid = 1 according to instance 1
Sun May 24 10:50:09 2009
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Sun May 24 10:50:09 2009
LMS 1: 0 GCS shadows cancelled, 0 closed
Sun May 24 10:50:09 2009
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Sun May 24 10:50:09 2009
LMS 1: 0 GCS shadows traversed, 0 replayed
Sun May 24 10:50:09 2009
LMS 0: 0 GCS shadows traversed, 0 replayed
Sun May 24 10:50:09 2009
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=18, OS id=14708
Sun May 24 10:50:11 2009
ALTER DATABASE MOUNT
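
The "Cluster communication is configured to use the following interface(s)" line above tells you which private interface the instance picked for the interconnect. If you want to double-check it from SQL, or script the check, the v$cluster_interconnects and gv$instance views carry the same information. Below is a minimal sketch using the cx_Oracle driver; the credentials and connect string are placeholders you would replace with your own.

# Minimal sketch: confirm the interconnect and the registered instances.
# Assumes the cx_Oracle driver and a user with SELECT on the V$/GV$ views;
# the credentials and DSN below are placeholders.
import cx_Oracle

conn = cx_Oracle.connect("system", "manager", "rac1:1521/orcl")  # placeholder
cur = conn.cursor()

# Which interface(s) does this instance use for cluster communication?
cur.execute("SELECT name, ip_address, is_public, source FROM v$cluster_interconnects")
for name, ip, is_public, source in cur:
    print("interconnect %-6s %-15s public=%s (%s)" % (name, ip, is_public, source))

# Which instances are currently registered in the cluster?
cur.execute("SELECT inst_id, instance_name, host_name, status FROM gv$instance ORDER BY inst_id")
for inst_id, iname, host, status in cur:
    print("instance %s %-8s on %-12s status=%s" % (inst_id, iname, host, status))

conn.close()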

Reasons for Node Evictions in a Cluster Database Configuration

ORA-29740: evicted by member 0, group incarnation

MetaLink has this note on RAC node eviction:

“The node with the lower node number will survive the eviction (The first node to join the cluster).
In case of 3 nodes, 2 nodes will survive and the one you pulled the cable will go away.
4 nodes - the sub cluster with the lower node number will survive.”
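
To make that rule concrete: when the interconnect splits the cluster into sub-clusters, the larger sub-cluster survives, and a tie is broken in favour of the sub-cluster that contains the lowest node number (the first node to join). The sketch below just expresses that decision in Python; it is an illustration of the rule in the note, not Oracle's clusterware code.

# Illustrative sketch of the eviction decision quoted above:
# the larger sub-cluster survives; on a tie, the sub-cluster holding the
# lowest node number wins. This is NOT Oracle's code, just the rule in Python.

def surviving_subcluster(subclusters):
    """subclusters: list of sets of node numbers, e.g. [{0, 1}, {2}]."""
    # Prefer the largest sub-cluster; break ties with the lowest node number.
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))

# 2-node cluster splits: node 0 (the first to join) survives, node 1 is evicted.
print(surviving_subcluster([{0}, {1}]))          # -> {0}

# 4-node cluster splits 2/2: the sub-cluster with the lower node number survives.
print(surviving_subcluster([{0, 1}, {2, 3}]))    # -> {0, 1}

# 3-node cluster, one node loses its cable: the 2-node sub-cluster survives.
print(surviving_subcluster([{0, 2}, {1}]))       # -> {0, 2}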


A node is evicted from the cluster when it kills itself because it can no longer service the applications. This generally happens when communication fails between the instances, when an instance cannot write its heartbeat information to the control file, or for various other reasons.

To avoid data corruption during such failures, the failing instance evicts itself from the cluster group. The eviction is reported as Oracle error ORA-29740 in the alert log and in the LMON trace files.

1) Network disruption (between the nodes, or to the voting disk or OCR)
2) Interconnect timeout (I think the default is 10 seconds) - see the heartbeat sketch below

http://searchsystemschannel.techtarget.com/generic/0,295582,sid99_gci1254273,00.html
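
Reason 2 is easiest to picture as a missed network heartbeat: each node keeps pinging its peers over the private interconnect, and a peer that stays silent longer than the timeout is declared dead and evicted. The sketch below only illustrates that idea with OS-level pings against placeholder private IPs (10.1.1.1 is taken from the log above, 10.1.1.2 is made up); the real decision is made inside Oracle Clusterware (CSSD), not by a script like this.

# Conceptual sketch of a network heartbeat check over the private interconnect.
# The IP addresses and the 10-second timeout are placeholders; real membership
# decisions are made by Oracle Clusterware (CSSD), not by this script.
import subprocess
import time

PRIVATE_IPS = ["10.1.1.1", "10.1.1.2"]   # private interconnect addresses (placeholders)
HEARTBEAT_INTERVAL = 1                   # seconds between heartbeats
TIMEOUT = 10                             # declare a peer dead after this many seconds

last_seen = {ip: time.time() for ip in PRIVATE_IPS}

while True:
    for ip in PRIVATE_IPS:
        # One ICMP echo with a 1-second wait; flags as used by Linux ping.
        alive = subprocess.call(["ping", "-c", "1", "-W", "1", ip],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL) == 0
        now = time.time()
        if alive:
            last_seen[ip] = now
        elif now - last_seen[ip] > TIMEOUT:
            print("peer %s missed heartbeats for %.0fs - would be evicted" % (ip, now - last_seen[ip]))
    time.sleep(HEARTBEAT_INTERVAL)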

Detailed explanation:

Instance Membership Recovery (IMR)

When a communication failure occurs between the instances, or when an instance cannot write its heartbeat information to the control file, the cluster group is in danger of data corruption. In addition, if no mechanism were present to detect such failures, the entire cluster would hang. To address this, IMR was introduced in Oracle 9i and improved in Oracle 10g. IMR removes the failed instance from the cluster group. When only a subset of the cluster group survives a failure, IMR ensures that the larger partition survives and kills all of the smaller groups.

IMR is a part of the service offered by Cluster Group Services (CGS). LMON is the key process that handles many of the CGS functionalities. As you know, cluster software (known as Cluster Manager, or CM) can be a vendor-provided or Oracle-provided infrastructure tool. CM facilitates communication between all nodes of the cluster and provides information on the health of each node—the node state. It detects failures and manages the basic membership of nodes in the cluster. CM works at the cluster level and not at the database or instance level.

Inside RAC, the Node Monitor (NM) provides information about nodes and their health by registering and communicating with the CM. NM services are provided by LMON. Node membership is represented as a bitmap in the GRD. A value of 0 denotes that a node is down and a value of 1 denotes that the node is up. There is no value to indicate a "transition" period such as during bootup or shutdown. LMON uses the global notification mechanism to let others know of a change in the node membership. Every time a node joins or leaves a cluster, this bitmap in the GRD has to be rebuilt and communicated to all registered members in the cluster.
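
As a mental model, the membership bitmap is just an array indexed by node number, 1 for up and 0 for down, and every join or leave rebuilds it and broadcasts the new map to the registered members. The toy class below illustrates that data structure; it is not the GRD's internal format, and the "broadcast" is just a print.

# Illustrative model of the node-membership bitmap kept in the GRD:
# one entry per node number, 1 = up, 0 = down, no "in transition" state.
# The broadcast below just prints; in RAC it is LMON's global notification.

class MembershipBitmap:
    def __init__(self, max_nodes):
        self.bits = [0] * max_nodes

    def node_joins(self, node_no):
        self.bits[node_no] = 1
        self._rebuild_and_broadcast()

    def node_leaves(self, node_no):
        self.bits[node_no] = 0
        self._rebuild_and_broadcast()

    def _rebuild_and_broadcast(self):
        # Every change is communicated to all registered members.
        members = [n for n, up in enumerate(self.bits) if up]
        print("reconfig bitmap:", " ".join(str(n) for n in members))

bitmap = MembershipBitmap(max_nodes=4)
bitmap.node_joins(0)       # -> reconfig bitmap: 0
bitmap.node_joins(1)       # -> reconfig bitmap: 0 1
bitmap.node_joins(3)       # -> reconfig bitmap: 0 1 3
bitmap.node_leaves(1)      # -> reconfig bitmap: 0 3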

Node membership registration and deregistration is done in a series of synchronized steps, a topic beyond the scope of this discussion. Basically, cluster members register and deregister from a group. The important thing to remember is that NM always communicates with the other instances in the cluster about their health and status using the CM. In contrast, if LMON needs to send a message to LMON on another instance, it can do so directly, without the help or involvement of CM. It is important to differentiate between cluster communication and RAC communication.

A simple extract from the alert log file about member registration is provided here:

Thu Jan 1 00:02:17 1970
alter database mount
Thu Jan 1 00:02:17 1970
lmon registered with NM - instance id 1 (internal mem no 0)
Thu Jan 1 00:02:17 1970
Reconfiguration started
List of nodes: 0,
Global Resource Directory frozen
Here you can see that this instance was the first to start up and that LMON registered itself with the NM interface, which is a part of the Oracle kernel.

When an instance joins or leaves the cluster, the LMON trace of another instance shows the reconfiguration of the GRD:

kjxgmpoll reconfig bitmap: 0 1 3
*** 1970-01-01 01:20:51.423
kjxgmrcfg: Reconfiguration started, reason 1
You may find these lines together with other lines asking SMON to perform instance recovery. This happens when an instance crashes or departs the cluster without deregistering in a normal fashion:

Post SMON to start 1st pass IR
*** 1970-01-01 01:20:51.423
kjxgmpoll reconfig bitmap: 0 1 3
*** 1970-01-01 01:20:51.423
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 2 0.
*** 1970-01-01 01:20:51.423
Name Service frozen
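
When you work backwards from an eviction, these are the lines worth pulling out of the alert log and the LMON trace: the reconfig bitmap, the reconfiguration start/complete messages, the request to SMON for instance recovery, and any ORA-29740. A small script like the sketch below can collect them for you; the trace file path is a placeholder.

# Small sketch that pulls reconfiguration-related lines out of an alert log
# or LMON trace file. The path below is a placeholder; adjust as needed.
import re

PATTERNS = [re.compile(p) for p in (
    r"kjxgmpoll reconfig bitmap:",
    r"Reconfiguration (started|complete)",
    r"Post SMON to start 1st pass IR",
    r"ORA-29740",
)]

def scan_trace(path):
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if any(p.search(line) for p in PATTERNS):
                print("%6d: %s" % (lineno, line.rstrip()))

scan_trace("/u01/app/oracle/admin/orcl/bdump/orcl1_lmon_14616.trc")  # placeholder path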
CGS exists primarily to provide a coherent and consistent view of the cluster from an OS perspective: it tells Oracle how many nodes are in the cluster and provides a synchronized view of cluster instance membership. Its main responsibilities are regular status checks of the members, verifying that they are valid in the group, and, very importantly, detecting split-brain scenarios when communication failures occur.

Specific rules bind the members of the cluster group together and keep the cluster in a consistent state (a short sketch after the list makes them concrete):

Each member should be able to communicate without any problems with any other registered and valid member in the group.

Members should see all other registered members in the cluster as valid and have a consistent view.

All members must be able to read from and write to the control file.
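
Taken together, these rules are a consistency check: every valid member must be able to reach every other valid member (and hence share the same view of the group), and every member must be able to do I/O against the control file. The toy function below only makes the rules concrete over made-up inputs; it has nothing to do with Oracle's actual CGS implementation.

# Toy consistency check over the CGS membership rules listed above.
# 'connectivity' and 'controlfile_ok' are made-up inputs, not Oracle structures.

def membership_consistent(members, connectivity, controlfile_ok):
    """members: set of node numbers.
    connectivity: dict mapping node -> set of nodes it can reach.
    controlfile_ok: dict mapping node -> True if it can read/write the control file."""
    for node in members:
        # Each member must reach every other registered member
        # (and therefore share a consistent view of the group).
        if not (members - {node}) <= connectivity.get(node, set()):
            return False
        # Every member must be able to read from and write to the control file.
        if not controlfile_ok.get(node, False):
            return False
    return True

members = {0, 1, 2}
connectivity = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
controlfile_ok = {0: True, 1: True, 2: True}
print(membership_consistent(members, connectivity, controlfile_ok))   # True

connectivity[2] = set()   # node 2 loses the interconnect
print(membership_consistent(members, connectivity, controlfile_ok))   # False -> IMR would step in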

So, when a communication failure occurs between the instances, or when an instance is unable to write its heartbeat information to the voting disk, IMR is triggered. Without IMR there would be no mechanism to detect these failures, and the entire cluster could hang.