03-30-2008, 01:06 PM
Geofrey Rainey
How To Remove a Node in a RAC Cluster in the Event of a Node Becoming Corrupted

Recently I encountered a situation in an Oracle RAC cluster whereby
files were accidentally deleted within the CRS_HOME on one of the nodes
resulting in a node failure. I discovered that the conventional method
of node removal and addition from the cluster didn't work.
This document describes how to clean up a cluster in such a situation.

1. During or after the conventional Oracle method of removing a node
(as documented in Note:269320.1, Removing a Node from a 10g RAC Cluster),
various errors might be encountered, such as:
Code:
[oracle@<working node name> bin]$ ./srvctl stop nodeapps -n <broken node name>
CRS-0216: Could not stop resource 'ora.<broken node name>.ons'.
CRS-0216: Could not stop resource 'ora.<broken node name>.vip'.
CRS-0216: Could not stop resource 'ora.<broken node name>.gsd'.
[oracle@<working node name> bin]$

[root@<working node name> bin]# ./srvctl remove nodeapps -n <broken node name>
Please confirm that you intend to remove the node-level applications 
on node <broken node name> (y/[n]) y
PRKO-2112 : Some or all node applications are not removed 
successfully on node: <broken node name>
2. However, according to the OCR all information for the broken node has been removed.

Code:
[oracle@<working node name> bin]$ ./crs_stat -u
NAME=ora.<working node name>.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.cmastage.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.<working node name>.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.<working node name>.LISTENER_<WORKING_NODE_NAME>.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.<working node name>.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.<working node name>.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on <working node name>

NAME=ora.<working node name>.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>
3. If resources for the broken node still appear in the OCR,
they can be removed as follows:
Code:
$CRS_HOME/bin/crs_unregister <resource name>
(where <resource name> is taken from the NAME= lines of the crs_stat output, as above)
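
The lookup in step 3 can be scripted. The following is only a sketch: the function names are hypothetical, it assumes leftover resources embed the node name as in the crs_stat output above (ora.<node name>.<type>), and it prints the crs_unregister commands for review rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: read crs_stat output on stdin and print the names of
# resources whose name starts with ora.<node>. (an assumption about naming).
list_node_resources() {
    node="$1"
    # "NAME=" is 5 characters, so the resource name starts at position 6.
    awk -v n="$node" 'index($0, "NAME=ora." n ".") == 1 { print substr($0, 6) }'
}

# Print (do not run) one crs_unregister command per leftover resource.
print_unregister_commands() {
    node="$1"
    list_node_resources "$node" | while read -r res; do
        echo "\$CRS_HOME/bin/crs_unregister $res"
    done
}
```

It could be used as: $CRS_HOME/bin/crs_stat -u | print_unregister_commands <broken node name>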

4. At this point the procedure may appear to have completed, and it may seem that the
broken node can be added back into the cluster using the standard add node procedure.
If further unexpected errors are encountered from here on, this indicates that the OCR
might have become corrupted and will need to be re-initialised. This requires an outage
to the cluster and is detailed below.

5. Shut down the Oracle Clusterware stack on all the nodes by running crsctl stop crs as the root user.

6. Execute the following on all nodes:
Code:
<CRS_HOME>/install/rootdelete.sh
7. Execute the following on the node which is supposed to be the first node:

Code:
<CRS_HOME>/install/rootdeinstall.sh
8. The following commands should return nothing, confirming that no Clusterware daemons are still running:

Code:
ps -e | grep -i 'ocs[s]d'
ps -e | grep -i 'cr[s]d.bin'
ps -e | grep -i 'ev[m]d.bin'
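
The three checks in step 8 can be folded into one test. This is a minimal sketch; the function name no_clusterware_daemons is hypothetical, and it simply looks for the same three daemon names in whatever process listing it is given.

```shell
#!/bin/sh
# Hypothetical helper: read a process listing (for example from `ps -e`) on
# stdin and succeed only if none of the step 8 daemons appear in it.
no_clusterware_daemons() {
    if grep -i -E 'ocssd|crsd\.bin|evmd\.bin' >/dev/null; then
        return 1   # at least one daemon is still listed
    fi
    return 0
}
```

It could be used as: ps -e | no_clusterware_daemons && echo "Clusterware stack is down"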
9. Execute <CRS_HOME>/root.sh on the first node.

10. After successful root.sh execution on first node, execute root.sh on the rest of the nodes of the cluster.
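
Because the order of steps 5 to 10 matters, it may help to print the plan before starting. The sketch below executes nothing; the function name print_reinit_plan and the node names in the usage line are made up for illustration, and the process check from step 8 still has to be done by hand before the root.sh runs.

```shell
#!/bin/sh
# Hypothetical helper: print the order of operations from steps 5 to 10.
# Arguments: CRS home, first node, then any remaining nodes.
# Each output line is "<node>: <command to run there as root>".
print_reinit_plan() {
    crs_home="$1"; first="$2"; shift 2
    for n in "$first" "$@"; do echo "$n: crsctl stop crs"; done
    for n in "$first" "$@"; do echo "$n: $crs_home/install/rootdelete.sh"; done
    echo "$first: $crs_home/install/rootdeinstall.sh"
    echo "$first: $crs_home/root.sh"
    for n in "$@"; do echo "$n: $crs_home/root.sh"; done
}
```

For example: print_reinit_plan "$CRS_HOME" rac1 rac2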

11. The nodeapps might need to be added manually using the srvctl command as follows (as root user for each node):

Code:
[root@<working node name> bin]# ./srvctl add nodeapps -n <working node name> -o /u01/app/oracle/product/10.2/db_1 -A <working node name vip>/<netmask>/<device name>
(where <working node name vip> = hosts file entry for vip, or IP address, and <device name> = device name such as eth0)
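
When several nodes need their nodeapps re-added, the command line from step 11 can be generated rather than typed. A small sketch follows; the function name nodeapps_cmd is hypothetical and the values in the usage line (rac1, rac1-vip, 255.255.255.0, eth0) are invented examples, not taken from the cluster described here.

```shell
#!/bin/sh
# Hypothetical helper: print the srvctl command from step 11 for one node.
# Arguments: node name, Oracle home, VIP name or address, netmask, device.
nodeapps_cmd() {
    printf './srvctl add nodeapps -n %s -o %s -A %s/%s/%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

For example: nodeapps_cmd rac1 /u01/app/oracle/product/10.2/db_1 rac1-vip 255.255.255.0 eth0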

12. Add the database to the OCR using the appropriate srvctl add database command, run as the
user who owns the database. Ensure that this is not run as the root user.

13. Add ASM, DB, Instance and services using the appropriate srvctl add commands.
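
As a rough illustration of steps 12 and 13, the sketch below prints the kind of srvctl add commands involved. It assumes 10g-style options (-d database, -o Oracle home, -n node, -i instance), so check srvctl add <object> -h on your release first. The function name and the instance names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) srvctl commands that re-register a
# database, one ASM instance and one database instance in the OCR.
# Arguments: database name, Oracle home, node, DB instance, ASM instance.
print_registration_cmds() {
    db="$1"; home="$2"; node="$3"; inst="$4"; asm="$5"
    echo "srvctl add database -d $db -o $home"
    echo "srvctl add asm -n $node -i $asm -o $home"
    echo "srvctl add instance -d $db -i $inst -n $node"
}
```

For example: print_registration_cmds cmastage /u01/app/oracle/product/10.2/db_1 rac1 cmastage1 +ASM1
Repeat the asm and instance lines for each node, and add any services afterwards with srvctl add service.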

14. Add the listener using netca. This may give errors if the listener.ora already contains the entries.
If this is the case, move the listener.ora to /tmp from $ORACLE_HOME/network/admin, or from the
$TNS_ADMIN directory if the TNS_ADMIN environment variable is defined, and then run netca.
Add all the listeners that were configured earlier.
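
Moving listener.ora aside as described in step 14 could look like the sketch below. The function name move_listener_ora is hypothetical, and the timestamp suffix on the moved file is an addition of this sketch (so repeated runs do not overwrite an earlier copy), not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: move an existing listener.ora out of the way before
# running netca. Uses $TNS_ADMIN when it is set, otherwise
# $ORACLE_HOME/network/admin. Destination defaults to /tmp.
move_listener_ora() {
    dest="${1:-/tmp}"
    dir="${TNS_ADMIN:-$ORACLE_HOME/network/admin}"
    if [ -f "$dir/listener.ora" ]; then
        mv "$dir/listener.ora" "$dest/listener.ora.$(date +%Y%m%d%H%M%S)"
    fi
}
```

Run move_listener_ora as the Oracle software owner, then start netca.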

Also see: Remove a Node from an Existing Oracle RAC 10g R1 Cluster on Linux

References:

Removing a Node from a 10g RAC Cluster. Note:269320.1 (Oracle Metalink)
Re-initialising the OCR. Note:399482.1 (Oracle Metalink)

Last edited by Arjen Visser; 11-22-2010 at 03:44 PM.