Recently I encountered a situation in an Oracle RAC cluster in which files were accidentally deleted from the CRS_HOME on one of the nodes, resulting in a node failure. I discovered that the conventional method of removing the node from the cluster and adding it back didn't work. This document describes how to clean up a cluster in such a situation.

1. During or after the conventional Oracle method of removing a node (as documented in Note:269320.1, Removing a Node from a 10g RAC Cluster), various errors might be encountered, such as:
Code:
[oracle@<working node name> bin]$ ./srvctl stop nodeapps -n <broken node name>
CRS-0216: Could not stop resource 'ora.<broken node name>.ons'.
CRS-0216: Could not stop resource 'ora.<broken node name>.vip'.
CRS-0216: Could not stop resource 'ora.<broken node name>.gsd'.
[oracle@<working node name> bin]$
[root@<working node name> bin]# ./srvctl remove nodeapps -n <broken node name>
Please confirm that you intend to remove the node-level applications on node <broken node name> (y/[n]) y
PRKO-2112 : Some or all node applications are not removed successfully on node: <broken node name>
Code:
[oracle@<working node name> bin]$ ./crs_stat -u
NAME=ora.<working node name>.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.cmastage.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.<working node name>.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.<working node name>.LISTENER_<WORKING_NODE_NAME>.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.<working node name>.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

NAME=ora.<working node name>.ons
TYPE=application
TARGET=ONLINE
STATE=UNKNOWN on <working node name>

NAME=ora.<working node name>.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <working node name>

they can be removed as follows:
Code:
$CRS_HOME/bin/crs_unregister <resource name>

4. Now one might think the procedure has completed okay and that the broken node can be added back into the cluster using the standard add node procedure. Alas, all sorts of weird errors might be encountered from here on. If so, this indicates that the OCR might have become corrupted and will need to be re-initialised. This requires an outage to the cluster and is detailed below.

5. Shut down the Oracle Clusterware stack on all the nodes using the command crsctl stop crs as the root user.

6. Execute the following on all nodes:
Code:
<CRS_HOME>/install/rootdelete.sh
Code:
<CRS_HOME>/install/rootdeinstall.sh
Code:
ps -e | grep -i 'ocs[s]d'
ps -e | grep -i 'cr[s]d.bin'
ps -e | grep -i 'ev[m]d.bin'

10. After successful root.sh execution on the first node, execute root.sh on the rest of the nodes of the cluster.

11. The nodeapps might need to be added manually using the srvctl command as follows (as the root user, for each node):
Code:
[root@<working node name> bin]# ./srvctl add nodeapps -n <working node name> -o /u01/app/oracle/product/10.2/db_1 -A <working node name vip>/<netmask>/<device name>

12. Add the database to the OCR using the appropriate srvctl add database command as the user who owns the database. Ensure that this is not run as the root user.

13. Add ASM, DB, instance, and services using the appropriate srvctl add commands.

14. Add the listener using netca. This may give errors if listener.ora already contains the entries. If this is the case, move listener.ora to /tmp from $ORACLE_HOME/network/admin, or from the $TNS_ADMIN directory if the TNS_ADMIN environment variable is defined, and then run netca. Add all the listeners that were added earlier.

Also see: Remove a Node from an Existing Oracle RAC 10g R1 Cluster on Linux

References:
Removing a Node from a 10g RAC Cluster. Note:269320.1 (Oracle Metalink)
Re-initialising the OCR. Note:399482.1 (Oracle Metalink)

Last edited by Arjen Visser; 11-22-2010 at 02:44 PM.
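
A footnote on step 14: the listener.ora move can be scripted roughly as below. This is only a sketch, not part of the Metalink procedure. The function name move_listener_ora and its optional backup-directory argument are made up for illustration. The directory lookup follows the rule given in step 14 (use $TNS_ADMIN if it is set, otherwise $ORACLE_HOME/network/admin).

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): move an existing listener.ora
# out of the way before running netca, as described in step 14.
# Usage: move_listener_ora [backup_dir]   (backup_dir defaults to /tmp)
move_listener_ora() {
    backup_dir="${1:-/tmp}"
    # Use $TNS_ADMIN when it is set, otherwise $ORACLE_HOME/network/admin.
    admin_dir="${TNS_ADMIN:-$ORACLE_HOME/network/admin}"
    src="$admin_dir/listener.ora"

    if [ -f "$src" ]; then
        # Note: this overwrites any listener.ora already in backup_dir.
        mv "$src" "$backup_dir/" && echo "moved $src to $backup_dir"
    else
        echo "no listener.ora in $admin_dir"
    fi
}
```

Run it as the Oracle software owner on each node, e.g. move_listener_ora /tmp, then start netca. Keep the moved copy so the listener names can be checked when re-adding them.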