Recently I had to extend the RAC database home of an existing RAC cluster to two new servers using addNode.sh, and it was hanging at 100%. The database version was 11.2.0.3, running on Oracle Enterprise Linux 5.8. I didn't see anything in the log files, but top showed that the underlying Java process was consuming 100% CPU. I tried restarting the addNode.sh step multiple times but kept seeing the same behavior (even after waiting more than 60 minutes with no progress).
I started by increasing the Java heap size of the JRE used by the Oracle Universal Installer (OUI) from the default of 140m to 512m.
To do so, I changed the JRE_MEMORY_OPTIONS line in $ORACLE_HOME/oui/oraparm.ini:
JRE_MEMORY_OPTIONS=" -mx512m"
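If you prefer to script this edit and keep a backup of the original file, here is a minimal Python sketch. It assumes oraparm.ini has a single uncommented JRE_MEMORY_OPTIONS line containing a -mx (or -Xmx) token, as in the line above. The helpers set_jre_heap and update_oraparm are hypothetical names for illustration and are not part of OUI.

```python
import re
import shutil
from pathlib import Path

# Matches the max-heap token inside JRE_MEMORY_OPTIONS, e.g. "-mx140m" or "-Xmx140m".
# (-mx is the older spelling of -Xmx; both forms are handled and preserved.)
_HEAP_TOKEN = re.compile(r"(-X?mx)\d+[kKmMgG]?")


def set_jre_heap(ini_text: str, heap_mb: int) -> str:
    """Return ini_text with the max heap in JRE_MEMORY_OPTIONS set to heap_mb megabytes."""
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    lines = ini_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        # Only touch the active setting; commented-out copies start with '#'.
        if line.lstrip().startswith("JRE_MEMORY_OPTIONS"):
            new_line, count = _HEAP_TOKEN.subn(rf"\g<1>{heap_mb}m", line)
            if count == 0:
                raise ValueError("JRE_MEMORY_OPTIONS has no -mx/-Xmx token")
            lines[i] = new_line
            return "".join(lines)
    raise ValueError("JRE_MEMORY_OPTIONS not found")


def update_oraparm(path: str, heap_mb: int) -> None:
    """Rewrite oraparm.ini in place, keeping a .bak copy of the original next to it."""
    ini = Path(path)
    shutil.copy2(ini, ini.with_name(ini.name + ".bak"))
    ini.write_text(set_jre_heap(ini.read_text(), heap_mb))
```

For example, update_oraparm(os.path.expandvars("$ORACLE_HOME/oui/oraparm.ini"), 512) would produce the setting shown above. The function only rewrites the heap token, so any other JVM options on the same line are left untouched.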
With this change the process progressed a little further, but now a Java heap space error showed up in the log. That was welcome and helpful, because at least there was now a concrete error to work with:
Exception java.lang.OutOfMemoryError: Java heap space occurred..
java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$CharsetSE.encode(StringCoding.java:334)
    at java.lang.StringCoding.encode(StringCoding.java:378)
    at java.lang.String.getBytes(String.java:812)
    at java.io.UnixFileSystem.checkAccess(Native Method)
    at java.io.File.canRead(File.java:660)
    at oracle.cluster.deployment.ractrans.DirectoryMap.processDir(DirectoryMap.java:294)
    at oracle.cluster.deployment.ractrans.DirectoryMap.processDir(DirectoryMap.java:287)
    at oracle.cluster.deployment.ractrans.DirectoryMap.processDir(DirectoryMap.java:287)
    at oracle.cluster.deployment.ractrans.DirectoryMap.processDir(DirectoryMap.java:287)
    at oracle.cluster.deployment.ractrans.DirectoryMap.<init>(DirectoryMap.java:169)
    at oracle.cluster.deployment.ractrans.DirListing.<init>(DirListing.java:301)
    at oracle.cluster.deployment.ractrans.DirListing.<init>(DirListing.java:165)
    at oracle.cluster.deployment.ractrans.RACTransferCore.createDirListing(RACTransferCore.java:211)
    at oracle.cluster.deployment.ractrans.RACTransfer.createDirListing(RACTransfer.java:1907)
    at oracle.cluster.deployment.ractrans.RACTransfer.transferDirStructureToNodes(RACTransfer.java:619)
    at oracle.cluster.deployment.ractrans.RACTransfer.transferDirToNodes(RACTransfer.java:256)
    at oracle.ops.mgmt.cluster.ClusterCmd.transferDirToNodes(ClusterCmd.java:3168)
    at oracle.ops.mgmt.cluster.ClusterCmd.transferDirToNodes(ClusterCmd.java:3086)
    at oracle.sysman.oii.oiip.oiipg.OiipgClusterOps.transferDirToNodes(OiipgClusterOps.java:947)
    at oracle.sysman.oii.oiif.oiifw.OiifwClusterCopyWCCE.doOperation(OiifwClusterCopyWCCE.java:544)
    at oracle.sysman.oii.oiif.oiifb.OiifbCondIterator.iterate(OiifbCondIterator.java:171)
    at oracle.sysman.oii.oiif.oiifw.OiifwAddNodePhaseWCDE.doOperation(OiifwAddNodePhaseWCDE.java:313)
    at oracle.sysman.oii.oiif.oiifb.OiifbCondIterator.iterate(OiifbCondIterator.java:171)
    at oracle.sysman.oii.oiic.OiicPullSession.doOperation(OiicPullSession.java:1380)
    at oracle.sysman.oii.oiic.OiicSessionWrapper.doOperation(OiicSessionWrapper.java:294)
    at oracle.sysman.oii.oiic.OiicInstaller.run(OiicInstaller.java:579)
    at oracle.sysman.oii.oiic.OiicInstaller.runInstaller(OiicInstaller.java:969)
    at oracle.sysman.oii.oiic.OiicInstaller.main(OiicInstaller.java:906)
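Because the original hang left nothing obvious in the logs, it can help to check the addNode log explicitly for heap errors after each attempt. The following is a minimal Python sketch under stated assumptions: the log directory you pass in (for example the logs directory under your central inventory) and the addNodeActions*.log naming pattern are assumptions to adjust for your environment, and find_oom_lines and latest_log are hypothetical helper names.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Class name that appears in heap-exhaustion messages like the one above.
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(log_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for every line reporting an OutOfMemoryError."""
    return [
        (n, line.strip())
        for n, line in enumerate(log_text.splitlines(), start=1)
        if OOM_MARKER in line
    ]


def latest_log(log_dir: str, pattern: str = "addNodeActions*.log") -> Optional[Path]:
    """Return the most recently modified file in log_dir matching pattern (assumed naming)."""
    logs = sorted(Path(log_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None
```

A typical call would be log = latest_log(inventory_logs_dir) followed by find_oom_lines(log.read_text(errors="replace")), where inventory_logs_dir is a placeholder for your own inventory's logs path. An empty result means no OutOfMemoryError was logged, which, as the first attempts here showed, does not by itself rule out a heap problem.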
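The stack trace shows the heap being exhausted while OUI walks the Oracle home directory tree (DirectoryMap.processDir, called from RACTransfer.createDirListing) to build the listing it copies to the new nodes, so presumably a home with more files needs more heap. I increased the memory again, this time to 1024m (JRE_MEMORY_OPTIONS=" -mx1024m"), and retried. This time it completed successfully, and I was able to continue with the remaining tasks for extending the RAC database home.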
What is odd is that this happened in production, even though I had already completed the same step three times in non-production with the default Java heap of 140m. The story of a DBA's life: no matter how many times you practice ahead of time, you can still run into unforeseen issues, and of course it usually happens in production. 🙂
Hey Alfredo, thank you very much!
Interestingly, I had added nodes 5-6 times before in the exact same environment, and all of those attempts were successful. Only this time did I hit the issue, but then it persisted on all the nodes.
Thanks again!