
Please note that the problems I faced may not be the same as those in your environment, so please test before applying the steps I followed to solve each problem.

Tuesday, 8 February 2011

Stalled or Hung Grid 11gR2 Installation at 65% - Performing Remote Operations

If the installation hangs at the "performing remote operations" step while installing RAC, check your firewall and disable it with service iptables stop.
If the installation then continues, you are done; if not, deinstall and start the installation again.
The process responsible for transferring files between the nodes is ractrans.
You can check it using the command:
lsof -i
sshd       3131  root    3u  IPv6   9635       TCP rac2.localdomain:ssh->192.168.0.1:savant (ESTABLISHED)
sshd       3131  root    7u  IPv4   9676       TCP localhost.localdomain:x11-ssh-offset (LISTEN)
sshd       3131  root    8u  IPv6   9677       TCP [::1]:x11-ssh-offset (LISTEN)
ractrans  18253  grid    0u  IPv4 100913       TCP rac2.localdomain:44281->rac1.localdomain:42729 (ESTABLISHED)
ractrans  18253  grid    3u  IPv4 100912       TCP *:44281 (LISTEN)
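The firewall fix above can be sketched as a small script to run on each node; this is a minimal sketch assuming the RHEL/OEL 5-era service and chkconfig commands, guarded so it degrades gracefully on distros that lack them:

```shell
#!/bin/sh
# Sketch: stop iptables on this node so the installer's remote
# operations (ractrans) can connect, and keep it off across reboots.
# "service" and "chkconfig" are assumptions for RHEL/OEL 5-era systems.
planned="service iptables stop
chkconfig iptables off"

echo "$planned" | while read -r cmd; do
  tool=${cmd%% *}
  if command -v "$tool" >/dev/null 2>&1; then
    echo "running: $cmd"
    $cmd || echo "failed (are you root?): $cmd"
  else
    echo "not found, skipping: $cmd"
  fi
done
```

Run it as root on every node, then retry the installer; once the install finishes you can re-enable the firewall with the required ports opened instead of leaving it off.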

Monday, 24 January 2011

How to forcefully ‘deconfig’ Grid cluster configuration in 11gR2

I was installing 11gR2 RAC with Grid Infrastructure on a 2-node AIX cluster. I did all the steps (what I normally do on Solaris), but missed a few AIX-specific steps, and my root.sh failed.
# /grid/11.2.0/root.sh
Running Oracle 11g root.sh script…
The following environment variables are set as:
ORACLE_OWNER= ora11gr2
ORACLE_HOME= /grid/11.2.0
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file “dbhome” already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying dbhome to /usr/local/bin …
The file “oraenv” already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying oraenv to /usr/local/bin …
The file “coraenv” already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying coraenv to /usr/local/bin …

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2010-03-02 07:55:55: Parsing the host name
2010-03-02 07:55:55: Checking for super user privileges
2010-03-02 07:55:55: User has super user privileges
Using configuration parameter file: /grid/11.2.0/crs/install/crsconfig_params
Creating trace directory
User ora11gr2 is missing the following capabilities required to run CSSD in realtime:
CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE
To add the required capabilities, please run:
/usr/bin/chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE ora11gr2
CSS cannot be run in realtime mode at /grid/11.2.0/crs/install/crsconfig_lib.pm line 8119.
So root.sh returned the error and asked me to run the chuser command with the above options. After executing this command on both nodes, I ran root.sh again, but it failed with the message "Deconfigure the existing cluster configuration before starting".
bash-2.05b# /grid/11.2.0/root.sh
Running Oracle 11g root.sh script…
The following environment variables are set as:
ORACLE_OWNER= ora11gr2
ORACLE_HOME= /grid/11.2.0
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file “dbhome” already exists in /usr/local/bin. Overwrite it? (y/n) [n]:
The file “oraenv” already exists in /usr/local/bin. Overwrite it? (y/n) [n]:
The file “coraenv” already exists in /usr/local/bin. Overwrite it? (y/n) [n]:
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2010-03-02 08:07:32: Parsing the host name
2010-03-02 08:07:32: Checking for super user privileges
2010-03-02 08:07:32: User has super user privileges
Using configuration parameter file: /grid/11.2.0/crs/install/crsconfig_params
Improper Oracle Clusterware configuration found on this host
Deconfigure the existing cluster configuration before starting
to configure a new Clusterware
run ‘/grid/11.2.0/crs/install/rootcrs.pl -deconfig’
to configure existing failed configuration and then rerun root.sh
So I tried, but when I executed /grid/11.2.0/crs/install/rootcrs.pl -deconfig, it errored out saying it could not communicate with CRS and asked me to start CRS. The funny part is that CRS was not yet configured, so it just went around in circles.
In this scenario, the -force option together with -deconfig is very handy:
bash-2.05b# /grid/11.2.0/crs/install/rootcrs.pl -deconfig -force -verbose
2010-03-02 08:11:29: Parsing the host name
2010-03-02 08:11:29: Checking for super user privileges
2010-03-02 08:11:29: User has super user privileges
Using configuration parameter file: /grid/11.2.0/crs/install/crsconfig_params
PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.eons is registered
Cannot communicate with crsd
Failure at scls_scr_setval with code 8
Internal Error Information:
Category: -2
Operation: failed
Location: scrsearch3
Other: id doesnt exist scls_scr_setval
System Dependent Information: 2
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
Successfully deconfigured Oracle clusterware stack on this node
And finally, even though it could not communicate with CRS, it successfully deconfigured the Oracle Clusterware stack.

Friday, 21 January 2011

Performance Tip

Time and time again, DBAs spend time and energy tuning a component of their database that is not a top wait event, and are then surprised that their change did not make a big difference in performance. For example, a faster CPU does not help a system that is I/O-bound. Always examine the Top 5 Timed Events first :)

Thursday, 13 January 2011

Cluster verify utility fails the user equivalence test

./runcluvfy.sh stage -pre crsinst -fixup -n rac1.localdomain,rac2.localdomain -verbose

Performing pre-checks for cluster services setup

Checking node reachability...

Check: Node reachability from node "Rac1"
  Destination Node                      Reachable?
  ------------------------------------  ------------------------
  rac1                               yes
  rac2                               yes
Result: Node reachability check passed from node "rac1"


Checking user equivalence...

Check: User equivalence for user "grid"
  Node Name                             Comment
  ------------------------------------  ------------------------
  rac1                               failed
  rac2                               failed
Result: PRVF-4007 : User equivalence check failed for user "grid"

ERROR:
User equivalence unavailable on all the specified nodes
Verification cannot proceed


Pre-check for cluster services setup was unsuccessful on all the nodes.

Fix:
Say you have two nodes, rac1 and rac2, and the user oracle. You have created the ssh keys for oracle on both nodes.

On node: rac1

ssh rac2
ssh rac2.domain
ssh rac2-priv
ssh rac2-priv.domain
ssh rac1
ssh rac1.domain
ssh rac1-priv
ssh rac1-priv.domain
exec /usr/bin/ssh-agent $SHELL
/usr/bin/ssh-add

On node: rac2

ssh rac2
ssh rac2.domain
ssh rac2-priv
ssh rac2-priv.domain
ssh rac1
ssh rac1.domain
ssh rac1-priv
ssh rac1-priv.domain
exec /usr/bin/ssh-agent $SHELL
/usr/bin/ssh-add
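The combinations above can be generated instead of typed by hand; this is a sketch that builds the full target list for each node, using the node names, the -priv alias and the .localdomain suffix from this post (substitute your own names and domain), and echoes the ssh command for each so you can review before running:

```shell
#!/bin/sh
# Sketch: build every host/alias combination the user equivalence
# check exercises. "rac1", "rac2", "-priv" and ".localdomain" are
# the example names from this post; adjust for your cluster.
targets=""
for host in rac1 rac2; do
  for iface in "" "-priv"; do
    for dom in "" ".localdomain"; do
      targets="$targets ${host}${iface}${dom}"
    done
  done
done

# Visit each target once, answering "yes" to the host-key prompt.
for t in $targets; do
  echo "ssh $t date"   # replace echo with the real ssh call
done
```

Run the printed ssh commands (or drop the echo) from every node, including the node you are on, so each host key is accepted everywhere.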
Phew! Now all ssh combinations are covered and you can reach everywhere, including yourself. That is a lot of typing "yes", and it gets even worse as the node count increases, since each node has to reach all the others. I have had the equivalence check fail because I hadn't checked from a node back to itself, only to the other node.