
Please note that the problems I faced may not be the same ones that exist in your environment, so please test before applying the same steps I followed to solve them.

Wednesday 12 February 2014

Prerequisites for installing 11g RAC/Grid Infrastructure Release 2 on the Solaris 64-bit operating system

Creating Job Role Separation Operating System Privileges Groups and Users:
Run the following commands:

/usr/sbin/groupadd -g 1020 asmadmin

/usr/sbin/groupadd -g 1022 asmoper

/usr/sbin/groupadd -g 1021 asmdba

/usr/sbin/groupadd -g 1032 oper

/usr/sbin/useradd -u 1100 -g oinstall -G dba grid

/usr/sbin/useradd -u 1101 -g oinstall -G dba,asmdba oracle
/usr/sbin/usermod -g oinstall -G dba,asmdba oracle
mkdir -p  /app/11.2.0/grid
mkdir -p /app/grid
mkdir -p /app/oracle 
chown grid:oinstall /app/11.2.0
chown grid:oinstall /app/11.2.0/grid
chown grid:oinstall /app/grid
chown oracle:oinstall /app/oracle
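To confirm that the users, groups, and directories were created as intended, a quick check such as the following can be used (a minimal sketch; it assumes the oinstall and dba groups already exist on the system):

id -a grid        # should show primary group oinstall and secondary group dba
id -a oracle      # should show primary group oinstall and secondary groups dba and asmdba
ls -ld /app/11.2.0 /app/11.2.0/grid /app/grid /app/oracle   # ownership should match the chown commands above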

Configuring Shell Limits:

Shell Limit        Recommended Value
STACK              32768 (minimum)
NOFILES            4096 (minimum)
DATA               1048576 (minimum)
VMEMORY            16777216 (can be increased)
TIME               -1 (unlimited)
FILE               -1 (unlimited)
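As a rough check, the current limits can be displayed with ulimit from the grid and oracle users' login shells. This is only a sketch; the usual flag mapping is stack -s, open files -n, data -d, virtual memory -v, CPU time -t, and file size -f, and some flags may not be available in every shell:

ulimit -s    # STACK
ulimit -n    # NOFILES
ulimit -d    # DATA
ulimit -v    # VMEMORY
ulimit -t    # TIME
ulimit -f    # FILE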










Software Requirements List for Solaris Operating System (SPARC 64-Bit) Platforms:

Item
Requirements
Operating System, Packages and patches for Oracle Solaris 11
Oracle Solaris 11 (11/2011 SPARC) or later, for Oracle Grid
Infrastructure release 11.2.0.3 or later.
pkg://solaris/developer/build/make
pkg://solaris/developer/assembler
No special kernel parameters or patches are required at this time.
Operating System, Packages and Patches for Oracle Solaris 10
Oracle Solaris 10 U6 (5.10-2008.10)
SUNWarc
SUNWbtool
SUNWcsl
SUNWhea
SUNWi1cs (ISO8859-1)
SUNWi15cs (ISO8859-15)
SUNWi1of
SUNWlibC
SUNWlibm
SUNWlibms
SUNWsprot
SUNWtoo
SUNWxwfnt
119963-14: SunOS 5.10: Shared Library Patch for C++
120753-06: SunOS 5.10: Microtasking libraries (libmtsk) patch
139574-03: SunOS 5.10
141414-02
141414-09 (11.2.0.2 or later)
146808-01 (for Solaris 10 U9 or earlier)
Database Smart Flash Cache (An Enterprise Edition only feature.)
The following patches are required for Oracle Solaris (SPARC
64-Bit) if you are using the flash cache feature:
125555-03
139555-08
140796-01
140899-01
141016-01
141414-10
141736-05
IPMI
The following patches are required only if you plan to configure Failure Isolation using IPMI on SPARC systems:
137585-05 or later (IPMItool patch)
137594-02 or later (BMC driver patch)
Oracle RAC
Oracle Clusterware is required; Oracle Solaris Cluster is supported
for use with Oracle RAC on SPARC. If you use Oracle Solaris Cluster 3.2, then you must install the following additional kernel
packages and patches:
SUNWscucm 3.2.0: 126106-40
VERSION=3.2.0,REV=2006.12.05.22.58 or later
125508-08
125514-05
125992-04
126047-11
126095-05
126106-33
Note: You do not require the additional packages if you are using
Oracle Clusterware only, without Oracle Solaris Cluster.
If you use a volume manager, then you may need to install additional kernel packages.
Packages and patches for Oracle Solaris Cluster
Note: You do not require Oracle Solaris Cluster to install Oracle
Clusterware.
For Oracle Solaris 11, Oracle Solaris Cluster 4.0 is the minimum supported Oracle Solaris Cluster version.
For Oracle Solaris 10, Oracle Solaris Cluster 3.3 or later
UDLM (optional):
ORCLudlm 64-Bit reentrant 3.3.4.10
CAUTION: If you install the ORCLudlm package, then it is detected
automatically and used. Install ORCLudlm only if you want to use the
UDLM interface for your Oracle RAC cluster. Oracle recommends
with Oracle Solaris Cluster 3.3 and later that you use the native
cluster membership functionality provided with Oracle Solaris
Cluster.
For more information, refer to Section 2.8, "Oracle Solaris Cluster
Configuration on SPARC Guidelines."
For Oracle Solaris Cluster on SPARC, install UDLM onto each node
in the cluster using the patch Oracle provides in the Grid_
home/clusterware/udlm directory before installing and configuring
Oracle RAC. Although you may have a functional version of the
UDLM from a previous Oracle Database release, you must install
the Oracle 11g release 2 (11.2) 3.3.4.10 UDLM.
Oracle Messaging Gateway
Oracle Messaging Gateway supports the integration of Oracle Streams Advanced Queuing (AQ) with the following software:
IBM MQSeries V6 (6.6.0), client and server
Tibco Rendezvous 7.2
Pro*C/C++,
Oracle Call Interface,
Oracle C++ Call Interface,
Oracle XML Developer's Kit
(XDK)
Oracle Solaris Studio 12 (formerly Sun Studio) (C and C++ 5.9)
119963-14: SunOS 5.10: Shared library patch for C++
124863-12 C++ SunOS 5.10 Compiler Common patch for Sun C C++ (optional)

Oracle ODBC Driver
gcc 3.4.2
Open Database Connectivity (ODBC) packages are only needed if
you plan on using ODBC. If you do not plan to use ODBC, then you
do not need to install the ODBC packages for Oracle Clusterware,
Oracle ASM, or Oracle RAC.

Programming languages for Oracle RAC database
 Pro*COBOL
Micro Focus Server Express 5.1
 Pro*FORTRAN
Oracle Solaris Studio 12 (Fortran 95)
Download at the following URL:
http://www.oracle.com/technetwork/server-storage/solarisstudio/overview/index.html
Oracle JDBC/OCI Drivers
You can use the following optional JDK versions with the Oracle
JDBC/OCI drivers, however they are not required for the
installation:
 JDK 6 Update 20 (JDK6 - 1.6.20) or later
 JDK 5 (1.5.0_24) or later
Note: JDK 6 is the minimum level of JDK supported on Oracle
Solaris 11.
SSH
Oracle Clusterware requires SSH. The required SSH software is the default SSH shipped with your operating system.
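Before running the installer, the presence of the packages and patches listed above can be verified with the native tools. The following is only a sketch; the package and patch IDs are the ones from the table above:

# Oracle Solaris 10: check packages and installed patches
pkginfo -i SUNWarc SUNWbtool SUNWcsl SUNWhea SUNWi1cs SUNWi15cs SUNWi1of SUNWlibC SUNWlibm SUNWlibms SUNWsprot SUNWtoo SUNWxwfnt
showrev -p | grep 119963    # repeat for each required patch number

# Oracle Solaris 11: check the build packages
pkg list developer/build/make developer/assembler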


Operating System Kernel Requirements:

Verifying UDP and TCP Kernel Parameters
/usr/sbin/ndd -set /dev/tcp tcp_smallest_anon_port 9000
/usr/sbin/ndd -set /dev/tcp tcp_largest_anon_port 65500
/usr/sbin/ndd -set /dev/udp udp_smallest_anon_port 9000
/usr/sbin/ndd -set /dev/udp udp_largest_anon_port 65500
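The current values can be checked before and after the change by running ndd without -set (a sketch; note that on Oracle Solaris 11 these properties are normally managed with ipadm rather than ndd):

/usr/sbin/ndd /dev/tcp tcp_smallest_anon_port
/usr/sbin/ndd /dev/tcp tcp_largest_anon_port
/usr/sbin/ndd /dev/udp udp_smallest_anon_port
/usr/sbin/ndd /dev/udp udp_largest_anon_port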



Storage Requirements:
Item
Requirement
/tmp
To be increased to 3 GB.
Swap
If RAM is between 2.5 GB and 16 GB, then swap space equal to the size of RAM; if RAM is more than 16 GB, then 16 GB of swap.
3 volume groups
5 GB each. Run the following for each disk, where dsk1ocr is an example disk name:
chown grid:asmadmin dsk1ocr
chmod 660 dsk1ocr
Alternatively, a cluster file system can be used instead.
/app
To be increased to 150 GB.
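A quick way to confirm the figures above before starting the installation (a sketch; output and device names will differ per environment):

df -h /tmp /app                     # free space in /tmp and /app
/usr/sbin/swap -s                   # configured swap summary
/usr/sbin/prtconf | grep Memory     # physical RAM size, used to size swap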


Checking Resource Limits for Solaris
On Solaris platforms, the /etc/pam.conf file controls and limits resources for users on the system. On login, resources should be controlled and limited for users so that they cannot perform denial of service attacks.
By default, PAM resource limits are not set for Solaris operating systems. To ensure that resource limits are honored, add the following line to the login service section of /etc/pam.conf: login auth required pam_dial_auth.so.1
For example:
# login service (explicit because of pam_dial_auth)
#
login auth requisite pam_authtok_get.so.1
login auth required pam_dhkeys.so.1
login auth required pam_unix_cred.so.1
login auth required pam_unix_auth.so.1
login auth required pam_dial_auth.so.1



Network Time Protocol Setting
Oracle Clusterware requires the same time zone setting on all cluster nodes. During installation, the installation process picks up the time zone setting of the Grid installation owner on the node where OUI runs, and uses that on all nodes as the default TZ setting for all processes managed by Oracle Clusterware. This default is used for databases, Oracle ASM, and any other managed processes.

You have two options for time synchronization: an operating system configured network time protocol (NTP), or Oracle Cluster Time Synchronization Service. Oracle Cluster Time Synchronization Service is designed for organizations whose cluster servers are unable to access NTP services. If you use NTP, then the Oracle Cluster Time Synchronization daemon (ctssd) starts up in observer mode. If you do not have NTP daemons, then ctssd starts up in active mode and synchronizes time among cluster members without contacting an external time server.

On Oracle Solaris Cluster systems, Oracle Solaris Cluster software supplies a template file called ntp.cluster (see /etc/inet/ntp.cluster on an installed cluster host) that establishes a peer relationship between all cluster hosts. One host is designated as the preferred host. Hosts are identified by their private host names. Time synchronization occurs across the cluster interconnect. If Oracle Clusterware detects that either the Oracle Solaris Cluster NTP or an outside NTP server is set as the default NTP server in the system, in the /etc/inet/ntp.conf or the /etc/inet/ntp.conf.cluster file, then CTSS is set to observer mode.

Note: Before starting the installation of the Oracle Grid Infrastructure, Oracle recommends that you ensure the clocks on all nodes are set to the same time. If you have NTP daemons on your server but you cannot configure them to synchronize time with a time server, and you want to use Cluster Time Synchronization Service to provide synchronization service in the cluster, then deactivate and deinstall the NTP.
To disable the NTP service, run the following command as the root user:
# /usr/sbin/svcadm disable ntp
When the installer finds that the NTP protocol is not active, the Cluster Time Synchronization Service is installed in active mode and synchronizes the time across the nodes.
To confirm that ctssd is active after installation, enter the following command as the
Grid installation owner:
$ crsctl check ctss
If you are using NTP, and you prefer to continue using it instead of Cluster Time
Synchronization Service, then you need to modify the NTP initialization file to enable
slewing, which prevents time from being adjusted backward. Restart the network time
protocol daemon after you complete this task. To do this on Oracle Solaris without Oracle Solaris Cluster, edit the /etc/inet/ntp.conf file to add "slewalways yes" and "disable pll" to the file. After you make these changes, restart ntpd (on Oracle Solaris 11) or xntpd (on Oracle Solaris 10) using the command /usr/sbin/svcadm restart ntp.
To do this on Oracle Solaris 11 with Oracle Solaris Cluster 4.0, edit the /etc/inet/ntp.conf.sc file to add "slewalways yes" and "disable pll" to the file. After you make these changes, restart ntpd or xntpd using the command /usr/sbin/svcadm restart ntp.
To do this on Oracle Solaris 10 with Oracle Solaris Cluster 3.2, edit the /etc/inet/ntp.conf.cluster file in the same way.
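For example, on a node without Oracle Solaris Cluster, the additions described above would look like the following (a sketch of the two lines to append to /etc/inet/ntp.conf, followed by the service restart):

slewalways yes
disable pll

# after editing the file, restart the NTP service
/usr/sbin/svcadm restart ntp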
To enable NTP after it has been disabled, enter the following command:
# /usr/sbin/svcadm enable ntp


Automatic SSH Configuration During Installation
To install Oracle software, Secure Shell (SSH) connectivity should be set up between all cluster member nodes. OUI uses the ssh and scp commands during installation to run remote commands on and copy files to the other cluster nodes. You must configure SSH so that these commands do not prompt for a password.

Note: SSH is used by Oracle configuration assistants for configuration operations from local to remote nodes. It is also used by Oracle Enterprise Manager. You can configure SSH from the OUI interface during installation for the user account running the installation. The automatic configuration creates passwordless SSH connectivity between all cluster member nodes. Oracle recommends that you use the automatic procedure if possible.
To enable the script to run, you must remove stty commands from the profiles of any Oracle software installation owners, and remove other security measures that are triggered during a login, and that generate messages to the terminal. These messages, mail checks, and other displays prevent Oracle software installation owners from using the SSH configuration script that is built into the Oracle Universal Installer. If they are not disabled, then SSH must be configured manually before an installation
can be run.
By default, OUI searches for SSH public keys in the directory /usr/local/etc/, and ssh-keygen binaries in /usr/local/bin. However, on Oracle Solaris, SSH public keys typically are located in the path /etc/ssh, and ssh-keygen binaries are located in the path /usr/bin. To ensure that OUI can set up SSH, use the following command to
create soft links:
# ln -s /etc/ssh /usr/local/etc
# ln -s /usr/bin /usr/local/bin
In rare cases, Oracle Clusterware installation may fail during the "AttachHome" operation when the remote node closes the SSH connection. To avoid this problem, set the following parameter in the SSH daemon configuration file /etc/ssh/sshd_config on all cluster nodes to set the timeout wait to unlimited:
LoginGraceTime 0
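Note that after changing /etc/ssh/sshd_config, the SSH service typically has to be restarted for the new LoginGraceTime value to take effect; on Solaris this can be done through SMF (a sketch):

# /usr/sbin/svcadm restart ssh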

Network Requirements:
1. SCAN VIP IPs:
   a. Three IP addresses are needed for the SCAN VIPs, with a single name covering all of them.
   b. The SCAN is a domain name registered to three IP addresses in the domain name service (DNS). The SCAN name must be set up to round robin across the three IP addresses, which requires SCAN name resolution via DNS.
   c. Cluster Name: portalcluster
   d. SCAN Name: Portalcrsscan, which will resolve to the three IP addresses.
2. One IP address for the single database.
3. Private interconnect:
   a. Each interface on a single machine has to be on a different VLAN.
   b. The interface name should be identical on all nodes and within the same VLAN.
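To check that the SCAN name actually resolves to the three addresses in a round-robin fashion, resolve it a few times from any node; the addresses shown below are placeholders, not values from this environment:

$ nslookup Portalcrsscan
# expected (hypothetical) output:
#   Name:    Portalcrsscan.yourdomain.com
#   Address: 10.0.0.21
#   Address: 10.0.0.22
#   Address: 10.0.0.23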

Issues that you may face during 11.2.0.2  installation:
·       Having both search and domain entries present in the file "/etc/resolv.conf" with the same value will cause the cluster verify utility to fail while checking the prerequisites. You can confirm that everything is working properly by running the cluster verify utility before the installation.
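One way to avoid this failure is to keep only one of the two entries in /etc/resolv.conf; a hypothetical example (the domain and name server values are placeholders):

search      example.com
nameserver  10.0.0.5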
·       Swap space should be handled carefully and as per the Oracle documentation requirements; for example, if you assign 16 GB of memory, then swap space should be at least 16 GB. The cluster verify utility calculates the needed swap and, when run in verbose mode, tells you whether the assigned swap space must be changed.

·       Please note that "project.max-shm-memory" represents the maximum shared memory available for a project, so the value of this parameter should be greater than the sum of all SGA sizes for all databases in your environment.
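The current value can be checked and, if necessary, raised with the Solaris resource-control commands; the project name user.oracle and the 32 GB figure below are illustrative assumptions only:

# check the limit in effect for the oracle user's project
prctl -n project.max-shm-memory -i project user.oracle

# raise it persistently (choose a value larger than the sum of all SGAs)
projmod -sK "project.max-shm-memory=(privileged,32G,deny)" user.oracle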

·       The last step of the cluster verify utility failed with the error below:

Errors & issues that appeared during the last step of installing Grid Infrastructure:

INFO: Liveness check failed for "xntpd"
INFO: Check failed on nodes:
INFO: svprtldb02,svprtldb01
INFO: PRVF-5494 : The NTP Daemon or Service was not alive on all nodes
INFO: PRVF-5415 : Check to see if NTP daemon or service is running failed
INFO: Clock synchronization check using Network Time Protocol(NTP) failed
INFO: PRVF-9652 : Cluster Time Synchronization Services check failed
INFO: Checking VIP configuration.
INFO: Checking VIP Subnet configuration.
INFO: Check for VIP Subnet configuration passed.
INFO: Checking VIP reachability
INFO: Check for VIP reachability passed.
INFO: Starting check for The SSH LoginGraceTime setting ...
INFO: WARNING:
INFO: PRVE-0038 : The SSH LoginGraceTime setting on node "svprtldb02" may result in users being disconnected before login is completed
INFO: PRVE-0038 : The SSH LoginGraceTime setting on node "svprtldb01" may result in users being disconnected before login is completed

The solution was to disable NTP during the installation and then enable it again after the installation, after which the CTSS process will run in observer mode. Also remove or rename /etc/ntp.conf or /etc/xntp.conf on all nodes and check that no NTP daemon is running.
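On each node this amounts to roughly the following (a sketch; on Solaris the NTP configuration file is typically /etc/inet/ntp.conf, and the backup file name is arbitrary):

# stop the NTP service and set the configuration file aside
/usr/sbin/svcadm disable ntp
mv /etc/inet/ntp.conf /etc/inet/ntp.conf.original

# confirm that no NTP daemon (ntpd/xntpd) is still running
pgrep -l ntp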

After disabling the NTP process, run the below:

grid@svprtldb01:/apps/grid/clusterverify/bin$ ./cluvfy comp clocksync -n svprtldb01,svprtldb02

Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed


Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP


Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP configuration file "/etc/inet/ntp.conf" existence check passed
Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
Check of common NTP Time Server passed
Clock time offset check passed

Clock synchronization check using Network Time Protocol(NTP) passed


Oracle Cluster Time Synchronization Services check passed

Verification of Clock Synchronization across the cluster nodes was successful.



·       I got the errors below: the disks mounted on one node but refused to mount on the other node. The first question that came to my mind was whether the disk permissions were set correctly on all nodes, but the permissions were fine.

Errors & issues:
======================================================
Sat Feb 08 23:23:52 2014
Starting background process RSMN
Sat Feb 08 23:23:52 2014
RSMN started with pid=34, OS id=12977
Sat Feb 08 23:23:52 2014
Sweep [inc][217]: completed
Sweep [inc][177]: completed
Sweep [inc2][217]: completed
Sweep [inc2][177]: completed
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Sat Feb 08 23:23:54 2014
ALTER DATABASE MOUNT
NOTE: Loaded library: System
ORA-15025: could not open disk "/dev/rdsk/c3d17s0"
ORA-27041: unable to open file
SVR4 Error: 13: Permission denied
Additional information: 9
SUCCESS: diskgroup SYSTEM_WEB was mounted
Errors in file /app/oracle/diag/rdbms/webprd/WEBPRD2/trace/WEBPRD2_ckpt_12943.trc (incident=
4177):
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [
], [], [], []
Incident details in: /app/oracle/diag/rdbms/webprd/WEBPRD2/incident/incdir_4177/WEBPRD2_ckpt_12943_i4177.trc

To solve it, implement the solution provided in the Oracle note ORA-00600 [kfioTranslateIO03] [17090] (Doc ID 1336846.1).

·       After the installation completed successfully, I noticed that the ora.crf process was not started on both nodes. It seems this was a bug in 11.2.0.2, and applying patch set update 12 solved the issue.

Errors & Issues:
================================================
root@svprtldb02:/app/grid/bin# ./crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE svprtldb02 Started
ora.crf
1 ONLINE OFFLINE
ora.crsd
1 ONLINE ONLINE svprtldb02
ora.cssd
1 ONLINE ONLINE svprtldb02
ora.cssdmonitor
1 ONLINE ONLINE svprtldb02
ora.ctssd
1 ONLINE ONLINE svprtldb02 OBSERVER
ora.diskmon
1 ONLINE ONLINE svprtldb02
ora.evmd
1 ONLINE ONLINE svprtldb02
ora.gipcd
1 ONLINE ONLINE svprtldb02
ora.gpnpd
1 ONLINE ONLINE svprtldb02
ora.mdnsd
1 ONLINE ONLINE svprtldb02

After the patch set update was implemented successfully, the issue was fixed.

·       After the successful installation of Grid Infrastructure and the databases, a node was rebooted for maintenance by the system admin team. After startup, the private interconnect configuration was gone and the admins had to set it up again.
Even after the interconnect was set up again and we started the node, it did not rejoin the cluster cleanly; this turned out to be a bug in 11.2.0.2 that was fixed in the 11.2.0.3 patch set updates.

Symptoms:

GIPCD
-----
Line 16: 2014-02-10 05:02:04.504: [GIPCDMON][7] gipcdMonitorSaveInfMetrics: inf[ 0] ipmp0 - rank -1, avgms 30000000000.000000 [ 0 / 0 / 0 ]
...
Line 63636: 2014-02-10 12:42:25.549: [GIPCDMON][7] gipcdMonitorSaveInfMetrics: inf[ 0] ipmp0 - rank -1, avgms 30000000000.000000 [ 0 / 0 / 0 ]

Oracle Support note regarding the issue:
Doc ID 1479380.1: 11gR2 GI Node May not Join the Cluster After Private Network is Functional After Eviction due to Private Network Problem

For the patches available in 11.2.0.3 and above, refer to Doc ID 1479380.1.



To work around this bug, follow the steps below (a sketch of step a) is shown after the list):
a)   On the surviving node, during non-peak time if possible, kill the gipcd.bin process (kill -15 <gipcd.bin ospid>).
NOTE: In 11.2 this will also lead to the death of the evmd.bin, crsd.bin and ctssd.bin processes. None of these processes are fatal; Clusterware will respawn all of them automatically.

b)  Once the gipcd.bin, evmd.bin, crsd.bin and ctssd.bin processes have been re-spawned on the surviving node, verify whether the other nodes join the cluster.

c)   Most of the time, GI will start, but in case it does not, restart GI on the other nodes with the crsctl command.

d)  Finally, if GI is still not starting, as a last resort, restart GI on the surviving node.
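A sketch of step a) on the surviving node (run as root; <gipcd.bin ospid> is the process ID reported by ps):

# find the gipcd.bin OS process ID
ps -ef | grep gipcd.bin | grep -v grep

# terminate it gracefully; Clusterware respawns gipcd.bin (and, in 11.2, evmd/crsd/ctssd) automatically
kill -15 <gipcd.bin ospid>

# afterwards, check the lower stack and whether the other nodes have joined
$GRID_HOME/bin/crsctl stat res -t -init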







General Notes:
To check RAC network issues, collect and review the following:

1) The results of:

$GRID_HOME/bin/srvctl config network

$GRID_HOME/bin/srvctl config nodeapps -a

$GRID_HOME/bin/srvctl config scan

2) The cluvfy utility output:

cluvfy stage -pre crsinst -n [nodelist] -verbose

References:
CTSSD Runs in Observer Mode Even Though No Time Sync Software is Running (Doc ID 1054006.1)
ORA-00600 [kfioTranslateIO03] [17090] (Doc ID 1336846.1)
How To Gather/Backup ASM Metadata In A Formatted Manner? (Doc ID 470211.1)
How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation (Doc ID 942166.1)


