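VIP fails to start
Description: our environment is a 2-node RAC, and node 1 went down because of a hardware failure.
I wanted to start node 1's VIP on node 2, so that running on a single node would be transparent to the user applications.
[oracle@UNID02 ~]$ crs_start ora.unid01.vip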
Attempting to start `ora.unid01.vip` on member `UNID02`
Start of `ora.unid01.vip` on member `UNID02` failed.
CRS-1006: No more members to consider
CRS-0215: Could not start resource "ora.unid01.vip".
[oracle@UNID02 ~]$
The start failed with CRS-1006: No more members to consider.
The VIP log (under $CRS_HOME/log/<NODENAME>/racg) shows an error related to the network interface:
2013-12-10 09:50:26.877: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: checkIf: interface eth0 is down
Invalid parameters, or failed to bring up VIP (host=UNID02) ==============================>
2013-12-10 09:50:26.877: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip start unid01
2013-12-10 09:50:26.877: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: rc = 1, time = 3.130s
2013-12-10 09:50:30.010: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip check unid01
2013-12-10 09:50:30.010: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: clsrcexecut: rc = 1, time = 3.130s
2013-12-10 09:50:30.010: [ RACG][2345793472] [16627][2345793472][ora.unid01.vip]: end for resource = ora.unid01.vip, action = start, status = 1, time = 6.280s
2013-12-10 01:17:41.966: [ COMMCRS][1472985408]clsc_receive: (0x2aaaac1428c0) error 2
2013-12-10 09:50:23.702: [ CRSRES][1538058560] startRunnable: setting CLI values
2013-12-10 09:50:23.705: [ CRSRES][1538058560] Attempting to start `ora.unid01.vip` on member `UNID02`
2013-12-10 09:50:30.012: [ CRSAPP][1538058560] StartResource error for ora.unid01.vip error code = 1
2013-12-10 09:50:33.198: [ CRSRES][1538058560] Start of `ora.unid01.vip` on member `UNID02` failed.
2013-12-10 09:50:33.204: [ CRSRES][1538058560] CRS-1006: No more members to consider
Checking with srvctl shows that UNID02-vip is bound to eth2, while unid01-vip is bound to eth0.
[oracle@UNID02 ~]$ srvctl config nodeapps -n UNID02 -a -g -s -l
VIP exists.: /UNID02-vip/10.0.15.176/255.255.255.0/eth2
GSD exists.
ONS daemon exists.
Listener exists.
[oracle@UNID02 ~]$ srvctl config nodeapps -n unid01 -a -g -s -l
VIP exists.: /unid01-vip/10.0.15.175/255.255.255.0/eth0
GSD exists.
ONS daemon exists.
Listener exists.
[oracle@UNID02 ~]$
ifconfig shows that eth0 is not up:
[root@UNID02 bin]# ifconfig
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C2
inet addr:192.168.127.102 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:53 errors:0 dropped:0 overruns:0 frame:0
TX packets:43 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8246 (8.0 KiB) TX bytes:6848 (6.6 KiB)
Interrupt:122 Memory:d8000000-d8012800
eth2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.172 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5778770 errors:0 dropped:0 overruns:0 frame:0
TX packets:2798242 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1493987596 (1.3 GiB) TX bytes:1004608379 (958.0 MiB)
Interrupt:130 Memory:da000000-da012800
eth2:1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.176 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:130 Memory:da000000-da012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:921339 errors:0 dropped:0 overruns:0 frame:0
TX packets:921339 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
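RX bytes:417953992 (398.5 MiB) TX bytes:417953992 (398.5 MiB)
[root@UNID02 bin]#
I asked the system engineer, who told me that this machine's public IP used to be on the eth0 NIC; eth0 later failed and the public network was switched to eth2. That explains it: unid01-vip is still configured for eth0, which is no longer up on UNID02.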
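The comparison done by eye above (the interface in the srvctl output versus the interfaces ifconfig lists) can be sketched as a small shell check. This is only an illustration: the function names `vip_interface` and `interface_is_up` are made up for this example, and the list of up interfaces is typed in by hand from the ifconfig output rather than collected automatically.

```shell
#!/bin/sh
# Sketch: pull the interface name out of a "VIP exists." line as printed by
# srvctl config nodeapps, then see whether it is in a list of up interfaces.
# Both function names are hypothetical helpers, not part of any Oracle tool.

# vip_interface LINE: print the last "/"-separated field (for example eth0).
vip_interface() {
    printf '%s\n' "$1" | awk -F/ '{ print $NF }'
}

# interface_is_up IFNAME "LIST": succeed if IFNAME is one of the
# space-separated names in LIST (exact, fixed-string match).
interface_is_up() {
    printf '%s\n' "$2" | tr ' ' '\n' | grep -qxF "$1"
}

# Values copied from the srvctl and ifconfig output shown above.
line='VIP exists.: /unid01-vip/10.0.15.175/255.255.255.0/eth0'
up_interfaces='eth1 eth2 lo'

ifname=$(vip_interface "$line")
if interface_is_up "$ifname" "$up_interfaces"; then
    echo "$ifname is up"
else
    echo "$ifname is not up, so the VIP cannot be started on it"
fi
```

With the values from this incident the check takes the second branch, which matches the "interface eth0 is down" message in the racg log.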
There are two solutions:
1. Change unid01-vip to use eth2
[root@UNID02 ~]$ srvctl modify nodeapps -n unid01 -A 10.0.15.175/255.255.255.0/eth2
Start it again; this time the start succeeds.
[oracle@UNID02 ~]$ crs_start ora.unid01.vip
Attempting to start `ora.unid01.vip` on member `UNID02`
Start of `ora.unid01.vip` on member `UNID02` succeeded.
2. Because crs_start calls the racgvip script to start the VIP, you can set the environment variables yourself and then run sh racgvip start ora.unid01.vip directly
[root@UNID02 ~]# export _USR_ORA_VIP=10.0.15.175
[root@UNID02 ~]# export _USR_ORA_NETMASK=255.255.255.0
[root@UNID02 ~]# export _USR_ORA_IF=eth2
[root@UNID02 ~]# export _CAA_NAME=ora.unid01.vip
[root@UNID02 bin]# sh -x racgvip start ora.unid01.vip
+ IFCONFIG=/sbin/ifconfig
+ GREP=/bin/grep
+ SED=/bin/sed
+ RM=/bin/rm
+ MV=/bin/mv
+ UNIQ=/usr/bin/uniq
+ PING=/bin/ping
+ WC=/usr/bin/wc
+ NETSTAT=/bin/netstat
+ AWK=/bin/awk
+ WHOAMI=/usr/bin/whoami
+ CAT=/bin/cat
+ UNAME=/bin/uname
+ SLEEP=/bin/sleep
+ SORT=/bin/sort
+ EXPR=/usr/bin/expr
+ DATE=/bin/date
+ RENICE=/usr/bin/renice
+ MIITOOL=/sbin/mii-tool
+ ARPING=/sbin/arping
+ IPCMD="/sbin/ip -f inet"
+ LANG=C
+ LC_ALL=C
+ export LANG LC_ALL
+ FAIL_WHEN_ALL_LINK_DOWN=1
+ FAIL_WHEN_DEFAULTGW_NOT_FOUND=1
+ DEFAULTGW=
+ /usr/bin/renice -20 -p 15145
++ /bin/hostname
+ HOSTNAME=UNID02
+ PING_TIMEOUT="-w 3 -c 1"
+ PING_COUNT=10
+ LOCKED=0
+ CRS_STAT=/bin/crs_stat
+ CHECK_TIMES=2
+ SUCCESS=0
+ ERROR=1
+ DEFAULT_TIMEOUT=60
+ IP=10.0.15.175
+ MASK=255.255.255.0
+ IF=eth2
+ OP=start
++ /usr/bin/whoami
+ USER=root
++ uname
+ [[ Linux != Linux ]]
+ listif_result=
+ "[" root "!=" root -a start "!=" list "]"
+ "[" -n 10.0.15.175 -a -n 255.255.255.0 "]"
++ IFS=.
++ set 10 0 15 175 255 255 255 0
++ echo 10.0.15.255
+ BROADCAST=10.0.15.255
+ logx "Broadcast = 10.0.15.255"
+ "[" -n "" "]"
+ "[" start = list "]"
++ echo ora.unid01.vip
++ /bin/sed "-es/^ora.//;s/.vip$//"
+ VIP_NAME=unid01
+ NAME=ora.unid01.vip
+ "[" -z ora.unid01.vip "]"
+ IF_USING=
+ "[" -n 10.0.15.175 "]"
+ logx Checking interface existance
+ "[" -n "" "]"
+ logx "Calling getifbyip"
+ "[" -n "" "]"
++ getifbyip 10.0.15.175
++ __LOCAL_IP=10.0.15.175
++ gf_retif=
++ logx "getifbyip: started for 10.0.15.175"
++ "[" -n "" "]"
+++ /sbin/ip -f inet -o addr
+++ /bin/grep "inet 10.0.15.175/"
+++ /bin/awk "{ print $NF }"
++ gf_retif=
++ logx "getifbyip: returning IP "
++ "[" -n "" "]"
++ "[" -z "" "]"
+ LI=
+ logx Completed getifbyip
+ "[" -n "" "]"
+ logx "Calling getifbyip -a"
+ "[" -n "" "]"
++ getifbyip 10.0.15.175 -a
++ __LOCAL_IP=10.0.15.175
++ gf_retif=
++ logx "getifbyip: started for 10.0.15.175"
++ "[" -n "" "]"
+++ /sbin/ip -f inet -o addr
+++ /bin/grep "inet 10.0.15.175/"
+++ /bin/awk "{ print $NF }"
++ gf_retif=
++ logx "getifbyip: returning IP "
++ "[" -n "" "]"
++ "[" -z -a "]"
++ "[" -n "" "]"
+ LI_A=
+ logx Completed getifbyip
+ "[" -n "" "]"
+ "[" "" "!=" "" "]"
+ echo ""
+ /bin/grep -q :
+ "[" 1 -ne 0 "]"
+ "[" start = stop "]"
+ ping_vip 10.0.15.175
+ logx "ping_vip 10.0.15.175 started"
+ "[" -n "" "]"
+ "[" -n 10.0.15.175 "]"
+ _count=1
+ "[" 1 -le 10 "]"
+ /bin/ping 10.0.15.175 -w 3 -c 1
+ "[" 1 -ne 0 "]"
+ logx "ping_vip: 10.0.15.175 is not pingable, _count = 1"
+ "[" -n "" "]"
+ return 1
+ "[" 1 -eq 0 "]"
+ logx "Completed with initial interface test"
+ "[" -n "" "]"
+ case $OP in
+ "[" start = check "]"
+ "[" start = check "]"
+ "[" -n 10.0.15.175 -a -n 255.255.255.0 -a -n eth2 "]"
+ "[" -n "" "]"
+ logx "Interface tests"
+ "[" -n "" "]"
++ echo eth2
++ /bin/sed "-es/|/ /g"
+ IF=eth2
+ for I in "$IF"
+ "[" eth2 = "" "]"
+ checkIf eth2
+ _IF=eth2
+ _RET=0
+ _LINK_STAT=
+ logx "checkIf: start for if=eth2"
+ "[" -n "" "]"
+ "[" -z eth2 "]"
+ /sbin/ifconfig eth2
+ /bin/grep -q -w UP
+ "[" 0 -ne 0 "]"
+ "[" -x /sbin/mii-tool "]"
++ /sbin/mii-tool eth2
+ _LINK_STAT="eth2: negotiated 100baseTx-FD flow-control, link ok"
+ "[" 0 -eq 0 "]"
+ echo "eth2: negotiated 100baseTx-FD flow-control, link ok"
+ /bin/grep -q "link ok"
+ "[" 0 -eq 0 "]"
+ logx "checkIf: mii-tool checked if=eth2 ok"
+ "[" -n "" "]"
+ _RET=0
+ "[" -z "eth2: negotiated 100baseTx-FD flow-control, link ok" "]"
+ "[" 0 -eq 1 "]"
+ logx "checkIf: end for if=eth2"
+ "[" -n "" "]"
+ return 0
+ "[" 0 -eq 0 "]"
+ getnextli eth2
+ _LOCAL_IF=eth2
+ nextli=
+ _LIN=
+ logx "getnextli: started for if=eth2"
+ "[" -n "" "]"
++ listif
++ logx "listif: starting"
++ "[" -n "" "]"
++ "[" -z "" "]"
+++ /sbin/ip -f inet -o addr
++ /bin/grep eth2:
++ /bin/sed "-es/^.*://"
+++ /bin/awk "{ print $NF }"
++ /bin/sort -n
+++ /bin/grep -vw lo
++ listif_result="eth1
eth2
eth2:1"
++ logx "listif: completed with eth1
eth2
eth2:1"
++ "[" -n "" "]"
++ echo "eth1
eth2
eth2:1"
+ _LIN=1
+ i=1
+ "[" 1 -le 256 "]"
+ _found=0
+ for j in "${_LIN}"
+ "[" 1 -eq 0 "]"
+ "[" 1 -eq 1 "]"
+ _found=1
+ break
+ "[" 1 -eq 0 "]"
+ i=2
+ "[" 2 -le 256 "]"
+ _found=0
+ for j in "${_LIN}"
+ "[" 1 -eq 0 "]"
+ "[" 2 -eq 1 "]"
+ "[" 0 -eq 0 "]"
+ get_lock eth2_2
+ TOUCH=/bin/touch
+ LS=/bin/ls
+ KILL=/bin/kill
+ LOCK=/var/tmp/vip_eth2_2_UNID02.lock
+ /bin/touch /var/tmp/vip_eth2_2_UNID02.lock.15145
+ "[" 0 -ne 0 "]"
++ /bin/ls /var/tmp/vip_eth2_2_UNID02.lock.15145
++ /usr/bin/wc -l
+ "[" 1 -eq 1 "]"
+ logx "get_lock: lock file /var/tmp/vip_eth2_2_UNID02.lock.15145 is created"
+ "[" -n "" "]"
+ LOCKED=1
+ return 0
+ "[" 0 -eq 0 "]"
+ listif_result=
+ listif
+ logx "listif: starting"
+ "[" -n "" "]"
+ "[" -z "" "]"
++ /sbin/ip -f inet -o addr
+ /bin/grep -w eth2:2
++ /bin/awk "{ print $NF }"
++ /bin/grep -vw lo
+ listif_result="eth1
eth2
eth2:1"
+ logx "listif: completed with eth1
eth2
eth2:1"
+ "[" -n "" "]"
+ echo "eth1
eth2
eth2:1"
+ "[" 1 -ne 0 "]"
+ break
+ "[" 2 -eq 256 "]"
+ nextli=eth2:2
+ logx "getnextli: completed with nextli=eth2:2"
+ "[" -n "" "]"
+ return 2
+ LI=eth2:2
+ /sbin/ifconfig eth2:2 10.0.15.175 netmask 255.255.255.0 broadcast 10.0.15.255 up
+ "[" 0 -ne 0 "]"
+ logx "Success exit 1"
+ "[" -n "" "]"
+ "[" -n "" "]"
+ /sbin/arping -q -U -c 3 -I eth2 10.0.15.175
+ release_lock
+ "[" 1 = 1 "]"
+ /bin/rm -f /var/tmp/vip_eth2_2_UNID02.lock.15145
+ logx "release_lock: remove lock file /var/tmp/vip_eth2_2_UNID02.lock.15145"
+ "[" -n "" "]"
+ LOCKED=0
+ exit 0 -- the return value is 0, so the start succeeded
[root@UNID02 bin]# ifconfig
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C2
inet addr:192.168.127.102 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:53 errors:0 dropped:0 overruns:0 frame:0
TX packets:43 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8246 (8.0 KiB) TX bytes:6848 (6.6 KiB)
Interrupt:122 Memory:d8000000-d8012800
eth2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.172 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5778770 errors:0 dropped:0 overruns:0 frame:0
TX packets:2798242 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1493987596 (1.3 GiB) TX bytes:1004608379 (958.0 MiB)
Interrupt:130 Memory:da000000-da012800
eth2:1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.176 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:130 Memory:da000000-da012800
eth2:2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.15.175 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:130 Memory:da000000-da012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:921339 errors:0 dropped:0 overruns:0 frame:0
TX packets:921339 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:417953992 (398.5 MiB) TX bytes:417953992 (398.5 MiB)
[root@UNID02 bin]#
ifconfig shows that the VIP has started successfully.
If it is still not working, return the outputs. Then refer to the script /u01/app/11.2.0/grid/bin/racgvip.
If there is more than one interface, remove the cable from the interface on which the VIP is set and run the check action; the VIP should move to another interface.
# 1. becomes root user
# 2. set environment variables
# - _USR_ORA_VIP for VIP address
# - _USR_ORA_NETMASK for netmask address
# - _USR_ORA_IF for interface names, they are separated by "|" character
# - _CAA_NAME for the VIP resource name, ora.<nodename>.vip
# 3. Test list command
# # sh racgvip list
# 4. Test start command
# # sh racgvip start
# # echo $?
# # ifconfig (to check if the VIP is set)
# 5. Test check command
# # sh racgvip check
# # echo $?
# 6. Test stop command
# # sh racgvip stop
# # echo $?
# # ifconfig (to check if the VIP is unset)
# 7. If there is more than one interfaces, remove the cable on the interface
# which VIP is set and run check action, the VIP should be set to another
# interface.
# Note: if cables are pulled from all interfaces or there is only one
# interface, VIP will stay on the original interface and
# the script returns success. This behavior is to keep VIP resource
# from failover if there is a network brown out.
#
# # sh racgvip check
# # echo $?
# # ifconfig (to check if the VIP is set to another interface)
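Steps 2 to 5 of the header above can be strung together in a small wrapper. This is a sketch under assumptions, not something shipped with Clusterware: the function name `run_racgvip_test` and its argument order are invented here, and it calls the script with only the action argument, as the header does.

```shell
#!/bin/sh
# Sketch of steps 2-5 from the racgvip header comment.
# run_racgvip_test is a hypothetical helper; run it as root (step 1).
#
# Usage: run_racgvip_test SCRIPT VIP NETMASK IFACES RESOURCE
#   IFACES may list several interfaces separated by "|".
run_racgvip_test() {
    _script=$1
    # The subshell keeps the exported variables out of the caller's environment.
    (
        # Step 2: the four variables named in the header.
        _USR_ORA_VIP=$2
        _USR_ORA_NETMASK=$3
        _USR_ORA_IF=$4
        _CAA_NAME=$5
        export _USR_ORA_VIP _USR_ORA_NETMASK _USR_ORA_IF _CAA_NAME

        # Steps 4 and 5: start, then check, stopping at the first failure.
        for action in start check; do
            sh "$_script" "$action"
            rc=$?
            echo "racgvip $action returned $rc"
            [ "$rc" -eq 0 ] || exit "$rc"
        done
    )
}

# Example call with the values used in this post (not executed here):
# run_racgvip_test "$CRS_HOME/bin/racgvip" 10.0.15.175 255.255.255.0 eth2 ora.unid01.vip
```

The wrapper returns the exit status of the first failing action, so `echo $?` after it gives the same information as the per-step `echo $?` in the header.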
The VIP is brought up by /u01/app/11.2.0/grid/bin/racgvip. The script checks the status of the interface; if the interface is down, the VIP cannot be brought up.
Reviewing /u01/app/11.2.0/grid/bin/racgvip shows the check:
if [ -z "$_IF" ]
then
    echo "checkIf: interface name is NULL"
    return 1
fi
# check if the interface is up
$IFCONFIG $_IF | $GREP -q -w UP
if [ $? -ne 0 ]
then
    echo "checkIf: interface $_IF is down"
    return 1
fi
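To try the same test without touching a real interface, the check can be rewritten to read interface text from standard input. The sketch below is an adaptation for illustration only: `if_text_is_up` is a made-up name, and the second sample line is a hypothetical flag line for an interface that is not up.

```shell
#!/bin/sh
# Sketch of the checkIf logic, reading ifconfig-style text from stdin
# instead of running ifconfig. if_text_is_up is a hypothetical name.

# if_text_is_up NAME: succeed when the text on stdin contains the UP flag
# as a whole word; NAME is only used in the messages.
if_text_is_up() {
    if [ -z "$1" ]; then
        echo "checkIf: interface name is NULL"
        return 1
    fi
    if ! grep -q -w UP; then
        echo "checkIf: interface $1 is down"
        return 1
    fi
    return 0
}

# Flag line copied from the eth2 output earlier in this post.
printf '%s\n' 'UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1' \
    | if_text_is_up eth2 && echo "eth2 passes the check"

# Hypothetical flag line without UP, standing in for the failed eth0.
printf '%s\n' 'BROADCAST MULTICAST MTU:1500 Metric:1' \
    | if_text_is_up eth0 || true
```

The second call prints the same "checkIf: interface eth0 is down" message that appeared in the racg log.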