| [cssd(8796)]CRS-1611:node XXdb1 (1) at 75% heartbeat fatal, eviction in 14.118 seconds |
| 2014-07-04 22:49:38.556 [cssd(8796)]CRS-1611:node XXdb1 (1) at 75% heartbeat fatal, eviction in 13.128 seconds |
| 2014-07-04 22:49:46.561 [cssd(8796)]CRS-1610:node XXdb1 (1) at 90% heartbeat fatal, eviction in 5.128 seconds |
| 2014-07-05 03:00:08.142 [cssd(8812)]CRS-1605:CSSD voting file is online: /dev/raw/raw18. Details in /home/Oracle/product/10.2.0/crs/log/XXdb2/cssd/ocssd.log. |
| 2014-07-04 23:00:00.018 [cssd(27561)]CRS-1612:node XXdb2 (2) at 50% heartbeat fatal, eviction in 29.144 seconds |
| 2014-07-04 23:00:15.017 [cssd(27561)]CRS-1611:node XXdb2 (2) at 75% heartbeat fatal, eviction in 14.144 seconds |
| 2014-07-04 23:00:24.014 [cssd(27561)]CRS-1610:node XXdb2 (2) at 90% heartbeat fatal, eviction in 5.144 seconds |
| 2014-07-04 23:00:25.016 [cssd(27561)]CRS-1610:node XXdb2 (2) at 90% heartbeat fatal, eviction in 4.144 seconds |
| 2014-07-05 01:21:06.620 [cssd(31191)]CRS-1605:CSSD voting file is online: /dev/raw/raw18. Details in /home/oracle/product/10.2.0/crs/log/XXdb1/cssd/ocssd.log. |
| 2014-06-24 14:53:21.258 [crsd(8825)]CRS-5504:Node down event reported for node "tsrrac02". |
| 2014-06-24 14:53:21.259 [crsd(8825)]CRS-2773:Server "tsrrac02" has been removed from pool "ora.crmout". |
| 2014-06-24 14:53:21.259 [crsd(8825)]CRS-2773:Server "tsrrac02" has been removed from pool "Generic". |
| $ crsctl get css diagwait |
| Configuration parameter diagwait is not defined. |
| Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions (Doc ID 559365.1) <== This setting will provide more time for diagnostic data to be collected safely and will NOT increase the probability of corruption. |
| OPROCD checks whether a node is hung; when it detects a hang, it reboots the node. It takes two important parameters: oprocd.debug -t 1000 -m 500 |
| timeout value (-t <to-millisec>): the interval between checks, 1000 ms (1 s) by default. |
| margin (-m <margin-millisec>): the tolerated delay, 500 ms (0.5 s) by default. |
| Every to-millisec (1 s), OPROCD reads the OS time and subtracts the OS time it read on the previous check. If the difference exceeds to-millisec + margin-millisec, OPROCD concludes that the OS is hung and initiates a reboot. In short, with the two defaults unchanged, OPROCD treats the OS as hung if it cannot read the OS time for more than 1.5 s. |
| Setting diagwait to 13 s raises margin-millisec to 10 s, so up to 11 s (1 s + 10 s) may elapse between OS time readings. |
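
The OPROCD check described above comes down to one comparison, which can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic, not Oracle's implementation: the names `OprocdConfig`, `is_hung`, `DEFAULT` and `DIAGWAIT_13` are hypothetical, and the two configurations simply encode the `-t`/`-m` values quoted in the note (1000/500 by default, 1000/10000 once diagwait is 13).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OprocdConfig:
    """Hypothetical container for the two oprocd parameters (milliseconds)."""
    timeout_ms: int = 1000  # -t: interval between checks
    margin_ms: int = 500    # -m: tolerated delay on top of the interval

    @property
    def threshold_ms(self) -> int:
        # Largest gap between two OS time readings that is still accepted.
        return self.timeout_ms + self.margin_ms


def is_hung(prev_ms: int, now_ms: int, cfg: OprocdConfig) -> bool:
    """Return True when the gap between two readings exceeds timeout + margin."""
    return (now_ms - prev_ms) > cfg.threshold_ms


# Values taken from the text: defaults, and the margin after diagwait is set to 13.
DEFAULT = OprocdConfig()
DIAGWAIT_13 = OprocdConfig(timeout_ms=1000, margin_ms=10000)

if __name__ == "__main__":
    for gap_ms in (1000, 1500, 1501, 11000, 11001):
        print(gap_ms, is_hung(0, gap_ms, DEFAULT), is_hung(0, gap_ms, DIAGWAIT_13))
```

Under this model, a 5-second stall would count as a hang with the default settings but would be tolerated once the margin is 10 s, which is why raising diagwait leaves more time for diagnostic data to be written before a reboot.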