Welcome 微信登录

首页 / 数据库 / MySQL / Oracle监听不定期出现异常故障处理

环境:
操作系统为:SunOS 5.10
数据库版本:Oracle RAC 11.2.0.3.0
该主机上有2个库,一套RAC的节点1在上面,还有另外一个库,之所以使用的是DB下的监听而没用GRID下的监听,是为了避免停掉CRS时影响另外一个库的使用。故障现象:
数据库监听不定期出现异常 ,从应用tnsping数据库,时间花费很长甚至连不上故障分析处理过程:从应用主机tnsping如下:racdb1_scenemon$tnsping racdb_new
 
TNS Ping Utility for Solaris: Version 11.2.0.3.0 - Production on 19-2月 -2014 15:46:23
 
Copyright (c) 1997, 2011, Oracle. All rights reserved.
 
已使用的参数文件:已使用 TNSNAMES 适配器来解析别名
 尝试连接 (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 11.111.11.1)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 11.111.11.2)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5))))
 
OK (60000 毫秒)Oracle监听之动态监听与静态监听特点 http://www.linuxidc.com/Linux/2013-09/89702.htmOracle 11g RAC 环境下单实例非缺省监听及端口配置 http://www.linuxidc.com/Linux/2013-08/88936.htmOracle 监听器日志配置与管理 http://www.linuxidc.com/Linux/2013-08/88935.htmOracle错误- ORA-12514:TNS:无监听程序 http://www.linuxidc.com/Linux/2013-06/85878.htmORA-12514 监听错误解决 http://www.linuxidc.com/Linux/2012-12/76049.htmOracle监听器出现的6种连接问题及其解决方法 http://www.linuxidc.com/Linux/2012-12/75568.htmOracle LISTENER 未监听到Oracle实例问题解决 http://www.linuxidc.com/Linux/2012-05/60910.htm从数据库服务器上查看监听状态:$ lsnrctl status
 
LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 19-FEB-2014 15:45:58
 
Copyright (c) 1991, 2011, Oracle. All rights reserved.
 
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
 
TNS-12535: TNS:operation timed out
 TNS-12560: TNS:protocol adapter error
 TNS-00505: Operation timed out   将监听进程予以kill并重启监听:
$ ps -ef |grep tns
 oracle 18504 18468 0 15:45:15 pts/10 0:00 grep tns
 oracle 24037 1 0 Nov 28 ? 6:08 /oracle/app/asm/11.2.0/grid/bin/tnslsnr LISTENER -inherit
 oracle 9488 1 3 Jan 24 ? 4447:11 /oracle/app/db/product/11.2.0/db/bin/tnslsnr LISTENER -inherit
 $
 $ kill -9 9488
 $ lsnrctl start
 
LSNRCTL for Solaris: Version 11.2.0.3.0 - Production on 19-FEB-2014 15:48:22
 
Copyright (c) 1991, 2011, Oracle. All rights reserved.
 
Starting /oracle/app/db/product/11.2.0/db/bin/tnslsnr: please wait...
 
TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
 Log messages written to /oracle/app/db/product/11.2.0/db/log/diag/tnslsnr/racdb/listener/alert/log.xml
 Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=racdb)(PORT=1521)))
 
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER
 Version TNSLSNR for Solaris: Version 11.2.0.3.0 - Production
 Start Date 19-FEB-2014 15:48:22
 Uptime 0 days 0 hr. 0 min. 0 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Log File /oracle/app/db/product/11.2.0/db/log/diag/tnslsnr/racdb/listener/alert/log.xml
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=racdb)(PORT=1521)))
 The listener supports no services
 The command completed successfully    检查listener.log如下:
17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41603)) * establish * racdb * 0
 17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41659)) * establish * racdb * 12518
 TNS-12518: TNS:listener could not hand off client connection
 TNS-12560: TNS:protocol adapter error
 TNS-00530: Protocol adapter error
 Solaris Error: 24: Too many open files <<<<<<<<<
 17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41652)) * establish * racdb * 0
 17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41669)) * establish * racdb * 0
 17-FEB-2014 16:02:26 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=racdb)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=180)(DELAY=5))(CID=(PROGRAM=sqlplus@racdb1)(HOST=racdb1)(USER=tmn))) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.111.11.89)(PORT=41682)) * establish * racdb * 12518
 TNS-12518: TNS:listener could not hand off client connection
 TNS-12560: TNS:protocol adapter error
 TNS-00530: Protocol adapter error
 Solaris Error: 24: Too many open files <<<<<<<<<<Too many open files意味着Maximum Number Of Open Files Per Process 达到了上限。因此listener hang住的原因可能是该limit设置过小,进一步查看硬件限制如下:
$ ulimit -Ha
 time(seconds) unlimited
 file(blocks) unlimited
 data(kbytes) unlimited
 stack(kbytes) unlimited
 coredump(blocks) unlimited
 nofiles(descriptors) 65536
 memory(kbytes) unlimited查看软件限制如下:
$ ulimit -Sa
 time(seconds) unlimited
 file(blocks) unlimited
 data(kbytes) unlimited
 stack(kbytes) 8192
 coredump(blocks) unlimited
 nofiles(descriptors) 256
 memory(kbytes) unlimited查看/etc/system如下:
*ident  "@(#)system   1.18    97/06/27 SMI" /* SVR4 1.5 */** SYSTEM SPECIFICATION FILE* 
* moddir:**     Set the search path for modules.  This has a format similar to the*     csh path variable. If the module isn"t found in the first directory*     it tries the second and so on. The default is /kernel /usr/kernel**     Example:*             moddir: /kernel /usr/kernel /other/modules    
* root device and root filesystem configuration:**     The following may be used to override the defaults provided by*     the boot program:**     rootfs:       Set the filesystem type of the root.**     rootdev:        Set the root device.  This should be a fully*                     expanded physical pathname.  The default is the*                     physical pathname of the device where the boot*                     program resides.  The physical pathname is*                     highly platform and configuration dependent.**     Example:*             rootfs:ufs*             rootdev:/sbus@1,f8000000/esp@0,800000/sd@3,0:a**     (Swap device configuration should be specified in /etc/vfstab.)    
* exclude:**     Modules appearing in the moddir path which are NOT to be loaded,*     even if referenced. Note that `exclude" accepts either a module name,*     or a filename which includes the directory.**     Examples:*             exclude: win*             exclude: sys/shmsys    
* forceload:**     Cause these modules to be loaded at boot time, (just before mounting*     the root filesystem) rather than at first reference. Note that*     forceload expects a filename which includes the directory. Also*     note that loading a module does not necessarily imply that it will*     be installed.**     Example:*             forceload: drv/foo    
* set:**     Set an integer variable in the kernel or a module to a new value.*     This facility should be used with caution.  See system(4).**     Examples:**     To set variables in "unix":**             set nautopush=32*             set maxusers=40**     To set a variable named "debug" in the module named "test_module"**             set test_module:debug = 0x13 
* Begin FJSVssf (do not edit)set ftrace_atboot = 1set kmem_flags = 0x100set kmem_lite_maxalign = 8192* End FJSVssf (do not edit)forceload: drv/fjpfca* Begin MDD root info (do not edit)rootdev:/pseudo/md@0:0,0,blk* End MDD root info (do not edit)*** Begin EMCpower added lines *** DO NOT EDIT BELOW THIS LINE ***forceload: drv/emcpsfforceload: drv/sdforceload: drv/ssdforceload: drv/emcpforceload: misc/emcpgpxforceload: misc/emcpmpxforceload: misc/emcpvlumdforceload: misc/emcpxcryptforceload: misc/emcpdmforceload: misc/emcpiocset emcp:bPxEnableInit=1*** End EMCpower added lines *** DO NOT EDIT ABOVE THIS LINE ***   exec_user_stack = 1                           set noexec_user_stack_log = 1查看/etc/project如下:
system:0::::user.root:1::::noproject:2::::default:3::::group.staff:10::::user.oracle:100::oracle::
综上所述,从listener.log的输出看,nofiles的soft limit过小,只有256,这会导致Solaris Error: 24: Too many open files 错误。从/etc/system和/etc/project,我们没有看到设置了soft limit。 
故障处理小结及后续建议将oracle用户的soft limit提升为至少1024,然后重新oracle用户登录,检验ulimit合格后,重新启动数据库和监听。 具体解决办法如下:1、在/etc/system增加以下行set rlim_fd_max=65536set rlim_fd_cur=40962、重新登录ORACLE并检验oracle用户的限制su – oracleulimit -Haulimit –Sa3、重新启动数据库和监听更多Oracle相关信息见Oracle 专题页面 http://www.linuxidc.com/topicnews.aspx?tid=12本文永久更新链接地址