VIP & IPC

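2010.01.01 9:08 PM » Author: bosonmaster »
Today at a customer site I upgraded the database from 10.2.0.3 to 10.2.0.4. The upgrade itself went smoothly, but when we tested pulling the network cable, the VIP failover was rather slow. I searched MetaLink and found the following note: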
Cause
This problem is caused by the first address in the listener.ora configuration being an address that uses the TCP protocol.
 
In this circumstance, when a network cable is pulled, "lsnrctl stop" listener has to wait for the TCP timeout before it can check the next address. On the Solaris platform, the TCP timeout is defined by tcp_ip_abort_cinterval with a default value of 180000 (3 minutes). That is why shutting down the listener took almost 3.5 minutes. (TCP timeout on other platforms may vary.) The error message "Solaris Error: 145: Connection timed out" in ora.node1.LISTENER_NODE1.lsnr.log also indicates it is waiting for the TCP timeout.
 
The listener.ora in this scenario is defined as:

LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.10.100)(PORT = 1521)(IP = FIRST))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
    )
  )

Solution
To prevent this, move the IPC address to be the first address for the listener in the listener.ora, e.g.:

LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.10.100)(PORT = 1521)(IP = FIRST))
      )
    )
  )

When lsnrctl tries to stop the listener, it will now connect to the IPC address first, which is available during that time. It will not have to wait for tcp timeout.
 
After the above change, the VIP failover only takes 48 to 50 seconds to complete regardless of the tcp_ip_abort_cinterval setting.
 
Please note, listener.ora files newly created from 10.2.0.3 to 11.1.0.7 should have the IPC protocol as the first address in listener.ora in most cases. However, if you have upgraded from a previous release, or manually modified/copied over a listener.ora from a previous install, you may not have the IPC protocol as the first address, regardless of your version. Manual modification is required to move the IPC protocol to be the first address to avoid the problem described in this note.
 
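In other words, the IPC protocol has to be listed first among the listener addresses. After making the change we tested again, and the failover time dropped from about 2 minutes to a little over 20 seconds, which meets the application's failover requirement.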
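
To check an existing listener.ora for this ordering before a failover test, something like the following could be used. It is only a sketch: the function name is made up for illustration, it assumes protocols are written as "(PROTOCOL = XXX)", and it reports only the first address in the whole file, so a file that defines several listeners still needs a manual look.

```shell
#!/bin/sh
# Sketch: report the protocol of the first ADDRESS in a listener.ora-style file.
# Assumptions: protocols are written as "(PROTOCOL = XXX)" and lines starting
# with "#" are comments. Only the first match in the file is reported.
first_listener_protocol() {
    awk '
        /^[ \t]*#/ { next }                       # skip comment lines
        {
            line = toupper($0)
            if (match(line, /PROTOCOL[ \t]*=[ \t]*[A-Z]+/)) {
                value = substr(line, RSTART, RLENGTH)
                sub(/^PROTOCOL[ \t]*=[ \t]*/, "", value)
                print value                       # e.g. TCP or IPC
                exit
            }
        }
    ' "$1"
}

# Hypothetical usage; the path is a placeholder, not taken from the post:
# [ "$(first_listener_protocol "$ORACLE_HOME/network/admin/listener.ora")" = "IPC" ] \
#     || echo "first listener address is not IPC"
```

If the function prints TCP, the listener is in the slow configuration described in the note and the IPC entry should be moved up by hand.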

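Looking back on 2009, looking ahead to 2010

2009.12.28 9:33 PM » Author: bosonmaster »
Today I received the birthday email the company sends out, and it struck me that 2009 is almost over. Was it a busy year? Sort of. My technical skills barely improved, I did not save much money, and the house I have been dreaming of keeps drifting further out of reach. Housing prices seem to be riding a rocket, shooting straight up. The government keeps saying it will rein in prices, but who knows where that control ended up. With one record-priced "land king" plot after another, can prices really be held down? A place a friend and I looked at last October was 9.6K at the time; phase two is now 15500 and already sold out. Today I also saw a news item saying that the net income of Chinese farmers exceeded 5000 in 2009. I have no idea how that was calculated. Go ask the farmers: did your net income really reach 5000? Either way, life goes on and I will keep renting. I do not know whether housing prices will come down in 2010, so let me dream for now. I hope my technical skills improve a bit in 2010, O(_)O~, and that I make some progress in my personal life too.

Finally, I wish everyone:

         Happy New Year's Day, and health and happiness in the new year!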

bosn

kcbz_check_objd_typ_3

2009.12.27 10:41 AM » Author: bosonmaster »
A customer system (Linux 4 x86_64, 10.2.0.3) has recently been reporting the following error:
Errors in file /u01/app/oracle/admin/lzrac/bdump/lzrac2_m001_26960.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
I checked the related MetaLink notes and the generated trace file.
Part of the trace file:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Current SQL statement for this session:
SELECT count(*) over () as total_count,        sd_xe_ash_nm.event_name,        sd_xe_ash_nm.event_id,        sd_xe_ash_nm.parameter1 as p1text,        (CASE WHEN (sd_xe_ash_nm.parameter1 is NULL                    OR                    sd_xe_ash_nm.parameter1 = '0')              THEN 0              ELSE 1         END) as p1valid,        sd_xe_ash_nm.parameter2 as p2text,        (CASE WHEN (sd_xe_ash_nm.parameter2 is NULL                    OR                    sd_xe_ash_nm.parameter2 = '0')              THEN 0              ELSE 1         END) as p2valid,        sd_xe_ash_nm.parameter3 as p3text,        (CASE WHEN (sd_xe_ash_nm.parameter3 is NULL                    OR                    sd_xe_ash_nm.parameter3 = '0')              THEN 0              ELSE 1         END) as p3valid,        sd_xe_ash_nm.keh_evt_id,        nvl(xc.class#, 0) as class_num,        sd_xe_ash_nm.wait_class_id,        nvl(xc.keh_id, 0) as keh_ecl_id,        sd_xe_ash_nm.ash_cnt,        sd_xe_ash_nm.tot_wts_diff,        sd_xe_ash_nm.tot_tmo_diff,       sd_xe_ash_nm.tim_wait_diff FROM   ( SELECT sd_xe_ash.*,        evtname.event_name, evtname.wait_class_id,        evtname.parameter1, evtname.parameter2, evtname.parameter3 FROM   ( SELECT sd_xe.*, nvl(ash.cnt, 0) as ash_cnt FROM   ( SELECT nvl(xe.keh_id, 0) as keh_evt_id,        nvl(sd.event_id, xe.event_hash) as event_id,        nvl(sd.tot_wts_diff, 0) as tot_wts_diff,        nvl(sd.tot_tmo_diff, 0) as tot_tmo_diff,        nvl(sd.tim_wait_diff, 0) as tim_wait_diff FROM   ( SELECT endsn.event_id as event_id,        (endsn.total_waits - nvl(begsn.total_waits,0)) as tot_wts_diff,        (endsn.total_timeouts - nvl(begsn.total_timeouts,0))        as tot_tmo_diff,        (endsn.time_waited_micro - nvl(begsn.time_waited_micro,0))        as tim_wait_diff FROM   ( SELECT end_snap.*          FROM  (SELECT t1.* FROM WRH$_SYSTEM_EVENT t1, WRM$_SNAPSHOT s1   WHERE  t1.dbid = s1.dbid AND t1.instance_number = s1.instance_number     AND  t1.snap_id = 
s1.snap_id AND s1.bl_moved = 0   UNION ALL   SELECT t2.* FROM WRH$_SYSTEM_EVENT_BL t2, WRM$_SNAPSHOT s2   WHERE  t2.dbid = s2.dbid AND t2.instance_number = s2.instance_number     AND  t2.snap_id = s2.snap_id AND s2.bl_moved <> 0) end_snap          WHERE  end_snap.dbid            = :dbid            and  end_snap.instance_number = :instance_number            and  end_snap.snap_id         = :end_snap ) endsn        LEFT OUTER JOIN        ( SELECT beg_snap.*          FROM  (SELECT t1.* FROM WRH$_SYSTEM_EVENT t1, WRM$_SNAPSHOT s1   WHERE  t1.dbid = s1.dbid AND t1.instance_number = s1.instance_number     AND  t1.snap_id = s1.snap_id AND s1.bl_moved = 0   UNION ALL   SELECT t2.* FROM WRH$_SYSTEM_EVENT_BL t2, WRM$_SNAPSHOT s2   WHERE  t2.dbid = s2.dbid AND t2.instance_number = s2.instance_number     AND  t2.snap_id = s2.snap_id AND s2.bl_moved <> 0) beg_snap          WHERE  beg_snap.dbid            = :dbid            and  beg_snap.instance_number = :instance_number            and  beg_snap.snap_id         = :beg_snap ) begsn        ON endsn.event_id = begsn.event_id  ) sd        FULL OUTER JOIN        X$KEHEVTMAP xe        ON sd.event_id = xe.event_hash  ) sd_xe        LEFT OUTER JOIN        (SELECT a.event_id,                count(*) as cnt         FROM  (SELECT t1.* FROM WRH$_ACTIVE_SESSION_HISTORY t1, WRM$_SNAPSHOT s1   WHERE  t1.dbid = s1.dbid AND t1.instance_number = s1.instance_number     AND  t1.snap_id = s1.snap_id AND s1.bl_moved = 0   UNION ALL   SELECT t2.* FROM WRH$_ACTIVE_SESSION_HISTORY_BL t2, WRM$_SNAPSHOT s2   WHERE  t2.dbid = s2.dbid AND t2.instance_number = s2.instance_number     AND  t2.snap_id = s2.snap_id AND s2.bl_moved <> 0) a         WHERE  a.dbid            =  :dbid           and  a.instance_number =  :instance_number           and  a.snap_id         >  :beg_snap           and  a.snap_id         <= :end_snap           and  a.wait_time       =  0         GROUP BY a.event_id) ash        ON sd_xe.event_id = ash.event_id  ) sd_xe_ash,  WRH$_EVENT_NAME 
evtname WHERE  evtname.event_id = sd_xe_ash.event_id   and  evtname.event_id > 0   and  evtname.dbid     = :dbid  ) sd_xe_ash_nm,        X$KEHECLMAP xc WHERE  sd_xe_ash_nm.wait_class_id = xc.class_hash ORDER  BY sd_xe_ash_nm.wait_class_id,           sd_xe_ash_nm.tim_wait_diff DESC,           sd_xe_ash_nm.event_id
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x111fc7790        10  package body SYS.PRVT_HDM
0x10eaf25d0        16  SYS.WRI$_ADV_HDM_T
0x119d97690      1535  package body SYS.PRVT_ADVISOR
0x119d97690      1618  package body SYS.PRVT_ADVISOR
0x111fc7790       106  package body SYS.PRVT_HDM
The MetaLink note describes the problem as follows:
Oracle Server - Enterprise Edition - Version: 10.2.0.2 to 10.2.0.3
This problem can occur on any platform.
 
Symptoms
Segment Advisor is being used.
 
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [4], [0], [15], [], [], [], []
 
With a stack trace similar to:
 
kgerinv kgeasnmierr kcbassertbd3 kcbz_check_objd_typ kcbzib kcbgtcr ktrget kdirfrs
 
ORA-600 [kcbnew_3] may be reported instead.
 
The PLSQL stack, if there is one, may have SYS.PRVT_ADVISOR or SYS.DBMS_SPACE near the top.
 
 
 
Cause
The cause of this problem has been identified and verified in an Unpublished Bug 4430244.
 
It is caused by the Segment Advisor code which can load blocks for dropped objects into cache as CURRENT leading to subsequent operations seeing an incorrect (old) version of a block.
 
Solution
This bug is fixed in our 10.2.0.4 patchset and 11g Release 1.
 
You can check if a patch is available for your patchset release and O/S environment: Patch 4430244
 
To obtain a patch from MetaLink:
1) Click on Patches.
2) Click on Simple Search.
3) Enter your Patch number : 4430244
4) Select your platform
5) Click Go.
6) Read any applicable notes before downloading, then click the Download button.
 
Note: Please review the Readme file for instructions on how to install the patchset.
 
To avoid the issue in the short-term, turn off Segment Advisor by disabling 'Automatic Segment Advisor Job'. This is an advisory job so there will be no harm caused to your database by turning it off.
 
If you are encountering this bug, flush the buffer cache with the following command:
SQL> alter system flush buffer_cache;
 
to remove old copies of the block.
If you are using RAC then this will need to be done on all instances.
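
The short-term workaround above can be sketched as a small helper that emits the SQL for review before it is piped to sqlplus on each instance. This is a sketch under assumptions: the function name is made up, and the job name SYS.AUTO_SPACE_ADVISOR_JOB is an assumption that does not come from the note, so confirm it in DBA_SCHEDULER_JOBS first.

```shell
#!/bin/sh
# Sketch: emit the SQL for the short-term workaround so it can be reviewed
# and then piped to sqlplus on every RAC instance.
# Assumption: the automatic job is registered as SYS.AUTO_SPACE_ADVISOR_JOB
# (check DBA_SCHEDULER_JOBS before relying on that name).
segment_advisor_workaround_sql() {
    cat <<'EOF'
-- Disable the Automatic Segment Advisor job (job name is an assumption)
exec dbms_scheduler.disable('SYS.AUTO_SPACE_ADVISOR_JOB');
-- Remove old copies of blocks from this instance's buffer cache
alter system flush buffer_cache;
EOF
}

# Hypothetical usage, run once per instance as an OS user allowed to
# connect as SYSDBA:
# segment_advisor_workaround_sql | sqlplus -s "/ as sysdba"
```

The flush is the step that has to be repeated on every instance. Disabling the job again on the other instances should be harmless.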

qksfroFXTStatsLoc

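2009.12.27 10:32 AM » Author: bosonmaster »
At a customer site, the OS was upgraded from 5305 to 5309 SP5, and then the database was upgraded from 10.2.0.3 to 10.2.0.4. The upgrade was quick and the database started without errors, so I left. In the morning I found the following errors in the alert log: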
Errors in file /u01/app/oracle/admin/dzcadb/bdump/dzcadb_ora_139828.trc:
ORA-00600: internal error code, arguments: [qksfroFXTStatsLoc() - unknown KQ], [0], [], [], [], [], [], []
Sat Dec 26 04:00:17 2009
Errors in file /u01/app/oracle/admin/dzcadb/bdump/dzcadb_ora_139828.trc:
ORA-00600: internal error code, arguments: [qksfroFXTStatsLoc() - unknown KQ], [0], [], [], [], [], [], []
Sat Dec 26 04:00:18 2009
Errors in file /u01/app/oracle/admin/dzcadb/bdump/dzcadb_ora_139828.trc:
ORA-00600: internal error code, arguments: [qksfroFXTStatsLoc() - unknown KQ], [0], [], [], [], [], [], []
WARNING: Oracle executable binary mismatch detected.
 
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
WARNING: Oracle executable binary mismatch detected.
 
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
Sat Dec 26 04:21:10 2009
Errors in file /u01/app/oracle/admin/dzcadb/bdump/dzcadb_ora_139872.trc:
ORA-07445: exception encountered: core dump [lstclo+0030] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFFF] [] []
 
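The errors make it obvious that the Oracle executables are the problem: because the operating system was upgraded, the Oracle executables have to be relinked.

The fix is simple: stop the listener and the database, run slibclean, then run relink all as the oracle user.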
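
After an OS upgrade like this one, the alert log can be scanned for the warning quoted above before users hit the ORA-00600. A minimal sketch, where the function name and the path in the usage lines are illustrative only and the warning text is copied from the log in this post:

```shell
#!/bin/sh
# Sketch: return success if an alert log contains the binary mismatch
# warning quoted in the post. The function name is illustrative.
has_binary_mismatch() {
    grep -q 'Oracle executable binary mismatch detected' "$1"
}

# Hypothetical usage; the path is a placeholder:
# if has_binary_mismatch /path/to/alert_SID.log; then
#     echo "stop the listener and database, run slibclean, then relink all"
# fi
```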

linux 5.4 for oracle rac vipca bug

2009.12.19 12:12 PM » Author: bosonmaster »
When installing 10g RAC on Linux AS 5 Update 4, running root.sh on node 2 ends with the following error:
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
/home/oracle/crs/oracle/product/10/crs/jdk/jre//bin/java: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory
Running vipca then reports the following error:
# vipca
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
 
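A MetaLink search shows that this is an Oracle bug.
Fix: edit vipca and srvctl, search for LD_ASSUME_KERNEL, and add the following line below it:
unset LD_ASSUME_KERNEL
Then configure the public and heartbeat (interconnect) networks:

<CRS_HOME>/bin # ./oifcfg setif -global eth0/192.168.1.0:public
<CRS_HOME>/bin # ./oifcfg setif -global eth1/10.10.10.0:cluster_interconnect
Then run vipca again and it completes OK.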
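
The edit to vipca and srvctl can also be scripted. The sketch below prints a patched copy rather than changing the file in place. It assumes the scripts set the variable on a line containing "export LD_ASSUME_KERNEL"; that pattern and the function name are assumptions, so compare the output with the original before swapping it in.

```shell
#!/bin/sh
# Sketch: print a patched copy of a vipca or srvctl script with
# "unset LD_ASSUME_KERNEL" added after each line that exports the variable.
# Assumption: the script sets it with a line containing
# "export LD_ASSUME_KERNEL"; if your copy differs, edit by hand.
add_unset_ld_assume_kernel() {
    if grep -q 'unset LD_ASSUME_KERNEL' "$1"; then
        cat "$1"                                  # already patched, leave as-is
    else
        awk '
            { print }
            /export[ \t]+LD_ASSUME_KERNEL/ { print "unset LD_ASSUME_KERNEL" }
        ' "$1"
    fi
}

# Hypothetical usage; review the result before replacing the original:
# add_unset_ld_assume_kernel vipca > vipca.patched
```

If the export sits inside an if block, the unset lands inside that block too, which still clears the variable whenever it was set.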