undo_retention and autoextend

2009.06.01 12:33 PM »Author: bosonmaster »
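Since 10g, undo is normally managed automatically. Our customers mostly run RAC on raw devices, so datafile autoextend is usually turned off. I recently noticed that even with undo_retention set, the retention actually in effect is still tuned automatically; this can be seen in the view column v$undostat.tuned_undoretention.
A search on METALINK shows this is related to Bug 5387030, which is fixed in 10.2.0.4. There are three temporary workarounds:
1 Turn on autoextend for the undo datafiles, but set maxsize
2 Set the hidden parameter: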
Alter system set "_smu_debug_mode" = 33554432;
3 Alter system set "_undo_autotune" = false;
Related document:
ID 420525.1
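
As a rough illustration of checking the tuned value and of workaround 1, the sketch below only prints SQL, so it can be reviewed before being piped to sqlplus. The function name undo_autotune_sql, the datafile path and the MAXSIZE value are placeholders chosen for this example and do not come from the note above; the statements rely on the standard v$undostat view and the ALTER DATABASE DATAFILE ... AUTOEXTEND syntax.

```shell
#!/bin/sh
# Print SQL for inspecting undo auto-tuning and for workaround 1.
# $1 = undo datafile path (placeholder), $2 = MAXSIZE value (placeholder).
undo_autotune_sql() {
    datafile=$1
    maxsize=$2
    cat <<EOF
-- Compare the configured retention with what the instance actually tuned.
show parameter undo_retention
select to_char(begin_time, 'YYYY-MM-DD HH24:MI') begin_time,
       tuned_undoretention, maxquerylen
  from v\$undostat
 order by begin_time;
-- Workaround 1: enable autoextend on the undo datafile, capped by MAXSIZE.
alter database datafile '${datafile}' autoextend on maxsize ${maxsize};
EOF
}
```

A possible use, assuming a local SYSDBA login is available: undo_autotune_sql '<undo_datafile>' 4G | sqlplus -s "/ as sysdba". If tuned_undoretention differs markedly from undo_retention, auto-tuning is in effect.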
 
The following SQL shows the undo usage percentage:
select ((select nvl(sum(bytes), 0)
           from dba_undo_extents
          where tablespace_name = '<current_undo_ts>'
            and status in ('ACTIVE', 'UNEXPIRED')) * 100) /
       (select sum(bytes)
          from dba_data_files
         where tablespace_name = '<current_undo_ts>') "PCT_INUSE"
  from dual;

PowerPath and hp-ux 11.31 ia

2009.05.25 11:28 PM »Author: bosonmaster »
When running PowerPath Version 5.1 on HP-UX 11i v3, note the following:

PowerPath Version 5.1 SP1 for HP-UX supports HP-UX 11i v3 only with minimum qualified level September 2008 HP-UX. If you are running HP-UX 11i v3 (March 2008), do one of the following:
1 Install the following patches from HP: PHKL_38053, PHKL_38054, PHCO_38047, PHCO_38064, PHCO_38071.
2 Upgrade to HP-UX 11i v3 (September 2008).

With CLARiiON arrays, PowerPath Version 5.1 SP1 on HP-UX 11i v3 supports only the ALUA failover mode. The PNR failover mode is not supported with HP-UX 11i v3.

HP-UX 11i v3 does not support iSCSI attached devices.

PowerPath Version 5.1 supports only legacy-style device special files (DSFs). New style DSFs are not supported.

By default, after PowerPath installation, legacy style addressing is enabled in HP-UX 11i v3. Do not explicitly disable legacy style addressing before or after installing PowerPath. Disabling legacy addressing will prevent PowerPath from seeing devices. Type insf -Lv to verify whether legacy mode is enabled.

PowerPath Version 5.1 disables native multipathing at the device level for legacy-style devices. When devices are configured for PowerPath (for example, when you run powermt config), PowerPath disables native multipathing on the devices it manages by setting the leg_mpath_enable attribute to false. Do not change this setting to enable native multipathing for legacy-style devices while PowerPath is installed. Uninstalling PowerPath re-enables multipathing for legacy-style devices (the default HP-UX setting). Refer to the HP-UX 11i v3 documentation for more information on legacy style addressing in HP-UX 11i v3.

In earlier releases of HP-UX, failed file system I/Os were retried indefinitely. HP-UX 11i v3 by default no longer retries failed I/Os indefinitely, but instead returns I/O errors to the file system after a finite number of retries. This results in file system error messages if all paths to a LUN are disabled manually or fail due to hardware problems. PowerPath does not alter this behavior on HP-UX 11i v3. You can restore the infinite retry on a per device basis using the HP-UX scsimgr command, as follows:
scsimgr set_attr -D /dev/rdisk/diskX -a infinite_retries_enable=true
Refer to the HP website and documentation for more information.

The dreaded BUG 5987262

2009.05.13 8:23 PM »Author: bosonmaster »

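One customer ran into this before, and now another one has, so the odds are evidently not small. A tablespace suddenly grows explosively: within just a few minutes it consumes all of its free space. Dumping the affected data blocks shows that they are all empty blocks. In both cases the customer's OS was HP-UX IA64. METALINK confirms this as BUG 5987262, base BUG 5890312.
It can occur on any platform from 9.2.0.8 through 10.2.0.4. The current fix is to apply patch p5890312. As a contingency, monitor tablespace usage more frequently and keep datafiles ready to add in an emergency.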

unpublished Bug 6955040

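2009.05.13 8:13 PM »Author: bosonmaster »
The customer reported that they could not connect to the database. Logging in to the hosts showed that the listener services on both nodes were down, that node 1's VIP had failed over to node 2, and that node 2's VIP had failed over to node 1. The following appeared in crsd.log and vip.log: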
2009-05-09 16:40:18.973: [  CRSAPP][270962] CheckResource error for ora.xxzbdb1.vip error code = 1
2009-05-09 16:40:18.979: [  CRSRES][270962] In stateChanged, ora.xxzbdb1.vip target is ONLINE
2009-05-09 16:40:18.979: [  CRSRES][270962] ora.xxzbdb1.vip on xxzbdb1 went OFFLINE unexpectedly
2009-05-09 16:40:18.980: [  CRSRES][270962] StopResource: setting CLI values
2009-05-09 16:40:18.983: [  CRSRES][270962] Attempting to stop `ora.xxzbdb1.vip` on member `xxzbdb1`
2009-05-09 16:40:19.696: [  CRSRES][270962] Stop of `ora.xxzbdb1.vip` on member `xxzbdb1` succeeded.
2009-05-09 16:40:19.697: [  CRSRES][270962] ora.xxzbdb1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2009-05-09 16:40:19.701: [  CRSRES][270962] ora.xxzbdb1.vip failed on xxzbdb1 relocating.
2009-05-09 16:40:19.737: [  CRSRES][270962] StopResource: setting CLI values
2009-05-09 16:40:19.740: [  CRSRES][270962] Attempting to stop `ora.xxzbdb1.LISTENER_xxZBDB1.lsnr` on member `xxzbdb1`
2009-05-09 16:41:34.664: [  CRSRES][270982] startRunnable: setting CLI values
2009-05-09 16:41:37.710: [  CRSRES][270962] Stop of `ora.xxzbdb1.LISTENER_xxZBDB1.lsnr` on member `xxzbdb1` succeeded.
2009-05-09 16:41:37.751: [  CRSRES][270962] Attempting to start `ora.xxzbdb1.vip` on member `xxzbdb2`
2009-05-09 16:41:44.752: [  CRSRES][270962] Start of `ora.xxzbdb1.vip` on member `xxzbdb2` succeeded.
2009-05-12 16:02:53.134: [  CRSRES][281319] xxzbdb1 : CRS-1018: Resource ora.xxzbdb1.vip (application) is already running on xxzbdb2
xxzbdb2 : CRS-1019: Resource ora.xxzbdb1.LISTENER_xxZBDB1.lsnr (application) cannot run on xxzbdb2
 
2009-05-09 16:40:18.952: [    RACG][1] [6140][1][ora.xazbdb1.vip]: Interface lan900 checked failed (host=xazbdb1)
Invalid parameters, or failed to bring up VIP (host=xazbdb1)
 
2009-05-09 16:40:18.961: [    RACG][1] [6140][1][ora.xazbdb1.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/oracle/product/10.2.0/crs_1
 
2009-05-09 16:40:18.961: [    RACG][1] [6140][1][ora.xazbdb1.vip]: clsrcexecut: cmd = /u01/app/oracle/product/10.2.0/crs_1/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/app/oracle/product/10.2.0/crs_1/bin/racgvip check xazbdb1
 
2009-05-09 16:40:18.963: [    RACG][1] [6140][1][ora.xazbdb1.vip]: clsrcexecut: rc = 1, time = 8.694s
 
2009-05-09 16:40:18.963: [    RACG][1] [6140][1][ora.xazbdb1.vip]: end for resource = ora.xazbdb1.vip, action = check, status = 1, time = 8.775s
 
2009-05-12 16:54:33.072: [    RACG][1] [26145][1][ora.xazbdb1.vip]: clsrcstartorp: Error with malloc
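
In a longer VIP log, the interface check failures can be pulled out with a small filter. The sketch below assumes the message layout seen in the excerpt above ("Interface <name> checked failed (host=<host>)"); the function name vip_failed_interfaces is made up for this example.

```shell
#!/bin/sh
# Read a racg VIP log on stdin and print "timestamp interface host"
# for every "Interface <name> checked failed" message.
vip_failed_interfaces() {
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*Interface \([^ ]*\) checked failed (host=\([^)]*\)).*/\1 \2 \3/p'
}
```

It reads from standard input, for example vip_failed_interfaces < <vip_log_file>, and prints nothing when no interface check has failed.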
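As the logs show, the VIP failed over because of a problem with network interface lan900, which points to the network as the cause. I asked the customer, and the network had indeed been touched on the afternoon of the 9th. The fix was simple: a restart was enough. The database alert.log contained the following: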
 
ALTER SYSTEM SET service_names='' SCOPE=MEMORY SID='xazb2';
Sat May  9 16:41:01 2009
Immediate Kill Session#: 739, Serial#: 954
Immediate Kill Session: sess: c00000049f54ef78  OS pid: 29108
Immediate Kill Session#: 740, Serial#: 2717
Immediate Kill Session: sess: c00000049f5504e0  OS pid: 29192
Immediate Kill Session#: 742, Serial#: 89
Immediate Kill Session: sess: c00000049f552fb0  OS pid: 29055
Immediate Kill Session#: 744, Serial#: 23
Immediate Kill Session: sess: c00000049f555a80  OS pid: 29127
Immediate Kill Session#: 745, Serial#: 25
Immediate Kill Session: sess: c00000049f556fe8  OS pid: 29173
Immediate Kill Session#: 746, Serial#: 22
Immediate Kill Session: sess: c00000049f558550  OS pid: 29093
Immediate Kill Session#: 748, Serial#: 28
Immediate Kill Session: sess: c00000049f55b020  OS pid: 29118
Immediate Kill Session#: 749, Serial#: 21
Immediate Kill Session: sess: c00000049f55c588  OS pid: 29249
Immediate Kill Session#: 750, Serial#: 50186
Immediate Kill Session: sess: c00000049f55daf0  OS pid: 29281
Immediate Kill Session#: 752, Serial#: 16380
Immediate Kill Session: sess: c00000049f5605c0  OS pid: 29264
Immediate Kill Session#: 753, Serial#: 58573
Immediate Kill Session: sess: c00000049f561b28  OS pid: 29104
......
 
A search on METALINK shows this is a bug, document ID 730315.1:
Cause
This is caused by unpublished Bug 6955040 ALL THE SESSIONS LOST CONNECTION AFTER KILLING CRSD.BIN.
 
The problem is when CRSD is killed or crashed and restarted, CRSD will run resource check action but CRS resource status will not be available at that time. Then in instance check action, it fails to get the preferred node VIP resource status and considered the preferred node VIP resource is not running. Therefore, instance check action will remove the default database service name and disconnect sessions connected using default database service name.
 
This causes messages "ALTER SYSTEM" and "Immediate Kill Session" printed in alert log.
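
To check whether an alert log shows this signature, a quick count of the two message types can help. This is only a sketch: it assumes the line formats shown in the excerpt above, and the function name alert_kill_summary is hypothetical.

```shell
#!/bin/sh
# Read an alert log on stdin; count service_names changes and killed sessions.
alert_kill_summary() {
    awk '
        /ALTER SYSTEM SET service_names=/ { changes++ }
        /^Immediate Kill Session#:/       { kills++ }
        END { printf "service_names changes: %d, sessions killed: %d\n", changes, kills }
    '
}
```

A nonzero count for both, close together in time, matches the pattern described in the note; on its own it does not prove that this bug is the cause.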
 
 
Solution
1) The fix is included in 10.2.0.5 patchset and 11.1.0.7 patchset.
    
Apply the patchset once they are available.
 
OR
 
2) Configure a service name other than the default one (same as db_name), and get user to use the non-default service name for connection
 
So it looks like the network problem is what triggered it, heh.

Creating mirrors of the root, swap, and boot volumes

2009.05.12 9:42 AM »Author: bosonmaster »
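1. Create a bootable LVM disk to be used for the mirror
# pvcreate -B /dev/rdisk/disk10
2. Add the disk to the current root volume group (vgextend takes the block device path, not the raw one)
# vgextend /dev/vg00 /dev/disk/disk10
3. Make the new disk a boot disk
# mkboot -l /dev/rdisk/disk10
4. Mirror the boot, primary swap, and root logical volumes onto the new bootable disk. Make sure that all devices in vg00 (such as /usr and /swap) are mirrored.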
 
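Mirror the boot logical volume
# lvextend -m 1 /dev/vg00/lvol1 /dev/disk/disk10
Mirror the primary swap logical volume
# lvextend -m 1 /dev/vg00/lvol2 /dev/disk/disk10
Mirror the root logical volume
# lvextend -m 1 /dev/vg00/lvol3 /dev/disk/disk10
5. Update the boot information in the BDRA for the mirror copies of the boot, root, and primary swap volumes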
# /usr/sbin/lvlnboot -b /dev/vg00/lvol1
# /usr/sbin/lvlnboot -s /dev/vg00/lvol2
# /usr/sbin/lvlnboot -r /dev/vg00/lvol3
6. Verify that the mirrors were created correctly
# lvlnboot -v