Archive for July, 2008

Windows XP SP3 activation problem

Wednesday, July 30th, 2008

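A while ago I installed the Windows XP SP3 update, and it seemed to work without any problems. Starting last week, however, a prompt appeared saying that the system would expire in 7 days and had to be activated. I no longer remember whether I installed a pirated copy or an OEM copy, though OEM is more likely. The prompt appeared at every boot and I ignored it each time. With about two days left, I found a tool and ran it, and it reported that activation had succeeded. Then on Friday I happened to travel to Beijing on business, turned on the laptop, and found that it was still not activated: a normal boot no longer worked, because the session was logged off as soon as activation failed. I was thoroughly frustrated, since I had important work to do the next day. Safe mode did start, but without network access. Once I reached the hotel, I borrowed a colleague's computer and searched the web, and finally found a "WindowsXP_SP3 activation crack patch". The earlier attempt had failed because that patch targeted SP2. This SP3 patch must be run in safe mode, which was the only mode I could boot into anyway. According to its description, the cause is that installing SP3 deletes oembios.bin under system32, although when I checked, the file was still present on my machine. I installed the patch without worrying about whether it carried a trojan, and after a reboot everything was indeed OK. So far I have not found any trojan. Anyone who needs the patch can download it from
http://cn0571.blog.zj.com/d-208698.html or contact me on QQ: 859358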

ORA-600[kjbrref:pkey]

Thursday, July 24th, 2008

I searched MetaLink and found the following solution.
First, what is DRM?
DRM - Dynamic Resource Mastering
When using Real Application Clusters (RAC), each instance has its own SGA and buffer cache. RAC ensures that block changes are coordinated to maximize performance and to ensure data integrity. Each copy of the buffer, also called a cache resource, has a master, which is one of the nodes of the cluster.

In database releases before 10g (10.1.0.2) once a cache resource is mastered on an instance, a re-mastering or a change in the master would take place only during a reconfiguration that would happen automatically during both normal operations like instance startup or instance shutdown or abnormal events like Node eviction by Cluster Manager. So if Node B is the master of a cache resource, this resource will remain mastered on Node B until reconfiguration.

10g introduces a concept of resource remastering via DRM. With DRM a resource can be re-mastered on another node say from Node B to Node A if it is found that the cache resource is accessed more frequently from Node A. A reconfiguration is no longer the only reason for a resource to be re-mastered.
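
To make the idea concrete, here is a toy model of frequency-based remastering. It is only an illustration of the concept quoted above, not Oracle's actual algorithm: the `Resource` class, the `min_ratio` threshold, and the notion of an evaluation window are all assumptions invented for this sketch.

```python
from collections import Counter


class Resource:
    """Toy cache resource that remembers which node masters it."""

    def __init__(self, master):
        self.master = master
        self.accesses = Counter()  # accesses per node in the current window

    def access(self, node):
        self.accesses[node] += 1

    def maybe_remaster(self, min_ratio=2.0):
        """End the window; move mastership if another node clearly dominates.

        min_ratio is a made-up threshold: the busiest node must access the
        resource at least min_ratio times as often as the current master.
        """
        if self.accesses:
            node, count = self.accesses.most_common(1)[0]
            master_count = max(self.accesses[self.master], 1)
            if node != self.master and count >= min_ratio * master_count:
                self.master = node
        self.accesses.clear()
        return self.master


res = Resource(master="B")
for _ in range(10):
    res.access("A")
for _ in range(2):
    res.access("B")
print(res.maybe_remaster())  # Node A dominated the window, so it becomes master
```

Before 10g, the equivalent of `maybe_remaster` would run only during a reconfiguration; with DRM it can also be triggered by access patterns during normal operation.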

Bug 5600050

Solution:

Disable Oracle RAC DRM as follows:
alter system set "_gc_undo_affinity"=false scope=spfile sid='*';
alter system set "_gc_affinity_time"=0 scope=spfile sid='*';
Then restart the RAC database.

Serviceguard configuration

Friday, July 11th, 2008

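System and storage preparation before installation
1. Run swlist to confirm that ServiceGuard is installed;

2. Confirm with HP that the megpatch patch bundle is installed; otherwise the shared volume group cannot be activated on both nodes at the same time;

3. Run ioscan -fnC disk to confirm that both machines can see the shared disks and that their status is normal.
Network: /etc/hosts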
127.0.0.1 localhost loopback
192.168.51.101 hp101
192.168.51.102 hp102
192.168.51.99 hp101-vip
192.168.51.100 hp102-vip
172.16.0.1 hp101-priv
172.16.0.2 hp102-priv
Also make sure that all of these names can be resolved.
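
One way to verify the resolution requirement on each node is a small script like the following. It is a minimal sketch: the helper names `parse_hosts` and `check_resolution` are invented for illustration, and the host table is simply copied from the entries above.

```python
import socket

# Host table copied from the /etc/hosts entries listed above.
HOSTS_TEXT = """\
127.0.0.1 localhost loopback
192.168.51.101 hp101
192.168.51.102 hp102
192.168.51.99 hp101-vip
192.168.51.100 hp102-vip
172.16.0.1 hp101-priv
172.16.0.2 hp102-priv
"""


def parse_hosts(text):
    """Return {name: ip} for every name and alias in hosts-style text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def check_resolution(expected, resolve=socket.gethostbyname):
    """Return (name, expected_ip, actual) for names that resolve differently."""
    problems = []
    for name, ip in expected.items():
        try:
            actual = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            actual = f"unresolved ({exc})"
        if actual != ip:
            problems.append((name, ip, actual))
    return problems


# Usage on a cluster node (performs real lookups):
#   for name, ip, actual in check_resolution(parse_hosts(HOSTS_TEXT)):
#       print(f"{name}: expected {ip}, got {actual}")
```

The `resolve` parameter is injectable so that the comparison logic can be exercised without touching the network.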

(Read more…)

Read-only file system

Thursday, July 10th, 2008

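Yesterday a customer reported that the archive log directory on one RAC node had become read-only, so archived logs could not be created and, as a result, the redo logs could not switch. Fortunately this was RAC. The customer said that after a reboot the directory was writable again, but a little later it turned read-only once more. At first I suspected a mounting problem, so I reviewed the root user's command history. There was indeed some confusion in how things had been mounted, but after ruling out the suspicious items the file system was still read-only. So I turned to the system log. Because of Oracle bug 5722352, the system log was full of
Feb 12 10:16:57 su(pam_unix)[28104]: session opened for user oracle by (uid=0)
Feb 12 10:16:57 su(pam_unix)[28104]: session closed for user oracle
messages like these. With no alternative, I had the customer extract 300,000 lines, and only then did I manage to find the boot log and, from it, some useful information: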
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110988
Jul 9 16:15:38 dbrac2 kernel: Aborting journal on device sdh1.
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110989
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110990
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110991
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110992
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110993
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110994
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110995
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110996
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110997
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110998
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_free_blocks_sb: bit already cleared for block 1110999
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_truncate: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_orphan_del: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_reserve_inode_write: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1) in ext3_delete_inode: Journal has aborted
Jul 9 16:15:38 dbrac2 kernel: ext3_abort called.
Jul 9 16:15:38 dbrac2 kernel: EXT3-fs error (device sdh1): ext3_journal_start_sb: Detected aborted journal
Jul 9 16:15:38 dbrac2 kernel: Remounting filesystem read-only

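You can see that the kernel itself remounted sdh1 (/arch02) read-only, and the preceding lines show that the file system on that disk had run into errors. This is a protective mechanism of the Linux kernel. So why did a reboot make things work again?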
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning (device sdh1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning (device sdh1): ext3_clear_journal_err: Marking fs in need of filesystem check.
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Jul 8 00:20:37 dbrac2 kernel: EXT3 FS on sdh1, internal journal
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs: recovery complete.
Jul 8 00:20:37 dbrac2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
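This is the only place where the cause can be found.
I did not repair the file system with fsck, because the errors were fairly severe; the archived logs on it dated from before the 7th and could not be copied out anyway. In the end, for the sake of stable operation in the future, we decided to reformat sdh1, and after remounting it everything was OK.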
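
The log triage described above (skipping the pam_unix session noise and pulling out the kernel's EXT3 messages) can be sketched as follows. This is only an illustration: the function name `summarize` is invented, the patterns are derived from the excerpts quoted in this post, and the log path in the usage comment is an assumption.

```python
import re

# Patterns taken from the log excerpts quoted in this post.
EXT3_ERROR = re.compile(r"kernel: EXT3-fs error \(device (\w+)\)")
JOURNAL_ABORT = re.compile(r"kernel: Aborting journal on device (\w+)")
REMOUNT_RO = "kernel: Remounting filesystem read-only"


def summarize(lines):
    """Count EXT3 errors per device and note journal aborts and RO remounts."""
    errors = {}
    aborted = set()
    remounted_ro = False
    for line in lines:
        if "(pam_unix)" in line:  # session open/close noise
            continue
        match = EXT3_ERROR.search(line)
        if match:
            device = match.group(1)
            errors[device] = errors.get(device, 0) + 1
            continue
        match = JOURNAL_ABORT.search(line)
        if match:
            aborted.add(match.group(1))
            continue
        if REMOUNT_RO in line:
            remounted_ro = True
    return {"errors": errors, "aborted": sorted(aborted), "remounted_ro": remounted_ro}


# Usage (the path is an assumption; adjust it to the system's syslog file):
#   with open("/var/log/messages", errors="replace") as log:
#       print(summarize(log))
```

Because the function iterates line by line, it can be fed a file object directly and does not need to load a 300,000-line extract into memory.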
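When you run into this problem, there are generally several things to check:
1. Whether there is enough free space
2. Whether there are enough free inodes
3. Whether the directory's permissions or owner have been changed
4. Whether the mount itself is wrong; a file system is mounted read-write by default (for example, mount -o remount,rw / remounts the root file system read-write)
5. Whether the system log shows disk errors
6. When this error appears, a hardware fault is a likely cause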
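
Checks 1 to 4 in the list above can be gathered in one pass with the standard library. The sketch below is an assumption-laden illustration: the function names, the thresholds, and the `/arch02` path in the usage comment are chosen here for the example and are not part of the original troubleshooting.

```python
import os
import stat


def inspect_mount(path):
    """Collect free space, free inodes, read-only flag, owner and mode."""
    vfs = os.statvfs(path)
    info = os.stat(path)
    return {
        "free_bytes": vfs.f_bavail * vfs.f_frsize,
        "total_inodes": vfs.f_files,
        "free_inodes": vfs.f_favail,
        "read_only": bool(vfs.f_flag & os.ST_RDONLY),
        "owner_uid": info.st_uid,
        "mode": oct(stat.S_IMODE(info.st_mode)),
    }


def find_problems(report, min_free_bytes, min_free_inodes, expected_uid=None):
    """Return human-readable findings; the thresholds are caller-chosen."""
    findings = []
    if report["free_bytes"] < min_free_bytes:
        findings.append("low free space")
    # Some file systems report 0 total inodes; skip the inode check there.
    if report["total_inodes"] and report["free_inodes"] < min_free_inodes:
        findings.append("low free inodes")
    if report["read_only"]:
        findings.append("mounted read-only")
    if expected_uid is not None and report["owner_uid"] != expected_uid:
        findings.append("unexpected owner")
    return findings


# Usage (path and thresholds are examples only):
#   print(find_problems(inspect_mount("/arch02"), 2 * 1024**3, 1000))
```

A read-only result here only confirms the symptom; the system log is still needed to tell a deliberate read-only mount from a kernel remount after file system errors.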

cluvfy—RAC

Tuesday, July 8th, 2008

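When installing RAC, insufficient preparation sometimes leads to problems during the installation. Running a few checks beforehand reduces the chance of errors.
The things that must be watched when installing RAC are time synchronization, the hosts configuration, and the ownership and permissions of the shared storage. Be sure to follow the standard procedure.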

I. Checks
1 Check whether the system meets the CRS installation requirements

cluvfy comp sys -n boson01,boson02 -p crs -verbose

2 Check whether the system meets the database installation requirements
cluvfy comp sys -n boson01,boson02 -p database -verbose

3 Check the storage (cvuqdisk must be installed)
cluvfy comp ssa -n all -s /dev/sdb
cluvfy comp space -n all -l /u01/ -z 2G -verbose (checks whether free space meets the requirement)

4 Check reachability and connectivity between the nodes
cluvfy comp nodereach -n all -verbose

cluvfy comp nodecon -n all -i eth0 -verbose
cluvfy comp nodecon -n all -i eth1 -verbose
cluvfy comp nodecon -n all -verbose

5 Check user privileges

cluvfy comp admprv -n all -o user_equiv -verbose [-sshonly] (-sshonly checks SSH only)
cluvfy comp admprv -n all -o crs_inst -verbose -orainv oinstall
cluvfy comp admprv -n all -o db_inst -verbose -osdba oinstall
cluvfy comp admprv -n all -o db_config -d /u01/app/oracle/product/10.2.0/db_1 -verbose
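
The component checks above can also be run in sequence from a small wrapper. The following is a sketch under stated assumptions: the helper names `build_checks` and `run_checks` are invented, only cluvfy options that appear in this post are used, and no particular meaning of cluvfy's exit status is assumed; the return codes are merely collected for review.

```python
import shlex
import subprocess


def build_checks(nodes, interfaces=("eth0", "eth1"), disk="/dev/sdb"):
    """Build cluvfy command lines using only the options shown in this post."""
    node_list = ",".join(nodes)
    commands = [
        ["cluvfy", "comp", "sys", "-n", node_list, "-p", "crs", "-verbose"],
        ["cluvfy", "comp", "sys", "-n", node_list, "-p", "database", "-verbose"],
        ["cluvfy", "comp", "ssa", "-n", "all", "-s", disk],
        ["cluvfy", "comp", "nodereach", "-n", "all", "-verbose"],
    ]
    for nic in interfaces:
        commands.append(["cluvfy", "comp", "nodecon", "-n", "all", "-i", nic, "-verbose"])
    commands.append(["cluvfy", "comp", "admprv", "-n", "all", "-o", "user_equiv", "-verbose"])
    return commands


def run_checks(commands, runner=subprocess.run):
    """Run each command and return (command line, return code) pairs."""
    results = []
    for cmd in commands:
        completed = runner(cmd)
        results.append((shlex.join(cmd), completed.returncode))
    return results


# Usage on a node where cluvfy is on the PATH:
#   for line, code in run_checks(build_checks(["boson01", "boson02"])):
#       print(code, line)
```

The verbose output of each command still needs to be read by a person; the wrapper only saves retyping.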

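II. Installation checks
1 Pre-installation check for CRS
cluvfy stage -pre crsinst -n all -c /dev/raw/raw1 -r 10gR2 -q /dev/raw/raw3 -osdba oinstall -orainv oinstall -verbose
The stage can also be checked after installation with -post (for example, cluvfy stage -post crsinst -n all -verbose).
2 Cluster component checks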
cluvfy comp crs -n all -verbose
cluvfy comp clumgr -n all -verbose
cluvfy comp ocr -n all -verbose
3 Cluster integrity check
cluvfy comp clu

III. Checking existing node applications

1 Check ONS, VIP, and GSD
cluvfy comp nodeapp -n all -verbose
2 Compare the nodes with each other (for example package versions, user and group IDs, and so on)
cluvfy comp peer -n all -r 10gR2 -orainv oinstall -osdba oinstall -verbose
IV. Linux command for checking shared storage (requires the cvuqdisk-1.0.1-1.rpm package)
cvuqdisk /dev/sdb