RMAN Duplicate Error with ASM and 4k Sector Size Disk

The Problem

I recently ran into an issue when performing an RMAN duplicate on a server with a fresh installation of OEL 6.4 x86-64 using a brand new SAN (NetApp Data ONTAP 8.2). The database version is Enterprise Edition 11.2.0.3.

The datafiles restored without an issue, but when we got to the archived redo logs I hit an ORA-00600. Specifically, it was this error:

ORA-00600: internal error code, arguments: [kfk_verify_io9]

At first I chalked it up as a fluke, so I tried the RMAN duplicate again. Well, the datafiles restored just fine, but once again I hit the ORA-00600 error when I got to the archived redo logs. A quick trip to MOS turned up four documents that point to a similar issue, though none of them was definitive. I opened an SR with Oracle Support to get them looking into it while I continued to troubleshoot. Those MOS documents mention 4K sector size disks, and around the same time Oracle Support came back asking about the same thing. Let’s see what we find…

Observations

The logical and physical block sizes of the disk behind that particular disk group are as follows:

$ cat /sys/block/dm-19/queue/logical_block_size
512

$ cat /sys/block/dm-19/queue/physical_block_size
4096
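
The two checks above can be wrapped in a small helper when there are several multipath devices to inspect. This is only a sketch: the function name `check_sector_sizes` and its optional second argument (an alternate sysfs root, there purely so the function can be tried against a fake directory) are my own inventions, not part of any Oracle or Linux tool.

```shell
#!/usr/bin/env bash
# Report whether a block device advertises a physical block size that
# differs from its logical one (e.g. 512 logical / 4096 physical).
# Arg 1: device name as it appears under /sys/block (e.g. dm-19).
# Arg 2: optional sysfs root (hypothetical convenience; defaults to /sys/block).
check_sector_sizes() {
    local dev="$1"
    local sysfs_root="${2:-/sys/block}"
    local queue="$sysfs_root/$dev/queue"
    local logical physical

    # Return 2 if either sysfs attribute cannot be read.
    logical=$(cat "$queue/logical_block_size") || return 2
    physical=$(cat "$queue/physical_block_size") || return 2

    if [ "$logical" -eq "$physical" ]; then
        echo "$dev: logical=$logical physical=$physical (match)"
    else
        echo "$dev: logical=$logical physical=$physical (MISMATCH)"
        return 1
    fi
}
```

Calling `check_sector_sizes dm-19` on the server described here would report the 512/4096 mismatch and return a non-zero status.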

Checking the ASM disk shows us a 4K sector size.

SQL> select name, sector_size from v$asm_disk;

NAME SECTOR_SIZE
------------------------------ -----------
DATA01 4096

While waiting for Oracle Support to come back, I found MOS Doc ID 1500460.1. It describes two possible workarounds that would work for us.

1) We can set the ASMLIB config file to use the logical block size of the disk instead of the physical block size.


$ /usr/sbin/oracleasm configure -b

2) On the SAN itself, we can issue the NetApp command that stops the LUN from reporting its physical block size.


> lun set report-physical-size <path> disable
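
For option 1, ASMLib records the choice as ORACLEASM_USE_LOGICAL_BLOCK_SIZE in /etc/sysconfig/oracleasm, so it is easy to see what a host is currently set to before changing anything. The helper below is a rough sketch of that check. The function name `asmlib_uses_logical_block_size` is made up for illustration, and the fallback to "false" when the variable is absent is an assumption based on the default documented in the config file's own comments.

```shell
#!/usr/bin/env bash
# Print the ORACLEASM_USE_LOGICAL_BLOCK_SIZE value found in an ASMLib
# config file, or "false" if the variable is not set there.
# Arg 1: optional path to the config file (defaults to /etc/sysconfig/oracleasm).
asmlib_uses_logical_block_size() {
    local conf="${1:-/etc/sysconfig/oracleasm}"
    local value

    # Take the last uncommented assignment and strip any quotes around it.
    value=$(sed -n 's/^ORACLEASM_USE_LOGICAL_BLOCK_SIZE=//p' "$conf" \
            | tail -n 1 | tr -d "\"'")

    echo "${value:-false}"
}
```

Running `asmlib_uses_logical_block_size` with no argument reads the live config file; passing a path lets you check a saved copy instead.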

A short while later, Oracle Support came back with word that there are some known bugs with 4K sector size disks when using ASM with Oracle 11.2.0.3. Their suggestion was a straightforward “make sure the disk is both logically and physically presented as 512 bytes”.

Solution

We go with option 1 and make the change in the ASMLib config file (located in /etc/sysconfig/oracleasm). Let’s do that now.

1) Shut down ASM.

sqlplus "/ as sysasm"

SQL> shutdown normal;
ASM diskgroups dismounted
ASM instance shutdown
SQL>

2) Change the ASMLIB config file to only look at the logical block size.

$ sudo /usr/sbin/oracleasm configure -b
Writing Oracle ASM library driver configuration: done

$ cat /etc/sysconfig/oracleasm
#
# This is a configuration file for automatic loading of the Oracle
# Automatic Storage Management library kernel driver.  It is generated
# By running /etc/init.d/oracleasm configure.  Please use that method
# to modify this file
#

# ORACLEASM_ENABLED: 'true' means to load the driver on boot.
ORACLEASM_ENABLED=true

# ORACLEASM_UID: Default user owning the /dev/oracleasm mount point.
ORACLEASM_UID=oragrid

# ORACLEASM_GID: Default group owning the /dev/oracleasm mount point.
ORACLEASM_GID=asmdba

# ORACLEASM_SCANBOOT: 'true' means scan for ASM disks on boot.
ORACLEASM_SCANBOOT=true

# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="mpath dm"

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"

# ORACLEASM_USE_LOGICAL_BLOCK_SIZE: 'true' means use the logical block size
# reported by the underlying disk instead of the physical. The default
# is 'false'
ORACLEASM_USE_LOGICAL_BLOCK_SIZE=true

3) Let’s restart ASMLib.

$ sudo /etc/init.d/oracleasm restart
Dropping Oracle ASMLib disks: [ OK ]
Shutting down the Oracle ASMLib driver: [ OK ]
Initializing the Oracle ASMLib driver: [ OK ]
Scanning the system for Oracle ASMLib disks: [ OK ]

4) We’ll need to recreate the disk, since it will otherwise retain the 4K sector size in its disk header.

$ sudo oracleasm deletedisk DATA01
Clearing disk header: done
Dropping disk: done

$ sudo oracleasm createdisk DATA01 /dev/mapper/mpathdp1
Writing disk header: done
Instantiating disk: done

5) Let’s start up the ASM instance and create the diskgroup again.

SQL> startup
ASM instance started

Total System Global Area   71303168 bytes
Fixed Size                 1069292 bytes
Variable Size              45068052 bytes
ASM Cache                  25165824 bytes
ASM disk groups mounted

SQL> select name, sector_size from v$asm_disk;

NAME SECTOR_SIZE
------------------------------ -----------
DATA01 512

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
DISK 'ORCL:DATA01' NAME DATA01
ATTRIBUTE 'au_size'='4M',
'compatible.asm' = '11.2',
'compatible.rdbms' = '11.2';

Diskgroup created.

SQL> select name, sector_size from v$asm_diskgroup;

NAME SECTOR_SIZE
------------------------------ -----------
DATA 512

Running kfed against the disk shows that the sector size in the disk header is also 512 bytes:

$ kfed read DATA01

kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
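
If you have more than one disk to verify, the kfdhdb.secsize line can be pulled out of the kfed output with a short filter. The sketch below assumes the output format shown above (field name, value, then the offset details); the function name `kfed_sector_size` is hypothetical and reads kfed output from standard input rather than running kfed itself.

```shell
#!/usr/bin/env bash
# Read `kfed read` output on stdin and print the value of kfdhdb.secsize.
# Prints nothing if the field is not present.
kfed_sector_size() {
    awk '$1 == "kfdhdb.secsize:" { print $2; exit }'
}
```

It would be used as `kfed read DATA01 | kfed_sector_size`, which for the disk above should print the sector size on its own.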

With all these steps done and the ASM disk and disk group at a 512-byte sector size, our RMAN duplicate completed successfully!
