Last week we’ve encountered the archived-log-gap-missing issue on Oracle 10.2.0.3 physical standby. Looking from V$MANAGED_STANDBY, it showed waiting for gap of the archived log sequence #53713.
SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY; PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS --------- ------------ ---------- ---------- ---------- ---------- ARCH CONNECTED 0 0 0 0 ARCH CLOSING 2 58067 102401 1360 ARCH CONNECTED 0 0 0 0 ARCH CONNECTED 0 0 0 0 RFS IDLE 0 0 0 0 RFS IDLE 0 0 0 0 RFS IDLE 0 0 0 0 RFS IDLE 0 0 0 0 RFS IDLE 0 0 0 0 RFS IDLE 2 58068 2 1823 RFS IDLE 0 0 0 0 MRP0 WAIT_FOR_GAP 1 53713 0 0
The alert log showed similarly with additional information saying that it was trying to open the missing log and the Fetch Archive Log (FAL) was requesting the gap sequence from primary, but all failed.
Media Recovery Log /opt/oracle/admin/PROD/standby/arch_1_53713_645984751 Error opening /opt/oracle/admin/PROD/standby/arch_1_53713_645984751 Attempting refetch Media Recovery Waiting for thread 1 sequence 53713 Fetching gap sequence in thread 1, gap sequence 53713-53713 FAL[client]: Error fetching gap sequence Wed May 12 15:26:44 2010 FAL[client]: Failed to request gap sequence GAP - thread 1 sequence 53713-53713 DBID 2748812654 branch 645984751 FAL[client]: All defined FAL servers have been attempted.
Interestingly, the specific archived log file (arch_1_53713_645984751) existed already on the standby server with proper permission. Not only that, Data Guard was aware of it when querying the v$archived_log.
select sequence#, name, archived, applied from v$archived_log where sequence# like '%53713%'; SEQUENCE# NAME ARC APP ---------- ------------------------------------------------------- --- --- 53713 /opt/oracle/admin/PROD/standby/arch_3_53713_645984751 YES NO
An attempt to register it again obviously failed.
SQL> ALTER DATABASE REGISTER LOGFILE '/opt/oracle/admin/PROD/standby/arch_1_53713_645984751'; ALTER DATABASE REGISTER LOGFILE '/opt/oracle/admin/PROD/standby/arch_1_53713_645984751' * ERROR at line 1: ORA-16089: archive log has already been registered
At this point, it did not seem like anything we could do. Even trying to copy the archived file to another location and re-register with a different path did not help either.
Reading blog post by Jason, this might be a bug as described in the document ID 5576816.8 – “Bug 5576816 FAL gap resolution does not work with max_connection set in some scenario.” The bug’s description says that “Gap resolution appears to hang after exhausting disk space on the standby system. The hang persists even after additional disk space is made available.” This seems to fit our incident because before having this issue, the disk space on the standby was filled up. And even after cleanup, the gap resolution appeared to hang.
The suggested workaround from the document is to disable parallel archival and bounce the primary instance. Um…
The restart of the primary instance was something we’re hesitant to do especially when without a clear description of what it will do. Even though a suggestion of using “kill -9” on the archiver (ARC) OS processes in the post’s comments with no downtime was very temping, it was still obviously too risky to do in the production environment.
Fortunately, while looking into other options, we’ve found this blog from Andy. It’s interesting that once we tried “alter database register” again but now with “or replace” keyword, it immediately fixed this issue.
ALTER DATABASE REGISTER OR REPLACE PHYSICAL LOGFILE '/opt/oracle/admin/BMCPD/standby/arch_1_53713_645984751';
Hello you cos i do totally love your great site, i would be glad to blog a review on your awesome wordpress site on my little would you allow me that? Hosting Packages
Great dude…,
This adds me something….
Most excellent post – Thanks!!!!!
Useful article.
Thanks
excellent article and wonderfull site
thank a lot
upul.