Issue Details (XML | Word | Printable)

Key: DERBY-298
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Øystein Grøvlen
Reporter: Suresh Thalamati
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Derby

rollforward will not work correctly if the system happens to crash immediately after rollforward backup.

Created: 19/May/05 04:30 AM   Updated: 20/Mar/07 09:53 PM
Return to search "STDCXX SGI MIPSpro issues"
Component/s: Store
Affects Version/s: 10.0.2.1
Fix Version/s: 10.1.3.1, 10.2.1.6

Time Tracking:
Not Specified

File Attachments:
  Size
File Licensed for inclusion in ASF works derby-298.diff 2005-07-04 07:17 PM Øystein Grøvlen 20 kB
File Licensed for inclusion in ASF works derby-298a.diff 2005-12-22 07:39 PM Øystein Grøvlen 20 kB
File Licensed for inclusion in ASF works derby-298b.diff 2006-01-05 10:02 PM Øystein Grøvlen 20 kB

Resolution Date: 10/Jan/06 06:41 PM


 Description  « Hide
If the system crashes after a rollforward backup; last log file
is empty(say log2.dat). On next crash-recovery system ignores the empty log
file and starts writing to the previous log(say log1.dat),
even thought there was successfule log file switch before the crash.
The reason I belive it is done this way to avoid special
handling of crashes during the log switch process.

Problem is on rollfroward restore from a backup log1.dat will get overwritten
from the copy in the backup, so any transaction that got added to log1.dat
after the backup was taken will be lost.
 
One possible solution that comes to my mind to solve this problem is
 1) check if an empty a log file exist after a redo crash-recovery , if
     the log archive mode is enabled.
 2) If it exists , delete and do log file switch again

 
Repro:
connect 'jdbc:derby:wombat;create=true';
create table t1(a int ) ;
insert into t1 values(1) ;
insert into t1 values(2) ;
call SYSCS_UTIL.SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE(
    'extinout/mybackup', 0);

--crash (NO LOG RECORDS WENT IN AFTER THE BACKUP).

connect 'jdbc:derby:wombat';
insert into t1 select a*2 from t1 ;
insert into t1 select a*2 from t1 ;
insert into t1 select a*2 from t1 ;
insert into t1 select a*2 from t1 ;
insert into t1 select a*2 from t1 ;
insert into t1 select a*2 from t1 ;
insert into t1 select a*2 from t1 ;
select count(*) from t1 ;

--exit from jvm and restore from backup

connect
'jdbc:derby:wombat;rollForwardRecoveryFrom=extinout/mybackup/wombat';
select count(*) from t1 ; -- THIS WILL GIVE INCORRECT VALUES

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Øystein Grøvlen made changes - 21/May/05 09:52 PM
Field Original Value New Value
Assignee Øystein Grøvlen [ oysteing ]
Øystein Grøvlen made changes - 24/May/05 03:45 PM
Status Open [ 1 ] In Progress [ 3 ]
Øystein Grøvlen added a comment - 24/May/05 10:50 PM
Looking at the code, I became a bit confused about the definition of an empty log file. Scan.getNextRecordForward contains debug output when it detects an empty log file. It will then return without setting knownGoodLogEnd. Hence, new log records will be written to the end of the previous file. As Suresh says this is probably to be able to handle crashes during log switch.

However, this is not what happens when I run the recovery part of the example in this report. Since, currentLogFileLength is a large number, it detects "zapped log end on log file", goes on to the next file, which does not exist, and returns. (Who sets the length of a log file? Is this maximum size until a log switch is performed?) The effect is the same, but this can not be used to detect an empty log file and apply the solution proposed by Suresh. Instead, one would have to do some hairy file handling at a later stage.

An alternative way to fix this would be to just create a dummy log record in the new log file as part of the backup command. This would make the redo scan end in the new log file. However, this will not work for those who do backup with OS-commands (i.e., copy the files directly).

I would also think it should be possible to do the log switch in such a way that it is possible to detect during recovery whether the log switch had completed or not. If this was the case, one could just set knownGoodLogEnd of the redo scan to the start of the empty file if the log switch was completed. Does anyone know if this is possible?



Øystein Grøvlen added a comment - 25/May/05 07:15 AM
On second thoughts, recovery after an OS-command backup is not an issue since the database will be shut down when the backup is performed. So, why not insert a "dummy" record in the new log file when doing backup? Then, no changes would be needed to the recovery mechanism.

Øystein Grøvlen added a comment - 10/Jun/05 04:33 AM
This is further discussed in the following thread: http://thread.gmane.org/gmane.comp.apache.db.derby.devel/4170

My current view is that as long as the header it valid, it should be safe to use a new empty log file after recovery. Unless someone protests, I will change the recovery code to accept the newest log file even if its empty.

Øystein Grøvlen added a comment - 04/Jul/05 07:17 PM
The attached patch fixes the bug by setting the logEnd after recovery to the beginning of the new empty log file instead of the end of the previous file.
The patch contains changes to the following files:

M java/engine/org/apache/derby/impl/store/raw/log/FileLogger.java
        - At the end of the redo scan, if the scan stopped in a file succeeding the file of the last log record, update logEnd to this position.
        - Change assert to allow logEnd to be in a newer file than that of the last log record.
         
M java/engine/org/apache/derby/impl/store/raw/log/Scan.java
        - Introduced new variable newFileStart which will only have a valid LogInstant value when the scan is at the header of the file.
        - When a new file is entered, set newFileStart to the first possible LogInstant of this file (end of header).
        - When a log record is encountered, set newFileStart to INVALID_LOG_INSTANT.
        - Changed getLogRecordEnd() to return newFileStart if that is valid (i.e., scan is at the start of a file)
        - Removed comment about not starting to write to the new empty log file, since that is not true anymore.

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup_app.properties
         - Test properties

M java/testing/org/apache/derbyTesting/functionTests/tests/store/copyfiles.ant
         - Added new property files

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup_app.properties
        - Test properties.
        - useextdirs=true needed so the backup is placed somewhere the next test can find it.

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup.java
        - Test that is supposed to be run after RecoveryAfterBackupSetup.java.
        - Does recovery, updates the database, shutdowns the database, and does roll-forward restore.
        - Checks that updates made after recovery is reflected in the database after roll-forward restore.

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup.java
        - Test that does the preparation for the RecoveryAfterBackup test.
        - Inserts a few records, makes a backup, and stops without shutting down.

M java/testing/org/apache/derbyTesting/functionTests/harness/RunTest.java
        - For tests where the database is not deleted at the end of the test, do not delete the external directories either.
        - This is necessary to be able to access the backup in suceeding tests.

A java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackupSetup.out
        - Test output

A java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackup.out
        - Test output

MM java/testing/org/apache/derbyTesting/functionTests/suites/storerecovery.runall
        - Added tests to storerecovery suite.
        - Changed property eol-style.

Øystein Grøvlen made changes - 04/Jul/05 07:17 PM
Attachment derby-298.diff [ 12311078 ]
Suresh Thalamati added a comment - 08/Jul/05 07:29 AM
Hi Øystein,

 I could not understand from the changes, If the new log file will be recognized in the following two cases:
 1) After the log switch, if the first log record to the file gets partially written on a preallocated log file
 2) After the log swith , if the first log write is written out of order incompletely and the checksum check verification fails.
 
In Scan.java : newFileStart gets set to LogCounter.INVALID_LOG_INSTANT, immediately
after reading the instant , but the partial record verification and checksum checks happen
after that and valid logEnd value still refers to previous log file .

and also it might be good idea to make sure all the fields written in the initialization of the log file are correct before using the new log file during recovery: initialization writes 4 fileds seperately ,
whereas verification only looks at 3 fields.

LogToFIle.java() : initLogFile() :
newlog.writeInt(fid);
newlog.writeInt(OBSOLETE_LOG_VERSION_NUMBER); // for silly backwards compatibility reason
newlog.writeLong(number);
newlog.writeLong(prevLogRecordEndInstant);

whereas LogTOFIle: private boolean verifyLogFormat(StorageRandomAccessFile log, longnumber)
which is called before the swicth does not read/verify "prevLogRecordEndInstant" .


Thanks
-suresh
  







Øystein Grøvlen added a comment - 15/Jul/05 10:27 PM
Suresh,

The intent is that the new log file should be recognized after a log switch, regardless of what log records may have been written after this log switch. I see your point that if the log file contains a single corrupt record, the log switch will not have an effect since newFileStart has been reset and logEnd has not been advanced to the new file. Thanks for detecting this.

My suggestion for fixing this, is to drop the introduction of newFileStart and just update logEnd to the start of the new log file when the log file is detected. In addition, logEnd must not be reset to INVALID_LOG_INSTANT when a scan is closed. This way it is possible to get the logEnd after the scan is closed and check whether it has been advanced since the last log record was processed.

For the verification of prevLogRecordEndInstant, I guess that the best is to verify this for all files processed during recovery. I assume that the prevLogRecordEndInstant of a file should be equal to the current logEnd when switching to this file.

Thanks for the advice, I will create a new patch.
 
Øystein

Øystein Grøvlen made changes - 22/Dec/05 07:39 PM
Attachment derby-298a.diff [ 12321515 ]
Øystein Grøvlen made changes - 22/Dec/05 07:41 PM
Comment [ This new patch should address the issues from the last review. The only file that has changed from the earlier patch is java/engine/org/apache/derby/impl/store/raw/log/Scan.java, which now contains the following changes:
]
Øystein Grøvlen added a comment - 22/Dec/05 08:12 PM
The recently attached patch (derby-298a.diff) addresses Suresh's
review comments. The only major change from the previous patch is in
java/engine/org/apache/derby/impl/store/raw/log/Scan.java. The changes
to this file compared to the current head of trunk are:
 
   - When a new log file is entered, check that the header of this
     file refers to the end of the last log record of the previous log
     file. If not, stop the scan.
   - If the header was consistent, update knowGoodLogEnd to the first
     possible LogInstant of this file (end of header).
   - close() no longer reset knownGoodLogEnd since it is needed by
     FileLogger after the scan is closed.
   - Changed comment of getLogRecordEnd() to reflect that it can be
     used after the scan is closed, and that it at that time may
     return the start of an empty log file.
   - Removed comment about not starting to write to the new empty log
     file, since that is not true anymore.

In addition, the property files for the tests have been updated so
they are run without the security manager.

derbyall has been run with no new failures.

Suresh Thalamati added a comment - 05/Jan/06 11:18 AM
Øystein ,

I reviewed the latest patch, it looks good. But I could not understand why you would need the following check ?

FileLogge.java :
+ long end = redoScan.getLogRecordEnd();
+ if (end != LogCounter.INVALID_LOG_INSTANT

If end is LogCounter.INVALID_LOG_INSTANT then logEnd is also likely to be LogCounter.INVALID_LOG_INSTANT right ?

+ && (LogCounter.getLogFileNumber(logEnd)
+ < LogCounter.getLogFileNumber(end))) {
+ logEnd = end;
+ }

In what secnario condition will be false ? If end is LogCounter.INVALID_LOG_INSTANT then logEnd is also likely to be


Another minor thing I notices is test files copyrigth notices have wrong file names :
RecoveryAfterBackup.java:
Derby - Class org.apache.derbyTesting.functionTests.store.LogChecksumSetup
RecoveryAfterBackupSetup.java:
Derby - Class org.apache.derbyTesting.functionTests.store.LogChecksumSetup


Thanks
-suresh

Øystein Grøvlen added a comment - 05/Jan/06 05:45 PM
Suresh,

You are right that with my latest version of the patch,
getLogRecorEnd() will only return INVALID_LOG_INSTANT when the scan
did not get out of the starting blocks. Regardsless of that, the
second part of the if test will always be false when end is
INVALID_LOG_INSTANT.

I will remove the unecessary check and fix the copyright notices.

Øystein Grøvlen added a comment - 05/Jan/06 10:02 PM
derby-298b.diff is an updated patch that fixes the issues raised by Suresh in his latest review.
The storeall testsuite has been run without errors. Are currently running the derbyall testsuite.
Will notify list later today if derbyall should fail.

Øystein Grøvlen made changes - 05/Jan/06 10:02 PM
Attachment derby-298b.diff [ 12321717 ]
Repository Revision Date User Message
ASF #367352 Mon Jan 09 18:23:46 UTC 2006 mikem fix for DERBY-298. committing change for ·¹ystein Gr±±±vlen

 The attached patch fixes the bug by setting the logEnd after recovery to the beginning of the new empty log file instead of the end of the previous file.
The patch contains changes to the following files:

M java/engine/org/apache/derby/impl/store/raw/log/FileLogger.java
        - At the end of the redo scan, if the scan stopped in a file succeeding the file of the last log record, update logEnd to this position.
        - Change assert to allow logEnd to be in a newer file than that of the last log record.
         
M java/engine/org/apache/derby/impl/store/raw/log/Scan.java
        - Introduced new variable newFileStart which will only have a valid LogInstant value when the scan is at the header of the file.
        - When a new file is entered, set newFileStart to the first possible LogInstant of this file (end of header).
        - When a log record is encountered, set newFileStart to INVALID_LOG_INSTANT.
        - Changed getLogRecordEnd() to return newFileStart if that is valid (i.e., scan is at the start of a file)
        - Removed comment about not starting to write to the new empty log file, since that is not true anymore.

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup_app.properties
         - Test properties

M java/testing/org/apache/derbyTesting/functionTests/tests/store/copyfiles.ant
         - Added new property files

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup_app.properties
        - Test properties.
        - useextdirs=true needed so the backup is placed somewhere the next test can find it.

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup.java
        - Test that is supposed to be run after RecoveryAfterBackupSetup.java.
        - Does recovery, updates the database, shutdowns the database, and does roll-forward restore.
        - Checks that updates made after recovery is reflected in the database after roll-forward restore.

A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup.java
        - Test that does the preparation for the RecoveryAfterBackup test.
        - Inserts a few records, makes a backup, and stops without shutting down.

M java/testing/org/apache/derbyTesting/functionTests/harness/RunTest.java
        - For tests where the database is not deleted at the end of the test, do not delete the external directories either.
        - This is necessary to be able to access the backup in suceeding tests.

A java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackupSetup.out
        - Test output

A java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackup.out
        - Test output

MM java/testing/org/apache/derbyTesting/functionTests/suites/storerecovery.runall
        - Added tests to storerecovery suite.
        - Changed property eol-style.

 The recently attached patch (DERBY-298a.diff) addresses Suresh's
review comments. The only major change from the previous patch is in
java/engine/org/apache/derby/impl/store/raw/log/Scan.java. The changes
to this file compared to the current head of trunk are:
 
   - When a new log file is entered, check that the header of this
     file refers to the end of the last log record of the previous log
     file. If not, stop the scan.
   - If the header was consistent, update knowGoodLogEnd to the first
     possible LogInstant of this file (end of header).
   - close() no longer reset knownGoodLogEnd since it is needed by
     FileLogger after the scan is closed.
   - Changed comment of getLogRecordEnd() to reflect that it can be
     used after the scan is closed, and that it at that time may
     return the start of an empty log file.
   - Removed comment about not starting to write to the new empty log
     file, since that is not true anymore.

In addition, the property files for the tests have been updated so
they are run without the security manager.
Files Changed
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/copyfiles.ant
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup_app.properties
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup_app.properties
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/harness/RunTest.java
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackupSetup.out
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/suites/storerecovery.runall
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackup.out
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/log/Scan.java
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/log/FileLogger.java

Øystein Grøvlen added a comment - 10/Jan/06 06:41 PM
Verified that the added test run without failure.

Øystein Grøvlen made changes - 10/Jan/06 06:41 PM
Status In Progress [ 3 ] Resolved [ 5 ]
Fix Version/s 10.1.3.0 [ 12310616 ]
Fix Version/s 10.2.0.0 [ 11187 ]
Resolution Fixed [ 1 ]
Daniel John Debrunner added a comment - 14/Jan/06 06:44 AM
This patch added two new tests, RecoveryAfterBackup and RecoveryAfterBackupSetup, both of which are being run without a SecurityManager due to the noSecurityManager=true in their _app.properties files.

Why are these tests excluded from running with a SecurityManager?
It should be rare that a test is to not to be run with a SecurityManager, if such a need arises the noSecurityManager=true must be commented to indicate why this is the case.

The default for any new tests in the existing harness should be to run with the SecurityManager.

There are a handful of tests that have noSecurityManager=true with no comment or a comment that says needs investigating, I'm working on all of these since they are exisiting tests, but it's not my itch to clean up new tests.

Andrew McIntyre made changes - 20/Mar/07 09:53 PM
Status Resolved [ 5 ] Closed [ 6 ]