|
Øystein Grøvlen made changes - 21/May/05 09:52 PM
Øystein Grøvlen made changes - 24/May/05 03:45 PM
On second thoughts, recovery after an OS-command backup is not an issue since the database will be shut down when the backup is performed. So, why not insert a "dummy" record in the new log file when doing backup? Then, no changes would be needed to the recovery mechanism.
This is further discussed in the following thread: http://thread.gmane.org/gmane.comp.apache.db.derby.devel/4170
My current view is that as long as the header it valid, it should be safe to use a new empty log file after recovery. Unless someone protests, I will change the recovery code to accept the newest log file even if its empty. The attached patch fixes the bug by setting the logEnd after recovery to the beginning of the new empty log file instead of the end of the previous file.
The patch contains changes to the following files: M java/engine/org/apache/derby/impl/store/raw/log/FileLogger.java - At the end of the redo scan, if the scan stopped in a file succeeding the file of the last log record, update logEnd to this position. - Change assert to allow logEnd to be in a newer file than that of the last log record. M java/engine/org/apache/derby/impl/store/raw/log/Scan.java - Introduced new variable newFileStart which will only have a valid LogInstant value when the scan is at the header of the file. - When a new file is entered, set newFileStart to the first possible LogInstant of this file (end of header). - When a log record is encountered, set newFileStart to INVALID_LOG_INSTANT. - Changed getLogRecordEnd() to return newFileStart if that is valid (i.e., scan is at the start of a file) - Removed comment about not starting to write to the new empty log file, since that is not true anymore. A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup_app.properties - Test properties M java/testing/org/apache/derbyTesting/functionTests/tests/store/copyfiles.ant - Added new property files A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup_app.properties - Test properties. - useextdirs=true needed so the backup is placed somewhere the next test can find it. A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackup.java - Test that is supposed to be run after RecoveryAfterBackupSetup.java. - Does recovery, updates the database, shutdowns the database, and does roll-forward restore. - Checks that updates made after recovery is reflected in the database after roll-forward restore. A java/testing/org/apache/derbyTesting/functionTests/tests/store/RecoveryAfterBackupSetup.java - Test that does the preparation for the RecoveryAfterBackup test. - Inserts a few records, makes a backup, and stops without shutting down. M java/testing/org/apache/derbyTesting/functionTests/harness/RunTest.java - For tests where the database is not deleted at the end of the test, do not delete the external directories either. - This is necessary to be able to access the backup in suceeding tests. A java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackupSetup.out - Test output A java/testing/org/apache/derbyTesting/functionTests/master/RecoveryAfterBackup.out - Test output MM java/testing/org/apache/derbyTesting/functionTests/suites/storerecovery.runall - Added tests to storerecovery suite. - Changed property eol-style.
Øystein Grøvlen made changes - 04/Jul/05 07:17 PM
Hi Øystein,
I could not understand from the changes, If the new log file will be recognized in the following two cases: 1) After the log switch, if the first log record to the file gets partially written on a preallocated log file 2) After the log swith , if the first log write is written out of order incompletely and the checksum check verification fails. In Scan.java : newFileStart gets set to LogCounter.INVALID_LOG_INSTANT, immediately after reading the instant , but the partial record verification and checksum checks happen after that and valid logEnd value still refers to previous log file . and also it might be good idea to make sure all the fields written in the initialization of the log file are correct before using the new log file during recovery: initialization writes 4 fileds seperately , whereas verification only looks at 3 fields. LogToFIle.java() : initLogFile() : newlog.writeInt(fid); newlog.writeInt(OBSOLETE_LOG_VERSION_NUMBER); // for silly backwards compatibility reason newlog.writeLong(number); newlog.writeLong(prevLogRecordEndInstant); whereas LogTOFIle: private boolean verifyLogFormat(StorageRandomAccessFile log, longnumber) which is called before the swicth does not read/verify "prevLogRecordEndInstant" . Thanks -suresh Suresh,
The intent is that the new log file should be recognized after a log switch, regardless of what log records may have been written after this log switch. I see your point that if the log file contains a single corrupt record, the log switch will not have an effect since newFileStart has been reset and logEnd has not been advanced to the new file. Thanks for detecting this. My suggestion for fixing this, is to drop the introduction of newFileStart and just update logEnd to the start of the new log file when the log file is detected. In addition, logEnd must not be reset to INVALID_LOG_INSTANT when a scan is closed. This way it is possible to get the logEnd after the scan is closed and check whether it has been advanced since the last log record was processed. For the verification of prevLogRecordEndInstant, I guess that the best is to verify this for all files processed during recovery. I assume that the prevLogRecordEndInstant of a file should be equal to the current logEnd when switching to this file. Thanks for the advice, I will create a new patch. Øystein
Øystein Grøvlen made changes - 22/Dec/05 07:39 PM
Øystein Grøvlen made changes - 22/Dec/05 07:41 PM
The recently attached patch (derby-298a.diff) addresses Suresh's
review comments. The only major change from the previous patch is in java/engine/org/apache/derby/impl/store/raw/log/Scan.java. The changes to this file compared to the current head of trunk are: - When a new log file is entered, check that the header of this file refers to the end of the last log record of the previous log file. If not, stop the scan. - If the header was consistent, update knowGoodLogEnd to the first possible LogInstant of this file (end of header). - close() no longer reset knownGoodLogEnd since it is needed by FileLogger after the scan is closed. - Changed comment of getLogRecordEnd() to reflect that it can be used after the scan is closed, and that it at that time may return the start of an empty log file. - Removed comment about not starting to write to the new empty log file, since that is not true anymore. In addition, the property files for the tests have been updated so they are run without the security manager. derbyall has been run with no new failures. Øystein ,
I reviewed the latest patch, it looks good. But I could not understand why you would need the following check ? FileLogge.java : + long end = redoScan.getLogRecordEnd(); + if (end != LogCounter.INVALID_LOG_INSTANT If end is LogCounter.INVALID_LOG_INSTANT then logEnd is also likely to be LogCounter.INVALID_LOG_INSTANT right ? + && (LogCounter.getLogFileNumber(logEnd) + < LogCounter.getLogFileNumber(end))) { + logEnd = end; + } In what secnario condition will be false ? If end is LogCounter.INVALID_LOG_INSTANT then logEnd is also likely to be Another minor thing I notices is test files copyrigth notices have wrong file names : RecoveryAfterBackup.java: Derby - Class org.apache.derbyTesting.functionTests.store.LogChecksumSetup RecoveryAfterBackupSetup.java: Derby - Class org.apache.derbyTesting.functionTests.store.LogChecksumSetup Thanks -suresh Suresh,
You are right that with my latest version of the patch, getLogRecorEnd() will only return INVALID_LOG_INSTANT when the scan did not get out of the starting blocks. Regardsless of that, the second part of the if test will always be false when end is INVALID_LOG_INSTANT. I will remove the unecessary check and fix the copyright notices. derby-298b.diff is an updated patch that fixes the issues raised by Suresh in his latest review.
The storeall testsuite has been run without errors. Are currently running the derbyall testsuite. Will notify list later today if derbyall should fail.
Øystein Grøvlen made changes - 05/Jan/06 10:02 PM
Verified that the added test run without failure.
Øystein Grøvlen made changes - 10/Jan/06 06:41 PM
This patch added two new tests, RecoveryAfterBackup and RecoveryAfterBackupSetup, both of which are being run without a SecurityManager due to the noSecurityManager=true in their _app.properties files.
Why are these tests excluded from running with a SecurityManager? It should be rare that a test is to not to be run with a SecurityManager, if such a need arises the noSecurityManager=true must be commented to indicate why this is the case. The default for any new tests in the existing harness should be to run with the SecurityManager. There are a handful of tests that have noSecurityManager=true with no comment or a comment that says needs investigating, I'm working on all of these since they are exisiting tests, but it's not my itch to clean up new tests.
Andrew McIntyre made changes - 20/Mar/07 09:53 PM
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
However, this is not what happens when I run the recovery part of the example in this report. Since, currentLogFileLength is a large number, it detects "zapped log end on log file", goes on to the next file, which does not exist, and returns. (Who sets the length of a log file? Is this maximum size until a log switch is performed?) The effect is the same, but this can not be used to detect an empty log file and apply the solution proposed by Suresh. Instead, one would have to do some hairy file handling at a later stage.
An alternative way to fix this would be to just create a dummy log record in the new log file as part of the backup command. This would make the redo scan end in the new log file. However, this will not work for those who do backup with OS-commands (i.e., copy the files directly).
I would also think it should be possible to do the log switch in such a way that it is possible to detect during recovery whether the log switch had completed or not. If this was the case, one could just set knownGoodLogEnd of the redo scan to the start of the empty file if the log switch was completed. Does anyone know if this is possible?