
Key: DERBY-662
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Mike Matrigali
Reporter: Mike Matrigali
Votes: 0
Watchers: 0
Derby

During crash recovery of a drop table on case-insensitive file systems, Derby may delete the wrong file

Created: 29/Oct/05 11:19 PM   Updated: 12/Jul/06 03:24 AM
Component/s: Store
Affects Version/s: 10.0.2.0, 10.1.2.1
Fix Version/s: 10.0.2.2, 10.1.2.1, 10.2.1.6

Time Tracking:
Not Specified

File Attachments:
dropcrash.diff (14 kB), 2005-10-30 09:51 AM, Mike Matrigali, licensed for inclusion in ASF works
Environment: Any JVM/OS/file system combination where file names are case insensitive, such that deleting C2080.dat removes c2080.dat if it exists.

Resolution Date: 23/Nov/05 04:01 AM


Description
Sometimes during redo the system will incorrectly remove the file associated
with a table. The bug requires the following conditions to reproduce:
1) The OS/file system must be case insensitive, such that a request to delete
   a file named C2080.dat would also remove c2080.dat. This is true of the
   default Windows file systems, but not of any Unix/Linux file systems that
   I am aware of.
2) The system must be shut down uncleanly, such that a subsequent access of
   the database causes a REDO recovery action of a drop table statement.
   This means that a drop table statement must have happened since the last
   checkpoint in the log file. Examples of things that cause checkpoints are:
   o clean shutdown from ij using the "exit" command
   o clean shutdown of the database using the "shutdown=true" URL
   o calling the checkpoint system procedure
   o generating enough log activity to cause a regularly scheduled checkpoint.
3) If the conglomerate number of the dropped table described above is TABLE_1,
   then for the problem to occur there must also exist in the database a table
   TABLE_2 such that HEX(TABLE_2) = TABLE_1, that is, the hexadecimal digits of
   TABLE_2's conglomerate number read the same as the decimal digits of TABLE_1.
4) Either TABLE_2 must not be accessed during REDO before the REDO of the drop
   of TABLE_1, or there must be enough other table references during the REDO
   phase to push the cached open of TABLE_2 out of the cache.

If all of the above conditions are met then during REDO the system will
incorrectly delete TABLE_2 while trying to redo the drop of TABLE_1.
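
The name collision behind condition 3 can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the exact shape of the faulty path in BaseDataFileFactory.java (decimal number with an upper-case "C" prefix) is an assumption inferred from the file names quoted in this issue.

```java
// Hypothetical illustration only; not the actual BaseDataFileFactory code.
class ContainerFileNames {

    // Intended naming: "c" + hexadecimal conglomerate number + ".dat".
    static String hexName(long conglomerateNumber) {
        return "c" + Long.toHexString(conglomerateNumber) + ".dat";
    }

    // Assumed shape of the faulty path: the hex conversion is missing,
    // so the decimal digits end up in the file name.
    static String decimalName(long conglomerateNumber) {
        return "C" + conglomerateNumber + ".dat";
    }

    // Models a case-insensitive file system resolving two names to one file.
    static boolean sameFileIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(hexName(2080));      // c820.dat  (file of the dropped table)
        System.out.println(decimalName(2080));  // C2080.dat (name the faulty path would delete)
        System.out.println(hexName(8320));      // c2080.dat (file of an unrelated table)
        // true: on a case-insensitive file system the wrong file is removed
        System.out.println(sameFileIgnoringCase(decimalName(2080), hexName(8320)));
    }
}
```

On a case-sensitive file system the delete of C2080.dat would simply find no file, which is why condition 1 is required.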
I will be adding the following test to reproduce the problem:
1) Create 500 tables; enough tables are needed to ensure that conglomerate
   numbers 2080 (c820.dat) and 8320 (c2080.dat) both exist.
2) Checkpoint the database so that the creates do not happen during REDO.
3) Drop the table with conglomerate number 2080, mapping to c820.dat. The test
   looks the table up in the catalog in case conglomerate number assignment
   changes for some reason.
4) Exit the database without a clean shutdown. This is the default for test
   suites which run multiple tests in a single db; no clean shutdown is done.
   Because only a single drop has happened since the last checkpoint, the
   subsequent REDO will replay that drop.
5) Run the next test program, dropcrash2, which will cause redo of the drop. At
   this point the bug will cause file c2080.dat to be incorrectly deleted, and
   thus accesses to conglomerate 8320 will throw container does not exist
   errors.
6) Check the consistency of the database, which will detect the container does
   not exist error.
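
The choice of conglomerate numbers 2080 and 8320 in step 1 follows from condition 3: reading the dropped table's decimal digits as a hexadecimal number gives the conglomerate whose file would be hit. A minimal sketch of that arithmetic, using a hypothetical helper name that is not part of the test code:

```java
// Hypothetical helper; shows only the arithmetic behind the test's numbers.
class CollisionVictim {

    // Reinterprets the decimal digits of the dropped conglomerate number as
    // hexadecimal. The result is the conglomerate number whose correctly
    // named file collides with the wrongly computed file name.
    static long victimOf(long droppedConglomerateNumber) {
        return Long.parseLong(Long.toString(droppedConglomerateNumber), 16);
    }

    public static void main(String[] args) {
        System.out.println(victimOf(2080)); // 8320
        System.out.println(victimOf(9));    // 9: single digits map to themselves
    }
}
```

For numbers below 10 the decimal and hexadecimal forms coincide, so the faulty name still refers to the dropped table's own file.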

Repository Revision Date User Message
ASF #329494 Sun Oct 30 00:02:02 UTC 2005 mikem Fix for DERBY-662.

The change is an obvious fix to the BaseDataFileFactory.java code which creates
a conglomerate's container file name given its conglomerate number. This is
a simple hex conversion which was missing from one of the paths through the
code.

The path is almost never taken, but under the following circumstances, during
redo crash recovery, this bug could cause Derby to delete the underlying file
of a table. The circumstances are as follows:

1) The OS/file system must be case insensitive, such that a request to delete
a file named C2080.dat would also remove c2080.dat. This is true of the
default Windows file systems, but not of any Unix/Linux file systems that
I am aware of.
2) The system must be shut down uncleanly, such that a subsequent access of
the database causes a REDO recovery action of a drop table statement.
This means that a drop table statement must have happened since the last
checkpoint in the log file. Examples of things that cause checkpoints are:
o clean shutdown from ij using the "exit" command
o clean shutdown of the database using the "shutdown=true" URL
o calling the checkpoint system procedure
o generating enough log activity to cause a regularly scheduled checkpoint.
3) If the conglomerate number of the dropped table described above is TABLE_1,
then for the problem to occur there must also exist in the database a table
TABLE_2 such that HEX(TABLE_2) = TABLE_1, that is, the hexadecimal digits of
TABLE_2's conglomerate number read the same as the decimal digits of TABLE_1.
4) Either TABLE_2 must not be accessed during REDO before the REDO of the drop
of TABLE_1, or there must be enough other table references during the REDO
phase to push the cached open of TABLE_2 out of the cache.

The commit also adds a test case which, without the fix, forces a container
not found error on an existing table.
Files Changed
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/BaseTest.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/suites/storerecovery.runall
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/dropcrash.java
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/BaseDataFileFactory.java
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/dropcrash2.out
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/dropcrash.out
ADD /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/dropcrash2.java

Repository Revision Date User Message
ASF #329733 Mon Oct 31 04:33:42 UTC 2005 mikem Merging the fix for DERBY-662, committed to trunk as svn 329494, into the
10.1 codeline.

The change is an obvious fix to the BaseDataFileFactory.java code which creates
a conglomerate's container file name given its conglomerate number. This is
a simple hex conversion which was missing from one of the paths through the
code.

The path is almost never taken, but under the following circumstances, during
redo crash recovery, this bug could cause Derby to delete the underlying file
of a table. The circumstances are as follows:

1) The OS/file system must be case insensitive, such that a request to delete
a file named C2080.dat would also remove c2080.dat. This is true of the
default Windows file systems, but not of any Unix/Linux file systems that
I am aware of.
2) The system must be shut down uncleanly, such that a subsequent access of
the database causes a REDO recovery action of a drop table statement.
This means that a drop table statement must have happened since the last
checkpoint in the log file. Examples of things that cause checkpoints are:
o clean shutdown from ij using the "exit" command
o clean shutdown of the database using the "shutdown=true" URL
o calling the checkpoint system procedure
o generating enough log activity to cause a regularly scheduled checkpoint.
3) If the conglomerate number of the dropped table described above is TABLE_1,
then for the problem to occur there must also exist in the database a table
TABLE_2 such that HEX(TABLE_2) = TABLE_1, that is, the hexadecimal digits of
TABLE_2's conglomerate number read the same as the decimal digits of TABLE_1.
4) Either TABLE_2 must not be accessed during REDO before the REDO of the drop
of TABLE_1, or there must be enough other table references during the REDO
phase to push the cached open of TABLE_2 out of the cache.

The commit also adds a test case which, without the fix, forces a container
not found error on an existing table.
Files Changed
ADD /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/tests/store/dropcrash.java (from /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/dropcrash.java)
MODIFY /db/derby/code/branches/10.1/java/engine/org/apache/derby/impl/store/raw/data/BaseDataFileFactory.java
MODIFY /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/tests/store/BaseTest.java
MODIFY /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/suites/storerecovery.runall
ADD /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/master/dropcrash.out (from /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/dropcrash.out)
ADD /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/master/dropcrash2.out (from /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/dropcrash2.out)
ADD /db/derby/code/branches/10.1/java/testing/org/apache/derbyTesting/functionTests/tests/store/dropcrash2.java (from /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/store/dropcrash2.java)

Repository Revision Date User Message
ASF #330583 Thu Nov 03 17:16:22 UTC 2005 mikem some minor comment changes and some new asserts that I found helpful
while debugging DERBY-662.
Files Changed
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/BaseDataFileFactory.java
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java

Repository Revision Date User Message
ASF #348213 Tue Nov 22 19:00:06 UTC 2005 mikem DERBY-662 porting patch from trunk into 10.0 line.

Sometimes during redo the system will incorrectly remove the file associated
with a table. The bug requires the following conditions to reproduce:
1) The OS/file system must be case insensitive, such that a request to delete
a file named C2080.dat would also remove c2080.dat. This is true of the
default Windows file systems, but not of any Unix/Linux file systems that
I am aware of.
2) The system must be shut down uncleanly, such that a subsequent access of
the database causes a REDO recovery action of a drop table statement.
This means that a drop table statement must have happened since the last
checkpoint in the log file. Examples of things that cause checkpoints are:
o clean shutdown from ij using the "exit" command
o clean shutdown of the database using the "shutdown=true" URL
o calling the checkpoint system procedure
o generating enough log activity to cause a regularly scheduled checkpoint.
3) If the conglomerate number of the dropped table described above is TABLE_1,
then for the problem to occur there must also exist in the database a table
TABLE_2 such that HEX(TABLE_2) = TABLE_1, that is, the hexadecimal digits of
TABLE_2's conglomerate number read the same as the decimal digits of TABLE_1.
4) Either TABLE_2 must not be accessed during REDO before the REDO of the drop
of TABLE_1, or there must be enough other table references during the REDO
phase to push the cached open of TABLE_2 out of the cache.

If all of the above conditions are met then during REDO the system will
incorrectly delete TABLE_2 while trying to redo the drop of TABLE_1.
I will be adding the following test to reproduce the problem:
1) Create 500 tables; enough tables are needed to ensure that conglomerate
numbers 2080 (c820.dat) and 8320 (c2080.dat) both exist.
2) Checkpoint the database so that the creates do not happen during REDO.
3) Drop the table with conglomerate number 2080, mapping to c820.dat. The test
looks the table up in the catalog in case conglomerate number assignment
changes for some reason.
4) Exit the database without a clean shutdown. This is the default for test
suites which run multiple tests in a single db; no clean shutdown is done.
Because only a single drop has happened since the last checkpoint, the
subsequent REDO will replay that drop.
5) Run the next test program, dropcrash2, which will cause redo of the drop. At
this point the bug will cause file c2080.dat to be incorrectly deleted, and
thus accesses to conglomerate 8320 will throw container does not exist
errors.
6) Check the consistency of the database, which will detect the container does
not exist error.
Files Changed
MODIFY /db/derby/code/branches/10.0/java/engine/org/apache/derby/impl/store/raw/data/BaseDataFileFactory.java