Derby / DERBY-4589

Corrupted database prevents startup and should be automatically repaired perhaps

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.5.3.0
    • Fix Version/s: 10.8.1.2, 10.9.1.0
    • Component/s: Store
    • Labels:
      None
    • Environment:
      Windows 2000, SP4. J2SE 1.6
    • Urgency:
      Urgent
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Data corruption

      Description

      I have found a database in my application that prevents startup due to it being corrupted.
      The driver reports that the database does not exist, even though it does. Then when my app tries to create the database using ;create=true; on the URL it fails.

      I think this happened due to the app being killed in Task Manager while it was creating the database.

      I have the database saved so that you can reproduce the problem. (I'm not sure if I can attach it yet)

      1. 2010-03.zip
        12 kB
        Jeff Mckenzie
      2. derby-4589-01-ab-missingServiceProperties.diff
        10 kB
        Rick Hillegas
      3. derby-4589-01-ac-missingServiceProperties.diff
        11 kB
        Rick Hillegas
      4. Test_4589.java
        2 kB
        Rick Hillegas
      5. Test_4589.java
        2 kB
        Rick Hillegas

          Activity

          Jeff Mckenzie added a comment -

          This is a database that is empty, but is corrupted somehow.

          Derby tries to create a new database because it doesn't see this database, but fails to create a new database, thus preventing the app from ever using it.

          Rick Hillegas added a comment -

          Other people have been confused by this behavior it seems.

          Various tasks have to be performed before the database exists. For instance, service.properties must be written and the system tables must be created. What should happen if you try to connect to a partially created database?

          1) Should you get a message saying that the database is unusable rather than a message saying that the database does not exist?

          2) Should Derby raise an error saying that it is deleting or renaming the half-created database so that you can retry database creation?

          Thanks.

          Jeff Mckenzie added a comment (edited) -

          If it is possible to determine that the database is half-created, then
          either option 1 or 2 would be good. Otherwise, your app must make a guess.
          At the moment my app renames the database to DB_NAME_corrupted, and carries
          on making a new database (but it has no way of knowing whether there was any
          data in the corrupted database, and no way of recovering it).

          The main issue is that Derby should tell you whether there is any data in
          the database, or that it is just a half-created database. I suppose writing
          a creation completed flag to the database somewhere would be helpful.
          Database creation should be atomic like a transaction - all or nothing. If
          that creation completed flag is missing on next startup, then Derby could
          safely assume that the creation was not completed and finish it off or
          rebuild the DB from scratch. I'd favour option 1, because it gives the app
          the chance to cancel the operation.

          I've only encountered one other corrupt database, but that one had 20 MB of
          data in it. All other problems I've seen were related to this DB creation
          issue.

          Thanks for your comments.
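
The application-side workaround described in this comment (renaming the suspect database to DB_NAME_corrupted before creating a fresh one) could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical, it uses plain java.io.File rather than any Derby API, and it assumes the database has been shut down and is a plain directory on the local file system. Nothing is deleted, so the old files remain available for later inspection.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Derby: moves a suspect database
// directory aside so that the original path is free for a fresh create.
public class QuarantineDatabase {

    /**
     * Renames dbDir to a sibling named "<name>_corrupted" and returns the
     * new location. Throws IOException instead of overwriting anything.
     */
    public static File quarantine(File dbDir) throws IOException {
        File target = new File(dbDir.getParentFile(), dbDir.getName() + "_corrupted");
        if (target.exists()) {
            // Refuse to clobber an earlier quarantined copy.
            throw new IOException("Already exists: " + target);
        }
        if (!dbDir.renameTo(target)) {
            throw new IOException("Could not rename " + dbDir + " to " + target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File dbDir = new File(args[0]);
        if (dbDir.isDirectory()) {
            System.out.println("Moved to " + quarantine(dbDir));
        }
    }
}
```

As the comment notes, a rename like this cannot tell whether the old directory held any data; it only keeps the files around so that someone can check later.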

          Rick Hillegas added a comment -

          May be related to DERBY-4098.

          Rick Hillegas added a comment -

          Attaching Test_4589.java, a repro for this problem. The program creates a database, shuts it down, deletes the service.properties in the database, then tries to re-create the database. This raises:

          ERROR XBM0J: Directory .../db_4589 already exists.
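
For readers without the attachment, the steps listed above can be sketched as follows. This is not the attached Test_4589.java, whose contents are not reproduced here; it is a rough reconstruction that assumes the embedded Derby driver (derby.jar) is on the classpath and that the database lives in the current working directory. The class and helper names are hypothetical. The sketch prints whatever SQLStates come back rather than asserting a particular one.

```java
import java.io.File;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical reconstruction of the repro steps; requires embedded Derby
// on the classpath to actually run main().
public class Repro4589Sketch {
    static final String DB_NAME = "db_4589";

    public static String createUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";create=true";
    }

    public static String shutdownUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";shutdown=true";
    }

    /** Deletes service.properties from the database directory; true on success. */
    public static boolean deleteServiceProperties(File dbDir) {
        return new File(dbDir, "service.properties").delete();
    }

    /** Prints every exception in the SQLException chain. */
    static void printChain(SQLException e) {
        for (SQLException s = e; s != null; s = s.getNextException()) {
            System.out.println("ERROR " + s.getSQLState() + ": " + s.getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Create the database.
        DriverManager.getConnection(createUrl(DB_NAME)).close();

        // 2. Shut it down. Derby reports a successful shutdown as an SQLException.
        try {
            DriverManager.getConnection(shutdownUrl(DB_NAME));
        } catch (SQLException shutdown) {
            System.out.println("Shutdown: " + shutdown.getSQLState());
        }

        // 3. Remove service.properties to simulate a half-created database.
        System.out.println("Deleted: " + deleteServiceProperties(new File(DB_NAME)));

        // 4. Try to create the database again and report the resulting error chain.
        try {
            DriverManager.getConnection(createUrl(DB_NAME)).close();
            System.out.println("Re-create unexpectedly succeeded");
        } catch (SQLException e) {
            printChain(e);
        }
    }
}
```

Note that this sketch does not clean up db_4589 from a previous run, so the directory should be removed by hand before running it again.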

          Rick Hillegas added a comment -

          Attaching revamped version of the repro which deletes the database at startup in order to have a clean slate.

          Rick Hillegas added a comment -

          Attaching derby-4589-01-ab-missingServiceProperties.diff. This patch adds an error message for the case when the database directory exists but it is missing service.properties. I am running regression tests now.

          I agree with Mike's analysis on DERBY-4733. We should not delete a directory just because it is missing the files which signify an intact database. The best we can do is raise an error suggesting that the problem might be that Derby crashed during database creation. It should remain the user's responsibility to throw away the half-created database and try again.

          Touches the following files:

          ----------------

          M java/engine/org/apache/derby/impl/services/monitor/StorageFactoryService.java

          Raise a special error if the database directory exists but it is missing service.properties.

          ----------------

          M java/engine/org/apache/derby/loc/messages.xml
          M java/shared/org/apache/derby/shared/common/reference/SQLState.java

          Special error message suggesting that Derby may have crashed during database creation.

          ----------------

          A java/testing/org/apache/derbyTesting/functionTests/tests/lang/HalfCreatedDatabaseTest.java
          M java/testing/org/apache/derbyTesting/functionTests/tests/lang/_Suite.java

          Test for this error condition.
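
The condition the patch tests for (the database directory exists but service.properties is missing) can be illustrated outside Derby with a small standalone check. The sketch below is not the code in StorageFactoryService.java; the class, enum, and method names are hypothetical, and it only looks at the file system. In line with the reasoning above, it reports the state and leaves any cleanup to the caller.

```java
import java.io.File;

// Hypothetical standalone check, not the actual Derby patch.
public class HalfCreatedCheck {

    /** What the file system says about a prospective database location. */
    public enum State {
        NO_DIRECTORY,               // nothing there, or not a directory
        MISSING_SERVICE_PROPERTIES, // directory exists without service.properties
        HAS_SERVICE_PROPERTIES      // directory exists and contains service.properties
    }

    public static State inspect(File dbDir) {
        if (!dbDir.isDirectory()) {
            return State.NO_DIRECTORY;
        }
        File props = new File(dbDir, "service.properties");
        return props.isFile() ? State.HAS_SERVICE_PROPERTIES
                              : State.MISSING_SERVICE_PROPERTIES;
    }

    public static void main(String[] args) {
        File dbDir = new File(args[0]);
        State state = inspect(dbDir);
        if (state == State.MISSING_SERVICE_PROPERTIES) {
            // Report only; never delete the directory automatically.
            System.out.println(dbDir + " exists but has no service.properties;"
                    + " database creation may have been interrupted.");
        } else {
            System.out.println(dbDir + ": " + state);
        }
    }
}
```

A missing service.properties is only one symptom, so HAS_SERVICE_PROPERTIES here does not imply that the database is intact.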

          Dag H. Wanvik added a comment -

          Change looks good to me. Using ij, I verified manually that when deleting "service.properties", I got the expected error.

          +1

          Rick Hillegas added a comment -

          Thanks, Dag. The tests showed one problem: ErrorCodeTest needed the new error message. Attaching derby-4589-01-ac-missingServiceProperties.diff, which fixes ErrorCodeTest. Committed at subversion revision 1091772.

          Rick Hillegas added a comment -

          Ported revision 1091772 from trunk to 10.8 branch at subversion revision 1091774.

          Rick Hillegas added a comment -

          Further work could be done in the vetService() method if we want to catch other symptoms of failed database creation. However, I think that the check for the missing service.properties file covers much of the problem. Resolving the issue.

          Kathey Marsden added a comment -

          The SQLState change for the error raised on database creation when a database already exists needs a release note at a minimum, but I started a conversation in DERBY-5526 about why it was changed:

          https://issues.apache.org/jira/browse/DERBY-5526?focusedCommentId=13165598&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13165598

          Jeff Mckenzie added a comment -

          If the database is completely empty and has no tables, why not just recreate the whole thing from scratch?

          You would only need a little flag somewhere for this. It gets written to once the database is fully created and opens normally. If it is in the wrong state, delete and recreate the database.

          OTOH, if you have definitely identified the problem, carry on...

          Kathey Marsden added a comment -

          I don't think it would be good to change the behavior which has always existed which is that Derby will not attempt to create a database in a directory that already exists. Also I think we never should delete user files on database creation. The corruption might be something other than a failed create attempt. I actually have seen user cases come in where someone had deleted the service.properties file and we were able to recover their data by reconstructing the file. If Derby had simply blown it away, such recovery would not have been possible.

          But yes, it looks like for DERBY-5526, the problem is the new error message has been thrown where it shouldn't be, so fixing that should resolve the issue for the user expecting the old error in that case.

          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.


            People

            • Assignee:
              Rick Hillegas
              Reporter:
              Jeff Mckenzie