Derby / DERBY-3393

Database corruption when adding sleep() in RAFContainer4.writePage()


Details

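    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 10.2.2.1, 10.3.1.4, 10.4.1.3
    • Fix Version/s: None
    • Component/s: Store
    • Labels: None
    • Environment: Solaris 10, OpenSolaris snv_80, Sun Java SE 5.0, Sun Java SE 6, Derby trunk #618305
    • Issue & fix info: High Value Fix
    • Bug behavior facts: Data corruption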

    Description

      In order to test whether RAFContainer4.writePage() was properly synchronized, I made it sleep for 100 ms each time after it had written a page, right before it set its needsSync flag to true, like this:

      Index: java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java
      ===================================================================
      --- java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (revision 618305)
      +++ java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (working copy)
      @@ -350,6 +350,11 @@
                       dataFactory.writeFinished();
                   }
               } else {
      +            try {
      +                Thread.sleep(100);
      +            } catch (InterruptedException ie) {
      +                // ignored
      +            }
                   synchronized(this) {
                       needsSync = true;
                   }
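The sleep() is useful as a probe because it widens the interval between the end of the page write and the moment needsSync becomes true; any other thread that reads needsSync during that interval still sees the old value. The following is a minimal, self-contained Java sketch of that window only. The class SyncWindowDemo and its methods writePage(), peekNeedsSync() and flagSeenAfterWrite() are hypothetical and greatly simplified: this is not Derby's RAFContainer4 code, and it does not model what a real sync or checkpoint does with the flag.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical, simplified model of a "write the page, then mark the
 * container as needing sync" sequence. NOT Derby's RAFContainer4;
 * all names here are illustrative only.
 */
public class SyncWindowDemo {

    private boolean needsSync = false;                 // guarded by "this"
    private final CountDownLatch written = new CountDownLatch(1);

    /** Writer side: "write" a page, optionally pause, then set the flag. */
    void writePage(long pauseMillis) {
        // ... the actual page write would happen here ...
        written.countDown();                           // the page write has completed
        if (pauseMillis > 0) {
            try {
                Thread.sleep(pauseMillis);             // injected delay widens the window
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();    // restore the interrupt status
            }
        }
        synchronized (this) {
            needsSync = true;                          // flag set only after the pause
        }
    }

    /** Observer side: reads the flag under the same lock the writer uses. */
    synchronized boolean peekNeedsSync() {
        return needsSync;
    }

    /**
     * Starts a writer thread, waits until its page write has completed,
     * and returns the value of needsSync observed at that moment.
     */
    static boolean flagSeenAfterWrite(long pauseMillis) throws InterruptedException {
        SyncWindowDemo c = new SyncWindowDemo();
        Thread writer = new Thread(() -> c.writePage(pauseMillis));
        writer.start();
        c.written.await();                             // block until the write is done
        boolean seen = c.peekNeedsSync();              // observe the flag immediately
        writer.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("needsSync seen right after write: " + flagSeenAfterWrite(500));
    }
}
```

With a 500 ms pause, the observer almost always reads the flag while the writer is still sleeping and therefore sees false; with no pause, the result depends on thread scheduling, which is why such a window is hard to hit without an injected delay.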

      When I ran derbyall with this sleep() in place, I saw failures in the storerecovery suite. I reran the storerecovery suite a couple of times and saw failures each time, although the specific failures varied from run to run.

      The most common failure was the following (page numbers and container numbers varied):

      > Exception in thread "main" java.sql.SQLException: Failed to start database 'wombat', see the next exception for details.
      > Caused by: java.sql.SQLException: Failed to start database 'wombat', see the next exception for details.
      > ... 17 more
      > Caused by: java.sql.SQLException: Page Page(19,Container(0, 1073)) is at version 0, the log file contains change version 578, either there are log records of this page missing, or this page did not get written out to disk properly.
      > ... 14 more
      > Caused by: ERROR XSDB4: Page Page(19,Container(0, 1073)) is at version 0, the log file contains change version 578, either there are log records of this page missing, or this page did not get written out to disk properly.
      > ... 14 more

      This failure was seen in oc_rec3, oc_rec4, dropcrash and dropcrash2.

      In some runs, I saw this failure in oc_rec3 instead:

      > Exception while trying to insert row number: 0
      > ERROR XBCA0: Cannot create new object with key Page(2,Container(0, 1040)) in PageCache cache. The object already exists in the cache.

      which would be followed by this error in oc_rec4:

      > ERROR 23505: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'TEST1_2_IDX_INDCOL3' defined on 'TEST1_2'.

People

    • Assignee: Mike Matrigali (mikem)
    • Reporter: Knut Anders Hatlen (knutanders)
    • Votes: 1
    • Watchers: 1
