  Derby / DERBY-3611

ERROR XSDG2: Invalid checksum on Page occurs during mass inserts into two-column bigint PK table


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 10.3.1.4, 10.3.2.1
    • Fix Version/s: 10.3.3.0, 10.4.1.3
    • Component/s: Store
    • Labels: None
    • Environment: Occurred on 6 separate quad-core machines running Windows Vista, Vista SP1, or Windows Server 2008. Also seen on an AMD64 dual-core 4200 with 4 GB of RAM running 32-bit Windows XP Pro.

    Description

      The original, extensive email thread reporting this issue is available here: http://www.nabble.com/ERROR-XSDG2%3A-Invalid-checksum-on-Page-Page%280%2CContainer%280%2C-1313%29%29-td16389697.html.

      I have an intensive data-processing application which utilises Apache Derby, running on 6 quad-core machines with Windows Vista SP1 and/or Windows Server 2008. Each quad-core machine typically runs 4 separate JVM worker processes, each with its own embedded Derby database.

      I have found that after 5 to 10 hours of processing, one or a couple of my worker processes start reporting the following error in their derby.log files:

      ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313))

      The worker process never seems to recover. Derby detects the error and reboots the database, but inevitably reports the same error again. I have tried both 10.3.1.4 and 10.3.2.1 with the same results. The conglomerate and page number are always the same.

      I know it is not a hardware issue: it happens across 6 separate machines, with both software and hardware RAID, and no disk errors have been reported. A customer of ours also reported this error on their AMD64 dual-core 4200 with 4 GB of RAM running 32-bit Windows XP Pro.

      The table the conglomerate refers to is as follows:

      CREATE TABLE text_table (guidhigh BIGINT NOT NULL,
      guid BIGINT NOT NULL,
      data BLOB (1G) NOT NULL,
      PRIMARY KEY (guidhigh, guid))

      In this application, essentially random values were generated for guidhigh and guid. The data column held compressed text ranging from a few bytes to many megabytes in size.

      The processing code effectively ran a SELECT on guidhigh and guid to check whether an entry already existed before inserting a new row within a transaction.
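      The check-then-insert pattern described above might look roughly like the following in JDBC. This is a minimal, hypothetical sketch, not the reporter's code. Only the table and column names come from the report; the class `TextTableLoader` and method `insertIfAbsent` are invented for illustration. It assumes the caller supplies an open `Connection` to the embedded database and streams the compressed text into the BLOB column.

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Hypothetical sketch of the check-then-insert workload from the report.
 * Table/column names are from the report; class and method names are invented.
 */
final class TextTableLoader {
    static final String EXISTS_SQL =
        "SELECT 1 FROM text_table WHERE guidhigh = ? AND guid = ?";
    static final String INSERT_SQL =
        "INSERT INTO text_table (guidhigh, guid, data) VALUES (?, ?, ?)";

    /** Inserts the row only if its (guidhigh, guid) key is absent; returns true if a row was inserted. */
    static boolean insertIfAbsent(Connection conn, long guidHigh, long guid,
                                  InputStream data, long length) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // run the lookup and the insert in one transaction
        try (PreparedStatement check = conn.prepareStatement(EXISTS_SQL)) {
            check.setLong(1, guidHigh);
            check.setLong(2, guid);
            try (ResultSet rs = check.executeQuery()) {
                if (rs.next()) {
                    conn.commit();
                    return false; // key already present, nothing to insert
                }
            }
            try (PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
                insert.setLong(1, guidHigh);
                insert.setLong(2, guid);
                insert.setBinaryStream(3, data, length); // BLOB payload (compressed text)
                insert.executeUpdate();
            }
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback(); // undo a partial transaction before propagating
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

      With random 64-bit keys, almost every call takes the insert branch. Each insert therefore adds an entry at an effectively random position in the two-column primary-key index.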

      If I forcibly shut down the application, I could connect to the database using ij and would get the same error:

      ij> select count(*) from text_table;
      ERROR XSDG2: Invalid checksum on Page Page(0,Container(0, 1313)), expected=304,608,373, on-disk version=2,462,088,751, page dump follows: Hex dump:
      00000000: 0076 0000 0001 0000 0000 0000 27ea 0000 .v..............
      00000010: 0000 0006 0000 0000 0000 0000 0000 0000 ................
      00000020: 0000 0000 0001 0000 0000 0000 0000 0000 ................
      ....
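      XSDG2 means that the checksum recomputed when a page is read does not match the value stored on the page. Both numbers in the message above fit in an unsigned 32-bit range, which is consistent with a CRC32. The sketch below illustrates only that general idea; it is not Derby's implementation. It assumes, hypothetically, that the checksum is a `java.util.zip.CRC32` over every page byte except the last 8, and that those 8 bytes hold the stored value as a big-endian `long`. The class `PageChecksum` and its methods are invented for illustration. In this sketch, the recomputed value plays the role of `expected` and the stored one plays `on-disk version`.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/**
 * Hypothetical illustration of page-level checksum verification.
 * ASSUMPTION: CRC32 over all bytes except the trailing 8, which store the
 * checksum as a big-endian long. Not Derby's actual page layout or code.
 */
final class PageChecksum {
    static final int CHECKSUM_SIZE = 8; // trailing bytes reserved for the stored checksum (assumed)

    /** Recomputes the CRC32 over the page body (everything before the checksum slot). */
    static long compute(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length - CHECKSUM_SIZE);
        return crc.getValue();
    }

    /** Writes the current checksum into the trailing slot, as a writer would before flushing. */
    static void stamp(byte[] page) {
        ByteBuffer.wrap(page).putLong(page.length - CHECKSUM_SIZE, compute(page));
    }

    /** Reads the checksum stored in the trailing slot (the "on-disk" value). */
    static long stored(byte[] page) {
        return ByteBuffer.wrap(page).getLong(page.length - CHECKSUM_SIZE);
    }

    /** True when the recomputed ("expected") value matches the stored one. */
    static boolean verify(byte[] page) {
        return compute(page) == stored(page);
    }

    public static void main(String[] args) {
        byte[] page = new byte[4096];
        page[1] = 0x76;     // arbitrary header content
        stamp(page);
        System.out.println("clean page verifies: " + verify(page));
        page[20] ^= 0x01;   // simulate a single flipped bit in the page body
        System.out.printf("expected=%,d, on-disk version=%,d%n", compute(page), stored(page));
        System.out.println("corrupted page verifies: " + verify(page));
    }
}
```

      A CRC32 is guaranteed to catch any single-bit change, so the modified page always fails verification. The check only says the bytes differ from what was written; it cannot say which code path wrote them.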

      As a workaround, suggested on derby-user by Stanley Bradbury, we dropped the PK during the load. We also replaced the two-column PK with a single-column PK, and the problem has not occurred since.
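      The first workaround (no primary key while loading) could be sketched with Derby's `ALTER TABLE ... DROP PRIMARY KEY` and `ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY` statements. This is hypothetical. `PkDeferredLoad`, `BulkLoad`, and the constraint name `text_table_pk` are invented for illustration. The sketch also assumes the load never inserts duplicate (guidhigh, guid) pairs; otherwise re-adding the key would fail.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical sketch of loading text_table without its primary key.
 * ASSUMPTION: the load produces no duplicate (guidhigh, guid) pairs,
 * otherwise the final ADD CONSTRAINT fails.
 */
final class PkDeferredLoad {
    /** Caller-supplied bulk-load step (hypothetical interface). */
    interface BulkLoad {
        void run(Connection conn) throws SQLException;
    }

    static final String DROP_PK = "ALTER TABLE text_table DROP PRIMARY KEY";
    static final String ADD_PK =
        "ALTER TABLE text_table ADD CONSTRAINT text_table_pk PRIMARY KEY (guidhigh, guid)";

    static void loadThenAddPrimaryKey(Connection conn, BulkLoad load) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(DROP_PK); // no two-column PK index is maintained during the load
            load.run(conn);            // e.g. the many INSERTs into text_table
            st.executeUpdate(ADD_PK);  // build the index once, after all rows are in
        }
    }
}
```

      The design trade-off is that uniqueness is not enforced by the database while the key is absent. The application's own existence check then carries that responsibility until the constraint is re-added.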

      I'll attach a number of example derby.log files which contain the error messages.

      Attachments

        1. d3347-1a+2a.diff (19 kB, Knut Anders Hatlen)
        2. derby-worker0.log (69 kB, David Sitsky)
        3. derby-worker3.log (183 kB, David Sitsky)
        4. derby-worker4.log (59 kB, David Sitsky)

            People

              Assignee: Unassigned
              Reporter: David Sitsky
              Votes: 0
              Watchers: 1