Derby
  1. Derby
  2. DERBY-4917

zero byte dbex.lck file can cause dual boot warning saying Severe and non-recoverable corruption can result and may have already occurred.

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1.2.1
    • Fix Version/s: None
    • Component/s: Store
    • Environment:
    • Urgency:
      Normal
    • Issue & fix info:
      Workaround attached
    • Bug behavior facts:
      Seen in production

      Description

      User reports the following warning booting Derby 10.2 with JDK 1.5 SR11 FP1 on z/OS.

      ij> connnect 'jdbc:derby:wombat';
      IJ ERROR: Unable to establish connection
      ij> connect 'jdbc:derby:wombat';
      WARNING: Derby (instance c013800d-012c-8027-19b4-000000037b18) is attempting to
      boot the database <snip dbname > even though Derby (instance c0
      13800d-012c-74ae-07c3-0000000af3f0) may still be active. Only one instance of D
      erby should boot a database at a time. Severe and non-recoverable corruption can
      result and may have already occurred.

      The dbex.lck file is zero length. The code seems to indicate in DirFile4.getExclusiveLock() that a zero length dbex.lck file is possible if the product is booted with less than JDK 1.4 and Derby should show the warning under these circumstances, but investigation shows that even if the dbex,lck file is removed it is recreated with zero length using JDK 1.5. So there seem to be two issues.
      1) dbex.lck is somehow getting created with zero length with JDK 1.5 on this machine with JDK 1.5 SR11 FP1.

      2) We have this logic pertaining to JDK 1.3.1 in the product that probably can be removed and perhaps masks a real exclusive locking problem that perhaps should issue an Error rather than a warning if the file cannot be created with 4 bytes as expected.

      I can mimic the Warning on my system with a manufactured zero length dbex.lck file e.g.
      mv dbex.lck dbex.lck.orig
      touch dbex.lck
      java org.apache.derby.tools.ij
      ij version 10.2
      ij> connnect 'jdbc:derby:wombat';
      IJ ERROR: Unable to establish connection
      ij> connect 'jdbc:derby:wombat';
      WARNING: Derby (instance c013800d-012c-8027-19b4-000000037b18) is attempting to
      boot the database /u/kmarsd/repro/lockwarn/wombat even though Derby (instance c0
      13800d-012c-74ae-07c3-0000000af3f0) may still be active. Only one instance of D
      erby should boot a database at a time. Severe and non-recoverable corruption can
      result and may have already occurred.

      I see the same warning with 10.7.

      Possible workaround for eliminating the warning is to do a clean shutdown of the derby database before exiting the jvm which will remove the lck files, but it is hard to know if the user is getting actual dual boot protection under these circumstances.

      1. SimpleConnect.java
        1 kB
        Kathey Marsden
      2. ExLockFile.java
        6 kB
        Kathey Marsden
      3. derby-4917_10_2_debug_diff.txt
        8 kB
        Kathey Marsden

        Activity

        Kathey Marsden created issue -
        Hide
        Kathey Marsden added a comment -

        Here are a couple of small programs for diagnosing the problem that I asked the user to run.,

        The first program SimpleConnect, makes a connection to the database and checks the size of the lock file created.
        It should be run as
        java SimpleConnect <path to database>
        The System.out output should be captured and returned along with the derby.log.
        On my system the output is:
        $ java SimpleConnect /u/kmarsd/repro/lockwarn/wombat
        dbex.lck exists and is length:4
        Rename dbex.lck to dbex.lck.sav with f.renameTo
        Connection successfully madeorg.apache.derby.impl.jdbc.EmbedConnection30@1023687
        94 (XID = 40), (SESSIONID = 0), (DATABASE = /u/kmarsd/repro/lockwarn/wombat), (D
        RDAID = null)
        length of dbex.lck file:4

        I expect on the user machine the dbex.lck file will be of length 0 which will narrow the problem outside of the applicatoin to just Derby. If it shows length 4 then something in the application environment must be influencing the locking.

        Assuming the length shows 0 with SimpleConnect, the second program has just the lock file creation and locking code:

        run like
        java ExLockFile <path to db>

        The output on my system is:
        java ExLockFile /u/kmarsd/repro/lockwarn/wombat

        /u/kmarsd/repro/lockwarn/wombat exists. Attempt exclusive lock
        create file succeded. validExclusiveLock=true
        1)lockFileOpen = new RandomAccessFile((File) this, "rw")
        1) Complete lockFileOpen =java.io.RandomAccessFile@44ba44ba
        2) lockFileChannel = lockFileOpen.getChannel()
        2) Complete lockFileChannel = sun.nio.ch.FileChannelImpl@51ac51ac
        3) dbLock = LockFileChannel.tryLock()
        3) Complete dbLock = sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive val
        id]
        lockFileOpen.writeInt(EXCLUSIVE_FILE_LOCK)
        writeIntSuccessful
        lockFileChannel.force(true)
        lockChannel.force(true) successful
        status = EXCLUSIVE_FILE_LOCK
        Lock status is:1-EXCLUSIVE_FILE_LOCK
        f.length() = 4

        I am thinking maybe at the site we will see an IOException or some sort of other clue.

        Show
        Kathey Marsden added a comment - Here are a couple of small programs for diagnosing the problem that I asked the user to run., The first program SimpleConnect, makes a connection to the database and checks the size of the lock file created. It should be run as java SimpleConnect <path to database> The System.out output should be captured and returned along with the derby.log. On my system the output is: $ java SimpleConnect /u/kmarsd/repro/lockwarn/wombat dbex.lck exists and is length:4 Rename dbex.lck to dbex.lck.sav with f.renameTo Connection successfully madeorg.apache.derby.impl.jdbc.EmbedConnection30@1023687 94 (XID = 40), (SESSIONID = 0), (DATABASE = /u/kmarsd/repro/lockwarn/wombat), (D RDAID = null) length of dbex.lck file:4 I expect on the user machine the dbex.lck file will be of length 0 which will narrow the problem outside of the applicatoin to just Derby. If it shows length 4 then something in the application environment must be influencing the locking. Assuming the length shows 0 with SimpleConnect, the second program has just the lock file creation and locking code: run like java ExLockFile <path to db> The output on my system is: java ExLockFile /u/kmarsd/repro/lockwarn/wombat /u/kmarsd/repro/lockwarn/wombat exists. Attempt exclusive lock create file succeded. validExclusiveLock=true 1)lockFileOpen = new RandomAccessFile((File) this, "rw") 1) Complete lockFileOpen =java.io.RandomAccessFile@44ba44ba 2) lockFileChannel = lockFileOpen.getChannel() 2) Complete lockFileChannel = sun.nio.ch.FileChannelImpl@51ac51ac 3) dbLock = LockFileChannel.tryLock() 3) Complete dbLock = sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive val id] lockFileOpen.writeInt(EXCLUSIVE_FILE_LOCK) writeIntSuccessful lockFileChannel.force(true) lockChannel.force(true) successful status = EXCLUSIVE_FILE_LOCK Lock status is:1-EXCLUSIVE_FILE_LOCK f.length() = 4 I am thinking maybe at the site we will see an IOException or some sort of other clue.
        Kathey Marsden made changes -
        Field Original Value New Value
        Attachment SimpleConnect.java [ 12465013 ]
        Attachment ExLockFile.java [ 12465014 ]
        Hide
        Kathey Marsden added a comment -

        At the site using a user privileged to run command line programs, the test programs ran and created the lock file with 4 bytes as expected. They are not able to run the programs with the problematic id with the scheduled task so I am going to work to make a debug 10.2 build to try to help diagnose the problem.

        Show
        Kathey Marsden added a comment - At the site using a user privileged to run command line programs, the test programs ran and created the lock file with 4 bytes as expected. They are not able to run the programs with the problematic id with the scheduled task so I am going to work to make a debug 10.2 build to try to help diagnose the problem.
        Hide
        Kathey Marsden added a comment -

        This is the debug patch I sent to the user to help diagnose the problem. It prints a javacore on boot and adds lots of debug printout in DirFile4.getExclusiveLock and also throws an assertion if we for some reason get into DirFile.getExclusiveLock() instead of Dirfile4. It is not for commit!

        Show
        Kathey Marsden added a comment - This is the debug patch I sent to the user to help diagnose the problem. It prints a javacore on boot and adds lots of debug printout in DirFile4.getExclusiveLock and also throws an assertion if we for some reason get into DirFile.getExclusiveLock() instead of Dirfile4. It is not for commit!
        Kathey Marsden made changes -
        Attachment derby-4917_10_2_debug_diff.txt [ 12468678 ]
        Hide
        Mike Matrigali added a comment -

        triaged for 10.8.

        Discussed with kathy who reported this issue and has worked with only site that has seen this issue. Her suggestions:
        There are really two issues here. 1) Why might the lock be created at zero length. This one we don't understand so can't do anything about it.
        2) If there exists a zero length file for whatever reason (e.g.) was created with old JDK, we always produce this warning and don't do proper dual boot checking.
        So I think we should fix the second part of this issue, by removing DirFile and just having DirFile4 now that JDK 1.3.1 is no longer supported.

        Show
        Mike Matrigali added a comment - triaged for 10.8. Discussed with kathy who reported this issue and has worked with only site that has seen this issue. Her suggestions: There are really two issues here. 1) Why might the lock be created at zero length. This one we don't understand so can't do anything about it. 2) If there exists a zero length file for whatever reason (e.g.) was created with old JDK, we always produce this warning and don't do proper dual boot checking. So I think we should fix the second part of this issue, by removing DirFile and just having DirFile4 now that JDK 1.3.1 is no longer supported.
        Mike Matrigali made changes -
        Labels derby_triage10_8
        Urgency Normal
        Hide
        Kathey Marsden added a comment -

        Unchecking HVF. I think the cases at this point where there is a residual zero byte db.lck from JDK 1.3 are rare. It would be great to get rid of DirFile as a refactoring and code simplification effort but as a HVF bug this issue probably no longer qualifies.

        Show
        Kathey Marsden added a comment - Unchecking HVF. I think the cases at this point where there is a residual zero byte db.lck from JDK 1.3 are rare. It would be great to get rid of DirFile as a refactoring and code simplification effort but as a HVF bug this issue probably no longer qualifies.
        Kathey Marsden made changes -
        Issue & fix info High Value Fix,Workaround attached [ 10422, 10427 ] Workaround attached [ 10427 ]
        Gavin made changes -
        Workflow jira [ 12527763 ] Default workflow, editable Closed status [ 12802006 ]
        Hide
        Myrna van Lunteren added a comment -

        Rather than removing DirFile, with r1492111 for DERBY-6213, DirFile4 was removed. DirFile was modified at that time.
        Does that qualify for a fix for this issue? Or did we specifically 'keep' the problem?

        Show
        Myrna van Lunteren added a comment - Rather than removing DirFile, with r1492111 for DERBY-6213 , DirFile4 was removed. DirFile was modified at that time. Does that qualify for a fix for this issue? Or did we specifically 'keep' the problem?
        Hide
        Knut Anders Hatlen added a comment -

        I don't think r1492111 changed the behaviour (intentionally, anyway) so the problem is most likely still there.

        DirFile.getExclusiveFileLock() now has this comment (in code copied from DirFile4):

                /*
                ** There can be  a scenario where there is some other JVM that is before jkdk1.4
                ** had booted the system and jdk1.4 trying to boot it, in this case we will get the
                ** Exclusive Lock even though some other JVM has already booted the database. But
                ** the lock is not a reliable one , so we should  still throw the warning.
                ** The Way we identify this case is if "dbex.lck" file size  is differen
                ** for pre jdk1.4 jvms and jdk1.4 or above.
                ** Zero size "dbex.lck" file  is created by a jvm i.e before jdk1.4 and
                ** File created by jdk1.4 or above writes EXCLUSIVE_FILE_LOCK value into the file.
                ** If we are unable to acquire the lock means other JVM that
                ** currently booted the system is also JDK1.4 or above;
                ** In this case we could confidently throw a exception instead of
                ** of a warning.
                **/
        
        Show
        Knut Anders Hatlen added a comment - I don't think r1492111 changed the behaviour (intentionally, anyway) so the problem is most likely still there. DirFile.getExclusiveFileLock() now has this comment (in code copied from DirFile4): /* ** There can be a scenario where there is some other JVM that is before jkdk1.4 ** had booted the system and jdk1.4 trying to boot it, in this case we will get the ** Exclusive Lock even though some other JVM has already booted the database. But ** the lock is not a reliable one , so we should still throw the warning. ** The Way we identify this case is if "dbex.lck" file size is differen ** for pre jdk1.4 jvms and jdk1.4 or above. ** Zero size "dbex.lck" file is created by a jvm i.e before jdk1.4 and ** File created by jdk1.4 or above writes EXCLUSIVE_FILE_LOCK value into the file. ** If we are unable to acquire the lock means other JVM that ** currently booted the system is also JDK1.4 or above; ** In this case we could confidently throw a exception instead of ** of a warning. **/

          People

          • Assignee:
            Unassigned
            Reporter:
            Kathey Marsden
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development