Derby
  1. Derby
  2. DERBY-700

Derby does not prevent dual boot of database from different classloaders on Linux and Mac OS X

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 10.1.3.3, 10.2.2.1, 10.3.3.1, 10.4.2.1, 10.5.3.1, 10.6.1.0
    • Component/s: Store
    • Labels:
      None
    • Environment:
    • Urgency:
      Urgent
    • Issue & fix info:
      High Value Fix
    • Bug behavior facts:
      Data corruption

      Description

      Derby does not prevent dual boot from two different classloaders on Linux and Mac OS X.

      To reproduce run the program DualBootRepro with no derby jars in your classpath. The program assumes derby.jar is in 10.1.2.1/derby.jar, you can change the location by changing the DERBY_LIB_DIR variable.

      On Linux the output is:

      $java -cp . DualBootRepro
      Loading derby from file:10.1.2.1/derby.jar
      10.1.2.1/derby.jar
      Booted database in loader java.net.URLClassLoader@8ed465
      FAIL: Booted database in 2nd loader java.net.URLClassLoader@dc6a77

      On Windows I get the expected output.
      $ java -cp . DualBootRepro
      Loading derby from file:10.1.2.1/derby.jar
      10.1.2.1/derby.jar
      Booted database in loader java.net.URLClassLoader@1ac04e8
      PASS: Expected exception for dualboot:Another instance of Derby may have already booted the database D:\marsden\repro\dualboot\mydb.

      1. releaseNote.html
        3 kB
        Kathey Marsden
      2. releaseNote.html
        3 kB
        Rick Hillegas
      3. releaseNote.html
        4 kB
        Kathey Marsden
      4. releaseNote.html
        4 kB
        Myrna van Lunteren
      5. DualBootRepro2.zip
        2 kB
        Kathey Marsden
      6. DualBootRepro.java
        2 kB
        Kathey Marsden
      7. DualBootRepro.java
        6 kB
        Rick Hillegas
      8. DualBootRepro_mutltithreaded.tar.bz2
        2 kB
        V.Narayanan
      9. derby-700-02-aa-testCleanup.diff
        8 kB
        Rick Hillegas
      10. derby-700-01-ab-catchOverlappingFileLockException.diff
        16 kB
        Rick Hillegas
      11. derby-700-01-aa-catchOverlappingFileLockException.diff
        2 kB
        Rick Hillegas
      12. DERBY-700.stat
        0.3 kB
        V.Narayanan
      13. DERBY-700.diff
        5 kB
        V.Narayanan
      14. derby-700_with_NPE_fix_stat.txt
        1.0 kB
        Kathey Marsden
      15. derby-700_with_NPE_fix_diff.txt
        38 kB
        Kathey Marsden
      16. DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.stat
        0.3 kB
        V.Narayanan
      17. DERBY-700_v1_use_to_run_DualBootrepro_multithreaded.diff
        6 kB
        V.Narayanan
      18. derby-700_stat.txt
        0.9 kB
        Kathey Marsden
      19. derby700_singleproperty_v1.stat
        0.7 kB
        Suresh Thalamati
      20. derby700_singleproperty_v1.diff
        34 kB
        Suresh Thalamati
      21. derby-700_diff.txt
        37 kB
        Kathey Marsden
      22. derby-700_06_19_2007
        71 kB
        Kathey Marsden
      23. derby-700_06_07_07_stat.txt
        0.9 kB
        Kathey Marsden
      24. derby-700_06_07_07_diff.txt
        76 kB
        Kathey Marsden
      25. derby.log
        12 kB
        Kathey Marsden

        Issue Links

          Activity

          Hide
          Kathey Marsden added a comment -

          DualBootRepro Program to reproduce dual boot from two classloaders in Linux.

          Change DERBY_LIB_DIR to derby directory and run without derby jars in our classpath.
          java -cp . DualBootRepro

          Show
          Kathey Marsden added a comment - DualBootRepro Program to reproduce dual boot from two classloaders in Linux. Change DERBY_LIB_DIR to derby directory and run without derby jars in our classpath. java -cp . DualBootRepro
          Hide
          Kathey Marsden added a comment -

          Sysinfo output.

          java -cp ./10.1.2.1/derby.jar org.apache.derby.tools.sysinfo
          ------------------ Java Information ------------------
          Java Version: 1.4.2_08
          Java Vendor: Sun Microsystems Inc.
          Java home: /usr/lib/SunJava2-1.4.2/jre
          Java classpath: ./10.1.2.1/derby.jar
          OS name: Linux
          OS architecture: i386
          OS version: 2.6.5-7.201-smp
          ...
          java.specification.name: Java Platform API Specification
          java.specification.version: 1.4
          --------- Derby Information --------
          JRE - JDBC: J2SE 1.4.2 - JDBC 3.0
          [/.../derby.jar] 10.1.2.1 - (330608)
          ------------------------------------------------------
          ----------------- Locale Information -----------------
          ------------------------------------------------------

          Show
          Kathey Marsden added a comment - Sysinfo output. java -cp ./10.1.2.1/derby.jar org.apache.derby.tools.sysinfo ------------------ Java Information ------------------ Java Version: 1.4.2_08 Java Vendor: Sun Microsystems Inc. Java home: /usr/lib/SunJava2-1.4.2/jre Java classpath: ./10.1.2.1/derby.jar OS name: Linux OS architecture: i386 OS version: 2.6.5-7.201-smp ... java.specification.name: Java Platform API Specification java.specification.version: 1.4 --------- Derby Information -------- JRE - JDBC: J2SE 1.4.2 - JDBC 3.0 [/.../derby.jar] 10.1.2.1 - (330608) ------------------------------------------------------ ----------------- Locale Information ----------------- ------------------------------------------------------
          Hide
          Suresh Thalamati added a comment -

          My guess is , currently Derby seems to catch the IO exception and turn it into a readonly db , when file lock fails with OverlappingFileLockException. By correctly catching java.nio.channels.OverlappingFileLockException, we might be able to prevent multi boots with in the same jvm using different class loaders.

          Show
          Suresh Thalamati added a comment - My guess is , currently Derby seems to catch the IO exception and turn it into a readonly db , when file lock fails with OverlappingFileLockException. By correctly catching java.nio.channels.OverlappingFileLockException, we might be able to prevent multi boots with in the same jvm using different class loaders.
          Hide
          Suresh Thalamati added a comment -

          Hi Kathy ,

          I am attempting to fix this bug, just realized I may not be able to add the repro in jira into the functional tests., repro is expecting derby classes should not be in there in the classpath, I was wondering by chance did u write a custom class loader test for Derby ? I could not find one in the functional tests.

          Thanks
          -suresh

          Show
          Suresh Thalamati added a comment - Hi Kathy , I am attempting to fix this bug, just realized I may not be able to add the repro in jira into the functional tests., repro is expecting derby classes should not be in there in the classpath, I was wondering by chance did u write a custom class loader test for Derby ? I could not find one in the functional tests. Thanks -suresh
          Hide
          Kathey Marsden added a comment -

          Here is an updated repro with a classloader that should not require the classpath remove other derby jars

          Show
          Kathey Marsden added a comment - Here is an updated repro with a classloader that should not require the classpath remove other derby jars
          Hide
          Suresh Thalamati added a comment -

          After doing bit of testing realized checking for OverlappingFileLockException does not work. This exception is reliable only if the same file channel instance is used., With multiple class loaders
          I don't know how it is possible to have a sigle file channel object.

          On Linux , one can acquire the lock on the same region using diferent file channels on the same file.

          Realated info from the java docs:
          java.nio.channel.FileLock
          ,On some systems, closing a channel releases all locks held by the Java virtual machine on the underlying file regardless of whether the locks were acquired via that channel or via another channel open on the same file. It is strongly recommended that, within a program, a unique channel be used to acquire all locks on any given file.

          Show
          Suresh Thalamati added a comment - After doing bit of testing realized checking for OverlappingFileLockException does not work. This exception is reliable only if the same file channel instance is used., With multiple class loaders I don't know how it is possible to have a sigle file channel object. On Linux , one can acquire the lock on the same region using diferent file channels on the same file. Realated info from the java docs: java.nio.channel.FileLock ,On some systems, closing a channel releases all locks held by the Java virtual machine on the underlying file regardless of whether the locks were acquired via that channel or via another channel open on the same file. It is strongly recommended that, within a program, a unique channel be used to acquire all locks on any given file.
          Hide
          Mike Matrigali added a comment -

          Database corruption can result anytime 2 different systems boot the same Derby database. DB corruption can happen
          even if the booting systems perform no updates if the there is recovery to do during boot, in that case 2 separate recovery systems will be opertating in an uncoordinated manner on the same log file, and applying uncoordinated changes to the data files. Without multiple class loaders or with multiple JVM's derby uses file locking to prevent this.
          It looks like file locking from the same jvm and 2 class loaders does not work, does anyone have any idea how to
          implement a locking scheme that will work from the same jvm , with 2 class loaders.

          Show
          Mike Matrigali added a comment - Database corruption can result anytime 2 different systems boot the same Derby database. DB corruption can happen even if the booting systems perform no updates if the there is recovery to do during boot, in that case 2 separate recovery systems will be opertating in an uncoordinated manner on the same log file, and applying uncoordinated changes to the data files. Without multiple class loaders or with multiple JVM's derby uses file locking to prevent this. It looks like file locking from the same jvm and 2 class loaders does not work, does anyone have any idea how to implement a locking scheme that will work from the same jvm , with 2 class loaders.
          Hide
          Kathey Marsden added a comment -

          Marking this issue Critical as it is a corruption issue that has in the past and surely will in the future affect users and cause unrecoverable corruption in production environments. The corruption is triggered by a usage error, but unrecoverable database corruption is a high price to pay for deciding to connect with IJ while your APP Server, Eclipse or other environment has loaded Derby with a classloader.

          Also marking this issue as 10.2 as a high value fix. I think there may be technical issues that make it difficult to address in the 10.2 time frame, so if someone with expertise in the area thinks it is not possible to address in that timeframe, please move it to an appropriate release, maybe 10.2.2. It is not a regression so as serious as it is I don't see it as an absolute showstopper for the release.

          Show
          Kathey Marsden added a comment - Marking this issue Critical as it is a corruption issue that has in the past and surely will in the future affect users and cause unrecoverable corruption in production environments. The corruption is triggered by a usage error, but unrecoverable database corruption is a high price to pay for deciding to connect with IJ while your APP Server, Eclipse or other environment has loaded Derby with a classloader. Also marking this issue as 10.2 as a high value fix. I think there may be technical issues that make it difficult to address in the 10.2 time frame, so if someone with expertise in the area thinks it is not possible to address in that timeframe, please move it to an appropriate release, maybe 10.2.2. It is not a regression so as serious as it is I don't see it as an absolute showstopper for the release.
          Hide
          Rick Hillegas added a comment -

          Bumping urgency.

          Show
          Rick Hillegas added a comment - Bumping urgency.
          Hide
          V.Narayanan added a comment -

          Hi,

          We could solve this problem by setting a System property before booting a database and checking for the property during subsequent boots. When the database is shutdown we set the property to false. For example when we boot a database named mydb then we set the property derby.dblock.mydb = true. Now during subsequent boots we could check for this system variable and if it is set to true throw an exception. During the shutdown of the database we set this variable to false. I tried an attempt along this line in the attached patch.

          I HAVE NOT run the patch with security manager enabled. The sample repro attached with this issue passes with this fix.

          Pls note that the patch is not for a commit but is just to represent what I have in mind as a solution, in the form of code.

          thanx
          Narayanan

          Show
          V.Narayanan added a comment - Hi, We could solve this problem by setting a System property before booting a database and checking for the property during subsequent boots. When the database is shutdown we set the property to false. For example when we boot a database named mydb then we set the property derby.dblock.mydb = true. Now during subsequent boots we could check for this system variable and if it is set to true throw an exception. During the shutdown of the database we set this variable to false. I tried an attempt along this line in the attached patch. I HAVE NOT run the patch with security manager enabled. The sample repro attached with this issue passes with this fix. Pls note that the patch is not for a commit but is just to represent what I have in mind as a solution, in the form of code. thanx Narayanan
          Hide
          Daniel John Debrunner added a comment -

          Using system properties will require that every user running with the security manager grant a new permission in their policy file to allow setting these system properties. I thought an earlier discussion on the list had recommended not to use system properties this way.

          One issue with system properties is that how do atomically set and get the property, I think that is needed to perform locking?
          In your current patch, there is a window between where you get and set the property where another thread in a anoter class loader could
          come in and "lock" the database, resulting in two active derby instances connecting to the same database.

          I also don't understand why on getting the system property you are catching NullPointerException and IllegalArgumentException, how would these
          be thrown by System.getProperty()?

          Show
          Daniel John Debrunner added a comment - Using system properties will require that every user running with the security manager grant a new permission in their policy file to allow setting these system properties. I thought an earlier discussion on the list had recommended not to use system properties this way. One issue with system properties is that how do atomically set and get the property, I think that is needed to perform locking? In your current patch, there is a window between where you get and set the property where another thread in a anoter class loader could come in and "lock" the database, resulting in two active derby instances connecting to the same database. I also don't understand why on getting the system property you are catching NullPointerException and IllegalArgumentException, how would these be thrown by System.getProperty()?
          Hide
          Daniel John Debrunner added a comment -

          LUCENE-305 and LUCENE-635 may be worth looking at, lucene is re-factoring their locking (similar requirements to Derby) and maybe they have a solution for the intra-jvm lock. Otherwise has anyone done a google search on the issue?

          Show
          Daniel John Debrunner added a comment - LUCENE-305 and LUCENE-635 may be worth looking at, lucene is re-factoring their locking (similar requirements to Derby) and maybe they have a solution for the intra-jvm lock. Otherwise has anyone done a google search on the issue?
          Hide
          V.Narayanan added a comment -

          Hi,

          THESE ATTACHMENTS ARE JUST AN ATTEMPT IN THE SIMULATION OF THE COMMENTS ABOVE. THEY ARE NOT FOR A COMMIT. PLS FOLLOW BELOW COMMENTS FOR DETAILS.

          Thanx for going through the patch and giving comments

          >Using system properties will require that every user running with the security
          manager grant a new permission in their policy file to allow setting these system
          properties.

          I Agree, My apologies for not having gone through the earlier thread on System
          Properties

          > One issue with system properties is that how do atomically set and get the
          property, I think that is needed to perform locking? In your current patch, there is a
          window between where you get and set the property where another thread in a
          anoter class loader could come in and "lock" the database, resulting in two active
          derby instances connecting to the same database.

          Again a very valid point. I wanted to try to simulate this. So converted the whole
          repro into a crude multi -threaded application. But found found it difficult to
          simulate it on my machine since one of the threads always managed to set this
          property before the other. So modified my fix by creating a sort of sync point after
          the getProperties. Just to show that two seperate threads could actually reach
          there.

          This could actually occur without this modification

          1) When one of the threads gets preempted after doing a getProperties and then
          the other does a getProperties. And then they proceed to completion.

          Also this might just be one of the cases when this could have occurred.

          > I also don't understand why on getting the system property you are catching
          NullPointerException and IllegalArgumentException, how would these be thrown
          by System.getProperty()?

          NullPointerException - System.getProperty(null);

          IllegalArgumentException - System.getProperty("");

          I found it interesting to simulate the condition where the above fix might fail.
          Thought I could share the same with the community. So attaching the relevant files
          I used for generating this condition.

          thanx once again for the patient comments
          Narayanan

          Show
          V.Narayanan added a comment - Hi, THESE ATTACHMENTS ARE JUST AN ATTEMPT IN THE SIMULATION OF THE COMMENTS ABOVE. THEY ARE NOT FOR A COMMIT. PLS FOLLOW BELOW COMMENTS FOR DETAILS. Thanx for going through the patch and giving comments >Using system properties will require that every user running with the security manager grant a new permission in their policy file to allow setting these system properties. I Agree, My apologies for not having gone through the earlier thread on System Properties > One issue with system properties is that how do atomically set and get the property, I think that is needed to perform locking? In your current patch, there is a window between where you get and set the property where another thread in a anoter class loader could come in and "lock" the database, resulting in two active derby instances connecting to the same database. Again a very valid point. I wanted to try to simulate this. So converted the whole repro into a crude multi -threaded application. But found found it difficult to simulate it on my machine since one of the threads always managed to set this property before the other. So modified my fix by creating a sort of sync point after the getProperties. Just to show that two seperate threads could actually reach there. This could actually occur without this modification 1) When one of the threads gets preempted after doing a getProperties and then the other does a getProperties. And then they proceed to completion. Also this might just be one of the cases when this could have occurred. > I also don't understand why on getting the system property you are catching NullPointerException and IllegalArgumentException, how would these be thrown by System.getProperty()? NullPointerException - System.getProperty(null); IllegalArgumentException - System.getProperty(""); I found it interesting to simulate the condition where the above fix might fail. Thought I could share the same with the community. So attaching the relevant files I used for generating this condition. thanx once again for the patient comments Narayanan
          Hide
          Rick Hillegas added a comment -

          Where can Derby store VM-global state? It has been pointed out that System properties are an awkward kind of VM-global state for the following reasons:

          1) SecurityManager problems
          2) Race conditions

          What other VM-global state can Derby access? One Derby object that's global across the VM is the Derby driver registered with the DriverManager.

          1) Could we use the registered Derby driver to anchor VM-global state?
          2) Are there other VM-global Derby objects which we could use?

          Show
          Rick Hillegas added a comment - Where can Derby store VM-global state? It has been pointed out that System properties are an awkward kind of VM-global state for the following reasons: 1) SecurityManager problems 2) Race conditions What other VM-global state can Derby access? One Derby object that's global across the VM is the Derby driver registered with the DriverManager. 1) Could we use the registered Derby driver to anchor VM-global state? 2) Are there other VM-global Derby objects which we could use?
          Hide
          Rick Hillegas added a comment -

          It appears that fixing this bug will require us to agree on some mechanism for caching VM-global Derby state. This seems to be an architectural decision which requires careful thought and experiment. I think we should defer this to a future release. I am moving this to 10.3 because thatt's the next release available in the dropdown. I agree with Kathey that this is a good candidate for a bug fix release in the 10.2 lineage.

          Show
          Rick Hillegas added a comment - It appears that fixing this bug will require us to agree on some mechanism for caching VM-global Derby state. This seems to be an architectural decision which requires careful thought and experiment. I think we should defer this to a future release. I am moving this to 10.3 because thatt's the next release available in the dropdown. I agree with Kathey that this is a good candidate for a bug fix release in the 10.2 lineage.
          Hide
          Rick Hillegas added a comment -

          Thanks for creating the new 10.2.2.0 option, Andrew! I've reassigned this issue to that release.

          Show
          Rick Hillegas added a comment - Thanks for creating the new 10.2.2.0 option, Andrew! I've reassigned this issue to that release.
          Hide
          V.Narayanan added a comment -

          Hi,
          I am not working actively on this issue. Hence unassigning myself from this issue and moving on to other issues I am working on.

          thanx
          Narayanan

          Show
          V.Narayanan added a comment - Hi, I am not working actively on this issue. Hence unassigning myself from this issue and moving on to other issues I am working on. thanx Narayanan
          Hide
          Rick Hillegas added a comment -

          Move to 10.2.3.0.

          Show
          Rick Hillegas added a comment - Move to 10.2.3.0.
          Hide
          Andrew McIntyre added a comment -

          Unsetting Fix Version for unassigned issues.

          Show
          Andrew McIntyre added a comment - Unsetting Fix Version for unassigned issues.
          Hide
          Suresh Thalamati added a comment -

          I took a quick looks at fixes for LUCENE-305 and LUCENE-635 , which are fixed now. I did not see any code that is attempting to solve locking within a jvm across class loaders boundaries. May be I am missing something !

          Show
          Suresh Thalamati added a comment - I took a quick looks at fixes for LUCENE-305 and LUCENE-635 , which are fixed now. I did not see any code that is attempting to solve locking within a jvm across class loaders boundaries. May be I am missing something !
          Hide
          Suresh Thalamati added a comment -

          There does not seem to be any way to get process id info that can be used to fix this problem. Existing JVM enhancement requests filed to get process id info:

          http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4250622
          http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=7c75c6fc92c662c739fb11f620a4:YfiG?bug_id=4244896

          It was interesting to reade these reports, other developers are finding it hard to generate unique id across jvms!

          Show
          Suresh Thalamati added a comment - There does not seem to be any way to get process id info that can be used to fix this problem. Existing JVM enhancement requests filed to get process id info: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4250622 http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=7c75c6fc92c662c739fb11f620a4:YfiG?bug_id=4244896 It was interesting to reade these reports, other developers are finding it hard to generate unique id across jvms!
          Hide
          Mike Matrigali added a comment -

          Here is a description of a possible solution - any feedback is appreciated:

          I would like to see some solution to DERBY-700 go into the 10.3 release, as it is too easy currently for users to access their same database from 2 different classloaders in the same JVM and corrupt their database.

          Let me try to explain the problem again. Derby protects multiple thread
          access to databases using in memory locking. Any situation where 2 instances of Derby can access the same on disk database but not coordinate through the same in-memory locking scheme creates a situation
          that can lead to log/data corruption.

          Derby coordinates access to an ondisk database using a 2 level per db
          locking scheme. The following is high level and probably misses some
          detail.

          1) first it obtains what I refer to as an OS file lock, this is not a java defined mechanism. It depends on an OS specific behavior that it
          is not possible to delete a file on some OS's if the file is still open
          (the most common OS where this is true are I believe all window's
          OS's 98, NT, 2000, xp, ...).
          A) If lock file does not exist we open it and leave it open --> lock granted
          B) If lock file does exist we try to delete it, if we can then
          we go back to A. If we can't we assume file is locked and
          give up.

          2) The second level is that we use java base file locking. This only
          became available in 1.4.2 and later jvm's. This does standard locking and automatically takes care of releasing lock if JVM goes
          away.

          We kept both steps, even though it seems that step 2 is sufficient to
          provide backward compatible protection. So anyone running on a JVM
          prior to 1.4.2 on a windows platform would still be protected. The
          pre-derby code base also had to worry about protecting access against
          versions of the code that did not have step 2 implemented.

          The problem is that in cases where derby is run in 2 class loaders in
          the same jvm the step 2 file locking does not work. It is meant only
          to protect threads in different JVMS, so it provides no help in preventing access from 2 different class loaders in the same JVM. It
          turns out that step 1 on windows system solves the locking problem
          on windows systems for multiple class loaders also.

          So a solution should provide 2 things:
          1) prevent access from another classloader in the same JVM
          2) not allow false positives. So for instance a standard "lock file" could be used on unix systems, creating it and when one boots check
          for existence of the lock file and give up if it exists. The problem
          is that it is very easy to cause a JVM to exit without properly cleaning
          the lock file and thus one would get into situation where user may have
          to clean lock files by hand.

          Here is my current proposal, but I really don't like it – I am hoping someone out there can come up with something better.

          o keep the step 1 and step 2 locking as described above. It solves cross JVM locking completely.
          o create a step 3 locking step, the only purpose is to recognize situations step 1 and 2 can't.
          o use simple file system lock file to implement step 3 locking:
          A) on db boot if no lock file exists, create it and put a timestamp in the file. This db boot is responsible for updating that timestamp
          every N seconds, for this we need to come up with a guaranteed executing background thread - there are known problems with current background thread, and long checkpoints for instance. On shutdown of db we delete lock file.
          B) on db boot if lock file exists we open it, and get the timestamp and compare it with the current timestamp. If the difference is greater
          than N we assume the lock file has been left around incorrectly and we
          delete the file and go to A (or just open it and update it with our
          current timestamp). It probably is worth logging this event in derby.log.

          One nice thing about the above solution is that I think it also solves
          our problem with muliple machines accessing the same disk (as long as
          their timestamps are the same or close). I think we can pick a large N
          as this should be an error case (ie. the purpose of N is to catch the
          case where a classloader went away without allowing us to clean up - I don't know classloader stuff to know how likely this is), but it is probably worth making it
          configurable so we could adjust in the field if necessary.

          I really don't like forcing Derby to run a job every N seconds. It could be hard to explain to users why derby is doing work every N
          seconds even when nothing else is happening. I worked on a different
          product a long time ago that required us to maintain our own timer for use in scheduling waits and users noticed and complained such that we
          had to add a backoff mechanism based on the amount of work being done
          just so the process would not show up at a steady 1% (or whatever) on
          there process status monitors. Now that timer was a lot shorter than
          seconds so it may not be an issue.

          Some extensions I considered:
          1) come up with a unique ID that is specific to a JVM and can be queried by any thread in any classloader in the JVM. If we had that then we could write that value into the step 3 lock file and we know that if we
          opened the file and saw a different ID, the lock file was invalid. If
          we did this then we narrow the chance of a false positive but we eliminate the chance to catch cross machine access.

          2) Have the opener db log some unique id specific to it in addition to the timestamp, maybe this is just the first timestamp of the open. Then it could at least tell the next time if another
          class loader had incorrectly started, and throw/log some error so we
          could catch this problem.

          3) maybe the value of N should be logged in the file, I think it has to
          be if we allow it to be configurable.

          Show
          Mike Matrigali added a comment - Here is a description of a possible solution - any feedback is appreciated: I would like to see some solution to DERBY-700 go into the 10.3 release, as it is too easy currently for users to access their same database from 2 different classloaders in the same JVM and corrupt their database. Let me try to explain the problem again. Derby protects multiple thread access to databases using in memory locking. Any situation where 2 instances of Derby can access the same on disk database but not coordinate through the same in-memory locking scheme creates a situation that can lead to log/data corruption. Derby coordinates access to an ondisk database using a 2 level per db locking scheme. The following is high level and probably misses some detail. 1) first it obtains what I refer to as an OS file lock, this is not a java defined mechanism. It depends on an OS specific behavior that it is not possible to delete a file on some OS's if the file is still open (the most common OS where this is true are I believe all window's OS's 98, NT, 2000, xp, ...). A) If lock file does not exist we open it and leave it open --> lock granted B) If lock file does exist we try to delete it, if we can then we go back to A. If we can't we assume file is locked and give up. 2) The second level is that we use java base file locking. This only became available in 1.4.2 and later jvm's. This does standard locking and automatically takes care of releasing lock if JVM goes away. We kept both steps, even though it seems that step 2 is sufficient to provide backward compatible protection. So anyone running on a JVM prior to 1.4.2 on a windows platform would still be protected. The pre-derby code base also had to worry about protecting access against versions of the code that did not have step 2 implemented. The problem is that in cases where derby is run in 2 class loaders in the same jvm the step 2 file locking does not work. It is meant only to protect threads in different JVMS, so it provides no help in preventing access from 2 different class loaders in the same JVM. It turns out that step 1 on windows system solves the locking problem on windows systems for multiple class loaders also. So a solution should provide 2 things: 1) prevent access from another classloader in the same JVM 2) not allow false positives. So for instance a standard "lock file" could be used on unix systems, creating it and when one boots check for existence of the lock file and give up if it exists. The problem is that it is very easy to cause a JVM to exit without properly cleaning the lock file and thus one would get into situation where user may have to clean lock files by hand. Here is my current proposal, but I really don't like it – I am hoping someone out there can come up with something better. o keep the step 1 and step 2 locking as described above. It solves cross JVM locking completely. o create a step 3 locking step, the only purpose is to recognize situations step 1 and 2 can't. o use simple file system lock file to implement step 3 locking: A) on db boot if no lock file exists, create it and put a timestamp in the file. This db boot is responsible for updating that timestamp every N seconds, for this we need to come up with a guaranteed executing background thread - there are known problems with current background thread, and long checkpoints for instance. On shutdown of db we delete lock file. B) on db boot if lock file exists we open it, and get the timestamp and compare it with the current timestamp. If the difference is greater than N we assume the lock file has been left around incorrectly and we delete the file and go to A (or just open it and update it with our current timestamp). It probably is worth logging this event in derby.log. One nice thing about the above solution is that I think it also solves our problem with muliple machines accessing the same disk (as long as their timestamps are the same or close). I think we can pick a large N as this should be an error case (ie. the purpose of N is to catch the case where a classloader went away without allowing us to clean up - I don't know classloader stuff to know how likely this is), but it is probably worth making it configurable so we could adjust in the field if necessary. I really don't like forcing Derby to run a job every N seconds. It could be hard to explain to users why derby is doing work every N seconds even when nothing else is happening. I worked on a different product a long time ago that required us to maintain our own timer for use in scheduling waits and users noticed and complained such that we had to add a backoff mechanism based on the amount of work being done just so the process would not show up at a steady 1% (or whatever) on there process status monitors. Now that timer was a lot shorter than seconds so it may not be an issue. Some extensions I considered: 1) come up with a unique ID that is specific to a JVM and can be queried by any thread in any classloader in the JVM. If we had that then we could write that value into the step 3 lock file and we know that if we opened the file and saw a different ID, the lock file was invalid. If we did this then we narrow the chance of a false positive but we eliminate the chance to catch cross machine access. 2) Have the opener db log some unique id specific to it in addition to the timestamp, maybe this is just the first timestamp of the open. Then it could at least tell the next time if another class loader had incorrectly started, and throw/log some error so we could catch this problem. 3) maybe the value of N should be logged in the file, I think it has to be if we allow it to be configurable.
          Hide
          Suresh Thalamati added a comment -

          While reading comments for this issue yet again, noticed Rick mentioned long
          time ago , it might be possible to make Derby jdbc driver hold a state that
          is global to jvm, not specific to a class loader. Is that how it really works
          even if the user loads the driver using class loaders ?

          Basically , is it possible to make org.apache.derby.jdbc.EmbeddedDriver.java
          statically initialize an JVMID(a UUID) that can be accessed from any class loader ?

          Show
          Suresh Thalamati added a comment - While reading comments for this issue yet again, noticed Rick mentioned long time ago , it might be possible to make Derby jdbc driver hold a state that is global to jvm, not specific to a class loader. Is that how it really works even if the user loads the driver using class loaders ? Basically , is it possible to make org.apache.derby.jdbc.EmbeddedDriver.java statically initialize an JVMID(a UUID) that can be accessed from any class loader ?
          Hide
          Mike Matrigali added a comment -

          With respect to a global state in the jdbc driver when 2 class loaders are involved, how would that work with 2 different class loaders trying to run 2 different versions of derby - say 10.3 and 10.4.

          Show
          Mike Matrigali added a comment - With respect to a global state in the jdbc driver when 2 class loaders are involved, how would that work with 2 different class loaders trying to run 2 different versions of derby - say 10.3 and 10.4.
          Hide
          Daniel John Debrunner added a comment -

          if you alluding to this comment:

          rick> One Derby object that's global across the VM is the Derby driver registered with the DriverManager.

          then I think that is false. Derby will have multiple drivers registered, one per classloader that booted derby.
          (see comments for DERBY-1228 or java.sql.DriverManager documentation)

          Show
          Daniel John Debrunner added a comment - if you alluding to this comment: rick> One Derby object that's global across the VM is the Derby driver registered with the DriverManager. then I think that is false. Derby will have multiple drivers registered, one per classloader that booted derby. (see comments for DERBY-1228 or java.sql.DriverManager documentation)
          Hide
          Suresh Thalamati added a comment -

          Thanks for confirming , Dan. I was referring to the commnet you noted. Looks like there is NO way to maintain an in-memory state that is global to JVM across class loader , other than using a system property.

          Show
          Suresh Thalamati added a comment - Thanks for confirming , Dan. I was referring to the commnet you noted. Looks like there is NO way to maintain an in-memory state that is global to JVM across class loader , other than using a system property.
          Hide
          Suresh Thalamati added a comment -

          Thanks a lot for summarizing the problems and possible solutions
          for this issue, Mike. I think the timer base solution you mentioned
          might work, but I am not comfortable with a timer based solution. As you
          mentioned, users might complain about the background writes, and also
          I think configuring N to the right value to differentiate false
          negatives/positives boots going to be hard. It will depend on the load
          and the machine configuration (no of cpus ) ..etc.

          I was trying to find alternative solutions, without much success. Only
          solution I could come up with involves using a system property. I
          understand, earlier we discussed using the system properties and it was
          decided as not a such a good idea. But considering there are NO other
          better solutions found for this problem, so far. I was think having one
          property to maintain a JVMID may not be so bad, user just need to give
          security permission to set one property, i.e if what I
          describe below actually works!

          I would really appreciate any suggestions/feedback for this solution .

          My understanding is a solution to this problem need to solve primarily
          following three issues:

          1) Maintaining a state that a database is already booted, if the database
          if booted successfully.
          2) Change the state to NOT_BOOTED, if it is not booted any more because of a
          a) Shutdown of the database
          b) Class loader that booted the db is garbage collected.
          c) JVM exited.

          3) synchronization across class loaders.

          Pseudo code below that attempts to solve this problems by making the
          following Assumptions :

          1) It is ok to use ONE system property "derby.storage.jvmid" to identify
          a jvm instance id.
          2) It is ok to use interned strings to synchronize across class loader.
          3) It is ok to call getCanonicalPath(), i think this may require permission
          for "user.dir" property if it is not already required. Other solution
          may be to assign an ID string on create of the DB and user that for
          DB level synchronization.
          4) It is ok to rely on the class finalizer to cleanup db lock state,
          when the database is NOT any more because the loader that booted
          the database is garbage collected.

          /*
          Pseudo code to lock the DB to prevent multiple instance of a database running
          concurrently through class loaders in a single instance of jvm or
          multiple instance of jvm.

          Note: Following code class is in a separate class just to understand it
          as separate issue , this code should probably go into the
          dataFactory class where current db-locking is done.
          */
          Class DbLock {

          private static final String DERYB_JVM_ID = "derby.storage.jvmid";
          private String dbCannonicalPath; // canonical of the db being booted.
          private FileLock fileLock = null;
          private boolean dbLocked = false;

          DbLock (String dbCannonicalPath)

          { this.dbCannonicalPath = dbCannonicalPath; }

          /*

          • get a unique JVM ID
            */
            private getJvmId () {
            // synchronize across class loaders.
            synchronize(DERYB_JVM_ID) {

          jvmid = System.getProperty(DERYB_JVM_ID);
          // if jvm id is not already exist, generate one
          // and save it into the "derby.storage.jvmid" system
          // property.
          if (jvmid == null)

          { //generate a new UUID based on the time and IP ..etc. jvmid = generateJvmId() System.setProperty("derby.storage.jvmid"); }

          }
          }

          /*

          • Lock the db, so that other class loader or
          • another jvm won't be able to boot the same database.
            */
            public lock_db_onboot(String dbCannonicalPath) {

          // Get a file Lock on boot() ; // this already works
          fileLock = getFileLock("dbex.lck");
          if (lock == null)

          { // if we don't get lock means , some other jvm already // booted it throws ALREADY_BOOTED error. throw ALREADY_BOOTED; }

          else {

          // file lock can be acquired even if the database is already
          // booted by a different class loader. Check if another class
          // loader has booted the DB. This is done by checking the
          // JVMID written in the dbex.lck file. If the JVMID is same
          // as what is stored in the system property,
          // then database is already booted , throw the error.
          currentJvmId = getJvmId();
          synchronize(dbCannonicalPath) {
          onDisk_JVM_ID = readIdFromDisk() ; // read ID from the dbex.lck file.
          if (OnDisk_JVM_ID == current_jvm_id )
          throw ("DATABASE IS ALREADY BOOTED");
          else

          { dbLocked = true; writeId(currentJvmId); //update the dbex.lck file) . }

          }
          }
          }

          /*

          • Called on shutdown/garbage collection.
            */
            unlock_db() {
            if (dbLocked)
            Unknown macro: { Strinng Ondisk_jvm_id = "-1"; //jvm id should never have been a -1. synchronize(dbCannonicalPath) { writeIdToDisk(Ondisk_jvm_id); //update the dbex.lck file) . } releaseFileLock(fileLock); dbLocked = false; }

            }

          /*

          • if the db is not shutdown, this method should release
          • the db lock related resources during this class finalization.
            */
            protected void finalize() throws Throwable { unlock_db(); }

            }

          Show
          Suresh Thalamati added a comment - Thanks a lot for summarizing the problems and possible solutions for this issue, Mike. I think the timer base solution you mentioned might work, but I am not comfortable with a timer based solution. As you mentioned, users might complain about the background writes, and also I think configuring N to the right value to differentiate false negatives/positives boots going to be hard. It will depend on the load and the machine configuration (no of cpus ) ..etc. I was trying to find alternative solutions, without much success. Only solution I could come up with involves using a system property. I understand, earlier we discussed using the system properties and it was decided as not a such a good idea. But considering there are NO other better solutions found for this problem, so far. I was think having one property to maintain a JVMID may not be so bad, user just need to give security permission to set one property, i.e if what I describe below actually works! I would really appreciate any suggestions/feedback for this solution . My understanding is a solution to this problem need to solve primarily following three issues: 1) Maintaining a state that a database is already booted, if the database if booted successfully. 2) Change the state to NOT_BOOTED, if it is not booted any more because of a a) Shutdown of the database b) Class loader that booted the db is garbage collected. c) JVM exited. 3) synchronization across class loaders. Pseudo code below that attempts to solve this problems by making the following Assumptions : 1) It is ok to use ONE system property "derby.storage.jvmid" to identify a jvm instance id. 2) It is ok to use interned strings to synchronize across class loader. 3) It is ok to call getCanonicalPath(), i think this may require permission for "user.dir" property if it is not already required. Other solution may be to assign an ID string on create of the DB and user that for DB level synchronization. 4) It is ok to rely on the class finalizer to cleanup db lock state, when the database is NOT any more because the loader that booted the database is garbage collected. /* Pseudo code to lock the DB to prevent multiple instance of a database running concurrently through class loaders in a single instance of jvm or multiple instance of jvm. Note: Following code class is in a separate class just to understand it as separate issue , this code should probably go into the dataFactory class where current db-locking is done. */ Class DbLock { private static final String DERYB_JVM_ID = "derby.storage.jvmid"; private String dbCannonicalPath; // canonical of the db being booted. private FileLock fileLock = null; private boolean dbLocked = false; DbLock (String dbCannonicalPath) { this.dbCannonicalPath = dbCannonicalPath; } /* get a unique JVM ID */ private getJvmId () { // synchronize across class loaders. synchronize(DERYB_JVM_ID) { jvmid = System.getProperty(DERYB_JVM_ID); // if jvm id is not already exist, generate one // and save it into the "derby.storage.jvmid" system // property. if (jvmid == null) { //generate a new UUID based on the time and IP ..etc. jvmid = generateJvmId() System.setProperty("derby.storage.jvmid"); } } } /* Lock the db, so that other class loader or another jvm won't be able to boot the same database. */ public lock_db_onboot(String dbCannonicalPath) { // Get a file Lock on boot() ; // this already works fileLock = getFileLock("dbex.lck"); if (lock == null) { // if we don't get lock means , some other jvm already // booted it throws ALREADY_BOOTED error. throw ALREADY_BOOTED; } else { // file lock can be acquired even if the database is already // booted by a different class loader. Check if another class // loader has booted the DB. This is done by checking the // JVMID written in the dbex.lck file. If the JVMID is same // as what is stored in the system property, // then database is already booted , throw the error. currentJvmId = getJvmId(); synchronize(dbCannonicalPath) { onDisk_JVM_ID = readIdFromDisk() ; // read ID from the dbex.lck file. if (OnDisk_JVM_ID == current_jvm_id ) throw ("DATABASE IS ALREADY BOOTED"); else { dbLocked = true; writeId(currentJvmId); //update the dbex.lck file) . } } } } /* Called on shutdown/garbage collection. */ unlock_db() { if (dbLocked) Unknown macro: { Strinng Ondisk_jvm_id = "-1"; //jvm id should never have been a -1. synchronize(dbCannonicalPath) { writeIdToDisk(Ondisk_jvm_id); //update the dbex.lck file) . } releaseFileLock(fileLock); dbLocked = false; } } /* if the db is not shutdown, this method should release the db lock related resources during this class finalization. */ protected void finalize() throws Throwable { unlock_db(); } }
          Hide
          Suresh Thalamati added a comment -

          Test to parallely boot the database using multiple threads in different class loaders might fail
          with NPE if DERBY-2649 is not fixed. Tempory workaround is to check for null in DirFile.java:deleteAll().

          Show
          Suresh Thalamati added a comment - Test to parallely boot the database using multiple threads in different class loaders might fail with NPE if DERBY-2649 is not fixed. Tempory workaround is to check for null in DirFile.java:deleteAll().
          Hide
          Suresh Thalamati added a comment -

          Attached is a patch with partial implementation of the intra-jvm db
          lock mechanism to prevent users from running multiple instances of the same
          database using different class loaders in the same jvm. Existing
          solution already prevents users from running multiple instances across jvms.

          Although I have not assigned this issue to myself, have been working on this
          issue on/off for some time. But I don't have free cycles to work on this bug for
          the upcoming release. I am posting the patch with whatever work I have done
          sofar. If some one can enhance the attached patch and fix this issue,
          that will be great.

          Intra-jvm db lock is provided by using global derby specific jvm instance id.
          On a first boot of any derby database in a jvm, an UUID is generated and
          stored in a system property ("derby.storage.jvmInstanceID"). This id is written
          to the dbex.lck file when the database is booted. If the ID in the dbex.lck file
          and the current jvm id stored in the system property
          "derby.storage.jvmInstanceID" are same then the database is already booted, and
          an exception will be thrown and database will not be booted. On a database
          shutdown , an Invalid JVM id is written to the dbex.lck file, so that if the
          database booted again ID's will not be equal, which means database is not
          already booted, the database will be booted successfully. I am using the
          existing UUID factory in the derby to generate the derby jvm id.

          Synchronization is done by using the interned strings. Synchronization
          across the JVM is done on "derby.storage.jvmInstanceID" string. And
          Synchronization for a database is done on the database directory. This
          may need to be changed to a database name, because it may not be possible
          to get the canonical path always and it is necessary to synchronize on
          a string, that is unique to a database in a jvm.

          In my earlier proposed solution I mentioned about releasing the db
          locks using finalizer. After doing some testing , I realized there is
          no need to address unlocking the database on garbage collection,
          using the finalizer. My understanding is unless users shutdown
          the database rawStoreDaemon and antiGC thread will hold on to the
          resources and the classes will not be unloaded. So the only way users
          can boot an already booted database but not in use using another class
          loader instance in the same jvm is by doing a shutdown from the class
          loader that booted the database. If some thinks this is not
          true, please correct me.

          To do :

          1) cleanup error handling on IOExceptions and add new message for the
          intra-jvm db lock.

          2) Currently dataDirectory path string is is used for synchronization to
          prevent multiple boots of a database. This may need to be changed to
          the db name.

          3) Make the classLoaderBootTest.java run under security manager.

          4) Add test cases for booting different databases parallelly on different
          threads with different class loaders. May not be really required , because
          even booting a single database through different threads shoud test
          the same thing. But it may be better to add a test case, just to be safe!

          5) Run the test with large number of threads.

          6) Anything else I have forgotten !

          Thanks
          -suresh

          Show
          Suresh Thalamati added a comment - Attached is a patch with partial implementation of the intra-jvm db lock mechanism to prevent users from running multiple instances of the same database using different class loaders in the same jvm. Existing solution already prevents users from running multiple instances across jvms. Although I have not assigned this issue to myself, have been working on this issue on/off for some time. But I don't have free cycles to work on this bug for the upcoming release. I am posting the patch with whatever work I have done sofar. If some one can enhance the attached patch and fix this issue, that will be great. Intra-jvm db lock is provided by using global derby specific jvm instance id. On a first boot of any derby database in a jvm, an UUID is generated and stored in a system property ("derby.storage.jvmInstanceID"). This id is written to the dbex.lck file when the database is booted. If the ID in the dbex.lck file and the current jvm id stored in the system property "derby.storage.jvmInstanceID" are same then the database is already booted, and an exception will be thrown and database will not be booted. On a database shutdown , an Invalid JVM id is written to the dbex.lck file, so that if the database booted again ID's will not be equal, which means database is not already booted, the database will be booted successfully. I am using the existing UUID factory in the derby to generate the derby jvm id. Synchronization is done by using the interned strings. Synchronization across the JVM is done on "derby.storage.jvmInstanceID" string. And Synchronization for a database is done on the database directory. This may need to be changed to a database name, because it may not be possible to get the canonical path always and it is necessary to synchronize on a string, that is unique to a database in a jvm. In my earlier proposed solution I mentioned about releasing the db locks using finalizer. After doing some testing , I realized there is no need to address unlocking the database on garbage collection, using the finalizer. My understanding is unless users shutdown the database rawStoreDaemon and antiGC thread will hold on to the resources and the classes will not be unloaded. So the only way users can boot an already booted database but not in use using another class loader instance in the same jvm is by doing a shutdown from the class loader that booted the database. If some thinks this is not true, please correct me. To do : 1) cleanup error handling on IOExceptions and add new message for the intra-jvm db lock. 2) Currently dataDirectory path string is is used for synchronization to prevent multiple boots of a database. This may need to be changed to the db name. 3) Make the classLoaderBootTest.java run under security manager. 4) Add test cases for booting different databases parallelly on different threads with different class loaders. May not be really required , because even booting a single database through different threads shoud test the same thing. But it may be better to add a test case, just to be safe! 5) Run the test with large number of threads. 6) Anything else I have forgotten ! Thanks -suresh
          Hide
          Kathey Marsden added a comment -

          Thanks Suresh for the patch! I was wondering, why do we need setContextClassLoader privileges for derby.jar with the change?

          Show
          Kathey Marsden added a comment - Thanks Suresh for the patch! I was wondering, why do we need setContextClassLoader privileges for derby.jar with the change?
          Hide
          Suresh Thalamati added a comment -

          [ Show » ] Kathey Marsden [22/May/07 03:22 PM] Thanks Suresh for the patch! I was wondering, why do we need
          setContextClassLoader privileges for derby.jar with the change?

          I must have added that permission while debugging to run the test under security manager.
          Only security permission that is required for derby.jar is read/write of derby.storage.jvmInstanceID
          property.

          Thanks for for volunteering to fix the bug, Kathey.
          .

          Show
          Suresh Thalamati added a comment - [ Show » ] Kathey Marsden [22/May/07 03:22 PM] Thanks Suresh for the patch! I was wondering, why do we need setContextClassLoader privileges for derby.jar with the change? I must have added that permission while debugging to run the test under security manager. Only security permission that is required for derby.jar is read/write of derby.storage.jvmInstanceID property. Thanks for for volunteering to fix the bug, Kathey. .
          Hide
          Kathey Marsden added a comment -

          Suresh said

          >2) Currently dataDirectory path string is is used for synchronization to
          > prevent multiple boots of a database. This may need to be changed to
          > the db name.

          Why does this need to be changed? If we switched to databaseName might there be potential for duplicate database names? The dataDirectory path seems logical to me.

          Show
          Kathey Marsden added a comment - Suresh said >2) Currently dataDirectory path string is is used for synchronization to > prevent multiple boots of a database. This may need to be changed to > the db name. Why does this need to be changed? If we switched to databaseName might there be potential for duplicate database names? The dataDirectory path seems logical to me.
          Hide
          Daniel John Debrunner added a comment -

          The issue with the path is that the canonical path would be needed which requires permission to read system property user.dir, which might not be granted.
          The base name of the path ("database name") is easy to obtain and works just as well. The synchronization on this object is very short lived, thus it does not stop two databases with the same name but different paths from booting.

          Show
          Daniel John Debrunner added a comment - The issue with the path is that the canonical path would be needed which requires permission to read system property user.dir, which might not be granted. The base name of the path ("database name") is easy to obtain and works just as well. The synchronization on this object is very short lived, thus it does not stop two databases with the same name but different paths from booting.
          Hide
          Kathey Marsden added a comment -

          This patch is not yet ready for commit. The multithreaded test testBootingDatabaseInMultipleThread fails intermittently allowing the database to be booted twice. I am looking to see where there may be a synchronization issue.

          Show
          Kathey Marsden added a comment - This patch is not yet ready for commit. The multithreaded test testBootingDatabaseInMultipleThread fails intermittently allowing the database to be booted twice. I am looking to see where there may be a synchronization issue.
          Hide
          Kathey Marsden added a comment -

          When the multithreaded failure occurs there is an NPE accessing the lockfile. Attaching the derby.log.

          Show
          Kathey Marsden added a comment - When the multithreaded failure occurs there is an NPE accessing the lockfile. Attaching the derby.log.
          Hide
          Kathey Marsden added a comment -

          Attached is a patch that fixes the NPE, but there are still issues with the multithreaded test. Sometimes with 9 or more threads in startConcurrentDatabaseBoots we get dual boot without any message to the derby.log. I think there is one of two issues here:

          1) There is no synchronization across classloaders, so even though we synchronize on the databaseName() it is not useful when two threads from different ClassLoaders come into the synchronized code.

          2) Since getting the exFileLock is not within the synchronized code. It is possible that the state of the dbex.lck file on disk changes before the boot.

          I am not sure how to resolve these issues, especially 1. I'd appreciate any advice on the issue.

          Thanks

          Kathey

          Show
          Kathey Marsden added a comment - Attached is a patch that fixes the NPE, but there are still issues with the multithreaded test. Sometimes with 9 or more threads in startConcurrentDatabaseBoots we get dual boot without any message to the derby.log. I think there is one of two issues here: 1) There is no synchronization across classloaders, so even though we synchronize on the databaseName() it is not useful when two threads from different ClassLoaders come into the synchronized code. 2) Since getting the exFileLock is not within the synchronized code. It is possible that the state of the dbex.lck file on disk changes before the boot. I am not sure how to resolve these issues, especially 1. I'd appreciate any advice on the issue. Thanks Kathey
          Hide
          Kathey Marsden added a comment -

          Originally I thought I would make read/write access to the property mandatory but since policy files typically have to be changed out at the deployed site, I think I will make it so it just prints a warning if access is not granted for the property.

          Kathey

          Show
          Kathey Marsden added a comment - Originally I thought I would make read/write access to the property mandatory but since policy files typically have to be changed out at the deployed site, I think I will make it so it just prints a warning if access is not granted for the property. Kathey
          Hide
          Mike Matrigali added a comment -

          keep in mind that if we make setting/reading the property optional it presents a very mixed set of possiblities. We need to at least document and maybe refer to them in the error message. Each derby.jar in each different class loader may have a different property set so one set may be able to read but not write, another can't read or write, and another can read and write. Some app may get different locking behaviors if they can read by not write based on whether they are the first one in or not ( they sometimes may get error message and sometimes not based on what someone else has done).

          Show
          Mike Matrigali added a comment - keep in mind that if we make setting/reading the property optional it presents a very mixed set of possiblities. We need to at least document and maybe refer to them in the error message. Each derby.jar in each different class loader may have a different property set so one set may be able to read but not write, another can't read or write, and another can read and write. Some app may get different locking behaviors if they can read by not write based on whether they are the first one in or not ( they sometimes may get error message and sometimes not based on what someone else has done).
          Hide
          Kathey Marsden added a comment -

          Here is the latest patch for this issue. I think we are getting close, but I have reached a sticking point that I can't seem to get past.

          If I run with classes or even with jars from within Eclipse, it all runs fine and even 100 threads pass with no problem. But for some unknown reason, when I run with jars from the commandline, it allows dual boot.

          The primary change from the last patch is that I had to synchronize the entire call to privGetJBMSLockOnDB() to avoid intermittent dual boot failures in the multiThreaded test.

          I also added a warning instead of failure if the user did not have permission to access the system property.

          Anyway, if someone is so inclined it would be great to have another pair of eyes on this to figure out what the problem is running with jars.
          I ran derbyall and suites.All. All passed with classes, but with jars ClassLoaderBootTest fails miserably.

          Kathey

          Show
          Kathey Marsden added a comment - Here is the latest patch for this issue. I think we are getting close, but I have reached a sticking point that I can't seem to get past. If I run with classes or even with jars from within Eclipse, it all runs fine and even 100 threads pass with no problem. But for some unknown reason, when I run with jars from the commandline, it allows dual boot. The primary change from the last patch is that I had to synchronize the entire call to privGetJBMSLockOnDB() to avoid intermittent dual boot failures in the multiThreaded test. I also added a warning instead of failure if the user did not have permission to access the system property. Anyway, if someone is so inclined it would be great to have another pair of eyes on this to figure out what the problem is running with jars. I ran derbyall and suites.All. All passed with classes, but with jars ClassLoaderBootTest fails miserably. Kathey
          Hide
          Kathey Marsden added a comment -

          Run without security manager the test passes with jars. So I will try to figure out what permission is missing/needed with jars.

          Show
          Kathey Marsden added a comment - Run without security manager the test passes with jars. So I will try to figure out what permission is missing/needed with jars.
          Hide
          Kathey Marsden added a comment -

          I have been unable to figure out exactly what the permission issue is when running with security manager and jars. I did find that it was corrected by giving AllPermissions to derbyTesting.jar. I am thinking therfore that the issue is with the test and not the change itself. I would like to go ahead and checkin the change, disabling security manager for the jars and file another issue to correct that problem. Reviews welcome and let me know if anyone has objections to going ahead and checking in the change. Substantive changes from the initial patch are:

          • Fix intermittent NPE
          • Fix problem with dual boot with many threads, by expanding the synchronization to include the full privJBMSLockOnDB
          • Log warning not fail if System Property permission is not granted instead of failing.

          Kathey

          Show
          Kathey Marsden added a comment - I have been unable to figure out exactly what the permission issue is when running with security manager and jars. I did find that it was corrected by giving AllPermissions to derbyTesting.jar. I am thinking therfore that the issue is with the test and not the change itself. I would like to go ahead and checkin the change, disabling security manager for the jars and file another issue to correct that problem. Reviews welcome and let me know if anyone has objections to going ahead and checking in the change. Substantive changes from the initial patch are: Fix intermittent NPE Fix problem with dual boot with many threads, by expanding the synchronization to include the full privJBMSLockOnDB Log warning not fail if System Property permission is not granted instead of failing. Kathey
          Hide
          Daniel John Debrunner added a comment -

          From a very quick look at the code two things jumped out at me:

          1) Why does the test need a Derby specific class loader? The upgrade tests don't and there are performing the same style of operation.

          2) The new method StorageFile.getLockedFile() is part of the api so its definition needs to be clear.

          2a) In the interface it says it returns the file for the lock file, but I don't think lock file is defined anywhere in the interface description.
          2b) The interface doesn't indicate it can return null, but one implementation of it does.
          2c) Why is this method on StorageFile, it's not an attribute of a specific file, at least I don;t think it is. Is there a lock file per open file?

          Show
          Daniel John Debrunner added a comment - From a very quick look at the code two things jumped out at me: 1) Why does the test need a Derby specific class loader? The upgrade tests don't and there are performing the same style of operation. 2) The new method StorageFile.getLockedFile() is part of the api so its definition needs to be clear. 2a) In the interface it says it returns the file for the lock file, but I don't think lock file is defined anywhere in the interface description. 2b) The interface doesn't indicate it can return null, but one implementation of it does. 2c) Why is this method on StorageFile, it's not an attribute of a specific file, at least I don;t think it is. Is there a lock file per open file?
          Hide
          Kathey Marsden added a comment -

          Thanks Dan for the comment.

          getExclusiveLock() and getLockedFile() are only substatively relevant to DirFile4 and for that file yes, I think there is a lock file per open file. I will change the definition to say that getLockedFile() will return a RandomAcess file to the lock file obtained by getExcluciveLock() if an exclusive lock was successfully obtained with getExclusiveLock() otherwise it will return null.

          Does that sound right?

          I'll look at the test. Maybe I will fix it post checkin if I am not able to get it fixed up today.

          Show
          Kathey Marsden added a comment - Thanks Dan for the comment. getExclusiveLock() and getLockedFile() are only substatively relevant to DirFile4 and for that file yes, I think there is a lock file per open file. I will change the definition to say that getLockedFile() will return a RandomAcess file to the lock file obtained by getExcluciveLock() if an exclusive lock was successfully obtained with getExclusiveLock() otherwise it will return null. Does that sound right? I'll look at the test. Maybe I will fix it post checkin if I am not able to get it fixed up today.
          Hide
          Kathey Marsden added a comment -

          Applications will need to set a new permission in their policy file. So this will affect existing applications and release note is needed.

          Show
          Kathey Marsden added a comment - Applications will need to set a new permission in their policy file. So this will affect existing applications and release note is needed.
          Hide
          Kathey Marsden added a comment -

          release note

          Show
          Kathey Marsden added a comment - release note
          Hide
          Daniel John Debrunner added a comment -

          > I think there is a lock file per open file. I will change the definition to say that getLockedFile() will return a RandomAcess file to the lock file obtained by getExcluciveLock() if an exclusive lock was successfully obtained with getExclusiveLock() otherwise it will return null.

          Sorry, still somewhat lost. Why is a call to get a random access file on StorageFile needed since it already has such a method?

          Show
          Daniel John Debrunner added a comment - > I think there is a lock file per open file. I will change the definition to say that getLockedFile() will return a RandomAcess file to the lock file obtained by getExcluciveLock() if an exclusive lock was successfully obtained with getExclusiveLock() otherwise it will return null. Sorry, still somewhat lost. Why is a call to get a random access file on StorageFile needed since it already has such a method?
          Hide
          Daniel John Debrunner added a comment -

          -1 on this patch (now applied as 547042 & 547046):

          The additional method on StorageFile is not well defined.

          StorageFile.getExclusiveFileLock() indicates it gets an exclusive lock on the name represented by StorageFile.

          The new method StorageFile.getLockedFile() seems to imply that if a StorageFile is locked, then there is a new
          different file created to perform the lock, thus one can modify the contents of the two files independently,
          using the existing StorageFile.getRandomAccessFile() and now the new StorageFile.getLockedFile().
          (Otherwise why have the new method, one can already get a random access to a StorageFile??).

          This seems problematic for a few reasons:

          • seems inefficient, creating a new file to lock another
          • seems to change the contract of the existing method getExclusiveFileLock() (no mention of requiring a separate file)
          • possibly hard to implement for an arbitary IO implementation.

          Maybe though I'm misunderstanding what is going on, the javadoc comment for getLockedFile() does seem to have
          a couple of typos in it (e.g. "exclusing e lock"). If there's some good explanation for what is going on and the relationship
          between StorageFile.getLockedFile() and StorageFile.getRandomAccessFile() can be well defined then I can frop my veto.

          Show
          Daniel John Debrunner added a comment - -1 on this patch (now applied as 547042 & 547046): The additional method on StorageFile is not well defined. StorageFile.getExclusiveFileLock() indicates it gets an exclusive lock on the name represented by StorageFile. The new method StorageFile.getLockedFile() seems to imply that if a StorageFile is locked, then there is a new different file created to perform the lock, thus one can modify the contents of the two files independently, using the existing StorageFile.getRandomAccessFile() and now the new StorageFile.getLockedFile(). (Otherwise why have the new method, one can already get a random access to a StorageFile??). This seems problematic for a few reasons: seems inefficient, creating a new file to lock another seems to change the contract of the existing method getExclusiveFileLock() (no mention of requiring a separate file) possibly hard to implement for an arbitary IO implementation. Maybe though I'm misunderstanding what is going on, the javadoc comment for getLockedFile() does seem to have a couple of typos in it (e.g. "exclusing e lock"). If there's some good explanation for what is going on and the relationship between StorageFile.getLockedFile() and StorageFile.getRandomAccessFile() can be well defined then I can frop my veto.
          Hide
          Daniel John Debrunner added a comment -

          Note I'm reviewing the patch called; derby-700_06_07_07_diff.txt
          Comments on 6/12 seem to indicate a new patch for review but nothing seems to have been attached that day, I think those comments are about the previous version.

          Some comments on the patch:

          • In getIntraJvmDbLock() there is this comment:
            + // synchronizing across the same database, by using interned
            + // version of the database name

          but I can't see any synchronization. It seems the synchronization has moved to wrap the entire call to privGetJBMSLockOnDB().
          Since this synchronization is key it would be good to get the comments clear so that future readers can understand how this works.
          Also I think putting the synchronization in the privGetJBMSLockOnDB() method would be much clearer than hiding it in the run() method.
          I see the comment above indicating the synchronization was expanded to be the full privGetJBMSLockOnDB() method, but there is
          no reason given. Can you share your thoughts on why this is required?

          • the synchronization for getting the intra jvm lock and releasing it are using different objects. Don't they need to use the same object?
            If not it should be clearly commented how the synchronization is working.
          • In getIntraJvmDbLock() and releaseIntraJvmDbLock() the StorageAccessFile lckFileRaf is never closed, I think it needs to be or have comments indicating why it is not closed. Maybe it's related to the issue of the new method added to StorageFile??
          • In releaseIntraJvmDbLock() I was a little unclear by the invalid uuid concept. Is it an invalid uuid such that recreateUUID will throw an exception in getIntraJvmDbLock() or is it a valid representation of a uuid that can never be created by Derby?
          • Using dataDirectory.intern() to synchronize across class loaders in releaseIntraJvmDbLock() could probably use more comments, basically describing what guarantees that all derby systems have the same path for the data directory for a given database?

          Thanks for working on this, it will help stop users corrupting their own databases.

          Show
          Daniel John Debrunner added a comment - Note I'm reviewing the patch called; derby-700_06_07_07_diff.txt Comments on 6/12 seem to indicate a new patch for review but nothing seems to have been attached that day, I think those comments are about the previous version. Some comments on the patch: In getIntraJvmDbLock() there is this comment: + // synchronizing across the same database, by using interned + // version of the database name but I can't see any synchronization. It seems the synchronization has moved to wrap the entire call to privGetJBMSLockOnDB(). Since this synchronization is key it would be good to get the comments clear so that future readers can understand how this works. Also I think putting the synchronization in the privGetJBMSLockOnDB() method would be much clearer than hiding it in the run() method. I see the comment above indicating the synchronization was expanded to be the full privGetJBMSLockOnDB() method, but there is no reason given. Can you share your thoughts on why this is required? the synchronization for getting the intra jvm lock and releasing it are using different objects. Don't they need to use the same object? If not it should be clearly commented how the synchronization is working. In getIntraJvmDbLock() and releaseIntraJvmDbLock() the StorageAccessFile lckFileRaf is never closed, I think it needs to be or have comments indicating why it is not closed. Maybe it's related to the issue of the new method added to StorageFile?? In releaseIntraJvmDbLock() I was a little unclear by the invalid uuid concept. Is it an invalid uuid such that recreateUUID will throw an exception in getIntraJvmDbLock() or is it a valid representation of a uuid that can never be created by Derby? Using dataDirectory.intern() to synchronize across class loaders in releaseIntraJvmDbLock() could probably use more comments, basically describing what guarantees that all derby systems have the same path for the data directory for a given database? Thanks for working on this, it will help stop users corrupting their own databases.
          Hide
          Daniel John Debrunner added a comment -

          Couple of minor additonal comments:

          • The code would be a little simpler if getJvmId() just returned a String instead of a UUID. Then there's no need to convert
            the value read back from disk to a uuid. String's compare just as well as UUIDs for this purpose.
          • The reading of the on-disk identifier has this code:

          + try
          +

          { + if (exFileLock.length() != 0) + onDiskJvmId = uuidFactory.recreateUUID(lckFileRaf.readUTF()); + }


          + catch (Exception e)
          +

          { + // The previous owner of the lock may have died before we + // finish writing its UUID down. Assume uuid file is invalid + // Set the id on the disk to null value. + onDiskJvmId = null; + }
          • Why is the file length of zero special cased?
          • I think the comment could be improved to

          + // The previous owner of the lock may have died before we
          + // finish writing its UUID down. Assume uuid file is invalid
          + // and thus the database is not in use (locked).

          Show
          Daniel John Debrunner added a comment - Couple of minor additonal comments: The code would be a little simpler if getJvmId() just returned a String instead of a UUID. Then there's no need to convert the value read back from disk to a uuid. String's compare just as well as UUIDs for this purpose. The reading of the on-disk identifier has this code: + try + { + if (exFileLock.length() != 0) + onDiskJvmId = uuidFactory.recreateUUID(lckFileRaf.readUTF()); + } + catch (Exception e) + { + // The previous owner of the lock may have died before we + // finish writing its UUID down. Assume uuid file is invalid + // Set the id on the disk to null value. + onDiskJvmId = null; + } Why is the file length of zero special cased? I think the comment could be improved to + // The previous owner of the lock may have died before we + // finish writing its UUID down. Assume uuid file is invalid + // and thus the database is not in use (locked).
          Hide
          Mike Matrigali added a comment -

          Dan I would like to understand your concerns with respect to the locking interfaces on StorageFile. We are only going to call locking on the lock files, not on other RandomAccessFiles. The current code is using the exising RandomAccessFile support
          to do this - do you think we should be creating a different type of file rather than putting the interfaces on StorageFile?

          I agree the comments could be improved, and I guess since we are so close to a release it is better to get it right before checkin. I was ok before with checking it in as it was and then fixing in subsequent checkin (I was planning on helping with this once code got in, instead of a patch).

          At this point it looks like this won't make it into first release candidate, but I plan on helping to get this to a good state for a subsequent release candidate or at least get it into the branch for anyone who wants it if we don't have another release candidate. I would not hold up release candidate for this. I think it is reasonable to get the documentation for the requested permission into the release even if we can't get the code in this week, is that a problem?

          Show
          Mike Matrigali added a comment - Dan I would like to understand your concerns with respect to the locking interfaces on StorageFile. We are only going to call locking on the lock files, not on other RandomAccessFiles. The current code is using the exising RandomAccessFile support to do this - do you think we should be creating a different type of file rather than putting the interfaces on StorageFile? I agree the comments could be improved, and I guess since we are so close to a release it is better to get it right before checkin. I was ok before with checking it in as it was and then fixing in subsequent checkin (I was planning on helping with this once code got in, instead of a patch). At this point it looks like this won't make it into first release candidate, but I plan on helping to get this to a good state for a subsequent release candidate or at least get it into the branch for anyone who wants it if we don't have another release candidate. I would not hold up release candidate for this. I think it is reasonable to get the documentation for the requested permission into the release even if we can't get the code in this week, is that a problem?
          Hide
          Daniel John Debrunner added a comment -

          Simple question is why is a new method needed on StorageFile. Since one can already get an object to perform random access i/o on a StorageFile, why is a new method needed to perform the same functionality.

          Either it is because for some reason the api requires the lock files hold different contents to the file being locked. I don't think this is the case.

          Or it is because it's not needed. I think this is the case, but there must have been some reason for adding it. I'm just asking what is the reason for adding such a method.

          I think we are creating a StorageFile to represent "dbex.lck" (whatever the name is) and then locking that file. That's all good, but why can we not modify that file using the existing methods on StorageFactory?

          Show
          Daniel John Debrunner added a comment - Simple question is why is a new method needed on StorageFile. Since one can already get an object to perform random access i/o on a StorageFile, why is a new method needed to perform the same functionality. Either it is because for some reason the api requires the lock files hold different contents to the file being locked. I don't think this is the case. Or it is because it's not needed. I think this is the case, but there must have been some reason for adding it. I'm just asking what is the reason for adding such a method. I think we are creating a StorageFile to represent "dbex.lck" (whatever the name is) and then locking that file. That's all good, but why can we not modify that file using the existing methods on StorageFactory?
          Hide
          Suresh Thalamati added a comment -

          I agree with Dan, getLockedFile() is confusing and should not be added
          to the storage factory interfaces , if it can be avoided or have better
          comments.

          Just thought I will take a moment and explain, why in the first place
          I added this method, I hope it will help in finding an alternative solution
          or make the interface better.

          I was testing and developing my solution on Windows. When I
          first implemented without adding the getLockedFile() method, by
          just getting the RandomAccessFile using StorageFile.getRandomAccessFile(),
          after it is locked. I was hitting the error , java.io.IOException:
          "
          The process cannot access the file because another process has locked a
          portion of the file. When it write UUID to the dbex.lck file.
          "

          After bit of debugging realized , I need access to the same
          RandomAccesFile, that is used to the get the lock. So I simply
          added the getLockedFile, which just return the same RandomAccessFile,
          that is used to get the file lock.

          public class writeLock {
          public static void main(String[] args) throws Exception

          { File lf = new File("dbex.lck"); RandomAccessFile raf1 = new RandomAccessFile(lf, "rw"); FileChannel fc = raf1.getChannel(); FileLock lock = fc.tryLock(); // attempt to write to locked file using another // RandomAccessFile raf2 = new RandomAccessFile(lf , "rw"); // the following write is failing on windows. raf2.writeChars("Derby is great "); }

          }

          For example , in the above code. Last write will fail with the error:
          Exception in thread "main" java.io.IOException: The process cannot access the
          file because another process has locked a portion of the file.

          I did not verify, if this is the case in the non-window environment too. This
          fix is mainly for non-window environment, so If on other platforms this is
          not issue then the getLockedFile() can be replaced with the
          StorageFile.getRandomAccesFile(). My concern is, How to confirm all
          non-window environments will not give the above error.

          First I thought of putting the new intraJVM lock code in the storage
          factory, where getExclusiveFileLock() is implemented, For me
          It looked worse alternative too getLockedFile() method, because then
          getExlusiveFileLock() method semantics gets more confusing.

          Another solution I was thinking that may work without adding the
          *getLockedFile() is to use the range lock. like FileLock lock =
          fc.tryLock(0, 10 ,false), and write to the file after the 10th byte.
          This may need a new method getExclusiveLock() , which takes range.

          May be simple solution is to add a new call, getExclusiveFileLock()
          method that takes RandomAcceesFile as the argument.

          To start with the reason we are in this mess is the getExclusiveLock()
          implementation is not matching the way java interfaces work. Ideally
          store code should hold on to the RandomAccessFile used for the lock,
          not the storage factory implementraion. Ideally StorageRandomAccessFile
          should give handle to the StorageFileChannel, which has the tryLock()
          method, that returns FileLock.

          Thanks
          -suresh

          Show
          Suresh Thalamati added a comment - I agree with Dan, getLockedFile() is confusing and should not be added to the storage factory interfaces , if it can be avoided or have better comments. Just thought I will take a moment and explain, why in the first place I added this method, I hope it will help in finding an alternative solution or make the interface better. I was testing and developing my solution on Windows. When I first implemented without adding the getLockedFile() method, by just getting the RandomAccessFile using StorageFile.getRandomAccessFile(), after it is locked. I was hitting the error , java.io.IOException: " The process cannot access the file because another process has locked a portion of the file. When it write UUID to the dbex.lck file. " After bit of debugging realized , I need access to the same RandomAccesFile, that is used to the get the lock. So I simply added the getLockedFile, which just return the same RandomAccessFile, that is used to get the file lock. public class writeLock { public static void main(String[] args) throws Exception { File lf = new File("dbex.lck"); RandomAccessFile raf1 = new RandomAccessFile(lf, "rw"); FileChannel fc = raf1.getChannel(); FileLock lock = fc.tryLock(); // attempt to write to locked file using another // RandomAccessFile raf2 = new RandomAccessFile(lf , "rw"); // the following write is failing on windows. raf2.writeChars("Derby is great "); } } For example , in the above code. Last write will fail with the error: Exception in thread "main" java.io.IOException: The process cannot access the file because another process has locked a portion of the file. I did not verify, if this is the case in the non-window environment too. This fix is mainly for non-window environment, so If on other platforms this is not issue then the getLockedFile() can be replaced with the StorageFile.getRandomAccesFile(). My concern is, How to confirm all non-window environments will not give the above error. First I thought of putting the new intraJVM lock code in the storage factory, where getExclusiveFileLock() is implemented, For me It looked worse alternative too getLockedFile() method, because then getExlusiveFileLock() method semantics gets more confusing. Another solution I was thinking that may work without adding the *getLockedFile() is to use the range lock. like FileLock lock = fc.tryLock(0, 10 ,false), and write to the file after the 10th byte. This may need a new method getExclusiveLock() , which takes range. May be simple solution is to add a new call, getExclusiveFileLock() method that takes RandomAcceesFile as the argument. To start with the reason we are in this mess is the getExclusiveLock() implementation is not matching the way java interfaces work. Ideally store code should hold on to the RandomAccessFile used for the lock, not the storage factory implementraion. Ideally StorageRandomAccessFile should give handle to the StorageFileChannel, which has the tryLock() method, that returns FileLock. Thanks -suresh
          Hide
          Kathey Marsden added a comment -

          Dan, based on what Suresh said, does the solution to add a new call, getExclusiveFileLock()
          method that takes RandomAcceesFile as the argument seem like a reasonable solution to get rid of getLockedFile or should we just document getLockedFile better, or something else ...?

          Show
          Kathey Marsden added a comment - Dan, based on what Suresh said, does the solution to add a new call, getExclusiveFileLock() method that takes RandomAcceesFile as the argument seem like a reasonable solution to get rid of getLockedFile or should we just document getLockedFile better, or something else ...?
          Hide
          Daniel John Debrunner added a comment -

          Thanks Suresh, that was very useful. The piece of missing information for me was that on windows one can only have one open descriptor against a locked file (from Suresh's description).

          So really this method was added because the existing (pre-patch) implementation of StorageFile does not honour the contract of StorageFile. This is because with this implementation (on some platforms) if StorageFile.getExclusiveFileLock() is called then StorageFile.getRandomAccessFile() does not return an object that can be used to modify the file.

          In fact looking at the DirFile implementation it seems the file locking implementation is not what a reader of the IO api (the javadoc for StorageFile) might expect. I read the StorageFile javadoc and expect that getExclusiveFileLock() puts a lock on the file, I can then use it as a regular file and then release the lock. Then another thread/jvm could perform the same action and we would have normal file locking sematics. That's why the change confused me, because I was expecting certain behaviour based upon the contract of StorageFile.

          But the DirFile/DirFile4 implementations are not that and are not even identical to each other (in terms of visible behaviour).

          • DirFile Calling getExclusiveFileLock() deletes the file if it exists and then creates it
          • DirFile4 Creates the file and writes four bytes into it.
          • Once a storage file is locked, it can not be used as a regular file
          • releasing the lock deletes the file(!!)

          I can see why all of this is done, it's for a specific purpose for locking Derby databases, but I don't think anyone would expect either
          of those implementations from reading the api.

          Thus I think first the StorageFile javadoc should be changed to describe the behaviour implemented (since it's all that is required by Derby today).
          I think the generalities of it are:

          • Any StorageFile that has getExclusiveFileLock() called on it is not intended to be read from or written to. It's sole purpose is to provide
            a locked entity to avoid multiple instances of Derby accessing the same database.
          • getExclusiveFileLock() may delete or overwrite any existing file
          • releaseExclusiveFileLock() may delete the file

          Then I wonder if this new locking mechanism for intra-jvm locks should be contained within the DirFile4 StorageFile implementation.
          The code is completly reliant on that implementation, so why not enclose it? Then there would be no need to change the StorageFactory interface (after from the comments clarifying behaviour).

          Show
          Daniel John Debrunner added a comment - Thanks Suresh, that was very useful. The piece of missing information for me was that on windows one can only have one open descriptor against a locked file (from Suresh's description). So really this method was added because the existing (pre-patch) implementation of StorageFile does not honour the contract of StorageFile. This is because with this implementation (on some platforms) if StorageFile.getExclusiveFileLock() is called then StorageFile.getRandomAccessFile() does not return an object that can be used to modify the file. In fact looking at the DirFile implementation it seems the file locking implementation is not what a reader of the IO api (the javadoc for StorageFile) might expect. I read the StorageFile javadoc and expect that getExclusiveFileLock() puts a lock on the file, I can then use it as a regular file and then release the lock. Then another thread/jvm could perform the same action and we would have normal file locking sematics. That's why the change confused me, because I was expecting certain behaviour based upon the contract of StorageFile. But the DirFile/DirFile4 implementations are not that and are not even identical to each other (in terms of visible behaviour). DirFile Calling getExclusiveFileLock() deletes the file if it exists and then creates it DirFile4 Creates the file and writes four bytes into it. Once a storage file is locked, it can not be used as a regular file releasing the lock deletes the file(!!) I can see why all of this is done, it's for a specific purpose for locking Derby databases, but I don't think anyone would expect either of those implementations from reading the api. Thus I think first the StorageFile javadoc should be changed to describe the behaviour implemented (since it's all that is required by Derby today). I think the generalities of it are: Any StorageFile that has getExclusiveFileLock() called on it is not intended to be read from or written to. It's sole purpose is to provide a locked entity to avoid multiple instances of Derby accessing the same database. getExclusiveFileLock() may delete or overwrite any existing file releaseExclusiveFileLock() may delete the file Then I wonder if this new locking mechanism for intra-jvm locks should be contained within the DirFile4 StorageFile implementation. The code is completly reliant on that implementation, so why not enclose it? Then there would be no need to change the StorageFactory interface (after from the comments clarifying behaviour).
          Hide
          Rick Hillegas added a comment -

          Scrubbing release note so that it can be digested by the SAX parser in the release note generator. Naked <br> tags replaced with <br/>.

          Show
          Rick Hillegas added a comment - Scrubbing release note so that it can be digested by the SAX parser in the release note generator. Naked <br> tags replaced with <br/>.
          Hide
          Kathey Marsden added a comment -

          I am working to move the intraJVM lock into StorageFile, getExclusiveLock()/ releaseExclusiveLock, but I wanted to comment on why I expanded the synchronization to include all of privGetJBMSLockOnDB as I think that will still be required.

          I found running the test with multiple threads that I would get occasional dual boot without any error message or warning with the original patch. I expanded the synchronization and found that it corrected the problem. I think there seems to be a window where two threads can get in and create or delete the lock file at the same time, e.g.
          if (fileLock.exists())
          {
          ...
          ----> Another thread gets in and deletes the file here

          I am not totally sure that this is the hole and since this logic would remain in BaseDataFileFactory we would still need to synchronize on the interned databaseName there. I am sure however that it works with the expanded synchronization but does not without it.

          Show
          Kathey Marsden added a comment - I am working to move the intraJVM lock into StorageFile, getExclusiveLock()/ releaseExclusiveLock, but I wanted to comment on why I expanded the synchronization to include all of privGetJBMSLockOnDB as I think that will still be required. I found running the test with multiple threads that I would get occasional dual boot without any error message or warning with the original patch. I expanded the synchronization and found that it corrected the problem. I think there seems to be a window where two threads can get in and create or delete the lock file at the same time, e.g. if (fileLock.exists()) { ... ----> Another thread gets in and deletes the file here I am not totally sure that this is the hole and since this logic would remain in BaseDataFileFactory we would still need to synchronize on the interned databaseName there. I am sure however that it works with the expanded synchronization but does not without it.
          Hide
          Kathey Marsden added a comment -

          If I try to keep the interface the same and incorporate the intrajvm lock into getExclusiveLock() I will lose granularity in the error message. With the previous patch, we had a separate intrajvm dual boot messages as well as a warning if the property was not writable. Should I add two new return possibilities from getExclusiveLock to handle these cases?

          Kathey

          Show
          Kathey Marsden added a comment - If I try to keep the interface the same and incorporate the intrajvm lock into getExclusiveLock() I will lose granularity in the error message. With the previous patch, we had a separate intrajvm dual boot messages as well as a warning if the property was not writable. Should I add two new return possibilities from getExclusiveLock to handle these cases? Kathey
          Hide
          Daniel John Debrunner added a comment -

          Could the problem (needing to expand) with the synchronization be related to question I raised earlier?

          >> - the synchronization for getting the intra jvm lock and releasing it are using different objects. Don't they need to use the same object?
          >> If not it should be clearly commented how the synchronization is working.

          In the comment above, I'm not sure which piece of code you are referring to here?

          if (fileLock.exists())
          {
          ...
          ----> Another thread gets in and deletes the file here

          Show
          Daniel John Debrunner added a comment - Could the problem (needing to expand) with the synchronization be related to question I raised earlier? >> - the synchronization for getting the intra jvm lock and releasing it are using different objects. Don't they need to use the same object? >> If not it should be clearly commented how the synchronization is working. In the comment above, I'm not sure which piece of code you are referring to here? if (fileLock.exists()) { ... ----> Another thread gets in and deletes the file here
          Hide
          Kathey Marsden added a comment -

          Dan asked:
          >Could the problem (needing to expand) with the synchronization be related to question I raised earlier?

          >> - the synchronization for getting the intra jvm lock and releasing it are using different objects. Don't they need to use the same object?

          No even with that corrected the test still fails with only putting synchronization on getIntraJVMDBLock()

          In the comment above, I'm not sure which piece of code you are referring to here?

          This is from privGetJBMSLockOnDB()

          Show
          Kathey Marsden added a comment - Dan asked: >Could the problem (needing to expand) with the synchronization be related to question I raised earlier? >> - the synchronization for getting the intra jvm lock and releasing it are using different objects. Don't they need to use the same object? No even with that corrected the test still fails with only putting synchronization on getIntraJVMDBLock() In the comment above, I'm not sure which piece of code you are referring to here? This is from privGetJBMSLockOnDB()
          Hide
          Kathey Marsden added a comment -

          Mike asked me to post the patch that has changes from Dan's first round of review comments but does not have the move of getIntraJVMDbLock to DirFile4. This is not for commit

          Show
          Kathey Marsden added a comment - Mike asked me to post the patch that has changes from Dan's first round of review comments but does not have the move of getIntraJVMDbLock to DirFile4. This is not for commit
          Hide
          Daniel John Debrunner added a comment -

          With the changes made for DERBY-2737 it might be wise to get the template server policy file to include the permission on the property derby.storage.jvmInstanceId even if this fix doesn't make it into the first 10.3 release. At least then any user's policy file might have a better chance of having the permission.

          Show
          Daniel John Debrunner added a comment - With the changes made for DERBY-2737 it might be wise to get the template server policy file to include the permission on the property derby.storage.jvmInstanceId even if this fix doesn't make it into the first 10.3 release. At least then any user's policy file might have a better chance of having the permission.
          Hide
          Daniel John Debrunner added a comment -

          Expanding the synchronization "because it works" concerns me greatly as an engineering approach to addressing a synchronization issue.

          The purpose of the synchronization was intended to protect a small window while a unique identifier was being written into a file (exFileLock in privGetJBMSLockOnDB) used for lock control.

          Expanding this synchronization is now also protecting the writing of a different file (fileLock in privGetJBMSLockOnDB), but why is that needed?
          Especially when the system has to handle multiple jvm's writing to that same file, when object synchronization doesn't help.

          It maybe that the current locking across doesn't really work and there are windows where multiple jvms booting the same database at the same time would succeed when they should fail. E.g. the locking works correctly when two jvms boot the same database one after the other (ie the second would fail) but not when they concurrently boot the same db. Maybe your test is exposing that and so the synchronization is required, but then it doesn't address the cross jvm issue.

          The locking of database boot is already complicated and adding the new (required) mechanism for cross-class loader complicates it further.
          It would be good to have a clear description of how the various schemes work and interact with each other. WIth that, one could then more easily determine what synchronization is required and why.

          Show
          Daniel John Debrunner added a comment - Expanding the synchronization "because it works" concerns me greatly as an engineering approach to addressing a synchronization issue. The purpose of the synchronization was intended to protect a small window while a unique identifier was being written into a file (exFileLock in privGetJBMSLockOnDB) used for lock control. Expanding this synchronization is now also protecting the writing of a different file (fileLock in privGetJBMSLockOnDB), but why is that needed? Especially when the system has to handle multiple jvm's writing to that same file, when object synchronization doesn't help. It maybe that the current locking across doesn't really work and there are windows where multiple jvms booting the same database at the same time would succeed when they should fail. E.g. the locking works correctly when two jvms boot the same database one after the other (ie the second would fail) but not when they concurrently boot the same db. Maybe your test is exposing that and so the synchronization is required, but then it doesn't address the cross jvm issue. The locking of database boot is already complicated and adding the new (required) mechanism for cross-class loader complicates it further. It would be good to have a clear description of how the various schemes work and interact with each other. WIth that, one could then more easily determine what synchronization is required and why .
          Hide
          Myrna van Lunteren added a comment -

          I am unchecking the 10.3 fixin. It doesn't look like this is going to make it.

          Show
          Myrna van Lunteren added a comment - I am unchecking the 10.3 fixin. It doesn't look like this is going to make it.
          Hide
          Daniel John Debrunner added a comment -

          This is an issue for every release/branch so far

          Show
          Daniel John Debrunner added a comment - This is an issue for every release/branch so far
          Hide
          Kathey Marsden added a comment -

          Unassigning myself from this issue for now. I am not actively working on it and am not sure how to resolve the synchronization issue I was seeing when running the test with many threads. Hopefully someone else will be able to find a solution which does not expand the synchronization so much.

          Kathey

          Show
          Kathey Marsden added a comment - Unassigning myself from this issue for now. I am not actively working on it and am not sure how to resolve the synchronization issue I was seeing when running the test with many threads. Hopefully someone else will be able to find a solution which does not expand the synchronization so much. Kathey
          Hide
          Kathey Marsden added a comment -

          I have been thinking about this issue and wonder:
          If we put the name of the database in the name of the derby.rawStoreDaemon thread and then check for the existence of that thread before booting the database and fail if it exists, would that be a reasonable solution?

          Show
          Kathey Marsden added a comment - I have been thinking about this issue and wonder: If we put the name of the database in the name of the derby.rawStoreDaemon thread and then check for the existence of that thread before booting the database and fail if it exists, would that be a reasonable solution?
          Hide
          Kathey Marsden added a comment -

          I did some testing with the thread name idea (just in a separate stand-alone program) and found that it would probably be workable. Performance wasn't bad. It seemed to take only 16 milliseconds on my machine to iterate through 1000 thread names and synchronizing on an interned string seemed to work across ClassLoaders but ...

          1) It would require RuntimePermissions getStackTrace and modifyThreadGroup to derby.jar, which seem like pretty powerful additions to our permissions requirements.

          2) I was looking back at Suresh's original implmentation (using the system property) and found that solution actually still holds promise. There was just an issue with synchronization which needs to be resolved. This solution wouldn't collide with any effort to consolidate the daemon threads and we already have this permission requirement documented, so I think this is a cleaner solution that should be revisited.

          I'll take a look again at the old patch and try to figure out why the synchronization expansion was required and how to avoid it.

          Show
          Kathey Marsden added a comment - I did some testing with the thread name idea (just in a separate stand-alone program) and found that it would probably be workable. Performance wasn't bad. It seemed to take only 16 milliseconds on my machine to iterate through 1000 thread names and synchronizing on an interned string seemed to work across ClassLoaders but ... 1) It would require RuntimePermissions getStackTrace and modifyThreadGroup to derby.jar, which seem like pretty powerful additions to our permissions requirements. 2) I was looking back at Suresh's original implmentation (using the system property) and found that solution actually still holds promise. There was just an issue with synchronization which needs to be resolved. This solution wouldn't collide with any effort to consolidate the daemon threads and we already have this permission requirement documented, so I think this is a cleaner solution that should be revisited. I'll take a look again at the old patch and try to figure out why the synchronization expansion was required and how to avoid it.
          Hide
          Rick Hillegas added a comment -

          Amended the title, description, and environment fields. This bug also appears on the Apple Java 5 VM on Mac OS X 10.5.7. It appears that this bug is not limited to Linux.

          Show
          Rick Hillegas added a comment - Amended the title, description, and environment fields. This bug also appears on the Apple Java 5 VM on Mac OS X 10.5.7. It appears that this bug is not limited to Linux.
          Hide
          Rick Hillegas added a comment - - edited

          It appears to me that a workaround for this bug is to upgrade to Java 6. I am attaching a slightly revised version of the repro, DualBootRepro.java (this revision contains all of the helper classes needed to create the failure scenario). I see failures on the following platforms:

          Apple Java 5 on Mac OS X
          Sun Java 5 on Ubuntu Linux
          IBM Java 5 on Ubuntu Linux

          Happily, however, the second boot fails and raises java.nio.channels.OverlappingFileLockException on the following platforms:

          Apple Java 6 beta on Mac OS X
          Sun Java 6 on Ubuntu Linux
          IBM Java 6 on Ubuntu Linux

          Admittedly, that's a graceless failure. However, I think it should prevent the database from being corrupted and it's worth recommending an upgrade to Java 6 for users who encounter this bug.

          Show
          Rick Hillegas added a comment - - edited It appears to me that a workaround for this bug is to upgrade to Java 6. I am attaching a slightly revised version of the repro, DualBootRepro.java (this revision contains all of the helper classes needed to create the failure scenario). I see failures on the following platforms: Apple Java 5 on Mac OS X Sun Java 5 on Ubuntu Linux IBM Java 5 on Ubuntu Linux Happily, however, the second boot fails and raises java.nio.channels.OverlappingFileLockException on the following platforms: Apple Java 6 beta on Mac OS X Sun Java 6 on Ubuntu Linux IBM Java 6 on Ubuntu Linux Admittedly, that's a graceless failure. However, I think it should prevent the database from being corrupted and it's worth recommending an upgrade to Java 6 for users who encounter this bug.
          Hide
          Kathey Marsden added a comment -

          Thank you Rick for looking at this issue. I think it sorely needs a new pair of eyes. I know I have seen dual boot on SuSE Linux with IBM 1.6, but have not tried the recent fix packs. I have seen the OverlappingFileLockException on IBM 1.6 with z/OS, but thought it was a JVM bug working in our favor. Has there been recent spec clarification that exclusive java.nio.FileLocks should prevent cross classloader access? If we determine that the OverlappingFileLockException is the correct JVM behavior, then I think we can catch that exception and throw a dual boot message and then file JVM bugs where we don't see the OverlappingFileLockException.

          Show
          Kathey Marsden added a comment - Thank you Rick for looking at this issue. I think it sorely needs a new pair of eyes. I know I have seen dual boot on SuSE Linux with IBM 1.6, but have not tried the recent fix packs. I have seen the OverlappingFileLockException on IBM 1.6 with z/OS, but thought it was a JVM bug working in our favor. Has there been recent spec clarification that exclusive java.nio.FileLocks should prevent cross classloader access? If we determine that the OverlappingFileLockException is the correct JVM behavior, then I think we can catch that exception and throw a dual boot message and then file JVM bugs where we don't see the OverlappingFileLockException.
          Hide
          Rick Hillegas added a comment - - edited

          Hi Kathey. I'm not aware of a spec clarification but I will ask around. I agree that we still need a solution for people whose VMs don't raise OverlappingFileLockException in this situation. Just thought that we might be able to spare some users a data corruption by recommending that they experiment with Java 6.

          Show
          Rick Hillegas added a comment - - edited Hi Kathey. I'm not aware of a spec clarification but I will ask around. I agree that we still need a solution for people whose VMs don't raise OverlappingFileLockException in this situation. Just thought that we might be able to spare some users a data corruption by recommending that they experiment with Java 6.
          Hide
          Rick Hillegas added a comment - - edited

          I have confirmed the following with Alan Bateman, the spec lead for NIO2:

          1) The call we are making (java.nio.channels.FileChannel.tryLock) is supposed to raise java.nio.channels.OverlappingFileLockException if an attempt is made to lock the same section of a file twice in the same VM, regardless of what ClassLoaders the calls come from. So the results I reported are expected.

          2) However, in Java 5 java.nio.channels.OverlappingFileLockException is NOT raised in this situation. This is a Java 5 bug. The bug was fixed in Java 6.

          3) Alan is not aware of any changes made by IBM to the nio code in this area. He is curious about the claim that java.nio.channels.OverlappingFileLockException is not raised by IBM Java 6 on SuSE Linux.

          Could someone run the repro on IBM Java 6 on SuSE Linux in order to resolve this confusion?

          One more wrinkle: When (2) was fixed, a compatibility flag was introduced so that people could continue to enjoy the old behavior, that is, the behavior we don't want. You will get the bad Java 5 behavior if you set the following system property to anything other than false:

          sun.nio.ch.disableSystemWideOverlappingFileLockCheck

          Thanks,
          -Rick

          Show
          Rick Hillegas added a comment - - edited I have confirmed the following with Alan Bateman, the spec lead for NIO2: 1) The call we are making (java.nio.channels.FileChannel.tryLock) is supposed to raise java.nio.channels.OverlappingFileLockException if an attempt is made to lock the same section of a file twice in the same VM, regardless of what ClassLoaders the calls come from. So the results I reported are expected. 2) However, in Java 5 java.nio.channels.OverlappingFileLockException is NOT raised in this situation. This is a Java 5 bug. The bug was fixed in Java 6. 3) Alan is not aware of any changes made by IBM to the nio code in this area. He is curious about the claim that java.nio.channels.OverlappingFileLockException is not raised by IBM Java 6 on SuSE Linux. Could someone run the repro on IBM Java 6 on SuSE Linux in order to resolve this confusion? One more wrinkle: When (2) was fixed, a compatibility flag was introduced so that people could continue to enjoy the old behavior, that is, the behavior we don't want. You will get the bad Java 5 behavior if you set the following system property to anything other than false: sun.nio.ch.disableSystemWideOverlappingFileLockCheck Thanks, -Rick
          Hide
          Kathey Marsden added a comment -

          This is indeed excellent news.

          I tried on SuSE with:
          java version "1.6.0"
          Java(TM) SE Runtime Environment (build pxi3260sr4-20090216_01(SR4))
          IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Linux x86-32 jvmxi3260-20090215_29883 (JIT enabled, AOT enabled)
          J9VM - 20090215_029883_lHdSMr
          JIT - r9_20090213_2028
          GC - 20090213_AA)
          JCL - 20090214_01

          and indeed got the OverlappingFileLockException and tried with the property you mentioned and saw dual boot. When I get back from vacation, I will review my case records and revisit the dual boot case we saw with 1.6 and see if it is something else.

          I think the important thing is to have a solution moving forward, so think we should just catch this exception and throw a dual boot message. While an older jvm solution would be great, I think most of the legacy applications that have dual boot have learned their lesson the hard way with corruption and no longer do it. It is in application development that we need to catch this and throw a clear message. That development usually happens with newer JVM's.

          Do you have a bug number/link for the JVM fix?

          Show
          Kathey Marsden added a comment - This is indeed excellent news. I tried on SuSE with: java version "1.6.0" Java(TM) SE Runtime Environment (build pxi3260sr4-20090216_01(SR4)) IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Linux x86-32 jvmxi3260-20090215_29883 (JIT enabled, AOT enabled) J9VM - 20090215_029883_lHdSMr JIT - r9_20090213_2028 GC - 20090213_AA) JCL - 20090214_01 and indeed got the OverlappingFileLockException and tried with the property you mentioned and saw dual boot. When I get back from vacation, I will review my case records and revisit the dual boot case we saw with 1.6 and see if it is something else. I think the important thing is to have a solution moving forward, so think we should just catch this exception and throw a dual boot message. While an older jvm solution would be great, I think most of the legacy applications that have dual boot have learned their lesson the hard way with corruption and no longer do it. It is in application development that we need to catch this and throw a clear message. That development usually happens with newer JVM's. Do you have a bug number/link for the JVM fix?
          Hide
          Rick Hillegas added a comment -

          Hi Kathey. The bug id is 6332756. See http://forums.sun.com/thread.jspa?threadID=5321079

          Show
          Rick Hillegas added a comment - Hi Kathey. The bug id is 6332756. See http://forums.sun.com/thread.jspa?threadID=5321079
          Hide
          Kathey Marsden added a comment -

          Thanks Rick for the reference. Was there consideration given to bakcporting the fix to 1.5, perhaps even with a different default for sun.nio.ch.disableSystemWideOverlappingFileLockCheck?

          The bug comments say "Fixing this issue is straight-forward but it introduces a behaviour change". At least having the property available in 1.5 to force the check would be helpful to eliminate the possibility of dual boot from corruption cases.

          Show
          Kathey Marsden added a comment - Thanks Rick for the reference. Was there consideration given to bakcporting the fix to 1.5, perhaps even with a different default for sun.nio.ch.disableSystemWideOverlappingFileLockCheck? The bug comments say "Fixing this issue is straight-forward but it introduces a behaviour change". At least having the property available in 1.5 to force the check would be helpful to eliminate the possibility of dual boot from corruption cases.
          Hide
          Rick Hillegas added a comment -

          > Was there consideration given to bakcporting the fix to 1.5, perhaps even with a different default for sun.nio.ch.disableSystemWideOverlappingFileLockCheck?

          I believe so. I think that proposal was rejected because the fix would introduce a compatibility change within a major version. The barrier to compatibility changes seems to be higher within major versions than between them. I suppose we could lobby to have that decision revisited but I doubt we would make much headway. The compatibility policy seems sensible to me.

          Show
          Rick Hillegas added a comment - > Was there consideration given to bakcporting the fix to 1.5, perhaps even with a different default for sun.nio.ch.disableSystemWideOverlappingFileLockCheck? I believe so. I think that proposal was rejected because the fix would introduce a compatibility change within a major version. The barrier to compatibility changes seems to be higher within major versions than between them. I suppose we could lobby to have that decision revisited but I doubt we would make much headway. The compatibility policy seems sensible to me.
          Hide
          Kathey Marsden added a comment -

          If for 1.5 the property default is the current bad behavior, I think there would be no compatibility concern and it would provide a worthwhile supportability improvement especially if it is a low risk fix overall.

          For now though it would be good to focus on getting a better error for Derby users using 1.6. If we find ourselves using the "Maybe they dual booted" excuse in a corruption case we can revisit whether to lobby for the 1.5 option. Sound ok?

          Show
          Kathey Marsden added a comment - If for 1.5 the property default is the current bad behavior, I think there would be no compatibility concern and it would provide a worthwhile supportability improvement especially if it is a low risk fix overall. For now though it would be good to focus on getting a better error for Derby users using 1.6. If we find ourselves using the "Maybe they dual booted" excuse in a corruption case we can revisit whether to lobby for the 1.5 option. Sound ok?
          Hide
          Rick Hillegas added a comment -

          I think it's a good idea to clean up how we handle java.nio.channels.OverlappingFileLockException . I agree that we can revisit this bug if that solution proves to be inadequate.

          Show
          Rick Hillegas added a comment - I think it's a good idea to clean up how we handle java.nio.channels.OverlappingFileLockException . I agree that we can revisit this bug if that solution proves to be inadequate.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-700-01-aa-catchOverlappingFileLockException.diff. This patch catches the OverlappingFileLockException, releases the channel, and sets the status to indicate that another Derby instance has locked the database. I have not run any tests yet other than verifying that the repro runs correctly on Java 6, but I'm posting this patch to see if the Store experts think this looks right. Once we agree that the code looks good, we can write some additional regression tests for this problem.

          Touches the following file:

          M java/engine/org/apache/derby/impl/io/DirFile4.java

          Show
          Rick Hillegas added a comment - Attaching derby-700-01-aa-catchOverlappingFileLockException.diff. This patch catches the OverlappingFileLockException, releases the channel, and sets the status to indicate that another Derby instance has locked the database. I have not run any tests yet other than verifying that the repro runs correctly on Java 6, but I'm posting this patch to see if the Store experts think this looks right. Once we agree that the code looks good, we can write some additional regression tests for this problem. Touches the following file: M java/engine/org/apache/derby/impl/io/DirFile4.java
          Hide
          Mike Matrigali added a comment -

          The new code looks ok to me. Maybe the comment could say what we expect to throw the
          OverlappingFileLockException, I assume it is:
          dbLock =lockFileChannel.tryLock();

          It will be great if we can count on jdk16 to fix this problem completely, finally. Too bad if we can't
          get a fix out of 15 as a lot of my users still use it also.

          Show
          Mike Matrigali added a comment - The new code looks ok to me. Maybe the comment could say what we expect to throw the OverlappingFileLockException, I assume it is: dbLock =lockFileChannel.tryLock(); It will be great if we can count on jdk16 to fix this problem completely, finally. Too bad if we can't get a fix out of 15 as a lot of my users still use it also.
          Hide
          Rick Hillegas added a comment -

          Thanks, Mike. I will add that comment. I will also add a test case. For the record, the regression tests ran cleanly for me.

          Show
          Rick Hillegas added a comment - Thanks, Mike. I will add that comment. I will also add a test case. For the record, the regression tests ran cleanly for me.
          Hide
          Kathey Marsden added a comment -

          There was a test in the earlier patches that might be of use.

          Show
          Kathey Marsden added a comment - There was a test in the earlier patches that might be of use.
          Hide
          Rick Hillegas added a comment -

          Attaching derby-700-01-ab-catchOverlappingFileLockException.diff. This adds the comment which Mike recommended and adds a test culled from one of the previous patches. I will run regression tests again.

          Touches the following files:

          M java/engine/org/apache/derby/impl/io/DirFile4.java

          Graceful handling of OverlappingFileLockException.

          M java/testing/org/apache/derbyTesting/functionTests/tests/store/_Suite.java
          A java/testing/org/apache/derbyTesting/functionTests/tests/store/ClassLoaderBootTest.java
          M java/testing/org/apache/derbyTesting/junit/JDBC.java

          Two test cases. One runs in all environments and checks that anyone can boot a database that has been shut down already. The other test case verifies that a database can't be booted twice if the VM is at Java 6 or higher.

          Show
          Rick Hillegas added a comment - Attaching derby-700-01-ab-catchOverlappingFileLockException.diff. This adds the comment which Mike recommended and adds a test culled from one of the previous patches. I will run regression tests again. Touches the following files: M java/engine/org/apache/derby/impl/io/DirFile4.java Graceful handling of OverlappingFileLockException. M java/testing/org/apache/derbyTesting/functionTests/tests/store/_Suite.java A java/testing/org/apache/derbyTesting/functionTests/tests/store/ClassLoaderBootTest.java M java/testing/org/apache/derbyTesting/junit/JDBC.java Two test cases. One runs in all environments and checks that anyone can boot a database that has been shut down already. The other test case verifies that a database can't be booted twice if the VM is at Java 6 or higher.
          Hide
          Rick Hillegas added a comment -

          The tests ran cleanly for me except for an error in ManagementMBeanTest. I ran the test by itself and it passed cleanly. I doubt that this fix affected that test. Here is the error:

          1) testStartStopManagementFromApplication(org.apache.derbyTesting.functionTests.tests.management.ManagementMBeanTest)junit.framework.AssertionFailedError: expected:<2> but was:<8>
          at org.apache.derbyTesting.functionTests.tests.management.ManagementMBeanTest.startStopManagement(ManagementMBeanTest.java:86)
          at org.apache.derbyTesting.functionTests.tests.management.ManagementMBeanTest.testStartStopManagementFromApplication(ManagementMBeanTest.java:56)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)

          Show
          Rick Hillegas added a comment - The tests ran cleanly for me except for an error in ManagementMBeanTest. I ran the test by itself and it passed cleanly. I doubt that this fix affected that test. Here is the error: 1) testStartStopManagementFromApplication(org.apache.derbyTesting.functionTests.tests.management.ManagementMBeanTest)junit.framework.AssertionFailedError: expected:<2> but was:<8> at org.apache.derbyTesting.functionTests.tests.management.ManagementMBeanTest.startStopManagement(ManagementMBeanTest.java:86) at org.apache.derbyTesting.functionTests.tests.management.ManagementMBeanTest.testStartStopManagementFromApplication(ManagementMBeanTest.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          Hide
          Rick Hillegas added a comment -

          Committed derby-700-01-ab-catchOverlappingFileLockException.diff at subversion revision 805448.

          Show
          Rick Hillegas added a comment - Committed derby-700-01-ab-catchOverlappingFileLockException.diff at subversion revision 805448.
          Hide
          Rick Hillegas added a comment -

          Ported 805448 from trunk to 10.5 branch at subversion revision 805454.

          Show
          Rick Hillegas added a comment - Ported 805448 from trunk to 10.5 branch at subversion revision 805454.
          Hide
          Rick Hillegas added a comment -

          Ported 805448 from trunk to 10.4 branch at subversion revision 805485.

          Show
          Rick Hillegas added a comment - Ported 805448 from trunk to 10.4 branch at subversion revision 805485.
          Hide
          Rick Hillegas added a comment -

          Ported 805448 from trunk to 10.3 branch at subversion revision 805486.

          Show
          Rick Hillegas added a comment - Ported 805448 from trunk to 10.3 branch at subversion revision 805486.
          Hide
          Rick Hillegas added a comment -

          Ported 805448 from trunk to 10.2 branch at subversion revision 805523.

          Show
          Rick Hillegas added a comment - Ported 805448 from trunk to 10.2 branch at subversion revision 805523.
          Hide
          Rick Hillegas added a comment -

          At this point I have ported the patch to all branches from 10.2 forward. If someone would like to port the patch to 10.1, please do--my 10.1 build environment no longer works and the effort required to repair it does not seem justified.

          I am inclined to resolve this issue if there are no objections.

          Show
          Rick Hillegas added a comment - At this point I have ported the patch to all branches from 10.2 forward. If someone would like to port the patch to 10.1, please do--my 10.1 build environment no longer works and the effort required to repair it does not seem justified. I am inclined to resolve this issue if there are no objections.
          Hide
          Knut Anders Hatlen added a comment - - edited

          ManagementMBeanTest started failing in the Tinderbox (trunk and 10.4) after the check-in:
          http://dbtg.foundry.sun.com/derby/test/tinderbox_10.4_16/jvm1.6/testing/Limited/testSummary-805541.html
          http://dbtg.foundry.sun.com/derby/test/tinderbox_trunk16/jvm1.6/testing/Limited/testSummary-805576.html

          [update: Ole has now logged this problem as DERBY-4356]

          Show
          Knut Anders Hatlen added a comment - - edited ManagementMBeanTest started failing in the Tinderbox (trunk and 10.4) after the check-in: http://dbtg.foundry.sun.com/derby/test/tinderbox_10.4_16/jvm1.6/testing/Limited/testSummary-805541.html http://dbtg.foundry.sun.com/derby/test/tinderbox_trunk16/jvm1.6/testing/Limited/testSummary-805576.html [update: Ole has now logged this problem as DERBY-4356]
          Hide
          Knut Anders Hatlen added a comment -

          I noticed this in the new test (ClassLoaderBootTest):

          • it would be good to null out the three ClassLoader fields in tearDown() so that the class loaders and the classes they have loaded can be garbage collected (otherwise they'll stay in memory until the full JUnit run has finished)
          • the decorateSQL() method creates a table which is said to be used to test export, but I don't see any code for testing export
          • the test cases are wrapped in try {...}

            catch (SQLException se)

            { dumpSQLException(se); }

            . Won't this prevent the reporting of the exception in JUnit? Better to let the exceptions be thrown and handled by the framework?

          • the methods getFileWhichLoadedClass(Class) and getURL(File) have no callers
          Show
          Knut Anders Hatlen added a comment - I noticed this in the new test (ClassLoaderBootTest): it would be good to null out the three ClassLoader fields in tearDown() so that the class loaders and the classes they have loaded can be garbage collected (otherwise they'll stay in memory until the full JUnit run has finished) the decorateSQL() method creates a table which is said to be used to test export, but I don't see any code for testing export the test cases are wrapped in try {...} catch (SQLException se) { dumpSQLException(se); } . Won't this prevent the reporting of the exception in JUnit? Better to let the exceptions be thrown and handled by the framework? the methods getFileWhichLoadedClass(Class) and getURL(File) have no callers
          Hide
          Rick Hillegas added a comment -

          Thanks for those good suggestions, Knut. They are addressed by the patch I am attaching: derby-700-02-aa-testCleanup.diff.

          In particular, I added a tearDown() method which resets the class loader and releases resources. It looks to me like those extra try/catch blocks were resetting the class loader on the way out--I think tearDown() is a better place for that work.

          I'll run full regression tests against this fix.

          Show
          Rick Hillegas added a comment - Thanks for those good suggestions, Knut. They are addressed by the patch I am attaching: derby-700-02-aa-testCleanup.diff. In particular, I added a tearDown() method which resets the class loader and releases resources. It looks to me like those extra try/catch blocks were resetting the class loader on the way out--I think tearDown() is a better place for that work. I'll run full regression tests against this fix.
          Hide
          Rick Hillegas added a comment -

          Committted derby-700-02-aa-testCleanup.diff at subversion revision 805858. Tests ran cleanly for except for the spillover consequences for ManagementMBeanTest--the regression there is unchanged.

          Show
          Rick Hillegas added a comment - Committted derby-700-02-aa-testCleanup.diff at subversion revision 805858. Tests ran cleanly for except for the spillover consequences for ManagementMBeanTest--the regression there is unchanged.
          Hide
          Myrna van Lunteren added a comment -

          I backported the change to DirFile4.java of 805448 to 10.1 with revision 805978.

          Show
          Myrna van Lunteren added a comment - I backported the change to DirFile4.java of 805448 to 10.1 with revision 805978.
          Hide
          Rick Hillegas added a comment -

          Resolving this issue. It looks like the noise in ManagementMBeanTest has been cleaned up.

          Show
          Rick Hillegas added a comment - Resolving this issue. It looks like the noise in ManagementMBeanTest has been cleaned up.
          Hide
          Mamta A. Satoor added a comment -

          Rick, I was wondering if you knew the correlation between DERBY-700 and DERBY-4361. testDefault fixture in ErrorStreamTest has been failing consistently after the checkin went in for DERBY-700. If the test orders are changed and ErrorStreamTest is run after the test ClassLoaderBootTest which was added for DERBY-700, testDefault fixture does not fail anymore. I will look further into what the relationship between the 2 jira entries might be but I thought I would check if it rings a bell for you. Thanks

          Show
          Mamta A. Satoor added a comment - Rick, I was wondering if you knew the correlation between DERBY-700 and DERBY-4361 . testDefault fixture in ErrorStreamTest has been failing consistently after the checkin went in for DERBY-700 . If the test orders are changed and ErrorStreamTest is run after the test ClassLoaderBootTest which was added for DERBY-700 , testDefault fixture does not fail anymore. I will look further into what the relationship between the 2 jira entries might be but I thought I would check if it rings a bell for you. Thanks
          Hide
          Kathey Marsden added a comment -

          Adjusting fix version. Fix only works with JDK 1.6 which I think is first supported with 10.3, so even though the Derby part of the fix went back to 10.1, it will only be effective there if the older jvms get fixed. I will put a request in for the IBM jvm to have the jvm fix backported to at least 1.5.

          Show
          Kathey Marsden added a comment - Adjusting fix version. Fix only works with JDK 1.6 which I think is first supported with 10.3, so even though the Derby part of the fix went back to 10.1, it will only be effective there if the older jvms get fixed. I will put a request in for the IBM jvm to have the jvm fix backported to at least 1.5.
          Hide
          Knut Anders Hatlen added a comment -

          I think 10.1 is supposed to work with Java 6 as well. Support for JDBC 4.0 wasn't added until 10.2, but as long as your application only calls JDBC 3.0 functionality, you should be able to run it with older Derby versions under Java 6. I frequently run older versions with Java 6, for instance when I search for the version that introduced a certain bug.

          Show
          Knut Anders Hatlen added a comment - I think 10.1 is supposed to work with Java 6 as well. Support for JDBC 4.0 wasn't added until 10.2, but as long as your application only calls JDBC 3.0 functionality, you should be able to run it with older Derby versions under Java 6. I frequently run older versions with Java 6, for instance when I search for the version that introduced a certain bug.
          Hide
          Kathey Marsden added a comment -

          Attaching a new release note. The old one referred to the original proposed fix which was not adopted. Please review.

          Show
          Kathey Marsden added a comment - Attaching a new release note. The old one referred to the original proposed fix which was not adopted. Please review.
          Hide
          Myrna van Lunteren added a comment -

          Looks good, but there were 2 typos...uploading a corrected version.

          Show
          Myrna van Lunteren added a comment - Looks good, but there were 2 typos...uploading a corrected version.
          Hide
          Rick Hillegas added a comment -

          Thanks Kathey and Myrna. The release note looks good and it passes the ReleaseNoteReader's checks.

          Show
          Rick Hillegas added a comment - Thanks Kathey and Myrna. The release note looks good and it passes the ReleaseNoteReader's checks.
          Hide
          Kathey Marsden added a comment -

          Updating fix version for 10.1 fix

          Show
          Kathey Marsden added a comment - Updating fix version for 10.1 fix

            People

            • Assignee:
              Rick Hillegas
              Reporter:
              Kathey Marsden
            • Votes:
              2 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development