Derby / DERBY-5671

NsTest does not run on trunk due to multiple issues stemming from concurrency improvements

    Details

    • Issue & fix info:
      High Value Fix
    • Bug behavior facts:
      Regression

      Description

      As I understand it, the system test NsTest has been broken on trunk at least since September 30 of last year. In these six months the test has not been runnable, so we do not know whether new issues have been introduced with sequence generators or, most importantly, with the auto-increment columns that are now based on them, which many, many applications rely upon. Even if the known problems are fixed later in the 10.9 release cycle and new problems are then exposed, we won't be able to go back to any point in time to discover when they were introduced.

      In 10.8 we coped with this problem by backing out the concurrency improvements (DERBY-5448) pending fixes for DERBY-5422, DERBY-5454, DERBY-5430. Currently none of those issues have been assigned. Since this has been going on now for six months, I think we urgently need to stabilize auto-increment columns and get this test running again on trunk. I can see three possible options.
      1) Someone with interest assign themselves to these issues and make significant progress over the next few weeks.
      2) Make the concurrency improvements optional with a property which defaults to false (I don't know if this is practical)
      3) Back the concurrency performance improvements out of trunk until these issues have been resolved and the change can be resubmitted.

      I realize that NsTest is not the easiest test to work with, but it does seem to have found serious problems with generated columns that I think users are likely to hit. In the past, a similar disregard for mailjdbc exposing a corruption issue meant that we actually released a bad corruption issue that I know hit many users of Derby before we addressed it. Autoincrement is very widely used. We need to get it stabilized and the test running on trunk. Although the system tests are not particularly easy to deal with, they are all we have and they do find issues.

          Activity

          Kathey Marsden added a comment -

          linking related issues

          Rick Hillegas added a comment -

          Hi Kathey,

          I am working on fixing the correctness problems with sequence generators. I should have a candidate solution ready for review soon. After that, I will be ready to discuss what we should do about identity columns. There are a number of issues in this area, linked to derby-5495, and they need to be addressed before we release 10.9. That includes the sequence/identity problems with nstest. Thanks.

          Kathey Marsden added a comment -

          Thank you Rick for looking at the sequence issues and working to get nstest running again. Looking not only at the number of issues linked to DERBY-4995 but also looking briefly at the scope and complexity of changes being proposed, for example in DERBY-5493, I really think that sequences are not mature and solid enough to be the basis of identity columns by default. We need to get identity columns back to a stable state on trunk really as soon as possible. Six months has been entirely too long for them to be in an untestable state. I think within the next few weeks nsTest should be running cleanly again.

          If the goal is to move identity columns over to be sequence based, I think the only safe approach, considering the wide use in the field, is to expose that as an optional experimental feature.

          1) Restore the existing 10.8 default implementation for identity columns.
          2) For 10.9, create an option, defaulting to false, which allows users to try the new implementation instead. Encourage users to test their applications with both settings and give the option an attractive, performance-boosting name.
          3) In a future release, after getting sufficient feedback from users using the new option and resolving all known issues and maybe writing some additional identity stress tests, switch the default and deprecate the option.
          4) In some far future release deprecate the option and clean up the old code.

          I don't know how entangled the two implementations are. If they are, perhaps the safest thing would be to back out the concurrency changes and then start on the property-based approach from scratch, with switchability as the goal.
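
A minimal sketch of what such a property-controlled switch could look like, with the old implementation as the default. Everything named here is hypothetical and for illustration only: the property name `example.identity.useSequenceGenerator`, the `IdentityGenerator` interface, and both implementing classes are assumptions, not Derby's actual code or property names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction over "give me the next identity value".
interface IdentityGenerator {
    long next();
}

// Stand-in for the existing (10.8-style) implementation; this is the default.
final class LegacyIdentityGenerator implements IdentityGenerator {
    private long current = 0;

    @Override
    public synchronized long next() {
        return ++current;
    }
}

// Stand-in for the newer sequence-based implementation; opt-in only.
final class SequenceBasedIdentityGenerator implements IdentityGenerator {
    private final AtomicLong current = new AtomicLong();

    @Override
    public long next() {
        return current.incrementAndGet();
    }
}

final class IdentityGeneratorFactory {
    // Hypothetical property name, chosen only for this example.
    static final String PROPERTY = "example.identity.useSequenceGenerator";

    static IdentityGenerator create() {
        // Boolean.getBoolean returns false when the system property is unset,
        // so the legacy implementation is used unless the user opts in.
        return Boolean.getBoolean(PROPERTY)
                ? new SequenceBasedIdentityGenerator()
                : new LegacyIdentityGenerator();
    }

    public static void main(String[] args) {
        IdentityGenerator generator = create();
        System.out.println(generator.getClass().getSimpleName()
                + " produced " + generator.next());
    }
}
```

Running with `-Dexample.identity.useSequenceGenerator=true` would select the opt-in implementation; without the flag the legacy one is used. Keeping both behind one interface is what makes the later steps (switching the default, then removing the old code) a small change.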

          Rick Hillegas added a comment -

          Subversion revision 1311285 (see DERBY-5687) backed out the concurrency improvements introduced by DERBY-4437. It would be good to see if NsTest runs on the trunk now. Thanks.

          Myrna van Lunteren added a comment -

          Thanks Rick.
          I've been running nstest with sane tests in a small configuration (passing in 'Embedded small' - 10x at each 'run'/build) and that seems to be much more manageable than before the backing out.
          I am now running the same small configuration with insane jars, and things look ok there too.
          I will confirm a 'full' nstest run after that. Theoretically I would let it run for 2 weeks (or until some security-minded person/tool reboots the machine from under me to install Windows security fixes).

          Myrna van Lunteren added a comment -

          Unfortunately, my test runs (one with network server on Linux, one with embedded on Windows XP) only ran for 3 days before the machines rebooted automatically, but together with the runs of the shorter test format, I think this gives enough to say this is fixed.

          It did seem that I got more 40XL1 (timeout) messages than I used to, but we always got some of those so I think this is ok.
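
One way to turn an impression like "more 40XL1 messages than before" into a number that can be compared across runs is to tally exceptions by SQLState. The sketch below assumes the test harness can hand each caught `SQLException` to a counter; the `LockTimeoutTally` class is hypothetical and is not part of NsTest.

```java
import java.sql.SQLException;

// Hypothetical helper: counts how many recorded exceptions carry SQLState 40XL1.
final class LockTimeoutTally {
    // SQLState reported in the comment above as a timeout.
    static final String LOCK_TIMEOUT = "40XL1";

    private int timeouts;
    private int others;

    // True if the exception, or any exception chained to it, has SQLState 40XL1.
    static boolean isLockTimeout(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (LOCK_TIMEOUT.equals(cur.getSQLState())) {
                return true;
            }
        }
        return false;
    }

    synchronized void record(SQLException e) {
        if (isLockTimeout(e)) {
            timeouts++;
        } else {
            others++;
        }
    }

    synchronized int timeouts() {
        return timeouts;
    }

    synchronized int others() {
        return others;
    }

    public static void main(String[] args) {
        LockTimeoutTally tally = new LockTimeoutTally();
        tally.record(new SQLException("lock wait timed out", "40XL1"));
        tally.record(new SQLException("some other failure", "XJ001"));
        System.out.println("timeouts=" + tally.timeouts() + " others=" + tally.others());
    }
}
```

Comparing the timeout count per hour of test time between a run before and a run after a change would show whether the rate really went up.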

          Resolving.

          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Kathey Marsden added a comment -

          This is not relevant to 10.8. Marking backport reject.


            People

            • Assignee:
              Unassigned
              Reporter:
              Kathey Marsden
            • Votes:
              0
              Watchers:
              2
