Derby
  1. Derby
  2. DERBY-4038

On Z/OS store/access.sql fails with encryptionAES and encryptionDES

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 10.3.3.0
    • Fix Version/s: 10.6.1.0
    • Component/s: Test
    • Labels:
      None
    • Environment:
    • Urgency:
      Normal
    • Bug behavior facts:
      Regression Test Failure

      Description

      On Z/OS access.sql fails for encryptionAES and encryptionDES. The diff is big but I can't quite make out what the problem is. Maybe just a change of query plans.

      1. access.diff
        134 kB
        Kathey Marsden
      2. access.out
        299 kB
        Kathey Marsden
      3. DERBY-4038.diff_1
        80 kB
        Myrna van Lunteren
      4. DERBY-4038.diff_2
        80 kB
        Myrna van Lunteren
      5. DERBY-4038.diff_3
        80 kB
        Myrna van Lunteren
      6. DERBY-4038.diff_4
        438 kB
        Myrna van Lunteren
      7. DERBY-4038.stat_4
        2 kB
        Myrna van Lunteren
      8. runaccess.out
        11 kB
        Myrna van Lunteren

        Activity

        Gavin made changes -
        Workflow jira [ 12451397 ] Default workflow, editable Closed status [ 12799903 ]
        Kathey Marsden made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Myrna van Lunteren made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Issue & fix info [Patch Available]
        Fix Version/s 10.6.0.0 [ 12313727 ]
        Resolution Fixed [ 1 ]
        Hide
        Myrna van Lunteren added a comment -

        commited the latest patch with revision 897938.

        In the past, some of the steps in the access.sql test caused instability (i.e. intermittent errors), specifically in the area of the spacetable checks. The offending checks were removed (see revision 393896), but it's possible that the converted test has some of this. I think those should be handled in separate JIRA issues; the main problem of this issue has been corrected.
        (I did run suites.All without errors on zseries with patched jars).

        Show
        Myrna van Lunteren added a comment - commited the latest patch with revision 897938. In the past, some of the steps in the access.sql test caused instability (i.e. intermittent errors), specifically in the area of the spacetable checks. The offending checks were removed (see revision 393896), but it's possible that the converted test has some of this. I think those should be handled in separate JIRA issues; the main problem of this issue has been corrected. (I did run suites.All without errors on zseries with patched jars).
        Myrna van Lunteren made changes -
        Assignee Myrna van Lunteren [ myrna ]
        Issue & fix info [Patch Available]
        Myrna van Lunteren made changes -
        Attachment DERBY-4038.diff_4 [ 12429678 ]
        Attachment DERBY-4038.stat_4 [ 12429679 ]
        Hide
        Myrna van Lunteren added a comment -

        Indeed the difference with that property was an ordering differrence.
        I'm uploading patch DERBY-4038.diff_4 which has added order by clauses to the statements showing an ordering difference depending on the property.
        So, now the test willrun with the property when it's just the test, but without when the databases are encrypted.
        In this patch I also removed the old access.sql test.
        Regressions ran fine.

        I intend to commit this in a day or so if there are no further comments.

        Show
        Myrna van Lunteren added a comment - Indeed the difference with that property was an ordering differrence. I'm uploading patch DERBY-4038 .diff_4 which has added order by clauses to the statements showing an ordering difference depending on the property. So, now the test willrun with the property when it's just the test, but without when the databases are encrypted. In this patch I also removed the old access.sql test. Regressions ran fine. I intend to commit this in a day or so if there are no further comments.
        Hide
        Mike Matrigali added a comment -

        I would be fine getting the test to run without that property. I did some investigation and don't see any reason for it for the validity of the test. My best guess is this is one way a long time ago we used to try to avoid intermittent query plan / ordering problems.

        Looking at the original test I see one of the results is as follows. Any chance what you are seeing is not "different" lengths but different orders. In that case I would say just throw an order by in and don't worry about the property:

        ij> select LENGTH(a) from long2;
        1
        -----------
        3240
        3240
        3240
        3240
        3240
        3240
        3240
        3240
        3240
        3240
        3024
        3024
        3024
        3024
        3024
        3024
        3024
        3024
        3024
        3024

        Show
        Mike Matrigali added a comment - I would be fine getting the test to run without that property. I did some investigation and don't see any reason for it for the validity of the test. My best guess is this is one way a long time ago we used to try to avoid intermittent query plan / ordering problems. Looking at the original test I see one of the results is as follows. Any chance what you are seeing is not "different" lengths but different orders. In that case I would say just throw an order by in and don't worry about the property: ij> select LENGTH(a) from long2; 1 ----------- 3240 3240 3240 3240 3240 3240 3240 3240 3240 3240 3024 3024 3024 3024 3024 3024 3024 3024 3024 3024
        Hide
        Myrna van Lunteren added a comment -

        I have played with various scenarios...
        I can modify AccessTest fixtures to create the PADSTRING stored procedure without using the decorateSQL...
        Or I can modify EncryptionSuite to have a setUp and tearDown method to create and shutdown the stored procedure...

        But the SystemPropertyTestSetup and the Decorator.encryptedDatabase don't work together, because for SystemPropertyTestSetup the database has to be rebooted and the required boot password is local to Decorator.encryptedDatabase().

        I'll experiment getting a EncryptedSystemPropertyTestSetup, but in the mean time, really the only bit that's affected by not having the SystemProperties is that in the test case for CS1346 (a and b), half the time the length(a) of long1 and long2 is different when we have the property derby.optimizer.ruleBasedOptimization set to true (3240) vs. when it is false (3024).
        (the other half of the time it's already 3024 characters).

        Is that part of the test important to be verified with encryption?

        Show
        Myrna van Lunteren added a comment - I have played with various scenarios... I can modify AccessTest fixtures to create the PADSTRING stored procedure without using the decorateSQL... Or I can modify EncryptionSuite to have a setUp and tearDown method to create and shutdown the stored procedure... But the SystemPropertyTestSetup and the Decorator.encryptedDatabase don't work together, because for SystemPropertyTestSetup the database has to be rebooted and the required boot password is local to Decorator.encryptedDatabase(). I'll experiment getting a EncryptedSystemPropertyTestSetup, but in the mean time, really the only bit that's affected by not having the SystemProperties is that in the test case for CS1346 (a and b), half the time the length(a) of long1 and long2 is different when we have the property derby.optimizer.ruleBasedOptimization set to true (3240) vs. when it is false (3024). (the other half of the time it's already 3024 characters). Is that part of the test important to be verified with encryption?
        Myrna van Lunteren made changes -
        Attachment DERBY-4038.diff_3 [ 12429023 ]
        Hide
        Myrna van Lunteren added a comment -

        Attaching a 3rd patch.
        I think this addresses the previous comments, including using SYSCS_DIAG.SpaceTable calls. The problem with the 0 pages visited was (duh) because no data was actually fetched in the converted test - after I added a check the statistics do show 1 pages visted.

        My next step is to run this test through the various encryptions.

        I tried to add this test to suites.EncryptionSuite.baseSuite, which then looked like this:
        ------------
        TestSuite suite = new TestSuite("Encryption Algorithm: " + algorithm);
        //...
        suite.addTestSuite(EncryptionSuite.class);
        suite.addTestSuite(org.apache.derbyTesting.functionTests.tests.store.AccessTest.class);
        return suite;
        ------------
        but that resulted in the decorateSQL in AccessTest not being honored.

        Does anyone have a suggestion how to put those two pieces of the puzzle together?

        Show
        Myrna van Lunteren added a comment - Attaching a 3rd patch. I think this addresses the previous comments, including using SYSCS_DIAG.SpaceTable calls. The problem with the 0 pages visited was (duh) because no data was actually fetched in the converted test - after I added a check the statistics do show 1 pages visted. My next step is to run this test through the various encryptions. I tried to add this test to suites.EncryptionSuite.baseSuite, which then looked like this: ------------ TestSuite suite = new TestSuite("Encryption Algorithm: " + algorithm); //... suite.addTestSuite(EncryptionSuite.class); suite.addTestSuite(org.apache.derbyTesting.functionTests.tests.store.AccessTest.class); return suite; ------------ but that resulted in the decorateSQL in AccessTest not being honored. Does anyone have a suggestion how to put those two pieces of the puzzle together?
        Myrna van Lunteren made changes -
        Attachment DERBY-4038.diff_2 [ 12428378 ]
        Attachment runaccess.out [ 12428379 ]
        Hide
        Myrna van Lunteren added a comment -

        Attaching a second patch.
        I've run the test in the suite, and there were 2 NPE failures in unrelated tests because of my changes to the runtimestatisticsParser, so I've fixed that up.
        I've also removed a coupld of the TODOs that seemed to be ok.

        I've tried to figure out why the converted test fixtures regarding CS track 4595 show 0 pages visited instead of 1 (with index scans) but I'm no closer.
        I've printed them out instead of letting it assert -
        in the way this test works currently, various subsequent test fixtures in this same test will fail if one fails, because tables won't get dropped.
        I'm attaching the output from the test.

        I'll work on adding the proper SYSCS_DIAG.Spacetable calls next, this patch isn't ready for commit yet.

        Show
        Myrna van Lunteren added a comment - Attaching a second patch. I've run the test in the suite, and there were 2 NPE failures in unrelated tests because of my changes to the runtimestatisticsParser, so I've fixed that up. I've also removed a coupld of the TODOs that seemed to be ok. I've tried to figure out why the converted test fixtures regarding CS track 4595 show 0 pages visited instead of 1 (with index scans) but I'm no closer. I've printed them out instead of letting it assert - in the way this test works currently, various subsequent test fixtures in this same test will fail if one fails, because tables won't get dropped. I'm attaching the output from the test. I'll work on adding the proper SYSCS_DIAG.Spacetable calls next, this patch isn't ready for commit yet.
        Hide
        Mike Matrigali added a comment -

        comments from review of the patch:

        > - removed the call to procedure WAIT_FOR_POST_COMMIT as it didn't appear to
        > be used anywhere.
        sounds good, it is not called so remove all reference to it.
        >
        > - Was I misinterpreting the use of WAIT_FOR_POST_COMMIT? What would be the pur
        pose of it?
        It is not called, leftover cruft.

        comments on TODO's:
        + // TODO: check on what the following comment from the original test
        + // means. I don't think this test is currently checking whether the
        + // optimizer is picking the index, might be worthwhile adding it.

        At this point not sure history of this, I would just remove comment.

        + // as subset of columns, with qualifier not in list
        + //TODO: this comment is not matching the expected results. OK?
        + // should be 5,3,1 and 50,30,10
        + doQuery(st, "select e, c, a from foo where foo.b = 20",
        comment looks bad, just fix the comment.

        + // TODO: how does this check that rows are reclaimed or not? How?
        looks like all work was moved to the subroutine, so probably should move
        comments to the subroutine from the caller. At least the part of the comments
        that make sense.

        The code just tries to exercise the post commit path, but agree that it does
        not specifically verify the reclaim. This is hard to do from a user level
        test, in a way that is reproducible across different hardware, jvms. The
        values are just hand picked to exercise different paths through the code.
        One could try to use space table to verify sizes of the tables, but this
        has caused non-reproducible results in cases where timing of background threads
        are different.

        // TODO: change this call into a call to SYSCS_DIAG.SPACE_TABLE
        I agree, it should use the current supported interface.

        + // TODO: original test had 1 pages visited.
        + // But now that it's converted, we get 0 pages, which also
        + // seems reasonable to me. Or not?
        This seems like it might be a problem. Can you print out the full query
        plan on failure of the assert. It is hard to know what is going on from
        just this one piece of info.

        The main point of this test is that is should use the index whether there
        are zero or 1 rows. We choose to force this because if we use index we get
        row level locking, but if we use table scan we are going to get table level
        locking - so better concurrency in the edge case.

        + // TODO: in which way does this check qualifiers?
        qualifiers are just things like a = 1 and b = 20. When test was written the
        code in language and store to evaluate these were changing. The changes
        especially had some new logic to handle "and" and "or". So the test just
        has some of these cases and tests them by making sure the right results
        come back.

        Show
        Mike Matrigali added a comment - comments from review of the patch: > - removed the call to procedure WAIT_FOR_POST_COMMIT as it didn't appear to > be used anywhere. sounds good, it is not called so remove all reference to it. > > - Was I misinterpreting the use of WAIT_FOR_POST_COMMIT? What would be the pur pose of it? It is not called, leftover cruft. comments on TODO's: + // TODO: check on what the following comment from the original test + // means. I don't think this test is currently checking whether the + // optimizer is picking the index, might be worthwhile adding it. At this point not sure history of this, I would just remove comment. + // as subset of columns, with qualifier not in list + //TODO: this comment is not matching the expected results. OK? + // should be 5,3,1 and 50,30,10 + doQuery(st, "select e, c, a from foo where foo.b = 20", comment looks bad, just fix the comment. + // TODO: how does this check that rows are reclaimed or not? How? looks like all work was moved to the subroutine, so probably should move comments to the subroutine from the caller. At least the part of the comments that make sense. The code just tries to exercise the post commit path, but agree that it does not specifically verify the reclaim. This is hard to do from a user level test, in a way that is reproducible across different hardware, jvms. The values are just hand picked to exercise different paths through the code. One could try to use space table to verify sizes of the tables, but this has caused non-reproducible results in cases where timing of background threads are different. // TODO: change this call into a call to SYSCS_DIAG.SPACE_TABLE I agree, it should use the current supported interface. + // TODO: original test had 1 pages visited. + // But now that it's converted, we get 0 pages, which also + // seems reasonable to me. Or not? This seems like it might be a problem. Can you print out the full query plan on failure of the assert. It is hard to know what is going on from just this one piece of info. The main point of this test is that is should use the index whether there are zero or 1 rows. We choose to force this because if we use index we get row level locking, but if we use table scan we are going to get table level locking - so better concurrency in the edge case. + // TODO: in which way does this check qualifiers? qualifiers are just things like a = 1 and b = 20. When test was written the code in language and store to evaluate these were changing. The changes especially had some new logic to handle "and" and "or". So the test just has some of these cases and tests them by making sure the right results come back.
        Myrna van Lunteren made changes -
        Attachment DERBY-4038.diff_1 [ 12427752 ]
        Hide
        Myrna van Lunteren added a comment -

        I'm attaching a patch that's a first attempt at converting this test.

        I made a few changes where I figured the original test was doing something odd:

        • as indicated in a previous comment, changed the sequence of events and updating of statistics in the partial row runtime statistics test case / fixture so it's now doing 4 different things. In doing so, there's no longer a duplicate index warning returned for a create index foo_cover statement; I assumed generating that error that wasn't the point of the test.
        • changed the table in the CheckTable call in CS track 2241 test case from 'FOO' to B2441, assuming it was meant to check the table in the test case at hand.
        • fixed up test case for cs track 2778:
        • made the create index work
        • fixed up the call using --derby-properties which for some reason gave a syntax error.
        • added a few select statements to CS495B (because then I could filter out
          a method to do the test 3 times)
        • removed the call to procedure WAIT_FOR_POST_COMMIT as it didn't appear to
          be used anywhere.

        I still need to:

        • add it into store._Suite.java (and run suites.All to make sure this test doesn't interfere with other tests)
        • hang the encryption scenarios around this one
          (the access.sql test is run with encryption Algorithms AES,Blowfish,CFB,DES,ECB,OFB (through various encryption<ABC>.runalls) and DESede (through encryption all running storemats))

        However, I'd like some input on this conversion so far.

        • Was I misinterpreting the use of WAIT_FOR_POST_COMMIT? What would be the purpose of it?
        • I added more questions in //TODOs within the converted test, the main of which are that it seems to me some of the test cases don't seem to do any checking of what they're supposed to check. For instance, the reclaim deleted rows doesn't seem to check anything is reclaimed.
        • Are there obvious improvements to the flow of the test framework?
        Show
        Myrna van Lunteren added a comment - I'm attaching a patch that's a first attempt at converting this test. I made a few changes where I figured the original test was doing something odd: as indicated in a previous comment, changed the sequence of events and updating of statistics in the partial row runtime statistics test case / fixture so it's now doing 4 different things. In doing so, there's no longer a duplicate index warning returned for a create index foo_cover statement; I assumed generating that error that wasn't the point of the test. changed the table in the CheckTable call in CS track 2241 test case from 'FOO' to B2441, assuming it was meant to check the table in the test case at hand. fixed up test case for cs track 2778: made the create index work fixed up the call using --derby-properties which for some reason gave a syntax error. added a few select statements to CS495B (because then I could filter out a method to do the test 3 times) removed the call to procedure WAIT_FOR_POST_COMMIT as it didn't appear to be used anywhere. I still need to: add it into store._Suite.java (and run suites.All to make sure this test doesn't interfere with other tests) hang the encryption scenarios around this one (the access.sql test is run with encryption Algorithms AES,Blowfish,CFB,DES,ECB,OFB (through various encryption<ABC>.runalls) and DESede (through encryption all running storemats)) However, I'd like some input on this conversion so far. Was I misinterpreting the use of WAIT_FOR_POST_COMMIT? What would be the purpose of it? I added more questions in //TODOs within the converted test, the main of which are that it seems to me some of the test cases don't seem to do any checking of what they're supposed to check. For instance, the reclaim deleted rows doesn't seem to check anything is reclaimed. Are there obvious improvements to the flow of the test framework?
        Hide
        Myrna van Lunteren added a comment -

        I've been working on converting access.sql; and I noticed that section testing "partial row runtime statistics" actually has a section that's not doing what it was meant to do judging by the comments; it looks as if there's supposed to be 4 iterations

        • no index
        • with index
        • no index, deleted rows
        • with index, deleted rows
          But, because the index does not get dropped anywhere, we get an error creating the index for the fourth iteration, and the 3rd and 4th iterations are exactly the same.
          Perhaps the 4th iteration was copied by mistake, for I see that without index, you don't see the "deleted rows visited" line in the statistics.
          Still, when running the queries without an index but with deleted rows the statistics show a different (higher) number of total rows visited, so I'll be modifying the converted test to exercise that combination too.
        Show
        Myrna van Lunteren added a comment - I've been working on converting access.sql; and I noticed that section testing "partial row runtime statistics" actually has a section that's not doing what it was meant to do judging by the comments; it looks as if there's supposed to be 4 iterations no index with index no index, deleted rows with index, deleted rows But, because the index does not get dropped anywhere, we get an error creating the index for the fourth iteration, and the 3rd and 4th iterations are exactly the same. Perhaps the 4th iteration was copied by mistake, for I see that without index, you don't see the "deleted rows visited" line in the statistics. Still, when running the queries without an index but with deleted rows the statistics show a different (higher) number of total rows visited, so I'll be modifying the converted test to exercise that combination too.
        Hide
        Myrna van Lunteren added a comment -

        I forgot to mention that I ran tests with 10.5.1.1 - because no errors got reported with 10.5.2.0, I wanted to make sure I could duplicate the behavior where it had been reported.

        After looking some more, it appears to me that in the one error situation I got (encryptionBlowfish with 18 diffing sections) the .tmp and .out are diffing in the tab characters; so that would point to something not being correct in the Sed class rather than the generateUTF8OutFile method.

        converting to junit still is the correct thing to do.

        Show
        Myrna van Lunteren added a comment - I forgot to mention that I ran tests with 10.5.1.1 - because no errors got reported with 10.5.2.0, I wanted to make sure I could duplicate the behavior where it had been reported. After looking some more, it appears to me that in the one error situation I got (encryptionBlowfish with 18 diffing sections) the .tmp and .out are diffing in the tab characters; so that would point to something not being correct in the Sed class rather than the generateUTF8OutFile method. converting to junit still is the correct thing to do.
        Hide
        Myrna van Lunteren added a comment -

        I have played with this for a while, and I've come to 3 conclusions:
        1. Knut is right, it seems the only diff is that in some parts of the runtime statistics output the tabs have been replaced by another character
        2. This is intermittent.
        Kathey reported the failure in the platform tests for 10.5.1.0 and 10.5.1.1 but not for (the discarded) 10.5.2.0.
        I ran encryptionAll on a zseries machine a couple of times and never got a failure. I ran derbyall a couple of times and access.sql failed once in encryptionBlowfish - with the character-replacement happening in about 18 instances (as opposed to 7 in the log Kathey attached).
        3. the diff output created by the test is singularly unhelpful.

        It's possible there's something wrong with the generateUTF8OutFile code in functionTests.harness.RunTest; I'm not sure why that would be intermittent though.
        One way to fix this would be to convert the test to junit.

        Show
        Myrna van Lunteren added a comment - I have played with this for a while, and I've come to 3 conclusions: 1. Knut is right, it seems the only diff is that in some parts of the runtime statistics output the tabs have been replaced by another character 2. This is intermittent. Kathey reported the failure in the platform tests for 10.5.1.0 and 10.5.1.1 but not for (the discarded) 10.5.2.0. I ran encryptionAll on a zseries machine a couple of times and never got a failure. I ran derbyall a couple of times and access.sql failed once in encryptionBlowfish - with the character-replacement happening in about 18 instances (as opposed to 7 in the log Kathey attached). 3. the diff output created by the test is singularly unhelpful. It's possible there's something wrong with the generateUTF8OutFile code in functionTests.harness.RunTest; I'm not sure why that would be intermittent though. One way to fix this would be to convert the test to junit.
        Knut Anders Hatlen made changes -
        Urgency Normal
        Component/s Store [ 11412 ]
        Hide
        Knut Anders Hatlen added a comment -

        Triaged for 10.5.2. Removed store component.

        I compared the attached access.out with the canon in the 10.3 branch. The only difference I found was that on some of the lines in the attached file tab characters had been replaced by a \u00 character.

        Show
        Knut Anders Hatlen added a comment - Triaged for 10.5.2. Removed store component. I compared the attached access.out with the canon in the 10.3 branch. The only difference I found was that on some of the lines in the attached file tab characters had been replaced by a \u00 character.
        Dag H. Wanvik made changes -
        Component/s Test [ 11413 ]
        Kathey Marsden made changes -
        Field Original Value New Value
        Attachment access.diff [ 12399076 ]
        Attachment access.out [ 12399077 ]
        Hide
        Kathey Marsden added a comment -

        Attaching diff and out files for the test failure.

        Show
        Kathey Marsden added a comment - Attaching diff and out files for the test failure.
        Kathey Marsden created issue -

          People

          • Assignee:
            Myrna van Lunteren
            Reporter:
            Kathey Marsden
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development