Derby
  1. Derby
  2. DERBY-3279

Derby 10.3.X ignores ORDER BY DESC when target column has an index and is used in an OR clause or an IN list.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.3.1.4, 10.3.2.1
    • Fix Version/s: 10.3.3.0, 10.4.1.3
    • Component/s: SQL
    • Labels:
      None
    • Environment:
      Rational Application Developer 7.0.0.2 (Eclipse 3.2.2), J2RE 1.5.0 IBM J9 2.3 Windows XP
    • Urgency:
      Urgent
    • Bug behavior facts:
      Regression

      Description

      Running the following produces the error seen in Derby 10.3.X but not in 10.2.X nor in 10.1.X.
      Don't know if this related to DERBY-3231.

      First query is incorrectly sorted whereas the second one is okay when there is an index on the table.
      If the table is not indexed, the sort works correctly in DESC order.
      ------
      create table CHEESE (
      CHEESE_CODE VARCHAR(5),
      CHEESE_NAME VARCHAR(20),
      CHEESE_COST DECIMAL(7,4)
      );

      create index cheese_index on CHEESE (CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC);

      INSERT INTO CHEESE (
      CHEESE_CODE,
      CHEESE_NAME,
      CHEESE_COST)
      VALUES ('00000', 'GOUDA', 001.1234),
      ('00000', 'EDAM', 002.1111),
      ('54321', 'EDAM', 008.5646),
      ('12345', 'GORGONZOLA', 888.2309),
      ('AAAAA', 'EDAM', 999.8888),
      ('54321', 'MUENSTER', 077.9545);

      SELECT * FROM CHEESE
      WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
      ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;

      SELECT * FROM CHEESE
      WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
      ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;

      1. releaseNote.html
        7 kB
        A B
      2. d3279_v1.patch
        29 kB
        A B
      3. d3279_ix2brnode_v1.patch
        18 kB
        A B
      4. d3279_10_3_merge.patch
        44 kB
        A B
      5. cheese2.sql
        2 kB
        Ajay Bhala

        Issue Links

          Activity

          Hide
          Daniel John Debrunner added a comment - - edited

          Here's what I get for trunk, both of them look incorrect since the ORDEY BY has DESC?

          ij> SELECT * FROM CHEESE
          WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
          > > ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
          CHEE&|CHEESE_NAME |CHEESE_C&
          ------------------------------------
          00000|EDAM |2.1111
          54321|EDAM |8.5646

          2 rows selected
          ij> SELECT * FROM CHEESE
          WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
          > > ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
          CHEE&|CHEESE_NAME |CHEESE_C&
          ------------------------------------
          54321|EDAM |8.5646
          AAAAA|EDAM |999.8888

          2 rows selected

          Show
          Daniel John Debrunner added a comment - - edited Here's what I get for trunk, both of them look incorrect since the ORDEY BY has DESC? ij> SELECT * FROM CHEESE WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM' > > ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC; CHEE&|CHEESE_NAME |CHEESE_C& ------------------------------------ 00000|EDAM |2.1111 54321|EDAM |8.5646 2 rows selected ij> SELECT * FROM CHEESE WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM' > > ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC; CHEE&|CHEESE_NAME |CHEESE_C& ------------------------------------ 54321|EDAM |8.5646 AAAAA|EDAM |999.8888 2 rows selected
          Hide
          Mamta A. Satoor added a comment -

          Ajay, I am curious if the database in Derby 10.3 was created with territory based collation enabled? In other words, when the database was created, did you use a JDBC url property like collation=TERRITORY_BASED?

          Show
          Mamta A. Satoor added a comment - Ajay, I am curious if the database in Derby 10.3 was created with territory based collation enabled? In other words, when the database was created, did you use a JDBC url property like collation=TERRITORY_BASED?
          Hide
          Kathey Marsden added a comment -

          This is a regression caused by revision 516454 of DERBY-681, but appears to be different than DERBY-3231 as it reproduces on the trunk as well after the fix for DERBY-3231.

          Show
          Kathey Marsden added a comment - This is a regression caused by revision 516454 of DERBY-681 , but appears to be different than DERBY-3231 as it reproduces on the trunk as well after the fix for DERBY-3231 .
          Hide
          Kathey Marsden added a comment -

          I tested with revision 516453 and the sort ordered properly, but not with revision 516454 (DERBY-681)

          Show
          Kathey Marsden added a comment - I tested with revision 516453 and the sort ordered properly, but not with revision 516454 ( DERBY-681 )
          Hide
          A B added a comment - - edited

          Actually, though I hate to say it, I think it's a regression caused by DERBY-47. When I run with 516454 (i.e. after DERBY-681 was committed I see the correct results:

          ij(CONNECTION1)> SELECT * FROM CHEESE
          WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
          ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
          CHEE&|CHEESE_NAME |CHEESE_C&
          ------------------------------------
          54321|EDAM |8.5646
          00000|EDAM |2.1111

          2 rows selected
          ij(CONNECTION1)> SELECT * FROM CHEESE
          WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
          ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
          CHEE&|CHEESE_NAME |CHEESE_C&
          ------------------------------------
          AAAAA|EDAM |999.8888
          54321|EDAM |8.5646

          2 rows selected
          ij(CONNECTION1)> SELECT * FROM CHEESE
          WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
          ORDER BY CHEESE_CODE ASC, CHEESE_NAME DESC, CHEESE_COST ASC;
          CHEE&|CHEESE_NAME |CHEESE_C&
          ------------------------------------
          00000|EDAM |2.1111
          54321|EDAM |8.5646

          2 rows selected
          ij(CONNECTION1)> SELECT * FROM CHEESE
          WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
          ORDER BY CHEESE_CODE ASC, CHEESE_NAME DESC, CHEESE_COST ASC;
          CHEE&|CHEESE_NAME |CHEESE_C&
          ------------------------------------
          54321|EDAM |8.5646
          AAAAA|EDAM |999.8888

          2 rows selected

          But I run after the final commit for DERBY-47, the results are the same regardless of whether we sort ASC ro DESC on CHEESE_CODE (i.e. sort is ignored).

          Show
          A B added a comment - - edited Actually, though I hate to say it, I think it's a regression caused by DERBY-47 . When I run with 516454 (i.e. after DERBY-681 was committed I see the correct results: ij(CONNECTION1)> SELECT * FROM CHEESE WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM' ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC; CHEE&|CHEESE_NAME |CHEESE_C& ------------------------------------ 54321|EDAM |8.5646 00000|EDAM |2.1111 2 rows selected ij(CONNECTION1)> SELECT * FROM CHEESE WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM' ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC; CHEE&|CHEESE_NAME |CHEESE_C& ------------------------------------ AAAAA|EDAM |999.8888 54321|EDAM |8.5646 2 rows selected ij(CONNECTION1)> SELECT * FROM CHEESE WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM' ORDER BY CHEESE_CODE ASC, CHEESE_NAME DESC, CHEESE_COST ASC; CHEE&|CHEESE_NAME |CHEESE_C& ------------------------------------ 00000|EDAM |2.1111 54321|EDAM |8.5646 2 rows selected ij(CONNECTION1)> SELECT * FROM CHEESE WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM' ORDER BY CHEESE_CODE ASC, CHEESE_NAME DESC, CHEESE_COST ASC; CHEE&|CHEESE_NAME |CHEESE_C& ------------------------------------ 54321|EDAM |8.5646 AAAAA|EDAM |999.8888 2 rows selected But I run after the final commit for DERBY-47 , the results are the same regardless of whether we sort ASC ro DESC on CHEESE_CODE (i.e. sort is ignored).
          Hide
          Kathey Marsden added a comment -

          Yes, my bad. I come back from getting my coffee and there it is on the screen in the correct order. Sorry for the noise. Without the coffee I am quite sure it sorted the other way # Guess I need both coffee and vacation.

          Show
          Kathey Marsden added a comment - Yes, my bad. I come back from getting my coffee and there it is on the screen in the correct order. Sorry for the noise. Without the coffee I am quite sure it sorted the other way # Guess I need both coffee and vacation.
          Hide
          A B added a comment -

          In the absence of IN list "multi-probing", the way Derby implements an IN list is:

          while (more rows in the underlying result set)

          { get next row from the result set search IN list values to see if target column of row exists in the IN list if target column of the row exists in the IN list, return the row }

          Since the loop is based on scanning the underlying result set for rows, any rows which we return will be returned in the order they are received from the underlying result set. So if, for example, the underlying result is an index scan, and if the index is defined as DESCENDING, the rows will come back in descending order.

          With multi-probing enabled, which only happens after DERBY-47 and only happens for index scans (which is why the ordering is correct if the index is removed, per Ajay's description), the processing is (loosely):

          while (more values in the IN list)

          { get next IN list value probe underlying index for a row whose target column matches the current IN list value if index probe returned a row, return that row }

          Notice how, here, the loop is driven by the order of the values in the IN list, instead of by the order of the rows as they are returned from the index scan. So even if the index is defined as "DESC", the final ordering of the rows will match the ordering of the IN values.

          That said, in order to ensure proper handling of duplicate IN list values, the current multi-probe logic in Derby always sorts IN list values in ASCENDING order. So unless an explicit SortResultSet is generated for execution time, the rows for an IN-list multi-probe will always be in ASCENDING order.

          The final relevant piece of information is that the optimizer is smart enough to remove unnecessary sorts when a) there is an ORDER BY on some index column IC, b) the final access plan uses that index, and c) the ASC/DESC option of the ORDER BY matches the ASC/DESC option of the index column IC. For the example queries, we have "ORDER BY CHEESE_CODE DESC", and we have an index definition which includes "CHEESE_CODE DESC". So the optimizer thinks the sort is unnecessary and removes it. Hence this Jira.

          There are two fixes that come to mind right away:

          1. Make the optimizer recognize that the sort cannot be removed if we're multi-probing, or
          2. Make MultiProbeTableScanResultSet sort the IN list values in ascending OR descending order depending on the ORDER BY clause (instead of always sorting in ASCENDING).

          I think option #2 is the best option here. I have a fix in progress and will post when it is postable...

          Show
          A B added a comment - In the absence of IN list "multi-probing", the way Derby implements an IN list is: while (more rows in the underlying result set) { get next row from the result set search IN list values to see if target column of row exists in the IN list if target column of the row exists in the IN list, return the row } Since the loop is based on scanning the underlying result set for rows, any rows which we return will be returned in the order they are received from the underlying result set. So if, for example, the underlying result is an index scan, and if the index is defined as DESCENDING, the rows will come back in descending order. With multi-probing enabled, which only happens after DERBY-47 and only happens for index scans (which is why the ordering is correct if the index is removed, per Ajay's description), the processing is (loosely): while (more values in the IN list) { get next IN list value probe underlying index for a row whose target column matches the current IN list value if index probe returned a row, return that row } Notice how, here, the loop is driven by the order of the values in the IN list , instead of by the order of the rows as they are returned from the index scan. So even if the index is defined as "DESC", the final ordering of the rows will match the ordering of the IN values. That said, in order to ensure proper handling of duplicate IN list values, the current multi-probe logic in Derby always sorts IN list values in ASCENDING order. So unless an explicit SortResultSet is generated for execution time, the rows for an IN-list multi-probe will always be in ASCENDING order. The final relevant piece of information is that the optimizer is smart enough to remove unnecessary sorts when a) there is an ORDER BY on some index column IC, b) the final access plan uses that index, and c) the ASC/DESC option of the ORDER BY matches the ASC/DESC option of the index column IC. For the example queries, we have "ORDER BY CHEESE_CODE DESC", and we have an index definition which includes "CHEESE_CODE DESC". So the optimizer thinks the sort is unnecessary and removes it. Hence this Jira. There are two fixes that come to mind right away: 1. Make the optimizer recognize that the sort cannot be removed if we're multi-probing, or 2. Make MultiProbeTableScanResultSet sort the IN list values in ascending OR descending order depending on the ORDER BY clause (instead of always sorting in ASCENDING). I think option #2 is the best option here. I have a fix in progress and will post when it is postable...
          Hide
          Ajay Bhala added a comment -

          AB, thank you for that explanation. I look forward to the fix.

          Ajay

          Show
          Ajay Bhala added a comment - AB, thank you for that explanation. I look forward to the fix. Ajay
          Hide
          Ajay Bhala added a comment -

          AB, just wondering if you can provide me with an update on the fix you have available.

          Thanks,

          Ajay

          Show
          Ajay Bhala added a comment - AB, just wondering if you can provide me with an update on the fix you have available. Thanks, Ajay
          Hide
          A B added a comment -

          Hi Ajay,

          Sorry for the delay, I've had other things on my plate. But as it turns out, I was working with this yesterday and just a few minutes ago I kicked off the nightly regression tests with what I think is an acceptable patch for this issue. I still need to clean up the changes/comments a bit, but if the tests all pass I hope to post something by the end of the day today (PST).

          Show
          A B added a comment - Hi Ajay, Sorry for the delay, I've had other things on my plate. But as it turns out, I was working with this yesterday and just a few minutes ago I kicked off the nightly regression tests with what I think is an acceptable patch for this issue. I still need to clean up the changes/comments a bit, but if the tests all pass I hope to post something by the end of the day today (PST).
          Hide
          A B added a comment -

          Attaching a first attempt at resolving this issue.

          The problem as described above only occurs in situations where preprocessing/optimization eliminates a sort. There is already a method called "adjustForSortElimination()" defined in ResultSetNode, so this patch creates an alternate signature that takes an OrderByList as an argument. Then, in the case where sort elimination occurs, we will search a table's predicates to see if any of them is a multi-probing predicate. If so, we check to see if the ORDER BY that was eliminated required a DESCENDING sort on the multi-probe column, and we adjust for that (if needed) by sorting the IN list values into descending order at execution time. See code comments for details.

          I ran derbyall and suites.All with this patch applied and saw no failures. Review comments are appreciated.

          Show
          A B added a comment - Attaching a first attempt at resolving this issue. The problem as described above only occurs in situations where preprocessing/optimization eliminates a sort. There is already a method called "adjustForSortElimination()" defined in ResultSetNode, so this patch creates an alternate signature that takes an OrderByList as an argument. Then, in the case where sort elimination occurs, we will search a table's predicates to see if any of them is a multi-probing predicate. If so, we check to see if the ORDER BY that was eliminated required a DESCENDING sort on the multi-probe column, and we adjust for that (if needed) by sorting the IN list values into descending order at execution time. See code comments for details. I ran derbyall and suites.All with this patch applied and saw no failures. Review comments are appreciated.
          Hide
          A B added a comment -

          Changing summary to more closely describe the problem.

          Show
          A B added a comment - Changing summary to more closely describe the problem.
          Hide
          Bryan Pendleton added a comment -

          Hopefully this isn't possible, but I thought I'd ask:

          What if there was a multi-column index, for example,
          CREATE INDEX NAME_IDX ON EMPLOYEE(LAST_NAME,FIRST_NAME),
          and I had a query that was something like:

          SELECT LAST_NAME,FIRST_NAME FROM EMPLOYEE
          WHERE LAST_NAME IN ('REAGAN', 'CLINTON', 'BUSH')
          AND FIRST_NAME IN ('RONALD', 'WILLIAM', 'GEORGE')
          ORDER BY LAST_NAME ASC, FIRST_NAME DESC

          In such a case, if we decided to do in-list multi-probing
          against the NAME_IDX, would we have an ambiguity about
          whether we wanted to sort the (LAST_NAME,FIRST_NAME)
          values in ASC or DESC order?

          Show
          Bryan Pendleton added a comment - Hopefully this isn't possible, but I thought I'd ask: What if there was a multi-column index, for example, CREATE INDEX NAME_IDX ON EMPLOYEE(LAST_NAME,FIRST_NAME), and I had a query that was something like: SELECT LAST_NAME,FIRST_NAME FROM EMPLOYEE WHERE LAST_NAME IN ('REAGAN', 'CLINTON', 'BUSH') AND FIRST_NAME IN ('RONALD', 'WILLIAM', 'GEORGE') ORDER BY LAST_NAME ASC, FIRST_NAME DESC In such a case, if we decided to do in-list multi-probing against the NAME_IDX, would we have an ambiguity about whether we wanted to sort the (LAST_NAME,FIRST_NAME) values in ASC or DESC order?
          Hide
          A B added a comment -

          > What if there was a multi-column index [...] if we decided to do in-list multi-probing
          > against the NAME_IDX, would we have an ambiguity about whether we wanted to
          > sort the (LAST_NAME,FIRST_NAME) values in ASC or DESC order?

          Great question, Bryan. Thanks for bringing it up!

          I think the answer is, in fact, that "this isn't possible". A given multi-probing result set only ever works with a single column--even if that column is part of a multi-column index. See the following comments from TableScanResultSet:

          • Note that it is possible for a start/stop key to contain more
          • than one column (ex. if we're scanning a multi-column index). In
          • that case we plug probeValue into the first column of the start
          • and/or stop key and leave the rest of the key as it is. As an
          • example, assume we have the following predicates:
            *
          • ... where d in (1, 20000) and b > 200 and b <= 500
            *
          • And assume further that we have an index defined on (d, b).
          • In this case it's possible that we have TWO start predicates
          • and TWO stop predicates: the IN list will give us "d = probeVal",
          • which is a start predicate and a stop predicate; then "b > 200"
          • may give us a second start predicate, while "b <= 500" may give
          • us a second stop predicate. So in this situation we want our
          • start key to be:
            *
          • (probeValue, 200)
            *
          • and our stop key to be:
            *
          • (probeValue, 500).
            *
          • This will effectively limit the scan so that it only returns
          • rows whose "D" column equals probeValue and whose "B" column
          • falls in the range of 200 thru 500.
            *
          • Note: Derby currently only allows a single start/stop predicate
          • per column. See PredicateList.orderUsefulPredicates().

          It's also worth noting that Derby will only do multi-probing IF the column in question is the FIRST column in the index. This comes from the comments in PredicateList.orderUsefulPredicates():

          else if (pred.isInListProbePredicate()
          && (indexPosition > 0))

          { /* If the predicate is an IN-list probe predicate * then we only consider it to be useful if the * referenced column is the *first* one in the * index (i.e. if (indexPosition == 0)). Otherwise * the predicate would be treated as a qualifier * for store, which could lead to incorrect * results. */ indexCol = null; }

          So in the example you gave above, we would (should) never do multi-probing for FIRST_NAME. We would either do multi-probing based on LAST_NAME, in which case the sort would have to be ASC, or we would do no multi-probing at all. The FIRST_NAME IN list would be treated as a "normal" (non-probing) IN list operator.

          At least, that's the theory. If you'd like me to add a test case for that, just to be sure, let me know and I can certainly do so.

          If this still isnt' clear, please feel free to ask again. And thanks, as always, for your great comments!

          Show
          A B added a comment - > What if there was a multi-column index [...] if we decided to do in-list multi-probing > against the NAME_IDX, would we have an ambiguity about whether we wanted to > sort the (LAST_NAME,FIRST_NAME) values in ASC or DESC order? Great question, Bryan. Thanks for bringing it up! I think the answer is, in fact, that "this isn't possible". A given multi-probing result set only ever works with a single column--even if that column is part of a multi-column index. See the following comments from TableScanResultSet: Note that it is possible for a start/stop key to contain more than one column (ex. if we're scanning a multi-column index). In that case we plug probeValue into the first column of the start and/or stop key and leave the rest of the key as it is. As an example, assume we have the following predicates: * ... where d in (1, 20000) and b > 200 and b <= 500 * And assume further that we have an index defined on (d, b). In this case it's possible that we have TWO start predicates and TWO stop predicates: the IN list will give us "d = probeVal", which is a start predicate and a stop predicate; then "b > 200" may give us a second start predicate, while "b <= 500" may give us a second stop predicate. So in this situation we want our start key to be: * (probeValue, 200) * and our stop key to be: * (probeValue, 500). * This will effectively limit the scan so that it only returns rows whose "D" column equals probeValue and whose "B" column falls in the range of 200 thru 500. * Note: Derby currently only allows a single start/stop predicate per column. See PredicateList.orderUsefulPredicates(). It's also worth noting that Derby will only do multi-probing IF the column in question is the FIRST column in the index. This comes from the comments in PredicateList.orderUsefulPredicates(): else if (pred.isInListProbePredicate() && (indexPosition > 0)) { /* If the predicate is an IN-list probe predicate * then we only consider it to be useful if the * referenced column is the *first* one in the * index (i.e. if (indexPosition == 0)). Otherwise * the predicate would be treated as a qualifier * for store, which could lead to incorrect * results. */ indexCol = null; } So in the example you gave above, we would (should) never do multi-probing for FIRST_NAME. We would either do multi-probing based on LAST_NAME, in which case the sort would have to be ASC, or we would do no multi-probing at all. The FIRST_NAME IN list would be treated as a "normal" (non-probing) IN list operator. At least, that's the theory. If you'd like me to add a test case for that, just to be sure, let me know and I can certainly do so. If this still isnt' clear, please feel free to ask again. And thanks, as always, for your great comments!
          Hide
          Bryan Pendleton added a comment -

          Very good. I anticipated that answer, but it was quite helpful
          to read your clear and detailed answer. Thanks!

          I read through the patch and didn't notice any other issues that
          concerned me. I think you should move ahead with this approach.

          Show
          Bryan Pendleton added a comment - Very good. I anticipated that answer, but it was quite helpful to read your clear and detailed answer. Thanks! I read through the patch and didn't notice any other issues that concerned me. I think you should move ahead with this approach.
          Hide
          A B added a comment -

          Thank you, as always, for your review Bryan. I committed the _v1 patch with svn # 616126:

          URL: http://svn.apache.org/viewvc?rev=616126&view=rev

          Unchecking the patch available flag and marking Fix In as 10.4. As per usual, if the tinderbox results run cleanly over the next few days, I'll look at porting this back to 10.3.

          Show
          A B added a comment - Thank you, as always, for your review Bryan. I committed the _v1 patch with svn # 616126: URL: http://svn.apache.org/viewvc?rev=616126&view=rev Unchecking the patch available flag and marking Fix In as 10.4. As per usual, if the tinderbox results run cleanly over the next few days, I'll look at porting this back to 10.3.
          Hide
          Ajay Bhala added a comment - - edited

          AB, thank you for the patch file. I patched the source from 10.3.2.1 with the file and re-ran my queries above.
          All seems well. However when I ran the queries in the attached file, I saw some incorrect sorting. The queries are similar to the original posting save for the fact that CHEESE_COST is neither in the index nor in the ORDER BY clause.

          Show
          Ajay Bhala added a comment - - edited AB, thank you for the patch file. I patched the source from 10.3.2.1 with the file and re-ran my queries above. All seems well. However when I ran the queries in the attached file, I saw some incorrect sorting. The queries are similar to the original posting save for the fact that CHEESE_COST is neither in the index nor in the ORDER BY clause.
          Hide
          Ajay Bhala added a comment - - edited

          cheese2.sql has test queries for patch d3279_v1.patch

          Show
          Ajay Bhala added a comment - - edited cheese2.sql has test queries for patch d3279_v1.patch
          Hide
          A B added a comment -

          Thanks for running the tests, Ajay.

          The removal of CHEESE_COST from the index means that the index now has a subset of the columns from the base table. That in turn means that we have to generate an IndexToBaseRowNode above the base table (which we won't do if the index and the base table have the same columns). In my first patch I overlooked this fact and thus did not propagate the call to "adjustForSortElimination()" down to the base table beneath IndexToBaseRowNode.

          To fix this I just had to add an appropriate call to IndexToBaseRowNode:

          + /**
          + * @see ResultSetNode#adjustForSortElimination
          + */
          + void adjustForSortElimination(RequiredRowOrdering rowOrdering)
          + throws StandardException
          +

          { + source.adjustForSortElimination(rowOrdering); + }

          and the queries in cheese2.sql now sort correctly. I want to look a bit more to see if there are any other cases where intermediary nodes are added above FromBaseTable and thus need a similar change, and then I will post a follow-up patch with the corresponding changes.

          Thanks again for running more tests and reporting back!

          Show
          A B added a comment - Thanks for running the tests, Ajay. The removal of CHEESE_COST from the index means that the index now has a subset of the columns from the base table. That in turn means that we have to generate an IndexToBaseRowNode above the base table (which we won't do if the index and the base table have the same columns). In my first patch I overlooked this fact and thus did not propagate the call to "adjustForSortElimination()" down to the base table beneath IndexToBaseRowNode. To fix this I just had to add an appropriate call to IndexToBaseRowNode: + /** + * @see ResultSetNode#adjustForSortElimination + */ + void adjustForSortElimination(RequiredRowOrdering rowOrdering) + throws StandardException + { + source.adjustForSortElimination(rowOrdering); + } and the queries in cheese2.sql now sort correctly. I want to look a bit more to see if there are any other cases where intermediary nodes are added above FromBaseTable and thus need a similar change, and then I will post a follow-up patch with the corresponding changes. Thanks again for running more tests and reporting back!
          Hide
          A B added a comment -

          Attaching a follow-up patch, d3279_ix2brnode_v1.patch, that fixes the case for IndexToBaseRowNodes which appear above the base table. I didn't see any other cases where similar changes were necessary (or rather, all other cases were handled by the _v1 patch).

          This patch also adds some more test cases to InListMultiProbeTest, esp. to 1) test the case where the index has fewer columns than the base table, and 2) test queries that have multiple IN lists in them. I decided to add the latter set of tests since Bryan asked about a similar query in an earlier comment, and since cheese2.sql also includes a couple of queries with multiple IN lists in the where clause (indirectly).

          I ran derbyall and suites.All with ibm142 and saw no failures.

          Show
          A B added a comment - Attaching a follow-up patch, d3279_ix2brnode_v1.patch, that fixes the case for IndexToBaseRowNodes which appear above the base table. I didn't see any other cases where similar changes were necessary (or rather, all other cases were handled by the _v1 patch). This patch also adds some more test cases to InListMultiProbeTest, esp. to 1) test the case where the index has fewer columns than the base table, and 2) test queries that have multiple IN lists in them. I decided to add the latter set of tests since Bryan asked about a similar query in an earlier comment, and since cheese2.sql also includes a couple of queries with multiple IN lists in the where clause (indirectly). I ran derbyall and suites.All with ibm142 and saw no failures.
          Hide
          A B added a comment -

          Committed d3279_ix2brnode_v1.patch to trunk with svn # 617548:

          URL: http://svn.apache.org/viewvc?rev=617548&view=rev

          Ajay, if you can do more testing of your queries with this change applied, that would be helpful--and thanks for your patience in the meantime. Otherwise, assuming all goes well with tinderbox runs for the next few days, I'll plan to port the changes to 10.3 sometime late next week...

          Show
          A B added a comment - Committed d3279_ix2brnode_v1.patch to trunk with svn # 617548: URL: http://svn.apache.org/viewvc?rev=617548&view=rev Ajay, if you can do more testing of your queries with this change applied, that would be helpful--and thanks for your patience in the meantime. Otherwise, assuming all goes well with tinderbox runs for the next few days, I'll plan to port the changes to 10.3 sometime late next week...
          Hide
          Ajay Bhala added a comment -

          AB, I will try to perform more testing this week and let you know how things work out.

          Thanks again.

          Ajay

          Show
          Ajay Bhala added a comment - AB, I will try to perform more testing this week and let you know how things work out. Thanks again. Ajay
          Hide
          A B added a comment -

          I ran the following "svn merge" commands to port this change back to 10.3:

          svn merge -r 616125:616126 https://svn.apache.org/repos/asf/db/derby/code/trunk
          svn merge -r 617547:617548 https://svn.apache.org/repos/asf/db/derby/code/trunk

          The second command saw a conflict due to svn # 614745, which went into trunk but not into 10.3. So I had to do a manual resolution of the conflict--which simply amounted to adjusting for 2 lines in junit/JDBC.java.

          I'm attaching d3279_10_3_merge.patch, which encapsulates the "svn merge" commands plus the conflict resolution. I ran the 10.3 "derbyall" and "suites.All" regression tests and saw no failures.

          I'll wait a day or two to see if Ajay's testing illuminates any other problems; if not, then I'll go ahead and commit the merge patch to the 10.3 branch.

          Show
          A B added a comment - I ran the following "svn merge" commands to port this change back to 10.3: svn merge -r 616125:616126 https://svn.apache.org/repos/asf/db/derby/code/trunk svn merge -r 617547:617548 https://svn.apache.org/repos/asf/db/derby/code/trunk The second command saw a conflict due to svn # 614745, which went into trunk but not into 10.3. So I had to do a manual resolution of the conflict--which simply amounted to adjusting for 2 lines in junit/JDBC.java. I'm attaching d3279_10_3_merge.patch, which encapsulates the "svn merge" commands plus the conflict resolution. I ran the 10.3 "derbyall" and "suites.All" regression tests and saw no failures. I'll wait a day or two to see if Ajay's testing illuminates any other problems; if not, then I'll go ahead and commit the merge patch to the 10.3 branch.
          Hide
          Ajay Bhala added a comment -

          AB, I've not had the opportunity to test the v2 patch but my colleague has. His testing has not revealed further issues. I would say that we can put this bug to bed.

          Thanks for your help,

          Ajay

          Show
          Ajay Bhala added a comment - AB, I've not had the opportunity to test the v2 patch but my colleague has. His testing has not revealed further issues. I would say that we can put this bug to bed. Thanks for your help, Ajay
          Hide
          A B added a comment -

          Thanks for the feedback, Ajay. I committed the changes to 10.3 with svn # 619568:

          URL: http://svn.apache.org/viewvc?rev=619568&view=rev

          > I would say that we can put this bug to bed.

          Since it's in 10.3 now, I'm marking the issue as resolved. If all is okay, can you go ahead and close it when you have the chance? Thanks again for your patience on this one...

          Show
          A B added a comment - Thanks for the feedback, Ajay. I committed the changes to 10.3 with svn # 619568: URL: http://svn.apache.org/viewvc?rev=619568&view=rev > I would say that we can put this bug to bed. Since it's in 10.3 now, I'm marking the issue as resolved. If all is okay, can you go ahead and close it when you have the chance? Thanks again for your patience on this one...
          Hide
          A B added a comment -

          I don't know if fixes for wrong results regressions really need a release note or not-esp. if it 's just an ordering issue-but I'm attaching one here anyway. I am not checking the "Existing Application Impact" box as I don't know what the rules are for regressions. If it is agreed that release notes are good for regression fixes, as well, then someone can check the appropriate box to have the release note picked up by the generator tool. Note: the html file will have to be "scrubbed" in order to play nicely with the release note tool.

          Show
          A B added a comment - I don't know if fixes for wrong results regressions really need a release note or not- esp. if it 's just an ordering issue -but I'm attaching one here anyway. I am not checking the "Existing Application Impact" box as I don't know what the rules are for regressions. If it is agreed that release notes are good for regression fixes, as well, then someone can check the appropriate box to have the release note picked up by the generator tool. Note: the html file will have to be "scrubbed" in order to play nicely with the release note tool.

            People

            • Assignee:
              A B
              Reporter:
              Ajay Bhala
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development