Derby
  1. Derby
  2. DERBY-4371

Non-selected columns for SELECT DISTINCT allowed in ORDER BY clause if ordered by expression

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 10.5.1.1
    • Fix Version/s: 10.7.1.1
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Repro attached
    • Bug behavior facts:
      Wrong query result

      Description

      How to repeat:

      ij> create table t (i integer, j integer);;
      0 rows inserted/updated/deleted
      ij> insert into t values (1,2),(1,3);
      2 rows inserted/updated/deleted
      ij> select distinct i from t order by j;
      ERROR 42879: The ORDER BY clause may not contain column 'J', since the query specifies DISTINCT and that column does not appear in the query result.
      ij> select distinct i from t order by j*2;
      I
      -----------
      1
      1

      2 rows selected

      1. derby-4371-tests.diff
        8 kB
        Nirmal Fernando
      2. derby-4371-7.diff
        5 kB
        Nirmal Fernando
      3. derby-4371-6.diff
        5 kB
        Nirmal Fernando
      4. derby-4371-5.diff
        5 kB
        Nirmal Fernando
      5. DERBY-4371-4.diff
        4 kB
        Nirmal Fernando
      6. DERBY-4371-3.diff
        4 kB
        Nirmal Fernando
      7. DERBY-4371-2.diff
        2 kB
        Nirmal Fernando
      8. DERBY-4371.diff
        2 kB
        Nirmal Fernando

        Issue Links

          Activity

          Hide
          Bernt M. Johnsen added a comment -

          Wrong result. Should be "Critical"

          Show
          Bernt M. Johnsen added a comment - Wrong result. Should be "Critical"
          Hide
          Knut Anders Hatlen added a comment -

          SQL:2003 talks about this in part 2, 14.1 <declare cursor>, syntax rule 18d (18 d 9 B II, to be specific). My understanding is that if you have SELECT DISTINCT, all the columns/expressions in the ORDER BY clause must also be found in the select list.

          Adding this restriction would solve the issue by rejecting the problematic query. Whereas the behaviour of this particular query is not well-defined, there may be some meaningful queries that will be rejected too, like "select distinct i from t order by -i", so we should probably add a release note if we choose to fix it this way.

          Another data point, PostgreSQL does not allow any of the queries discussed here:

          kh160127=# select distinct i from t order by j;
          ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
          kh160127=# select distinct i from t order by j*2;
          ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
          kh160127=# select distinct i from t order by -i;
          ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

          MySQL allows all three of them, although only the latter has a well-defined ordering.

          Show
          Knut Anders Hatlen added a comment - SQL:2003 talks about this in part 2, 14.1 <declare cursor>, syntax rule 18d (18 d 9 B II, to be specific). My understanding is that if you have SELECT DISTINCT, all the columns/expressions in the ORDER BY clause must also be found in the select list. Adding this restriction would solve the issue by rejecting the problematic query. Whereas the behaviour of this particular query is not well-defined, there may be some meaningful queries that will be rejected too, like "select distinct i from t order by -i", so we should probably add a release note if we choose to fix it this way. Another data point, PostgreSQL does not allow any of the queries discussed here: kh160127=# select distinct i from t order by j; ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list kh160127=# select distinct i from t order by j*2; ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list kh160127=# select distinct i from t order by -i; ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list MySQL allows all three of them, although only the latter has a well-defined ordering.
          Hide
          Knut Anders Hatlen added a comment -

          The error for "select distinct i from t order by j" is raised by OrderByColumn.bindOrderByColumn() if the query is distinct, the sort key is a column reference, and the sort key has been added to the result column list. Ideally, we should raise the error if the query is distinct and the sort key has been added to the RCL, and not care about whether or not it's a column reference. However, we always add the sort key to the RCL if it's an expression, even if the same expression is already in the RCL. DERBY-4406 is caused by this, and a fix for that issue is likely to make it easier to fix this issue.

          Show
          Knut Anders Hatlen added a comment - The error for "select distinct i from t order by j" is raised by OrderByColumn.bindOrderByColumn() if the query is distinct, the sort key is a column reference, and the sort key has been added to the result column list. Ideally, we should raise the error if the query is distinct and the sort key has been added to the RCL, and not care about whether or not it's a column reference. However, we always add the sort key to the RCL if it's an expression, even if the same expression is already in the RCL. DERBY-4406 is caused by this, and a fix for that issue is likely to make it easier to fix this issue.
          Hide
          Nirmal Fernando added a comment -

          Hi All,

          I like to do some work on this bug as well. If someone can suggest me a starting point, it would be great.

          Following is some of my understandings/ findings, please correct me if I am wrong.

          I think we need to perform a check inside OrderByColumn.bindOrderByColumn(), to find out whether the order by expression (not only column references) is same as the expression in RCL. If they are different we should raise that error if and only if (target instanceof SelectNode)=true and ( (SelectNode)target ).hasDistinct() =true.

          I think this will make Derby to not pull-up an expression that is already there in RCL. Currently Derby pull-up j*2 expression in both occasions mentioned below:

          select distinct i from t order by j*2; // addedColumnOffset = 0
          select distinct j*2 from t order by j*2; // addedColumnOffset = 0 this ideally should be -1.

          I like to get your view thoughts on this.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi All, I like to do some work on this bug as well. If someone can suggest me a starting point, it would be great. Following is some of my understandings/ findings, please correct me if I am wrong. I think we need to perform a check inside OrderByColumn.bindOrderByColumn(), to find out whether the order by expression (not only column references) is same as the expression in RCL. If they are different we should raise that error if and only if (target instanceof SelectNode)=true and ( (SelectNode)target ).hasDistinct() =true. I think this will make Derby to not pull-up an expression that is already there in RCL. Currently Derby pull-up j*2 expression in both occasions mentioned below: select distinct i from t order by j*2; // addedColumnOffset = 0 select distinct j*2 from t order by j*2; // addedColumnOffset = 0 this ideally should be -1. I like to get your view thoughts on this. Thanks.
          Hide
          Bryan Pendleton added a comment -

          Hi Nirmal, thanks for looking at this issue.

          I wonder if it would be helpful to take a close look at how GROUP BY handles
          a somewhat similar situation:

          select a, sum(b) from t group by a having a+c > 1

          is not legal because of the reference to column 'c'. But, the related statement:

          select a+c, sum(b) from t group by a+c having a+c > 1

          is legal, because the expression is the same in all cases (I think – please check this!)

          You can find the code that implements this behavior, I think, in the method
          addNewColumnsForAggregation() in GroupByNode.java. In particular, look
          closely at how the SubstituteExpressionsVisitor and CollectNodesVisitor
          get used to process the HAVING clause's expression tree.

          Show
          Bryan Pendleton added a comment - Hi Nirmal, thanks for looking at this issue. I wonder if it would be helpful to take a close look at how GROUP BY handles a somewhat similar situation: select a, sum(b) from t group by a having a+c > 1 is not legal because of the reference to column 'c'. But, the related statement: select a+c, sum(b) from t group by a+c having a+c > 1 is legal, because the expression is the same in all cases (I think – please check this!) You can find the code that implements this behavior, I think, in the method addNewColumnsForAggregation() in GroupByNode.java. In particular, look closely at how the SubstituteExpressionsVisitor and CollectNodesVisitor get used to process the HAVING clause's expression tree.
          Hide
          Knut Anders Hatlen added a comment -

          Actually, GROUP BY A+C is not valid SQL. It is non-standard syntax that was added in DERBY-883. SQL:2003 only allows column references in group by clauses, as far as I can see.

          Show
          Knut Anders Hatlen added a comment - Actually, GROUP BY A+C is not valid SQL. It is non-standard syntax that was added in DERBY-883 . SQL:2003 only allows column references in group by clauses, as far as I can see.
          Hide
          Nirmal Fernando added a comment -

          Hi Bryan & Knut,

          select a, sum(b) from t group by a having a+c > 1 : I checked this in both Derby and PostgreSQL, this is not legal as Bryan correctly mentioned since column reference 'c' included in having clause is not in group by clause, though the 'a' in select clause is in the group by clause. (Note that 'b' is used as 'sum(b)', i.e. if we use a column reference with an aggregate function it need not to be in the group by clause.)

          select a+c, sum(b) from t group by a+c having a+c > 1 :This is a valid query in both Derby and PostgreSQL. I think Bryan is correct, since the same expression used in all 3 cases, this is successful. I verified this by following queries:
          select a, sum(b) from t group by a+c having a+c > 1 //this fails
          select a-c, sum(b) from t group by a+c having a+c > 1 //this fails
          select a-c, sum(b) from t group by a,c having a+c > 1 //this passes
          select a, sum(b) from t group by a+c,a having a+c > 1 //this passes
          ---> this implies expressions in select clause should be in group by clause

          same way I figured out that an expression used in a having clause should be in group by clause.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Bryan & Knut, select a, sum(b) from t group by a having a+c > 1 : I checked this in both Derby and PostgreSQL, this is not legal as Bryan correctly mentioned since column reference 'c' included in having clause is not in group by clause, though the 'a' in select clause is in the group by clause. (Note that 'b' is used as 'sum(b)', i.e. if we use a column reference with an aggregate function it need not to be in the group by clause.) select a+c, sum(b) from t group by a+c having a+c > 1 :This is a valid query in both Derby and PostgreSQL. I think Bryan is correct, since the same expression used in all 3 cases, this is successful. I verified this by following queries: select a, sum(b) from t group by a+c having a+c > 1 //this fails select a-c, sum(b) from t group by a+c having a+c > 1 //this fails select a-c, sum(b) from t group by a,c having a+c > 1 //this passes select a, sum(b) from t group by a+c,a having a+c > 1 //this passes ---> this implies expressions in select clause should be in group by clause same way I figured out that an expression used in a having clause should be in group by clause. Thanks.
          Hide
          Nirmal Fernando added a comment -

          Hi All,

          I have attached a patch for this issue. This patch shows following error for the query mentioned in the description:

          ERROR 42879: The ORDER BY clause may not contain column 'expression' , since the query specifies DISTINCT and that column does not appear in the query result.

          I used 'expression' in the error, since j*2 is not a column reference, but I think it is better if we can show a new error for this.

          I am expecting your valuable comments on the patch.

          Show
          Nirmal Fernando added a comment - Hi All, I have attached a patch for this issue. This patch shows following error for the query mentioned in the description: ERROR 42879: The ORDER BY clause may not contain column 'expression' , since the query specifies DISTINCT and that column does not appear in the query result. I used 'expression' in the error, since j*2 is not a column reference, but I think it is better if we can show a new error for this. I am expecting your valuable comments on the patch.
          Hide
          Nirmal Fernando added a comment -

          I am running derbyall tests now.

          Thanks.

          Show
          Nirmal Fernando added a comment - I am running derbyall tests now. Thanks.
          Hide
          Nirmal Fernando added a comment -

          Hi All,

          I ran the derbyall tests with my modification, but it failed (I was wondering why). Following is a quote from derbyall_report.txt:

          Generating report for RunSuite derbyall null null null true
          ------------------ Java Information ------------------
          Java Version: 1.6.0_16
          Java Vendor: Sun Microsystems Inc.
          Java home: C:\Program Files\Java\jre6
          Java classpath: %classpath%;C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyrun.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyTesting.jar;C:\OtherNirmal\GSoC\Code\tools\java\jakarta-oro-2.0.8.jar;C:\OtherNirmal\GSoC\Code\tools\java\junit.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_cs.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_de_DE.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_es.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_fr.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_hu.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_it.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ja_JP.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ko_KR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pl.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pt_BR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ru.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_CN.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_TW.jar;C:\OtherNirmal\GSoC\xalan-j_2_7_1-bin-2jars\xalan-j_2_7_1\samples\xalansamples.jar;C:\OtherNirmal\GSoC\Document\lib\fop.jar;C:\OtherNirmal\GSoC\Document\lib\avalon-framework-4.2.0.jar;C:\OtherNirmal\GSoC\Document\lib\batik-all-1.6.jar;.
          OS name: Windows Vista
          OS architecture: x86
          OS version: 6.0
          Java user name: Nirmal
          Java user home: C:\Users\Nirmal
          Java user dir: C:\OtherNirmal\GSoC\Testing\DerbyAll
          java.specification.name: Java Platform API Specification
          java.specification.version: 1.6
          --------- Derby Information --------
          JRE - JDBC: Java SE 6 - JDBC 4.0
          [C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar] 10.6.0.0 alpha - (exported)
          [C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar] 10.6.0.0 alpha - (exported)
          [C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar] 10.6.0.0 alpha - (exported)
          [C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar] 10.6.0.0 alpha - (exported)
          ------------------------------------------------------
          ----------------- Locale Information -----------------
          Current Locale : [English/United States [en_US]]
          Found support for locale: [cs]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [de_DE]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [es]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [fr]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [hu]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [it]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [ja_JP]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [ko_KR]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [pl]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [pt_BR]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [ru]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [zh_CN]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [zh_TW]
          version: 10.6.0.0 alpha - (exported)
          ------------------------------------------------------
          Test environment information:
          COMMAND LINE STYLE: jdk13
          TEST CANONS: master
          ------------------------------------------------------
          ------------------------------------------------------
          Summary results:

          Test Run Started: 2010-04-25 01:19:16.0
          Test Run Duration: 00:48:47

          204 Tests Run
          31% Pass (65 tests passed)
          69% Fail (139 tests failed)
          3 Suites skipped

          =================================================================
          Since I was pretty sure, that these failures occurred not because of my modifications, I ran derbyall again after commenting the code I added.
          The result was what I expected.

          Generating report for RunSuite derbyall null null null true
          ------------------ Java Information ------------------
          Java Version: 1.6.0_16
          Java Vendor: Sun Microsystems Inc.
          Java home: C:\Program Files\Java\jre6
          Java classpath: %classpath%;C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyrun.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyTesting.jar;C:\OtherNirmal\GSoC\Code\tools\java\jakarta-oro-2.0.8.jar;C:\OtherNirmal\GSoC\Code\tools\java\junit.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_cs.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_de_DE.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_es.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_fr.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_hu.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_it.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ja_JP.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ko_KR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pl.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pt_BR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ru.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_CN.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_TW.jar;C:\OtherNirmal\GSoC\xalan-j_2_7_1-bin-2jars\xalan-j_2_7_1\samples\xalansamples.jar;C:\OtherNirmal\GSoC\Document\lib\fop.jar;C:\OtherNirmal\GSoC\Document\lib\avalon-framework-4.2.0.jar;C:\OtherNirmal\GSoC\Document\lib\batik-all-1.6.jar;.
          OS name: Windows Vista
          OS architecture: x86
          OS version: 6.0
          Java user name: Nirmal
          Java user home: C:\Users\Nirmal
          Java user dir: C:\OtherNirmal\GSoC\Testing\DerbyAll2
          java.specification.name: Java Platform API Specification
          java.specification.version: 1.6
          --------- Derby Information --------
          JRE - JDBC: Java SE 6 - JDBC 4.0
          [C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar] 10.6.0.0 alpha - (exported)
          [C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar] 10.6.0.0 alpha - (exported)
          [C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar] 10.6.0.0 alpha - (exported)
          [C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar] 10.6.0.0 alpha - (exported)
          ------------------------------------------------------
          ----------------- Locale Information -----------------
          Current Locale : [English/United States [en_US]]
          Found support for locale: [cs]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [de_DE]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [es]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [fr]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [hu]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [it]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [ja_JP]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [ko_KR]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [pl]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [pt_BR]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [ru]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [zh_CN]
          version: 10.6.0.0 alpha - (exported)
          Found support for locale: [zh_TW]
          version: 10.6.0.0 alpha - (exported)
          ------------------------------------------------------
          Test environment information:
          COMMAND LINE STYLE: jdk13
          TEST CANONS: master
          ------------------------------------------------------
          ------------------------------------------------------
          Summary results:

          Test Run Started: 2010-04-25 10:29:54.0
          Test Run Duration: 00:48:08

          204 Tests Run
          31% Pass (65 tests passed)
          69% Fail (139 tests failed)
          3 Suites skipped

          ============================================================

          But, since the both reports failed same number of tests, I suppose I can conclude that my modifications, not resulted of failure of any tests.

          Expect your views on this.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi All, I ran the derbyall tests with my modification, but it failed (I was wondering why). Following is a quote from derbyall_report.txt: Generating report for RunSuite derbyall null null null true ------------------ Java Information ------------------ Java Version: 1.6.0_16 Java Vendor: Sun Microsystems Inc. Java home: C:\Program Files\Java\jre6 Java classpath: %classpath%;C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyrun.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyTesting.jar;C:\OtherNirmal\GSoC\Code\tools\java\jakarta-oro-2.0.8.jar;C:\OtherNirmal\GSoC\Code\tools\java\junit.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_cs.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_de_DE.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_es.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_fr.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_hu.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_it.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ja_JP.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ko_KR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pl.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pt_BR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ru.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_CN.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_TW.jar;C:\OtherNirmal\GSoC\xalan-j_2_7_1-bin-2jars\xalan-j_2_7_1\samples\xalansamples.jar;C:\OtherNirmal\GSoC\Document\lib\fop.jar;C:\OtherNirmal\GSoC\Document\lib\avalon-framework-4.2.0.jar;C:\OtherNirmal\GSoC\Document\lib\batik-all-1.6.jar;. OS name: Windows Vista OS architecture: x86 OS version: 6.0 Java user name: Nirmal Java user home: C:\Users\Nirmal Java user dir: C:\OtherNirmal\GSoC\Testing\DerbyAll java.specification.name: Java Platform API Specification java.specification.version: 1.6 --------- Derby Information -------- JRE - JDBC: Java SE 6 - JDBC 4.0 [C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar] 10.6.0.0 alpha - (exported) [C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar] 10.6.0.0 alpha - (exported) [C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar] 10.6.0.0 alpha - (exported) [C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar] 10.6.0.0 alpha - (exported) ------------------------------------------------------ ----------------- Locale Information ----------------- Current Locale : [English/United States [en_US] ] Found support for locale: [cs] version: 10.6.0.0 alpha - (exported) Found support for locale: [de_DE] version: 10.6.0.0 alpha - (exported) Found support for locale: [es] version: 10.6.0.0 alpha - (exported) Found support for locale: [fr] version: 10.6.0.0 alpha - (exported) Found support for locale: [hu] version: 10.6.0.0 alpha - (exported) Found support for locale: [it] version: 10.6.0.0 alpha - (exported) Found support for locale: [ja_JP] version: 10.6.0.0 alpha - (exported) Found support for locale: [ko_KR] version: 10.6.0.0 alpha - (exported) Found support for locale: [pl] version: 10.6.0.0 alpha - (exported) Found support for locale: [pt_BR] version: 10.6.0.0 alpha - (exported) Found support for locale: [ru] version: 10.6.0.0 alpha - (exported) Found support for locale: [zh_CN] version: 10.6.0.0 alpha - (exported) Found support for locale: [zh_TW] version: 10.6.0.0 alpha - (exported) ------------------------------------------------------ Test environment information: COMMAND LINE STYLE: jdk13 TEST CANONS: master ------------------------------------------------------ ------------------------------------------------------ Summary results: Test Run Started: 2010-04-25 01:19:16.0 Test Run Duration: 00:48:47 204 Tests Run 31% Pass (65 tests passed) 69% Fail (139 tests failed) 3 Suites skipped ================================================================= Since I was pretty sure, that these failures occurred not because of my modifications, I ran derbyall again after commenting the code I added. The result was what I expected. Generating report for RunSuite derbyall null null null true ------------------ Java Information ------------------ Java Version: 1.6.0_16 Java Vendor: Sun Microsystems Inc. Java home: C:\Program Files\Java\jre6 Java classpath: %classpath%;C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyrun.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyTesting.jar;C:\OtherNirmal\GSoC\Code\tools\java\jakarta-oro-2.0.8.jar;C:\OtherNirmal\GSoC\Code\tools\java\junit.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_cs.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_de_DE.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_es.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_fr.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_hu.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_it.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ja_JP.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ko_KR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pl.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_pt_BR.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_ru.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_CN.jar;C:\OtherNirmal\GSoC\Code\jars\sane\derbyLocale_zh_TW.jar;C:\OtherNirmal\GSoC\xalan-j_2_7_1-bin-2jars\xalan-j_2_7_1\samples\xalansamples.jar;C:\OtherNirmal\GSoC\Document\lib\fop.jar;C:\OtherNirmal\GSoC\Document\lib\avalon-framework-4.2.0.jar;C:\OtherNirmal\GSoC\Document\lib\batik-all-1.6.jar;. OS name: Windows Vista OS architecture: x86 OS version: 6.0 Java user name: Nirmal Java user home: C:\Users\Nirmal Java user dir: C:\OtherNirmal\GSoC\Testing\DerbyAll2 java.specification.name: Java Platform API Specification java.specification.version: 1.6 --------- Derby Information -------- JRE - JDBC: Java SE 6 - JDBC 4.0 [C:\OtherNirmal\GSoC\Code\jars\sane\derby.jar] 10.6.0.0 alpha - (exported) [C:\OtherNirmal\GSoC\Code\jars\sane\derbytools.jar] 10.6.0.0 alpha - (exported) [C:\OtherNirmal\GSoC\Code\jars\sane\derbynet.jar] 10.6.0.0 alpha - (exported) [C:\OtherNirmal\GSoC\Code\jars\sane\derbyclient.jar] 10.6.0.0 alpha - (exported) ------------------------------------------------------ ----------------- Locale Information ----------------- Current Locale : [English/United States [en_US] ] Found support for locale: [cs] version: 10.6.0.0 alpha - (exported) Found support for locale: [de_DE] version: 10.6.0.0 alpha - (exported) Found support for locale: [es] version: 10.6.0.0 alpha - (exported) Found support for locale: [fr] version: 10.6.0.0 alpha - (exported) Found support for locale: [hu] version: 10.6.0.0 alpha - (exported) Found support for locale: [it] version: 10.6.0.0 alpha - (exported) Found support for locale: [ja_JP] version: 10.6.0.0 alpha - (exported) Found support for locale: [ko_KR] version: 10.6.0.0 alpha - (exported) Found support for locale: [pl] version: 10.6.0.0 alpha - (exported) Found support for locale: [pt_BR] version: 10.6.0.0 alpha - (exported) Found support for locale: [ru] version: 10.6.0.0 alpha - (exported) Found support for locale: [zh_CN] version: 10.6.0.0 alpha - (exported) Found support for locale: [zh_TW] version: 10.6.0.0 alpha - (exported) ------------------------------------------------------ Test environment information: COMMAND LINE STYLE: jdk13 TEST CANONS: master ------------------------------------------------------ ------------------------------------------------------ Summary results: Test Run Started: 2010-04-25 10:29:54.0 Test Run Duration: 00:48:08 204 Tests Run 31% Pass (65 tests passed) 69% Fail (139 tests failed) 3 Suites skipped ============================================================ But, since the both reports failed same number of tests, I suppose I can conclude that my modifications, not resulted of failure of any tests. Expect your views on this. Thanks.
          Hide
          Nirmal Fernando added a comment -

          Hi All,

          derbyall Tests ran successfully.

          If someone can review the patch it would be great.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi All, derbyall Tests ran successfully. If someone can review the patch it would be great. Thanks.
          Hide
          Bryan Pendleton added a comment -

          Hi Nirmal, thanks for the patch, it looks very promising.
          Here's a couple things that occurred to me as I read it:

          1) Is the instanceof check for BinaryOperatorNode necessary?
          What would happen if it was removed?

          2) Did you run the JUnit test suites? Or just the derbyall suite?

          3) Can you add some new JUnit test cases to cover some of
          the variations that are handled by the new code? In particular,
          it would be nice to have test cases for the ORDER BY statements that
          are mentioned in the bug description, and it would be nice to have
          test cases for the ORDER BY statements that you mention in
          your comments in the patch.

          I think that DistinctTest.java would be a fine place to add the test cases.

          4) Something about this loop looks strange:

          for (int i=1; i<target.getResultColumns().size() ;i++ ){

          Usually, either the loop counter starts at 0, or the loop test
          is for <=, not <. Do you think the start and end conditions are
          correct in this case?

          Show
          Bryan Pendleton added a comment - Hi Nirmal, thanks for the patch, it looks very promising. Here's a couple things that occurred to me as I read it: 1) Is the instanceof check for BinaryOperatorNode necessary? What would happen if it was removed? 2) Did you run the JUnit test suites? Or just the derbyall suite? 3) Can you add some new JUnit test cases to cover some of the variations that are handled by the new code? In particular, it would be nice to have test cases for the ORDER BY statements that are mentioned in the bug description, and it would be nice to have test cases for the ORDER BY statements that you mention in your comments in the patch. I think that DistinctTest.java would be a fine place to add the test cases. 4) Something about this loop looks strange: for (int i=1; i<target.getResultColumns().size() ;i++ ){ Usually, either the loop counter starts at 0, or the loop test is for <=, not <. Do you think the start and end conditions are correct in this case?
          Hide
          Nirmal Fernando added a comment -

          Hi Bryan,

          Thank you very much for reviewing the patch.

          1) I added this simply because queries like select distinct i*2 from order by i are already giving that error. But I think removing this will not affect anything instead we will able to handle queries like select distinct i from order by random() such that it shows the error. Thanks for pointing it.

          2) I ran just the derbyall suite, I will run suites.all now.

          3) This will probably the next thing I like to learn. I will read up on Junit tests.

          4) Whoops, thanks Bryan, it was a mistake, and I corrected that, and I modified the patch too, please find the new attachment.

          Thanks Again.

          Show
          Nirmal Fernando added a comment - Hi Bryan, Thank you very much for reviewing the patch. 1) I added this simply because queries like select distinct i*2 from order by i are already giving that error. But I think removing this will not affect anything instead we will able to handle queries like select distinct i from order by random() such that it shows the error. Thanks for pointing it. 2) I ran just the derbyall suite, I will run suites.all now. 3) This will probably the next thing I like to learn. I will read up on Junit tests. 4) Whoops, thanks Bryan, it was a mistake, and I corrected that, and I modified the patch too, please find the new attachment. Thanks Again.
          Hide
          Nirmal Fernando added a comment -

          Hi Bryan,

          I ran Junit tests for the new modification and there's a failure. Following is the output in JunitAll.out:

          orderby(org.apache.derbyTesting.functionTests.tests.lang.LangScripts)junit.framework.ComparisonFailure: Output at line 1736 expected:<[C1 |C2 ]> but was:<[ERROR 42879: The ORDER BY clause may not contain column 'expression', since the query specifies DISTINCT and that column does not appear in the query result.]>
          at org.apache.derbyTesting.functionTests.util.CanonTestCase.compareCanon(CanonTestCase.java:106)
          at org.apache.derbyTesting.functionTests.util.ScriptTestCase.runTest(ScriptTestCase.java:198)
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
          at junit.extensions.TestSetup.run(TestSetup.java:25)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)

          Where can I found the query related to this ?
          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Bryan, I ran Junit tests for the new modification and there's a failure. Following is the output in JunitAll.out: orderby(org.apache.derbyTesting.functionTests.tests.lang.LangScripts)junit.framework.ComparisonFailure: Output at line 1736 expected:< [C1 |C2 ] > but was:< [ERROR 42879: The ORDER BY clause may not contain column 'expression', since the query specifies DISTINCT and that column does not appear in the query result.] > at org.apache.derbyTesting.functionTests.util.CanonTestCase.compareCanon(CanonTestCase.java:106) at org.apache.derbyTesting.functionTests.util.ScriptTestCase.runTest(ScriptTestCase.java:198) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24) at junit.extensions.TestSetup$1.protect(TestSetup.java:21) at junit.extensions.TestSetup.run(TestSetup.java:25) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) Where can I found the query related to this ? Thanks.
          Hide
          Bryan Pendleton added a comment -

          There's a bit of a trick to reading the following output:

          > orderby(org.apache.derbyTesting.functionTests.tests.lang.LangScripts)junit.framework.ComparisonFailure:
          > Output at line 1736 expected:<[C1 |C2 ]> but was:<[ERROR 42879: The ORDER BY clause
          > may not contain column 'expression', since the query specifies DISTINCT and that column does
          > not appear in the query result.]>

          The output is referring to line 1736 in the file
          java/testing/org/apache/derbyTesting/functionTests/master/orderby.out

          The "LangScripts" test is a special JUnit test which takes files full of SQL statements, such as
          java/testing/org/apache/derbyTesting/functionTests/tests/lang/orderby.sql, feeds them
          into the "ij" tool, and then compares the output against the corresponding XXX.out file.

          So have a look at line 1736 in orderby.out and the message should make more sense.

          Show
          Bryan Pendleton added a comment - There's a bit of a trick to reading the following output: > orderby(org.apache.derbyTesting.functionTests.tests.lang.LangScripts)junit.framework.ComparisonFailure: > Output at line 1736 expected:< [C1 |C2 ] > but was:<[ERROR 42879: The ORDER BY clause > may not contain column 'expression', since the query specifies DISTINCT and that column does > not appear in the query result.]> The output is referring to line 1736 in the file java/testing/org/apache/derbyTesting/functionTests/master/orderby.out The "LangScripts" test is a special JUnit test which takes files full of SQL statements, such as java/testing/org/apache/derbyTesting/functionTests/tests/lang/orderby.sql, feeds them into the "ij" tool, and then compares the output against the corresponding XXX.out file. So have a look at line 1736 in orderby.out and the message should make more sense.
          Hide
          Nirmal Fernando added a comment -

          Thanks Bryan for help me out to read this.

          So, I found that the line 1736 is referred to the query: select distinct c1, c2 from t1 order by c1+1;

          According to the standards this should fail with the error I created:

          ERROR 42879: The ORDER BY clause may not contain column 'expression', since the query specifies DISTINCT and that column does
          not appear in the query result.

          But it expects the query to be pass and output this:

          C1 |C2
          ----------------------
          1 |c
          2 |b
          3 |a
          4 |c

          So, I think we need to modify this test, isn't it? Isn't this the place I need to add additional queries to test the functionality of the modified code?

          Thanks.

          Show
          Nirmal Fernando added a comment - Thanks Bryan for help me out to read this. So, I found that the line 1736 is referred to the query: select distinct c1, c2 from t1 order by c1+1; According to the standards this should fail with the error I created: ERROR 42879: The ORDER BY clause may not contain column 'expression', since the query specifies DISTINCT and that column does not appear in the query result. But it expects the query to be pass and output this: C1 |C2 ---------------------- 1 |c 2 |b 3 |a 4 |c So, I think we need to modify this test, isn't it? Isn't this the place I need to add additional queries to test the functionality of the modified code? Thanks.
          Hide
          Bryan Pendleton added a comment -

          It sounds like your proposed patch would cause
          a query which previously passed to get an error instead.

          This is the sort of thing that we always think very carefully
          about, because people may be currently using queries
          like this, and depending on them, and so we are hesitant
          to disrupt functionality that is currently operational for fear
          of breaking existing applications.

          In this particular case, we can see by using the Subversion "annotate" feature
          http://svn.apache.org/viewvc/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/orderby.out?view=annotate
          that the particular query was changed by DERBY-2351 in revision 637529:
          http://svn.apache.org/viewvc/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/orderby.out?r1=637528&r2=637529&

          I think that the next step is probably to have a close review of the
          DERBY-2351 notes and try to understand why the DERBY-2351 changes
          are in conflict with the changes in your proposed patch.

          It's quite possible that your patch is implementing the desired behavior,
          but it would better still if we can understand why DERBY-2351 felt that
          the other behavior was desirable.

          Show
          Bryan Pendleton added a comment - It sounds like your proposed patch would cause a query which previously passed to get an error instead. This is the sort of thing that we always think very carefully about, because people may be currently using queries like this, and depending on them, and so we are hesitant to disrupt functionality that is currently operational for fear of breaking existing applications. In this particular case, we can see by using the Subversion "annotate" feature http://svn.apache.org/viewvc/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/orderby.out?view=annotate that the particular query was changed by DERBY-2351 in revision 637529: http://svn.apache.org/viewvc/db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/orderby.out?r1=637528&r2=637529& I think that the next step is probably to have a close review of the DERBY-2351 notes and try to understand why the DERBY-2351 changes are in conflict with the changes in your proposed patch. It's quite possible that your patch is implementing the desired behavior, but it would better still if we can understand why DERBY-2351 felt that the other behavior was desirable.
          Hide
          Nirmal Fernando added a comment -

          Hi Bryan,

          Okey, I understood the importance.

          I will study on DERBY-2351.

          Show
          Nirmal Fernando added a comment - Hi Bryan, Okey, I understood the importance. I will study on DERBY-2351 .
          Hide
          Nirmal Fernando added a comment -

          Hi Bryan,

          As I understood after going through DERBY-2351, we must allow queries like:

          select distinct i from t order by i*2

          should be allow as this functionality may have been using by Derby users, in there applications. Derby allows this is because i is there in the distinct even though i*2 is not there.

          So, if we follow this rule it implies that:
          select distinct * from t order by i*2
          is also allow.

          To check whether a column reference in a distinct is contained in this order by column I have to perform a check for each operand in both ways (j*2 and 2*j).

          I have attached a patch proposal. Your reviews are highly appreciated.

          Note: This patch is not tend to handle queries like below:
          select distinct i from t order by random()
          select distinct i from t order by j*2*3;

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Bryan, As I understood after going through DERBY-2351 , we must allow queries like: select distinct i from t order by i*2 should be allow as this functionality may have been using by Derby users, in there applications. Derby allows this is because i is there in the distinct even though i*2 is not there. So, if we follow this rule it implies that: select distinct * from t order by i*2 is also allow. To check whether a column reference in a distinct is contained in this order by column I have to perform a check for each operand in both ways (j*2 and 2*j). I have attached a patch proposal. Your reviews are highly appreciated. Note: This patch is not tend to handle queries like below: select distinct i from t order by random() select distinct i from t order by j*2*3; Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          Hi Nirmal,

          If I understand correctly, the patch only addresses the case with a binary operator with one numeric constant operand and one result column operand. Unfortunately, that's just one out of an infinite number of possible shapes the ORDER BY clause could take, and we cannot address each one of those cases individually, so I think we should aim for a more general approach.

          Perhaps this would work:

          When DISTINCT is specified, do this for each OrderByColumn:

          1) Check if there's a column in the target's result column list with an equivalent expression (can be checked with the isEquivalent() method in ValueNode). If there is one, this OrderByColumn is OK.

          2) Otherwise, collect all columns referenced by the expression in the OrderByColumn (CollectNodesVisitor could probably do this for you), and check if all the columns are also in the target's result column list. If they all are, the OrderByColumn is OK. If not, throw an exception.

          Show
          Knut Anders Hatlen added a comment - Hi Nirmal, If I understand correctly, the patch only addresses the case with a binary operator with one numeric constant operand and one result column operand. Unfortunately, that's just one out of an infinite number of possible shapes the ORDER BY clause could take, and we cannot address each one of those cases individually, so I think we should aim for a more general approach. Perhaps this would work: When DISTINCT is specified, do this for each OrderByColumn: 1) Check if there's a column in the target's result column list with an equivalent expression (can be checked with the isEquivalent() method in ValueNode). If there is one, this OrderByColumn is OK. 2) Otherwise, collect all columns referenced by the expression in the OrderByColumn (CollectNodesVisitor could probably do this for you), and check if all the columns are also in the target's result column list. If they all are, the OrderByColumn is OK. If not, throw an exception.
          Hide
          Nirmal Fernando added a comment -

          Thanks Knut, for reviewing the patch, yes me too felt that it is too generic and thanks for suggesting me possible two ways, that was really helpful. I think the most appropriate way is the second way, since we want queries like select distinct c1, c2 from t1 order by c1+1 to be valid.

          The patch DERBY-4371-4.diff contains the new modifications. Highly appreciate reviews from all of you. Thanks.

          I am now running the tests.

          Show
          Nirmal Fernando added a comment - Thanks Knut, for reviewing the patch, yes me too felt that it is too generic and thanks for suggesting me possible two ways, that was really helpful. I think the most appropriate way is the second way, since we want queries like select distinct c1, c2 from t1 order by c1+1 to be valid. The patch DERBY-4371 -4.diff contains the new modifications. Highly appreciate reviews from all of you. Thanks. I am now running the tests.
          Hide
          Knut Anders Hatlen added a comment - - edited

          Hi Nirmal,

          Sorry for not being clear. I didn't mean to suggest two different approaches, but one approach with two steps. If we only implement step 2, we won't allow statements like this one (which are allowed according to the standard):

          SELECT DISTINCT C1+1 FROM T ORDER BY C1+1

          Show
          Knut Anders Hatlen added a comment - - edited Hi Nirmal, Sorry for not being clear. I didn't mean to suggest two different approaches, but one approach with two steps. If we only implement step 2, we won't allow statements like this one (which are allowed according to the standard): SELECT DISTINCT C1+1 FROM T ORDER BY C1+1
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          It's okey. With a slight modification I managed to use 2) and it worked fine.
          Method:
          It checks the ColumnReferences in distinct clause (using CollectNodesVisitor ), with ColumnReferences in OrderByColumn.

          This successfully allow the above query.

          Any thoughts of this approach?

          Show
          Nirmal Fernando added a comment - Hi Knut, It's okey. With a slight modification I managed to use 2) and it worked fine. Method: It checks the ColumnReferences in distinct clause (using CollectNodesVisitor ), with ColumnReferences in OrderByColumn. This successfully allow the above query. Any thoughts of this approach?
          Hide
          Knut Anders Hatlen added a comment -

          Hi Nirmal,

          After a quick look at the patch, I have these comments:

          I don't think the CollectNodesVisitor should be used on the expressions in the select list, as that would allow too many queries to be executed. For example, the #4 patch will allow this statement:

          select distinct i*j from t1 order by i/j

          This statement should be rejected because none of the columns I or J, or the expression I/J are contained directly in the select list (both I and J are in an expression in the select list, but not on the top level, so there's no functional dependency between the columns in the select list and the expression in the order by clause).

          I'm not sure if it's enough to check the name of the columns to see if it's the same column, as they may refer to columns in different tables. Perhaps ColumnReference.isEquivalent() would work better?

          I don't think column names should be compared with String.equalsIgnoreCase(), as column names are case sensitive. For example, I think this select statement is supposed to fail because "I" is not the same column as "i":

          ij> create table t2("I" int , "i" int);
          0 rows inserted/updated/deleted
          ij> select distinct "I"+1 from t2 order by "i"+1;
          1
          -----------

          0 rows selected

          Show
          Knut Anders Hatlen added a comment - Hi Nirmal, After a quick look at the patch, I have these comments: I don't think the CollectNodesVisitor should be used on the expressions in the select list, as that would allow too many queries to be executed. For example, the #4 patch will allow this statement: select distinct i*j from t1 order by i/j This statement should be rejected because none of the columns I or J, or the expression I/J are contained directly in the select list (both I and J are in an expression in the select list, but not on the top level, so there's no functional dependency between the columns in the select list and the expression in the order by clause). I'm not sure if it's enough to check the name of the columns to see if it's the same column, as they may refer to columns in different tables. Perhaps ColumnReference.isEquivalent() would work better? I don't think column names should be compared with String.equalsIgnoreCase(), as column names are case sensitive. For example, I think this select statement is supposed to fail because "I" is not the same column as "i": ij> create table t2("I" int , "i" int); 0 rows inserted/updated/deleted ij> select distinct "I"+1 from t2 order by "i"+1; 1 ----------- 0 rows selected
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          Thanks for the comments.

          I'll look into your first and second comments.

          I printed out a column name given by ColumnReference.getColumnName(), though the column in the table is 'i' it shows as 'I'. To confirm this I created a new table with columns 's' and inserted some values to it, and issued select * command, and the result was displayed as 'S'. That's why I used that method, but now only I knew that we can issue case sensitive column names, by adding within a quotation mark(").

          Show
          Nirmal Fernando added a comment - Hi Knut, Thanks for the comments. I'll look into your first and second comments. I printed out a column name given by ColumnReference.getColumnName(), though the column in the table is 'i' it shows as 'I'. To confirm this I created a new table with columns 's' and inserted some values to it, and issued select * command, and the result was displayed as 'S'. That's why I used that method, but now only I knew that we can issue case sensitive column names, by adding within a quotation mark(").
          Hide
          Nirmal Fernando added a comment -

          Hi,

          I'll resume my work on this issue.

          But I've a small doubt.
          Should we allow following type of queries:
          "select distinct i from t order by i/j;"

          thanks.

          Show
          Nirmal Fernando added a comment - Hi, I'll resume my work on this issue. But I've a small doubt. Should we allow following type of queries: "select distinct i from t order by i/j;" thanks.
          Hide
          Lily Wei added a comment -

          Hi Nirmal:
          I am new to this issue. However, will "select distinct i from t order by i/j" fits in to 2) collect all columns referenced by the expression in the OrderByColumn (CollectNodesVisitor could probably do this for you), and check if all the columns are also in the target's result column list. If they all are, the OrderByColumn is OK. If not, throw an exception. "j" is not in the column of target's result column list so we throw an exception.

          Show
          Lily Wei added a comment - Hi Nirmal: I am new to this issue. However, will "select distinct i from t order by i/j" fits in to 2) collect all columns referenced by the expression in the OrderByColumn (CollectNodesVisitor could probably do this for you), and check if all the columns are also in the target's result column list. If they all are, the OrderByColumn is OK. If not, throw an exception. "j" is not in the column of target's result column list so we throw an exception.
          Hide
          Nirmal Fernando added a comment -

          Hi Lily,

          Thanks for the reply.

          I got confused with Knut's last post

          i.e. select distinct i*j from t1 order by i/j
          This statement should be rejected because none of the columns I or J, or the expression I/J are contained directly in the select list (both I and J are in an expression in the select list, but not on the top level, so there's no functional dependency between the columns in the select list and the expression in the order by clause).

          What you think? Isn't this implies as it's enough to have I OR J? May be Knut accidentally mistyped it.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Lily, Thanks for the reply. I got confused with Knut's last post i.e. select distinct i*j from t1 order by i/j This statement should be rejected because none of the columns I or J, or the expression I/J are contained directly in the select list (both I and J are in an expression in the select list, but not on the top level, so there's no functional dependency between the columns in the select list and the expression in the order by clause). What you think? Isn't this implies as it's enough to have I OR J? May be Knut accidentally mistyped it. Thanks.
          Hide
          Nirmal Fernando added a comment -

          Hi,

          I'm attaching patch #5 which is capable of fixing this issue.
          It would be nice if someone can review this.

          PS: This will allow a query like
          select distinct i from t order by i/j;
          for now. After I get clarified whether this type of
          query is valid, I'll update the patch.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi, I'm attaching patch #5 which is capable of fixing this issue. It would be nice if someone can review this. PS: This will allow a query like select distinct i from t order by i/j; for now. After I get clarified whether this type of query is valid, I'll update the patch. Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          > Should we allow following type of queries:
          > "select distinct i from t order by i/j;"

          No, this query should not be allowed. Consider the table

          i | j | i/j
          ---------------
          100 | 10 | 10
          100 | 100 | 1
          200 | 40 | 5
          3 | 1 | 3

          Here, the ORDER BY clause is ambiguous, because it's not clear whether
          100 should be the first value or the last value in the result.

          Show
          Knut Anders Hatlen added a comment - > Should we allow following type of queries: > "select distinct i from t order by i/j;" No, this query should not be allowed. Consider the table i | j | i/j --------------- 100 | 10 | 10 100 | 100 | 1 200 | 40 | 5 3 | 1 | 3 Here, the ORDER BY clause is ambiguous, because it's not clear whether 100 should be the first value or the last value in the result.
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          Thanks for the explanation!
          I'll update the patch soon!

          Show
          Nirmal Fernando added a comment - Hi Knut, Thanks for the explanation! I'll update the patch soon!
          Hide
          Nirmal Fernando added a comment -

          Attaching the updated patch. Following our few queries which I've tested:

          Failing queries:
          select distinct i from t order by j;
          select distinct i from t order by j*2;
          select distinct i,j*3 from t order by j*2;
          select distinct i from t order by i*2,j*3;
          select distinct i*j from t order by i/j;
          select distinct i from t order by -j;
          select distinct i from t order by i/j;

          Passing queries:
          select distinct i,j from t order by j;
          select distinct i,j from t order by j*2;
          select distinct i,j*2 from t order by j*2;
          select distinct i,j from t order by i*2,j*3;
          select distinct * from t order by j*2, i*3;

          Thanks.

          Show
          Nirmal Fernando added a comment - Attaching the updated patch. Following our few queries which I've tested: Failing queries: select distinct i from t order by j; select distinct i from t order by j*2; select distinct i,j*3 from t order by j*2; select distinct i from t order by i*2,j*3; select distinct i*j from t order by i/j; select distinct i from t order by -j; select distinct i from t order by i/j; Passing queries: select distinct i,j from t order by j; select distinct i,j from t order by j*2; select distinct i,j*2 from t order by j*2; select distinct i,j from t order by i*2,j*3; select distinct * from t order by j*2, i*3; Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for the patch, Nirmal. This looks promising! I found the new logic in bindOrderByColumn() a bit hard to follow, though, mainly because of the many nesting levels. Do you think it is possible to factor out some of that code into helper methods to improve the readability?

          It would be great if you could add the queries you posted above to one of the regression tests (both the ones that are expected to work, and those that are expected to fail).

          Show
          Knut Anders Hatlen added a comment - Thanks for the patch, Nirmal. This looks promising! I found the new logic in bindOrderByColumn() a bit hard to follow, though, mainly because of the many nesting levels. Do you think it is possible to factor out some of that code into helper methods to improve the readability? It would be great if you could add the queries you posted above to one of the regression tests (both the ones that are expected to work, and those that are expected to fail).
          Hide
          Knut Anders Hatlen added a comment -

          bindOrderByColumn() doesn't seem to handle the case where the CollectNodeVisitor returns an empty list. For example, these two queries used to work, but now they fail:

          ij> select distinct i from t order by 1+1;
          ERROR 42879: The ORDER BY clause may not contain column 'null', since the query specifies DISTINCT and that column does not appear in the query result.
          ij> select distinct i from t order by random();
          ERROR 42879: The ORDER BY clause may not contain column 'null', since the query specifies DISTINCT and that column does not appear in the query result.

          Show
          Knut Anders Hatlen added a comment - bindOrderByColumn() doesn't seem to handle the case where the CollectNodeVisitor returns an empty list. For example, these two queries used to work, but now they fail: ij> select distinct i from t order by 1+1; ERROR 42879: The ORDER BY clause may not contain column 'null', since the query specifies DISTINCT and that column does not appear in the query result. ij> select distinct i from t order by random(); ERROR 42879: The ORDER BY clause may not contain column 'null', since the query specifies DISTINCT and that column does not appear in the query result.
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          Thanks for reviewing the patch!

          I have attached the patch #6, this will allow those queries you have mentioned and broke the code into another method for the ease of read.

          PS: I'll attach a different patch including the changes to the orderby.out.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Knut, Thanks for reviewing the patch! I have attached the patch #6, this will allow those queries you have mentioned and broke the code into another method for the ease of read. PS: I'll attach a different patch including the changes to the orderby.out. Thanks.
          Hide
          Nirmal Fernando added a comment -

          Hi,

          Here's the diff of tests added. Tests ran successfully in my environment.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi, Here's the diff of tests added. Tests ran successfully in my environment. Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks Nirmal, the new patch was easier to read. And thanks for adding all the test cases.

          I believe the code is correct, but it's still somewhat tricky to understand, especially because of the nesting levels and the way it breaks out of the loops. Here are some suggestions that I think would make the code easier to follow:

          1) Reduce the nesting level in the else branch in bindOrderByColumn(). Instead of the current code with two levels

          if (addedColumnOffset >= 0 && ....) {
          if (!expressionMatch(target)) {

          merge them into one if statement:

          if (addedColumnOffset >= 0 && ....
          && !expressionMatch(target)) {

          2. Instead of breaking out of the loop in bindOrderByColumn() with the "break" keyword when a non-matching column is found, just throw the exception immediately. Then it's easier to see what the effect of a mismatch is.

          3. Given that the "throw" statement is moved inside the loop as part of comment (2), there's no need to check explicitly whether the list of collected nodes was empty, so "if(collectNodesVisitor.getList().isEmpty())" can be removed.

          4. The results of some sub-expressions could be put in to local variables to avoid having to repeat them. For example, target.getResultColumns() in expressionMatch(), and target.getResultColumns() and target.getResultColumns().getResultColumn.getExpression() in columnMatchFound().

          5. The loop in columnMatchFound() could be simplified by returning true immediately when a match is found, instead of setting a flag and then checking the flag further down in the loop body.

          Show
          Knut Anders Hatlen added a comment - Thanks Nirmal, the new patch was easier to read. And thanks for adding all the test cases. I believe the code is correct, but it's still somewhat tricky to understand, especially because of the nesting levels and the way it breaks out of the loops. Here are some suggestions that I think would make the code easier to follow: 1) Reduce the nesting level in the else branch in bindOrderByColumn(). Instead of the current code with two levels if (addedColumnOffset >= 0 && ....) { if (!expressionMatch(target)) { merge them into one if statement: if (addedColumnOffset >= 0 && .... && !expressionMatch(target)) { 2. Instead of breaking out of the loop in bindOrderByColumn() with the "break" keyword when a non-matching column is found, just throw the exception immediately. Then it's easier to see what the effect of a mismatch is. 3. Given that the "throw" statement is moved inside the loop as part of comment (2), there's no need to check explicitly whether the list of collected nodes was empty, so "if(collectNodesVisitor.getList().isEmpty())" can be removed. 4. The results of some sub-expressions could be put in to local variables to avoid having to repeat them. For example, target.getResultColumns() in expressionMatch(), and target.getResultColumns() and target.getResultColumns().getResultColumn .getExpression() in columnMatchFound(). 5. The loop in columnMatchFound() could be simplified by returning true immediately when a match is found, instead of setting a flag and then checking the flag further down in the loop body.
          Hide
          Nirmal Fernando added a comment -

          Hi Knut,

          Patch #7 is included the modifications that you've suggested.

          Thanks.

          Show
          Nirmal Fernando added a comment - Hi Knut, Patch #7 is included the modifications that you've suggested. Thanks.
          Hide
          Knut Anders Hatlen added a comment -

          Thank you, Nirmal! The #7 patch looks good to me. I've checked it in along with the new test cases.

          Committed revision 989048.

          Show
          Knut Anders Hatlen added a comment - Thank you, Nirmal! The #7 patch looks good to me. I've checked it in along with the new test cases. Committed revision 989048.
          Hide
          Nirmal Fernando added a comment -

          Thanks Knut for committing the patch!!

          Show
          Nirmal Fernando added a comment - Thanks Knut for committing the patch!!
          Hide
          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Show
          Knut Anders Hatlen added a comment - [bulk update] Close all resolved issues that haven't been updated for more than one year.

            People

            • Assignee:
              Nirmal Fernando
              Reporter:
              Bernt M. Johnsen
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development