Derby
  1. Derby
  2. DERBY-4631

Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

    Details

    • Urgency:
      Normal
    • Issue & fix info:
      Release Note Needed, Repro attached
    • Bug behavior facts:
      Deviation from standard, Wrong query result

      Description

      SQL:2003 says that the join columns in a natural join or in a named
      columns join should be added to the select list by coalescing the
      column from the left table with the column from the right table.

      Section 7.7, <joined table>, syntax rules:

      > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
      > reference> or <table factor> that is the second operand of the
      > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
      > respectively. Let TA and TB be the range variables of TR1 and TR2,
      > respectively. (...)

      and

      > 7) If NATURAL is specified or if a <join specification> immediately
      > containing a <named columns join> is specified, then:
      (...)
      > d) If there is at least one corresponding join column, then let SLCC
      > be a <select list> of <derived column>s of the form
      >
      > COALESCE ( TA.C, TB.C ) AS C
      >
      > for every column C that is a corresponding join column, taken in
      > order of their ordinal positions in RT1.

      For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
      rather just TB.C (the column in the right table) directly.

      This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
      invariant in a right outer join. (Because TA.C is either NULL or equal
      to TB.C.)

      However, in a database with territory-based collation, equality
      between two values does not mean they are identical, especially now
      that the strength of the collator can be specified (DERBY-1748).

      Take for instance this join:

      ij> connect 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
      ij> create table big(x varchar(5));
      0 rows inserted/updated/deleted
      ij> insert into big values 'A','B','C';
      3 rows inserted/updated/deleted
      ij> create table small(x varchar(5));
      0 rows inserted/updated/deleted
      ij> insert into small values 'b','c','d';
      3 rows inserted/updated/deleted
      ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2;
      X |X |X |4
      -----------------------
      A |NULL |A |A
      B |b |B |b
      C |c |C |c

      3 rows selected

      I believe that the expected result from the above query is that the
      first column should have the same values as the last column. That is,
      the first column should contain

      {'A', 'b', 'c'}

      , not

      {'A', 'B', 'C'}

      .

      1. releaseNote.html
        3 kB
        Mamta A. Satoor
      2. DERBY_4631_patch7_diff.txt
        49 kB
        Mamta A. Satoor
      3. DERBY_4631_patch6_diff.txt
        47 kB
        Mamta A. Satoor
      4. DERBY_4631_patch5_diff.txt
        13 kB
        Mamta A. Satoor
      5. DERBY_4631_patch4_diff.txt
        11 kB
        Mamta A. Satoor
      6. DERBY_4631_patch3_stat.txt
        0.3 kB
        Mamta A. Satoor
      7. DERBY_4631_patch3_diff.txt
        17 kB
        Mamta A. Satoor
      8. DERBY_4631_not_for_commit_patch2_stat.txt
        0.5 kB
        Mamta A. Satoor
      9. DERBY_4631_not_for_commit_patch2_diff.txt
        13 kB
        Mamta A. Satoor
      10. DERBY_4631_not_for_commit_patch1_stat.txt
        0.4 kB
        Mamta A. Satoor
      11. DERBY_4631_not_for_commit_patch1_diff.txt
        11 kB
        Mamta A. Satoor

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          This problem also makes us accept some statements that should not be accepted. For example:

          SELECT country,count(country) FROM
          COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country

          Here, the query should be rejected because countries.country is not specified in the select list. However, it's accepted because we replace the column country with countries.country instead of coalesce(countries.country, cities.country).

          Show
          Knut Anders Hatlen added a comment - This problem also makes us accept some statements that should not be accepted. For example: SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country Here, the query should be rejected because countries.country is not specified in the select list. However, it's accepted because we replace the column country with countries.country instead of coalesce(countries.country, cities.country).
          Hide
          Mamta A. Satoor added a comment -

          It seems like what we need is to replace the join column references(which are not tied to the join table specifically) in select column list to COALESCE functions. Additionally, in case of a named join order, the join column names in USING clause should be replaced with COALESCE function. eg of each of the cases
          select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2;
          The first column in the select above which is x should be internally replaced with coalesce(t1.x, t2.x)

          SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country
          The first column in the select and the COUNTRY in USING clause should be replaced with coalesce internally.

          I looked at the code a bindExpressions code in impl.sql.compile.JoinNode There, we already have a code to recognize natural join
          if (naturalJoin)

          { usingClause = getCommonColumnsForNaturalJoin(); }

          I think after this code, we should go through join columns in select column list and in USING clause and replace them with corresponding COALESEC functions.

          Show
          Mamta A. Satoor added a comment - It seems like what we need is to replace the join column references(which are not tied to the join table specifically) in select column list to COALESCE functions. Additionally, in case of a named join order, the join column names in USING clause should be replaced with COALESCE function. eg of each of the cases select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2; The first column in the select above which is x should be internally replaced with coalesce(t1.x, t2.x) SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country The first column in the select and the COUNTRY in USING clause should be replaced with coalesce internally. I looked at the code a bindExpressions code in impl.sql.compile.JoinNode There, we already have a code to recognize natural join if (naturalJoin) { usingClause = getCommonColumnsForNaturalJoin(); } I think after this code, we should go through join columns in select column list and in USING clause and replace them with corresponding COALESEC functions.
          Hide
          Mamta A. Satoor added a comment -

          I debugged the code to figure out what (and where) are we doing in the code which causes us to give wrong results for join column in case of territory based database and right outer join with NATURAL or USING clause. As Knut pointed out earlier in this jira, as per the SQL spec, "the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table. " What I have found is that Derby decides to pick up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join. This is not a problem when we are dealing with non-territory based databases but the assumption to rely on just one table's join column is incorrect when working with territory based databases. Following is the test case I used for debugging which further explains Derby's current implementation.

          connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
          create table big(x varchar(5));
          insert into big values 'A','B', null;
          create table small(x varchar(5));
          insert into small values 'b','c', null;
          select * from small t1 natural left outer join big t2;
          select * from small t1 natural right outer join big t2;

          For both natural left outer join and natural right outer join, at execution time, we create a merged row which has columns merged from the left and right tables. The column(in my example, there is only one column)s in the select sql maps to a column from the merged row. The mapping is determined at the sql compile phase.
          In the case of 'select * from small t1 natural left outer join big t2', there will be three merged rows with 2 columns each
          'b' 'B'
          'c' null
          null null
          And for natural left outer join, the generated code has column in the select SQL map to the first column in the merged row. This will always work fine even in a territory based database because as per the SQL standards, the column x should be equivalent to the return value of coalesce(t1.x, t2.x). Since we are working with left outer join, then if the first column in the merged row is null, then even the 2nd column(from the right table) will be null and hence it is ok to always pick up the value from the 1st column in the merged row. This mapping will always lead column x to have the same value as coalesce(t1.x, t2.x).

          But for a territory based database, we can't count on a logic like that for natural right outer join. The way Derby works right now, the column x in the select always gets mapped to the 2nd column in the merged row. In the case of 'select * from small t1 natural right outer join big t2', there will be three merged rows with 2 columns each
          null 'A'
          'b' 'B'
          null null
          And for natural right outer join, the generated code has column in the select SQL map to the second column in the merged row. This will work fine in a non-territory database, because if column 1 in the merged row has a non-null value, then it will always be the same value as the column 2 in the merged row. But in our example, with territor based database(with SECONDARY strength, meaning it is case insensitive comparison), values 'B' and 'b' are considered equal. Hence the coalesce(t1.x, t2,x) will not be same as value in the 2nd column of the merged row. For natural right outer join with the data given in the example above,
          coalesce(t1.x, t2,x) will return 'A', 'b' and null. But with the mapping of column x in the SELECT to the 2nd column in the merged row will return 'A', 'B' and null thus returning data which does not comply with SQL standard which says that column x's value should be the return value of coalesce(t1.x, t2.x). So it seems like may be we need some of kind projection in case of natural right outer join (rather than simple column mapping to the 2nd column which is what happens right now) so that we look at both the columns in the merged row to determine the value of column x.

          Hope this explanation helps understand what Derby is doing internally and based on that, we can come up with some proposal to fix the issue.

          Show
          Mamta A. Satoor added a comment - I debugged the code to figure out what (and where) are we doing in the code which causes us to give wrong results for join column in case of territory based database and right outer join with NATURAL or USING clause. As Knut pointed out earlier in this jira, as per the SQL spec, "the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table. " What I have found is that Derby decides to pick up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join. This is not a problem when we are dealing with non-territory based databases but the assumption to rely on just one table's join column is incorrect when working with territory based databases. Following is the test case I used for debugging which further explains Derby's current implementation. connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY'; create table big(x varchar(5)); insert into big values 'A','B', null; create table small(x varchar(5)); insert into small values 'b','c', null; select * from small t1 natural left outer join big t2; select * from small t1 natural right outer join big t2; For both natural left outer join and natural right outer join, at execution time, we create a merged row which has columns merged from the left and right tables. The column(in my example, there is only one column)s in the select sql maps to a column from the merged row. The mapping is determined at the sql compile phase. In the case of 'select * from small t1 natural left outer join big t2', there will be three merged rows with 2 columns each 'b' 'B' 'c' null null null And for natural left outer join, the generated code has column in the select SQL map to the first column in the merged row. This will always work fine even in a territory based database because as per the SQL standards, the column x should be equivalent to the return value of coalesce(t1.x, t2.x). Since we are working with left outer join, then if the first column in the merged row is null, then even the 2nd column(from the right table) will be null and hence it is ok to always pick up the value from the 1st column in the merged row. This mapping will always lead column x to have the same value as coalesce(t1.x, t2.x). But for a territory based database, we can't count on a logic like that for natural right outer join. The way Derby works right now, the column x in the select always gets mapped to the 2nd column in the merged row. In the case of 'select * from small t1 natural right outer join big t2', there will be three merged rows with 2 columns each null 'A' 'b' 'B' null null And for natural right outer join, the generated code has column in the select SQL map to the second column in the merged row. This will work fine in a non-territory database, because if column 1 in the merged row has a non-null value, then it will always be the same value as the column 2 in the merged row. But in our example, with territor based database(with SECONDARY strength, meaning it is case insensitive comparison), values 'B' and 'b' are considered equal. Hence the coalesce(t1.x, t2,x) will not be same as value in the 2nd column of the merged row. For natural right outer join with the data given in the example above, coalesce(t1.x, t2,x) will return 'A', 'b' and null. But with the mapping of column x in the SELECT to the 2nd column in the merged row will return 'A', 'B' and null thus returning data which does not comply with SQL standard which says that column x's value should be the return value of coalesce(t1.x, t2.x). So it seems like may be we need some of kind projection in case of natural right outer join (rather than simple column mapping to the 2nd column which is what happens right now) so that we look at both the columns in the merged row to determine the value of column x. Hope this explanation helps understand what Derby is doing internally and based on that, we can come up with some proposal to fix the issue.
          Hide
          Mamta A. Satoor added a comment -

          As mentioned earlier in this jira, the SQL spec says that the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table. Derby implements the SQL spec by using left table's column value when doing natural left outer join. For natural right outer join, Derby uses the right
          table's join column value. This logic correctly implements the SQL spec requirements for both left and right outer joins for non-territory based database and for left outer join for territory based database but the logic for the natural right outer join doesn't work for territory based database. Taking the example from earlier in this jira
          create table big(x varchar(5));
          insert into big values 'A','B';
          create table small(x varchar(5));
          insert into small values 'b','c';

          For this data, following shows that selecting the join column doesn't give the same results as coalesce(t1.x, t2.x)
          select x, t1.x t1x, t2.x t2x, coalesce(t1.x, t2.x) cx from small t1 natural right outer join big t2;
          X |T1X |T2X |CX
          --------------------------------------------------------------------
          A |NULL |A |A
          B |b |B |b

          For the 1st row above, coalesce(t1.x, t2.x) will return 'A' and that is what we got for the join column in that row, so we are good for the 1st row. But for the 2nd row, coalesce(t1.x, t2.x) will return 'b' whereas the join column for that row shows 'B'. This is because as per Derby's implementation, for natural right outer join, we just pick the value from the right table row which for the 2nd row happens to be 'B'.

          We can leave the logic as it is for natural left outer joins since it works fine for both territory and non-terrtory based databases. We can also leave the logic untouched for natural right outer joins for non-territory based databases. The only broken case is natural right outer join in case of territory based database. For this specific case, we can generate a project restrict resultset which will pick the join column's value based on following logic
          1)if the left table's column value is null then pick up the right table's column's value.
          2)If the left table's column value is non-null, then pick up that value

          I have not done much work in code generation and hence wanted to run this logic by the community to see if anyone has any feedback and if this looks like the correct approach to solve the problem. Any suggestions on alternative/better fix?

          Show
          Mamta A. Satoor added a comment - As mentioned earlier in this jira, the SQL spec says that the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table. Derby implements the SQL spec by using left table's column value when doing natural left outer join. For natural right outer join, Derby uses the right table's join column value. This logic correctly implements the SQL spec requirements for both left and right outer joins for non-territory based database and for left outer join for territory based database but the logic for the natural right outer join doesn't work for territory based database. Taking the example from earlier in this jira create table big(x varchar(5)); insert into big values 'A','B'; create table small(x varchar(5)); insert into small values 'b','c'; For this data, following shows that selecting the join column doesn't give the same results as coalesce(t1.x, t2.x) select x, t1.x t1x, t2.x t2x, coalesce(t1.x, t2.x) cx from small t1 natural right outer join big t2; X |T1X |T2X |CX -------------------------------------------------------------------- A |NULL |A |A B |b |B |b For the 1st row above, coalesce(t1.x, t2.x) will return 'A' and that is what we got for the join column in that row, so we are good for the 1st row. But for the 2nd row, coalesce(t1.x, t2.x) will return 'b' whereas the join column for that row shows 'B'. This is because as per Derby's implementation, for natural right outer join, we just pick the value from the right table row which for the 2nd row happens to be 'B'. We can leave the logic as it is for natural left outer joins since it works fine for both territory and non-terrtory based databases. We can also leave the logic untouched for natural right outer joins for non-territory based databases. The only broken case is natural right outer join in case of territory based database. For this specific case, we can generate a project restrict resultset which will pick the join column's value based on following logic 1)if the left table's column value is null then pick up the right table's column's value. 2)If the left table's column value is non-null, then pick up that value I have not done much work in code generation and hence wanted to run this logic by the community to see if anyone has any feedback and if this looks like the correct approach to solve the problem. Any suggestions on alternative/better fix?
          Hide
          Knut Anders Hatlen added a comment -

          If that approach doesn't work, another possibility might be to insert a CoalesceFunctionNode in JoinNode.buildRCL() or perhaps in JoinNode.getAllResultColumns().

          Show
          Knut Anders Hatlen added a comment - If that approach doesn't work, another possibility might be to insert a CoalesceFunctionNode in JoinNode.buildRCL() or perhaps in JoinNode.getAllResultColumns().
          Hide
          Mamta A. Satoor added a comment -

          Want to copy the commit comments from the revision r1230873. After the commit comments, I have proposal for 2 possible fixes for the problems described in the commit comments.
          *************************************
          DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

          I am adding more tests for this jira to show the kind of joins and queries that are impacted by this defect.

          In short, any join query which is either a NATURAL join or has a USING clause can run into the two problems as described in this jira.

          Problem 1)As per SQL spec, the join column's value should be derived by
          COALESCE(leftTable.joinColumn, rightTable.joinColumn).
          But Derby has it's own rule for deriving the join column's value. Derby's implementation, for a right outer join, gets the join column's value from the right table and for left outer join, it gets the value from the left table. This logic works for most cases, but it can give incorrect value for a right outer join (with NATURAL JOIN or USING clause) in case of a territory based database. Additionally, the join column in the SELECT list(with NATURAL JOIN or USING clause) gets associated with the left table's join column(for inner joins and left outer joins) or it gets associated with the right table's join column(for rihgt outer joins). Since SQL spec requires the join column to be COALESCE ( leftTable.C, rightTable.C ) AS C, the join column should not be really associated with any of the 2 join tables.

          Problem 2)The Derby's assocation of join column to left or right table as described in problem 1) causes it to allow incorrect queries. eg query
          select i from t1_D3880 inner join t2_D3880 USING group by t1_D3880.i;
          The query above works because join column i got associated with left table which is t1_D3880. If the query was rewritten to do the group by on right table, it would fail.
          select i from t1_D3880 inner join t2_D3880 USING group by t2_D3880.i;
          *************************************

          There are 2 solutions that we have talked about in the jira.
          Solution 1)Currently, at execution time, Derby creates a merged row for join column which has columns merged from the left and right tables and picks up the 1st column's value for a left outer join and picks up the 2nd column's value for right outer join. (I don't remember debugging the inner join case but from it's behavior, it probably also picks up the value from the 1st column's value just like left outer join).
          Rather than this logic, we can change the code to pick the join column's value based on following logic
          1)if the left table's column value is null then pick up the right table's column's value.
          2)If the left table's column value is non-null, then pick up that value
          We can implement this logic for all kinds of joins except the cross join(which doesn't allow NATURAL JOIN or USING clause and hence we will never be able to run in the problem described by this jira or we can implement the logic only for right outer joins which is the only case which will run into problem because we do not implement COALESCE. The problem with this solution is it will not catch Problem 2) described above because join column is getting associated with left table or right table depending on what kind of join we are working with.
          Solution 2)This solution will take care of all the problems described earlier. With this solution, at bind time, we should replace the join column with COALESCE as described by SQL spec. This is a cleaner solution then solution 1) because it takes care of all the problem cases but it can cause existing queries with group by using table name for association for join columns will stop working and will have to be rewritten.

          I lean towards solution 2) for it's cleanliness and direct implementation of SQL spec COALESCE behavior for join columns but I am interested to know what are the community's thoughts on this.

          Show
          Mamta A. Satoor added a comment - Want to copy the commit comments from the revision r1230873. After the commit comments, I have proposal for 2 possible fixes for the problems described in the commit comments. ************************************* DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation I am adding more tests for this jira to show the kind of joins and queries that are impacted by this defect. In short, any join query which is either a NATURAL join or has a USING clause can run into the two problems as described in this jira. Problem 1)As per SQL spec, the join column's value should be derived by COALESCE(leftTable.joinColumn, rightTable.joinColumn). But Derby has it's own rule for deriving the join column's value. Derby's implementation, for a right outer join, gets the join column's value from the right table and for left outer join, it gets the value from the left table. This logic works for most cases, but it can give incorrect value for a right outer join (with NATURAL JOIN or USING clause) in case of a territory based database. Additionally, the join column in the SELECT list(with NATURAL JOIN or USING clause) gets associated with the left table's join column(for inner joins and left outer joins) or it gets associated with the right table's join column(for rihgt outer joins). Since SQL spec requires the join column to be COALESCE ( leftTable.C, rightTable.C ) AS C, the join column should not be really associated with any of the 2 join tables. Problem 2)The Derby's assocation of join column to left or right table as described in problem 1) causes it to allow incorrect queries. eg query select i from t1_D3880 inner join t2_D3880 USING group by t1_D3880.i; The query above works because join column i got associated with left table which is t1_D3880. If the query was rewritten to do the group by on right table, it would fail. select i from t1_D3880 inner join t2_D3880 USING group by t2_D3880.i; ************************************* There are 2 solutions that we have talked about in the jira. Solution 1)Currently, at execution time, Derby creates a merged row for join column which has columns merged from the left and right tables and picks up the 1st column's value for a left outer join and picks up the 2nd column's value for right outer join. (I don't remember debugging the inner join case but from it's behavior, it probably also picks up the value from the 1st column's value just like left outer join). Rather than this logic, we can change the code to pick the join column's value based on following logic 1)if the left table's column value is null then pick up the right table's column's value. 2)If the left table's column value is non-null, then pick up that value We can implement this logic for all kinds of joins except the cross join(which doesn't allow NATURAL JOIN or USING clause and hence we will never be able to run in the problem described by this jira or we can implement the logic only for right outer joins which is the only case which will run into problem because we do not implement COALESCE. The problem with this solution is it will not catch Problem 2) described above because join column is getting associated with left table or right table depending on what kind of join we are working with. Solution 2)This solution will take care of all the problems described earlier. With this solution, at bind time, we should replace the join column with COALESCE as described by SQL spec. This is a cleaner solution then solution 1) because it takes care of all the problem cases but it can cause existing queries with group by using table name for association for join columns will stop working and will have to be rewritten. I lean towards solution 2) for it's cleanliness and direct implementation of SQL spec COALESCE behavior for join columns but I am interested to know what are the community's thoughts on this.
          Hide
          Mike Matrigali added a comment -

          So far you have added a number of tests, and some of the comments seem to indicate some of the tests
          are coded so they pass, but that they are actually exhibiting the wrong behavior. I find this confusing, but
          know you prefer this methodology. Could you at least make it clearer in the test comments somehow which
          of the test cases are actually bugs.

          Show
          Mike Matrigali added a comment - So far you have added a number of tests, and some of the comments seem to indicate some of the tests are coded so they pass, but that they are actually exhibiting the wrong behavior. I find this confusing, but know you prefer this methodology. Could you at least make it clearer in the test comments somehow which of the test cases are actually bugs.
          Hide
          Mike Matrigali added a comment -

          I am concerned by the performance characteristics of you proposed solution #2. Could you go into more detail on what it involves and what
          set of queries it will affect. I am most interested in what happens at execution time for a large join. I think my most basic question is in a
          1 to 1 million row join will you be adding 1 million new function calls for the coalesce, or is this somehow optimized after bind? I am assuming
          that the reason the code does not currently do an explicit coalesce already is an optimization, where it was assumed the implementation
          behavior would match the external behavior that the spec is describing.

          Solution 1 seems safer to me, and if implemented we should log the SQL syntax problems you have uncovered in a separate JIRA.

          Also can you explicitly give some queries that currently work today that will not after your change, so that the compatibility impact of your
          solutions can be understood. Do those queries return correct results? Once you list a few of these queries maybe we can try them on other
          platforms to verify that they are incorrect SQL.

          Show
          Mike Matrigali added a comment - I am concerned by the performance characteristics of you proposed solution #2. Could you go into more detail on what it involves and what set of queries it will affect. I am most interested in what happens at execution time for a large join. I think my most basic question is in a 1 to 1 million row join will you be adding 1 million new function calls for the coalesce, or is this somehow optimized after bind? I am assuming that the reason the code does not currently do an explicit coalesce already is an optimization, where it was assumed the implementation behavior would match the external behavior that the spec is describing. Solution 1 seems safer to me, and if implemented we should log the SQL syntax problems you have uncovered in a separate JIRA. Also can you explicitly give some queries that currently work today that will not after your change, so that the compatibility impact of your solutions can be understood. Do those queries return correct results? Once you list a few of these queries maybe we can try them on other platforms to verify that they are incorrect SQL.
          Hide
          Mamta A. Satoor added a comment -

          Revision 1231296 has addressed Mike's concern about the tests. The commit comment was as follows
          ******************************
          DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

          Adding comments to some of the tests, stating that Derby is allowing incorrect behavior because of DERBY-4631. Once the jira is fixed, these tests should start failing and would need to be changed to show the correct behavior.
          ******************************

          Show
          Mamta A. Satoor added a comment - Revision 1231296 has addressed Mike's concern about the tests. The commit comment was as follows ****************************** DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation Adding comments to some of the tests, stating that Derby is allowing incorrect behavior because of DERBY-4631 . Once the jira is fixed, these tests should start failing and would need to be changed to show the correct behavior. ******************************
          Hide
          Mamta A. Satoor added a comment -

          Here are answers to some of Mike's comments
          ********************************************************
          Mike's comment
          I am concerned by the performance characteristics of you proposed solution #2. Could you go into more detail on what it involves and what set of queries it will affect
          ********************************************************
          As part of solution #2, I am proposing that during the bind phase, we are going through the join query's column list, we replace the join column in the list with coalesce function and make sure that newly added coalesce node to the select list gets bound. This node will generate a call to following method in DataTypejava
          public DataValueDescriptor coalesce(DataValueDescriptor[] argumentsList, DataValueDescriptor returnValue)
          throws StandardException
          {
          // arguments list should have at least 2 arguments
          if (SanityManager.DEBUG)

          { SanityManager.ASSERT(argumentsList != null, "argumentsList expected to be non-null"); SanityManager.ASSERT(argumentsList.length > 1, "argumentsList.length expected to be > 1"); }

          /* Walk the arguments list until we find a non-null value. Otherwise we will return null
          */
          int index;
          for (index = 0; index < argumentsList.length; index++)
          {
          if (!(argumentsList[index].isNull()))

          { returnValue.setValue(argumentsList[index]); return returnValue; }

          }

          returnValue.setToNull();
          return returnValue;

          }
          In our case, there will always be only 2 arguments to the coalesce function and the first non-null value will have us return with that value from the method and if for some reason, if both the values are null, then we will return null value. All this work will be done for joins using NATURAL JOINS or USING clause which can happen for inner joins, left outer join and right outer joins. eg of each of these kind of joins
          CREATE TABLE derby4631_t1(x varchar(5));
          INSERT INTO derby4631_t1 VALUES 'A','B';
          CREATE TABLE derby4631_t2(x varchar(5));
          INSERT INTO derby4631_t2 VALUES 'b','c';

          SELECT x FROM derby4631_t2 NATURAL INNER JOIN derby4631_t1;
          SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;
          SELECT x FROM derby4631_t2 NATURAL LEFT OUTER JOIN derby4631_t1;
          select x from derby4631_t2 INNER JOIN derby4631_t1 USING;
          select x from derby4631_t2 RIGHT OUTERJOIN derby4631_t1 USING;
          select x from derby4631_t2 LEFT OUTERJOIN derby4631_t1 USING;

          Currently, at execution time, Derby already does special handling to figure out if it is dealing with left outer join or right outer join and based on that, it determines whether it should pick up the value from the left column or the right column in the merged row(consisting of 2 columns, join column from left and join column from right) for each of the join columns. Instead of this, now we will let coalesce pick up
          1)the left column's value if it is not null. If it is null then
          2)pick up the right column's value if it is not null. If it is null then simply return null.

          So, there is special code happening with both the existing Derby behavior and proposed Derby behavior but at this point, I am not sure how much more/same/less performance impact the new changes might cause. This is all in theory at this point. I have never worked on the code to replace a user supplied column from the select list with another kind of node at bind time, so if we do decide to go this path, I will be greatful to have community's knowledge on this kind of change. I will work on it my own too to figure out what needs to be done if we decide to go with this solution.

          ********************************************************
          Mike's comment
          I am assuming that the reason the code does not currently do an explicit coalesce already is an optimization, where it was assumed the implementation behavior would match the external behavior that the spec is describing,
          ********************************************************
          You are probably right Mike, but I am not sure if coalesce functionality was available in Derby when joins were implemented. It could very well be that we had coalesce available but we didn't use it for optimization reasons but I just wanted to raise that it might be a possibility that we never have had coalesce available.

          Show
          Mamta A. Satoor added a comment - Here are answers to some of Mike's comments ******************************************************** Mike's comment I am concerned by the performance characteristics of you proposed solution #2. Could you go into more detail on what it involves and what set of queries it will affect ******************************************************** As part of solution #2, I am proposing that during the bind phase, we are going through the join query's column list, we replace the join column in the list with coalesce function and make sure that newly added coalesce node to the select list gets bound. This node will generate a call to following method in DataTypejava public DataValueDescriptor coalesce(DataValueDescriptor[] argumentsList, DataValueDescriptor returnValue) throws StandardException { // arguments list should have at least 2 arguments if (SanityManager.DEBUG) { SanityManager.ASSERT(argumentsList != null, "argumentsList expected to be non-null"); SanityManager.ASSERT(argumentsList.length > 1, "argumentsList.length expected to be > 1"); } /* Walk the arguments list until we find a non-null value. Otherwise we will return null */ int index; for (index = 0; index < argumentsList.length; index++) { if (!(argumentsList [index] .isNull())) { returnValue.setValue(argumentsList[index]); return returnValue; } } returnValue.setToNull(); return returnValue; } In our case, there will always be only 2 arguments to the coalesce function and the first non-null value will have us return with that value from the method and if for some reason, if both the values are null, then we will return null value. All this work will be done for joins using NATURAL JOINS or USING clause which can happen for inner joins, left outer join and right outer joins. eg of each of these kind of joins CREATE TABLE derby4631_t1(x varchar(5)); INSERT INTO derby4631_t1 VALUES 'A','B'; CREATE TABLE derby4631_t2(x varchar(5)); INSERT INTO derby4631_t2 VALUES 'b','c'; SELECT x FROM derby4631_t2 NATURAL INNER JOIN derby4631_t1; SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1; SELECT x FROM derby4631_t2 NATURAL LEFT OUTER JOIN derby4631_t1; select x from derby4631_t2 INNER JOIN derby4631_t1 USING ; select x from derby4631_t2 RIGHT OUTERJOIN derby4631_t1 USING ; select x from derby4631_t2 LEFT OUTERJOIN derby4631_t1 USING ; Currently, at execution time, Derby already does special handling to figure out if it is dealing with left outer join or right outer join and based on that, it determines whether it should pick up the value from the left column or the right column in the merged row(consisting of 2 columns, join column from left and join column from right) for each of the join columns. Instead of this, now we will let coalesce pick up 1)the left column's value if it is not null. If it is null then 2)pick up the right column's value if it is not null. If it is null then simply return null. So, there is special code happening with both the existing Derby behavior and proposed Derby behavior but at this point, I am not sure how much more/same/less performance impact the new changes might cause. This is all in theory at this point. I have never worked on the code to replace a user supplied column from the select list with another kind of node at bind time, so if we do decide to go this path, I will be greatful to have community's knowledge on this kind of change. I will work on it my own too to figure out what needs to be done if we decide to go with this solution. ******************************************************** Mike's comment I am assuming that the reason the code does not currently do an explicit coalesce already is an optimization, where it was assumed the implementation behavior would match the external behavior that the spec is describing, ******************************************************** You are probably right Mike, but I am not sure if coalesce functionality was available in Derby when joins were implemented. It could very well be that we had coalesce available but we didn't use it for optimization reasons but I just wanted to raise that it might be a possibility that we never have had coalesce available.
          Hide
          Mamta A. Satoor added a comment -

          *********************************************************************
          Mike's comment
          (With solution 2) I think my most basic question is in a 1 to 1 million row join will you be adding 1 million new function calls for the coalesce, or is this somehow optimized after bind?
          *********************************************************************

          Yes, if the join column is part of the join query (say as SELECT list or GROUP BY, WHERE clause etc), then we will be calling it for every qualified row in join. At this point, I think if join column is only part of USING clause or used internally through NATURAL join to do equi join, then I do not think we will have to generate coalesce function for the join columns.

          I debugged through the existing code more and found that to do equijoin for USING clause or NATURAL join, we generate merged row which includes the necessary join columns values from both sides (for USING clause, the necessary join column(s) will be what was specified with the USING clause, for NATURAL joins, it will be all the join columns). If we find that the join column is getting referenced outside of the equi join, then in order to get the join column's value, we do the mapping of the join column to the merged row's column (the mapping happens to either the left table's column value in merged row(if we are dealing with inner join or left outer join) or to the right table's column value(if we are dealing with right outer join)).

          If we go with Solution 1), then this mapping of join column to merge row will need to change so that we look at teft table's column value first. If it is null, then we should pick up the right table's column value. This should happen no matter if we are dealing with inner join, left outer join or right outer join with natural or uisng clause.

          Show
          Mamta A. Satoor added a comment - ********************************************************************* Mike's comment (With solution 2) I think my most basic question is in a 1 to 1 million row join will you be adding 1 million new function calls for the coalesce, or is this somehow optimized after bind? ********************************************************************* Yes, if the join column is part of the join query (say as SELECT list or GROUP BY, WHERE clause etc), then we will be calling it for every qualified row in join. At this point, I think if join column is only part of USING clause or used internally through NATURAL join to do equi join, then I do not think we will have to generate coalesce function for the join columns. I debugged through the existing code more and found that to do equijoin for USING clause or NATURAL join, we generate merged row which includes the necessary join columns values from both sides (for USING clause, the necessary join column(s) will be what was specified with the USING clause, for NATURAL joins, it will be all the join columns). If we find that the join column is getting referenced outside of the equi join, then in order to get the join column's value, we do the mapping of the join column to the merged row's column (the mapping happens to either the left table's column value in merged row(if we are dealing with inner join or left outer join) or to the right table's column value(if we are dealing with right outer join)). If we go with Solution 1), then this mapping of join column to merge row will need to change so that we look at teft table's column value first. If it is null, then we should pick up the right table's column value. This should happen no matter if we are dealing with inner join, left outer join or right outer join with natural or uisng clause.
          Hide
          Mamta A. Satoor added a comment -

          ********************************************************
          Mike's comment
          Also can you explicitly give some queries that currently work today that will not after your change, so that the compatibility impact of your solutions can be understood".
          ********************************************************
          Following queries will be fixed by both the solutions.
          java -Dij.exceptionTrace=true org.apache.derby.tools.ij
          connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
          CREATE TABLE derby4631_t1(x varchar(5));
          INSERT INTO derby4631_t1 VALUES 'A','B';
          CREATE TABLE derby4631_t2(x varchar(5));
          INSERT INTO derby4631_t2 VALUES 'b','c';
          SELECT x, coalesce(derby4631_t2.x, derby4631_t1.x) FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;
          select x, coalesce(derby4631_t2.x, derby4631_t1.x) from derby4631_t2 RIGHT OUTER JOIN derby4631_t1 USING;

          The 2 select queries above return following today(which is incorrect)
          X |2
          -----------
          A |A
          B |b
          2 rows selected

          With both the solutions proposed in this jira, they will return following(correct results)
          X |2
          -----------
          A |A
          b |b
          2 rows selected

          Additionally, each of the 2 proposed solutions affect few other queries, but unfortunately not the same way.

          Here is example query that will be affected by solution 1).
          SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1 group by derby4631_t1.x;
          Since join column x can have different results with solution 1(compared to existing behavior), the group by query below will result in different results too. This is kind of obvious but still wanted to point out how different results for join columns can affect join queries with group by/having clause etc

          With solution 2), the group by query above will fail because column derby4631_t1.x is not part of the SELECT columns list. With solution 2), join column x is not going to be assoicated with left table(in case of left outer join and inner joins) or right table(in case of right outer join). Because of that, the group by will result in compilation error but column derby4631_t1.x is not in the SELECT list.

          Show
          Mamta A. Satoor added a comment - ******************************************************** Mike's comment Also can you explicitly give some queries that currently work today that will not after your change, so that the compatibility impact of your solutions can be understood". ******************************************************** Following queries will be fixed by both the solutions. java -Dij.exceptionTrace=true org.apache.derby.tools.ij connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY'; CREATE TABLE derby4631_t1(x varchar(5)); INSERT INTO derby4631_t1 VALUES 'A','B'; CREATE TABLE derby4631_t2(x varchar(5)); INSERT INTO derby4631_t2 VALUES 'b','c'; SELECT x, coalesce(derby4631_t2.x, derby4631_t1.x) FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1; select x, coalesce(derby4631_t2.x, derby4631_t1.x) from derby4631_t2 RIGHT OUTER JOIN derby4631_t1 USING ; The 2 select queries above return following today(which is incorrect) X |2 ----------- A |A B |b 2 rows selected With both the solutions proposed in this jira, they will return following(correct results) X |2 ----------- A |A b |b 2 rows selected Additionally, each of the 2 proposed solutions affect few other queries, but unfortunately not the same way. Here is example query that will be affected by solution 1). SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1 group by derby4631_t1.x; Since join column x can have different results with solution 1(compared to existing behavior), the group by query below will result in different results too. This is kind of obvious but still wanted to point out how different results for join columns can affect join queries with group by/having clause etc With solution 2), the group by query above will fail because column derby4631_t1.x is not part of the SELECT columns list. With solution 2), join column x is not going to be assoicated with left table(in case of left outer join and inner joins) or right table(in case of right outer join). Because of that, the group by will result in compilation error but column derby4631_t1.x is not in the SELECT list.
          Hide
          Mamta A. Satoor added a comment -

          With solution 1), we will need to know where the current code generation happens for join columns. Solution 1) requires that we change the code generation to following logic.
          1)if the left table's join column value is null then pick up the right table's join column's value.
          2)If the left table's join column value is non-null, then pick up that value

          Following is what I found while looking for code generation logic for the join columns.

          In the bind phase of a query, we start looking at result columns and assigning virutal column numbers to them (impl.sql.compile.ResultColumn:virtualColumnId). These virtual column ids are used to find mapping for those columns into runtime resultset for the query.

          connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
          CREATE TABLE derby4631_t1(x varchar(5));
          INSERT INTO derby4631_t1 VALUES 'A','B';
          CREATE TABLE derby4631_t2(x varchar(5));
          INSERT INTO derby4631_t2 VALUES 'b','c';
          SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;

          For the NATURAL JOIN query above, an equi-join is generated which will be derby4631_t2.x = derby4631_t1.x. For this equi join, the join column x from left table(derby4631_t2) will get virtual column id of 1 and join column x from right table(derby4631_t1) will get virtual column id of 2. The column x in the SELECT query in Derby today gets associated with right table in case of RIGHT OUTER JOIN and hence that column's virtual column id is also 2. At the time of code generation, we use this information to decide what column position from the run time resultset would be used to get the values. Following is the stack trace of where the code generation for join column happens.

          Thread [main] (Suspended (breakpoint at line 1421 in ProjectRestrictNode))
          ProjectRestrictNode.generateMinion(ExpressionClassBuilder, MethodBuilder, boolean) line: 1421
          ProjectRestrictNode.generate(ActivationClassBuilder, MethodBuilder) line: 1334
          ScrollInsensitiveResultSetNode.generate(ActivationClassBuilder, MethodBuilder) line: 109
          CursorNode.generate(ActivationClassBuilder, MethodBuilder) line: 641
          CursorNode(StatementNode).generate(ByteArray) line: 345
          GenericStatement.prepMinion(LanguageConnectionContext, boolean, Object[], SchemaDescriptor, boolean) line: 518
          GenericStatement.prepare(LanguageConnectionContext, boolean) line: 97
          GenericLanguageConnectionContext.prepareInternalStatement(SchemaDescriptor, String, boolean, boolean) line: 1103
          EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 610
          EmbedStatement40(EmbedStatement).execute(String) line: 559
          ij.executeImmediate(String) line: 367
          utilMain.doCatch(String) line: 527
          utilMain.runScriptGuts() line: 369
          utilMain.go(LocalizedInput[], LocalizedOutput) line: 245
          Main.go(LocalizedInput, LocalizedOutput) line: 229
          Main.mainCore(String[], Main) line: 184
          Main.main(String[]) line: 75
          ij.main(String[]) line: 59

          Show
          Mamta A. Satoor added a comment - With solution 1), we will need to know where the current code generation happens for join columns. Solution 1) requires that we change the code generation to following logic. 1)if the left table's join column value is null then pick up the right table's join column's value. 2)If the left table's join column value is non-null, then pick up that value Following is what I found while looking for code generation logic for the join columns. In the bind phase of a query, we start looking at result columns and assigning virutal column numbers to them (impl.sql.compile.ResultColumn:virtualColumnId). These virtual column ids are used to find mapping for those columns into runtime resultset for the query. connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY'; CREATE TABLE derby4631_t1(x varchar(5)); INSERT INTO derby4631_t1 VALUES 'A','B'; CREATE TABLE derby4631_t2(x varchar(5)); INSERT INTO derby4631_t2 VALUES 'b','c'; SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1; For the NATURAL JOIN query above, an equi-join is generated which will be derby4631_t2.x = derby4631_t1.x. For this equi join, the join column x from left table(derby4631_t2) will get virtual column id of 1 and join column x from right table(derby4631_t1) will get virtual column id of 2. The column x in the SELECT query in Derby today gets associated with right table in case of RIGHT OUTER JOIN and hence that column's virtual column id is also 2. At the time of code generation, we use this information to decide what column position from the run time resultset would be used to get the values. Following is the stack trace of where the code generation for join column happens. Thread [main] (Suspended (breakpoint at line 1421 in ProjectRestrictNode)) ProjectRestrictNode.generateMinion(ExpressionClassBuilder, MethodBuilder, boolean) line: 1421 ProjectRestrictNode.generate(ActivationClassBuilder, MethodBuilder) line: 1334 ScrollInsensitiveResultSetNode.generate(ActivationClassBuilder, MethodBuilder) line: 109 CursorNode.generate(ActivationClassBuilder, MethodBuilder) line: 641 CursorNode(StatementNode).generate(ByteArray) line: 345 GenericStatement.prepMinion(LanguageConnectionContext, boolean, Object[], SchemaDescriptor, boolean) line: 518 GenericStatement.prepare(LanguageConnectionContext, boolean) line: 97 GenericLanguageConnectionContext.prepareInternalStatement(SchemaDescriptor, String, boolean, boolean) line: 1103 EmbedStatement40(EmbedStatement).execute(String, boolean, boolean, int, int[], String[]) line: 610 EmbedStatement40(EmbedStatement).execute(String) line: 559 ij.executeImmediate(String) line: 367 utilMain.doCatch(String) line: 527 utilMain.runScriptGuts() line: 369 utilMain.go(LocalizedInput[], LocalizedOutput) line: 245 Main.go(LocalizedInput, LocalizedOutput) line: 229 Main.mainCore(String[], Main) line: 184 Main.main(String[]) line: 75 ij.main(String[]) line: 59
          Hide
          Mamta A. Satoor added a comment -

          The code generation code in impl.sql.compile.ConditionalNode:generateExpression might be a good resource for us to make make the code generation changes for NATURAL JOIN's join column in the SELECT query. ConditionalNode generates "if then else" code which is similar to what solution 1) will require, ie if lefTablJoinColumn is null, then use rightTableJoinColumnValue else use leftTableJoinColumnValue.

          Show
          Mamta A. Satoor added a comment - The code generation code in impl.sql.compile.ConditionalNode:generateExpression might be a good resource for us to make make the code generation changes for NATURAL JOIN's join column in the SELECT query. ConditionalNode generates "if then else" code which is similar to what solution 1) will require, ie if lefTablJoinColumn is null, then use rightTableJoinColumnValue else use leftTableJoinColumnValue.
          Hide
          Mamta A. Satoor added a comment -

          I stepped through the ConditionalNode's code generation logic to see how "if..then..else.." code is generated. Based on that, I have following pseudo code for code generation for solution 1). The following psuedo code is for generating if(lefTablJoinColumnValue is null) then return rightTableJoinColumnValue else return lefTablJoinColumnValue. I have not done recent work in the code generation part and would appreciate feedback if the psuedo code looks incorrect. I will next work on trying to identify how to make this pseudo code kick in for ResultColumn code generation if we are dealing with join column.

          String receiverType = ClassName.DataValueDescriptor;
          String resultTypeName =
          getTypeCompiler(DataTypeDescriptor.getBuiltInDataTypeDescriptor(Types.BOOLEAN).getTypeId()).interfaceName();

          //Following will generate if(lefTablJoinColumnValue is null)
          //Then call generateExpression on left Table's column
          LeftTableColumn.generateExpression(acb, mb);
          mb.cast(receiverType); // cast the method instance
          mb.callMethod(VMOpcode.INVOKEINTERFACE, (String) null,
          "isNullOp",resultTypeName, 0);
          mb.cast(ClassName.BooleanDataValue);
          mb.push(true);
          mb.callMethod(VMOpcode.INVOKEINTERFACE, (String) null, "equals", "boolean", 1);

          //Following will generate then part of the if condition by generating expression for rightTablJoinColumnValue
          mb.conditionalIf();
          ((ValueNode) RightTableColumn.generateExpression(acb, mb);
          //Following will generate else part of the if condition by generating expression for lefTablJoinColumnValue
          mb.startElseCode();
          ((ValueNode) LeftTableColumn.generateExpression(acb, mb);
          mb.completeConditional();

          Show
          Mamta A. Satoor added a comment - I stepped through the ConditionalNode's code generation logic to see how "if..then..else.." code is generated. Based on that, I have following pseudo code for code generation for solution 1). The following psuedo code is for generating if(lefTablJoinColumnValue is null) then return rightTableJoinColumnValue else return lefTablJoinColumnValue. I have not done recent work in the code generation part and would appreciate feedback if the psuedo code looks incorrect. I will next work on trying to identify how to make this pseudo code kick in for ResultColumn code generation if we are dealing with join column. String receiverType = ClassName.DataValueDescriptor; String resultTypeName = getTypeCompiler(DataTypeDescriptor.getBuiltInDataTypeDescriptor(Types.BOOLEAN).getTypeId()).interfaceName(); //Following will generate if(lefTablJoinColumnValue is null) //Then call generateExpression on left Table's column LeftTableColumn.generateExpression(acb, mb); mb.cast(receiverType); // cast the method instance mb.callMethod(VMOpcode.INVOKEINTERFACE, (String) null, "isNullOp",resultTypeName, 0); mb.cast(ClassName.BooleanDataValue); mb.push(true); mb.callMethod(VMOpcode.INVOKEINTERFACE, (String) null, "equals", "boolean", 1); //Following will generate then part of the if condition by generating expression for rightTablJoinColumnValue mb.conditionalIf(); ((ValueNode) RightTableColumn.generateExpression(acb, mb); //Following will generate else part of the if condition by generating expression for lefTablJoinColumnValue mb.startElseCode(); ((ValueNode) LeftTableColumn.generateExpression(acb, mb); mb.completeConditional();
          Hide
          Mamta A. Satoor added a comment -

          I have a patch which is not ready for commit. It is a work in progress for the solution 1) proposed earlier in the jira which is as follows
          Solution 1) requires that we change the code generation to following logic.
          1)if the left table's join column value is null then pick up the right table's join column's value.
          2)If the left table's join column value is non-null, then pick up that value
          We should have the solution 1) kickin for only RIGHT OUTER JOIN with USING clause or NATURAL JOIN.
          LEFT OUTER JOINs and INNER JOINs with USING clause or NATURAL JOIN will work correctly with current Derby logic which is to always pickup the left table's join column value.
          This will work for LEFT OUTER JOINs and INNER JOINs with USING clause or NATURAL JOIN in both territory and non-territory based databases

          The attached patch now makes the following query return the correct results
          java -Dij.exceptionTrace=true org.apache.derby.tools.ij
          connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
          CREATE TABLE derby4631_t1(x varchar(5));
          INSERT INTO derby4631_t1 VALUES 'A','B';
          CREATE TABLE derby4631_t2(x varchar(5));
          INSERT INTO derby4631_t2 VALUES 'b','c';
          SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;
          ij> X


          A
          b

          The patch in it's current state is quite a bit of hacking because currently a ResultColumn does not need to know if it belongs to a Join ResultSet. But with solution 1), if ResultColumn is a join column part of a RIGHT OUTER JOIN with USING/NATURAL JOIN, then we want the join column's value to be determined using the solution 1) described above. For that, a ResultColumn will now need to know if it is part of a RIGHT OUTER JOIN with USING/NATURAL JOIN and it will need to know the column positions of the left table's
          join column and rigt table's join colunm and it will need to know the resultset number of the resultset from which these join colunms's values will be extracted.

          I get these values in this patch by putting extra information related to joins in ResultColunm at bind time and using them at code generation time.

          This patch may break many other queries and I will continue to work on the patch to make it more stable but I wanted to put the general approach for this solution out sooner than later to see if community has any feedback on keeping the extra information on ResultColumn needed to implement solution 1). Although, all this additional information maintenance is pretty localized and not many files are impacted by this solution.

          Next I plan to work on the patch more to make it stable and do more testing with it to see how it will work for the rest of the queries.

          Alternative to this approach would be to introduce a new kind of compilation node which will be created during the bind phase(unlike most nodes which get created during parsing) when we find that the ResultColumn belongs to Join Node(which means we will still have to do the checking I do in this patch to see if ResultColumn is part of RIGHT OUTER JOIN with USING/NATURAL JOIN but at code generation time,
          we can have the new node do this special code generation which is how we handle all the other special nodes like Conditional If node, Coalesce node etc). This approach of adding new node will require us to somehow fire the binding of the new node after replacing the ResultColumn which was created during the Parse time. At this point, I am unfamiliar with how to replace a node during the bind time with some other node and make it go through the binding step. Also, replacing the ResultColumn with a new node might also impact queries like following where I think order by column from right table is associated with the join column in the select list
          SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1 ORDER BY derby4631_t1.x;

          The files changed by this patch are as follows
          $ svn stat -q
          M java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java
          M java\engine\org\apache\derby\impl\sql\compile\ProjectRestrictNode.java
          M java\engine\org\apache\derby\impl\sql\compile\JoinNode.java
          M java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java
          M java\engine\org\apache\derby\impl\sql\compile\HashTableNode.java

          Following explains the changes in brief for the files touched by this patch
          ResultColumn.java has additioanl columns to keep RIGHT OUTER JOIN with USING/NATURAL JOIN information. These
          columns will get updated during bind time if ResultColumn is part of such a join. In my next patch, I think I should be able to remove virtualColumnIdLeftTable and virtualColumnIdRightTable and just get that information
          from joinResultSet.

          The signature of ResultColumnList.mapSourceColumns has changed and hence the changes in
          ProjectRestrictNode.java and HashTableNode.java.

          JoinNode.java - checks if it is RIGHT OUTER JOIN with USING/NATURAL JOIN and if yes, then it marks the right
          table's join column's ResultColumn to recognize that.

          ResultColumnList.java in it's code generation logic checks if the ResultColumn is a join column for RIGHT OUTER JOIN with USING/NATURAL JOIN and if yes, then it generates the following code for that column
          1)if the left table's join column value is null then pick up the right table's join column's value.
          2)If the left table's join column value is non-null, then pick up that value

          Will appreciate any feedback on this patch. I plan to work more on it to clean it up by looking at removing some of the redundant new informaiton in ResultColumn and also to check if there is a better place for code generation for a join column part of RIGHT OUTER JOIN with USING/NATURAL JOIN. I also anticipate existing queries failing with this current patch. I will work on identifying such queries. But I wanted to get feedback on general approach of this patch.

          Show
          Mamta A. Satoor added a comment - I have a patch which is not ready for commit. It is a work in progress for the solution 1) proposed earlier in the jira which is as follows Solution 1) requires that we change the code generation to following logic. 1)if the left table's join column value is null then pick up the right table's join column's value. 2)If the left table's join column value is non-null, then pick up that value We should have the solution 1) kickin for only RIGHT OUTER JOIN with USING clause or NATURAL JOIN. LEFT OUTER JOINs and INNER JOINs with USING clause or NATURAL JOIN will work correctly with current Derby logic which is to always pickup the left table's join column value. This will work for LEFT OUTER JOINs and INNER JOINs with USING clause or NATURAL JOIN in both territory and non-territory based databases The attached patch now makes the following query return the correct results java -Dij.exceptionTrace=true org.apache.derby.tools.ij connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY'; CREATE TABLE derby4631_t1(x varchar(5)); INSERT INTO derby4631_t1 VALUES 'A','B'; CREATE TABLE derby4631_t2(x varchar(5)); INSERT INTO derby4631_t2 VALUES 'b','c'; SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1; ij> X A b The patch in it's current state is quite a bit of hacking because currently a ResultColumn does not need to know if it belongs to a Join ResultSet. But with solution 1), if ResultColumn is a join column part of a RIGHT OUTER JOIN with USING/NATURAL JOIN, then we want the join column's value to be determined using the solution 1) described above. For that, a ResultColumn will now need to know if it is part of a RIGHT OUTER JOIN with USING/NATURAL JOIN and it will need to know the column positions of the left table's join column and rigt table's join colunm and it will need to know the resultset number of the resultset from which these join colunms's values will be extracted. I get these values in this patch by putting extra information related to joins in ResultColunm at bind time and using them at code generation time. This patch may break many other queries and I will continue to work on the patch to make it more stable but I wanted to put the general approach for this solution out sooner than later to see if community has any feedback on keeping the extra information on ResultColumn needed to implement solution 1). Although, all this additional information maintenance is pretty localized and not many files are impacted by this solution. Next I plan to work on the patch more to make it stable and do more testing with it to see how it will work for the rest of the queries. Alternative to this approach would be to introduce a new kind of compilation node which will be created during the bind phase(unlike most nodes which get created during parsing) when we find that the ResultColumn belongs to Join Node(which means we will still have to do the checking I do in this patch to see if ResultColumn is part of RIGHT OUTER JOIN with USING/NATURAL JOIN but at code generation time, we can have the new node do this special code generation which is how we handle all the other special nodes like Conditional If node, Coalesce node etc). This approach of adding new node will require us to somehow fire the binding of the new node after replacing the ResultColumn which was created during the Parse time. At this point, I am unfamiliar with how to replace a node during the bind time with some other node and make it go through the binding step. Also, replacing the ResultColumn with a new node might also impact queries like following where I think order by column from right table is associated with the join column in the select list SELECT x FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1 ORDER BY derby4631_t1.x; The files changed by this patch are as follows $ svn stat -q M java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java M java\engine\org\apache\derby\impl\sql\compile\ProjectRestrictNode.java M java\engine\org\apache\derby\impl\sql\compile\JoinNode.java M java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java M java\engine\org\apache\derby\impl\sql\compile\HashTableNode.java Following explains the changes in brief for the files touched by this patch ResultColumn.java has additioanl columns to keep RIGHT OUTER JOIN with USING/NATURAL JOIN information. These columns will get updated during bind time if ResultColumn is part of such a join. In my next patch, I think I should be able to remove virtualColumnIdLeftTable and virtualColumnIdRightTable and just get that information from joinResultSet. The signature of ResultColumnList.mapSourceColumns has changed and hence the changes in ProjectRestrictNode.java and HashTableNode.java. JoinNode.java - checks if it is RIGHT OUTER JOIN with USING/NATURAL JOIN and if yes, then it marks the right table's join column's ResultColumn to recognize that. ResultColumnList.java in it's code generation logic checks if the ResultColumn is a join column for RIGHT OUTER JOIN with USING/NATURAL JOIN and if yes, then it generates the following code for that column 1)if the left table's join column value is null then pick up the right table's join column's value. 2)If the left table's join column value is non-null, then pick up that value Will appreciate any feedback on this patch. I plan to work more on it to clean it up by looking at removing some of the redundant new informaiton in ResultColumn and also to check if there is a better place for code generation for a join column part of RIGHT OUTER JOIN with USING/NATURAL JOIN. I also anticipate existing queries failing with this current patch. I will work on identifying such queries. But I wanted to get feedback on general approach of this patch.
          Hide
          Mamta A. Satoor added a comment -

          I cam across atleast following 2 queries that both fail with NPE and I am looking into those,

          create table t1(a int, b int, c int);
          create table t2(b int, c int, d int);
          create table t3(d int, e varchar(5), f int);

          insert into t1 values (1,2,3),(2,3,4),(4,4,4);
          insert into t2 values (1,2,3),(2,3,4),(5,5,5);
          insert into t3 values
          (2,'abc',3),(4,'def',5),(null,null,null);

          select c from t1 right join t2 using (c)
          order by t1.c;
          select c,a+1 from t1 right join t2 using (c);

          The order by query is failing in generated code whereas the query with "a+1" is failing during bind time.

          Show
          Mamta A. Satoor added a comment - I cam across atleast following 2 queries that both fail with NPE and I am looking into those, create table t1(a int, b int, c int); create table t2(b int, c int, d int); create table t3(d int, e varchar(5), f int); insert into t1 values (1,2,3),(2,3,4),(4,4,4); insert into t2 values (1,2,3),(2,3,4),(5,5,5); insert into t3 values (2,'abc',3),(4,'def',5),(null,null,null); select c from t1 right join t2 using (c) order by t1.c; select c,a+1 from t1 right join t2 using (c); The order by query is failing in generated code whereas the query with "a+1" is failing during bind time.
          Hide
          Mamta A. Satoor added a comment -

          Attaching an updated patch which is still not ready for commit but it fixes two queries listed earlier which were failing with 1st patch,

          With the query using column a+1 in the select query, I was looking for field name in ResultColumn which would be null for an expression like a+1. Instead, I need to use exposedName which will not be null for an expression. Instead, we internally generate a name for such columns.

          With the query using order by, I found that in the new code for code generation, I need to let Derby do what it does today for ResultColumns with VirtualColumnNode underneath. Such a case can happen for order by query where we pull columns if needed for order by columns,

          Next I plan to make the code changes less of a hack and then run the derbyall and junit suite to see if we catch any failures. Another thing to do would be to add more RIGHT OUTER JOIN test variations to see if the suggested code changes work fine with it. I will appreciate any suggestion on what kind of RIGHT OUTER JOIN tests can be added to test the functionality.

          Show
          Mamta A. Satoor added a comment - Attaching an updated patch which is still not ready for commit but it fixes two queries listed earlier which were failing with 1st patch, With the query using column a+1 in the select query, I was looking for field name in ResultColumn which would be null for an expression like a+1. Instead, I need to use exposedName which will not be null for an expression. Instead, we internally generate a name for such columns. With the query using order by, I found that in the new code for code generation, I need to let Derby do what it does today for ResultColumns with VirtualColumnNode underneath. Such a case can happen for order by query where we pull columns if needed for order by columns, Next I plan to make the code changes less of a hack and then run the derbyall and junit suite to see if we catch any failures. Another thing to do would be to add more RIGHT OUTER JOIN test variations to see if the suggested code changes work fine with it. I will appreciate any suggestion on what kind of RIGHT OUTER JOIN tests can be added to test the functionality.
          Hide
          Mamta A. Satoor added a comment -

          DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

          I have a patch(DERBY_4631_patch3_diff.txt) which is ready for review and commit. To recap the issue, SQL:2003 says that the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table

          Derby has it's on logic to retrieve the join column values. It always picks up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join.

          But this logic does not work for all cases for right outer join. The fix provided in this patch will pick the join column's value based on following logic(this logic mimics the functionality of COALESCE)
          1)if the left table's column value is null then pick up the right table's column's value.
          2)If the left table's column value is non-null, then pick up that value

          Following are the files impacted by this patch
          $ svn stat -q
          M java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java
          M java\engine\org\apache\derby\impl\sql\compile\JoinNode.java
          M java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java
          M java\testing\org\apache\derbyTesting\functionTests\tests\lang\CollationTest.java

          The changes are as follows
          Two additional fields have been added to ResultColumn.java rightOuterJoinUsingClause and joinResultSet
          rightOuterJoinUsingClause will be set to true for following 2 cases
          1)if this column represents the join column which is part of the SELECT list of a RIGHT OUTER JOIN with USING/NATURAL. eg
          select c from t1 right join t2 using (c)
          This case is talking about column c as in "select c"
          2)if this column represents the join column from the right table for predicates generated for the USING/NATURAL of RIGHT OUTER JOIN eg
          select c from t1 right join t2 using (c)
          For "using(c)", a join predicate will be created as follows
          t1.c=t2.c
          This case is talking about column t2.c of the join predicate.

          joinResultSet will be non-null for the case 1) above. It will show the association of this result column to the join resultset created for the RIGHT OUTER JOIN with USING/NATURAL. This information along with rightOuterJoinUsingClause will be used during the code generation time.
          These 2 additional fields will be used to identify ResultColumn which belong to a join column in the SELECT
          list and identify ResultColumn which belong to right join column for the predicate generated for USING/NATURAL
          columns. Additionally, ResultColumn which belong to a join column in the SELECT list will also know about the
          JoinNode which they belong to. These 2 pieces of information will then be used at the code generation time
          for join column for RIGHT OUTER JOIN with USING/NATURAL based on following logic
          1)if the left table's column value is null then pick up the right table's column's value.
          2)If the left table's column value is non-null, then pick up that value

          Changes in JoinNode.java just identifies the ResultColumn which represent the join column from the right table
          for predicates generated for the USING/NATURAL of RIGHT OUTER JOIN eg
          select c from t1 right join t2 using (c)
          For "using(c)", a join predicate will be created, t1.c=t2.c. JoinNode changes will set
          ResultColumn.rightOuterJoinUsingClause flag to true for t2.c

          The code generation changes have gone into ResultColumnList.java

          Show
          Mamta A. Satoor added a comment - DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation I have a patch(DERBY_4631_patch3_diff.txt) which is ready for review and commit. To recap the issue, SQL:2003 says that the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table Derby has it's on logic to retrieve the join column values. It always picks up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join. But this logic does not work for all cases for right outer join. The fix provided in this patch will pick the join column's value based on following logic(this logic mimics the functionality of COALESCE) 1)if the left table's column value is null then pick up the right table's column's value. 2)If the left table's column value is non-null, then pick up that value Following are the files impacted by this patch $ svn stat -q M java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java M java\engine\org\apache\derby\impl\sql\compile\JoinNode.java M java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java M java\testing\org\apache\derbyTesting\functionTests\tests\lang\CollationTest.java The changes are as follows Two additional fields have been added to ResultColumn.java rightOuterJoinUsingClause and joinResultSet rightOuterJoinUsingClause will be set to true for following 2 cases 1)if this column represents the join column which is part of the SELECT list of a RIGHT OUTER JOIN with USING/NATURAL. eg select c from t1 right join t2 using (c) This case is talking about column c as in "select c" 2)if this column represents the join column from the right table for predicates generated for the USING/NATURAL of RIGHT OUTER JOIN eg select c from t1 right join t2 using (c) For "using(c)", a join predicate will be created as follows t1.c=t2.c This case is talking about column t2.c of the join predicate. joinResultSet will be non-null for the case 1) above. It will show the association of this result column to the join resultset created for the RIGHT OUTER JOIN with USING/NATURAL. This information along with rightOuterJoinUsingClause will be used during the code generation time. These 2 additional fields will be used to identify ResultColumn which belong to a join column in the SELECT list and identify ResultColumn which belong to right join column for the predicate generated for USING/NATURAL columns. Additionally, ResultColumn which belong to a join column in the SELECT list will also know about the JoinNode which they belong to. These 2 pieces of information will then be used at the code generation time for join column for RIGHT OUTER JOIN with USING/NATURAL based on following logic 1)if the left table's column value is null then pick up the right table's column's value. 2)If the left table's column value is non-null, then pick up that value Changes in JoinNode.java just identifies the ResultColumn which represent the join column from the right table for predicates generated for the USING/NATURAL of RIGHT OUTER JOIN eg select c from t1 right join t2 using (c) For "using(c)", a join predicate will be created, t1.c=t2.c. JoinNode changes will set ResultColumn.rightOuterJoinUsingClause flag to true for t2.c The code generation changes have gone into ResultColumnList.java
          Hide
          Mamta A. Satoor added a comment -

          I have been working on writing more tests for JOINs to see nothing breaks with my changes. Unfortunately, the following script with the patch DERBY_4631_patch3_diff.txt patch gives ERROR 38000: The exception 'java.lang.ArrayIndexOutOfBoundsException: Array index out of range: -2' was thrown while evaluating an expression.

          connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
          CREATE TABLE derby4631_t1(x varchar(5));
          INSERT INTO derby4631_t1 VALUES 'A','B';
          CREATE TABLE derby4631_t2(x varchar(5));
          INSERT INTO derby4631_t2 VALUES 'b','c';
          CREATE TABLE derby4631_t3(x1 varchar(5), y1 varchar(5));
          INSERT INTO derby4631_t3
          SELECT x,
          'a'
          FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;

          I am debugging the patch to see what is causing the failure.

          Show
          Mamta A. Satoor added a comment - I have been working on writing more tests for JOINs to see nothing breaks with my changes. Unfortunately, the following script with the patch DERBY_4631_patch3_diff.txt patch gives ERROR 38000: The exception 'java.lang.ArrayIndexOutOfBoundsException: Array index out of range: -2' was thrown while evaluating an expression. connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY'; CREATE TABLE derby4631_t1(x varchar(5)); INSERT INTO derby4631_t1 VALUES 'A','B'; CREATE TABLE derby4631_t2(x varchar(5)); INSERT INTO derby4631_t2 VALUES 'b','c'; CREATE TABLE derby4631_t3(x1 varchar(5), y1 varchar(5)); INSERT INTO derby4631_t3 SELECT x, 'a' FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1; I am debugging the patch to see what is causing the failure.
          Hide
          Mamta A. Satoor added a comment -

          I found the problem which was causing array index out of bound exception with the previous patch(patch 3).

          The query which was throwing the exception is as follows
          INSERT INTO derby4631_t3
          SELECT x,
          'a'
          FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1;

          The columns in derby4631_t3 are named x1 and y1. The source for column x1 in insert above is join column x. In the patch, when we look to determine if the column is a join column, I was looking at column's external name rather than it's base name. So instead of looking for column named 'x', the earlier patch was looking for column 'x1' to see if it is a join column. Because of that, it didn't identify join column x in the SELECT query. I have fixed one line of code in ResultColumnList to fix the problem. So, instead of
          if (joinColumn.getName().equals(rc.getName())) {
          it now checks
          if (joinColumn.getName().equals(rc.name)) {

          I will run the derbyall and junit suite to make sure no existing tests fail with this new patch. I will also continue writing few more tests for JOINs.

          Show
          Mamta A. Satoor added a comment - I found the problem which was causing array index out of bound exception with the previous patch(patch 3). The query which was throwing the exception is as follows INSERT INTO derby4631_t3 SELECT x, 'a' FROM derby4631_t2 NATURAL RIGHT OUTER JOIN derby4631_t1; The columns in derby4631_t3 are named x1 and y1. The source for column x1 in insert above is join column x. In the patch, when we look to determine if the column is a join column, I was looking at column's external name rather than it's base name. So instead of looking for column named 'x', the earlier patch was looking for column 'x1' to see if it is a join column. Because of that, it didn't identify join column x in the SELECT query. I have fixed one line of code in ResultColumnList to fix the problem. So, instead of if (joinColumn.getName().equals(rc.getName())) { it now checks if (joinColumn.getName().equals(rc.name)) { I will run the derbyall and junit suite to make sure no existing tests fail with this new patch. I will also continue writing few more tests for JOINs.
          Hide
          Mike Matrigali added a comment -

          from review of latest patch.

          ResultColumn:
          It would be good to get a review of this with someone with more expertise in this area of the code. The placement
          of the code at 672 just seems strange given how simple the logic use to be. There is a lot of checking for instances
          here, is there anyway to do this work in the affected nodes like HalfOuterJoinNode.

          At end of ResultColumn changes why do you check and set for one of the new fields and not the other?
          + if (isRightOuterJoinUsingClause())

          { + newResultColumn.setRightOuterJoinUsingClause(true); + }

          +
          + newResultColumn.setJoinResultset(getJoinResultSet());
          +

          ResultColumnList.java:
          1266: get rid of the commented out line of code that you fixed with patch 4

          a comment in allExpressionsAreColumns explaining why returning false for isRightOuterJoinUsingClause would be useful.

          in mapSourceColumns() what does the -1 for right outer join columns mean?

          nits:
          typo - search for "righ "
          would be nice to have comments for setJoinResultset and setRightOuterJoinUsingClause, maybe something about what is expected
          to call this routine and in what circumstances.
          would be nice if code was 80 columns, critical bug at line 672 of ResultColumn is unreadable at 80.

          JoinNode change:
          inconsistent bracket use in if/then/else

          ResultColumnList:
          more > 80 lines

          indentation looks wrong for this code, maybe editor got confused by commented out if block:
          if (joinColumn.getName().equals(rc.name))

          { if (joinColumn.isRightOuterJoinUsingClause()) virtualColumnIdRightTable = joinColumn.getVirtualColumnId(); else virtualColumnIdLeftTable = joinColumn.getVirtualColumnId(); }
          Show
          Mike Matrigali added a comment - from review of latest patch. ResultColumn: It would be good to get a review of this with someone with more expertise in this area of the code. The placement of the code at 672 just seems strange given how simple the logic use to be. There is a lot of checking for instances here, is there anyway to do this work in the affected nodes like HalfOuterJoinNode. At end of ResultColumn changes why do you check and set for one of the new fields and not the other? + if (isRightOuterJoinUsingClause()) { + newResultColumn.setRightOuterJoinUsingClause(true); + } + + newResultColumn.setJoinResultset(getJoinResultSet()); + ResultColumnList.java: 1266: get rid of the commented out line of code that you fixed with patch 4 a comment in allExpressionsAreColumns explaining why returning false for isRightOuterJoinUsingClause would be useful. in mapSourceColumns() what does the -1 for right outer join columns mean? nits: typo - search for "righ " would be nice to have comments for setJoinResultset and setRightOuterJoinUsingClause, maybe something about what is expected to call this routine and in what circumstances. would be nice if code was 80 columns, critical bug at line 672 of ResultColumn is unreadable at 80. JoinNode change: inconsistent bracket use in if/then/else ResultColumnList: more > 80 lines indentation looks wrong for this code, maybe editor got confused by commented out if block: if (joinColumn.getName().equals(rc.name)) { if (joinColumn.isRightOuterJoinUsingClause()) virtualColumnIdRightTable = joinColumn.getVirtualColumnId(); else virtualColumnIdLeftTable = joinColumn.getVirtualColumnId(); }
          Hide
          Mamta A. Satoor added a comment -

          Hi Mike,

          thanks for reviewing patch 4. Here are some comments to your feedback(Attaching a new patch DERBY_4631_patch5_diff.txt which takes care of some of your comments.)

          1)ResultColumn:
          ********
          Mike's comment - There is a lot of checking for instances here, is there anyway to do this work in the affected nodes like HalfOuterJoinNode?
          ********
          Yes, I am concerned about the instance checking too but I had researched into putting the code in HalfOuterJoinNode and found that HalfOuterJoinNode does not ever get to see the ResultColumns for the query and hence it has no way of marking those ResultColumns as join columns.

          ********
          Mike's comment - At end of ResultColumn changes why do you check and set for one of the new fields and not the other?
          ********
          I was trying to follow the existing code where the boolean kinds of fields are first checked and then set to true if the check returned true. rightOuterJoinUsingClause is a boolean field and hence I checked for the return value and then set the new object's value to true. But joinResultSet is a non-boolean field and hence I simply used it's value to set new object's joinResultSet value. For clarity, I will go ahead and replace following
          newResultColumn.setJoinResultset(getJoinResultSet());
          with
          if (getJoinResultSet() != null)

          { newResultColumn.setJoinResultset(getJoinResultSet()); }

          2)ResultColumnList.java:
          ********
          Mike's comment - get rid of the commented out line of code that you fixed with patch 4
          ********
          Removed it.

          ********
          Mike's comment - a comment in allExpressionsAreColumns explaining why returning false for isRightOuterJoinUsingClause would be useful.
          ********
          Added a comment.

          ********
          Mike's comment - in mapSourceColumns() what does the -1 for right outer join columns mean?
          ********
          allExpressionsAreColumns() uses the -1 value set by mapSourceColumns() to decide if there are columns which require special consideration during code generation. When mapSourceColumns() assigns -1 to right outer join column, allExpressionsAreColumns() will return false. This will allow Derby to later generate code equivalent to COASLECE for right outer join columns(this code generation happens in newly added code in ResultColumnList.generateCore.)

          ********
          Mike's comment - typo - search for "righ "
          ********
          Fixed it.

          ********
          Mike's comment - would be nice to have comments for setJoinResultset and setRightOuterJoinUsingClause, maybe something about what is expected to call this routine and in what circumstances.
          ********
          Added comments around for both those methods.

          Additionally, took care of some indentation problems with the code.

          Show
          Mamta A. Satoor added a comment - Hi Mike, thanks for reviewing patch 4. Here are some comments to your feedback(Attaching a new patch DERBY_4631_patch5_diff.txt which takes care of some of your comments.) 1)ResultColumn: ******** Mike's comment - There is a lot of checking for instances here, is there anyway to do this work in the affected nodes like HalfOuterJoinNode? ******** Yes, I am concerned about the instance checking too but I had researched into putting the code in HalfOuterJoinNode and found that HalfOuterJoinNode does not ever get to see the ResultColumns for the query and hence it has no way of marking those ResultColumns as join columns. ******** Mike's comment - At end of ResultColumn changes why do you check and set for one of the new fields and not the other? ******** I was trying to follow the existing code where the boolean kinds of fields are first checked and then set to true if the check returned true. rightOuterJoinUsingClause is a boolean field and hence I checked for the return value and then set the new object's value to true. But joinResultSet is a non-boolean field and hence I simply used it's value to set new object's joinResultSet value. For clarity, I will go ahead and replace following newResultColumn.setJoinResultset(getJoinResultSet()); with if (getJoinResultSet() != null) { newResultColumn.setJoinResultset(getJoinResultSet()); } 2)ResultColumnList.java: ******** Mike's comment - get rid of the commented out line of code that you fixed with patch 4 ******** Removed it. ******** Mike's comment - a comment in allExpressionsAreColumns explaining why returning false for isRightOuterJoinUsingClause would be useful. ******** Added a comment. ******** Mike's comment - in mapSourceColumns() what does the -1 for right outer join columns mean? ******** allExpressionsAreColumns() uses the -1 value set by mapSourceColumns() to decide if there are columns which require special consideration during code generation. When mapSourceColumns() assigns -1 to right outer join column, allExpressionsAreColumns() will return false. This will allow Derby to later generate code equivalent to COASLECE for right outer join columns(this code generation happens in newly added code in ResultColumnList.generateCore.) ******** Mike's comment - typo - search for "righ " ******** Fixed it. ******** Mike's comment - would be nice to have comments for setJoinResultset and setRightOuterJoinUsingClause, maybe something about what is expected to call this routine and in what circumstances. ******** Added comments around for both those methods. Additionally, took care of some indentation problems with the code.
          Hide
          Mamta A. Satoor added a comment -

          I have the final patch which is ready for commit. I have added couple more tests
          1)have left or right table empty before doing joins
          2)Do arithmetic operation on join columns.

          I will commit it in a week's time but will appreciate any feedback on the patch before that.

          Show
          Mamta A. Satoor added a comment - I have the final patch which is ready for commit. I have added couple more tests 1)have left or right table empty before doing joins 2)Do arithmetic operation on join columns. I will commit it in a week's time but will appreciate any feedback on the patch before that.
          Hide
          Mamta A. Satoor added a comment -

          Attaching the release note for the issue

          Show
          Mamta A. Satoor added a comment - Attaching the release note for the issue
          Hide
          Mamta A. Satoor added a comment -

          I am attaching another patch (DERBY_4631_patch7_diff.txt) which is very similar to previous patch(DERBY_4631_patch6_diff.txt) except that I do not have instance of checks in ResultColumn to find JoinNodes from the FromList. Instead, I have added a new method isJoinColumnForRightOuterJoin which will allow HalfOuterJoinNode to see if the ResultColumn is a join column for a right outer join with using/natural clause. This makes the code more readable and removes the instanceof checkings. Please let me know if there are any comments to this patch. Thanks

          Show
          Mamta A. Satoor added a comment - I am attaching another patch (DERBY_4631_patch7_diff.txt) which is very similar to previous patch(DERBY_4631_patch6_diff.txt) except that I do not have instance of checks in ResultColumn to find JoinNodes from the FromList. Instead, I have added a new method isJoinColumnForRightOuterJoin which will allow HalfOuterJoinNode to see if the ResultColumn is a join column for a right outer join with using/natural clause. This makes the code more readable and removes the instanceof checkings. Please let me know if there are any comments to this patch. Thanks
          Hide
          Mamta A. Satoor added a comment -

          Committed(revision 1341204) changes for this jira with following comments
          DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

          To recap this issue, SQL:2003 says that the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table

          Derby has it's on logic to retrieve the join column values. It always picks up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join.

          But this logic does not work for all cases for right outer join. The fix provided in this patch will pick the join column's value based on following logic(this logic mimics the functionality of COALESCE)
          1)if the left table's column value is null then pick up the right table's column's value.
          2)If the left table's column value is non-null, then pick up that value

          Show
          Mamta A. Satoor added a comment - Committed(revision 1341204) changes for this jira with following comments DERBY-4631 Wrong join column returned by right outer join with NATURAL or USING and territory-based collation To recap this issue, SQL:2003 says that the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table Derby has it's on logic to retrieve the join column values. It always picks up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join. But this logic does not work for all cases for right outer join. The fix provided in this patch will pick the join column's value based on following logic(this logic mimics the functionality of COALESCE) 1)if the left table's column value is null then pick up the right table's column's value. 2)If the left table's column value is non-null, then pick up that value
          Hide
          Kathey Marsden added a comment -

          Reopen for backport analysis. Temporarily assign to yourself if you backport and then reassign to Mamta before closing.

          Show
          Kathey Marsden added a comment - Reopen for backport analysis. Temporarily assign to yourself if you backport and then reassign to Mamta before closing.
          Hide
          Mamta A. Satoor added a comment -

          I will work on backporting this issue

          Show
          Mamta A. Satoor added a comment - I will work on backporting this issue
          Hide
          Mamta A. Satoor added a comment -

          Backported to 10.8 with revision 1384638

          Show
          Mamta A. Satoor added a comment - Backported to 10.8 with revision 1384638
          Hide
          Mamta A. Satoor added a comment -

          Backported to 10.7 with revision 1384814

          Show
          Mamta A. Satoor added a comment - Backported to 10.7 with revision 1384814
          Hide
          Mamta A. Satoor added a comment -

          Backported to 10.6 with revision 1384940.

          Show
          Mamta A. Satoor added a comment - Backported to 10.6 with revision 1384940.
          Hide
          Mamta A. Satoor added a comment -

          No further backporting required because NATURAL/USING clause on join queries was added in 10.6

          Show
          Mamta A. Satoor added a comment - No further backporting required because NATURAL/USING clause on join queries was added in 10.6
          Hide
          Mamta A. Satoor added a comment -

          Backport finished

          Show
          Mamta A. Satoor added a comment - Backport finished
          Hide
          Mamta A. Satoor added a comment -

          Attaching release note

          Show
          Mamta A. Satoor added a comment - Attaching release note

            People

            • Assignee:
              Mamta A. Satoor
              Reporter:
              Knut Anders Hatlen
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development