Issue Details (XML | Word | Printable)

Key: DERBY-3301
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Thomas Nielsen
Reporter: Craig Russell
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Derby

Incorrect result from query with nested EXIST

Created: 03/Jan/08 06:51 PM   Updated: 30/Jun/09 04:12 PM
Return to search
Component/s: SQL
Affects Version/s: 10.1.3.1, 10.2.1.6, 10.3.2.1
Fix Version/s: 10.3.3.0, 10.4.1.3

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works d3301-queryplan.log 2008-01-18 10:14 AM Thomas Nielsen 8 kB
File Licensed for inclusion in ASF works derby-3301-1.diff 2008-01-22 08:48 AM Thomas Nielsen 4 kB
File Licensed for inclusion in ASF works derby-3301-1.stat 2008-01-22 08:48 AM Thomas Nielsen 0.2 kB
File Licensed for inclusion in ASF works derby-3301-2.diff 2008-01-23 09:13 AM Thomas Nielsen 5 kB
File Licensed for inclusion in ASF works derby-3301-3.diff 2008-01-23 08:34 PM Thomas Nielsen 6 kB
File Licensed for inclusion in ASF works derby-3301-3b.diff 2008-01-23 09:32 PM Thomas Nielsen 6 kB
File Licensed for inclusion in ASF works derby-3301-4.diff 2008-01-25 01:35 PM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-4b.diff 2008-01-26 07:20 PM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-4b.stat 2008-01-26 07:20 PM Thomas Nielsen 0.2 kB
File Licensed for inclusion in ASF works derby-3301-4c.diff 2008-01-28 06:19 PM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-5.diff 2008-01-30 07:57 PM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-6.diff 2008-01-31 08:32 AM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-7.diff 2008-01-31 07:22 PM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-8.diff 2008-01-31 09:32 PM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-extra.sql 2008-01-31 08:32 AM Thomas Nielsen 2 kB
File Licensed for inclusion in ASF works derby-3301-test-1.diff 2008-01-24 05:14 PM Thomas Nielsen 8 kB
File Licensed for inclusion in ASF works derby-3301-test-1.stat 2008-01-24 05:14 PM Thomas Nielsen 0.2 kB
File Licensed for inclusion in ASF works derby-3301-test-2.diff 2008-01-26 07:20 PM Thomas Nielsen 9 kB
File Licensed for inclusion in ASF works derby-3301-test-3.diff 2008-01-28 08:37 PM Thomas Nielsen 9 kB
File Licensed for inclusion in ASF works derby-3301-test-3.stat 2008-01-28 07:22 PM Thomas Nielsen 0.2 kB
File Licensed for inclusion in ASF works derby-3301-test-master-2.diff 2008-01-31 09:32 PM Thomas Nielsen 14 kB
File Licensed for inclusion in ASF works derby-3301-test-master-3.diff 2008-02-07 06:54 PM Thomas Nielsen 3 kB
File Licensed for inclusion in ASF works derby-3301-test-master-3.stat 2008-02-07 06:54 PM Thomas Nielsen 0.2 kB
File Licensed for inclusion in ASF works derby-3301-test-master.diff 2008-01-30 09:31 PM Thomas Nielsen 7 kB
File Licensed for inclusion in ASF works derby-3301-test-master.stat 2008-01-30 09:31 PM Thomas Nielsen 0.1 kB
HTML File Licensed for inclusion in ASF works Derby-3301.html 2008-03-19 09:36 PM Craig Russell 4 kB
File Licensed for inclusion in ASF works derby-3301.sql 2008-01-07 05:51 PM Craig Russell 2 kB
HTML File Licensed for inclusion in ASF works releaseNote.html 2008-03-25 09:45 AM Thomas Nielsen 4 kB
HTML File Licensed for inclusion in ASF works releaseNote.html 2008-03-19 09:58 PM Craig Russell 4 kB
Issue Links:
Dependants
 
Incorporates
 
Reference

Urgency: Urgent
Issue & fix info: Release Note Needed
Resolution Date: 15/Feb/08 08:23 AM


 Description  « Hide
Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL.

Here's the query:

SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID
FROM applicationidentity0.DEPARTMENTS THIS,
     applicationidentity0.PERSONS UNBOUND_E,
     applicationidentity0.PROJECTS UNBOUND_P
WHERE EXISTS (
    SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E
    WHERE EXISTS (
        SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P
        WHERE THIS_EMPLOYEES_E_PROJECTS_P."MEMBER" = THIS_EMPLOYEES_E.PERSONID
        AND THIS_EMPLOYEES_E_PROJECTS_P."MEMBER" = THIS_EMPLOYEES_E.PERSONID
        AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
        AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
        AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID
        AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID)
    );
PERSONID |PROJID
-----------------------
3 |1
5 |3
4 |3
2 |1
1 |1
5 rows selected

I'm expecting 7 rows to be returned here, one row for each row in the join table.

Here's the schema:
CREATE TABLE departments (
    ID INTEGER NOT NULL,
    NAME VARCHAR(32) NOT NULL,
    EMP_OF_THE_MONTH INTEGER,
    COMPANYID INTEGER,
    DISCRIMINATOR VARCHAR(255),
    CONSTRAINT DEPTS_COMP_FK FOREIGN KEY (COMPANYID) REFERENCES companies,
    CONSTRAINT DEPTS_PK PRIMARY KEY (ID)
);

CREATE TABLE persons (
    PERSONID INTEGER NOT NULL,
    FIRSTNAME VARCHAR(32) NOT NULL,
    LASTNAME VARCHAR(32) NOT NULL,
    MIDDLENAME VARCHAR(32),
    BIRTHDATE TIMESTAMP NOT NULL,
    ADDRID INTEGER,
    STREET VARCHAR(64),
    CITY VARCHAR(64),
    STATE CHAR(2),
    ZIPCODE CHAR(5),
    COUNTRY VARCHAR(64),
    HIREDATE TIMESTAMP,
    WEEKLYHOURS REAL,
    DEPARTMENT INTEGER,
    FUNDINGDEPT INTEGER,
    MANAGER INTEGER,
    MENTOR INTEGER,
    HRADVISOR INTEGER,
    SALARY REAL,
    WAGE REAL,
    DISCRIMINATOR varchar(255) NOT NULL,
    CONSTRAINT PERS_DEPT_FK FOREIGN KEY (DEPARTMENT) REFERENCES departments,
    CONSTRAINT PERS_FUNDDEPT_FK FOREIGN KEY (FUNDINGDEPT) REFERENCES departments,
    CONSTRAINT PERS_MANAGER_FK FOREIGN KEY (MANAGER) REFERENCES persons,
    CONSTRAINT PERS_MENTOR_FK FOREIGN KEY (MENTOR) REFERENCES persons,
    CONSTRAINT PERS_HRADVISOR_FK FOREIGN KEY (HRADVISOR) REFERENCES persons,
    CONSTRAINT EMPS_PK PRIMARY KEY (PERSONID)
);

CREATE TABLE projects (
    PROJID INTEGER NOT NULL,
    NAME VARCHAR(32) NOT NULL,
    BUDGET DECIMAL(11,2) NOT NULL,
    DISCRIMINATOR VARCHAR(255),
    CONSTRAINT PROJS_PK PRIMARY KEY (PROJID)
);
CREATE TABLE project_member (
    PROJID INTEGER REFERENCES projects NOT NULL,
    MEMBER INTEGER REFERENCES persons NOT NULL
);

ij> connect 'jdbc:derby:/Users/clr/apache/jdo/trunk/tck2/target/database/derby/jdotckdb';
ij> set schema applicationidentity0;
0 rows inserted/updated/deleted
ij> select * from persons;
PERSONID |FIRSTNAME |LASTNAME |MIDDLENAME |BIRTHDATE |ADDRID |STREET |CITY |STA&|ZIPC&|COUNTRY |HIREDATE |WEEKLYHOURS |DEPARTMENT |FUNDINGDEPT|MANAGER |MENTOR |HRADVISOR |SALARY |WAGE |DISCRIMINATOR
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 |emp1First |emp1Last |emp1Middle |1970-06-09 21:00:00.0 |NULL |NULL |NULL |NULL|NULL |NULL |1998-12-31 21:00:00.0 |40.0 |NULL |NULL |NULL |NULL |NULL |20000.0 |NULL |org.apache.jdo.tck.pc.company.FullTimeEmployee
2 |emp2First |emp2Last |emp2Middle |1975-12-21 21:00:00.0 |NULL |NULL |NULL |NULL|NULL |NULL |2003-06-30 21:00:00.0 |40.0 |NULL |NULL |NULL |NULL |NULL |10000.0 |NULL |org.apache.jdo.tck.pc.company.FullTimeEmployee
3 |emp3First |emp3Last |emp3Middle |1972-09-04 21:00:00.0 |NULL |NULL |NULL |NULL|NULL |NULL |2002-08-14 21:00:00.0 |19.0 |NULL |NULL |NULL |NULL |NULL |NULL |15.0 |org.apache.jdo.tck.pc.company.PartTimeEmployee
4 |emp4First |emp4Last |emp4Middle |1973-09-05 21:00:00.0 |NULL |NULL |NULL |NULL|NULL |NULL |2001-04-14 21:00:00.0 |0.0 |NULL |NULL |NULL |NULL |NULL |NULL |13.0 |org.apache.jdo.tck.pc.company.PartTimeEmployee
5 |emp5First |emp5Last |emp5Middle |1962-07-04 21:00:00.0 |NULL |NULL |NULL |NULL|NULL |NULL |1998-08-14 21:00:00.0 |0.0 |NULL |NULL |NULL |NULL |NULL |45000.0 |NULL |org.apache.jdo.tck.pc.company.FullTimeEmployee

5 rows selected
ij> select * from projects;
PROJID |NAME |BUDGET |DISCRIMINATOR
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 |orange |2500000.99 |org.apache.jdo.tck.pc.company.Project
2 |blue |50000.00 |org.apache.jdo.tck.pc.company.Project
3 |green |2000.99 |org.apache.jdo.tck.pc.company.Project

3 rows selected
ij> select * from project_member;
PROJID |MEMBER
-----------------------
2 |3
1 |3
2 |2
3 |5
3 |4
1 |2
1 |1

7 rows selected



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Craig Russell added a comment - 04/Jan/08 12:02 AM
I just tried this with 10.3.2.1 and the same error occurs.

I changed the fix version from 10.2.1.6 to unknown (oops).

A B added a comment - 04/Jan/08 12:29 AM
Any chance you could a) post a fully reproducing script or Java program, and/or b) try it out
with the 10.1.3.1 release? I'm wondering if this might be the same underlying issue as
DERBY-3288--if it is, then it should work with 10.1.3.1. Trying it out there might help
determine when the problem was introduced...(esp. is it a regression?)

Craig Russell added a comment - 04/Jan/08 05:03 PM
Reproduced also on 10.1.3.1. There is a slight difference in behavior (of a similar query) that I will have to check into, and update this report.

Craig Russell added a comment - 04/Jan/08 07:42 PM - edited
There is a similar query that sometimes returns the expected result and sometimes returns a single row.

The difference in the query is that there is an extra clause in the WHERE clause:
AND UNBOUND_P.NAME = 'orange'

This query is expected to return three rows
PERSONID |PROJID
-----------------------
3 |1
2 |1
1 |1
3 rows selected

but sometimes only one row is returned
PERSONID |PROJID
-----------------------
1 |1
1 rows selected


Craig Russell added a comment - 07/Jan/08 05:51 PM
The attached patch demonstrates the problem.

The expected result of the last query in the file is the 7 rows corresponding to the rows in the join table.

To duplicate the problem, run ij < derby-3301.sql

Any suggestions are welcome.

Jørgen Løland added a comment - 18/Jan/08 08:06 AM
Linking issue to DERBY-3321: NPE if 'exists' is followed by a nested subquery.

Thomas Nielsen added a comment - 18/Jan/08 10:14 AM
Took the liberty to generate and attach the queryplan for the reproscript.

Thomas Nielsen added a comment - 18/Jan/08 10:34 AM
I'm not very familiar with tracking down these bugs, but it seems to me that the RH side of the
"Hash Exists Join ResultSet", a "Hash Scan ResultSet" for PROJECT_EMPLOYEES see (returns) only 3 rows,
even though the underlying scan produces 7 rows?

--- snip --
                                        Rows seen = 3
                                        Rows filtered = 0

...
                                        scan information:
                                                Bit set of columns fetched=All
                                                Number of columns fetched=2
                                                Number of pages visited=1
                                                Number of rows qualified=7
                                                Number of rows visited=7
--- snip ---

Hopefully this will help the well versed to track this one down :)

Thomas Nielsen added a comment - 18/Jan/08 02:47 PM - edited
A little further investigation shows that HashScanResultSet.getNextRowCore() is acutally doing what it is supposed to do.

The underlying ResultSets build a HashMap based on a key, and puts all rows matching this key into the map with that key.
For the repro this means a HashMap with 3 keys with 3, 2 and 3 rows for the map entries respectively as seen in the debugger.

In getNextRow() we pick the first element for a given key from the map, then move on to the next key, even though
there are unfetched entries left in the map. This is where the code is flawed at the moment.

For each invocation of getNextRowCore(), a call is done to reopenCore() as well. This looks suspicious. reopenCore() is a good citizen
and resets any reference variables with a call to resetProbeVariables(). This call resets the indexing we use in getNextRowCore() to
keep track of the index into the map entries, and force us to move on to the next key in the HashMap instead of the next entry in the HashMap.

It's right there in the queryplan as well with "Number of opens = 3" for the lower HashScanResultSet :)
There's the reason for the three, and not seven, rows.

The big question right now is who calls, and why is, reopenCore() called twice?

I probably won't have time to investigate until Monday. If someone else feels like fixing this, please go ahead.

Bryan Pendleton added a comment - 18/Jan/08 04:59 PM
Hi Thomas, thanks for having a look at this.

I agree with your analysis of that portion of the query plan: there are
7 total rows that are seen in the project_employees table, but they
are being organized into 3 hash buckets based on the projid field.
The rows in the table have projid values 101, 102, and 103, so this seems ok.

What seems to be happening to me is that we are "collapsing" the
results. The actual values that the query returns (with the current trunk) are:

EMPID |PROJID
-----------------------
13 |101
13 |102
15 |103

Note that we are getting *one* of the employees for *each* of the 3 projects,
whereas I think Craig expects us to be fetching *all* of the employees for
*each* of the projects.

I suspect this means that the flattening of the exists queries was performed
incorrectly, but beyond that I don't have anything brilliant to say. I looked at
some of this logic when I was studying DERBY-3033, but I don't think that
issue is extremely relevant to this issue.

Still, you might spend some time stepping through FromList.preprocess()
and FromList.flattenFromTables() for the query in question, and see if
you notice any logic that seems particularly relevant for this query.

Thomas Nielsen added a comment - 18/Jan/08 10:49 PM - edited
Thanks for the pointers Bryan.

Above the HashScanResultSet, there is a NestedLoopJoinResultSet pulling rows from both left and right childs in getNextRowCore().
NesteLoopJoinResultSet.getNextRowCore() will stop pulling rows if its member oneRowRightSide is true. oneRowRightSide is supplied to the constructor, so it's set at ResultSet generation time.

NestedLoopJoinResultSet is generated by GenericResultSetFactory.getNestedLoopJoinResult(), which again is only called from JoinNode.generate().
To see this you first have to find the getter to the NestedLoopJoinResultSet in GenericResultSetFactory, then do a text(!) search for use of the getter method. In this case it's only used once - in
NesteLoopJoinStrategy.joinResultSetMethodName(), which again is only used in JoinNode.generate.

So why is it setting oneRowRightSide to true?

The arguments to the NestedLoopJoinResultSet constructor is pushed to the stack by JoinNode.getJoinArguments(), where one calls JoinNode.oneRowRightSide().
oneRowRightSide() simply pushes rightResultSet.isOneRowResultSet() to the stack and continues along merrily (JoinNode.java @ 1691)

In the debugger we see that the first hit of a breakpoint in JoinNode.java @1691 is the interesting one.
rightResultSet is a ProjectRestrictNode over a FromBaseTable. That is the select we want to look at.

ProjectRestrictNode.isOneRowResult() simply calls its childResult.isOneRowResult(), so we end up in FromBaseTable.isOneRowResult().

FromBaseTable.isOneRowResult() start off with this comment and code:
   // EXISTS FBT will only return a single row
   if (existsBaseTable)
   {
       return true;
    }

existsBaseTable is true in the failing query.

existsBaseTable is only set in FromBaseTable.setExistsBaseTable(), and this method is only called form FromList.genExistsBaseTables()
where it is explicitly set to true.

It seems to me that the assumption made in FromBaseTable.isOneRowResultI() is wrong?
It's been there ever since derby code was donated. That's most likely the reason for all previous versions showing identical behaviour.

With those 4 lines making the single row assumption for exists commented out, and letting the rest of the isOneRowResult() logic make the decision,
it returns false as expected in this case. The query returns 7 rows as Craig expects.

EMPID |PROJID
-----------------------
13 |101
12 |101
11 |101
13 |102
12 |102
15 |103
14 |103

Any one care to comment on my analysis and especially on the exists assumption made?

If this is correct, any query involving and EXISTS *potentially* will produce wrong results, and this could be the root cause for a couple of the other EXISTS issues?

Thomas Nielsen added a comment - 18/Jan/08 11:09 PM
Too late to keep working - EXIST should return only one row.

Dyre Tjeldvoll added a comment - 19/Jan/08 09:38 PM
Hi Thomas, looks like you've been busy (even though it is the weekend).
Are you saying that the query behaves correctly?

SELECT <something> FROM <tables> WHERE EXISTS <subquery>

must still evaluate subquery for all rows, right? And just return those for which the subquery produces at least one row? Or am I totally mis-understanding?

Craig Russell added a comment - 20/Jan/08 11:44 PM
> Too late to keep working - EXIST should return only one row.
There are many ways to evaluate the query, depending on the strategy. Yes, each of the EXISTS clauses should return either zero or one row, depending on whether the EXISTS condition is true for the specific row of the outermost FROM clause.

For each row of the outer product of the outermost FROM clause (DEPARTMENTS, EMPLOYEES, PROJECTS), the middle WHERE EXISTS should return true when there is a match between the EMPLOYEES and PROJECTS in the join table. As an optimization, you could invert the scan and look at all of the rows of the join table. It's hard to tell from the analysis here whether Derby is inverting the scan.

Thomas Nielsen added a comment - 21/Jan/08 08:35 AM
Dyre> Are you saying that the query behaves correctly?
No, it does not on the current trunk. And your understanding of how this should be evaluated is correct I believe. Also see Craigs comment above.

Craig> As an optimization, you could invert the scan and look at all of the rows of the join table.
Looking at the queryplan I think this is what Derby does, and during this optimization something goes wrong.

As Bryan suggests, flattening of the exists query could be what goes wrong. I didn't see any obvious errors in the methods he suggested though.

The problem appears since the NestedLoopJoinResultSet above the HashScanResultSet that returns 3 out of 7 rows, incorrectly has its right side
"optimized" to expect a single row since it's flagged as an EXISTS query. This causes NestedLoopJoinResultSet to retrieve 1 row, then close and reopen the
HashScanResultSet subquery which then returns from the next hash bucket like it's supposed to. This way we continue through the 3 hash buckets returning 3 out of 7 rows.

As I stated in my last comment (the long one), if the NestedLoopJoinResultSet has it's right child (correctly) flagged as *not* being an EXISTS query
and expected to return a single row, the query returns the expected 7 rows. So we need to figure out the root cause of why the NesteLoopJoinResultSet
thinks its right child will return only one row.

Bryan Pendleton added a comment - 21/Jan/08 03:26 PM
Perhaps there are some clues in this section of the manual:
http://db.apache.org/derby/docs/10.3/tuning/ctuntransform13699.html
In particular, this section:
http://db.apache.org/derby/docs/10.3/tuning/ctuntransform36368.html

I find it interesting that there are two levels of subquery flattening
going on here; that is, we have a subquery in the where clause of
a subquery which is itself in the where clause of the top-level query.
Perhaps the flattening logic is evaluating the correctness rules properly
for a single level of flattening but isn't handling flattening-within-flattening?

Bryan Pendleton added a comment - 21/Jan/08 03:39 PM
This slight variation of the query seems to return the correct 7 rows. The
change is that I removed the extra join of this_employees_e, and just
used unbound_e throughout:

ij> select unbound_e.empid, unbound_p.projid
from departments this,
     employees unbound_e,
     projects unbound_p
    where exists (
        select 1 from project_employees this_employees_e_projects_p
        where this_employees_e_projects_p.empid = unbound_e.empid
        and unbound_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = unbound_e.empid) ;

EMPID |PROJID
-----------------------
11 |101
12 |101
12 |102
13 |101
13 |102
14 |103
15 |103

7 rows selected

Craig Russell added a comment - 21/Jan/08 04:41 PM
Bryan> This slight variation of the query seems to return the correct 7 rows.

So the intermediate flattening of the query is probably causing the problem. Note that for more complex mappings from JDOQL to SQL, there will be even more intermediate EXISTS expressions.

I also found http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html that goes into some detail of the EXISTS join flattening.

A B added a comment - 21/Jan/08 05:21 PM - edited
bryan> Perhaps the flattening logic is evaluating the correctness rules properly
bryan> for a single level of flattening but isn't handling flattening-within-flattening?

Without having done any explicit tracing myself, this is what I would guess is the
problem. Perhaps the inner-most EXISTS subquery can be considered an "exists
base table" (and thus only returns a single row) w.r.t. it's parent query, but if it is then
flattened _again_ to the outer-most query, maybe it should no longer be marked as
an "exists base table"? I don't if that's actually true, I'm just speculating.

craig> I also found http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html that
craig> goes into some detail of the EXISTS join flattening

Something from that page which stands out to me is the following condition:

  "None of the predicates in the subquery, including the additional one formed
  between the left side of the subquery operator and the column in the subquery's
  SELECT list (for IN or ANY subqueries), include any subqueries, method calls,
  or field accesses."

If an EXISTS clause is considered a "predicate", then the query

  select <...> from <...>
  where exists (
    select 1 from <...>
    where exists (
        select 1 from <...>
          ...

seems to violate the condition quoted above from the documentation. That is, the
subquery for the first EXISTS has a predicate (in this case another EXISTS query)
which includes another subquery ("select 1 from ..."), and therefore the first
subquery should not be flattened. If this is the correct reading then the bug would
appear to be in the logic that checks for the aforementioned condition--presumably
because that check occurs after the inner-most subquery has itself been flattened?
Again, I'm just guessing, I haven't done much examination of the actual codepath...

Thomas Nielsen added a comment - 21/Jan/08 09:54 PM - edited
Thanks a lot for all the pointers folks. I think both Army and Bryan nailed it on where/why this query fails on the trunk.

To check the hypothesis, I tried marking any subquery in the wherePredicates as such and skip flattening if this is a subquery in a whereClause.
The happens to be exactly what Kathey did in DERBY-3257 for havingSubqueries. With the mark-and-skip fix the correct 7 rows are returned as the flattening is skipped.

My approach is possibly too brute, and might be relaxed, but I'll post a proper patch tomorrow morning.

Craig Russell added a comment - 21/Jan/08 09:58 PM
Great! I'm looking forward to trying the patch.

Thomas Nielsen added a comment - 22/Jan/08 08:48 AM
The attached diff 'derby-3301-1.diff' works in similar fashion as Katheys diff for DERBY-3257 by
- marking any subquery in a WHERE clause as such
- skip flattening of the subquery on the where subquery condition

This diff produces the correct results:
---- snip ---
ij(CONNECTION1)> select unbound_e.empid, unbound_p.projid
from departments this,
     employees unbound_e,
     projects unbound_p
where exists (
    select 1 from employees this_employees_e
    where exists (
        select 1 from project_employees this_employees_e_projects_p
        where this_employees_e_projects_p.empid = this_employees_e.empid
        and this_employees_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = this_employees_e.empid)
    );
EMPID |PROJID
-----------------------
11 |101
12 |101
13 |101
12 |102
13 |102
14 |103
15 |103

7 rows selected
---- snip ---

Even though the diff works, the condition for skipping flattening should be relaxed to only apply to EXISTS subqueries in a WHERE clause, not *all* subqueries in a where clause as is done in 'derby-3301-1.diff'.

A B added a comment - 22/Jan/08 05:20 PM - edited
> Even though the diff works, the condition for skipping flattening should be relaxed
> to only apply to EXISTS subqueries in a WHERE clause

I think the condition may need to be narrowed down even further: EXISTS
subqueries _are_ allowed to be flattened from a WHERE clause _if_ that
subquery's WHERE clause does not itself contain another subquery. As an
example, take a look at the last query on the doc page mentioned above,
namely:

  SELECT t1.* FROM t1, t2
  WHERE EXISTS (SELECT * FROM t3 WHERE t1.c1 = t3.c1)

    gets flattened into

  SELECT t1.* FROM t1, t2, t3 WHERE t1.c1 = t3.c1

If changes are made to completely avoid flattening of EXISTS subqueries in
WHERE clauses, then the above query will *not* be flattened into an exists join,
even though it's perfectly valid to perform flattening in that case. With the approach
that you've outlined the exists join optimization as a whole will be disabled (I think?).
I think that has the potential to cause a performance regression for users whose
queries benefit from the EXISTS join optimization today.

A similar discussion was held for DERBY-3231 and in that case a decision was
made to deliberately remove an optimization for the sake of correctness. But the
use cases affected by that particular optimization were (most likely) few and far
between. With this issue, though, I think removing EXISTS subquery flattening
across the board will affect far more users, so I worry about committing such a fix.

The patch as posted does not apply to the current trunk, but from what I can tell, if
you run a query like the one mentioned in the documentation, ex.:

create table t1 (c1 int, c2 int, c3 int);
create table t2 (i int, j int);
create table t3 (c1 int, vc varchar(10));

insert into t1 values (1, -1, 1), (3, -3, 9), (2, -2, 4);
insert into t2 values (2, 4);
insert into t3 values (1, 'one'), (3, 'three');

select t1.* from t1, t2 where exists (select * from t3 where t1.c1 = t3.c1);

I think derby-3301-1.diff will cause the SELECT to skip flattening of the EXISTS
subquery. Is that correct?

The core problem for this issue appears to be the specific case where we have a
subquery SQ1 that appears in the whereClause of *another* subquery SQ0. In that
case disabling the flattening of SQ0 would be appropriate--as the documentation
states. But if subquery SQ1 appears in the whereClause of an *OUTER* query--as
shown above--I don't think we should disable flattening altogether.

If at all possible, I think it'd be better to fix the flattening condition for the specific
situation of nested subqueries than to completely disable WHERE clause flattening
for EXISTS subqueries.

Or put differently: if the documentation is correct, the _intent_ is to skip flattening of
an EXISTS subquery that has predicates which in turn contain other subqueries.
But the current code does not correctly implement that intent. So I think it'd be
good to figure out _why_ the current code is wrong for the case of nested
subqueries, and then try to make a change that addresses that specific problem.

On a slightly different note, have you had a chance to run the regression suites
with this change? I'm curious to know if any of the existing tests actually test
for EXISTS join flattening--and if so, do those tests still pass with the proposed
change?

Thomas Nielsen added a comment - 23/Jan/08 09:13 AM
Attaching updated patch 'derby-3301-2.diff'.

The condition for skipping flattening should be even more specific than what is implemented in 'derby-3301-2.diff' as per the discussion, so I'm still not marking this with patch available. This updated patch will skip flattening if the WHERE EXISTS subquery is a select with a where subquery.

The issue I'm facing is that flatteing of the lower/child WHERE EXISTS removes it from the whereSubquerys as it's supposed to. When we later preprocess the upper/parent WHERE EXISTS we lack information about what the lower whereSubquery originally was. It seems a similar issue was once seen for a normal whereClause as a copy of the original clause is kept in the SelectNode.originalWhereClause member.

Thomas Nielsen added a comment - 23/Jan/08 11:08 AM
Running the query on a faster, dual-core machine actually produce the correct results *without* running down the path provided by the last patch.
The queryplan looks totally different on the faster machine. Need to verify it still runs the problematic path with a slower machine.
Either different statistics cause the change in queryplan, or (more likely as we still are in preprocessing) something else has changed on trunk as well.

Thomas Nielsen added a comment - 23/Jan/08 08:34 PM
Attaching updated patch with correct condition checks for skipping flattening of WHERE EXISTS subquerys with nested WHERE EXISTS subqueries.
'derby-3301-3.diff' also takes SubqueryNodes in the where clause, not only where subqueries into consideration.

Will run derbyAll and suites.All on this patch and report the results.

Checking patch available, and assigning it to me.

Thomas Nielsen added a comment - 23/Jan/08 08:37 PM
I should probably add that I got it reproduced on the faster box as well. Environmental issue caused breakpoints to not trigger.

Thomas Nielsen added a comment - 23/Jan/08 09:32 PM
'derby-3301-3b.diff' fixes NPE seen in suites.All with the last patch.

Thomas Nielsen added a comment - 24/Jan/08 10:06 AM - edited
suites.All and derbyall passes with 'derby-3301-3b.diff'

Thomas Nielsen added a comment - 24/Jan/08 11:52 AM
Created DERBY-3349 for improved testing for nested WHERE EXISTS clauses

Dyre Tjeldvoll added a comment - 24/Jan/08 12:10 PM
I would like commit this fairly soon, but given the complexity of the issue, I would feel more at ease if someone else could also look at the 3b patch.

Bryan Pendleton added a comment - 24/Jan/08 03:06 PM
Does the 3b patch contain new regression tests?

Thomas Nielsen added a comment - 24/Jan/08 03:20 PM
No, 3b does not include any new regression tests or extentions.

An email comment from Army on derby-dev suggested I create a separate jira flagging the need for testing this issue. DERBY-3349 was filed for this.
Maybe I misinterpreted his comment?

Bryan Pendleton added a comment - 24/Jan/08 03:25 PM
I think that we should at least include the basic repro script
as a regression test in this patch, then as part of DERBY-3349
we could construct additional tests in this area.

Craig, you mentioned that your O/R tool generates lots of
queries like this; is it possible for you to contribute some
additional test cases, so that we can explore some other
varieties of flattenable and non-flattenable subqueries?

Thomas Nielsen added a comment - 24/Jan/08 03:44 PM
I can make the attached repro into a junit test that can be extended by DERBY-3349 later.
Or is there an existing junit test the repro should go into?

A B added a comment - 24/Jan/08 04:26 PM
> Maybe I misinterpreted his comment?

Yes, I think so. But probably more my fault than anything.

What I meant to say was that I expect any patch which completely disables EXISTS
subquery flattening to theoretically cause at least one test in the regression suite
to fail. If that's not true--i.e. if all tests pass with such a patch--then it means that
there are no existing test cases to verify that EXISTS subquery flattening is actually
occuring, and that would warrant a separate (testing) issue.

But for your 3b patch, you are *NOT* disabling EXISTS subquery flattening altogether (as
discussed earlier), so it's not surprising that existing tests run without error. That said,
I do think it would be good to add a new regression test to correspond to the specific
patch that you've posted--ex. a new test case based on the attached repro would be
good.

If that's still unclear, then the short version is "Yes, I think a new test case (or set of
test cases) should be added for this issue". A JUnit test based on the repro would be
ideal.

Apologies for any confusion cause by my earlier comment.

Thomas Nielsen added a comment - 24/Jan/08 05:14 PM - edited
Attaching 'derby-3301-test-1.diff', a junit version of Craigs original 'derby-3301.sql' reproscript.
The new regression test called WhereExistsTest.java can be used as a starting point for DERBY-3349.
The test-1 patch also adds the new test to functionTests/tests/lang/_Suite.java.

The test-1 patch runs successfully with the 3b patch applied, both as a single test and when run through the lang-suite.
 

A B added a comment - 24/Jan/08 06:58 PM
Two comments on the 3b patch.

1) The additional comments in SubqueryNode that describe the new condition for a flattenable
subquery seem like they might be slightly inverted. The comments say:

   /* Values subquery is flattenable if:
    * ...
    * o It is not a nested WHERE EXISTS subquery in a WHERE EXISTS clause (DERBY-3301)
    */

Ignoring the conditions which were already there, I read this as:

  "A subquery SQ1 is flattenable if it is not a nested WHERE EXISTS subquery in the WHERE
  EXISTS clause of another subquery, SQ2".

Is that a correct reading? If so, I think that if SQ1 *is* a nested WHERE EXISTS subquery in
the WHERE EXISTS clause of SQ2, then it's technically _SQ2_ that is not flattenable, not SQ1.
That said, the actual code changes in the new "hasNestedWhereExistsClause()" look correct,
so I think this is just a comment issue, not a code one. Maybe a better condition would be:

  o Either a) it (the subquery) does not appear within a WHERE clause, or b) it appears within
    a WHERE clause but does not itself contain a WHERE clause with other subqueries in it.

This definition is quite a bit more wordy, but it matches the documentation and it also matches
the code in hasNestedWhereExistsClause a bit more closely--though not exactly. See #2
comment below...

2) The logic that exists in hasNestedWhereExistsClause() does exactly what its name implies,
i.e. it checks specifically for "nested WHERE EXISTS clauses". That in itself is good since it
addresses the issue reported. But I did notice that the documentation for EXISTS join flattening
suggests a slightly more general condition:

 http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html

Namely, it seems like a more complete approach would be check to see if the subquery's
WHERE clause "includes _any_ subqueries"--not just WHERE EXISTS subqueries. For
example: if the inner-most WHERE EXISTS query was replaced with a WHERE IN or WHERE
ANY clause, would we still get the correct results?

As an example, I re-wrote the inner-most WHERE EXISTS subquery from derby-3301.sql to use a
WHERE IN clause that effectively duplicates one of the (already-existing) equality predicate, and
so should in theory return the same results...I think? (I could be wrong here):

select unbound_e.empid, unbound_p.projid
from departments this,
     employees unbound_e,
     projects unbound_p
where exists (
    select 1 from employees this_employees_e
    where this_employees_e.empid in ( -- *** Army changed this
        select this_employees_e_projects_p.empid -- *** Army changed this
        from project_employees this_employees_e_projects_p
        where this_employees_e_projects_p.empid = this_employees_e.empid
        and this_employees_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = this_employees_e.empid)

This returns three rows:

EMPID |PROJID
-----------------------
13 |101
13 |102
15 |103

Is that correct? Or should that query be returning 7 rows, as well? What about if
the first WHERE EXISTS query is changed to a "WHERE IN" query, as well--will
that behave correctly?

If it should return seven rows, as well, then perhaps the new method in SubqueryNode
could be expanded to check for the existence of _any_ subquery, not just a WHERE
EXISTS subquery?

One way to do that might be to use a CollectNodesVisitor to collect all instances of
SubqueryNode.class that appear in "sn.originalWhereSubquerys" and/or in
"sn.originalWhereClause()"--and if that visitor returns any matches, then the method
returns "false". I'm not entirely sure that will work, but it seems like it could...?

Thomas Nielsen added a comment - 25/Jan/08 10:34 AM
Thanks for the input Army.

Ad 1)
I agree, the 3b comments miss the point slightly, and may be seen as 'inverted'.
>
> I think that if SQ1 *is* a nested WHERE EXISTS subquery in the WHERE EXISTS clause of SQ2, then it's technically _SQ2_ that is not flattenable, not SQ1.
>
This is correct, to my understanding. Your comment suggestion is a lot better, even if it is more elaborate.

Ad 2)
True, 3b only takes the actual problem reported into consideration.
I reread the last bullet on the flattening criteria list (http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html) again. You are once again correct Army.
If SQ2 a has *any* subquery SQ1 in its WHERE EXISTS clause, flattening of SQ2 should be skipped. Not flattening in the case where SQ2 is an ANY or IN query
is already handled in the existing code.

I'll look into CollectNodesVisitor and post an updated patch.

Thomas Nielsen added a comment - 25/Jan/08 01:35 PM
Attaching 'derby-3301-4.diff'
- comments changed
- condition for flattening changed to exclude all where subquerys in a WHERE EXISTS subquery from flattening and renamed the helper method accordingly.

I have not touched flattening of IN at all with derby-3301-4.diff.

suites.All and derbyall pass


A B added a comment - 25/Jan/08 05:11 PM
Thank you for your continued work on this, Thomas, and for your patience with my feedback.
A couple of follow-up comments...

-- 1 --

> Not flattening in the case where SQ2 is an ANY or IN query
> is already handled in the existing code.

Can you point me to the place in the code where this check currently occurs? Note how if I replace
all occurrences of "WHERE EXISTS" from derby-3301.sql with "WHERE ... IN", the query returns 5 rows
after applying your patch--but I think it should return 7 like the others? So it seems the existing
code to handle ANY or IN queries may be incorrect, as well...

ij> select unbound_e.empid, unbound_p.projid
from departments this,
     employees unbound_e,
     projects unbound_p
where this.id in (
    select this_employees_e.department from employees this_employees_e
    where this_employees_e.empid in (
        select this_employees_e_projects_p.empid
          from project_employees this_employees_e_projects_p
        where this_employees_e_projects_p.empid = this_employees_e.empid
        and this_employees_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = this_employees_e.empid)
    );

EMPID |PROJID
-----------------------
11 |101
12 |102
13 |102
14 |103
15 |103

5 rows selected

If I make the following change w.r.t your patch:

- if ( isWhereSubquery() && isEXISTS() ) {
+ if ( isWhereSubquery() && (isEXISTS() || isANY() || isIN()) ) {

then the above query returns 7 rows, as I think it should.

-- 2 --

With respect to derby-3301-4.diff, I noticed that we still check for an EXISTS WHERE subquery for the
second if statement, i.e.: at line 2382 (with your patch applied):

                if (sn.originalWhereClause != null &&
                    sn.originalWhereClause instanceof SubqueryNode){
                        SubqueryNode tmp = (SubqueryNode) sn.originalWhereClause;
                        if ( tmp.isWhereSubquery() && tmp.isEXISTS() ) { -- *** Note the "isExists()" check...

Is there any reason not to remove "tmp.isEXISTS()" from inner "if" statement, similar to what you
did in the first half of this method? Note how the query I posted yesterday with "WHERE ... IN" still
looks to return incorrect results (3 rows, when I think it should return 7?)--but if I remove the
"tmp.isEXISTS()" check from the "if" statement, that query returns 7 rows. I.e.

                        SubqueryNode tmp = (SubqueryNode) sn.originalWhereClause;
- if ( tmp.isWhereSubquery() && tmp.isEXISTS() ) {
+ if ( tmp.isWhereSubquery() ) {

Yields:

ij> select unbound_e.empid, unbound_p.projid
from departments this,
     employees unbound_e,
     projects unbound_p
where exists (
    select 1 from employees this_employees_e
    where this_employees_e.empid in (
        select this_employees_e_projects_p.empid
          from project_employees this_employees_e_projects_p
        where this_employees_e_projects_p.empid = this_employees_e.empid
        and this_employees_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = this_employees_e.empid)
    );

EMPID |PROJID
-----------------------
11 |101
12 |101
13 |101
12 |102
13 |102
14 |103
15 |103

7 rows selected

From what I can tell, the two modifications to your patch mentioned above make the method
agree fully with the documentation, and appear to make all of the queries in question return
the correct number of rows. Do they seem like reasonable modifications to you?

Of course this entire comment is based on the assumption that the two queries I wrote above
with "WHERE ... IN" are in fact supposed to return 7 rows. If that's not true, then you can pretty
much ignore everything in this particular comment...

Thomas Nielsen added a comment - 26/Jan/08 07:05 PM
I'm the one to thank for your continued comments Army :)

Ad 1)
The IN variant of the query should return 7 rows as well I think. To verify I checked with another opensource database, and it does indeed return 7 rows too.

>> Not flattening in the case where SQ2 is an ANY or IN query
>> is already handled in the existing code.
>
>Can you point me to the place in the code where this check currently occurs?

There are checks in SubqueryNode @698, @757 that I at writing thought would handle this - obviously not so.

Ad 2)
The inner most tmp.isEXISTS() was not intentional :/ Thanks for spotting that.

Thomas Nielsen added a comment - 26/Jan/08 07:20 PM
Attaching 'derby-3301-4b.diff' based on Armys latest comments.
I renamed the helper function to better describe what it checks.

Attaching 'derby-3301-test-2.diff' which includes the IN variant of the query.

derbyall passes.
suites.All has 7 errors off todays head of trunk, but they seem unrelated (StringIndexOutOfBounds in SecureServerTest). I'll look into it on monday.
The new WhereExistsTest pass with 4b applied.

A B added a comment - 28/Jan/08 05:29 PM
Thank you for the 4b patch, Thomas.

From what I can tell, the changes now look functionally good to me--thank you for incorporating
my feedback.

I did notice that some of the code comments still reference "WHERE EXISTS" predicates only,
when in fact the new method (now) coveres IN and ANY predicates, as well. But that's just a
nit--it shouldn't block commit of the patch.

My only remaining minor comment is that I think it would be good to add test cases for
the various IN, ANY, and EXISTS combinations that are not currently tested. As an example,
the query I posted previously that replaces _both_ instances of "EXISTS" with "IN" would be a
good test case to add. I see that you added a new test case for "EXISTS ... IN", which was
great--thanks for doing that. I think a minimal list of combinations to check should be:

  * WHERE EXISTS ... WHERE EXISTS (done with test-1.diff -- thanks!)
  * WHERE EXISTS ... WHERE IN (done with test-2.diff -- thanks!)
  * WHERE EXISTS .. WHERE ANY
  * WHERE IN ... WHERE IN (I posted an example in my previous post, but it's not included in the tests)
  * WHERE ANY ... WHERE ANY

But again, the additional test cases could come as a separate patch later, or perhaps even
as a patch for DERBY-3349. I.e. I don't think this should block commit of the 4b patch nor the
test-2 patch.

> suites.All has 7 errors off todays head of trunk

I haven't seen these errors in the tinderbox runs. Do you see them in a clean codeline? And
if so, were you able to track down the cause?

Thank you for all of your work on this, Thomas!

Thomas Nielsen added a comment - 28/Jan/08 06:19 PM
I agree that all the listed combinations should be tested, but let's leave the test extension to DERBY-3349.

Reran suites.All with the 4b and test-2 patches on two different machines, and it passed on both. Most likely the failure was caused by test folder that wasn't clean before starting the test.

Attaching derby-3301-4c with updated comments in SubqueryNode.isWhereExistsAnyInWithWhereSubquery() according to Armys commnets.

Thomas Nielsen added a comment - 28/Jan/08 06:28 PM
Would it be a good idea to rename the new test in test-2 from WhereExistsTest.java to something like WhereExistsAnyInTest.java?

A B added a comment - 28/Jan/08 06:46 PM
> Would it be a good idea to rename the new test in test-2 from
> WhereExistsTest.java to something like WhereExistsAnyInTest.java?

Or maybe NestedWhereSubqueryTest.java would be a more general but still appropriate name?

Dyre, if you're still looking to commit the changes for this issue, I have no further comments
to make on the 4c nor the test-2 patch. Many thanks again to Thomas for his persistence on
this one...

Thomas Nielsen added a comment - 28/Jan/08 07:22 PM
Attaching 'derby-3301-test-3.diff' with renamed test.

Test passes when run standalone, and is running as part of the lang suite now.

Thomas Nielsen added a comment - 28/Jan/08 07:56 PM
test-3 passed as part of the suite as well.

Thanks a lot for all your guidance Army!

Thomas Nielsen added a comment - 28/Jan/08 08:37 PM
Reattaching derby-3301-test-3.diff

Dyre Tjeldvoll added a comment - 29/Jan/08 11:46 AM
AB> Dyre, if you're still looking to commit the changes for this issue,

Anyone should feel free to commit it :) But yes, I do think it is an important and complex fix which will benefit from being run in the nightly regression tests for a while before the next release.
I have applied the patches successfully and I'm running the tests. I plan to commit them later this afternoon.

On a related note; this is not a regression, but would it still make sense to merge it to 10.3. ?

Dyre Tjeldvoll added a comment - 29/Jan/08 04:30 PM
With the two patches I get a failure in lang/subqueryFlattening.sql.
From a quick look at the diff it seems like the test dumps a query plan that is different with the patch. I guessing that the master file needs to be updated,
but it would be nice if someone could confirm it.

Craig Russell added a comment - 29/Jan/08 05:10 PM
I'd like to summarize my understanding of the patch.

If you have an EXISTS nested inside an EXISTS, the optimized inversion of the query execution plan does not occur. Instead, the natural execution plan is performed. That is, the outer tables are iterated and for each candidate tuple of the joined tables, the EXISTS is performed.

For each EXISTS check, the scans are reinitialized and there might not be any clues as to exactly where in the outer scan the EXISTS is being invoked. This might yield sub-optimal performance compared to an optimized inverted scan.

Also, I'm trying to get some more test cases by putting additional criteria into the EXISTS and running the query in debug mode and filtering the output. It's slower than I expected but I'll continue if it will help.

A B added a comment - 29/Jan/08 05:26 PM
> With the two patches I get a failure in lang/subqueryFlattening.sql.

Good catch, Dyre. I was operating on the assumption that derbyall ran cleanly with
the 4c patch; I hadn't run it myself.

> From a quick look at the diff it seems like the test dumps a query plan that is different
> with the patch

I confirmed that there is one query for which the query plan is different--and for that specific query,
I think the difference is correct. Namely, the query has nested WHERE EXISTS subqueries and so,
according to the findings for this issue, cannot be safely flattened. So that part seems okay.

I then noticed that another query further down in the test _also_ has nested WHERE EXISTS
subqueries, but that query _is_ in fact flattened into an exists join--which runs counter to the
findings so far for this issue (i.e. it shouldn't be flattened into an EXISTS due to the nested
EXISTS subqueries). The fact that the query in subqueryFlattening still returns correct rows
suggests that maybe there are cases where flattening of EXISTS subqueries is "safe"--but
that seems beyond the scope of this issue.

I played around with the repro for this issue and was able to write the following query:

select unbound_e.empid, unbound_p.projid
from departments this,
     employees unbound_e,
     projects unbound_p
where exists (
    select 1 from employees this_employees_e
    where 1 = 1 and exists (
        select 1 from project_employees this_employees_e_projects_p
        where this_employees_e_projects_p.empid = this_employees_e.empid
        and this_employees_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = this_employees_e.empid)
    );

The only difference between this query and the one in derby-3301.sql is that I've
added a (no-op) predicate "1 = 1" alongside the inner-most EXISTS clause.
Since this creates an AndNode whose left operand is a relational op ("1 = 1") and
whose right operand is a SubqueryNode, the code in the 4c patch that checks
specifically for a SubqueryNode will not detect it:

+ if (sn.originalWhereClause != null &&
+ sn.originalWhereClause instanceof SubqueryNode) {
+ ...

Since whereClause here is actually an AndNode, not a SubqueryNode, the "if" statement
above is not triggered, thus the query is incorrectly flattened into an EXISTS joins. The
end result is that the query only returns 5 rows when it should return 7 (the extra "1 = 1"
predicate shouldn't affect the results).

I think I mentioned a few comments ago that it might be possible to use a CollectNodesVisitor
to search the whereClause for any SubqueryNodes. The above example demonstrates why
such a visitor would be helpful, as opposed to just checking directly for a SubqueryNode...

So to answer Dyre's question, yes, I think a master update will be needed. But further
inspection of subqueryFlattening.sql shows that the 4c patch may need a few more
tweaks, after all...

Craig Russell added a comment - 29/Jan/08 08:50 PM
Here are more queries, just slightly modified from the original queries. I've done little editing of the queries to avoid introducing copy/paste errors.

This one works:
SELECT UNBOUND_E.PERSONID FROM applicationidentity0.DEPARTMENTS THIS , applicationidentity0.PERSONS UNBOUND_E
  WHERE EXISTS (
    SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E
      WHERE THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID AND THIS.ID = <2>)

This fails, returns 5 rows where 7 are expected. The difference between this and the original query is that THIS.ID is also returned in the outer select:
SELECT THIS.ID,UNBOUND_E.PERSONID,UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS , applicationidentity0.PERSONS UNBOUND_E , applicationidentity0.PROJECTS UNBOUND_P
  WHERE EXISTS (
    SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E
      WHERE EXISTS (
        SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P
         WHERE THIS_EMPLOYEES_E_PROJECTS_P."MEMBER" = THIS_EMPLOYEES_E.PERSONID
           AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID
           AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
           AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID
))

This fails, returns 3 rows where 5 are expected.
SELECT UNBOUND_E.PERSONID,UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS , applicationidentity0.PERSONS UNBOUND_E , applicationidentity0.PROJECTS UNBOUND_P
  WHERE EXISTS (
    SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E
      WHERE EXISTS (
        SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P
          WHERE THIS_EMPLOYEES_E_PROJECTS_P."MEMBER" = THIS_EMPLOYEES_E.PERSONID
          AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID
          AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
          AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID
          AND THIS.ID = <1>))

This fails, returning 5 rows where 7 are expected.
SELECT UNBOUND_E.PERSONID,UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS , applicationidentity0.PERSONS UNBOUND_E , applicationidentity0.PROJECTS UNBOUND_P
  WHERE EXISTS (
    SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E
      WHERE EXISTS (
        SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P
          WHERE THIS_EMPLOYEES_E_PROJECTS_P."MEMBER" = THIS_EMPLOYEES_E.PERSONID
          AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID
          AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
          AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID
          AND THIS.COMPANYID = <1>))


Dyre Tjeldvoll added a comment - 30/Jan/08 09:12 AM
AB> I was operating on the assumption that derbyall ran cleanly with the 4c patch; I hadn't run it myself.

I talked to Thomas offline yesterday and he was adamant that he HAD run derbyall with 0 failed tests. Which was true, except that when he looked closer at derbyall_report.txt it turned out that
an environment problem also had caused 0 tests to be run...

Thomas Nielsen added a comment - 30/Jan/08 07:57 PM
Very sorry for the derbyall hickup :|

Attaching
- updated patch 'derby-3301-5.diff' which uses a CollectNodesVisitor to find all lower SubqueryNodes as pointed out by Army.
- extra sql script in 'derby-3301-extra.sql' containig Craigs additional queries modified to work with the original repro script schema sql'.

With the 5 patch applied both the original query and the additional queries Craig posted return as expected. See output below.
I still need to
- have another look at Armys latest variant, as it doesn't seem to end up with a similar querytree to the others (i.e no SubqueryNodes in the WHERE clause).
- modify the master output for lang/subqueryFlattening.sql.

ij>
select unbound_e.empid, unbound_p.projid
from departments this,
     employees unbound_e,
     projects unbound_p
where exists (
    select 1 from employees this_employees_e
    where exists (
        select 1 from project_employees this_employees_e_projects_p
        where this_employees_e_projects_p.empid = this_employees_e.empid
        and this_employees_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = this_employees_e.empid)
    );

EMPID |PROJID
-----------------------
11 |101
12 |101
13 |101
12 |102
13 |102
14 |103
15 |103

7 rows selected

---

Craigs additional queries:
ij>
SELECT UNBOUND_E.empid FROM DEPARTMENTS THIS , employees UNBOUND_E
  WHERE EXISTS (
    SELECT 1 FROM employees THIS_EMPLOYEES_E
      WHERE THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND UNBOUND_E.empid = THIS_EMPLOYEES_E.empid AND THIS.ID = 2);
EMPID
-----------
14
15

2 rows selected
ij>
SELECT THIS.ID,UNBOUND_E.empid,UNBOUND_P.PROJID FROM DEPARTMENTS THIS , employees UNBOUND_E , projects UNBOUND_P
  WHERE EXISTS (
    SELECT 1 FROM employees THIS_EMPLOYEES_E
      WHERE EXISTS (
        SELECT 1 FROM project_employees THIS_EMPLOYEES_E_PROJECTS_P
         WHERE THIS_EMPLOYEES_E_PROJECTS_P."EMPID" = THIS_EMPLOYEES_E.empid
           AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID
           AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
           AND UNBOUND_E.empid = THIS_EMPLOYEES_E.empid
));
ID |EMPID |PROJID
-----------------------------------
1 |11 |101
1 |12 |101
1 |13 |101
1 |12 |102
1 |13 |102
2 |14 |103
2 |15 |103

7 rows selected
ij>
SELECT UNBOUND_E.empid,UNBOUND_P.PROJID FROM DEPARTMENTS THIS , employees UNBOUND_E , PROJECTS UNBOUND_P
  WHERE EXISTS (
    SELECT 1 FROM employees THIS_EMPLOYEES_E
      WHERE EXISTS (
        SELECT 1 FROM project_employees THIS_EMPLOYEES_E_PROJECTS_P
          WHERE THIS_EMPLOYEES_E_PROJECTS_P."EMPID" = THIS_EMPLOYEES_E.empid
          AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID
          AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
          AND UNBOUND_E.empid = THIS_EMPLOYEES_E.empid
          AND THIS.ID = 1));
EMPID |PROJID
-----------------------
11 |101
12 |101
13 |101
12 |102
13 |102

5 rows selected
ij>
SELECT UNBOUND_E.empid,UNBOUND_P.PROJID FROM DEPARTMENTS THIS , employees UNBOUND_E , PROJECTS UNBOUND_P
  WHERE EXISTS (
    SELECT 1 FROM employees THIS_EMPLOYEES_E
      WHERE EXISTS (
        SELECT 1 FROM project_employees THIS_EMPLOYEES_E_PROJECTS_P
          WHERE THIS_EMPLOYEES_E_PROJECTS_P."EMPID" = THIS_EMPLOYEES_E.empid
          AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID
          AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID
          AND UNBOUND_E.empid = THIS_EMPLOYEES_E.empid
          AND THIS.COMPANYID = 1));
EMPID |PROJID
-----------------------
11 |101
12 |101
13 |101
12 |102
13 |102
14 |103
15 |103

7 rows selected
ij>

Thomas Nielsen added a comment - 30/Jan/08 09:31 PM
Attaching updated master file for lang/subqueryFlattening. Diffs only in qyeryplans, so the changes in patch 5 produce identical results for the tested queries.
When run standalone the updated lang/subqueryFlattening test pass.

A B added a comment - 30/Jan/08 11:42 PM
> I still need to have another look at Armys latest variant, as it doesn't seem to end
> up with a similar querytree to the others

Actually, I think the query does fit the general shape. In your v5 patch you iterate through each of
the SubqueryNode's that are found by CollectNodesVisitor and check to see if any of them was
marked as a "whereSubquery()". For some reason that check fails in the query that I posted--i.e.
the one where I include the "1 = 1" predicate. While I don't know why the call to whereSubquery()
returns false, I also don't think that it's necessary at this particular point.

The code that calls CollectNodesVisitor is this line:

          sn.originalWhereClause.accept(cnv);

Since we're visiting the original WHERE clause, any SubqueryNodes that we find are necessarily
going to be "WHERE subqueries" because we found them in a WHERE clause. So it seems like
we shouldn't have to call "isWhereSubquery()" on any of them--we just need to check to see if the
CollectNodesVisitor found at least one, and if so, we're done. I changed your version 5 patch to
look like this:

  ....
                if (sn.originalWhereClause != null)
                {
                    /* Second instance of SubqueryNode.class effectively means "don't bother
                     * searching beneath SubqueryNodes since if we found one, we're done."
                     */
                    CollectNodesVisitor cnv =
                        new CollectNodesVisitor(
                            SubqueryNode.class, SubqueryNode.class);

                    sn.originalWhereClause.accept(cnv);
                    return cnv.getList().isEmpty();
                }
  ....
               
and with that change (everything else the same), the query with "1 = 1" returns 7 rows, as it should. The
other queries mentioned by Craig also return the correct results (assuming the first one is supposed to
return 2 rows, which I assume it is...?)

So it looks like the use of CollectNodesVisitor does help. As for the issue of why the SubqueryNode's
found in originalWhereClause return false for "isWhereSubquery()", I haven't looked at that.

A B added a comment - 30/Jan/08 11:46 PM
On a complete unrelated note, I think the preference for Derby codeline is to keep lines
shorter than 80 characters (where possible). Some of the lines in the latest patch look
to exceed that. Not a big deal, but thought I'd mention it.

Craig Russell added a comment - 31/Jan/08 01:19 AM
> and with that change (everything else the same), the query with "1 = 1" returns 7 rows, as it should. The other queries mentioned by Craig also return the correct results (assuming the first one is supposed to return 2 rows, which I assume it is...?)

Yes, it is supposed to return 2 rows. It's not a particularly interesting query since it works with or without the patch, but I included it because it also uses the EXISTS pattern.

And let me say again I appreciate all the effort you all are putting into this.

Craig

Craig Russell added a comment - 31/Jan/08 06:11 AM
I've applied the patch derby-3301-5.diff and rebuilt the derby and derbytools jars.

The JDO TCK tests that fail on the released jars now pass with the patch applied.

Thanks so much for getting this bug resolved to this point. This is no longer a blocker for me, assuming that the changes can be committed to the code line and a patch release made in due course.

Craig


Thomas Nielsen added a comment - 31/Jan/08 08:32 AM
Reattaching the derby-3301-extra.sql script with Armys 1=1 variant included as well.

Attaching derby-3301-6.diff with Armys latest comments
- fixed length of some lines
- CollectNodesVisitor stops when finding the first SubqueryNode.
- no need to check for isWhereSubquery() when using the CNV on the original where clause, we already know it is

All reported queries work as expected with the 6 patch.

suites.All and derbyall are running.

Thanks for all your help Army, and to Craig for the positive feedback!

A B added a comment - 31/Jan/08 06:14 PM
Hi Thomas,

I really hate to keep bringing up more issues, but as I was about to sign off on patch 6, the
following caught my eye:

                   /*
                    * This WHERE EXISTS | ANY | IN subquery has another
                    * subquery in its own WHERE *clause* if the CNV is not
                    * empty.
                    */
                   return cnv.getList().isEmpty();

Note how the code comments don't match the code--there's a missing "!" operator in the
actual code. This is how I wrote it in my previous comment, but that was wrong. The
method is supposed to return "true" if the SubqueryNode has a WHERE clause with
another subquery in it--which will be true if cnv.getList() is NOT empty. So I missed
the negation (sorry).

With this fix in place all of Craigs queries still run correctly--but the queries with
"1 = 1" in them start failing again. In tracing it turns out that, for those cases,
the CollectNodesVisitor does not find the SubqueryNodes in "sn.originalWhereClause".
I think it comes down to something you mentioned earlier, namely:

> I still need to have another look at [the "1 = 1"] variant, as it doesn't seem
> to end up with a similar querytree to the others

Upon further inspection, I think you are right about this.

For reference, this is the query in question. Note the tags on the right for
the sake of discussion, where "SN" implies "SelectNode":

select unbound_e.empid, unbound_p.projid -- SN_OUTER
from departments this,
     employees unbound_e,
     projects unbound_p
where exists (
    select 1 from employees this_employees_e -- SN_INNER_1
    where 1 = 1 and exists (
        select 1 from project_employees this_employees_e_projects_p -- SN_INNER_2
        where this_employees_e_projects_p.empid = this_employees_e.empid
        and this_employees_e.department = this.id
        and unbound_p.projid = this_employees_e_projects_p.projid
        and unbound_e.empid = this_employees_e.empid)
    );

By the time we reach SubqueryNode.isWhereExistsAnyInWithWhereSubquery() for the
SubqueryNode wrapping SN_INNER_1, the clause:

  where 1 = 1 and exists ( ... )

ends up as an AndNode whose left operand is "1 = 1" but whose right operand is the
constant literal TRUE--which is not what we expect. We expect the AndNode's right
operand to be the SubqueryNode corresponding to SN_INNER_2. That is, if "sn" is the
SelectNode for SN_INNER_1 then sn.originalWhereClause should include the SubqueryNode
that wraps SN_INNER_2. That is true when we first set sn.originalWhereClause in the
init() method of SelectNode--but by the time we get to preprocessing for the SubqueryNode
wrapping SN_INNER_1, the SubqueryNode for SN_INNER_2 is no longer in SN_INNER_1's
originalWhereClause.

From what I can tell, this is because sn.originalWhereClause for SN_INNER_1 points to
the same object as sn.whereClause. So when sn.whereClause is itself transformed due to
flattening of the subquery SN_INNER_2 (which is legal), sn.originalWhereClause() sees
the same transformation. Thus the SubqueryNode wrapping SN_INNER_2 disappears from
sn.whereClause, which is good--but it also disappears from sn.originalWhereClause(),
which is bad. The fact that it disappears means that CollectNodesVisitor for SN_INNER_1
will not find it, and thus SN_INNER_1 will be flattened, which is not legal.

With the negation operator gone (as in patch 6), the fact that we can't find the SubqueryNode
for SN_INNER_2 causes the method to incorrectly return "true" (because CNV's list is empty),
which is then negated by the caller and thus flattening of SN_INNER_1 is accidentally (but
correctly) avoided. I think it's clear that the negation operator should be there, though.

I'm not sure what the best way to address that would be--but before you go there,
can you double-check these findings to see if you agree? Maybe I'm just missing
something obvious...

Apologies for the negation slip-up to begin with.

Thomas Nielsen added a comment - 31/Jan/08 07:21 PM
Another excellent catch Army! I'm very glad you're helping out, so do not feel bad about bringing these things up!

I didn't report back on the testing of the 6 patch - due to fact that my updated derbyall test failed. It fails for that exact reason - missing negation on
   return cnv.getList().isEmpty();

Fixed the missing negation in derby-3301-7.diff.

With the 7 patch
- all of Craigs queries work as expected
- updated lang/subqueryFlattening.sql (part of derbyall) pass
- new NestedWhereSubqueryTest (part of suites.All)

I'll rerun derbyall and suites.All with the 7 patch to verify nothing else broke.

With the 7 patch the 1=1 variant returns only 3 rows, and I'll need to have a further look into this. From what I found at last glance, I think you are onto it with your explaination. If I remember correctly I had the
AndNode in the whereClause with the 1=1 as its left node and a TRUE as its right node (as you describe). The where subqueries had ended up in the wherePredicates list. But this needs a more thorough look.

Given Craigs urgency with getting this fixed, do you think we can commit the 7 patch, and leave the 1=1 for a later commit?

A B added a comment - 31/Jan/08 07:45 PM
> With the 7 patch the 1=1 variant returns only 3 rows, and I'll need to have a further
> look into this.

One thing that occurred me: would it be possible to replace the originalWhereSubquerys field
in SelectNode with a simple boolean, something like origWhereClauseHadSubqueries. Then
in the init() method of SelectNode you could invoke the CollectNodeVisitor on whereClause
and set the boolean to true if you find any SubqueryNodes. That way the boolean would not
be affected by subsequent binding/preprocessing transformations. If you do that, then the code
in SubqueryNode.isWhereExistsAnyInWithWhereSubquery() could theoretically replace the
sn.origWhereClause and sn.origWhereSubquerys processing with a simple check of
sn.origWhereClauseHadSubqueries. I haven't actually tried it, but I wonder if that might be
a simple way to get things working...

> With the 7 patch
> - updated lang/subqueryFlattening.sql (part of derbyall) pass
> - new NestedWhereSubqueryTest (part of suites.All)

Are these intended to be part of the 7 patch, or are these going to be attached separately?

> Given Craigs urgency with getting this fixed, do you think we can commit the 7 patch, and
> leave the 1=1 for a later commit?

If the 7 patch passes the regression tests, and if you understand all of the updates that were necessary
for subqueryFlattening.sql, then yes, I think this would be fine. Incremental development is a good thing :)

Thomas Nielsen added a comment - 31/Jan/08 08:11 PM
That sounds like a doable approach Army - I'll give it a go and let you know how it works out.

>> - updated lang/subqueryFlattening.sql (part of derbyall) pass
>> - new NestedWhereSubqueryTest (part of suites.All)
>
> Are these intended to be part of the 7 patch, or are these going to be attached separately?

They are already attached derby-3301-master.diff and derby-3301-test-4.diff respectively :) I agree the names of the test patches could be a lot better...

Thomas Nielsen added a comment - 31/Jan/08 09:32 PM
Armys suggestion of flagging the SelectNode as having subqueries in its where clause in init() using a ColletNodesVisitor seems to work like a charm :)
Attaching 'derby-3301-8.diff' which implements this change to the 7 diff.

Had to revisit my subqueryFlattening.sql master diff with this change though, so I'm attaching an updated master in 'derby-3301-test-master-2.diff. The changes to the master are correct to my knowledge - replace some of the PRNs and Hash Exists Joins with subqueries and Index Scan nodes. Just what we want :)

The 8 patch works for
- all reported queries, including the 1=1 variant.
- new NestedWhereSubqueryTest with test-4 patch applied
- lang/subqueryFlattening.sql with test-master-2 patch applied

I'll rerun derbyall and suites.All on the 8 patch.

Thomas Nielsen added a comment - 01/Feb/08 11:02 AM
Updated derbyall pass and lang._Suite runs successfully with the 8 code patch, and the master-2 and test-4 test patches applied.

I still get some unrelated errors in SecureServerTest in suites.All on Windows, but not on Solaris - I believe it's an environment issue and has nothing to do with the code patch.

Patches should hopefully be ready for final review, and commit :)

Dyre Tjeldvoll added a comment - 01/Feb/08 07:10 PM
Hi Thomas, thank you for your continued hard work on this.
Just to clarify: In your last comment you mention a test-4 patch, but there is no test-4 patch, only a test-3 patch (dated 28. Jan). I can apply the test-3 patch, but it results
in compilation errors:

compilet1:
    [javac] Compiling 95 source files to /export/home/tmp/derby/derby-scratch/classes
    [javac] /export/home/tmp/derby/derby-scratch/java/testing/org/apache/derbyTesting/functionTests/tests/lang/NestedWhereSubqueryTest.java:229: class or interface expected
    [javac] package org.apache.derbyTesting.functionTests.tests.lang;
    [javac] ^
    [javac] /export/home/tmp/derby/derby-scratch/java/testing/org/apache/derbyTesting/functionTests/tests/lang/NestedWhereSubqueryTest.java:231: class or interface expected
    [javac] import java.sql.ResultSet;
    [javac] ^

...

 

Thomas Nielsen added a comment - 01/Feb/08 10:08 PM
Oh dear... test-4 is a typo, it should say test-3 in my latest comments.

The new test in test-3 is only 209 lines long, and your build error happens at 229.
I think your environment managed to duplicate the contents of the new file? NB 6.0 seems to do that if you apply a diff with a newly created file twice btw.

A B added a comment - 04/Feb/08 04:50 PM
I applied patch 8 and verified that the reported queries return correct results.

I haven't applied the test/master patches, nor have I run the regression tests, but based on
inspection, I think it might be nice to update the comments preceding the affected query plans
in derby-3301-test-master-2.diff so that they agree with the new results--i.e. indicate which parts
of the query should/should not be flattened, and explain why. But that can come as cleanup,
it doesn't need to block commit of the current patches.

No other comments to make; thanks again, Thomas!

Dyre Tjeldvoll added a comment - 05/Feb/08 09:32 AM
Committed revision 618586.

Yes Thomas, the test-3 patch compiled cleanly when i manually deleted the old version of that file. (svn revert will not delete added files). All tests passed too.

Craig Russell added a comment - 05/Feb/08 04:49 PM
Hooray. Thanks everyone for your efforts on this.

Craig

Thomas Nielsen added a comment - 07/Feb/08 06:54 PM
Army> I think it might be nice to update the comments preceding the affected query plans
Army> in derby-3301-test-master-2.diff so that they agree with the new results

Attaching derby-3301-test-master-3.diff with updates to the two comments preceeding the changed queryplans.
The updated test pass.

Craig, can you confirm the current trunk works for you, and if so can we close the issue once the last comment patch has been comitted?

A B added a comment - 07/Feb/08 07:13 PM
> Attaching derby-3301-test-master-3.diff with updates to the two comments preceeding
> the changed queryplans.

Thanks, Thomas.

A B added a comment - 07/Feb/08 07:35 PM
Committed derby-3301-test-master-3.diff with svn # 619591.

Thomas Nielsen added a comment - 07/Feb/08 09:27 PM
Thanks for committing, Army.

Thomas Nielsen added a comment - 15/Feb/08 08:23 AM
Marking as resolved for 10.4.0.0.

This fix should probably be merged to 10.3 too?

Craig Russell added a comment - 16/Feb/08 03:57 AM
I'm convinced that this issue is resolved. I think it should stay open until a release containing the fix is made. Is that correct?

V.Narayanan added a comment - 16/Feb/08 04:07 AM
I think you can find more information about what to do here

http://wiki.apache.org/db-derby/DerbyCommitProcess

Dyre Tjeldvoll added a comment - 16/Feb/08 01:33 PM
Thanks for confirming Craig. I'm planning to merge it to 10.3. I just need to run the tests before I can commit the merge. After that I think it is OK to close it.

Dyre Tjeldvoll added a comment - 18/Feb/08 09:26 AM
Merged main fix to 10.3 in revision 628661.

Dyre Tjeldvoll added a comment - 18/Feb/08 09:43 AM
Merged the subqueryFlattening fix to 10.3 with revision 628669.

Thomas Nielsen added a comment - 18/Feb/08 09:51 AM
Thanks Dyre

Dyre Tjeldvoll added a comment - 18/Mar/08 10:13 PM
This issue has fix version 10.4 and is marked with either 'Release note needed' or 'Existing application impact', but does not have a releaseNote.html attached to it. Should it?

Michelle Caisse added a comment - 18/Mar/08 10:33 PM
I'm not sure. I forwarded the question to Craig. I was not following the
implication of this issue beyond the fact that if caused a JDO TCK test
to fail.

-- Michelle



Craig Russell added a comment - 18/Mar/08 10:46 PM
Seems like a release note might be appropriate. I could try to put together a description from the body of this JIRA if someone can give me a clue as to the format of a releaseNote.html attachment.

Dyre Tjeldvoll added a comment - 19/Mar/08 07:40 PM
The following Wiki page
http://wiki.apache.org/db-derby/ReleaseNoteProcess
has some info about how to format releaseNote.html

Craig Russell added a comment - 19/Mar/08 09:36 PM
I've tried to summarize the issue with the attached release note.

Daniel John Debrunner added a comment - 19/Mar/08 09:44 PM
I think the release note has its sections or tenses mixed up:

> Applications that execute SQL statements containing nested EXISTS clauses can see fewer rows than satisfy the query.

Hopefully Derby returns the correct results, not fewer rows than the query is meant to return.

Kathey Marsden added a comment - 19/Mar/08 09:53 PM
I think also the file needs to be called releaseNote.html for it to work with the
release note generator.

Craig Russell added a comment - 19/Mar/08 09:58 PM
I've changed the description to address the tense issue, and changed the name of the file.

Daniel John Debrunner added a comment - 19/Mar/08 10:14 PM
Looks better, I think the "Summary of the Change" section is incorrect:

> Derby can return fewer rows than expected.

That describes the old bug not a change, well it doesn't really describe anything since it's so vague. :-)

Is the summary more along the lines of:

Queries with nested EXIST clauses now return correct results.

???

Maybe someone more familiar with the issue could come up with the correct wording?

I'm not really sure that such changes require release notes, looking at this:

> Applications that depended on the previous incorrect behavior will need to be updated.

what does that really mean? How would such an application be updated?

Maybe a release note is good, but not suggesting application changes. More along the
lines of:

 > Applications using nested EXISTS may have previously returned incorrect results.

which is what the Symptoms section is for (and says) so maybe these two sections are
not required for wrong result fixes:

Incompatibilities with Previous Release
Application Changes Required

To reduce to a simple case, if the expression x + 2 actually returned x + 3 for any x, then would it make sense
to have a release note that effectively said "applications that depended on the incorrect addition of 2 need to be updated"??

Thomas Nielsen added a comment - 25/Mar/08 09:45 AM
Thanks for having a go at the RN Craig :)

Taking the liberty to attach an updated version of the releaseNotes.html based on my knowledge of the issue.

Craig Russell added a comment - 25/Mar/08 04:03 PM
Thanks for the update, Thomas. I could feel death by a thousand cuts coming...

Dyre Tjeldvoll added a comment - 02/Apr/08 08:28 AM
Checking 'release note needed' so the release note filter can pick it up.

Kathey Marsden added a comment - 25/Apr/08 04:13 PM
updating fix version to include 10.3.2.2 as this fix was ported to the 10.3 branch.