Issue Details (XML | Word | Printable)

Key: DERBY-3603
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Bryan Pendleton
Reporter: David Butterworth
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Derby

'IN' clause ignores valid results, incorrect qualifier handling suspected

Created: 08/Apr/08 02:14 AM   Updated: 30/Jun/09 03:55 PM
Return to search
Component/s: SQL
Affects Version/s: 10.3.2.1, 10.4.1.3
Fix Version/s: 10.3.3.0, 10.4.1.3

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works d3603_v1.patch 2008-04-11 05:16 PM A B 5 kB
File Licensed for inclusion in ASF works d3603_with_test.diff 2008-04-14 03:08 AM Bryan Pendleton 8 kB
Java Archive File derbydb.jar 2008-04-09 03:45 PM Kathey Marsden 425 kB
File derbydb.tar.bz2 2008-04-08 02:17 AM David Butterworth 232 kB
File Licensed for inclusion in ASF works subqueryFlattening.diff 2008-04-14 03:08 AM Bryan Pendleton 2 kB
File Licensed for inclusion in ASF works updateSubqueryMaster.diff 2008-04-16 01:23 AM Bryan Pendleton 12 kB

Urgency: Urgent
Bug behavior facts: Regression
Resolution Date: 16/Apr/08 04:05 PM


 Description  « Hide
Derbys' 'IN' clause is returning different results depending on which side of a joined table
I am doing my 'IN' comparison against. This only occurs when the number of items within the 'IN' clause is greater then 1.

This behaviour was also confirmed by Bryan Pendleton in this thread:
http://mail-archives.apache.org/mod_mbox/db-derby-user/200804.mbox/%3c47FA5974.2060705@amberpoint.com%3e

Using the test database attatched the following 2 queries produce the issue:

ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
    spike.bookings booking
    WHERE booking.child_id = 2 AND
    admin_unit.admin_unit_id IN (1,21) AND
    booking.booking_date_time_out >= 20080331000000 AND
    booking.booking_date_time_in <= 20080406235900 AND
    account.account_id = booking.account_id AND
    admin_unit.admin_unit_id = account.admin_unit_id;
1
-----------
2

1 row selected
ij> select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
    spike.bookings booking
    WHERE booking.child_id = 2 AND
    account.admin_unit_id IN (1,21) AND
    booking.booking_date_time_out >= 20080331000000 AND
    booking.booking_date_time_in <= 20080406235900 AND
    account.account_id = booking.account_id AND
    admin_unit.admin_unit_id = account.admin_unit_id;
1
-----------
3

1 row selected
ij>

The only difference between the 2 statements is which side of a join the 'IN' clause is matched against.

Bryan performed some initial testing and stated the following:

--------------------- SNIP ------------------------

Interestingly, although the actual results do NOT contain any values
for admin_unit_id = 21, if I change the query to:

    admin_unit.admin_unit_id IN (1)
or
    account.admin_unit_id IN (1)

then the problem disappears -- I get 3 rows for both queries.

I also ran query plans for both the queries (in the IN (1,21) case)
and have pasted the (simplified) query plans at the end of this message.

I notice that in the case where the query gives 2 rows, which is
when we specify admin_unit.admin_unit_id in (1,21), the admin_unit_id
index scan output in the query plan contains:

           qualifiers:
Column[0][0] Id: 0
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

However, in the case where the query gives 3 rows, which is
when we specify account.admin_unit_id in (1,21), the admin_unit_id
index scan output in the query plan contains:

           qualifiers:
None

I think it is the presence/absence of this qualifier on the query
scan which is causing the different results in the query, as in
the first case we see:

           Number of rows qualified=2
           Number of rows visited=3

but in the second case we see:

           Number of rows qualified=3
           Number of rows visited=3

I definitely don't have any explanation for why you are getting
this odd behavior; it certainly seems like a bug to me.

-------------END SNIP -----------------------


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Repository Revision Date User Message
ASF #648492 Wed Apr 16 03:40:30 UTC 2008 bpendleton DERBY-3603: 'IN' clause ignores valid results.

Patch contributed by A B (qozinx at gmail dot com)

Some queries using multi-valued IN clauses were not returning the right
results. An example of a query which was processed incorrectly is:

    select count(*) FROM spike.accounts account, spike.admin_units admin_unit,
        spike.bookings booking
    WHERE booking.child_id = 2 AND
        admin_unit.admin_unit_id IN (1,21) AND
        booking.booking_date_time_out >= 20080331000000 AND
        booking.booking_date_time_in <= 20080406235900 AND
        account.account_id = booking.account_id AND
        admin_unit.admin_unit_id = account.admin_unit_id;

The issue involves the behavior of MultiProbeTableScanResultSet when it
goes to re-open the scan; under certain circumstances, it was failing to
reset the probing state, and so was performing the probing incorrectly,
using only partial portions of the IN list values. For example, in the
above query, there were certain rows which were only tested against the
value "admin_unit_id = 21"; the "admin_unit_id = 1" case was skipped.

MultiProbeTableScanResultSet.reopenCore() was using a heuristic test to
distinguish between the two cases of:

   * A - The first is for join processing. In this case we have
   * a(nother) row from some outer table and we want to reopen this
   * scan to look for rows matching the new outer row.
   *
   * B - The second is for multi-probing. Here we want to reopen
   * the scan on this table to look for rows matching the next value
   * in the probe list.

The patch modifies this code so that the caller passes in a boolean flag
to specify which case is occurring, which avoids the problem thinking that
it was in case "B" when in fact it was actually in case "A".
Files Changed
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/tests/lang/InListMultiProbeTest.java
MODIFY /db/derby/code/trunk/java/testing/org/apache/derbyTesting/functionTests/master/subqueryFlattening.out
MODIFY /db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/execute/MultiProbeTableScanResultSet.java

Repository Revision Date User Message
ASF #648722 Wed Apr 16 15:26:28 UTC 2008 bpendleton DERBY-3603: 'IN' clause ignores valid results.

Patch contributed by A B (qozinx at gmail dot com)

Merged by svn merge -r 648491:648492 ../trunk/

Simple merge with no conflicts; no additional changes were necessary.
Files Changed
MODIFY /db/derby/code/branches/10.4/java/testing/org/apache/derbyTesting/functionTests/tests/lang/InListMultiProbeTest.java
MODIFY /db/derby/code/branches/10.4/java/testing/org/apache/derbyTesting/functionTests/master/subqueryFlattening.out
MODIFY /db/derby/code/branches/10.4/java/engine/org/apache/derby/impl/sql/execute/MultiProbeTableScanResultSet.java

Repository Revision Date User Message
ASF #649226 Thu Apr 17 18:47:18 UTC 2008 bpendleton DERBY-3603: 'IN' clause ignores valid results.

Patch contributed by A B (qozinx at gmail dot com)

Merged by svn merge -r 648491:648492 ../trunk/

Simple merge with no conflicts; no additional changes were necessary.
Files Changed
MODIFY /db/derby/code/branches/10.3/java/testing/org/apache/derbyTesting/functionTests/master/subqueryFlattening.out
MODIFY /db/derby/code/branches/10.3/java/engine/org/apache/derby/impl/sql/execute/MultiProbeTableScanResultSet.java
MODIFY /db/derby/code/branches/10.3/java/testing/org/apache/derbyTesting/functionTests/tests/lang/InListMultiProbeTest.java