Issue Details (XML | Word | Printable)

Key: DERBY-504
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Knut Anders Hatlen
Reporter: Knut Anders Hatlen
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Derby

SELECT DISTINCT returns duplicates when selecting from subselects

Created: 12/Aug/05 06:07 PM   Updated: 11/Jan/07 09:49 AM
Return to search
Component/s: SQL
Affects Version/s: 10.1.2.1
Fix Version/s: 10.1.2.1, 10.2.1.6

Time Tracking:
Not Specified

File Attachments:
  Size
File Licensed for inclusion in ASF works DERBY-504-10.1-unix.diff 2005-09-07 06:27 PM Knut Anders Hatlen 64 kB
File Licensed for inclusion in ASF works DERBY-504-10.1-windows.diff 2005-09-07 06:27 PM Knut Anders Hatlen 65 kB
Text File Licensed for inclusion in ASF works DERBY-504-10.1.stat 2005-09-07 06:27 PM Knut Anders Hatlen 1.0 kB
File Licensed for inclusion in ASF works DERBY-504-cleanup.diff 2005-09-07 06:17 PM Knut Anders Hatlen 2 kB
Text File Licensed for inclusion in ASF works DERBY-504-cleanup.stat 2005-09-07 06:17 PM Knut Anders Hatlen 0.1 kB
File Licensed for inclusion in ASF works DERBY-504.diff 2005-08-28 11:10 PM Knut Anders Hatlen 27 kB
File Licensed for inclusion in ASF works DERBY-504.stat 2005-08-28 11:10 PM Knut Anders Hatlen 0.7 kB
Text File Licensed for inclusion in ASF works DERBY-504_b.diff 2005-08-31 10:16 AM Myrna van Lunteren 65 kB
Text File Licensed for inclusion in ASF works DERBY-504_b.stat 2005-08-31 10:16 AM Myrna van Lunteren 0.9 kB
File Licensed for inclusion in ASF works DERBY-504_c-CRLF.diff 2005-09-02 01:38 AM Knut Anders Hatlen 64 kB
File Licensed for inclusion in ASF works DERBY-504_c-CRLF.diff 2005-09-02 01:35 AM Knut Anders Hatlen 64 kB
File Licensed for inclusion in ASF works DERBY-504_c.diff 2005-09-01 09:19 PM Knut Anders Hatlen 63 kB
Text File Licensed for inclusion in ASF works DERBY-504_c.stat 2005-09-01 09:19 PM Knut Anders Hatlen 0.9 kB
Environment: Latest development sources (SVN revision 232227), Sun JDK 1.5, Solaris/x86
Issue Links:
Dependants
 

Resolution Date: 11/Jan/07 09:49 AM


 Description  « Hide
When one performs a select distinct on a table generated by a subselect, there sometimes are duplicates in the result. The following example shows the problem:

ij> CREATE TABLE names (id INT PRIMARY KEY, name VARCHAR(10));
0 rows inserted/updated/deleted
ij> INSERT INTO names (id, name) VALUES
       (1, 'Anna'), (2, 'Ben'), (3, 'Carl'),
       (4, 'Carl'), (5, 'Ben'), (6, 'Anna');
6 rows inserted/updated/deleted
ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n;
NAME
----------
Anna
Ben
Carl
Carl
Ben
Anna

Six names are returned, although only three names should have been returned.

When the result is explicitly sorted (using ORDER BY) or the id column is removed from the subselect, the query returns three names as expected:

ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n ORDER BY name;
NAME
----------
Anna
Ben
Carl

3 rows selected
ij> SELECT DISTINCT(name) FROM (SELECT name FROM names) AS n;
NAME
----------
Anna
Ben
Carl

3 rows selected

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Knut Anders Hatlen added a comment - 19/Aug/05 01:33 AM
Fixes the optimization bug which caused the trouble. Disables some optimization, so maybe we should fix it in another way?

Knut Anders Hatlen added a comment - 19/Aug/05 11:56 PM
The GROUP BY in the test is rewritten to DISTINCT, and the DISTINCT is lost in the optimization.

Knut Anders Hatlen added a comment - 26/Aug/05 08:40 PM
I have attached a patch which fixes the optimization bug by checking that a subquery has the same columns as the top-level query before pushing the duplicate elimination into the subquery. This patch supersedes the previously submitted patch.

This patch does not remove the optimization of SELECT DISTINCT in other cases than those that were incorrectly optimized in the original Derby code.

I have run derbyall successfully with the exception of two tests:
   - lang/groupBy.sql fails because of DERBY-519
   - store/encryptionKey.sql fails, but I have seen that others have had trouble with this test too, so I believe it is not related to my patch

I will submit tests for this bug later.

Knut Anders Hatlen added a comment - 28/Aug/05 11:10 PM
Added tests for the bug. No code changes from the previous patch.

The following tests were modified:

   1) lang/distinct.sql: Tests for this bug was added and the master file was updated. There are master files for this test in the j9_13 and j9_22 subdirectories, but I haven't changed those files.

   2) lang/groupBy.sql: No changes in the test, but the master files were updated to reflect the new behaviour (see DERBY-519). The j9_13 and j9_22 versions of the master file were also updated, but I haven't actually tested them.

Could someone with access to the j9_13 and j9_22 platforms have a look at these tests and update the master files?

This patch is now ready for review.

Rick Hillegas added a comment - 01/Sep/05 07:51 AM
This is a clean, compact bug fix. It could be made a little more compact: SelectNode and ProjectRestrictNode have almost identical while loops for extracting the base column reference. Please don't miss the opportunity to abstract out a common method here.

Once this change is made, I will apply the patch on my machine and run derbyall against jdk1.4.

Knut Anders Hatlen added a comment - 01/Sep/05 09:19 PM
Changed the patch as suggested by Rick.

Rick Hillegas added a comment - 02/Sep/05 12:46 AM
The patch seems to be broken. I can't apply the changes to the j2me canons. Here's the output from applying the patch:

patching file java/engine/org/apache/derby/impl/sql/compile/ResultSetNode.java
patching file java/engine/org/apache/derby/impl/sql/compile/SelectNode.java
patching file java/engine/org/apache/derby/impl/sql/compile/ProjectRestrictNode.java
patching file java/engine/org/apache/derby/impl/sql/compile/ResultColumn.java
patching file java/engine/org/apache/derby/impl/sql/compile/FromBaseTable.java
patching file java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinct.sql
patching file java/testing/org/apache/derbyTesting/functionTests/master/groupBy.out
patching file java/testing/org/apache/derbyTesting/functionTests/master/distinct.out
patching file java/testing/org/apache/derbyTesting/functionTests/master/j9_13/distinct.out
patching file java/testing/org/apache/derbyTesting/functionTests/master/j9_13/groupBy.out
patching file java/testing/org/apache/derbyTesting/functionTests/master/j9_22/distinct.out
Hunk #1 FAILED at 2477.
1 out of 1 hunk FAILED -- saving rejects to file java/testing/org/apache/derbyTesting/functionTests/master/j9_22/distinc
t.out.rej
patching file java/testing/org/apache/derbyTesting/functionTests/master/j9_22/groupBy.out
Hunk #1 FAILED at 293.
1 out of 1 hunk FAILED -- saving rejects to file java/testing/org/apache/derbyTesting/functionTests/master/j9_22/groupBy
.out.rej

Knut Anders Hatlen added a comment - 02/Sep/05 01:35 AM
Uploaded a new patch with windows line terminators.

Perhaps one of the committers could set svn:eol-style=native on the canon files in the j9_22 directory?

Knut Anders Hatlen added a comment - 02/Sep/05 01:38 AM
Uploaded a new patch with windows line terminators.

Perhaps one of the committers could set svn:eol-style=native on the canon files in the j9_22 directory?

Rick Hillegas added a comment - 02/Sep/05 10:43 AM
Looks great. Derball passes. Ready for a committer to check in.

Knut Anders Hatlen added a comment - 02/Sep/05 04:12 PM
After this last patch is applied, the only difference between the main canons and the j9_22 canons is the eol-style. This means that the committer could just remove the entire java/testing/org/apache/derbyTesting/functionTests/master/j9_22 directory if he or she pleases.

Satheesh Bandaram added a comment - 03/Sep/05 03:14 AM
I have submitted this patch to trunk. I wonder if the change in SelectNode.java may be doing redundant work by getting BaseColumnColumnNode and checking for simpleColumns. Since we have already matched (resultColumns.countNumberOfSimpleColumnReferences() == resultColumns.size()), doesn't that guarantee resultColumns to be only simpleColumns?

Sending java\engine\org\apache\derby\impl\sql\compile\FromBaseTable.java
Sending java\engine\org\apache\derby\impl\sql\compile\ProjectRestrictNode.java
Sending java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java
Sending java\engine\org\apache\derby\impl\sql\compile\ResultSetNode.java
Sending java\engine\org\apache\derby\impl\sql\compile\SelectNode.java
Sending java\testing\org\apache\derbyTesting\functionTests\master\distinct.out
Sending java\testing\org\apache\derbyTesting\functionTests\master\groupBy.out
Sending java\testing\org\apache\derbyTesting\functionTests\master\j9_13\distinct.out
Sending java\testing\org\apache\derbyTesting\functionTests\master\j9_13\groupBy.out
Sending java\testing\org\apache\derbyTesting\functionTests\master\j9_22\distinct.out
Sending java\testing\org\apache\derbyTesting\functionTests\master\j9_22\groupBy.out
Sending java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinct.sql
Transmitting file data ............
Committed revision 267239.

Knut Anders Hatlen added a comment - 05/Sep/05 06:25 PM
It is true that the change in SelectNode is doing redundant work, but not because (resultColumns.countNumberOfSimpleColumnReferences() == resultColumns.size()) guarantees that all result columns are simple columns (if we by "simple columns" mean a column in a base table, no aggregates etc). E.g. in the query 'SELECT a FROM (SELECT AVG(age) AS a FROM names) AS n', resultColumns.countNumberOfSimpleColumnReferences() equals resultColumns.size(), but the result column is not simple. The redundancy is the other way around: If (but not only if) all colums are simple, then (countNumberOfSimpleColumnReferences() == size()) is true.

I can submit a patch which removes this redundant checking. It doesn't seem like ResultColumnList.countNumberOfSimpleColumnReferences() is used anywhere else in the code. If I remove the call to countNumberOfSimpleColumnReferences() and it is not used anywhere else, should I then also remove the definition of the method to make the code cleaner, or should I leave the method in case it would be needed in the future?

Knut Anders Hatlen added a comment - 07/Sep/05 06:17 PM
Attached a clean-up patch against trunk which removes a redundant check pointed out by Satheesh. I have run derbyall successfully.

Knut Anders Hatlen added a comment - 07/Sep/05 06:27 PM
I have attached a patch for 10.1. Because of the svn:eol-style issue in some of the tests, I have included one patch which can be applied on windows and one which can be applied on unix-like systems. The j9_13 and j9_22 canons are modified but not tested.

Daniel John Debrunner added a comment - 08/Sep/05 01:58 AM
Couple of questions on the 10.1 patch:

1) Does this include the cleanup patch posted 07/Sep/05 11:17 AM], or just the committed change?
2) Is it a clean merge, generated from an svn merge command, or did you need to make changes?

Knut Anders Hatlen added a comment - 08/Sep/05 04:01 PM
About the 10.1 patch:

1) Yes, it includes the cleanup patch.
2) No, it is not generated from an svn merge command, but it could be done. I incorrectly assumed that it wouldn't work since applying the trunk patch directly failed. Seems like subversion is smarter than me... ;)

Applying this command will merge the committed changes into 10.1:

     svn merge -r 267238:267239 mytrunkdir my10.1dir

This doesn't include the cleanup patch, but it will fix the bug. The cleanup patch could be applied directly, but it is not important in the 10.1 backport.

There is one difference between my 10.1 patch and the svn merge. One of the j9_22 canons (java/testing/org/apache/derbyTesting/functionTests/master/j9_22/distinct.out) is different. Since I haven't tested the j9_{13,22} files, that canon must be checked by someone else anyway.

Knut Anders Hatlen added a comment - 13/Sep/05 03:37 PM
Bug fixed in revision 267239.

The clean-up patch removing redundant checking and the backport are still not committed.

Knut Anders Hatlen added a comment - 20/Sep/05 04:08 PM
Reopen the bug so the fix can be included in 10.1.2. Since the bug was reported on derby-user before 10.1.1 was released, I think it should be fixed in 10.1.2.

Knut Anders Hatlen added a comment - 20/Sep/05 04:09 PM
Changed fix version.

Knut Anders Hatlen added a comment - 07/Oct/05 04:02 PM
Fixed in trunk (revision 267239) and 10.1 (revision 306964).

Knut Anders Hatlen added a comment - 11/Jan/07 09:46 AM
Reopen to commit cleanup patch.

Knut Anders Hatlen added a comment - 11/Jan/07 09:49 AM
Committed DERBY-504-cleanup.diff to trunk with revision 495171.