Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
Subqueries with aggregates may return wrong results due to incorrect splitting.
Let's consider a table person:
SELECT id, firstName FROM person: [1, firstName1], [2, firstName2], [3, firstName3], [4, firstName4], [5, firstName5], [6, firstName6], [7, firstName7], [8, firstName8], [9, firstName9], [10, firstName10]
The result of query SELECT COUNT(*) FROM person is 10, which is correct.
The result of query SELECT * FROM person WHERE id = 10 is [10, firstName10], which is also correct.
But the result of the query SELECT * FROM person WHERE id = (SELECT COUNT(*) FROM person) is [1, firstName1] which is completely wrong.
The root cause of this behavior is the incorrect query splitting. The latest query is split into these parts:
Map:
SELECT __Z0.ID __C0_0, __Z0.FIRSTNAME __C0_1 FROM PUBLIC.PERSON __Z0 WHERE __Z0.ID = (SELECT COUNT(*) FROM PUBLIC.PERSON __Z1)
Reduce:
SELECT __C0_0 ID, __C0_1 FIRSTNAME FROM PUBLIC.__T0
As we can see, aggregate COUNT(*) is calculated locally on each map node instead of calculating a single global aggregate and then using it in predicate.
Reproducer is attached.
Attachments
Attachments
Issue Links
- mentioned in
-
Page Loading...