OpenJPA generates wrong SQL for a query with nested subqueries, causing the result to be wrong. My attempts to rewrite the query helped to get the correct results, even though the generated alternate queries seem not to be perfect. (There might a way to write them better.)
The following query:
select u from User u where u.email=:firstEmail and u<>:firstUser and exists (select o1 from Order o1 where o1.user=u and exists(select o2 from Order o2 where o2.user=:firstUser and o1.productCode=o2.productCode))
generates the following SQL:
SELECT t0.id, t0.email, t0.name FROM users t0 WHERE (t0.email = ? AND t0.id <> ? AND EXISTS (SELECT t1.id FROM orders t1, orders t3 WHERE (t1.user_id = t0.id AND EXISTS (SELECT t2.id FROM orders t2 WHERE (t2.user_id = ? AND t3.product_code = t2.product_code)))))
Notice the superfluous join to “orders t3” which causes the whole query result to be wrong.
When the query is modified to replace the second nested subquery with a join to
select u from User u where u.email=:firstEmail and u<>:firstUser and exists (select o1 from Order o1, Order o2 where o1.user=u and o2.user=:firstUser and o1.productCode=o2.productCode)
the generated SQL is:
SELECT t0.id, t0.email, t0.name FROM users t0 WHERE (t0.email = ? AND t0.id <> ? AND EXISTS (SELECT t3.id FROM users t1 CROSS JOIN orders t2, orders t3 WHERE (t3.user_id = t0.id AND t2.user_id = ? AND t3.product_code = t2.product_code)))
which contains a strange construction of doing a cross join to „users t1“ (which is completely unused later). Still, this seems to return the correct result, at least. (In my short testing; to be honest, I do not understand what exactly is this construction doing, and if its semantics is better, equal, or worse than not doing the cross join.) However, the unnecessary join does seem to affect performance.
Finally, when I rewrote the query from the other side, to:
select o from Order o join o.user u where u.email=:firstEmail and u<>:firstUser and o.productCode in (select o1.productCode from Order o1 where o1.user=:firstUser)
The resulting SQL is:
SELECT t0.id, t0.product_code, t3.id, t3.email, t3.name FROM orders t0 INNER JOIN users t1 ON t0.user_id = t1.id LEFT OUTER JOIN users t3 ON t0.user_id = t3.id WHERE (t1.email = ? AND t0.user_id <> ? AND t0.product_code IN (SELECT t2.product_code FROM orders t2 WHERE (t2.user_id = ?)))
Which generates correct results, even though it still contains a superfluous outer join to „users t3“ (used only in the result projection) which is identical to the previous inner join, which again affects performance.
A complete reproducible project is available at my Github.