Resolution: Won't Fix
Affects Version/s: 0.6.0
Fix Version/s: None
Component/s: Query Processor
Consider this query:
SELECT a.num FROM (
SELECT a.num AS num, b.num AS num2
FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
) a
WHERE a.num2 IS NULL;
In this case, the table alias 'a' is ambiguous: it could refer to the outer table (i.e., the subquery result) or to the inner table (foo).
In the above case, Hive silently resolves the outer reference to 'a' as the inner reference. The result, then, is akin to:
SELECT foo.num FROM foo WHERE bar.num IS NULL;
This is bad.
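For comparison, here is a minimal sketch of what the query presumably intends once the subquery gets its own alias. The alias 't' and the helper name rows_missing_from_bar are hypothetical, chosen only for illustration. SQLite (via Python's sqlite3 module) is used as a stand-in engine so the example can run anywhere. It says nothing about how Hive itself resolves the aliases.

```python
import sqlite3


def rows_missing_from_bar(foo_nums, bar_nums):
    """Return the foo.num values that have no matching row in bar.

    Runs on an in-memory SQLite database as a stand-in for Hive.
    The subquery alias 't' is a hypothetical name that does not
    collide with the inner aliases 'a' and 'b'.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (num INTEGER)")
    conn.execute("CREATE TABLE bar (num INTEGER)")
    conn.executemany("INSERT INTO foo VALUES (?)", [(n,) for n in foo_nums])
    conn.executemany("INSERT INTO bar VALUES (?)", [(n,) for n in bar_nums])

    # Same shape as the query in this report, but every alias is unique,
    # so the outer WHERE can only refer to the subquery's num2 column.
    query = """
        SELECT t.num FROM (
            SELECT a.num AS num, b.num AS num2
            FROM foo a LEFT OUTER JOIN bar b ON a.num = b.num
        ) t
        WHERE t.num2 IS NULL
        ORDER BY t.num
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result
```

With distinct aliases, the outer filter keeps only the rows of foo that found no partner in bar, which is the usual reason for writing a LEFT OUTER JOIN followed by an IS NULL test.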
The bigger problem, however, is that Hive even lets people use the same table alias at multiple points in the query. We should simply throw an exception during the parse stage if there is any ambiguity in which table is which, just like we do if the column names are ambiguous.
Or, if for some reason we need people to be able to use 'a' to refer to multiple tables or subqueries, it would be excellent if the exact parsing structure were made clear and added to the wiki. In that case, I will file a separate bug JIRA to complain about how it should be different.
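To make the first proposal concrete, the following is a rough sketch of the kind of check I have in mind: reject a query if any table alias is declared more than once anywhere in it. This is not Hive's parser. QueryBlock, AmbiguousAliasError, and check_unique_aliases are hypothetical names, and the rule is deliberately stricter than ordinary SQL scoping.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


class AmbiguousAliasError(ValueError):
    """Raised when a table alias is declared more than once in a query."""


@dataclass
class QueryBlock:
    """Hypothetical stand-in for one SELECT block of a parsed query."""
    aliases: List[str]  # aliases declared in this block's FROM clause
    subqueries: List["QueryBlock"] = field(default_factory=list)


def collect_aliases(block):
    """Gather the aliases declared in a block and all nested subqueries."""
    names = list(block.aliases)
    for sub in block.subqueries:
        names.extend(collect_aliases(sub))
    return names


def check_unique_aliases(root):
    """Raise AmbiguousAliasError if any alias appears more than once.

    Aliases are compared case-insensitively (an assumption of this sketch).
    """
    counts = Counter(name.lower() for name in collect_aliases(root))
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise AmbiguousAliasError(
            "table alias used more than once: " + ", ".join(duplicates)
        )
```

For the query in this report, the outer block declares 'a' and the inner block declares 'a' and 'b', so check_unique_aliases(QueryBlock(["a"], [QueryBlock(["a", "b"])])) would raise, while the same query with a distinct outer alias would pass.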