Details
-
New Feature
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
Impala 2.1
-
None
Description
In order to support a dplyr source for Impala (see https://github.com/piersharding/dplyrimpaladb and https://github.com/hadley/dplyr/issues/383#issuecomment-46007851), we should add support for selecting a non-qualified column ref when there's a equijoin with a USING clause on that column.
For example, the following query fails today:
select `hofid`, `playerid` FROM (SELECT `playerid`, `hofid`, FROM `Master`) as t1 INNER JOIN (SELECT `hofid` , `votedby` FROM `HallOfFame` WHERE `inducted` = 'Y') as t2 USING (`hofid`) LIMIT 10; ERROR: AnalysisException: Unqualified column reference 'hofid' is ambiguous
At least PostgreSQL do support this. We'd have to pick one of the two columns to return. If there's a cast involved, we need to decide how to pick one. E.g. if hofid is an INT on one table and a BIGINT on the other. In a simple test it looks to me like PostgreSQL picked the larger type. We should also find out of this is actually part of the SQL spec or if PostgreSQL (and other DBs, I haven't checked) just happen to support this.