Description
The following query throws a StringIndexOutOfBoundsException:
with v1 as (
  select * from values (1, 2) as (c1, c2)
),
v2 as (
  select * from values (2, 3) as (c1, c2)
)
select v1.c1, v1.c2, v2.c1, v2.c2, b
from v1
full outer join v2
using (c1);
The query should fail anyway, since b refers to a non-existent column. But it should fail with a helpful error message, not with a StringIndexOutOfBoundsException.
The issue seems to be in StringUtils#orderSuggestedIdentifiersBySimilarity. This method assumes that, when the candidate attributes have a mix of prefixes, none of them has an empty prefix. In this case one does: the c1 produced by the coalesce has no prefix, since it is not associated with any relation or subquery:
+- 'Project [c1#5, c2#6, c1#7, c2#8, 'b]
   +- Project [coalesce(c1#5, c1#7) AS c1#9, c2#6, c2#8] <== c1#9 has no prefix, unlike c2#6 (v1.c2) or c2#8 (v2.c2)
      +- Join FullOuter, (c1#5 = c1#7)
         :- SubqueryAlias v1
         :  +- CTERelationRef 0, true, [c1#5, c2#6]
         +- SubqueryAlias v2
            +- CTERelationRef 1, true, [c1#7, c2#8]
Because of this, orderSuggestedIdentifiersBySimilarity returns a sorted list of suggestions like this:
ArrayBuffer(.c1, v1.c2, v2.c2)
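The sketch below shows how a suggestion such as .c1 can arise when a prefix and a column name are joined unconditionally. It is not Spark's code: the class SuggestionSketch and both helper methods are hypothetical, and the guarded variant is only one possible way to avoid the leading separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Spark.
final class SuggestionSketch {
    // Joins prefix and name unconditionally, i.e. it assumes the prefix is never empty.
    static String renderNaive(String prefix, String name) {
        return prefix + "." + name;
    }

    // Emits the separator only when there is a prefix to separate from the name.
    static String renderGuarded(String prefix, String name) {
        return prefix.isEmpty() ? name : prefix + "." + name;
    }

    public static void main(String[] args) {
        // Candidates from the plan above: c1 (no prefix), v1.c2, v2.c2.
        String[][] candidates = { {"", "c1"}, {"v1", "c2"}, {"v2", "c2"} };
        List<String> naive = new ArrayList<>();
        List<String> guarded = new ArrayList<>();
        for (String[] c : candidates) {
            naive.add(renderNaive(c[0], c[1]));
            guarded.add(renderGuarded(c[0], c[1]));
        }
        System.out.println(naive);   // [.c1, v1.c2, v2.c2]
        System.out.println(guarded); // [c1, v1.c2, v2.c2]
    }
}
```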
UnresolvedAttribute.parseAttributeName then fails on .c1, because it does not handle an attribute name that starts with a namespace separator ('.').
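As a rough illustration of why a leading '.' could produce a StringIndexOutOfBoundsException rather than a syntax error, here is a simplified stand-in for a dotted-name parser. It is an assumption about the failure mode, not Spark's implementation: the class AttributeNameSketch and its parse method are hypothetical, and backtick quoting is left out. The parser looks at the previous character whenever it sees a separator, without first checking that a previous character exists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a dotted-name parser; not Spark's implementation.
final class AttributeNameSketch {
    static List<String> parse(String name) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            if (ch == '.') {
                // Rejects ".." and a trailing '.', but for i == 0 the lookup
                // charAt(-1) throws StringIndexOutOfBoundsException first.
                if (name.charAt(i - 1) == '.' || i == name.length() - 1) {
                    throw new IllegalArgumentException("syntax error in attribute name: " + name);
                }
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("v1.c2")); // [v1, c2]
        try {
            parse(".c1");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

Under this assumption, either checking i > 0 before the lookup or never emitting a suggestion with an empty prefix would avoid the exception.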