[SPARK-27561] Support "lateral column alias references" to allow column aliases to be used within SELECT clauses - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 3.4.0
Component/s: SQL
Labels: None

Description

Amazon Redshift has a feature called "lateral column alias references": https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/. Quoting from that blog post:

The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;

There's more information about this feature on https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_list.html:

The benefit of the lateral alias reference is you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it just inlines the previously defined aliases. If there is a column with the same name defined in the FROM clause as the previously aliased expression, the column in the FROM clause takes priority. For example, in the above query if there is a column named 'probability' in table raw_data, the 'probability' in the second expression in the target list will refer to that column instead of the alias name 'probability'.
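The resolution rule quoted above (inline earlier aliases, but let a FROM-clause column of the same name win) can be modeled with a small Python sketch. This is only an illustration of the rule as Redshift's documentation describes it, not Redshift's or Spark's actual implementation. The function name `inline_lateral_aliases` and the string-based representation of expressions are hypothetical, and identifier matching here is case-sensitive and ignores quoting, which real SQL engines handle differently.

```python
import re

# Matches bare identifiers such as column names, aliases, and function names.
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def inline_lateral_aliases(select_list, from_columns):
    """Inline earlier SELECT aliases into later expressions.

    select_list: list of (expression, alias) pairs in SELECT order.
    from_columns: names of columns provided by the FROM clause.
    Hypothetical helper for illustration only.
    """
    from_columns = set(from_columns)
    earlier = {}   # alias -> its already-inlined expression
    resolved = []
    for expr, alias in select_list:
        def substitute(match):
            name = match.group(0)
            if name in from_columns:
                # A FROM-clause column takes priority over an alias.
                return name
            if name in earlier:
                # Otherwise inline the previously defined alias.
                return "(" + earlier[name] + ")"
            # Anything else (e.g. a function name) is left untouched.
            return name

        inlined = IDENT.sub(substitute, expr)
        resolved.append((inlined, alias))
        earlier[alias] = inlined
    return resolved


select_list = [
    ("clicks / impressions", "probability"),
    ("round(100 * probability, 1)", "percentage"),
]

# raw_data has no 'probability' column: the alias is inlined.
print(inline_lateral_aliases(select_list, ["clicks", "impressions"]))

# raw_data has a 'probability' column: the column wins, nothing is inlined.
print(inline_lateral_aliases(select_list, ["clicks", "impressions", "probability"]))
```

In the first call the second expression becomes `round(100 * (clicks / impressions), 1)`; in the second call it stays `round(100 * probability, 1)` because the name resolves to the table column.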

It would be nice if Spark supported this syntax. I don't think that this is standard SQL, so it might be a good idea to research whether other SQL databases support similar syntax (and whether they implement the same column resolution strategy as Redshift).

We should also consider whether this needs to be feature-flagged as part of a specific SQL compatibility mode / dialect.

One possibly-related existing ticket: ~~SPARK-9338~~, which discusses the use of SELECT aliases in GROUP BY expressions.

/cc hvanhovell

Issue Links

links to

[Github] Pull Request #38776 (anchovYu)

[Github] Pull Request #39040 (anchovYu)

[Github] Pull Request #39054 (gengliangwang)

Sub-Tasks

1.	Support lateral column alias in Project code path	Resolved	Xinyi Yu
2.	Support lateral column alias in Aggregate code path	Resolved	Xinyi Yu
3.	Support lateral column alias in queries with Window	Resolved	Xinyi Yu
4.	Support explicit lateral virtual table name	Open	Unassigned
5.	Move most tests to .sql files	Open	Unassigned
6.	Support LCA in grouping expressions	Reopened	Unassigned
7.	Ease restriction of LCA resolution regarding queries with having	Resolved	Xinyi Yu
8.	Preempt low priority LCA internal error until the end of check analysis	Resolved	Xinyi Yu
9.	Block LCA with Generate	Resolved	Yuming Wang
10.	Support order-insensitive lateral column alias	Open	Unassigned

People

Assignee: Xinyi Yu

Reporter: Josh Rosen

Votes: 0

Watchers: 8

Dates

Created: 24/Apr/19 23:06

Updated: 13/Dec/22 21:18

Resolved: 13/Dec/22 16:14