[SPARK-46779] Grouping by subquery with a cached relation can fail - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.2, 3.5.0, 4.0.0
Fix Version/s: 4.0.0, 3.5.1, 3.4.3
Component/s: SQL
Labels: pull-request-available

Description

Example:

create or replace temp view data(c1, c2) as values
(1, 2),
(1, 3),
(3, 7),
(4, 5);

cache table data;

select c1, (select count(*) from data d1 where d1.c1 = d2.c1), count(c2) from data d2 group by all;

It fails with the following error:

[INTERNAL_ERROR] Couldn't find count(1)#163L in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000
org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find count(1)#163L in [c1#78,_groupingexpression#149L,count(1)#82L] SQLSTATE: XX000

If you don't cache the view, the query succeeds.

Note: in 3.4.2 and 3.5.0, the issue occurs only with cached tables, not cached views. I think that's because cached views were not being properly deduplicated in those versions.
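
For reference, the rows the example query should return when it succeeds can be worked out independently of Spark. The following is a minimal Python sketch, not part of the original report. It assumes standard SQL semantics, where `group by all` groups by the non-aggregate select items (here `c1` and the scalar subquery). The names `rows`, `c1_counts`, `groups`, and `expected` are illustrative only.

```python
from collections import Counter

# Rows of the temp view data(c1, c2) from the example above.
rows = [(1, 2), (1, 3), (3, 7), (4, 5)]

# Correlated scalar subquery: for each outer row d2, count the rows d1
# in data where d1.c1 = d2.c1.
c1_counts = Counter(c1 for c1, _ in rows)

# Group by (c1, subquery value) and compute count(c2) per group.
# The sample data has no NULLs in c2, so every row contributes 1.
groups = Counter()
for c1, c2 in rows:
    groups[(c1, c1_counts[c1])] += 1

# One output row per group: (c1, subquery count, count(c2)).
expected = sorted((c1, n, cnt) for (c1, n), cnt in groups.items())
print(expected)
```

Comparing this against the output of the uncached run is one way to confirm that a fixed build returns the same rows for the cached case.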

Issue Links

links to

GitHub Pull Request #44806

People

Assignee: Bruce Robbins

Reporter: Bruce Robbins

Dates

Created: 19/Jan/24 21:07

Updated: 22/Jan/24 19:19

Resolved: 22/Jan/24 19:19