[SPARK-26084] AggregateExpression.references fails on unresolved expression trees - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.1
Fix Version/s: 2.3.3, 2.4.1, 3.0.0
Component/s: SQL
Labels:
- aggregate
- regression
- sql

Description

SPARK-18394 introduced a stable ordering in AttributeSet.toSeq based on expression IDs (PR-18959) without noticing that AggregateExpression.references used AttributeSet.toSeq as a shortcut. As a result, AggregateExpression.references fails for unresolved aggregate functions: the sort calls exprId on every attribute, and unresolved attributes do not have one. For example, calling references on an unresolved Sum:

org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression(
  org.apache.spark.sql.catalyst.expressions.aggregate.Sum(('x + 'y).expr),
  mode = org.apache.spark.sql.catalyst.expressions.aggregate.Complete,
  isDistinct = false
).references

fails with

org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to exprId on unresolved object, tree: 'y
	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:104)
	at org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
	at org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
	at scala.math.Ordering$$anon$5.compare(Ordering.scala:122)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:220)
	at java.util.Arrays.sort(Arrays.java:1438)
	at scala.collection.SeqLike$class.sorted(SeqLike.scala:648)
	at scala.collection.AbstractSeq.sorted(Seq.scala:41)
	at scala.collection.SeqLike$class.sortBy(SeqLike.scala:623)
	at scala.collection.AbstractSeq.sortBy(Seq.scala:41)
	at org.apache.spark.sql.catalyst.expressions.AttributeSet.toSeq(AttributeSet.scala:128)
	at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.references(interfaces.scala:201)
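
The trace shows the exception surfacing inside the comparator that sortBy builds, not in references itself. A minimal sketch of that mechanism follows, using a toy model rather than Spark's classes: Attr, Resolved, Unresolved and AttrSet are hypothetical stand-ins, and the assumption is only that the sort key is the expression ID.

```scala
// Toy model, NOT Spark's real classes: the names only echo Catalyst for readability.
sealed trait Attr { def name: String; def exprId: Long }

final case class Resolved(name: String, exprId: Long) extends Attr

final case class Unresolved(name: String) extends Attr {
  // Assumption: an unresolved attribute has no expression ID yet, so asking for one throws.
  def exprId: Long =
    throw new UnsupportedOperationException(s"Invalid call to exprId on unresolved object: $name")
}

final class AttrSet(attrs: Set[Attr]) {
  // Stable ordering keyed on the expression ID: the comparator evaluates exprId on the elements.
  def toSeq: Seq[Attr] = attrs.toSeq.sortBy(_.exprId)
}

object ToSeqFailureDemo {
  def main(args: Array[String]): Unit = {
    // Resolved attributes sort without trouble.
    val resolved = new AttrSet(Set(Resolved("y", 2L), Resolved("x", 1L)))
    println(resolved.toSeq.map(_.name))

    // With two unresolved attributes the comparator must call exprId, which throws.
    val unresolved = new AttrSet(Set(Unresolved("x"), Unresolved("y")))
    println(scala.util.Try(unresolved.toSeq).isFailure)
  }
}
```

In this model any caller that only needs set membership, but goes through toSeq, inherits the failure even though it never looks at the order.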

The solution is to avoid calling toSeq, since ordering does not matter for references, and to simplify (and speed up) the implementation to something like:

mode match {
  case Partial | Complete => aggregateFunction.references
  case PartialMerge | Final => AttributeSet(aggregateFunction.aggBufferAttributes)
}
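
To see why the mode match above sidesteps the problem, here is a small self-contained sketch of the same shape. It is a toy model under stated assumptions, not the code from the pull request: Attr, AggFunction and AggExpression are hypothetical stand-ins, and plain Scala Set plays the role of AttributeSet.

```scala
// Toy model of the proposed shape, NOT Spark's real classes.
sealed trait Mode
case object Partial extends Mode
case object PartialMerge extends Mode
case object Final extends Mode
case object Complete extends Mode

// Hypothetical attribute: id is None while the attribute is still unresolved.
final case class Attr(name: String, id: Option[Long]) {
  // Asking an unresolved attribute for its ID throws, as in the reported failure.
  def exprId: Long =
    id.getOrElse(throw new IllegalStateException(s"unresolved attribute: $name"))
}

// Hypothetical stand-in for an aggregate function: its inputs and its buffer attributes.
final case class AggFunction(inputs: Set[Attr], bufferAttrs: Seq[Attr])

final case class AggExpression(fn: AggFunction, mode: Mode) {
  // Builds the set directly. Nothing is sorted, so exprId is never called.
  def references: Set[Attr] = mode match {
    case Partial | Complete   => fn.inputs
    case PartialMerge | Final => fn.bufferAttrs.toSet
  }
}

object ReferencesSketch {
  def main(args: Array[String]): Unit = {
    val x = Attr("x", None) // unresolved
    val y = Attr("y", None) // unresolved
    val sum = AggFunction(inputs = Set(x, y), bufferAttrs = Seq(Attr("sum", Some(1L))))

    // Works on unresolved inputs because no ordering is ever computed.
    println(AggExpression(sum, Complete).references.map(_.name))
    println(AggExpression(sum, Final).references.map(_.name))
  }
}
```

The design point is that references is consumed as a set, so producing it without an intermediate ordered sequence removes both the failure and the cost of the sort.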

Issue Links

links to

[Github] Pull Request #23075 (ssimeonov)

People

Assignee: Simeon Simeonov

Reporter: Simeon Simeonov

Votes: 4

Watchers: 3

Dates

Created: 15/Nov/18 22:56

Updated: 20/Nov/18 20:59

Resolved: 20/Nov/18 20:59