[SPARK-38286] Union's maxRows and maxRowsPerPartition may overflow - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.3, 3.1.2, 3.2.1, 3.3.0
Fix Version/s: 3.1.3, 3.0.4, 3.3.0, 3.2.2
Component/s: SQL
Labels: None

Description

scala> val df1 = spark.range(0, Long.MaxValue, 1, 1)
df1: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> val df2 = spark.range(0, 100, 1, 10)
df2: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> val union = df1.union(df2)
union: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> union.queryExecution.logical.maxRowsPerPartition
res19: Option[Long] = Some(-9223372036854775799)

scala> union.queryExecution.logical.maxRows
res20: Option[Long] = Some(-9223372036854775709)
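The negative values are consistent with Union adding its children's bounds as plain Long values, which wrap around silently on overflow. df1 contributes Long.MaxValue to both bounds (Long.MaxValue rows in a single partition). df2 contributes 100 rows in total and 10 rows per partition. So Long.MaxValue + 100 wraps to Long.MinValue + 99 = -9223372036854775709, and Long.MaxValue + 10 wraps to Long.MinValue + 9 = -9223372036854775799. The standalone Scala sketch below has no Spark dependency. It reproduces that arithmetic and contrasts it with an overflow-safe sum that gives up (returns None, meaning "unknown bound") once the total no longer fits in a Long. The object name UnionMaxRowsSketch, its methods, and the choice to fall back to None are illustrative assumptions, not necessarily what PR #35609 does.

```scala
// Hypothetical sketch, not Spark's code: each Union child's row bound is
// modelled as Option[Long], where None means "no known bound".
object UnionMaxRowsSketch {

  // Naive summation: Long addition wraps around on overflow, which yields
  // the negative bounds shown in the report.
  def naiveSum(bounds: Seq[Option[Long]]): Option[Long] =
    if (bounds.forall(_.isDefined)) Some(bounds.flatten.sum) else None

  // Overflow-safe summation: accumulate in BigInt and return None as soon
  // as a child has no bound or the running total exceeds the Long range.
  def safeSum(bounds: Seq[Option[Long]]): Option[Long] = {
    val init: Option[BigInt] = Some(BigInt(0))
    bounds.foldLeft(init) { (acc, bound) =>
      for {
        total <- acc
        n <- bound
        next = total + n
        if next.isValidLong
      } yield next
    }.map(_.toLong)
  }

  def main(args: Array[String]): Unit = {
    // Bounds from the report: df1 = Long.MaxValue rows in 1 partition,
    // df2 = 100 rows in 10 partitions (10 rows per partition).
    val maxRows = Seq(Some(Long.MaxValue), Some(100L))
    val maxRowsPerPartition = Seq(Some(Long.MaxValue), Some(10L))
    println(naiveSum(maxRows))             // Some(-9223372036854775709)
    println(naiveSum(maxRowsPerPartition)) // Some(-9223372036854775799)
    println(safeSum(maxRows))              // None
    println(safeSum(maxRowsPerPartition))  // None
  }
}
```

Returning None rather than clamping to Long.MaxValue keeps the bound conservative: a caller that sees None treats the row count as unknown instead of relying on a wrapped or saturated number.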

Issue Links

links to

[Github] Pull Request #35609 (zhengruifeng)

People

Assignee: Ruifeng Zheng
Reporter: Ruifeng Zheng
Votes: 0
Watchers: 3

Dates

Created: 22/Feb/22 11:41
Updated: 24/Feb/22 02:55
Resolved: 24/Feb/22 02:55