Description
A nondeterministic expression like monotonically_increasing_id prevents column pruning.
spark.range(10).selectExpr("id as key", "id * 2 as value").
  write.format("parquet").save("/tmp/source")
spark.range(10).selectExpr("id as key", "id * 3 as s1", "id * 5 as s2").
  write.format("parquet").save("/tmp/target")

val sourceDF = spark.read.parquet("/tmp/source")
val targetDF = spark.read.parquet("/tmp/target").
  withColumn("row_id", monotonically_increasing_id())

sourceDF.join(targetDF, "key").select("key", "row_id").explain()
Spark reads all columns from targetDF, even though only the `key` column is actually needed:
scala> sourceDF.join(targetDF, "key").select("key", "row_id").explain()
== Physical Plan ==
*(2) Project [key#78L, row_id#88L]
+- *(2) BroadcastHashJoin [key#78L], [key#82L], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
   :  +- *(1) Project [key#78L]
   :     +- *(1) Filter isnotnull(key#78L)
   :        +- *(1) FileScan parquet [key#78L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/source], PartitionFilters: [], PushedFilters: [IsNotNull(key)], ReadSchema: struct<key:bigint>
   +- *(2) Filter isnotnull(key#82L)
      +- *(2) Project [key#82L, monotonically_increasing_id() AS row_id#88L]
         +- *(2) FileScan parquet [key#82L,s1#83L,s2#84L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/target], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<key:bigint,s1:bigint,s2:bigint>
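Note that the scan on /tmp/target has ReadSchema struct<key:bigint,s1:bigint,s2:bigint>, so s1 and s2 are read from Parquet and then dropped. As an illustrative workaround (not a fix for the optimizer rule itself), pruning manually before adding the nondeterministic column yields the expected narrow scan; targetKeyOnlyDF below is a name chosen for this sketch:

// Workaround sketch: select only the needed column before calling
// monotonically_increasing_id(), so the FileScan reads just `key`.
val targetKeyOnlyDF = spark.read.parquet("/tmp/target").
  select("key").
  withColumn("row_id", monotonically_increasing_id())

sourceDF.join(targetKeyOnlyDF, "key").select("key", "row_id").explain()
// The FileScan on /tmp/target is then expected to show ReadSchema: struct<key:bigint>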