Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-22662

Failed to prune columns after rewriting predicate subquery

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.3.0
    • 2.3.0
    • SQL
    • None

    Description

      As a simple example:

      spark-sql> create table base (a int, b int) using parquet;
      Time taken: 0.066 seconds
      spark-sql> create table relInSubq ( x int, y int, z int) using parquet;
      Time taken: 0.042 seconds
      spark-sql> explain select a from base where a in (select x from relInSubq);
      == Physical Plan ==
      *Project [a#83]
      +- *BroadcastHashJoin [a#83], [x#85], LeftSemi, BuildRight
         :- *FileScan parquet default.base[a#83,b#84] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/base], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:int,b:int>
         +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
            +- *Project [x#85]
               +- *FileScan parquet default.relinsubq[x#85] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/relinsubq], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<x:int>
      

      We only need column `a` in table `base`, but all columns (`a`, `b`) are fetched.

      Attachments

        Issue Links

          Activity

            People

              ZenWzh Zhenhua Wang
              ZenWzh Zhenhua Wang
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: