Spark / SPARK-22662

Failed to prune columns after rewriting predicate subquery

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels:
      None

      Description

      As a simple example:

      spark-sql> create table base (a int, b int) using parquet;
      Time taken: 0.066 seconds
      spark-sql> create table relInSubq ( x int, y int, z int) using parquet;
      Time taken: 0.042 seconds
      spark-sql> explain select a from base where a in (select x from relInSubq);
      == Physical Plan ==
      *Project [a#83]
      +- *BroadcastHashJoin [a#83], [x#85], LeftSemi, BuildRight
         :- *FileScan parquet default.base[a#83,b#84] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/base], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:int,b:int>
         +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
            +- *Project [x#85]
               +- *FileScan parquet default.relinsubq[x#85] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/relinsubq], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<x:int>
      

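      Only column `a` of table `base` is needed, but the scan of `base` fetches all columns (`a`, `b`), as its ReadSchema of struct<a:int,b:int> shows. Column pruning is not applied below the LeftSemi join that the predicate subquery is rewritten into.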
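
      The expected behavior can be illustrated with a small toy model of column pruning through a left semi join. This is only a sketch in plain Python, not Spark's Catalyst code: the `Scan`, `Project`, `LeftSemiJoin`, and `prune` names are hypothetical and exist only for this illustration. It assumes a semi join outputs left-side columns only, so the left child needs the columns required above it plus its join key, and the right child needs only its join key.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str
    columns: Tuple[str, ...]  # columns the scan reads (its "ReadSchema")


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class LeftSemiJoin:
    left: "Plan"
    right: "Plan"
    left_key: str
    right_key: str


Plan = Union[Scan, Project, LeftSemiJoin]


def prune(plan: Plan, required: Optional[Set[str]] = None) -> Plan:
    """Return a copy of `plan` whose scans read only the columns needed above them.

    `required=None` means "no restriction known yet, keep every column".
    """
    if isinstance(plan, Scan):
        if required is None:
            return plan
        return Scan(plan.table, tuple(c for c in plan.columns if c in required))
    if isinstance(plan, Project):
        # A projection needs exactly the columns it lists from its child.
        return Project(plan.columns, prune(plan.child, set(plan.columns)))
    if isinstance(plan, LeftSemiJoin):
        # Left side: whatever is required above, plus the join key.
        left_required = None if required is None else set(required) | {plan.left_key}
        # Right side: only the join key is ever consulted.
        return LeftSemiJoin(
            prune(plan.left, left_required),
            prune(plan.right, {plan.right_key}),
            plan.left_key,
            plan.right_key,
        )
    raise TypeError(f"unsupported plan node: {plan!r}")


# Toy version of: select a from base where a in (select x from relInSubq)
plan = Project(
    ("a",),
    LeftSemiJoin(
        left=Scan("base", ("a", "b")),
        right=Project(("x",), Scan("relInSubq", ("x", "y", "z"))),
        left_key="a",
        right_key="x",
    ),
)
pruned = prune(plan)
print(pruned.child.left)
```

      In this model the scan of `base` ends up reading only `a`, which is the result the physical plan above should have shown for the real query.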

              People

              • Assignee: ZenWzh Zhenhua Wang
              • Reporter: ZenWzh Zhenhua Wang
              • Votes: 0
              • Watchers: 3
