Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27956

Allow subqueries as partition filter

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 2.3.0
    • None
    • SQL
    • None

    Description

      Subqueries are not pushed down as partition filters. See following example

       

      create table user_mayerjoh.tab (c1 string)
      partitioned by (c2 string)
      stored as parquet;
      

       

       

      explain select * from user_mayerjoh.tab where c2 < 1;

       

      == Physical Plan ==

      (1) FileScan parquet user_mayerjoh.tabc1#22,c2#23 Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, *PartitionFilters: isnotnull(c2#23), (cast(c2#23 as int) < 1), PushedFilters: [], ReadSchema: struct<c1:string>

       

       

      explain select * from user_mayerjoh.tab where c2 < (select 1);

       

      == Physical Plan ==

       

      +- (1) FileScan parquet user_mayerjoh.tabc1#30,c2#31 Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, *PartitionFilters: isnotnull(c2#31), PushedFilters: [], ReadSchema: struct<c1:string>

       

      Is it possible to first execute the subquery and use the result as partition filter?

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              joha0123 Johannes Mayer
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: