Spark / SPARK-32811

Replace IN predicate of continuous range with boundary checks

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      This expression 

      select a from t where a in (1, 2, 3, 3, 4)

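      can be translated to the following, provided a is an integer column and the distinct values in the list form a range with no gaps: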

      select a from t where a >= 1 and a <= 4 

      This would speed up Parquet row group filtering, which currently evaluates a nested chain of equality checks such as or(or(or(eq(x, 1), eq(x, 2)), eq(x, 3)), eq(x, 4)), and would make the query more compact.
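
      The proposed rewrite can be sketched outside Spark as follows. This is only an illustration of the idea, not the Catalyst rule itself: the function names in_list_to_range and rewrite_predicate are hypothetical, and the sketch assumes the IN list holds integer literals only (no NULLs, no floating-point or string values).

```python
from typing import Optional, Sequence, Tuple


def in_list_to_range(values: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (low, high) if the distinct values form a gap-free integer range, else None."""
    # Only integer literals qualify; bool is a subclass of int, so exclude it explicitly.
    if not values or any(isinstance(v, bool) or not isinstance(v, int) for v in values):
        return None
    distinct = set(values)  # duplicates such as the repeated 3 are irrelevant
    low, high = min(distinct), max(distinct)
    # A set of integers has no gaps exactly when its size equals the span it covers.
    if high - low + 1 != len(distinct):
        return None
    return low, high


def rewrite_predicate(column: str, values: Sequence[int]) -> str:
    """Rewrite `column IN (values)` as boundary checks when that is safe."""
    bounds = in_list_to_range(values)
    if bounds is None:
        # Not a contiguous integer range: keep the original IN predicate.
        return f"{column} IN ({', '.join(str(v) for v in values)})"
    low, high = bounds
    if low == high:
        return f"{column} = {low}"
    return f"{column} >= {low} AND {column} <= {high}"


print(rewrite_predicate("a", [1, 2, 3, 3, 4]))  # a >= 1 AND a <= 4
print(rewrite_predicate("a", [1, 2, 4]))        # a IN (1, 2, 4)
```

      The contiguity check matters: a list such as (1, 2, 4) must stay an IN predicate, because a >= 1 and a <= 4 would also match 3.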

       

          People

            Assignee: Unassigned
            Reporter: Vu Ho (vho)
            Votes: 0
            Watchers: 2
