Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-40002

Limit improperly pushed down through window using ntile function

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.3.0, 3.2.2
    • 3.3.1, 3.2.3, 3.4.0
    • SQL

    Description

      Limit is pushed down through a window using the ntile function, which causes results that differ from Hive 2.3.9, and Prestodb 0.268, and older versions of Spark (e.g., 3.1.3).

      Assume this data:

      create table t1 stored as parquet as
      select *
      from range(101);
      

      Also assume this query:

      select id, ntile(10) over (order by id) as nt
      from t1
      limit 10;
      

      Spark 3.2.2, Spark 3.3.0, and master produce the following:

      +---+---+
      |id |nt |
      +---+---+
      |0  |1  |
      |1  |2  |
      |2  |3  |
      |3  |4  |
      |4  |5  |
      |5  |6  |
      |6  |7  |
      |7  |8  |
      |8  |9  |
      |9  |10 |
      +---+---+
      

      However, Spark 3.1.3, Hive 2.3.9, and Prestodb 0.268 produce the following:

      +---+---+
      |id |nt |
      +---+---+
      |0  |1  |
      |1  |1  |
      |2  |1  |
      |3  |1  |
      |4  |1  |
      |5  |1  |
      |6  |1  |
      |7  |1  |
      |8  |1  |
      |9  |1  |
      +---+---+
      

      Attachments

        Issue Links

          Activity

            People

              bersprockets Bruce Robbins
              bersprockets Bruce Robbins
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: