Spark / SPARK-30335

Clarify behavior of FIRST and LAST without OVER clause

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Documentation, SQL
    • Labels: None

    Description

      Unlike many databases, Spark SQL allows FIRST and LAST to be used in non-analytic contexts, that is, as plain aggregates without an OVER clause.

      At the moment, the descriptions of FIRST

      > first(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

      and LAST

      > last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

      suggest that their behavior is deterministic, and many users assume that they return specific values, for example, for a query like

      SELECT first(foo)
      FROM (
          SELECT * FROM table ORDER BY bar
      )

      That, however, doesn't seem to be the case: the ORDER BY in the subquery does not guarantee the order in which rows reach the aggregate once the data is split across partitions or shuffled.

      To make the situation worse, it seems to work (for example, on small samples in local mode), which reinforces the wrong assumption.
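      A minimal sketch of the difference, assuming Spark 3.0 or later (where min_by and max_by are available) and using a hypothetical temporary view t with illustrative columns foo and bar. It contrasts the order-sensitive query above with an aggregate whose result is tied to the ordering column rather than to the order in which rows happen to arrive:

```sql
-- Hypothetical sample data: bar is the ordering column, foo the value of interest.
CREATE OR REPLACE TEMPORARY VIEW t AS
SELECT * FROM VALUES (1, 'c'), (2, 'a'), (3, 'b') AS data(bar, foo);

-- Order-sensitive: the subquery's ORDER BY is not guaranteed to control
-- which row first() sees, so the result may vary between runs or partitionings.
SELECT first(foo) FROM (SELECT * FROM t ORDER BY bar) s;

-- Order-independent (assuming bar has no ties): the value of foo paired with
-- the smallest / largest bar, however the rows are partitioned.
SELECT min_by(foo, bar) AS first_by_bar,
       max_by(foo, bar) AS last_by_bar
FROM t;
```

      With the sample rows above, min_by yields c and max_by yields b. Because these aggregates compare bar values when merging partial results, the partition layout does not affect the answer; ties in bar would reintroduce non-determinism.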

    People

      Hyukjin Kwon (gurwls223)
      xqods9o5ekm3