Spark / SPARK-16721

Lead/lag needs to respect nulls


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL

    Description

      Spark 2.0.0 appears to have changed lead and lag so that they ignore nulls. This PR restores the Spark 1.6 behavior, in which lead and lag respect nulls.
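      To make the distinction concrete, here is a minimal pure-Python model of the two semantics. It assumes the rows are already sorted by the ORDER BY key and that the offset is at least 1. The helpers `lag` and `lead` and the `ignore_nulls` flag are hypothetical illustrations, not Spark's implementation; the ignore-nulls branch is one plausible reading of the 2.0.0 behavior, in which only non-null rows count toward the offset.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def lag(values: List[Optional[T]], offset: int, default: T,
        ignore_nulls: bool = False) -> List[Optional[T]]:
    """Hypothetical model of lag() over rows already sorted by the ORDER BY key."""
    assert offset >= 1, "this sketch only models positive offsets"
    out: List[Optional[T]] = []
    for i in range(len(values)):
        if ignore_nulls:
            # Ignore-nulls model: only non-null predecessors count toward the offset.
            prior = [v for v in values[:i] if v is not None]
            out.append(prior[-offset] if len(prior) >= offset else default)
        else:
            # Respect-nulls (Spark 1.6) model: count every row; a null at the
            # target row is returned as null, and the default is used only
            # when the target row does not exist.
            j = i - offset
            out.append(values[j] if j >= 0 else default)
    return out


def lead(values: List[Optional[T]], offset: int, default: T,
         ignore_nulls: bool = False) -> List[Optional[T]]:
    """lead() is lag() over the reversed rows, reversed back."""
    return lag(values[::-1], offset, default, ignore_nulls)[::-1]


if __name__ == "__main__":
    a = [None, None]  # the two null rows used in the example query
    print(lag(a, 1, 321), lead(a, 1, 321))
    print(lag(a, 1, 321, ignore_nulls=True), lead(a, 1, 321, ignore_nulls=True))
```

      For the input `[None, None]` with default 321, the respect-nulls branch gives lag `[321, None]` and lead `[None, 321]`, while the ignore-nulls branch gives `[321, 321]` for both, because no non-null row is ever found.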

      For example

      SELECT
        b,
        lag(a, 1, 321) OVER (ORDER BY b) AS lag,
        lead(a, 1, 321) OVER (ORDER BY b) AS lead
      FROM (SELECT CAST(NULL AS INT) AS a, 1 AS b
            UNION ALL
            SELECT CAST(NULL AS INT) AS a, 2 AS b) tmp
      

      This query should return

      +---+----+----+
      |  b| lag|lead|
      +---+----+----+
      |  1| 321|null|
      |  2|null| 321|
      +---+----+----+
      

      instead of the following, which is what 2.0.0 returns because the null values of a are ignored:

      +---+---+----+
      |  b|lag|lead|
      +---+---+----+
      |  1|321| 321|
      |  2|321| 321|
      +---+---+----+
      


          People

            Assignee: Yin Huai (yhuai)
            Reporter: Yin Huai (yhuai)
            Votes: 0
            Watchers: 2
