Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      1. What should NaN = NaN return?

      NaN = NaN should return true.

      2. If we see NaN in the group by key column, should we group NaN values into one group, or into different groups?

      All NaN values should be grouped together.

      3. What about NaN in join keys?

      NaN should be treated as a normal value in join keys.

      4. When aggregating over columns containing NaN, should the result be NaN, or should the result exclude NaN values (treating them like nulls)?

      This is still to be decided. For now, the default behavior is to return NaN.

      5. Where should NaN go in sorting?

      NaN should sort last in ascending order, that is, it compares as larger than any other numeric value.
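The equality, grouping, and aggregation answers (questions 1 to 4) can be illustrated with a small sketch in plain Python. This is not Spark's implementation; the helper names `nan_safe_eq`, `group_key`, and `group_values` are hypothetical and exist only to show the intended semantics.

```python
import math
from collections import defaultdict


def nan_safe_eq(a: float, b: float) -> bool:
    """Equality in which NaN equals NaN (IEEE 754 `==` says it does not)."""
    return a == b or (math.isnan(a) and math.isnan(b))


def group_key(x: float):
    """Map every NaN to one shared key so all NaN values land in one group."""
    return "NaN" if math.isnan(x) else x


def group_values(values):
    """Group values by key, treating NaN as an ordinary, single key."""
    groups = defaultdict(list)
    for v in values:
        groups[group_key(v)].append(v)
    return dict(groups)


nan = float("nan")

# Two separately created NaN values end up in the same group.
groups = group_values([1.0, nan, 2.0, float("nan"), 1.0])

# With the default behavior, NaN propagates through an aggregate.
total = sum([1.0, nan, 2.0])
```

The same `group_key` normalization would serve for join keys (question 3): once every NaN maps to one key, NaN rows match each other like any other value.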

      Note that question 5 is much more important than the other four, because the sorter currently throws exceptions on NaN values. See SPARK-8797.
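The sort order asked for in question 5 amounts to a total order over doubles in which NaN sits above positive infinity. A minimal sketch in plain Python follows; it is not the Spark sorter, and `nan_last_key` is a hypothetical name used for illustration only.

```python
import math


def nan_last_key(x: float):
    """Sort key placing NaN after every other value, including +inf.

    The first tuple element is False for ordinary numbers and True for NaN,
    so NaN always compares greater. The second element is never NaN, which
    keeps the comparison well defined.
    """
    is_nan = math.isnan(x)
    return (is_nan, 0.0 if is_nan else x)


values = [3.0, float("nan"), float("-inf"), 1.0, float("inf"), float("nan")]

# Ascending order: -inf, 1.0, 3.0, inf, then the NaN values.
ascending = sorted(values, key=nan_last_key)

# Reversing the same order puts the NaN values first.
descending = sorted(values, key=nan_last_key, reverse=True)
```

Because the key never contains NaN itself, every pair of values is comparable, so a comparison-based sort cannot hit an inconsistent ordering.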

          People

            marmbrus Michael Armbrust
            rxin Reynold Xin