Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.0, Impala 2.12.0
    • Component/s: None
    • Labels:
      None

      Description

      (I'll write only min and max below, but the same applies to min_value and max_value.)

      When both min and max are NaN:

      • Written by Impala:
        • the first element in the row group is NaN, but not all elements are (Impala writer bug)
        • all elements are NaN
      • Written by Hive/Parquet-mr:
        • all elements are NaN
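
As an illustration only, here is one way a writer could end up with NaN for both statistics when just the first element is NaN. The `naive_min_max` accumulator below is hypothetical and is not taken from the Impala writer; it only shows that seeding min/max with the first value and updating them with `<` and `>` never replaces a NaN seed, because every comparison against NaN is false.

```python
import math


def naive_min_max(values):
    """Hypothetical accumulator: seed with the first value, update with < and >."""
    lo = hi = values[0]
    for v in values[1:]:
        if v < lo:  # always False when lo is NaN
            lo = v
        if v > hi:  # always False when hi is NaN
            hi = v
    return lo, hi


# A NaN in first position "sticks" in both statistics.
stuck_lo, stuck_hi = naive_min_max([math.nan, 1.0, 2.0])

# A NaN elsewhere is silently skipped by the comparisons.
clean_lo, clean_hi = naive_min_max([1.0, math.nan, 2.0])
```

With an accumulator of this shape, min and max are either both NaN or both non-NaN, never just one of them.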

      Either min or max is NaN, but not both:

      • Written by Impala:
        • this cannot happen currently
      • Written by Hive/Parquet-mr:
        • only the max can be NaN (needs to be checked)

      Therefore, if both min and max are NaN, we can't use the statistics for filtering.

      If only the max is NaN, we still have a valid lower bound.
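
The rule above can be sketched as a small helper. This is a minimal sketch, not Impala code: `usable_bounds` and the `Bounds` alias are names made up for the example, and treating a NaN min with a non-NaN max as "no usable bounds" is a conservative assumption, since that case is not expected to occur.

```python
import math
from typing import Optional, Tuple

# (lower, upper); None marks a bound that must not be used for filtering.
Bounds = Tuple[Optional[float], Optional[float]]


def usable_bounds(min_stat: float, max_stat: float) -> Bounds:
    """Decide which of the row group's min/max statistics can be trusted."""
    if math.isnan(min_stat):
        # Covers "both NaN" and the unexpected "only min is NaN" case
        # (assumption: be conservative and trust neither bound).
        return (None, None)
    if math.isnan(max_stat):
        # Only the max is NaN: the lower bound is still valid.
        return (min_stat, None)
    return (min_stat, max_stat)
```

For example, `usable_bounds(1.0, math.nan)` keeps `1.0` as the lower bound and discards the upper bound.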

       

      A workaround could be to replace the NaNs with infinities, i.e. max => Inf, min => -Inf.
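
A minimal sketch of that workaround, with hypothetical helper names (`widen_nan_stats`, `can_skip_for_less_than`) that do not come from Impala. A NaN statistic is widened to the matching infinity, so it can never cause a row group to be skipped, while a valid bound on the other side stays usable.

```python
import math


def widen_nan_stats(min_stat: float, max_stat: float):
    """Replace NaN statistics with infinities: min => -Inf, max => Inf."""
    lower = -math.inf if math.isnan(min_stat) else min_stat
    upper = math.inf if math.isnan(max_stat) else max_stat
    return lower, upper


def can_skip_for_less_than(min_stat: float, max_stat: float, constant: float) -> bool:
    """True if no value in the row group can satisfy `x < constant`."""
    lower, _ = widen_nan_stats(min_stat, max_stat)
    # NaN rows never satisfy `x < constant`, so only the lower bound matters.
    return lower >= constant
```

With both statistics NaN the lower bound becomes -Inf, so `x < 3` never skips the row group; with min = 5 and max = NaN the row group can still be skipped.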

      Based on my experiments, min/max statistics are not applied to predicates that can be true for NaN, e.g. 'NOT x < 3'.
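
To see why such predicates are a special case, the snippet below uses plain Python floats, which follow the same IEEE 754 comparison rules, rather than Impala itself. The sample values are invented for the illustration.

```python
import math

values = [1.0, 2.0, math.nan]

# `x < 3` is False for NaN, so the NaN row is filtered out.
less_than_3 = [v for v in values if v < 3]

# `NOT x < 3` is True for NaN, so the NaN row is the only match.
not_less_than_3 = [v for v in values if not (v < 3)]
```

Every non-NaN value here is below 3, yet `NOT x < 3` still selects a row, so bounds that describe only the non-NaN values cannot rule out a match for this kind of predicate.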

            People

            • Assignee:
              boroknagyz Zoltán Borók-Nagy
              Reporter:
              boroknagyz Zoltán Borók-Nagy
