Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.0, Impala 2.12.0
    • Component/s: None
    • Labels:
      None

      Description

      If the first value of a column chunk is NaN, then min_value = max_value = NaN.

      If the first value of a column chunk is not NaN, i.e. it is an ordinary number or +/-infinity, then neither min_value nor max_value ends up as NaN, even if later values in the chunk are NaN.

       

      Until the Parquet community agrees on the ordering of floating-point numbers, we can at least make our write path consistent.

      A quick fix is to ignore NaNs when calculating min/max statistics, except when all the values are NaN. This matches the behavior of the fmax()/fmin() functions in the standard C/C++ math library.
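
A minimal sketch of the proposed behavior, assuming the statistics are folded with `std::fmin`/`std::fmax` starting from a NaN seed. `NanIgnoringMinMax` is a hypothetical name, and the actual fix may be structured differently.

```cpp
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: folds the values with std::fmin/std::fmax, which return
// the non-NaN argument when exactly one argument is NaN.
std::pair<double, double> NanIgnoringMinMax(const std::vector<double>& values) {
  // Seed with NaN so that the first non-NaN value replaces it.
  double min_value = std::numeric_limits<double>::quiet_NaN();
  double max_value = min_value;
  for (double v : values) {
    min_value = std::fmin(min_value, v);
    max_value = std::fmax(max_value, v);
  }
  // Both remain NaN only if every value was NaN (or the input was empty).
  return {min_value, max_value};
}
```

Under these assumptions the result no longer depends on the position of the NaNs: {NaN, 1, 2} and {1, NaN, 2} both give min 1 and max 2, and only an all-NaN chunk reports NaN statistics.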

      This way we can keep using min/max statistics and the results remain correct, because only binary predicates that compare against a constant are tested against min/max statistics. In other words, if a predicate is meant to return NaNs (e.g. 'NOT x < 3', 'x != x'), min/max statistics won't be used, i.e. we will get the NaNs back as well.

       

            People

            • Assignee:
              boroknagyz Zoltán Borók-Nagy
              Reporter:
              boroknagyz Zoltán Borók-Nagy