Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 3.0, Impala 2.12.0

    Description

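      Currently, if the first value of a column chunk is NaN, then min_value = max_value = NaN.

      If the first value of a column chunk is not NaN, i.e. it is an ordinary number or +/-infinity, then neither min_value nor max_value ends up being NaN, even if NaNs appear later in the chunk. In other words, the statistics depend on where the NaNs happen to be.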
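      The order dependence can be reproduced with a minimal sketch. It assumes a naive tracker that seeds min/max with the first value of the chunk and then updates them with plain `<` and `>` comparisons; `naive_min_max` is a hypothetical name, not Impala's actual writer code. Since every ordered comparison involving NaN is false, a leading NaN is never replaced and a later NaN is never picked up.

```python
import math

def naive_min_max(values):
    """Hypothetical naive min/max tracker (not Impala's code): seeds with the
    first value and replaces it only when a strict comparison succeeds.
    Expects at least one value."""
    it = iter(values)
    lo = hi = next(it)
    for v in it:
        # Ordered comparisons involving NaN are always False, so a NaN seed
        # is never replaced and a NaN that arrives later is never picked up.
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi

nan = float("nan")
print(naive_min_max([nan, 1.0, 5.0]))  # (nan, nan)
print(naive_min_max([1.0, nan, 5.0]))  # (1.0, 5.0)
```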

       

      Until the Parquet community agrees on an ordering for floating-point numbers, we can at least make our write path consistent.

      A quick fix is to ignore NaNs when calculating min/max statistics, except when all the values are NaN. This matches the behavior of the fmax()/fmin() functions in the C/C++ standard math library.
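      A minimal sketch of the proposed fix, assuming the same first-value-seeded tracker as before. The helpers `fmin`/`fmax` here are hand-written Python re-implementations of the C/C++ semantics (if exactly one argument is NaN, the other one is returned), and `nan_ignoring_min_max` is an illustrative name, not Impala's API.

```python
import math

def fmin(a, b):
    """Mirrors C's fmin(): if exactly one argument is NaN, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a <= b else b

def fmax(a, b):
    """Mirrors C's fmax(): if exactly one argument is NaN, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b

def nan_ignoring_min_max(values):
    """Hypothetical stats tracker: NaNs are skipped unless every value is NaN,
    in which case min = max = NaN. Expects at least one value."""
    it = iter(values)
    lo = hi = next(it)
    for v in it:
        # A NaN seed is replaced by the first non-NaN value; later NaNs are ignored.
        lo = fmin(lo, v)
        hi = fmax(hi, v)
    return lo, hi

nan = float("nan")
print(nan_ignoring_min_max([nan, 1.0, 5.0]))  # (1.0, 5.0)
print(nan_ignoring_min_max([1.0, nan, 5.0]))  # (1.0, 5.0)
print(nan_ignoring_min_max([nan, nan]))       # (nan, nan)
```

      With this update rule the position of the NaNs no longer matters; only an all-NaN chunk yields NaN statistics.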

      This way we can keep using min/max statistics and the results remain correct, because only binary predicates that compare a column with a constant are evaluated against min/max statistics. In other words, predicates that can return NaNs (e.g. 'NOT x < 3', 'x != x') are not evaluated against min/max statistics, so the NaNs are still returned.
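      The correctness argument can be illustrated with a small sketch. It assumes a reader that skips a column chunk for `x < c` when `min_value >= c` and for `x > c` when `max_value <= c`; `nan_ignoring_stats`, `can_skip_lt` and `can_skip_gt` are hypothetical names, not Impala's planner code. A NaN row never satisfies `x < c` or `x > c`, so leaving NaNs out of the statistics cannot cause a matching row to be skipped, while `NOT x < 3` does select the NaN row and therefore must not be pruned this way.

```python
import math

def nan_ignoring_stats(values):
    """Hypothetical helper: min/max over the non-NaN values,
    or (nan, nan) if every value is NaN."""
    non_nan = [v for v in values if not math.isnan(v)]
    if not non_nan:
        return math.nan, math.nan
    return min(non_nan), max(non_nan)

def can_skip_lt(min_value, constant):
    """True if no row can satisfy `x < constant`.
    With a NaN min_value, `>=` is False, so the chunk is conservatively kept."""
    return min_value >= constant

def can_skip_gt(max_value, constant):
    """True if no row can satisfy `x > constant` (NaN max_value: chunk kept)."""
    return max_value <= constant

values = [math.nan, 4.0, 7.0]
lo, hi = nan_ignoring_stats(values)        # (4.0, 7.0): the NaN is ignored
print(can_skip_lt(lo, 3.0))                # True
print([v for v in values if v < 3.0])      # []: NaN < 3.0 is False as well
print([v for v in values if not v < 3.0])  # [nan, 4.0, 7.0]: includes the NaN
```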

       


          People

            boroknagyz Zoltán Borók-Nagy
            boroknagyz Zoltán Borók-Nagy
            Votes: 0
            Watchers: 2
