IMPALA-9699

Skip '-1' values when aggregating num_null incremental statistics

Details

    Description

      IMPALA-7659 added the population of NULL counts when computing stats. Later, IMPALA-8566 fixed an accuracy issue caused by how the statistics were initialized: the initial value was changed from '-1' to '0'. That fix also slightly changed how the values are summed. Previously, negative values were excluded from the sum:

      if (num_new_nulls >= 0) num_nulls += num_new_nulls;
      

      In the new implementation the condition was removed, on the assumption that these values are never negative:

      num_nulls += num_new_nulls;
      

      This change does not cause any problem for stats created after the fix; however, it can make table metadata unavailable when moving between earlier and newer releases. The metadata can become invalid if COMPUTE INCREMENTAL STATS is issued on a partition, because the existing '-1' values are now added to the sum and can push the column-level num_nulls below '-1'. A num_nulls value smaller than '-1' then fails a precondition check when CatalogD tries to fetch the table metadata.

      The condition does no harm, and for backward compatibility we should restore it.
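
      The effect of the guard can be illustrated with a minimal, self-contained sketch. It is not the actual Impala code: AggregateNumNulls, IsValidNumNulls and the skip_negative flag are hypothetical names introduced here, and the validity rule (anything below '-1' is rejected) is taken from the description above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala's actual function): sums per-partition
// NULL counts into a column-level total, starting from 0 as after IMPALA-8566.
// When skip_negative is true, negative inputs such as the legacy '-1'
// placeholder are ignored, mirroring the restored condition.
int64_t AggregateNumNulls(const std::vector<int64_t>& per_partition_nulls,
                          bool skip_negative) {
  int64_t num_nulls = 0;
  for (int64_t num_new_nulls : per_partition_nulls) {
    if (skip_negative && num_new_nulls < 0) continue;
    num_nulls += num_new_nulls;
  }
  return num_nulls;
}

// Hypothetical stand-in for the precondition described above:
// '-1' means "unknown"; any smaller value is treated as invalid.
bool IsValidNumNulls(int64_t num_nulls) { return num_nulls >= -1; }
```

      With two legacy partitions that each carry '-1', the unguarded sum is -2 and would be rejected, while the guarded sum stays at 0.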

            People

              tmate Tamas Mate
              tmate Tamas Mate