Details
- Improvement
- Status: Resolved
- Major
- Resolution: Resolved
- Impala 3.3.0
Description
IMPALA-7659 added the population of NULL counts while computing stats. Later, IMPALA-8566 fixed an accuracy issue caused by the initialization of the statistics by changing the initial value from '-1' to '0'. The fix also slightly changed how the values are summed. Previously, negative values were excluded from the sum:
if (num_new_nulls >= 0) num_nulls += num_new_nulls;
In the new implementation the condition was removed, on the assumption that these values should never be negative:
num_nulls += num_new_nulls;
This change causes no problems for stats created after the fix, but it can make table metadata unavailable when stats computed by an earlier release are updated by a newer one. The metadata can become invalid when COMPUTE INCREMENTAL STATS is issued on a partition, because the '-1' values left by the earlier release can push the column-level num_nulls below '-1'. A num_nulls value smaller than '-1' later fails a precondition check when CatalogD tries to fetch the table metadata.
Keeping the condition is harmless, so for backward compatibility we should restore it.
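The difference between the two variants can be illustrated with a minimal, self-contained sketch. The function names and the vector of per-partition values are hypothetical and are not taken from Impala's source; only the two summation statements mirror the lines quoted above. The sketch assumes '-1' marks a NULL count written by an earlier release.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala's actual code): sums the per-partition NULL
// counts of one column, skipping negative values as the older code did.
int64_t SumNumNullsGuarded(const std::vector<int64_t>& per_partition_nulls) {
  int64_t num_nulls = 0;
  for (int64_t num_new_nulls : per_partition_nulls) {
    // Negative values (assumed to be '-1' from earlier releases) are ignored,
    // so they cannot pull the total below zero.
    if (num_new_nulls >= 0) num_nulls += num_new_nulls;
  }
  return num_nulls;
}

// Hypothetical helper: same sum without the guard, as in the newer code.
int64_t SumNumNullsUnguarded(const std::vector<int64_t>& per_partition_nulls) {
  int64_t num_nulls = 0;
  for (int64_t num_new_nulls : per_partition_nulls) {
    // Every value is added as-is, so each '-1' lowers the total by one.
    num_nulls += num_new_nulls;
  }
  return num_nulls;
}
```

With per-partition values {-1, -1, 0}, the guarded version yields 0, while the unguarded version yields -2, which is the kind of value below '-1' that the description says fails the precondition check.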
Issue Links
- is duplicated by: IMPALA-10230 column stats num_nulls less than -1 (Resolved)