When Impala calculates table stats, the NULL count is overridden with -1.
The number of NULLs in a column is useful information. Even if Impala itself does not benefit from it, some other tools do. Not collecting it may therefore pose a problem for Impala users (potentially forcing them to run COMPUTE STATS elsewhere).
Counting NULLs should be cheaper than counting NDVs. However, a code comment in ComputeStatsStmt.java suggests otherwise (Tim Armstrong suggested this is because of
My suggestion would be to:
- improve the expression used to collect the NULL count
- collect the NULL count during COMPUTE STATS
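
To illustrate the idea, here is a minimal sketch of collecting the NULL count in the same scan as the distinct count. This is not Impala's implementation: it uses SQLite through Python's sqlite3 module as a stand-in engine, an exact COUNT(DISTINCT ...) where Impala would use an approximate NDV, and a hypothetical table `t` and helper `column_stats`. It relies only on the standard SQL behavior that COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number of NULLs.

```python
import sqlite3


def column_stats(conn, table, column):
    """Return (null_count, distinct_count) for one column in a single scan.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly, so it must only be called with trusted table/column names.
    """
    # COUNT(col) ignores NULLs, so the difference from COUNT(*) is the NULL count.
    # COUNT(DISTINCT col) also ignores NULLs and stands in for an NDV estimate.
    query = (
        f'SELECT COUNT(*) - COUNT("{column}"), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    return conn.execute(query).fetchone()


# Small in-memory table with two NULLs in the "name" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "a"), (4, None), (5, "b")],
)

null_count, distinct_count = column_stats(conn, "t", "name")
print(null_count, distinct_count)
```

Both aggregates come out of one pass over the table, which is the property that would matter for COMPUTE STATS. Whether this exact expression is the cheapest choice inside Impala is an open question for this issue.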