Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 3.0, Impala 2.12.0, Impala 3.1.0
None
Description
When Impala computes table statistics with COMPUTE STATS, the NULL count of each column is overwritten with -1 (unknown).
The number of NULLs in a column is useful information. Even if Impala itself does not benefit from it, other tools do. Not collecting it may therefore be a problem for Impala users, potentially forcing them to run COMPUTE STATS elsewhere.
Counting NULLs should be cheaper than counting NDVs. However, a code comment in ComputeStatsStmt.java suggests otherwise (tarmstrong suggested this is because of IMPALA-7655).
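To illustrate why a NULL count should be cheaper than an NDV count, here is a minimal Python sketch of a single pass over one column. The function name `scan_column` is hypothetical and not part of Impala; the exact distinct set stands in for Impala's NDV(), which estimates distinct values with a HyperLogLog sketch instead of storing them.

```python
from typing import Iterable, Optional, Set, Tuple


def scan_column(values: Iterable[Optional[object]]) -> Tuple[int, int]:
    """Hypothetical helper: one pass over a column, returning (null_count, ndv).

    null_count needs a single integer counter, whatever the data size.
    ndv here is exact and keeps a set of every distinct non-NULL value,
    so its memory grows with the number of distinct values. Impala's
    NDV() uses a HyperLogLog sketch instead, which bounds memory but
    still hashes every row, which is more work than a counter increment.
    """
    null_count = 0
    distinct: Set[object] = set()
    for v in values:
        if v is None:
            null_count += 1  # constant state: one increment per NULL
        else:
            distinct.add(v)  # hash + set insertion per non-NULL row
    return null_count, len(distinct)


if __name__ == "__main__":
    print(scan_column([1, None, 2, 2, None, 3]))
```

Running it on the sample column reports two NULLs and three distinct non-NULL values.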
My suggestion is to:
- improve the expression used to collect the NULL count
- collect the NULL count during COMPUTE STATS
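One possible way to improve the NULL-count expression is sketched below in Python, under the assumption that the stats query is built as a per-column SQL select list. The helpers `null_count_expr` and `column_stats_select_list` are hypothetical, and this issue does not state which expression the eventual fix used. Because COUNT(col) skips NULLs while COUNT(*) counts every row, `COUNT(*) - COUNT(col)` yields the NULL count without a conditional function such as IF, whose codegen IMPALA-7655 describes as suboptimal. The conditional form is included only for comparison.

```python
from typing import List, Optional, Sequence


def null_count_expr(col: str, conditional: bool = False) -> str:
    """Hypothetical helper: SQL expression counting NULLs in `col`.

    Assumes `col` is a plain identifier without backticks.
    """
    quoted = f"`{col}`"
    if conditional:
        # Conditional form: relies on IF(), affected by IMPALA-7655.
        return f"COUNT(IF({quoted} IS NULL, 1, NULL))"
    # Arithmetic form: all rows minus non-NULL rows equals NULL rows.
    return f"COUNT(*) - COUNT({quoted})"


def column_stats_select_list(columns: List[str]) -> str:
    """Hypothetical helper: NDV plus NULL count for each column."""
    items: List[str] = []
    for col in columns:
        items.append(f"NDV(`{col}`)")
        items.append(null_count_expr(col))
    return ", ".join(items)


def count_nulls_via_difference(values: Sequence[Optional[object]]) -> int:
    """Evaluate COUNT(*) - COUNT(col) in Python to check the identity."""
    count_star = len(values)
    count_col = sum(1 for v in values if v is not None)
    return count_star - count_col


if __name__ == "__main__":
    print("SELECT " + column_stats_select_list(["a", "b"]) + " FROM t")
```

The arithmetic form keeps the per-row work to two counter updates, which is the kind of cheap aggregation the issue expects a NULL count to be.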
Issue Links
- is duplicated by
  - IMPALA-7497 Consider reintroducing numNulls count in compute stats (Resolved)
- is related to
  - IMPALA-8205 Illegal statistics for numFalse and numTrue (Resolved)
- relates to
  - IMPALA-7655 Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal (Resolved)