IMPALA-7659

Collect count of nulls when collecting stats

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.0, Impala 2.12.0, Impala 3.1.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Backend, Frontend
    • Labels:
      None

      Description

      When Impala computes table stats, the NULL count is overwritten with -1.
      The number of NULLs in a table is useful information. Even if Impala itself does not benefit from it, other tools do. Not collecting it can therefore be a problem for Impala users, potentially forcing them to run COMPUTE STATS elsewhere.

      Counting NULLs should be a cheaper operation than counting NDVs. However, a code comment in ComputeStatsStmt.java suggests otherwise (Tim Armstrong suggested this is because of IMPALA-7655).

      My suggestion would be to:

      • improve the expression used to collect the NULL count
      • collect the NULL count during COMPUTE STATS
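
      To illustrate why counting NULLs can be cheap, here is a minimal sketch of one possible expression. It is not Impala's code and not necessarily the expression the fix uses. It relies on standard SQL semantics, where COUNT(*) counts all rows and COUNT(col) skips NULLs, so the difference is the NULL count and every column can be handled in a single scan. The helper names (null_count_exprs, collect_null_counts) and the table are hypothetical, and SQLite stands in for Impala only so the example runs on its own.

```python
import sqlite3


def null_count_exprs(columns):
    """Build one NULL-counting aggregate per column (hypothetical helper)."""
    # COUNT(*) counts every row; COUNT(col) ignores NULLs.
    # Their difference is the number of NULLs in that column.
    return [f'COUNT(*) - COUNT("{c}") AS "{c}_nulls"' for c in columns]


def collect_null_counts(conn, table, columns):
    """Return {column: null_count} using a single scan of the table."""
    select_list = ", ".join(null_count_exprs(columns))
    row = conn.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    return dict(zip(columns, row))


# Assumption: an in-memory SQLite table stands in for an Impala table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (None, "y"), (None, None), (4, "x")],
)

counts = collect_null_counts(conn, "t", ["a", "b"])
print(counts)  # {'a': 2, 'b': 1}
```

      Unlike a distinct-value estimate, this needs no per-value state, only two counters per column, which is the intuition behind expecting it to be cheaper than NDV.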

              People

              • Assignee: Bharath Vissapragada (bharathv)
              • Reporter: Piotr Findeisen (findepi)