IMPALA-7659

Collect count of nulls when collecting stats

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.0, Impala 2.12.0, Impala 3.1.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Backend, Frontend
    • Labels: None

    Description

      When Impala computes table stats, the NULL count is overridden with -1.
      The number of NULLs in a table is useful information. Even if Impala itself does not benefit from it, some other tools do. Not collecting it may therefore pose a problem for Impala users, potentially forcing them to run COMPUTE STATS elsewhere.

      Counting NULLs should be cheaper than counting NDVs. However, a code comment in ComputeStatsStmt.java suggests otherwise (tarmstrong suggested this is because of IMPALA-7655).

      My suggestion would be to:

      • improve the expression used to collect the NULL count
      • collect the NULL count during COMPUTE STATS
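
To illustrate why the NULL count can be cheap to collect, here is a minimal sketch that computes it in the same scan as a distinct-value count. It uses SQLite from the Python standard library purely as a stand-in engine. The helper name `null_and_distinct_counts` and the table `t` are hypothetical, and the query is not the expression Impala generates in ComputeStatsStmt.java.

```python
import sqlite3


def null_and_distinct_counts(conn, table, column):
    """Return (num_nulls, ndv) for one column using a single scan.

    Hypothetical helper for illustration only; identifiers are
    interpolated directly, so pass trusted names.
    """
    # COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the NULL count.
    # COUNT(DISTINCT col) also skips NULLs; it stands in for an exact NDV.
    query = (
        f'SELECT COUNT(*) - COUNT("{column}"), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    return conn.execute(query).fetchone()


# Small in-memory table with two NULLs in the "name" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "a"), (4, None), (5, "b")],
)

print(null_and_distinct_counts(conn, "t", "name"))  # (2, 2)
```

The NULL count here needs only two plain counters per column, whereas the distinct count has to track the values seen, which is the intuition behind expecting NULL counting to be the cheaper of the two.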

            People

              Assignee: bharathv Bharath Vissapragada
              Reporter: findepi Piotr Findeisen
              Votes: 0
              Watchers: 3