Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-8331

Provide method for listing tables without statistics

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • None
    • Catalog
    • None
    • ghx-label-9

    Description

      Lack of statistics can cause performance issues with Impala making "wrong" choices during the planning stage. It would be useful to provide a mechanism to list tables with without statistics, or with corrupt statistics.

      It can be done indirectly with queries on the metastore DB:

      Tables without columns stats:
      SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='COLUMN_STATS_ACCURATE' AND PARAM_VALUE='true');

      Tables without row stats
      SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows');

       

      Impala shows messages about corrupt statistics where size >0 but row=0, i.e.

       

      SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows' AND PARAM_VALUE=0 ) AND EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='totalSize' AND PARAM_VALUE>0 );

       

      (There are almost certainly more efficient queries !)

       

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            tonyhill Tony Hill
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated: