Details
-
New Feature
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
None
-
None
-
None
-
ghx-label-9
Description
Lack of statistics can cause performance issues with Impala making "wrong" choices during the planning stage. It would be useful to provide a mechanism to list tables with without statistics, or with corrupt statistics.
It can be done indirectly with queries on the metastore DB:
Tables without columns stats:
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='COLUMN_STATS_ACCURATE' AND PARAM_VALUE='true');
Tables without row stats
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE NOT EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows');
Impala shows messages about corrupt statistics where size >0 but row=0, i.e.
SELECT DBS.NAME, TBL_NAME FROM DBS INNER JOIN TBLS ON TBLS.DB_ID=DBS.DB_ID WHERE EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='numRows' AND PARAM_VALUE=0 ) AND EXISTS (SELECT * FROM TABLE_PARAMS WHERE TABLE_PARAMS.TBL_ID=TBLS.TBL_ID AND PARAM_KEY='totalSize' AND PARAM_VALUE>0 );
(There are almost certainly more efficient queries !)