Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
HIVE-1010 adds an information schema to Hive, also taking the opportunity to expose some non-standard but valuable things like statistics in a SYS table.
One common thing users want to know is the number of rows in tables, system wide.
This information is in the table_params table but the structure of this table makes it quite inconvenient to access since it is essentially a table of key-value pairs. More table stats are likely to be added over time, especially because of ACID. It would be a lot better if this were a first class table.
For what it's worth I deal with the current table by pivoting it into something easier to deal with as follows:
create view table_stats as select tbl_id, max(case param_key when 'COLUMN_STATS_ACCURATE' then param_value end) as COLUMN_STATS_ACCURATE, max(case param_key when 'numFiles' then param_value end) as numFiles, max(case param_key when 'numRows' then param_value end) as numRows, max(case param_key when 'rawDataSize' then param_value end) as rawDataSize, max(case param_key when 'totalSize' then param_value end) as totalSize, max(case param_key when 'transient_lastDdlTime' then param_value end) as transient_lastDdlTime from table_params group by tbl_id;
It would be better to not have users provide workarounds and make table stats first-class like column stats currently are.
Attachments
Attachments
Issue Links
- blocks
-
HIVE-16946 Information Schema Improvements
- Open