Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- None
- None
- Sprint 51
Description
Hi, I found that the `getHiveTableRows` method in `BeelineHiveClient` runs "select count(*) from <tb_name>" to count table rows. The method is invoked in the flat intermediate table redistribution step of cube building.
The row count is also available as a statistic in the metastore, and reading it there costs much less time than scanning all rows of the Hive table. Since the intermediate tables are created and populated by Kylin, the statistics are calculated automatically and stored in the metastore when `hive.stats.autogather` is enabled (which is the default setting for Hive).
See the Hive wiki for more detail about the `numRows` statistic: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE
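To make the idea concrete, here is a minimal sketch in Java of how the lookup could work. It is not the actual Kylin patch. It assumes the table parameters have already been fetched from the metastore as a plain `Map<String, String>`, and the class name `RowCountFromStats`, the method `rowCount`, and the `countQuery` fallback are hypothetical names used only for illustration. The sketch prefers the `numRows` parameter and falls back to the full count only when the statistic is missing, unparsable, or not positive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Kylin: prefer the metastore statistic
// over a full table scan when counting rows.
class RowCountFromStats {

    // Table parameter key under which Hive stores the row count statistic.
    static final String NUM_ROWS = "numRows";

    // tableParams: table parameters as read from the metastore (assumed input).
    // countQuery:  fallback that runs the expensive count over the table.
    static long rowCount(Map<String, String> tableParams, LongSupplier countQuery) {
        String raw = tableParams.get(NUM_ROWS);
        if (raw != null) {
            try {
                long n = Long.parseLong(raw.trim());
                if (n > 0) {
                    return n; // statistic present and usable
                }
            } catch (NumberFormatException ignored) {
                // malformed value: fall through to the count query
            }
        }
        // Missing, malformed, or non-positive statistic: scan the table.
        return countQuery.getAsLong();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(NUM_ROWS, "1234");
        // Statistic is usable, so the fallback is never invoked.
        System.out.println(rowCount(params, () -> 0L));

        params.put(NUM_ROWS, "-1");
        // Statistic is unusable, so the fallback result is returned.
        System.out.println(rowCount(params, () -> 42L));
    }
}
```

Treating a non-positive `numRows` as "unknown" is a conservative choice made for this sketch: a genuinely empty table then triggers the count query, which is cheap on an empty table anyway.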
Attachments
Issue Links
- is duplicated by: KYLIN-4509 get hive table rows from metadata when using beeline (Closed)
- is related to: KYLIN-4526 Enhance get the hive table rows (Closed)
- links to