Kylin / KYLIN-4315

Use metadata numRows in beeline client for quick row counting

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v3.1.0
    • Component/s: Job Engine
    • Labels: None
    • Sprint: Sprint 51

    Description

      Hi, I found that in `BeelineHiveClient`, the method `getHiveTableRows` counts table rows with `select count(*) from <tb_name>`. This method is invoked in the flat intermediate table redistribution step of cube building.

      The row count is already available as the `numRows` statistic in the Hive metastore, and reading it takes much less time than scanning every row of the table. Since the intermediate tables are created and populated by Kylin, this statistic is calculated and stored in the metastore automatically when `hive.stats.autogather` is enabled (the default in Hive).

      Refer to the Hive wiki for more details about the `numRows` statistic: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE
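
      A minimal sketch of the proposed lookup, under stated assumptions. It assumes the statistic is read over the same beeline/JDBC connection with `DESCRIBE FORMATTED <db>.<table>`, whose table-parameter rows are assumed to carry `numRows` in the second column and its value in the third. The statement choice, the class `HiveRowCount`, and its methods are hypothetical illustrations, not the actual patch. Running the statements is left out so the parsing and fallback logic stand alone; a missing, unparsable, or negative value falls back to the original `count(*)` scan.

```java
import java.util.List;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper: reads the "numRows" table parameter from DESCRIBE
 * FORMATTED output and falls back to a full count when it is unusable.
 */
final class HiveRowCount {

    private HiveRowCount() {}

    /** Assumed metadata query; its output is expected to list table parameters. */
    static String describeSql(String database, String table) {
        return "DESCRIBE FORMATTED " + database + "." + table;
    }

    /** The original full-scan query, kept as the fallback path. */
    static String countSql(String database, String table) {
        return "SELECT COUNT(*) FROM " + database + "." + table;
    }

    /**
     * Scans result rows (col_name, data_type, comment) for a "numRows" entry.
     * Returns -1 when the entry is absent, unparsable, or negative.
     */
    static long parseNumRows(List<String[]> rows) {
        for (String[] row : rows) {
            if (row.length < 3 || row[1] == null) {
                continue; // section headers and short rows carry no parameter
            }
            if ("numRows".equals(row[1].trim())) {
                String value = row[2] == null ? "" : row[2].trim();
                try {
                    long n = Long.parseLong(value);
                    return n >= 0 ? n : -1;
                } catch (NumberFormatException e) {
                    return -1;
                }
            }
        }
        return -1;
    }

    /** Prefers the metastore statistic; otherwise runs the supplied full count. */
    static long rowCount(List<String[]> describeRows, LongSupplier fullCount) {
        long n = parseNumRows(describeRows);
        return n >= 0 ? n : fullCount.getAsLong();
    }
}
```

      Keeping `count(*)` behind a `LongSupplier` means the expensive scan only runs when the statistic is unavailable, so tables created without gathered stats still get a correct count at the original cost. A real implementation would also have to decide how far to trust a value such as 0 when stats may never have been gathered.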

            People

              Assignee: Congling Xia (xiacongling)
              Reporter: Congling Xia (xiacongling)
              Votes: 0
              Watchers: 5