Hive / HIVE-23721

MetaStoreDirectSql.ensureDbInit() needs to optimize its SQL queries

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • 3.1.2, 4.0.0
    • None
    • Standalone Metastore
    • Environment:
      Hadoop 3.1 (1700+ nodes)
      YARN 3.1 (with timeline server enabled, HTTPS enabled)
      Hive 3.1 (15 HS2 instances)
      60000+ YARN applications every day

    Description

      Since Hive 3.0, catalogs have been added to the Hive metastore. Many metastore tables gained a "catName" column, and the indexes on those tables gained "catName" as well.

      In MetaStoreDirectSql.ensureDbInit(), the two queries below

      initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
      initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));

      should use "catName == ''" instead of "dbName == ''", because "catName" is the first (leading) column of the index.

      When the metastore holds a large amount of data, for example when the MPartitionColumnStatistics table has millions of rows, the query newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in the metastore, and "show tables" in HiveServer2 runs very slowly too.
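
      The effect of filtering on the leading index column can be illustrated with a toy model. The sketch below is not Hive or DataNucleus code: the class LeadingColumnSketch, its methods, and the sample data are hypothetical, and a sorted TreeMap only approximates how a database might use a composite index whose first column is catName. It assumes the index is ordered by (catName, dbName, row id).

```java
import java.util.TreeMap;

public class LeadingColumnSketch {
    // Separator that sorts below every character used in the sample names.
    private static final char SEP = '\u0000';

    /** Builds a toy sorted index keyed by (catName, dbName, rowId). */
    static TreeMap<String, Integer> buildIndex(int rows) {
        TreeMap<String, Integer> index = new TreeMap<>();
        for (int i = 0; i < rows; i++) {
            // Hypothetical data: one catalog "hive", ten databases db0..db9.
            index.put("hive" + SEP + "db" + (i % 10) + SEP + i, i);
        }
        return index;
    }

    /** catName filter: a range lookup on the key prefix, touching only matching entries. */
    static int entriesVisitedByCatName(TreeMap<String, Integer> index, String catName) {
        String from = catName + SEP;
        String to = catName + (char) (SEP + 1);
        return index.subMap(from, to).size();
    }

    /** dbName filter: no usable key prefix, so every entry is inspected. Returns {visited, matched}. */
    static int[] entriesVisitedByDbName(TreeMap<String, Integer> index, String dbName) {
        int visited = 0;
        int matched = 0;
        for (String key : index.keySet()) {
            visited++;
            int first = key.indexOf(SEP);
            int second = key.indexOf(SEP, first + 1);
            if (key.substring(first + 1, second).equals(dbName)) {
                matched++;
            }
        }
        return new int[] {visited, matched};
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> index = buildIndex(100_000);
        System.out.println("catName == '': entries visited = "
                + entriesVisitedByCatName(index, ""));
        int[] scan = entriesVisitedByDbName(index, "");
        System.out.println("dbName == '': entries visited = " + scan[0]
                + ", matched = " + scan[1]);
    }
}
```

      In this toy model both filters match nothing, which is all an initialization probe needs, but the catName lookup visits no entries while the dbName filter walks the whole index. Whether a real database behaves the same way depends on its optimizer and on the actual index definitions.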

      Attachments

        1. HIVE-23721.01.patch
          1 kB
          Butao Zhang

            People

              Assignee: Butao Zhang (zhangbutao)
              Reporter: YulongZ
              Votes: 0
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m