Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9447

Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 1.0.0, 1.2.0, 1.1.0
    • Fix Version/s: 3.0.0
    • Component/s: Metastore
    • Labels:
      None

      Description

      Metastore needs removing unused column descriptors when drop/add partitions or tables. For query the unused column descriptor, the current implementation utilizes datanuleus' range function, which basically equals LIMIT syntax. However, Oracle does not support LIMIT, the query is converted as

      SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
      'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
      NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
      A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0
      WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1;

      Given that CD_ID is not very selective, this query may have to access large amount of rows (depends how many partitions the table has, millions of rows in our case). Metastore may become unresponsive because of this.

      Since Metastore only needs to know if the specific CD_ID is referenced in SDS table and does not need access the whole row. We can use

      select count(1) from SDS where SDS.CD_ID=?

      CD_ID is index column, the above query will do range scan for index, which is faster.

      For other DBs support LIMIT syntax such as MySQL, this problem does not exist. However, the new query does not hurt.

        Attachments

        1. HIVE-9447.1.patch
          3 kB
          Selina Zhang
        2. HIVE-9447.2.patch
          3 kB
          Pengcheng Xiong
        3. HIVE-9447.3.patch
          3 kB
          Ádám Szita
        4. HIVE-9447.4.patch
          3 kB
          Ádám Szita
        5. HIVE-9447.5.patch
          3 kB
          Ádám Szita

          Issue Links

            Activity

              People

              • Assignee:
                selinazh Selina Zhang
                Reporter:
                selinazh Selina Zhang
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - 3h
                  3h
                  Remaining:
                  Remaining Estimate - 3h
                  3h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified