Hive / HIVE-10631

create_table_core method has invalid update for Fast Stats


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: Metastore
    • Labels: None

    Description

      The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on. However, for a partitioned table, this updateUnpartitionedTableStatsFast call scans the warehouse directory and does not seem to use the result.

      "Fast Stats" was implemented in HIVE-3959.

      https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363

      From the create_table_core method:

              if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
                  !MetaStoreUtils.isView(tbl)) {
                if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
                  MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
                } else { // Partitioned table with no partitions.
                  MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
                }
              }
      

      In particular, line 1363, in the "Partitioned table with no partitions" branch:

      MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
      

      This call ends up invoking Warehouse.getFileStatusesForUnpartitionedTable, and then does nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method because the newDir flag is always true.
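
      The control flow described above can be modelled with a small, self-contained Java sketch. This is not the Hive source: FastStatsSketch, FakeWarehouse, updateStatsEager and updateStatsLazy are hypothetical names invented for illustration, and the stats keys are only examples. The sketch shows that when the listing happens before the newDir check, a call with newDir = true pays for the listing and discards it. The "lazy" variant is one possible way to avoid that cost, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the behavior described in this issue; NOT the Hive code.
final class FastStatsSketch {

    /** Hypothetical stand-in for the warehouse; counts how often it is listed. */
    static final class FakeWarehouse {
        int listCalls = 0;
        private final List<Long> fileSizes;

        FakeWarehouse(List<Long> fileSizes) {
            this.fileSizes = fileSizes;
        }

        /** Models a directory listing; on S3 this could be a slow recursive scan. */
        List<Long> listFiles() {
            listCalls++;
            return new ArrayList<>(fileSizes);
        }
    }

    /** Models the reported behavior: list first, then check newDir. */
    static Map<String, String> updateStatsEager(FakeWarehouse wh, boolean newDir) {
        List<Long> files = wh.listFiles(); // always executed
        Map<String, String> params = new HashMap<>();
        if (!newDir) {
            populate(files, params);
        }
        return params; // with newDir == true the listing result is never used
    }

    /** Assumed alternative: skip the listing when its result would be unused. */
    static Map<String, String> updateStatsLazy(FakeWarehouse wh, boolean newDir) {
        Map<String, String> params = new HashMap<>();
        if (!newDir) {
            populate(wh.listFiles(), params);
        }
        return params;
    }

    /** Fills in stats that need only file metadata (illustrative keys). */
    private static void populate(List<Long> files, Map<String, String> params) {
        long total = 0;
        for (long size : files) {
            total += size;
        }
        params.put("numFiles", String.valueOf(files.size()));
        params.put("totalSize", String.valueOf(total));
    }

    public static void main(String[] args) {
        FakeWarehouse eager = new FakeWarehouse(Arrays.asList(10L, 20L));
        updateStatsEager(eager, true);
        System.out.println("eager listings with newDir=true: " + eager.listCalls);

        FakeWarehouse lazy = new FakeWarehouse(Arrays.asList(10L, 20L));
        updateStatsLazy(lazy, true);
        System.out.println("lazy listings with newDir=true: " + lazy.listCalls);
    }
}
```

      Both variants return the same parameters for every input; they differ only in whether the listing is issued when newDir is true, which is the cost this issue is about.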

      The impact of this bug is minor when the warehouse location (hive.metastore.warehouse.dir) is on HDFS, but it could be significant with an S3 warehouse location, especially when there are large existing partitions.
      The impact is also heightened by HIVE-6727 when the warehouse location is S3: the call could recursively scan the wrong S3 directory and then do nothing with the result. I will add more details about these cases in the comments.

      Attachments

        1. HIVE-10631.patch
          5 kB
          Aaron Tokhy
        2. HIVE-10631-branch-1.0.patch
          5 kB
          Aaron Tokhy


          People

            Assignee: Aaron Tokhy (aartokhy)
            Reporter: Dongwook Kwon (dongwook)
            Votes: 0
            Watchers: 2

