Hive / HIVE-18743

CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 1.2.0, 2.0.2, 3.0.0
    • Fix Version/s: 3.1.0, 2.4.0, 3.0.0
    • Component/s: Metastore
    • Labels: None

    Description

      When hive.stats.autogather=true, the Metastore lists all files under the table directory to populate basic stats such as file counts and sizes. This file listing can be very expensive, particularly on filesystems like S3.

      One way to avoid this cost is to set hive.stats.autogather=false.

      Here's the bug:
      It is my understanding that the DO_NOT_UPDATE_STATS table property is intended to selectively prevent this stats collection. Unfortunately, the property is checked only after the expensive file listing has already run, so setting DO_NOT_UPDATE_STATS does not seem to avoid the cost as intended. See:

      https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633

      Relevant code snippet:

        public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
                                                   boolean madeDir, boolean forceRecompute, EnvironmentContext environmentContext) throws MetaException {
          if (tbl.getPartitionKeysSize() == 0) {
            // Update stats only when unpartitioned
            FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
            // DO_NOT_UPDATE_STATS is checked inside this call, after
            // wh.getFileStatusesForUnpartitionedTable() has already been called.
            return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, environmentContext);
          } else {
            return false;
          }
        }
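
One possible shape for a fix is to test the table property before touching the filesystem. The sketch below only illustrates that ordering. It is not the attached patch and it does not use the real Metastore classes: TableStub, listFileSizes, listingCalls and updateTableStatsGuarded are hypothetical stand-ins, the file sizes are made up, and the stat keys are written as plain strings for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatsGuardSketch {
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    // Hypothetical stand-in for a table: just parameters and a partition key count.
    static final class TableStub {
        final Map<String, String> params = new HashMap<>();
        final int partitionKeyCount;

        TableStub(int partitionKeyCount) {
            this.partitionKeyCount = partitionKeyCount;
        }
    }

    // Counts simulated listings so the ordering of the checks can be observed.
    static int listingCalls = 0;

    // Stand-in for the expensive file listing; returns made-up file sizes.
    static List<Long> listFileSizes(TableStub tbl) {
        listingCalls++;
        return Arrays.asList(128L, 256L);
    }

    // Checks the cheap conditions first and lists files only if stats are wanted.
    static boolean updateTableStatsGuarded(TableStub tbl) {
        if (tbl.partitionKeyCount != 0) {
            return false; // update stats only when unpartitioned, as in the snippet above
        }
        // Boolean.parseBoolean(null) is false, so a missing property means "update".
        if (Boolean.parseBoolean(tbl.params.get(DO_NOT_UPDATE_STATS))) {
            return false; // skipped before any file listing happens
        }
        List<Long> sizes = listFileSizes(tbl);
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        tbl.params.put("numFiles", String.valueOf(sizes.size()));
        tbl.params.put("totalSize", String.valueOf(total));
        return true;
    }

    public static void main(String[] args) {
        TableStub skipped = new TableStub(0);
        skipped.params.put(DO_NOT_UPDATE_STATS, "true");
        System.out.println(updateTableStatsGuarded(skipped) + " listings=" + listingCalls);

        TableStub updated = new TableStub(0);
        System.out.println(updateTableStatsGuarded(updated) + " listings=" + listingCalls);
    }
}
```

With this ordering, a table that sets DO_NOT_UPDATE_STATS=true never triggers the listing, while a table without the property behaves as before.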
      

      Attachments

        1. HIVE-18743.01.patch
          19 kB
          Alex Kolbasov
        2. HIVE-18743.01-branch-2.patch
          19 kB
          Alex Kolbasov

            People

              Assignee: Alex Kolbasov (akolb)
              Reporter: Alexander Behm (alex.behm)
              Votes: 0
              Watchers: 13