Hive / HIVE-18743

CREATE TABLE on S3 data can be extremely slow. DO_NOT_UPDATE_STATS workaround is buggy.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.1.0, 2.0.2, 3.0.0
    • Fix Version/s: 3.0.0, 2.4.0, 3.1.0
    • Component/s: Metastore
    • Labels:
      None

      Description

      When hive.stats.autogather=true, the Metastore lists every file under the table directory to populate basic stats such as file counts and sizes. This listing can be very expensive, particularly on object stores like S3.
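To make the cost concrete, here is a minimal, self-contained sketch (not Metastore code) of deriving basic stats from a directory listing. It assumes a hypothetical paged listing API that, like S3's ListObjectsV2, returns at most 1,000 keys per request; `FileInfo`, `PagedLister`, and `computeBasicStats` are illustrative names, not Hive APIs. The point is that the number of remote list calls grows with the number of files, so the listing, not the arithmetic, dominates.

```java
import java.util.ArrayList;
import java.util.List;

public class BasicStatsSketch {
    // Hypothetical file entry: a path and a size in bytes.
    record FileInfo(String path, long sizeBytes) {}

    // Hypothetical paged lister mimicking an object store that returns at
    // most PAGE_SIZE keys per list request. It counts each request made.
    static class PagedLister {
        static final int PAGE_SIZE = 1000;
        private final List<FileInfo> files;
        int requests = 0;

        PagedLister(List<FileInfo> files) { this.files = files; }

        // Returns one page starting at 'offset'; each call models one remote request.
        List<FileInfo> listPage(int offset) {
            requests++;
            int end = Math.min(offset + PAGE_SIZE, files.size());
            return files.subList(offset, end);
        }
    }

    // Basic stats in the spirit of a file count and total size.
    record BasicStats(long numFiles, long totalSize) {}

    // Walks every page of the listing and accumulates count and size.
    static BasicStats computeBasicStats(PagedLister lister) {
        long numFiles = 0;
        long totalSize = 0;
        int offset = 0;
        while (true) {
            List<FileInfo> page = lister.listPage(offset);
            for (FileInfo f : page) {
                numFiles++;
                totalSize += f.sizeBytes();
            }
            if (page.size() < PagedLister.PAGE_SIZE) {
                break; // a short page means there is nothing left to list
            }
            offset += page.size();
        }
        return new BasicStats(numFiles, totalSize);
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            files.add(new FileInfo("part-" + i, 128));
        }
        PagedLister lister = new PagedLister(files);
        BasicStats stats = computeBasicStats(lister);
        // prints "2500 files, 320000 bytes, 3 list requests"
        System.out.println(stats.numFiles() + " files, " + stats.totalSize()
                + " bytes, " + lister.requests + " list requests");
    }
}
```

With 2,500 files the sketch issues 3 list requests; a table with millions of small files would need thousands, each a network round trip on S3.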

      One workaround is to set hive.stats.autogather=false, but that disables automatic stats gathering for all tables rather than selectively.

      Here is the bug:
      My understanding is that the DO_NOT_UPDATE_STATS table property is intended to selectively prevent this stats collection. Unfortunately, the property is checked only after the expensive file listing has already run, so setting DO_NOT_UPDATE_STATS does not avoid the listing cost as intended. See:

      https://github.com/apache/hive/blob/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java#L633

      Relevant code snippet:

        public static boolean updateTableStatsFast(Database db, Table tbl, Warehouse wh,
                                                   boolean madeDir, boolean forceRecompute, EnvironmentContext environmentContext) throws MetaException {
          if (tbl.getPartitionKeysSize() == 0) {
            // Update stats only when unpartitioned
            FileStatus[] fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
            // DO_NOT_UPDATE_STATS is checked inside this overload, i.e. only after
            // wh.getFileStatusesForUnpartitionedTable() has already listed the files
            return updateTableStatsFast(tbl, fileStatuses, madeDir, forceRecompute, environmentContext);
          } else {
            return false;
          }
        }
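One way to fix the ordering, sketched here under stated assumptions, is to inspect the table parameters before any listing happens. This is not the committed patch; `TableInfo`, `FileLister`, and `updateStatsIfAllowed` are hypothetical stand-ins for the Metastore's `Table`, `Warehouse.getFileStatusesForUnpartitionedTable()`, and `updateTableStatsFast()`, and the sketch assumes the property value is a boolean string such as "true". The `numFiles` and `totalSize` keys it writes are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class StatsOrderingSketch {
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    // Hypothetical stand-in for a metastore table: partition key count plus parameters.
    record TableInfo(int partitionKeyCount, Map<String, String> params) {}

    // Hypothetical stand-in for the expensive directory listing; returns file sizes.
    interface FileLister {
        long[] listFileSizes(TableInfo table);
    }

    // Returns true if stats were updated. The opt-out property is checked
    // BEFORE the listing, so opted-out tables never pay for it.
    static boolean updateStatsIfAllowed(TableInfo table, FileLister lister) {
        if (table.partitionKeyCount() != 0) {
            return false; // only unpartitioned tables, as in the original snippet
        }
        // parseBoolean(null) is false, so a missing property means "update stats".
        if (Boolean.parseBoolean(table.params().get(DO_NOT_UPDATE_STATS))) {
            return false; // skip without listing any files
        }
        long[] sizes = lister.listFileSizes(table);
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        table.params().put("numFiles", Long.toString(sizes.length));
        table.params().put("totalSize", Long.toString(total));
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger listings = new AtomicInteger();
        FileLister lister = t -> {
            listings.incrementAndGet(); // count how often the "expensive" listing runs
            return new long[] {10, 20};
        };

        TableInfo optedOut = new TableInfo(0, new HashMap<>(Map.of(DO_NOT_UPDATE_STATS, "true")));
        TableInfo normal = new TableInfo(0, new HashMap<>());

        System.out.println(updateStatsIfAllowed(optedOut, lister)); // false
        System.out.println(updateStatsIfAllowed(normal, lister));   // true
        System.out.println(listings.get());                          // 1
    }
}
```

The listing counter ends at 1: only the table without DO_NOT_UPDATE_STATS triggers a listing, which is the behavior the property is meant to provide.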
      

        Attachments

        1. HIVE-18743.01-branch-2.patch (19 kB, Alexander Kolbasov)
        2. HIVE-18743.01.patch (19 kB, Alexander Kolbasov)


              People

              • Assignee: Alexander Kolbasov (akolb)
                Reporter: Alexander Behm (alex.behm)
              • Votes: 0
                Watchers: 11

                Dates

                • Created:
                  Updated:
                  Resolved: