Hive / HIVE-24367

Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Component/s: HiveServer2

    Description

      Writing many deltas to a non-partitioned table causes runtime issues once a large number of delta folders are present.

       

      The following code in HiveAlterHandler is invoked for every insert operation. It calls updateTableStatsSlow on every insert, causing runtime delays.

       

      if (MetaStoreUtils.requireCalStats(null, null, newt, environmentContext) &&
          !isPartitionedTable) {
        Database db = msdb.getDatabase(catName, newDbName);
        assert(isReplicated == HiveMetaStore.HMSHandler.isDbReplicationTarget(db));
        // Update table stats. For partitioned table, we update stats in alterPartition()
        MetaStoreUtils.updateTableStatsSlow(db, newt, wh, false, true, environmentContext);
      }
      

      It would be good to explore whether only the newly added delta can be listed when computing stats. This would avoid a huge listing call over the whole table directory during stats collection.
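
      A rough sketch of the idea, not Hive code: list only the new delta directory and add its file count and size to the totals already stored in the table parameters. The class name, the method name and the delta directory name are hypothetical. The keys "numFiles" and "totalSize" are assumed to match the stats keys Hive keeps in table parameters. The sketch ignores compaction, delete deltas and hidden files, all of which a real change would have to handle.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of Hive: incremental stats update from one new delta.
class IncrementalStatsSketch {
    // Assumed to match the stats keys stored in Hive table parameters.
    static final String NUM_FILES = "numFiles";
    static final String TOTAL_SIZE = "totalSize";

    /** Lists only deltaDir (not the whole table directory) and adds its files to the running totals. */
    static void addDeltaStats(Map<String, String> params, Path deltaDir) throws IOException {
        long files = 0;
        long bytes = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(deltaDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files++;
                    bytes += Files.size(p);
                }
            }
        }
        params.put(NUM_FILES, Long.toString(parse(params.get(NUM_FILES)) + files));
        params.put(TOTAL_SIZE, Long.toString(parse(params.get(TOTAL_SIZE)) + bytes));
    }

    // Treats a missing stat as zero.
    private static long parse(String value) {
        return value == null ? 0L : Long.parseLong(value);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a table directory that just received one new delta with a single 128-byte file.
        Path table = Files.createTempDirectory("test_table");
        Path delta = Files.createDirectory(table.resolve("delta_0000001_0000001_0000"));
        Files.write(delta.resolve("bucket_00000"), new byte[128]);

        // Stats as they stood before this insert (made-up values for the demo).
        Map<String, String> params = new HashMap<>();
        params.put(NUM_FILES, "3");
        params.put(TOTAL_SIZE, "1000");

        addDeltaStats(params, delta);
        System.out.println(params.get(NUM_FILES));  // prints 4
        System.out.println(params.get(TOTAL_SIZE)); // prints 1128
    }
}
```

      With this approach the cost of each insert depends on the size of the new delta rather than on the number of deltas already in the table. The trade-off is that the stored totals can drift from the real ones, so an occasional full listing (for example after compaction) would probably still be needed.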

      Example queries to reproduce:

      CREATE TABLE IF NOT EXISTS test (name String, value int);
      INSERT INTO test VALUES('K1',1);
      INSERT INTO test VALUES('K2',2);
      ..
      ..
      ..
      INSERT INTO test VALUES('K20000',2);
      
       

       

            People

              harshit.gupta Harshit Gupta
              rajesh.balamohan Rajesh Balamohan
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h