Hive / HIVE-16143

Improve msck repair batching


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0, 3.0.0
    • Component/s: None

    Description

      Currently, the msck repair table command batches the partitions it creates in the metastore according to the config HIVE_MSCK_REPAIR_BATCH_SIZE. The following snippet shows the batching logic. A couple of improvements can be made to it:

       
      try {
        int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
        if (batch_size > 0 && partsNotInMs.size() > batch_size) {
          int counter = 0;
          for (CheckResult.PartitionResult part : partsNotInMs) {
            counter++;
            apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
            repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
                + ':' + part.getPartitionName());
            // Flush a full batch, or the final partial batch.
            if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
              db.createPartitions(apd);
              apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
            }
          }
        } else {
          // Batching disabled or unnecessary: add everything in a single call.
          for (CheckResult.PartitionResult part : partsNotInMs) {
            apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
            repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
                + ':' + part.getPartitionName());
          }
          db.createPartitions(apd);
        }
      } catch (Exception e) {
        // A failure in any batch retries every partition one by one.
        LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
        repairOutput.clear();
        msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
      }
      

      1. If the batch size is too aggressive, the code falls back to adding partitions one by one, which is almost always very slow. Users can easily raise the batch size to make the command run faster, only to end up with worse performance because the code falls back to adding one by one. Users are then expected to find a tuned batch size that works well for their environment. I think the code could handle this situation better by exponentially decaying the batch size instead of falling back to one by one.
      2. The other issue with this implementation is that if, let's say, the first batch succeeds and the second one fails, the code tries to add all the partitions one by one, irrespective of whether some of them were already added successfully. If we need to fall back to one by one, we should at least remove the ones that we know for sure have already been added.
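
      The two proposals above can be sketched together as follows. This is only an illustration, not the code from the attached patches: DecayingBatcher, addInBatches and decayFactor are hypothetical names, the addBatch callback stands in for a call such as db.createPartitions(apd), and the sketch assumes that a failed batch adds none of its items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Hive): shrinks the batch size after a failure. */
final class DecayingBatcher {

  /**
   * Feeds items to addBatch in batches. After a failed batch the next attempt
   * uses the failed batch's size divided by decayFactor, and items from batches
   * that already succeeded are never submitted again.
   * Assumption: a failed batch adds none of its items.
   *
   * @return the number of calls made to addBatch, including failed ones
   */
  static <T> int addInBatches(List<T> items, int initialBatchSize, int decayFactor,
      Consumer<List<T>> addBatch) {
    if (initialBatchSize < 1 || decayFactor < 2) {
      throw new IllegalArgumentException("need initialBatchSize >= 1 and decayFactor >= 2");
    }
    int batchSize = initialBatchSize;
    int done = 0;   // items[0, done) are known to have been added
    int calls = 0;
    while (done < items.size()) {
      int end = Math.min(done + batchSize, items.size());
      List<T> batch = new ArrayList<>(items.subList(done, end));
      calls++;
      try {
        addBatch.accept(batch);
        done = end; // success: skip these items from now on
      } catch (RuntimeException e) {
        if (batch.size() == 1) {
          throw e;  // even a single item fails, so give up
        }
        // Decay relative to the size that actually failed.
        batchSize = Math.max(1, batch.size() / decayFactor);
      }
    }
    return calls;
  }
}
```

      With ten items, an initial batch size of 8, a decay factor of 2, and a callback that rejects batches larger than 3, the attempts of size 8 and 4 fail and the remaining work completes in five batches of 2. The single-item case is the only point where the sketch surfaces the error, which corresponds to the existing one-by-one fallback failing.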

      Attachments

        1. HIVE-16143.10-branch-2.patch
          56 kB
          Vihang Karajgaonkar
        2. HIVE-16143.10-branch-2.patch
          56 kB
          Vihang Karajgaonkar
        3. HIVE-16143.09.patch
          56 kB
          Vihang Karajgaonkar
        4. HIVE-16143.08.patch
          54 kB
          Vihang Karajgaonkar
        5. HIVE-16143.07.patch
          55 kB
          Vihang Karajgaonkar
        6. HIVE-16143.06.patch
          49 kB
          Vihang Karajgaonkar
        7. HIVE-16143.05.patch
          49 kB
          Vihang Karajgaonkar
        8. HIVE-16143.04.patch
          34 kB
          Vihang Karajgaonkar
        9. HIVE-16143.03.patch
          32 kB
          Vihang Karajgaonkar
        10. HIVE-16143.02.patch
          31 kB
          Vihang Karajgaonkar
        11. HIVE-16143.01.patch
          26 kB
          Vihang Karajgaonkar


          People

            Assignee: Vihang Karajgaonkar (vihangk1)
            Reporter: Vihang Karajgaonkar (vihangk1)
            Votes: 0
            Watchers: 7

