Hive / HIVE-24734

Sanity check in HiveSplitGenerator available slot calculation


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Tez
    • Labels: None

    Description

      HiveSplitGenerator calculates the number of available slots from available memory like this:

      if (getContext() != null) {
        totalResource = getContext().getTotalAvailableResource().getMemory();
        taskResource = getContext().getVertexTaskResource().getMemory();
        availableSlots = totalResource / taskResource;
      }
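      To make the arithmetic explicit, here is a minimal stand-alone sketch of the quoted calculation. The class and method names are hypothetical, and the 1024 MB task size is only an illustrative value; the real code reads both numbers from the Tez context. It assumes both values are ints, as Resource.getMemory() returns, so the division truncates.

```java
// Hypothetical stand-alone version of the slot calculation quoted above.
// The real code obtains both values from the Tez InputInitializerContext.
class SlotCalculation {

    // Plain integer division, as in the snippet: fractional slots are truncated,
    // and the sign of the result follows the signs of the operands.
    static int availableSlots(int totalResource, int taskResource) {
        return totalResource / taskResource;
    }

    public static void main(String[] args) {
        // Illustrative healthy case: 3641 MB available, 1024 MB per task.
        System.out.println(availableSlots(3641, 1024)); // 3
        // The reported case: a task memory of -1 flips the sign of the result.
        System.out.println(availableSlots(3641, -1));   // -3641
    }
}
```

      Note that with integer division a task memory of 0 would not produce a bad slot count at all; it would throw ArithmeticException.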
      

      I had a scenario where the total memory was calculated correctly, but the task memory was returned as -1. This led to errors like these:

      tez.HiveSplitGenerator: Number of input splits: 1. -3641 available slots, 1.7 waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat
      
      Estimated number of tasks: -6189 for bucket 1
      
      java.lang.IllegalArgumentException: Illegal Capacity: -6189
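      The logged numbers are consistent with each other: dividing by -1 means the total memory was 3641, and -3641 * 1.7 truncates to -6189. The sketch below reproduces that chain. It assumes the task estimate is simply availableSlots * waves cast to int (inferred from the log, not copied from the Hive source), and a bare ArrayList stands in for whatever collection is actually sized by the estimate.

```java
import java.util.ArrayList;
import java.util.List;

class NegativeSlotPropagation {

    // Assumption: estimate = availableSlots * waves, truncated toward zero.
    // This matches the logged values but is not taken from the Hive code.
    static int estimatedTasks(int availableSlots, float waves) {
        return (int) (availableSlots * waves);
    }

    public static void main(String[] args) {
        int availableSlots = 3641 / -1;  // total 3641 (inferred from the log), task memory -1
        float waves = 1.7f;              // wave count shown in the log line
        int tasks = estimatedTasks(availableSlots, waves);
        System.out.println(tasks);       // -6189

        try {
            // Hypothetical consumer that pre-sizes a list with the estimate.
            List<Object> groups = new ArrayList<>(tasks);
            System.out.println(groups.size());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Illegal Capacity: -6189
        }
    }
}
```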
      

      Admittedly, this happened during development and will hopefully not occur on a properly configured cluster. (I'm not sure what the issue was on my setup; possibly -Xmx was set higher than the physical memory.)

      In any case, an availableSlots value below 1 will never lead to the desired behavior, so in such cases we could log a warning and correct the value to 1.
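      A minimal sketch of the proposed sanity check, written as a hypothetical helper rather than a patch against HiveSplitGenerator. It uses java.util.logging only to stay self-contained (Hive itself logs through SLF4J). It also rejects a non-positive task memory before dividing, which goes slightly beyond the proposal but additionally avoids an ArithmeticException on 0.

```java
import java.util.logging.Logger;

class SlotSanityCheck {

    // java.util.logging keeps the sketch dependency-free; not what Hive uses.
    private static final Logger LOG = Logger.getLogger(SlotSanityCheck.class.getName());

    // Hypothetical helper: computes available slots and clamps nonsensical results to 1.
    static int computeAvailableSlots(int totalResource, int taskResource) {
        if (taskResource <= 0) {
            // Extra guard: a zero or negative task size cannot yield a meaningful slot count.
            LOG.warning("Invalid task resource " + taskResource + "; using 1 available slot");
            return 1;
        }
        int availableSlots = totalResource / taskResource;
        if (availableSlots < 1) {
            LOG.warning("Computed " + availableSlots + " available slots (total=" + totalResource
                + ", task=" + taskResource + "); correcting to 1");
            return 1;
        }
        return availableSlots;
    }

    public static void main(String[] args) {
        System.out.println(computeAvailableSlots(3641, 1024)); // 3
        System.out.println(computeAvailableSlots(3641, -1));   // 1, after a warning
        System.out.println(computeAvailableSlots(512, 1024));  // 1, after a warning
    }
}
```

      Clamping keeps split generation working with a small but valid slot count instead of failing later with an opaque IllegalArgumentException; the warning still surfaces the misconfiguration.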


          People

            Assignee: Unassigned
            Reporter: Zoltan Matyus (zmatyus)
