HBase / HBASE-14791

Batch Deletes in MapReduce jobs (0.98)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.16
    • Fix Version/s: 0.98.17
    • None
    • Hadoop Flags: Reviewed

    Description

      We found that some of our CopyTable jobs run for many hours, even when there isn't much data to copy.

      Vikas Vishwakarma did his magic and found that the issue is with copying delete markers (we use raw mode to also move deletes across).
      Looking at the code in 0.98, it's immediately obvious that deletes (unlike puts) are not batched and are hence sent to the other side one by one, costing a network round trip (RTT) for each delete marker.

      Looks like trunk is doing the right thing (using a BufferedMutator for all mutations in TableOutputFormat). So this is likely only a 0.98 (and 1.0, 1.1, 1.2?) issue.
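
      To make the cost difference concrete, here is a minimal, self-contained sketch of the idea. It is not the HBase API and not the patch for this issue: FakeRemoteTable, BufferingDeleteWriter, and the batch size are hypothetical stand-ins, and row keys are plain strings. It only counts simulated round trips for one-by-one deletes versus buffered deletes.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedDeleteSketch {

    /** Hypothetical stand-in for the remote table; each call counts as one round trip. */
    static class FakeRemoteTable {
        int roundTrips = 0;
        int deletesApplied = 0;

        // One delete per call: one simulated round trip per delete marker.
        void delete(String rowKey) {
            roundTrips++;
            deletesApplied++;
        }

        // A whole batch per call: one simulated round trip for the entire list.
        void delete(List<String> rowKeys) {
            if (rowKeys.isEmpty()) {
                return; // nothing to send, so no round trip
            }
            roundTrips++;
            deletesApplied += rowKeys.size();
        }
    }

    /** Hypothetical writer that buffers deletes and sends them in batches. */
    static class BufferingDeleteWriter {
        private final FakeRemoteTable table;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();

        BufferingDeleteWriter(FakeRemoteTable table, int batchSize) {
            this.table = table;
            this.batchSize = batchSize;
        }

        void write(String rowKey) {
            buffer.add(rowKey);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        void flush() {
            table.delete(new ArrayList<>(buffer));
            buffer.clear();
        }

        // Flush whatever is left so no buffered delete is lost.
        void close() {
            flush();
        }
    }

    /** Round trips when every delete is sent on its own. */
    static int roundTripsUnbatched(int deleteCount) {
        FakeRemoteTable table = new FakeRemoteTable();
        for (int i = 0; i < deleteCount; i++) {
            table.delete("row-" + i);
        }
        return table.roundTrips;
    }

    /** Round trips when deletes are buffered and sent batchSize at a time. */
    static int roundTripsBatched(int deleteCount, int batchSize) {
        FakeRemoteTable table = new FakeRemoteTable();
        BufferingDeleteWriter writer = new BufferingDeleteWriter(table, batchSize);
        for (int i = 0; i < deleteCount; i++) {
            writer.write("row-" + i);
        }
        writer.close();
        return table.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("unbatched round trips: " + roundTripsUnbatched(1000));
        System.out.println("batched round trips:   " + roundTripsBatched(1000, 100));
    }
}
```

      In this sketch the number of round trips drops from one per delete marker to roughly one per batch, which is why a job dominated by delete markers can be so much slower than its data volume suggests.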


          People

            Assignee: Alex Araujo (alexaraujo)
            Reporter: Lars Hofhansl (larsh)
            Votes: 0
            Watchers: 6
