HBASE-14791: Batch Deletes in MapReduce jobs (0.98)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.16
    • Fix Version/s: 0.98.17
    • Component/s: None
    • Labels:
    • Hadoop Flags: Reviewed

      Description

      We found that some of our CopyTable jobs run for many hours, even when there isn't much data to copy.

      Vikas Vishwakarma did his magic and found that the problem lies in copying delete markers (we use raw mode so that deletes are moved across as well).
      Looking at the code in 0.98, it's immediately obvious that deletes (unlike puts) are not batched and are therefore sent to the other side one at a time, costing a network round trip (RTT) for each delete marker.

      Trunk appears to do the right thing (it uses BufferedMutator for all mutations in TableOutputFormat), so this is likely an issue only in 0.98 (and possibly 1.0, 1.1, and 1.2).
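
      To illustrate why per-delete sends hurt, here is a minimal, self-contained sketch of the batching idea. It does not use the HBase API and is not the attached patch: CountingSink, BatchingWriter, roundTripsFor, and the batch size of 100 are hypothetical names and values chosen for illustration, and each call to send() stands in for one network round trip.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

    /** Hypothetical stand-in for the remote table; counts simulated round trips. */
    static class CountingSink {
        int roundTrips = 0;
        int mutationsReceived = 0;

        void send(List<String> batch) {
            roundTrips++;                       // one simulated network RTT per call
            mutationsReceived += batch.size();
        }
    }

    /** Hypothetical writer that buffers mutations and ships them in batches. */
    static class BatchingWriter {
        private final CountingSink sink;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();

        BatchingWriter(CountingSink sink, int batchSize) {
            this.sink = sink;
            this.batchSize = batchSize;
        }

        void write(String mutation) {
            buffer.add(mutation);
            if (buffer.size() >= batchSize) {
                flush();                        // buffer is full: send it in one call
            }
        }

        void flush() {
            if (buffer.isEmpty()) {
                return;                         // nothing pending, no round trip
            }
            sink.send(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Writes the given number of delete markers and returns the round trips used. */
    public static int roundTripsFor(int mutations, int batchSize) {
        CountingSink sink = new CountingSink();
        BatchingWriter writer = new BatchingWriter(sink, batchSize);
        for (int i = 0; i < mutations; i++) {
            writer.write("delete-" + i);
        }
        writer.flush();                         // ship any partial final batch
        return sink.roundTrips;
    }

    public static void main(String[] args) {
        // A batch size of 1 models the unbatched behavior described above.
        System.out.println("unbatched round trips: " + roundTripsFor(1000, 1));
        System.out.println("batched round trips:   " + roundTripsFor(1000, 100));
    }
}
```

      With these assumed numbers, 1000 delete markers cost 1000 simulated round trips when sent one at a time and 10 when buffered in batches of 100, which is the gap the report attributes to the missing batching.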

        Attachments

        1. HBASE-14791-0.98.patch (12 kB, Alex Araujo)
        2. HBASE-14791-0.98-v1.patch (23 kB, Alex Araujo)
        3. HBASE-14791-0.98-v2.patch (14 kB, Alex Araujo)

            People

            • Assignee: Alex Araujo (alexaraujo)
            • Reporter: Lars Hofhansl (lhofhansl)
            • Votes: 0
            • Watchers: 6
