[HBASE-14791] Batch Deletes in MapReduce jobs (0.98) - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.98.16
Fix Version/s: 0.98.17
Component/s: None
Labels:
- mapreduce

Hadoop Flags: Reviewed

Description

We found that some of our CopyTable jobs run for many hours, even when there isn't much data to copy.

vik.karma did his magic and found that the issue is with copying delete markers (we use raw mode to also move deletes across).
Looking at the code in 0.98, it's immediately obvious that deletes (unlike puts) are not batched and are therefore sent to the other side one by one, costing a full network round trip (RTT) per delete marker.
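To put the one-RTT-per-delete cost in perspective, here is a rough back-of-the-envelope estimate. The delete count, RTT, and batch size below are made-up inputs, not measurements from our jobs, and the class and method names are hypothetical.

```java
// Back-of-the-envelope model of the unbatched-delete cost described above.
// All inputs are hypothetical illustration values, not measurements.
public class DeleteRttEstimate {

    // Unbatched: every delete marker is its own RPC, so one round trip each.
    static long unbatchedRoundTrips(long deletes) {
        return deletes;
    }

    // Batched: one round trip per batch (ceiling division).
    static long batchedRoundTrips(long mutations, long batchSize) {
        return (mutations + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long deletes = 10_000_000L; // hypothetical number of delete markers to copy
        double rttMillis = 1.0;     // hypothetical cross-cluster round-trip time
        long batchSize = 1_000L;    // hypothetical mutations per batched RPC

        long unbatched = unbatchedRoundTrips(deletes);
        long batched = batchedRoundTrips(deletes, batchSize);

        System.out.printf("unbatched: %d RTTs, ~%.1f hours%n",
                unbatched, unbatched * rttMillis / 3_600_000.0);
        System.out.printf("batched:   %d RTTs, ~%.1f seconds%n",
                batched, batched * rttMillis / 1_000.0);
    }
}
```

With these inputs it prints about 2.8 hours for the unbatched case versus about 10 seconds batched. The absolute numbers are invented; the point is that the gap scales with the batch size.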

It looks like trunk does the right thing (using a BufferedMutator for all mutations in TableOutputFormat), so this is likely only a 0.98 (and 1.0, 1.1, 1.2?) issue.
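For illustration, here is a simplified, hypothetical model of what buffering all mutations uniformly buys: puts and deletes take the same path into one buffer and are shipped together when it fills, plus a final flush on close. This is not the HBase `BufferedMutator` API. The class, the `maxBuffered` threshold, and the recorded batches are stand-ins for the real client's write buffer and RPCs, and the threshold counts mutations rather than bytes for simplicity (the real client's write buffer is sized in bytes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of uniform mutation buffering; NOT the HBase client API.
public class UniformMutationBuffer implements AutoCloseable {

    enum Kind { PUT, DELETE }

    static final class Mutation {
        final Kind kind;
        final String row;

        Mutation(Kind kind, String row) {
            this.kind = kind;
            this.row = row;
        }
    }

    private final int maxBuffered; // hypothetical flush threshold, in mutations
    private final List<Mutation> buffer = new ArrayList<>();
    private final List<List<Mutation>> sentBatches = new ArrayList<>(); // each entry stands in for one RPC

    UniformMutationBuffer(int maxBuffered) {
        if (maxBuffered <= 0) throw new IllegalArgumentException("maxBuffered must be positive");
        this.maxBuffered = maxBuffered;
    }

    // Puts and deletes are treated identically: append, flush when full.
    void mutate(Mutation m) {
        buffer.add(m);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    // One simulated round trip carries everything currently buffered.
    void flush() {
        if (buffer.isEmpty()) return;
        sentBatches.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    int roundTrips() {
        return sentBatches.size();
    }

    @Override
    public void close() {
        flush(); // ship any partially filled buffer at the end of the task
    }

    public static void main(String[] args) {
        UniformMutationBuffer out = new UniformMutationBuffer(100);
        for (int i = 0; i < 250; i++) {
            // Interleave puts and delete markers, as a raw-mode copy would see them.
            Kind kind = (i % 2 == 0) ? Kind.PUT : Kind.DELETE;
            out.mutate(new Mutation(kind, "row-" + i));
        }
        out.close();
        System.out.println("round trips: " + out.roundTrips());
    }
}
```

Here 250 mixed mutations cost 3 simulated round trips (two full batches of 100 and a final batch of 50 on close) instead of one per delete.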

Attachments

- HBASE-14791-0.98.patch (Alex Araujo, 17/Nov/15 21:26, 12 kB)
- HBASE-14791-0.98-v2.patch (Alex Araujo, 17/Nov/15 00:30, 14 kB)
- HBASE-14791-0.98-v1.patch (Alex Araujo, 12/Nov/15 01:18, 23 kB)

People

Assignee: Alex Araujo

Reporter: Lars Hofhansl

Votes: 0

Watchers: 6

Dates

Created: 10/Nov/15 17:53

Updated: 19/Nov/15 19:06

Resolved: 17/Nov/15 23:34