[HBASE-7743] Replace *SortReducers with Hadoop Secondary Sort - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: None
Fix Version/s: None
Component/s: mapreduce, Performance
Labels: None

Description

The mapreduce package provides two Reducer implementations, KeyValueSortReducer and PutSortReducer, which are used by Import, ImportTsv, and WALPlayer in conjunction with HFileOutputFormat. Both implementations use a TreeSet to sort the values matching a key. Because the TreeSet is held in memory, these reducers can OOM when rows are large.

A better solution would be to implement a secondary sort of the values. That way Hadoop sorts the records during the shuffle, spilling to disk when necessary, and the reducer no longer has to buffer a whole row in memory.
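
To make the proposal concrete, here is a minimal sketch of the secondary sort idea in plain Java, with no Hadoop or HBase dependencies. The names CompositeKey, SORT, GROUP, and run are hypothetical and used only for illustration. In a real job the two comparators would be registered with Job.setSortComparatorClass and Job.setGroupingComparatorClass, together with a partitioner that partitions on the row only. The in-memory List.sort below merely stands in for the framework's sort, which is the part that can spill to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SecondarySortSketch {

    /** Hypothetical composite key: the natural key (row) plus the field the values are ordered by. */
    static final class CompositeKey {
        final String row;
        final String qualifier;

        CompositeKey(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /** Sort comparator: orders by row first, then by qualifier within a row. */
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.row)
                  .thenComparing((CompositeKey k) -> k.qualifier);

    /** Grouping comparator: looks at the row only, so one reduce call sees a whole row. */
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing((CompositeKey k) -> k.row);

    /**
     * Stand-in for shuffle plus reduce. The records arrive already sorted,
     * so the "reducer" streams them in order and never buffers a row.
     */
    static List<String> run(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT); // assumption: replaces the framework's sort in this sketch

        List<String> out = new ArrayList<>();
        CompositeKey prev = null;
        for (CompositeKey k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                out.add("row " + k.row); // a new reduce group begins here
            }
            out.add(k.row + "/" + k.qualifier); // emit immediately, already in order
            prev = k;
        }
        return out;
    }

    public static void main(String[] args) {
        List<CompositeKey> records = Arrays.asList(
            new CompositeKey("r2", "b"),
            new CompositeKey("r1", "c"),
            new CompositeKey("r1", "a"),
            new CompositeKey("r2", "a"));
        for (String line : run(records)) {
            System.out.println(line);
        }
    }
}
```

The design point is that ordering within a row is moved out of the reducer and into the key, so the reducer only needs to write each record as it arrives.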