HBase / HBASE-5140

TableInputFormat subclass to allow N number of splits per region during MR jobs

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 0.90.4
    • Fix Version/s: None
    • Component/s: mapreduce
    • Labels:
    • Release Note:
      Used the 0.90 branch for the patch but code looks compatible in trunk as well (with one deprecated method)
    • Tags:
      mapreduce splits tableinputformat

      Description

      In regard to HBASE-5138, I am working on a patch for the TableInputFormat class that overrides getSplits in order to generate N splits per region and/or N splits per job. The idea is to convert the startKey and endKey of each region from byte[] to BigDecimal, take the difference, divide by N, convert back to byte[], and generate splits at the resulting values. Assuming your keys are fully distributed, this should produce splits with nearly the same number of rows each. Any suggestions on this issue are welcome.
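      The arithmetic described above can be sketched as a standalone helper. The ticket proposes BigDecimal; for integral byte[] keys the same idea is expressible with BigInteger. The class and method names below are illustrative, not from the patch:

      ```java
      import java.math.BigInteger;
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      public class RegionRangeSplitter {

          // Split the key range [startKey, endKey) into n numerically equal
          // sub-ranges. Keys are treated as unsigned big-endian integers,
          // right-padded with zero bytes to a common width so the arithmetic
          // preserves lexicographic order.
          public static List<byte[]> splitRange(byte[] startKey, byte[] endKey, int n) {
              int width = Math.max(startKey.length, endKey.length);
              BigInteger start = new BigInteger(1, pad(startKey, width));
              BigInteger end = new BigInteger(1, pad(endKey, width));
              BigInteger step = end.subtract(start).divide(BigInteger.valueOf(n));

              List<byte[]> boundaries = new ArrayList<>();
              for (int i = 0; i <= n; i++) {
                  BigInteger b = (i == n) ? end
                          : start.add(step.multiply(BigInteger.valueOf(i)));
                  boundaries.add(toBytes(b, width));
              }
              return boundaries;
          }

          // Right-pad with zero bytes; Arrays.copyOf appends trailing zeros.
          private static byte[] pad(byte[] key, int width) {
              return Arrays.copyOf(key, width);
          }

          // Render a non-negative BigInteger as a fixed-width big-endian byte[].
          private static byte[] toBytes(BigInteger v, int width) {
              byte[] raw = v.toByteArray(); // may carry an extra sign byte
              byte[] out = new byte[width];
              int copy = Math.min(raw.length, width);
              System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
              return out;
          }
      }
      ```

      As the description notes, the equal-rows property only holds if keys are uniformly distributed across the key space; skewed keys yield skewed splits.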


          Activity

          Andrew Purtell added a comment -

          "Why is this deemed irrelevant?"

          No activity since March 2012.

          David Koch added a comment -

          "Stale issue. Reopen if still relevant."

          Why is this deemed irrelevant? Is there new functionality in recent HBase versions which supersedes this class? By the way, in method getMaxByteArrayValue the array value assignment should read:

          bytes[i] = (byte) 0xff;
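          David's fix can be seen in context in a minimal version of such a helper. The method name comes from the comment above; the surrounding class and body are an illustrative reconstruction, not the patch itself:

          ```java
          public class KeyBounds {
              // Illustrative reconstruction of the helper named in the comment
              // above: builds a byte[] of the given length with every byte set
              // to the maximum unsigned value, for use as an artificial upper
              // bound when a region's end key is empty.
              public static byte[] getMaxByteArrayValue(int length) {
                  byte[] bytes = new byte[length];
                  for (int i = 0; i < length; i++) {
                      bytes[i] = (byte) 0xff; // the corrected assignment
                  }
                  return bytes;
              }
          }
          ```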
          
          Andrew Purtell added a comment -

          Stale issue. Reopen if still relevant.

          Rajesh Balamohan added a comment -

          @Josh - Thanks for this patch.

          The for loop within getSplits() generates the splits with the help of generateRegionSplits(). However, the returned List<InputSplit> is never added back to "List<InputSplit> splits = new ArrayList<InputSplit>(keys.getFirst().length);".
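          The bug Rajesh describes would be fixed by accumulating each region's sub-splits into the outer list. A minimal stand-alone illustration (the method names mirror the patch, but the types here are simplified placeholders rather than Hadoop's InputSplit):

          ```java
          import java.util.ArrayList;
          import java.util.List;

          public class SplitAccumulation {
              // Stand-in for one region's generated sub-splits; in the patch
              // this would be generateRegionSplits(...) returning List<InputSplit>.
              static List<String> generateRegionSplits(String region, int n) {
                  List<String> subSplits = new ArrayList<>();
                  for (int i = 0; i < n; i++) {
                      subSplits.add(region + "-part" + i);
                  }
                  return subSplits;
              }

              static List<String> getSplits(List<String> regions, int splitsPerRegion) {
                  List<String> splits = new ArrayList<>(regions.size());
                  for (String region : regions) {
                      // The reported bug: calling generateRegionSplits(...) and
                      // discarding its result. The fix is to add it back:
                      splits.addAll(generateRegionSplits(region, splitsPerRegion));
                  }
                  return splits;
              }
          }
          ```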

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12510012/Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch.1
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/713//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/713//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/713//console

          This message is automatically generated.

          Josh Wymer added a comment -

          Added check for null result in scan for first row and closed the scanner in a finally.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12510006/Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/712//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/712//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/712//console

          This message is automatically generated.

          Josh Wymer added a comment -

          Another try at the patch with a unit test included and a method refactored. Used hbase trunk to build this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12509974/Added_functionality_to_split_n_times_per_region_on_mapreduce_jobs.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/711//console

          This message is automatically generated.

          Josh Wymer added a comment -

          This change introduces two new properties: hbase.mapreduce.splitsPerRegion and hbase.mapreduce.splitKeyBytePrecision.

          Setting hbase.mapreduce.splitsPerRegion to anything > 1 will result in each region being split into that number of splits. If nothing is passed or 1 is passed, the TableInputFormat will execute as usual (one split per region).

          The splitKeyBytePrecision determines the byte length (64 by default) to use when generating a max value byte array in the case that the region's end key is of zero length (e.g. the region that contains the last row). This is required to try and "guess" at split distributions for that region. If keys are fully distributed, this should result in fairly equal splits.

          The Bytes.split utility is used to split the range between the start and end keys n number of times.

          Jean-Daniel Cryans added a comment -

          The best practice is to presplit when creating the table.

          I think this jira is valid for cases where the regions are so big (GBs) that one would benefit from having multiple mappers per region.

          Ted Yu added a comment -

          MAPREDUCE-1220, referenced in HBASE-4063, has been resolved against hadoop 0.23.
          So we cannot use it at the moment.

          @Josh:
          I believe the single region scenario is the degenerate case.
          Using max value for long should be fine for that case.
          The best practice is to presplit when creating the table.

          Ming Ma added a comment -

          Is it the same as HBASE-4063?

          Josh Wymer added a comment -

          Correct, but for example on a table with one region, getStartEndKeys() returns two empty byte[]. The last region (or only region) of a table returns an empty byte[] as the end key, allowing the scan to run to the end of the table. Therefore, we don't know the upper-bound byte[] to use to determine the long (or int, etc.) value we want to use for split calculations. So we must either have an efficient way to get the last key in this case, or arbitrarily set the long to its max value (since nothing could be higher) and use that number to make the calculations. This obviously won't work for unbounded data types like BigDecimal and is a partial solution at best.

          Ted Yu added a comment - edited

          I suggest utilizing this method in HTable:

            public Pair<byte[][],byte[][]> getStartEndKeys() throws IOException {
          

          i.e. start and end keys returned by the above method are passed to the splitter interface.

          Josh Wymer added a comment -

          One glaring issue is the lack of start & end keys for one region tables. To get the start key we could do a quick scan of the first row and get the key. For the last region of a table, I'm not sure how we'll handle determining the end key other than setting it to the max size of whatever data type (e.g. long) we are using for the split calculations. Any suggestions other than this?

          Josh Wymer added a comment -

          We also talked about other methods such as using the first 8 bytes of the keys and converting to a long. This could indeed be solved by an interface.

          Ted Yu added a comment -

          We should consider the amount of computing involved in the map/reduce tasks.
          The assumption expressed in the description may not be satisfied in various scenarios.

          I think we can provide abstraction over key space partitioning by introducing an interface.
          The BigDecimal idea would be one implementation.
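          Ted's suggested abstraction might look like the following. The names are hypothetical (no such interface exists in the cited patch); the implementation uses the "first 8 bytes as a long" idea mentioned earlier in the thread, while the BigDecimal approach from the description would be another implementation of the same interface:

          ```java
          import java.util.ArrayList;
          import java.util.List;

          public class KeySpacePartitioning {

              // Hypothetical abstraction over key-space partitioning: given a
              // region's start/end keys, produce n split boundaries.
              interface KeySpaceSplitter {
                  List<byte[]> split(byte[] startKey, byte[] endKey, int n);
              }

              static class LongPrefixSplitter implements KeySpaceSplitter {
                  @Override
                  public List<byte[]> split(byte[] startKey, byte[] endKey, int n) {
                      long start = prefixAsLong(startKey);
                      long end = prefixAsLong(endKey);
                      long step = (end - start) / n;
                      List<byte[]> boundaries = new ArrayList<>();
                      for (int i = 0; i <= n; i++) {
                          boundaries.add(longToBytes(i == n ? end : start + step * i));
                      }
                      return boundaries;
                  }

                  // Interpret the first 8 bytes of a key as a big-endian long,
                  // zero-padding short keys on the right. (This sketch treats the
                  // long as signed; key prefixes >= 0x80 would need unsigned
                  // handling, e.g. Long.divideUnsigned.)
                  private static long prefixAsLong(byte[] key) {
                      long v = 0;
                      for (int i = 0; i < 8; i++) {
                          v = (v << 8) | (i < key.length ? key[i] & 0xffL : 0L);
                      }
                      return v;
                  }

                  private static byte[] longToBytes(long v) {
                      byte[] out = new byte[8];
                      for (int i = 7; i >= 0; i--) {
                          out[i] = (byte) (v & 0xff);
                          v >>>= 8;
                      }
                      return out;
                  }
              }
          }
          ```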


            People

            • Assignee: Unassigned
            • Reporter: Josh Wymer
            • Votes: 1
            • Watchers: 4


                Time Tracking

                • Original Estimate: 72h
                • Remaining Estimate: 72h
                • Time Spent: Not Specified
