Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Duplicate
Description
Currently, when TableInputFormat is used, the requested number of mappers is followed exactly if it is less than the total number of regions in the input table; if it is greater, the number of regions is used instead.
This issue modifies the splitting algorithm as follows:
- Specify 0 mappers to get one mapper per region (# mappers = # regions).
- If you specify fewer mappers than regions, exactly the number you specify will be used, as in the current algorithm.
- If you specify more mappers than regions, each region will be divided into sub-ranges by choosing split keys X, giving [start, X) and [X, end). The number of mappers will always be a multiple of the number of regions, so that no scanner spans multiple regions.
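The rules above can be sketched in plain Java. This is a hypothetical illustration, not the patch for this issue or the HBASE-1183 implementation: the class `SplitPlanner` and its methods `numSplits` and `splitPoints` are invented names. Two behaviors are assumptions the issue does not specify: when more mappers than regions are requested, the count is rounded down to the nearest multiple of the region count, and split keys inside a region are chosen by treating row keys as unsigned big-endian integers (right-padded with zero bytes) and interpolating linearly. Regions with an empty start or end row (the first and last regions of a table) are not handled.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the proposed split rules; not an HBase API. */
final class SplitPlanner {

  /** Number of splits (mappers) for a requested count and a table's region count. */
  static int numSplits(int requested, int numRegions) {
    if (numRegions <= 0) {
      throw new IllegalArgumentException("table has no regions");
    }
    if (requested <= 0) {
      return numRegions;                 // 0 means one mapper per region
    }
    if (requested <= numRegions) {
      return requested;                  // fewer than regions: honor the request
    }
    // More than regions: use a multiple of the region count so that no split
    // spans two regions. Rounding down is an assumption.
    return (requested / numRegions) * numRegions;
  }

  /**
   * Returns the n - 1 interior split keys X that divide [start, end) into n
   * contiguous sub-ranges. Assumes non-empty keys and that start is smaller
   * than end once both are right-padded with zero bytes.
   */
  static List<byte[]> splitPoints(byte[] start, byte[] end, int n) {
    // One extra byte of precision so that adjacent keys such as "a" and "b" can still be split.
    int width = Math.max(start.length, end.length) + 1;
    BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width)); // copyOf pads with zeros
    BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
    BigInteger span = hi.subtract(lo);
    List<byte[]> points = new ArrayList<>();
    for (int i = 1; i < n; i++) {
      BigInteger x = lo.add(span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
      points.add(toFixedWidth(x, width));
    }
    return points;
  }

  /** Converts a non-negative BigInteger to exactly {@code width} big-endian bytes. */
  private static byte[] toFixedWidth(BigInteger value, int width) {
    byte[] raw = value.toByteArray();    // may carry a leading 0x00 sign byte
    byte[] out = new byte[width];
    int copy = Math.min(raw.length, width);
    System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
    return out;
  }
}
```

For example, under these assumptions `numSplits(10, 4)` returns 8 (two splits per region), and `splitPoints` on a region ["a", "b") with n = 2 yields the single split key 0x61 0x80, giving the sub-ranges ["a", 0x61 0x80) and [0x61 0x80, "b"). Padding by one extra byte is a design choice that trades slightly longer split keys for the ability to divide regions whose boundary keys are very close together.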
A related problem is that JobConf sets the default number of map tasks to 1, so if a user does not explicitly set the number of map tasks, a single mapper is used. I'm going to deal with that in a separate JIRA: the problem already exists independently of this change, there are several ways to address it, and fixing it is not required to complete this issue.
Issue Links
- depends upon: HBASE-1183 New MR splitting algorithm and other new features need a way to split a key range in N chunks (Closed)
- is part of: HBASE-1385 Revamp TableInputFormat, needs updating to match hadoop 0.20.x AND remove bit where we can make < maps than regions (Closed)