[HBASE-1172] Modify TableInputFormat splitting algorithm to allow any number of mappers - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: 0.20.0
Component/s: None
Labels: None

Description

Currently, when using TableInputFormat, the requested number of mappers is honored exactly if it is less than the number of regions in the input table. If it is greater, the number of regions is used instead.
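
As a minimal sketch of the current rule, the effective number of mappers is the smaller of the requested count and the region count. This assumes a requested count of at least 1, since the description does not say what happens today when 0 is requested. `CurrentRule` and `effectiveMappers` are hypothetical names, not part of HBase:

```java
// Hypothetical sketch of the current mapper-count rule (not HBase code).
final class CurrentRule {
    // Assumes requested >= 1; today's behavior for 0 is not described.
    static int effectiveMappers(int requested, int regions) {
        // Honored as-is when below the region count, otherwise capped at it.
        return Math.min(requested, regions);
    }

    public static void main(String[] args) {
        System.out.println(effectiveMappers(4, 10));  // 4
        System.out.println(effectiveMappers(25, 10)); // 10
    }
}
```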

This issue modifies the splitting algorithm as follows:

Specify 0 mappers when you want one mapper per region (# mappers = # regions).
If you specify fewer mappers than regions, exactly the number you specify will be used, based on the current algorithm.
If you specify more mappers than regions, each region will be divided into sub-ranges by choosing split keys within it, e.g. [start,X) and [X,end). The number of mappers will always be a multiple of the number of regions, so that no scanner spans multiple regions.
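
The three cases above can be sketched in Java as follows. This is a hypothetical illustration, not the HBase TableInputFormat code, and it rests on several assumptions: `SplitSketch`, `Range` and `computeSplits` are invented names; row keys are modeled as `long` values rather than `byte[]`; the "current algorithm" for fewer mappers is assumed to group consecutive regions into contiguous key ranges; and a requested count that is not a multiple of the region count is assumed to be rounded down. It uses a `record`, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed splitting rules (not the HBase API).
final class SplitSketch {
    // Half-open key range [start, end). Keys are longs for illustration only;
    // real HBase row keys are byte[].
    record Range(long start, long end) {}

    static List<Range> computeSplits(List<Range> regions, int numMappers) {
        if (numMappers < 0) {
            throw new IllegalArgumentException("numMappers must be >= 0");
        }
        int r = regions.size();
        List<Range> splits = new ArrayList<>();
        if (r == 0) {
            return splits; // nothing to split
        }
        if (numMappers == 0) {
            // 0 means "one mapper per region".
            splits.addAll(regions);
        } else if (numMappers <= r) {
            // Fewer mappers than regions (assumed current behavior): group
            // consecutive regions into numMappers contiguous ranges. The first
            // (r % numMappers) groups take one extra region.
            int base = r / numMappers;
            int extra = r % numMappers;
            int pos = 0;
            for (int i = 0; i < numMappers; i++) {
                int count = base + (i < extra ? 1 : 0);
                long start = regions.get(pos).start();
                long end = regions.get(pos + count - 1).end();
                splits.add(new Range(start, end));
                pos += count;
            }
        } else {
            // More mappers than regions: cut every region into the same number
            // of sub-ranges so no split crosses a region boundary. Integer
            // division rounds down, so the total is a multiple of r.
            int perRegion = numMappers / r;
            for (Range region : regions) {
                long width = region.end() - region.start();
                for (int j = 0; j < perRegion; j++) {
                    // Assumes width * perRegion fits in a long (true for small demo keys).
                    long s = region.start() + width * j / perRegion;
                    long e = region.start() + width * (j + 1) / perRegion;
                    splits.add(new Range(s, e));
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Range> regions = List.of(
                new Range(0, 100), new Range(100, 200), new Range(200, 300));
        System.out.println(computeSplits(regions, 0).size()); // 3
        System.out.println(computeSplits(regions, 2).size()); // 2
        System.out.println(computeSplits(regions, 7).size()); // 6
    }
}
```

With three regions, requesting 7 mappers yields 6 splits (two per region, each inside a single region), while requesting 2 merges the first two regions into one split, [0,200), and leaves [200,300) as the second.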

There is an additional issue: the default number of map tasks in JobConf is 1, so if a user does not explicitly set the number of map tasks, a single mapper will be used. I'm going to deal with that in a separate JIRA, because the problem already exists independently of this change, there are a number of ways to implement a fix, and it is not required to complete this issue.

Issue Links

depends upon

HBASE-1183 New MR splitting algorithm and other new features need a way to split a key range in N chunks

Closed

is part of

HBASE-1385 Revamp TableInputFormat, needs updating to match hadoop 0.20.x AND remove bit where we can make < maps than regions

Closed

People

Assignee: Michael Stack

Reporter: Jonathan Gray

Votes: 0

Watchers: 2

Dates

Created: 31/Jan/09 20:48

Updated: 02/May/13 02:29

Resolved: 26/May/09 16:57