  1. HBase
  2. HBASE-1172

Modify TableInputFormat splitting algorithm to allow any number of mappers


Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Duplicate
    • None
    • 0.20.0
    • None
    • None

    Description

      Currently, the number of mappers specified when using TableInputFormat is strictly followed if it is less than the total number of regions in the input table. If it is greater, the number of regions is used instead.

      This will modify the splitting algorithm to do the following:

      • Specify 0 mappers when you want one mapper per region (# mappers = # regions).
      • If you specify fewer mappers than regions, exactly the number you specify will be used, based on the current algorithm.
      • If you specify more mappers than regions, each region will be divided into sub-ranges by determining split points, e.g. [start,X) and [X,end). The number of mappers will always be a multiple of the number of regions, so that no scanner spans multiple regions.
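
      The three rules above can be sketched roughly as follows. This is only an illustration, not the actual TableInputFormat code: the class SplitPlanner and both method names are hypothetical, rounding the requested count down to the nearest multiple of the region count is an assumption (the description only says the result is a multiple), and plain long keys stand in for HBase's byte[] row keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
final class SplitPlanner {

    /** Number of map tasks to run for a requested count and a region count. */
    static int mapperCount(int requestedMappers, int numRegions) {
        if (requestedMappers < 0 || numRegions <= 0) {
            throw new IllegalArgumentException("need requestedMappers >= 0 and numRegions > 0");
        }
        if (requestedMappers == 0) {
            return numRegions;           // 0 means one mapper per region
        }
        if (requestedMappers <= numRegions) {
            return requestedMappers;     // honored exactly; regions are grouped
        }
        // Assumption: round down so the total is a multiple of the region count.
        int splitsPerRegion = requestedMappers / numRegions;
        return splitsPerRegion * numRegions;
    }

    /** Splits one region's range [start, end) into contiguous half-open pieces. */
    static List<long[]> splitRange(long start, long end, int pieces) {
        if (pieces <= 0 || end - start < pieces) {
            throw new IllegalArgumentException("range too small for requested pieces");
        }
        List<long[]> out = new ArrayList<>();
        long width = end - start;
        long lo = start;
        for (int i = 1; i <= pieces; i++) {
            // Last piece ends exactly at 'end', so no key is lost or duplicated.
            long hi = start + width * i / pieces;
            out.add(new long[] {lo, hi});
            lo = hi;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mapperCount(0, 10));
        System.out.println(mapperCount(4, 10));
        System.out.println(mapperCount(25, 10));
        for (long[] r : splitRange(0, 100, 2)) {
            System.out.println("[" + r[0] + "," + r[1] + ")");
        }
    }
}
```

      Because every piece stays inside a single region's [start, end) range, a scanner opened for one split never crosses a region boundary, which is the property the last bullet asks for.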

      There is an additional issue: the default number of mappers in JobConf is 1. That means that if a user does not explicitly set the number of map tasks, a single mapper will be used. I'm going to deal with that in a separate JIRA, because the problem already exists today, there are a number of ways to address it, and fixing it is not required to complete this issue.


            People

              stack Michael Stack
              streamy Jonathan Gray
