  1. HBase
  2. HBASE-5140

TableInputFormat subclass to allow N number of splits per region during MR jobs


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 0.90.4
    • Fix Version/s: None
    • Component/s: mapreduce
    • Used the 0.90 branch for the patch but code looks compatible in trunk as well (with one deprecated method)
    • mapreduce splits tableinputformat

    Description

      With regard to HBASE-5138, I am working on a patch for the TableInputFormat class that overrides getSplits to generate N splits per region and/or N splits per job. The idea is to convert each region's startKey and endKey from byte[] to BigDecimal, take the difference, divide it by N, convert the results back to byte[], and generate splits at the resulting values. Assuming your keys are uniformly distributed, this should produce splits with nearly the same number of rows each. Any suggestions on this issue are welcome.
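
      A rough sketch of the key arithmetic described above, not the actual patch. It uses BigInteger rather than BigDecimal, since the split points have to be whole numbers anyway. The class and method names (KeyRangeSplitter, splitRange, pad, toKey) are made up for illustration, it has no HBase dependency, and it assumes both keys are non-empty, so the empty start/end keys of the first and last regions would need separate handling.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a [startKey, endKey) range into n pieces.
final class KeyRangeSplitter {

    /** Right-pads a key with zero bytes so both keys are compared at the same width. */
    static byte[] pad(byte[] key, int width) {
        return Arrays.copyOf(key, width);
    }

    /** Converts a non-negative BigInteger back to a fixed-width, big-endian key. */
    static byte[] toKey(BigInteger value, int width) {
        // toByteArray() may carry a leading sign byte or be shorter than width.
        byte[] raw = value.toByteArray();
        byte[] key = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, key, width - copy, copy);
        return key;
    }

    /** Returns n + 1 boundaries: startKey, n - 1 interior split points, endKey. */
    static List<byte[]> splitRange(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int width = Math.max(startKey.length, endKey.length);
        // Signum 1 makes BigInteger treat the bytes as an unsigned magnitude.
        BigInteger start = new BigInteger(1, pad(startKey, width));
        BigInteger end = new BigInteger(1, pad(endKey, width));
        if (start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger range = end.subtract(start);
        BigInteger parts = BigInteger.valueOf(n);
        List<byte[]> boundaries = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            // start + range * i / n; multiplying first limits rounding drift.
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(parts);
            boundaries.add(toKey(start.add(offset), width));
        }
        return boundaries;
    }
}
```

      Each adjacent pair of boundaries would become the start and stop row of one split. If the numeric range is smaller than N, some boundaries come out identical, so a real getSplits override would have to drop the duplicates or widen the keys before dividing.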

            People

              Assignee: Unassigned
              Reporter: Josh Wymer (jbwyme)
              Votes: 1
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Original Estimate: 72h
                  Remaining Estimate: 72h
                  Time Spent: Not Specified