Details
Description
With regard to HBASE-5138, I am working on a patch for the TableInputFormat class that overrides getSplits to generate N splits per region and/or N splits per job. The idea is to convert the startKey and endKey of each region from byte[] to BigDecimal, take the difference, divide it by N, convert the results back to byte[], and generate splits at the resulting values. Assuming your keys are evenly distributed across the key space, this should yield nearly the same number of rows per split. Any suggestions on this issue are welcome.
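The key arithmetic described above can be sketched roughly as follows. This is only an illustration, not the patch itself: the class name `KeyRangeSplitter` and its methods are hypothetical, it uses `BigInteger` rather than `BigDecimal` because the boundaries end up as whole numbers anyway, and it assumes both keys are non-empty and that the shorter key can be right-padded with zero bytes. The empty start and end keys of the first and last regions are not handled here.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: splits [startKey, endKey) into n ranges.
final class KeyRangeSplitter {

    // Read a key as an unsigned big-endian number, right-padded with 0x00 to `width` bytes.
    static BigInteger toNumber(byte[] key, int width) {
        byte[] padded = Arrays.copyOf(key, width);
        return new BigInteger(1, padded); // signum 1: treat the bytes as unsigned
    }

    // Convert a number back into exactly `width` bytes (big-endian).
    static byte[] toKey(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter than width
        byte[] key = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, key, width - copy, copy);
        return key;
    }

    // Returns n + 1 boundaries; split i covers [boundaries[i], boundaries[i + 1]).
    static byte[][] split(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int width = Math.max(startKey.length, endKey.length);
        BigInteger start = toNumber(startKey, width);
        BigInteger end = toNumber(endKey, width);
        if (start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger range = end.subtract(start);
        byte[][] boundaries = new byte[n + 1][];
        boundaries[0] = startKey; // keep the region's own keys at both ends
        boundaries[n] = endKey;
        for (int i = 1; i < n; i++) {
            // start + range * i / n, multiplying first to limit rounding drift
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
            boundaries[i] = toKey(start.add(offset), width);
        }
        return boundaries;
    }
}
```

For example, `KeyRangeSplitter.split(new byte[] {0x00}, new byte[] {0x10}, 4)` yields the boundaries 0x00, 0x04, 0x08, 0x0C and 0x10. The ranges are equal in key space only; how many rows land in each split still depends on how evenly the keys are distributed.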
Attachments
Issue Links
- relates to: HBASE-4063 Improve TableInputFormat to allow application to configure the number of mappers (Closed)