The revised patch allows the subarray of the key's byte array that is used for partitioning to be defined by means of Python-style offsets:
- mapred.binary.partitioner.left.offset: left Python-style offset in array
- mapred.binary.partitioner.right.offset: right Python-style offset in array
The easiest way to remember how these offsets work is to think of them as pointing between the array elements, with the left edge of the first element numbered 0. For an array of 5 bytes, for instance:
 | B | B | B | B | B |
 0   1   2   3   4   5
-5  -4  -3  -2  -1
The first row of numbers gives the positions of the offsets 0...5 in the array; the second row gives the corresponding negative offsets. When i and j are specified as the left and right offsets, respectively, all bytes between the edges labeled i and j are used for the partitioning.
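As a rough illustration of these edge semantics, the following self-contained Java sketch resolves a pair of Python-style offsets against a key's bytes and hashes only the selected range. It is not the patch's code: the method names (`resolve`, `hashRange`, `partition`), the clamping of out-of-range offsets (as Python slicing does), and the hash recurrence (`31 * h + b` seeded with 1, the same one `java.util.Arrays.hashCode(byte[])` uses) are all assumptions made for the example.

```java
/** Hypothetical sketch: Python-style edge offsets applied to a key's bytes before hashing. */
public class OffsetPartitionSketch {

  /** Maps a Python-style edge offset to an edge index in [0, length] (clamped, as in Python slicing). */
  static int resolve(int offset, int length) {
    int edge = offset < 0 ? length + offset : offset;
    return Math.max(0, Math.min(edge, length));
  }

  /** Assumed hash: 31 * h + b over bytes [from, to), seeded with 1. */
  static int hashRange(byte[] bytes, int from, int to) {
    int hash = 1;
    for (int i = from; i < to; i++) {
      hash = 31 * hash + bytes[i];
    }
    return hash;
  }

  /** Hashes only the bytes between the left and right edges and maps the hash to a partition. */
  static int partition(byte[] key, int leftOffset, int rightOffset, int numPartitions) {
    int from = resolve(leftOffset, key.length);
    int to = Math.max(from, resolve(rightOffset, key.length)); // crossed edges select nothing
    int hash = hashRange(key, from, to);
    return (hash & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    byte[] key = {10, 20, 30, 40, 50};
    // Edges 1 and -1 select the three middle bytes {20, 30, 40}.
    System.out.println(partition(key, 1, -1, 7)); // prints 1
  }
}
```

Because the edges 1 and -1 of a 5-byte key enclose bytes 1 through 3, the key `{10, 20, 30, 40, 50}` lands in the same partition as the standalone array `{20, 30, 40}` hashed in full.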
More generally, the indexing logic can now be customized by specifying the BinaryPartitioner.Indexer classes to be used via the following properties:
By default, FirstIndexer and LastIndexer are used (i.e., the whole byte array is taken into account for the hashing); setting the offset properties causes PosOffsetIndexer and/or NegOffsetIndexer to be used instead, which implement the indexing by means of Python-style offsets.
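The pluggable indexing can be pictured with the sketch below. Only the class names `FirstIndexer`, `LastIndexer`, `PosOffsetIndexer` and `NegOffsetIndexer` come from the description above; the `Indexer` interface and its `index(byte[])` method, the `choose` and `select` helpers, and the rule that a negative offset picks `NegOffsetIndexer` are hypothetical stand-ins, not the signatures in the patch. Each indexer maps a byte array to an edge position, and the bytes between the left and right edges form the subarray to be hashed.

```java
import java.util.Arrays;

/** Hypothetical sketch of pluggable indexers; only the class names come from the patch description. */
public class IndexerSketch {

  /** Assumed shape: an indexer maps a byte array to an edge position in [0, length]. */
  interface Indexer {
    int index(byte[] bytes);
  }

  /** Edge before the first byte (default left indexer). */
  static class FirstIndexer implements Indexer {
    public int index(byte[] bytes) { return 0; }
  }

  /** Edge after the last byte (default right indexer). */
  static class LastIndexer implements Indexer {
    public int index(byte[] bytes) { return bytes.length; }
  }

  /** Non-negative Python-style offset, counted from the left edge. */
  static class PosOffsetIndexer implements Indexer {
    private final int offset;
    PosOffsetIndexer(int offset) { this.offset = offset; }
    public int index(byte[] bytes) { return Math.min(offset, bytes.length); }
  }

  /** Negative Python-style offset, counted back from the right edge. */
  static class NegOffsetIndexer implements Indexer {
    private final int offset; // expected to be < 0
    NegOffsetIndexer(int offset) { this.offset = offset; }
    public int index(byte[] bytes) { return Math.max(0, bytes.length + offset); }
  }

  /** Hypothetical selection rule: use an offset indexer if an offset is set, else the default. */
  static Indexer choose(Integer offset, Indexer fallback) {
    if (offset == null) return fallback;
    return offset < 0 ? new NegOffsetIndexer(offset) : new PosOffsetIndexer(offset);
  }

  /** Bytes between the left and right edges; empty if the edges cross. */
  static byte[] select(byte[] bytes, Indexer left, Indexer right) {
    int from = left.index(bytes);
    int to = Math.max(from, right.index(bytes));
    return Arrays.copyOfRange(bytes, from, to);
  }

  public static void main(String[] args) {
    byte[] key = {1, 2, 3, 4, 5};
    Indexer left = choose(null, new FirstIndexer()); // no left offset: default edge 0
    Indexer right = choose(-2, new LastIndexer());   // right offset -2: edge 3
    System.out.println(Arrays.toString(select(key, left, right))); // prints [1, 2, 3]
  }
}
```

With no offsets configured, `FirstIndexer` and `LastIndexer` select the whole array, matching the default described above.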