Can we find a better name for computeN?
The meaning of n is actually a bit complicated. For every number of bits per value, there is a minimum number of blocks (b) / values (v) you need to write in order to reach the next block boundary:
- 16 bits per value -> b=1, v=4
- 24 bits per value -> b=3, v=8
- 50 bits per value -> b=25, v=32
- 63 bits per value -> b=63, v=64
A bulk read consists of copying the n*v values contained in n*b blocks into a long[] (higher values of n are likely to yield better throughput). This requires n * (b + v) longs in memory, which is why I compute n as ramBudget / (8 * (b + v)) (a long is 8 bytes). I called it n in the method name because I have no idea how to name it... "iterations", maybe?
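To make the arithmetic above concrete, here is a minimal sketch of how b, v and n could be derived, assuming 64-bit blocks. The class and method names (BulkSizing, blocks, values, computeIterations) are hypothetical and not taken from the patch, and clamping n to at least 1 is my own assumption about what should happen when the budget is too small for a single iteration.

```java
// Hypothetical helper for illustration only; not the code from the patch.
final class BulkSizing {
    // Assumption: one block is one 64-bit long.
    static final int BLOCK_BITS = 64;

    private BulkSizing() {}

    private static void checkBitsPerValue(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > BLOCK_BITS) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 64], got " + bitsPerValue);
        }
    }

    private static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Minimum number of blocks (b) to write before the next value starts on a block boundary. */
    static int blocks(int bitsPerValue) {
        checkBitsPerValue(bitsPerValue);
        return bitsPerValue / gcd(bitsPerValue, BLOCK_BITS);
    }

    /** Number of values (v) stored in those b blocks. */
    static int values(int bitsPerValue) {
        checkBitsPerValue(bitsPerValue);
        return BLOCK_BITS / gcd(bitsPerValue, BLOCK_BITS);
    }

    /**
     * n = ramBudget / (8 * (b + v)): each iteration needs b longs of blocks
     * plus v longs of decoded values, and a long is 8 bytes.
     * Assumption: at least one iteration is always performed.
     */
    static int computeIterations(int bitsPerValue, int ramBudgetBytes) {
        int b = blocks(bitsPerValue);
        int v = values(bitsPerValue);
        return Math.max(1, ramBudgetBytes / (8 * (b + v)));
    }
}
```

For example, with 24 bits per value and a 1024-byte budget, b + v = 11, so one iteration costs 88 bytes and n comes out as 11.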
I suspect that, to use these for codecs, we will want versions that work on int values instead (everything we encode is an int: docIDs/deltas, term freqs, offsets, positions).
I hesitated to do this since it would involve some code duplication, but I guess it can't be avoided if we want this API to actually be used... What additional methods do you think we need?
- PackedReaderIterator.nextInts(int count)
[static computeN], [code style]
You are right, I will fix it!
Does this change the on-disk format?
No, it doesn't. I will add unit tests for that...