This looks great Uwe!
I'm a little worried about the tiny file case; you're checking for
SEGMENTS_* now, but many other files can be much smaller than 1/64th
of the estimated segment size.
I wonder if we should "improve" IOContext to hold the [rough]
estimated file size (not just overall segment size)... the thing is
that's sort of a hassle on codec impls.
Or: maybe, on closing the ROS/RAMFile, we can downsize the final
buffer (yes, this means copying the bytes, but that cost is vanishingly
small as the RAMDir grows). Then tiny files stay tiny, though they
are still [relatively] costly to create...
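
To make the downsize-on-close idea concrete, here is a rough sketch. It is not the real RAMOutputStream/RAMFile code: TrimmingBufferOutput, BUFFER_SIZE and ramBytesUsed are hypothetical names, and the 1024 byte buffer size is just an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a RAMOutputStream-like writer; not the Lucene class.
class TrimmingBufferOutput {
    static final int BUFFER_SIZE = 1024;          // assumed fixed buffer size
    private final List<byte[]> buffers = new ArrayList<>();
    private int posInLast = BUFFER_SIZE;          // forces an allocation on the first write
    private boolean closed;

    void writeByte(byte b) {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        if (posInLast == BUFFER_SIZE) {
            buffers.add(new byte[BUFFER_SIZE]);
            posInLast = 0;
        }
        buffers.get(buffers.size() - 1)[posInLast++] = b;
    }

    // On close, copy the used part of the final buffer into a right-sized array.
    void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (!buffers.isEmpty() && posInLast < BUFFER_SIZE) {
            int last = buffers.size() - 1;
            buffers.set(last, Arrays.copyOf(buffers.get(last), posInLast));
        }
    }

    // Sum of the allocated buffer lengths (ignores object headers).
    long ramBytesUsed() {
        long total = 0;
        for (byte[] buf : buffers) {
            total += buf.length;
        }
        return total;
    }
}
```

The copy touches at most one buffer per file, so the cost is bounded by BUFFER_SIZE no matter how big the file is.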
I don't think RAMDir.createOutput should publish the RAMFile until the
ROS is closed? Ie, you are not allowed to openInput on something
still opened with createOutput in any Lucene Dir impl..? This would
allow us to make RAMFile frozen (eg if ROS holds its own buffers and
then creates RAMFile on close), which would then need no sync when
reading?
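
Roughly what I have in mind for publish-on-close, as a simplified model. FrozenFile and PublishOnCloseDir are made-up names, the ByteArrayOutputStream just stands in for the ROS's own buffers, and none of this matches the actual RAMDirectory API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable file: the bytes are never modified after construction,
// so readers need no synchronization once they hold a reference.
final class FrozenFile {
    private final byte[] bytes;

    FrozenFile(byte[] bytes) {
        this.bytes = bytes;
    }

    int length() {
        return bytes.length;
    }

    byte byteAt(int pos) {
        return bytes[pos];
    }
}

// Hypothetical directory: a file only becomes visible when its output is closed.
class PublishOnCloseDir {
    private final Map<String, FrozenFile> files = new ConcurrentHashMap<>();

    final class Output {
        private final String name;
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
        private boolean closed;

        Output(String name) {
            this.name = name;
        }

        void writeByte(byte b) {
            buf.write(b);
        }

        // Freeze the private buffer and publish it under the file name.
        void close() {
            if (closed) {
                return;
            }
            closed = true;
            files.put(name, new FrozenFile(buf.toByteArray()));
        }
    }

    Output createOutput(String name) {
        return new Output(name);
    }

    FrozenFile openInput(String name) {
        FrozenFile file = files.get(name);
        if (file == null) {
            throw new IllegalStateException("no such file (or still being written): " + name);
        }
        return file;
    }
}
```

The only shared mutable state left is the name-to-file map; the file contents themselves are write-once.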
I also don't think RAMFile should be public, ie, the only way to make
changes to a file stored in a RAMDir is via RAMOutputStream. We can
do this separately...
Maybe we should pursue a growing buffer size...? Ie, where each newly
added buffer is bigger than the one before (like ArrayUtil.oversize's
growth function)... I realize that adds complexity
(RAMInputStream.seek is more fun), but this would let tiny files use
tiny RAM and huge files use few buffers. Ie, RAMDir would scale up
and scale down well.
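
A sketch of the seek math with growing buffers. To keep it simple this uses plain doubling rather than ArrayUtil.oversize's actual growth function, and GrowingBuffers, MIN_SIZE and the method names are all hypothetical. A real impl would also have to cap the buffer size, since Java arrays are int-indexed.

```java
// Hypothetical helper: buffer i holds (MIN_SIZE << i) bytes, so buffer i starts
// at file position MIN_SIZE * (2^i - 1).
final class GrowingBuffers {
    static final int MIN_SIZE = 16; // assumed size of the first buffer

    private GrowingBuffers() {
    }

    static long bufferSize(int index) {
        return (long) MIN_SIZE << index;
    }

    static long bufferStart(int index) {
        return (long) MIN_SIZE * ((1L << index) - 1);
    }

    // Which buffer holds file position pos: floor(log2(pos / MIN_SIZE + 1)).
    static int bufferIndex(long pos) {
        long q = pos / MIN_SIZE + 1;
        return 63 - Long.numberOfLeadingZeros(q);
    }

    // Offset of pos within its buffer; this is what seek would need.
    static long bufferOffset(long pos) {
        return pos - bufferStart(bufferIndex(pos));
    }

    // Number of buffers a file of the given length occupies.
    static int buffersNeeded(long length) {
        return length == 0 ? 0 : bufferIndex(length - 1) + 1;
    }
}
```

With these assumptions a 10 byte file needs one 16 byte buffer and a 1 GB file needs 27 buffers, so seek stays a constant-time computation instead of a simple divide by the buffer size.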
Separately: I noticed we still have IndexOutput.setLength, but I think
nobody calls it anymore? (In 3.x we call this when creating a CFS).
Maybe we should remove it...