In a handful of cases, SegmentBufferWriter#flush has been observed to fail with an IllegalStateException (ISE) without any other part of the system behaving incorrectly or suspiciously. This problem was reported before (OAK-6452), but we were unable to reproduce it.
I investigated further and was able to reproduce the problem reliably in a test case. The trick is to craft a record of a very specific length that fools the SegmentBufferWriter into believing that it has enough space to store the record, even though the record consumes all the available space in the buffer, including the space reserved for the segment header. This is why no call to SegmentBufferWriter#prepare fails, but calls to SegmentBufferWriter#flush do.
The attached test case executes the following steps.
- The test creates a new SegmentBufferWriter. The buffer is fresh, but it is immediately populated with the segment info, which is 38 bytes long. The size of the segment info is aligned to 40 bytes, so at the end of the initialization the segment contains 40 bytes of record data. Please note that at this point in time the buffer is not marked as dirty.
- The test prepares space for a block record of 262101 bytes, which is aligned to 262104 bytes. The code in prepare() determines that adding this record to the current segment would exceed the maximum size of 262144 bytes, so it flushes the current segment.
- The flush operation is skipped because the buffer is not dirty yet. The segment is not flushed.
- The code in prepare() resumes execution. Trusting that it's working on a fresh segment, big enough to contain the new record, prepare() updates the state of the SegmentBufferWriter. In particular, length is set to exactly 262144 (the 40 bytes of segment info plus the 262104 bytes of the new record) and position to 0. Even though the record consumes all the available space in the buffer, prepare() is happy to go on because position remains nonnegative.
- The test flushes the SegmentBufferWriter. When the size of the resulting segment is computed, the size of the segment header is added to the size of the records. The size of the records is 262144 bytes, which is exactly the segment maximum size, and adding the size of the segment header obviously exceeds the maximum size for a segment. At this point, flush() fails with an ISE.
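
The steps above can be followed in a small standalone model. This is only a sketch of the arithmetic, not Oak's SegmentBufferWriter and not the attached test case. The class and method names (SegmentOverflowSketch, ModelWriter, reproduce) are hypothetical. The sizes 38, 262101 and 262144 come from this description. The 4-byte alignment, the 32-byte header size, and the rule that prepare() marks the buffer as dirty are assumptions made for illustration; any positive header size leads to the same failure.

```java
// Simplified model of the steps above. NOT Oak's SegmentBufferWriter.
class SegmentOverflowSketch {

    static final int MAX_SEGMENT_SIZE = 262144; // maximum segment size, from the description
    static final int SEGMENT_INFO_SIZE = 38;    // segment info size, from the description
    static final int ALIGNMENT = 4;             // assumption: consistent with 38 -> 40 and 262101 -> 262104
    static final int HEADER_SIZE = 32;          // assumption: illustrative value, any positive size fails

    // Rounds a size up to the next multiple of ALIGNMENT.
    static int align(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    static final class ModelWriter {
        int length;    // bytes of record data in the current segment
        int position;  // free space left, counting down from the end of the buffer
        boolean dirty; // whether flush() has anything to write

        ModelWriter() {
            reset();
        }

        // A fresh segment already contains the aligned segment info but is not dirty.
        private void reset() {
            length = align(SEGMENT_INFO_SIZE);
            position = MAX_SEGMENT_SIZE - length;
            dirty = false;
        }

        void prepare(int recordSize) {
            int aligned = align(recordSize);
            // The overflow check accounts for the header and requests a flush...
            if (HEADER_SIZE + length + aligned > MAX_SEGMENT_SIZE) {
                flush(); // ...but the flush is skipped while the buffer is not dirty.
            }
            // The state is updated as if the segment had room for the record.
            length += aligned;
            position = MAX_SEGMENT_SIZE - length;
            if (position < 0) {
                throw new IllegalStateException("record does not fit");
            }
            dirty = true; // assumption: preparing a record marks the buffer as dirty
        }

        void flush() {
            if (!dirty) {
                return; // nothing to write, the segment is kept as it is
            }
            int total = HEADER_SIZE + length;
            if (total > MAX_SEGMENT_SIZE) {
                throw new IllegalStateException("segment too large: " + total);
            }
            reset();
        }
    }

    // Runs the scenario and returns the failure message, or null if flush() succeeds.
    static String reproduce() {
        ModelWriter writer = new ModelWriter();
        writer.prepare(262101); // passes: position ends up at 0, which is not negative
        try {
            writer.flush();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

With the assumed 32-byte header, running main prints "segment too large: 262176". In this model prepare() completes without complaint and the failure only surfaces in flush(), which matches the observed behaviour.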