After discussing this further with Robert, it looks like this is a (smaller) part of a larger issue: SegmentInfo+FieldInfo should be made extensible, and the process of reading/writing this information should be completely codec-specific. Let's open a separate issue for that part.
The smaller issue discussed here is to record only the information about a commit point, in a completely codec-independent, versioned format, whatever that format turns out to be. Let's call it CommitInfo, or whatever other name fits. This part would be written to a file that is separate from the codec-dependent parts.
Regarding two-phase commit and checksums: one reason we have SegmentInfosWriter/Reader is the AppendingCodec, because we couldn't make the existing two-phase commit work on append-only filesystems. However, we could change the two-phase commit implementation to the following:
- write the data to the CommitInfo file
- write a marker indicating "end of data, checksum follows"
- finally, write the checksum
Then the reading code knows that:
- if the marker is missing, the file is invalid
- if the marker is present, the checksum must be present too
- and the checksum must be correct.
This implementation never needs to seek back or overwrite, so it works on any filesystem, including append-only ones.
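To make the proposal concrete, here is a rough sketch of the write and read sides in plain Java. Everything in it is an assumption for illustration, not existing Lucene code: the class name CommitInfoSketch, the marker value, the use of CRC32, and the choice to checksum the data plus the marker. It also assumes the marker and checksum form a fixed-size trailer, so the reader can locate the marker by counting back from the end of the file.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical illustration only; none of these names exist in Lucene.
final class CommitInfoSketch {
    // Assumed marker value meaning "end of data, checksum follows".
    static final int END_MARKER = 0xC0DEC0DE;
    // Trailer = 4-byte marker + 8-byte checksum.
    static final int TRAILER_BYTES = Integer.BYTES + Long.BYTES;

    // Writer: three strictly sequential steps, no seek back, no overwrite.
    static byte[] write(byte[] data) {
        ByteBuffer out = ByteBuffer.allocate(data.length + TRAILER_BYTES);
        out.put(data);                    // 1. write the data
        out.putInt(END_MARKER);           // 2. write the end-of-data marker
        CRC32 crc = new CRC32();
        crc.update(out.array(), 0, out.position());
        out.putLong(crc.getValue());      // 3. finally, write the checksum
        return out.array();
    }

    // Reader: rejects the file unless marker and checksum are both present and valid.
    static byte[] read(byte[] file) {
        int markerPos = file.length - TRAILER_BYTES;
        if (markerPos < 0) {
            throw new IllegalStateException("invalid file: too short to hold marker and checksum");
        }
        ByteBuffer in = ByteBuffer.wrap(file);
        if (in.getInt(markerPos) != END_MARKER) {
            // Covers both a missing marker and a truncated checksum.
            throw new IllegalStateException("invalid file: marker missing");
        }
        long stored = in.getLong(markerPos + Integer.BYTES);
        CRC32 crc = new CRC32();
        crc.update(file, 0, markerPos + Integer.BYTES);
        if (crc.getValue() != stored) {
            throw new IllegalStateException("invalid file: checksum mismatch");
        }
        return Arrays.copyOfRange(file, 0, markerPos);
    }
}
```

In this sketch a commit that was interrupted before the checksum was fully written fails the marker check, because the marker no longer sits at the expected distance from the end of the file. A real implementation would write through the Directory abstraction rather than a byte array, and might pick a different checksum or marker.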