Details
Type: Task
Status: Closed
Priority: Minor
Resolution: Fixed
Lucene Fields: New
Description
Even though in theory we only support reading indices created with version N or N-1, in practice it is possible to run a forceMerge to get Lucene to open an older index, since we only record the version that wrote segments and commit points. However, as of Lucene 7.0 we also record the major version that was used to initially create the index, meaning we could also refuse to open N-2 indices that have only been merged with version N-1.
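To make the proposed rule concrete, here is a minimal sketch of the check, assuming the creation major version has already been read from the commit point (in Lucene 7+ it is available through `SegmentInfos#getIndexCreatedVersionMajor()`). The class `CreatedVersionCheck` and its methods are hypothetical names used for illustration only, not existing Lucene code.

```java
/** Hypothetical helper, not part of Lucene: models the "created with N or N-1" rule. */
final class CreatedVersionCheck {
    private CreatedVersionCheck() {}

    /**
     * Returns true if an index first created with major version createdMajor
     * may be opened by code running major version currentMajor.
     */
    static boolean isSupported(int createdMajor, int currentMajor) {
        // Only the current major (N) and the previous one (N-1) are accepted,
        // regardless of which version later rewrote the segments.
        return createdMajor == currentMajor || createdMajor == currentMajor - 1;
    }

    /** Throws if the index was created by an unsupported major version. */
    static void ensureSupported(int createdMajor, int currentMajor) {
        if (!isSupported(createdMajor, currentMajor)) {
            throw new IllegalStateException(
                "Index created with major version " + createdMajor
                    + " cannot be opened by major version " + currentMajor);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported(7, 8)); // N-1 index: accepted
        System.out.println(isSupported(6, 8)); // N-2 index: rejected, even if force-merged with 7
    }
}
```

The point of keying the check on the creation version rather than on the segment version is that a forceMerge changes the latter but not the former.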
The current state of things, where we could read old data without knowing it, raises issues with everything that is performed on top of the codec API, such as analysis, input validation or norms encoding, especially now that we plan to change the defaults (LUCENE-7730).
For instance, we only start rejecting broken offsets in term vectors in Lucene 7. If we do not require the index to have been created with either Lucene 7 or 8 once we move to Lucene 8, then codecs could still be fed broken offsets, which is a pity since assuming that offsets go forward makes things easier to encode and potentially allows for better compression.
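As a rough illustration of why forward-moving offsets help, the sketch below validates offsets and then stores start offsets as non-negative deltas, which small variable-length integers can represent compactly. This is an assumption-laden toy, not the encoding any Lucene codec actually uses; `ForwardOffsets`, `checkOffsets` and `deltaEncodeStarts` are hypothetical names.

```java
/** Hypothetical sketch, not Lucene code: delta-encoding that relies on forward offsets. */
final class ForwardOffsets {
    private ForwardOffsets() {}

    /** Rejects offsets that go backwards or whose end precedes their start. */
    static void checkOffsets(int[] startOffsets, int[] endOffsets) {
        if (startOffsets.length != endOffsets.length) {
            throw new IllegalArgumentException("start/end arrays differ in length");
        }
        int lastStart = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            int start = startOffsets[i];
            int end = endOffsets[i];
            // lastStart begins at 0, so a negative start offset is rejected too.
            if (start < lastStart || end < start) {
                throw new IllegalArgumentException(
                    "broken offsets at position " + i + ": start=" + start
                        + ", end=" + end + ", lastStart=" + lastStart);
            }
            lastStart = start;
        }
    }

    /**
     * Encodes each start offset as the difference from the previous one.
     * Because offsets were validated, every delta is >= 0 and needs no sign bit.
     */
    static int[] deltaEncodeStarts(int[] startOffsets, int[] endOffsets) {
        checkOffsets(startOffsets, endOffsets);
        int[] deltas = new int[startOffsets.length];
        int previous = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            deltas[i] = startOffsets[i] - previous;
            previous = startOffsets[i];
        }
        return deltas;
    }
}
```

If an old index could still supply offsets that go backwards, an encoder like this would either have to throw at merge time or fall back to a signed, less compact representation.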