I was looking at what it would take to make this work with the .nrm file as well.
I expected there would be a test that currently fails, but there is none.
So I looked into the tests and the implementation, and I have a few questions:
(1) under contrib, FieldNormModifier and LengthNormModifier seem quite similar, right?
The first one sets the norm with:
- reader.setNorm(d, fieldName,
- sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d])));
The latter with:
- byte norm = sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d]));
- reader.setNorm(d, fieldName, norm);
Do we need to keep both?
(2) TestFieldNormModifier.testFieldWithNoNorm() calls resetNorms() for a field that does not exist. The modifier does some work to collect the term frequencies, and then reader.setNorm() is called but does nothing, because there are no norms. Indeed, the test verifies that there are still no norms for this field. I find this confusing. For some reason I assumed that calling resetNorms() for a field that has none would implicitly set omitNorms to false for that field and compute the norms - the inverse of killNorms(). Since this is not the case, perhaps resetNorms() should throw an exception in this case?
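If resetNorms() were to fail fast in this case, the check could look like the sketch below; the Map-backed toy class and the exact exception type are my assumptions, not code from the patch:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a norm modifier; a field->bytes map stands in for the index. */
public class NormModifierSketch {
    private final Map<String, byte[]> norms = new HashMap<>();

    public void addField(String field, byte[] fieldNorms) {
        norms.put(field, fieldNorms);
    }

    /** Hypothetical resetNorms: throws instead of silently doing nothing. */
    public void resetNorms(String field) {
        byte[] b = norms.get(field);
        if (b == null) {
            throw new IllegalArgumentException(
                "field \"" + field + "\" has no norms (omitNorms or unknown field)");
        }
        for (int i = 0; i < b.length; i++) {
            // placeholder: real code would store
            // sim.encodeNorm(sim.lengthNorm(field, termCounts[i]))
            b[i] = 0;
        }
    }
}
```

The caller then finds out immediately that the field carries no norms, instead of silently "succeeding" as the current test documents.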
(3) I would feel safer about this feature if the test were stricter - something like TestNorms: create several fields, modify some of them, each in a unique way, remove some others, and at the end verify that the norm values of every field are exactly as expected.
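The end-of-test sweep could follow a pattern like this sketch, where a Map from field name to norm bytes stands in for reading reader.norms(field) per field (the class and helper name are made up for illustration):

```java
import java.util.Arrays;
import java.util.Map;

public class NormsVerifier {
    /**
     * Compare every field's stored norm bytes against the expected bytes.
     * Fails if a field was added or removed, or if any document's norm
     * for any field differs from what the test recorded while modifying.
     */
    public static boolean allNormsMatch(Map<String, byte[]> expected,
                                        Map<String, byte[]> actual) {
        if (!expected.keySet().equals(actual.keySet())) {
            return false; // a field was removed or added unexpectedly
        }
        for (Map.Entry<String, byte[]> e : expected.entrySet()) {
            if (!Arrays.equals(e.getValue(), actual.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

The test would build the expected map while it modifies each field, then run this single exhaustive comparison at the end.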
(4) For killNorms() to work, you can first revert the index so that it does not use a .nrm file, and then "kill" as before. The code already knows how to read .fN files, both for backwards compatibility and for reading segments created by DocumentWriter. The following steps will do this:
- read the norms using reader.norm(field)
- write into .fN files
- remove .nrm file
- modify the SegmentInfo so that it knows it has no .nrm file.
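At the file level, the steps above amount to splitting one combined file into per-field files and deleting the original. A toy sketch, ignoring Lucene's actual norms file header, the Directory abstraction, and the SegmentInfo update in the last step (the method name and the caller-supplied offset/length map are my assumptions):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class NormSplitSketch {
    /**
     * Split one combined norms file into per-field files (.f0, .f1, ...),
     * then delete the combined file. Each field maps to an {offset, length}
     * pair; real code would derive these from the segment metadata.
     */
    public static void splitNorms(Path nrmFile, Path dir, String segment,
                                  Map<Integer, int[]> fieldSpans) throws IOException {
        byte[] all = Files.readAllBytes(nrmFile);           // 1. read the norms
        for (Map.Entry<Integer, int[]> e : fieldSpans.entrySet()) {
            int offset = e.getValue()[0];
            int length = e.getValue()[1];
            byte[] slice = new byte[length];
            System.arraycopy(all, offset, slice, 0, length);
            Path out = dir.resolve(segment + ".f" + e.getKey());
            Files.write(out, slice);                        // 2. write .fN files
        }
        Files.delete(nrmFile);                              // 3. remove .nrm
        // 4. real code must also update the SegmentInfo so that it no
        //    longer advertises a .nrm file
    }
}
```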
(5) It would be more efficient to optimize (and remove the .nrm file) just once in the application, so perhaps the public API should take an array of fields and operate on all of them?
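A batched entry point could loop over the fields and pay the expensive step only once at the end. A stubbed sketch - the class, method names, and counters here are mine, just to show the shape:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchKillSketch {
    public final List<String> killed = new ArrayList<>();
    public int optimizeCalls = 0;

    /** Stub for the existing single-field kill logic. */
    public void killNormsForField(String field) { killed.add(field); }

    /** Stub for the expensive merge/rewrite that also drops the .nrm file. */
    public void optimize() { optimizeCalls++; }

    /** Batched variant: per-field work in a loop, one optimize at the end. */
    public void killNorms(String[] fields) {
        for (String f : fields) {
            killNormsForField(f);
        }
        optimize();
    }
}
```

With the array-taking signature, an application killing norms for N fields triggers one optimize instead of N.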