
Key: LUCENE-741
Type: New Feature
Status: Reopened
Priority: Minor
Assignee: Otis Gospodnetic
Reporter: Otis Gospodnetic
Votes: 0
Watchers: 0
Lucene - Java

Field norm modifier (CLI tool)

Created: 11/Dec/06 09:41 PM   Updated: 12/Jan/07 07:19 AM
Component/s: Index
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments (text files, licensed for inclusion in ASF works):
for.nrm.patch, 15 kB, added 2007-01-12 07:13 AM by Doron Cohen
LUCENE-741.patch, 11 kB, added 2007-01-11 11:29 AM by Otis Gospodnetic
LUCENE-741.patch, 12 kB, added 2006-12-11 09:45 PM by Otis Gospodnetic

Lucene Fields: Patch Available


Description
I took Chris' LengthNormModifier (contrib/misc) and modified it slightly, to allow us to set fake norms on existing fields, effectively making them equivalent to Field.Index.NO_NORMS fields.

This is related to LUCENE-448 (NO_NORMS patch) and LUCENE-496 (LengthNormModifier contrib from Chris).



Otis Gospodnetic added a comment - 20/Dec/06 10:33 PM
Committed. I'll also remove the old version of this code (+ its unit test), the one that still lives in contrib/miscellaneous/src/java/org/apache/lucene/misc/ .

Otis Gospodnetic added a comment - 11/Jan/07 05:02 AM
The norm-removing functionality was bogus - it simply "normalized the norms" to be 1 for the given field, but did not completely remove norms for a field, and did not flip the omitNorms bit for the given field, so it was never a true NO_NORMS field.
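
To make the distinction concrete, here is a toy model, not Lucene code. The class FieldNormsSketch and its members are hypothetical stand-ins for a field's stored norms and its omitNorms bit; the constant 124 is assumed to be the byte that Lucene's norm encoding uses for 1.0.

```java
class FieldNormsSketch {
    // Assumption: 124 is the encoded form of the norm value 1.0.
    public static final byte ONE = 124;

    public byte[] norms;       // one byte per document, or null when norms are omitted
    public boolean omitNorms;  // hypothetical stand-in for the per-field omitNorms bit

    public FieldNormsSketch(byte[] norms) {
        this.norms = norms;
        this.omitNorms = false;
    }

    // What the first patch effectively did: flatten every norm to 1.0,
    // while still storing one byte per document and leaving the flag alone.
    public void normalizeToOne() {
        java.util.Arrays.fill(norms, ONE);
    }

    // What a true NO_NORMS field needs: no stored norms and the flag flipped.
    public void killNorms() {
        norms = null;
        omitNorms = true;
    }
}
```

After normalizeToOne() the field still carries a norms array and omitNorms is still false, which is why it was never a true NO_NORMS field.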

I'll upload a new patch that does this, but it does it only for Lucene 2.0.0 and Lucene 2.1-dev before the new .nrm changes from LUCENE-756 were committed.


Doron Cohen added a comment - 12/Jan/07 06:39 AM
I was looking at what it would take to make this work with the .nrm file as well.
I expected that some test would currently fail, but none does.
So I looked into the tests and the implementation and have a few questions:

(1) under contrib, FieldNormModifier and LengthNormModifier seem quite similar, right?
The first one sets with:

    reader.setNorm(d, fieldName,
        sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d])));

The latter with:

    byte norm = sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d]));
    reader.setNorm(d, fieldName, norm);

Do we need to keep both?
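
For readers unfamiliar with what both snippets compute, here is a self-contained sketch that runs without Lucene. It assumes the default length norm is 1/sqrt(numTerms) and that the one-byte encoding behaves like Lucene's lossy small-float scheme; NormSketch and computeNorms are hypothetical names, and the encoder is an approximation written from memory, not the library source.

```java
class NormSketch {
    // Assumption: same formula as the default Similarity, 1/sqrt(numTerms).
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumption: modeled on Lucene's lossy one-byte float encoding.
    public static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;          // drop the 21 low-order bits
        int zeroPoint = (63 - 15) << 3;  // offset of the smallest encodable exponent
        if (small <= zeroPoint) {
            // zero or negative input maps to 0, positive underflow maps to 1
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zeroPoint + 0x100) {
            return (byte) -1;            // overflow saturates at the largest byte
        }
        return (byte) (small - zeroPoint);
    }

    // Inverse of encodeNorm, up to the precision lost when encoding.
    public static float decodeNorm(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical helper: one encoded norm per document from its term count.
    public static byte[] computeNorms(int[] termCounts) {
        byte[] norms = new byte[termCounts.length];
        for (int d = 0; d < termCounts.length; d++) {
            norms[d] = encodeNorm(lengthNorm(termCounts[d]));
        }
        return norms;
    }

    public static void main(String[] args) {
        for (byte b : computeNorms(new int[] {1, 3, 4, 100})) {
            System.out.println((b & 0xff) + " -> " + decodeNorm(b));
        }
    }
}
```

Whether the encoded byte is passed inline or through a local variable makes no difference to the result. Under these assumptions the encoding is coarse: documents with 3 and 4 terms end up with the same byte.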

(2) TestFieldNormModifier.testFieldWithNoNorm() calls resetNorms() for a field that does not exist. The modifier does some work to collect the term frequencies, and then reader.setNorm is called, but it does nothing because there are no norms. And indeed the test verifies that there are still no norms for this field. Confusing, I think. For some reason I assumed that calling resetNorms() for a field that has none would implicitly set omitNorms to false for that field and compute its norms - the inverse of killNorms(). Since this is not the case, perhaps resetNorms should throw an exception in this case?

(3) I would feel safer about this feature if the test was more strict - something like TestNorms - have several fields, modify some, each in a unique way, remove some others, then at the end verify that all the values of each field norms are exactly as expected.

(4) For killNorms to work, you can first revert the index to not use .nrm, and then "kill" as before. The code knows how to read .fN files, both for backwards compatibility and for reading segments created by DocumentWriter. The following steps will do this:

  • read the norms using reader.norms(field)
  • write into .fN files
  • remove .nrm file
  • modify segmentInfo to know that it has no .nrm file.
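
The four steps above can be sketched against an in-memory stand-in for a segment. Everything here is hypothetical: NrmRevertSketch, Segment, the flag hasSingleNormFile, and the layout of the toy .nrm content (the norms of each field concatenated, maxDoc bytes per field, no header) are assumptions for illustration and not the real file format or the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NrmRevertSketch {
    // Hypothetical in-memory segment: file name -> file content.
    public static final class Segment {
        public final String name;
        public final int maxDoc;
        public final Map<String, byte[]> files = new HashMap<>();
        public final List<String> fieldsWithNorms = new ArrayList<>();
        public boolean hasSingleNormFile; // stand-in for what the segment info records

        public Segment(String name, int maxDoc) {
            this.name = name;
            this.maxDoc = maxDoc;
        }
    }

    // Convert a segment from one shared .nrm file back to per-field .fN files.
    public static void revertToPerFieldNorms(Segment seg) {
        if (!seg.hasSingleNormFile) {
            return; // already in the old layout
        }
        byte[] nrm = seg.files.get(seg.name + ".nrm");
        for (int n = 0; n < seg.fieldsWithNorms.size(); n++) {
            // Step 1: read the norms of field n (here: a slice of the toy .nrm content).
            byte[] norms = Arrays.copyOfRange(nrm, n * seg.maxDoc, (n + 1) * seg.maxDoc);
            // Step 2: write them into a per-field .fN file.
            seg.files.put(seg.name + ".f" + n, norms);
        }
        // Step 3: remove the .nrm file.
        seg.files.remove(seg.name + ".nrm");
        // Step 4: record that the segment no longer has a .nrm file.
        seg.hasSingleNormFile = false;
    }
}
```

Once the segment is back in the per-field layout, the existing killNorms logic can operate on the .fN files as it did before the .nrm change.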

(5) It would have been more efficient to optimize (and remove the .nrm file) once in the application, so perhaps modify the public API to take an array of fields and operate on all?


Doron Cohen added a comment - 12/Jan/07 07:19 AM
The attached for.nrm.patch was very noisy, so I replaced it with one created with:

    svn diff -x --ignore-eol-style contrib/miscellaneous

It is relative to trunk.

A test was added to TestFieldNormModifier.java - testModifiedNormValuesCombinedWithKill - that verifies the exact values of the norms after modification.

FieldNormModifier was modified to handle the .nrm file as outlined above.