Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      I took Chris' LengthNormModifier (contrib/misc) and modified it slightly to allow us to set fake norms on an existing field, effectively making it equivalent to Field.Index.NO_NORMS.

      This is related to LUCENE-448 (NO_NORMS patch) and LUCENE-496 (LengthNormModifier contrib from Chris).

      1. for.nrm.patch
        15 kB
        Doron Cohen
      2. LUCENE-741.patch
        11 kB
        Otis Gospodnetic
      3. LUCENE-741.patch
        12 kB
        Otis Gospodnetic

        Activity

        Otis Gospodnetic added a comment -

        Committed. I'll also remove the old version of this code (+ its unit test), the one that still lives in contrib/miscellaneous/src/java/org/apache/lucene/misc/ .

        Otis Gospodnetic added a comment -

        The norm-removing functionality was bogus - it simply "normalized the norms" to be 1 for the given field, but did not completely remove norms for a field, and did not flip the omitNorms bit for the given field, so it was never a true NO_NORMS field.

        I'll upload a new patch that does this, but it does it only for Lucene 2.0.0 and Lucene 2.1-dev before the new .nrm changes from LUCENE-756 were committed.
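
The difference between faking norms and truly removing them can be illustrated with a toy model. This is only a sketch: ToyField, fakeNorms, killNorms and the ENCODED_ONE constant are hypothetical names invented for illustration and are not part of Lucene or of the attached patches.

```java
import java.util.Arrays;

public class NormRemovalSketch {
    /** Toy stand-in for one field's per-document norms; not a Lucene class. */
    static final class ToyField {
        byte[] norms;       // one byte per document, or null when norms are absent
        boolean omitNorms;  // models the per-field omitNorms bit

        ToyField(int maxDoc) {
            norms = new byte[maxDoc];
            omitNorms = false;
        }
    }

    // Assumed one-byte encoding of the float norm 1.0 (hypothetical constant for this sketch).
    static final byte ENCODED_ONE = 124;

    /** Models "normalizing the norms": every norm becomes the encoding of 1.0. */
    static void fakeNorms(ToyField f) {
        if (f.norms != null) {
            Arrays.fill(f.norms, ENCODED_ONE);
        }
    }

    /** Models a true NO_NORMS conversion: drop the bytes and flip the bit. */
    static void killNorms(ToyField f) {
        f.norms = null;
        f.omitNorms = true;
    }

    public static void main(String[] args) {
        ToyField faked = new ToyField(4);
        fakeNorms(faked);
        ToyField killed = new ToyField(4);
        killNorms(killed);
        System.out.println("faked:  omitNorms=" + faked.omitNorms
                + ", norm bytes kept=" + (faked.norms != null));
        System.out.println("killed: omitNorms=" + killed.omitNorms
                + ", norm bytes kept=" + (killed.norms != null));
    }
}
```

In this model the faked field still carries one byte per document and still reports omitNorms=false, which is the gap described above.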

        Doron Cohen added a comment -

        I was looking at what it would take to make this work with the .nrm file as well.
        I expected there would be a test that currently fails, but there is none.
        So I looked into the tests and the implementation and have a few questions:

        (1) Under contrib, FieldNormModifier and LengthNormModifier seem quite similar, right?
        The first one sets with:

            reader.setNorm(d, fieldName,
                sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d])));

        The latter with:

            byte norm = sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d]));
            reader.setNorm(d, fieldName, norm);

        Do we need to keep both?
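
The two call shapes quoted in (1) can be compared side by side in a small self-contained sketch. Everything here is an assumption made for illustration: ToyReader is a stand-in for the reader, lengthNorm is taken to be 1/sqrt(numTerms), and encodeNorm is a hypothetical lossy one-byte encoding (3 mantissa bits, 5 exponent bits), not necessarily what the Similarity in use does.

```java
import java.util.Arrays;

public class SetNormFormsSketch {
    /** Minimal stand-in for the reader: only records the bytes passed to setNorm. */
    static final class ToyReader {
        final byte[] norms;
        ToyReader(int maxDoc) { norms = new byte[maxDoc]; }
        void setNorm(int doc, String field, byte value) { norms[doc] = value; }
    }

    /** Assumed length norm: 1/sqrt(numTerms). */
    static float lengthNorm(String field, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Hypothetical lossy one-byte float encoding used only by this sketch. */
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;          // keep sign, exponent and 3 mantissa bits
        int zeroPoint = (63 - 15) << 3;  // smallest representable exponent
        if (small <= zeroPoint) return (bits <= 0) ? (byte) 0 : (byte) 1;
        if (small >= zeroPoint + 0x100) return (byte) -1;  // overflow maps to the largest value
        return (byte) (small - zeroPoint);
    }

    /** First form: the encoded norm is passed inline. */
    static byte[] inlineForm(String fieldName, int[] termCounts) {
        ToyReader reader = new ToyReader(termCounts.length);
        for (int d = 0; d < termCounts.length; d++) {
            reader.setNorm(d, fieldName,
                    encodeNorm(lengthNorm(fieldName, termCounts[d])));
        }
        return reader.norms;
    }

    /** Second form: the encoded norm goes through a local variable first. */
    static byte[] localVariableForm(String fieldName, int[] termCounts) {
        ToyReader reader = new ToyReader(termCounts.length);
        for (int d = 0; d < termCounts.length; d++) {
            byte norm = encodeNorm(lengthNorm(fieldName, termCounts[d]));
            reader.setNorm(d, fieldName, norm);
        }
        return reader.norms;
    }

    public static void main(String[] args) {
        int[] termCounts = {1, 4, 100};
        boolean same = Arrays.equals(inlineForm("body", termCounts),
                                     localVariableForm("body", termCounts));
        System.out.println("both forms store identical norms: " + same);
    }
}
```

Under these assumptions the stored bytes are identical, so any real difference between the two classes would have to lie elsewhere.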

        (2) TestFieldNormModifier.testFieldWithNoNorm() calls resetNorms() for a field that does not exist. The modifier does some work to collect the term frequencies, and then reader.setNorm is called, but it does nothing because there are no norms. And indeed the test verifies that there are still no norms for this field. Confusing, I think. For some reason I assumed that calling resetNorms() for a field that has none would implicitly set omitNorms to false for that field and compute the norms - the inverse of killNorms(). Since this is not the case, perhaps resetNorms should throw an exception in this case?
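
The two behaviours discussed in (2), a silent no-op versus failing fast, can be contrasted in a toy model. ToyIndex, resetNormsLenient and resetNormsStrict are hypothetical names for this sketch only; the choice of IllegalArgumentException is likewise an assumption, not something the patch specifies.

```java
import java.util.HashMap;
import java.util.Map;

public class ResetNormsSketch {
    /** Toy index: field name -> per-document norm bytes. A missing key means "no norms". */
    static final class ToyIndex {
        final Map<String, byte[]> normsByField = new HashMap<>();

        /** Models the behaviour described above: does nothing when the field has no norms. */
        void setNorm(int doc, String field, byte value) {
            byte[] norms = normsByField.get(field);
            if (norms != null) {
                norms[doc] = value;
            }
        }
    }

    /** Lenient variant: the work is done, nothing changes, and the caller is not told. */
    static void resetNormsLenient(ToyIndex index, String field, byte[] newNorms) {
        for (int d = 0; d < newNorms.length; d++) {
            index.setNorm(d, field, newNorms[d]);
        }
    }

    /** Strict variant: refuse to operate on a field that has no norms. */
    static void resetNormsStrict(ToyIndex index, String field, byte[] newNorms) {
        if (!index.normsByField.containsKey(field)) {
            throw new IllegalArgumentException("field has no norms: " + field);
        }
        resetNormsLenient(index, field, newNorms);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.normsByField.put("title", new byte[2]);

        resetNormsLenient(index, "nonorms", new byte[] {5, 6});
        System.out.println("lenient created norms: " + index.normsByField.containsKey("nonorms"));

        try {
            resetNormsStrict(index, "nonorms", new byte[] {5, 6});
        } catch (IllegalArgumentException e) {
            System.out.println("strict rejected the call: " + e.getMessage());
        }
    }
}
```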

        (3) I would feel safer about this feature if the test were stricter - something like TestNorms - have several fields, modify some, each in a unique way, remove some others, then at the end verify that the norm values of each field are exactly as expected.

        (4) For killNorms to work, you can first revert the index to not use .nrm, and then "kill" as before. The code knows how to read .fN files, both for backwards compatibility and for reading segments created by DocumentWriter. The following steps will do this:

        • read the norms using reader.norm(field)
        • write into .fN files
        • remove .nrm file
        • modify segmentInfo to know that it has no .nrm file.
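
The first two steps above can be sketched in memory as follows. This is an illustration under stated assumptions, not the patch's code: it assumes the combined norms data is a 4-byte header followed by maxDoc bytes per field in field order, and NrmSplitSketch, HEADER and split are hypothetical names. Deleting the .nrm file and updating segmentInfo are not modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class NrmSplitSketch {
    // Assumed 4-byte header of the combined norms data; treat as hypothetical.
    static final byte[] HEADER = {'N', 'R', 'M', -1};

    /**
     * Splits combined norms (header + maxDoc bytes per field, in field order)
     * into one byte array per field, keyed by a ".fN"-style suffix.
     */
    static Map<String, byte[]> split(byte[] combined, int[] fieldNumbers, int maxDoc) {
        int expected = HEADER.length + fieldNumbers.length * maxDoc;
        if (combined.length != expected) {
            throw new IllegalArgumentException(
                    "expected " + expected + " bytes but got " + combined.length);
        }
        for (int i = 0; i < HEADER.length; i++) {
            if (combined[i] != HEADER[i]) {
                throw new IllegalArgumentException("bad header byte at " + i);
            }
        }
        Map<String, byte[]> perField = new LinkedHashMap<>();
        int offset = HEADER.length;
        for (int n : fieldNumbers) {
            // Each field owns a contiguous run of maxDoc bytes.
            perField.put(".f" + n, Arrays.copyOfRange(combined, offset, offset + maxDoc));
            offset += maxDoc;
        }
        return perField;
    }

    public static void main(String[] args) {
        byte[] combined = {'N', 'R', 'M', -1, 1, 2, 3, 4};
        Map<String, byte[]> perField = split(combined, new int[] {0, 2}, 2);
        for (Map.Entry<String, byte[]> e : perField.entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

Each resulting array corresponds to what would be written to one per-field norms file before the field is "killed" as before.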

        (5) It would have been more efficient to optimize (and remove the .nrm file) once in the application, so perhaps modify the public API to take an array of fields and operate on all?

        Doron Cohen added a comment -

        The attached for.nrm.patch was very noisy, so I replaced it with one created with
        svn diff -x --ignore-eol-style contrib/miscellaneous
        It is relative to trunk.

        A test, testModifiedNormValuesCombinedWithKill, is added to TestFieldNormModifier.java; it verifies the exact norm values after modification.

        FieldNormModifier is modified to handle the .nrm file as outlined above.


          People

          • Assignee:
            Otis Gospodnetic
            Reporter:
            Otis Gospodnetic
          • Votes:
            0
            Watchers:
            0
