Lucene - Core
  1. Lucene - Core
  2. LUCENE-3146

IndexReader.setNorms is no op if one of the field instances omits norms

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      If I add two documents to an index w/ same field, and one of them omit norms, then IndexReader.setNorms is no-op. I'll attach a patch w/ test case

      1. LUCENE-3146.patch
        3 kB
        Shai Erera
      2. LUCENE-3146.patch
        2 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch w/ test that reproduces the problem. If you uncomment the commit() between addDoc() calls, the test passes. Could be related to recent changes w/ DWPT?

        Show
        Shai Erera added a comment - Patch w/ test that reproduces the problem. If you uncomment the commit() between addDoc() calls, the test passes. Could be related to recent changes w/ DWPT?
        Hide
        Shai Erera added a comment -

        Forgot to mention that the same test passes on 3x, even w/o the commit() in between addDoc() calls

        Show
        Shai Erera added a comment - Forgot to mention that the same test passes on 3x, even w/o the commit() in between addDoc() calls
        Hide
        Michael McCandless added a comment -

        I think this behavior, while spooky, is correct. LUCENE-2846 flipped the norms merging behavior so that omitNorm now "spreads", in 4.0 (not 3.x).

        But the javadoc in Field.NOT_ANALYZED_NO_NORMS needs to be fixed, on trunk.

        I think, also, SegmentReader.doSetNorm should throw an exc if you attempt to set norm on a field that does not have norms... it's bad that we silently ignore that... makes the app think the call succeeded when in fact nothing happened.

        Show
        Michael McCandless added a comment - I think this behavior, while spooky, is correct. LUCENE-2846 flipped the norms merging behavior so that omitNorm now "spreads", in 4.0 (not 3.x). But the javadoc in Field.NOT_ANALYZED_NO_NORMS needs to be fixed, on trunk. I think, also, SegmentReader.doSetNorm should throw an exc if you attempt to set norm on a field that does not have norms... it's bad that we silently ignore that... makes the app think the call succeeded when in fact nothing happened.
        Hide
        Michael McCandless added a comment -

        Also, in trunk, we now store a separate global field infos file (.fnx), which in theory could be used to catch this case when app attempts to use different omitNorms for different docs. However, this only tracks field name -> number mapping, not settings in the schema like omitNorms... if we added things like omitNorms, omitTF to this, then we'd have more consistent behavior here, and could fix this trap (if you uncomment the commit in the test case) where app thinks it succeeded in setting a norm but in fact on future merging it did not succeed.

        Show
        Michael McCandless added a comment - Also, in trunk, we now store a separate global field infos file (.fnx), which in theory could be used to catch this case when app attempts to use different omitNorms for different docs. However, this only tracks field name -> number mapping, not settings in the schema like omitNorms... if we added things like omitNorms, omitTF to this, then we'd have more consistent behavior here, and could fix this trap (if you uncomment the commit in the test case) where app thinks it succeeded in setting a norm but in fact on future merging it did not succeed.
        Hide
        Shai Erera added a comment -

        Adding that info to .fnx is a good idea. But I'm not sure it will solve the problem demonstrated in the test, only in different order.

        Say the app adds field "a" w/ omitNorms=false and commit(). Then .fnx records that field "a" stores norms and you can call IR.setNorms. Afterwards it adds field "a" w/ omitNorms=true – now what? Do we fail the addDocument() (I think we should)? If we don't, then norms will be merged away, according to trunk's semantics.

        So I think we should fix the jdocs + throw the ex from IR.setNorms in this issue, and open a separate one for tracking norms in .fnx + fixing the scenario I've described above?

        Do we have a ready-to-use exception? If not, how about IllegalArgEx (cause the field is an illegal arg, but weak), IllegalStateEx (cause the field does not track norms, but that ex is usually associated w/ Threads' states), IllegalOpEx (is there such ex)? At any rate, we must throw a RuntimeEx (at least in 3x) to not break apps, compile-wise.

        Show
        Shai Erera added a comment - Adding that info to .fnx is a good idea. But I'm not sure it will solve the problem demonstrated in the test, only in different order. Say the app adds field "a" w/ omitNorms=false and commit(). Then .fnx records that field "a" stores norms and you can call IR.setNorms. Afterwards it adds field "a" w/ omitNorms=true – now what? Do we fail the addDocument() (I think we should)? If we don't, then norms will be merged away, according to trunk's semantics. So I think we should fix the jdocs + throw the ex from IR.setNorms in this issue, and open a separate one for tracking norms in .fnx + fixing the scenario I've described above? Do we have a ready-to-use exception? If not, how about IllegalArgEx (cause the field is an illegal arg, but weak), IllegalStateEx (cause the field does not track norms, but that ex is usually associated w/ Threads' states), IllegalOpEx (is there such ex)? At any rate, we must throw a RuntimeEx (at least in 3x) to not break apps, compile-wise.
        Hide
        Michael McCandless added a comment -

        I agree we should first fix IR.setNorm to throw exc today... I like IllegalStateEx.

        And then open a separate issue; I agree add/updateDocument should fail. Or, maybe, we allow the case you described (and it means all norms will now be omitted for this field), but the reverse case (you have been omitting norms up until now and suddenly you try to add a doc not omitting norms) we throw exc?

        Show
        Michael McCandless added a comment - I agree we should first fix IR.setNorm to throw exc today... I like IllegalStateEx. And then open a separate issue; I agree add/updateDocument should fail. Or, maybe, we allow the case you described (and it means all norms will now be omitted for this field), but the reverse case (you have been omitting norms up until now and suddenly you try to add a doc not omitting norms) we throw exc?
        Hide
        Shai Erera added a comment -

        maybe, we allow the case you described (and it means all norms will now be omitted for this field)

        Though I don't like the idea of someone mistakenly adding a field w/ omitNorms=true, and suddenly all norms disappear, I think we must allow that scenario in order to apps to be able to disable all norms for a field. Basically, I hope that apps don't do crazy things and are consistent . I'll open a separate issue to track that.

        Show
        Shai Erera added a comment - maybe, we allow the case you described (and it means all norms will now be omitted for this field) Though I don't like the idea of someone mistakenly adding a field w/ omitNorms=true, and suddenly all norms disappear, I think we must allow that scenario in order to apps to be able to disable all norms for a field. Basically, I hope that apps don't do crazy things and are consistent . I'll open a separate issue to track that.
        Hide
        Michael McCandless added a comment -

        It's true that's a risk but... omitNorms (and omitTF) are effectively today a "global schema" because on merge Lucene cannot preserve different settings here, unlike, eg, term vectors or stored fields or the upcoming doc values, where every document is free to have its own "schema".

        The fact that apps now see different behavior depending on when Lucene happened to merge segments is awful.

        Catching/enforcing this global schema up front, now made possible because we have the fnx file, is cleaner because merge selection would no longer be "visible" in the API.

        Show
        Michael McCandless added a comment - It's true that's a risk but... omitNorms (and omitTF) are effectively today a "global schema" because on merge Lucene cannot preserve different settings here, unlike, eg, term vectors or stored fields or the upcoming doc values, where every document is free to have its own "schema". The fact that apps now see different behavior depending on when Lucene happened to merge segments is awful. Catching/enforcing this global schema up front, now made possible because we have the fnx file, is cleaner because merge selection would no longer be "visible" in the API.
        Hide
        Shai Erera added a comment -

        Patch against trunk:

        • fixes jdocs
        • SR.doSetNorms throws IllegalStateEx
        • CHANGES
        Show
        Shai Erera added a comment - Patch against trunk: fixes jdocs SR.doSetNorms throws IllegalStateEx CHANGES
        Hide
        Shai Erera added a comment -

        Committed revision 1130039 (trunk).
        Committed revision 1130041 (3x).

        Thanks Mike !

        Show
        Shai Erera added a comment - Committed revision 1130039 (trunk). Committed revision 1130041 (3x). Thanks Mike !
        Hide
        Robert Muir added a comment -

        bulk close for 3.3

        Show
        Robert Muir added a comment - bulk close for 3.3

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development