Solr
  1. Solr
  2. SOLR-3241

Document boost fail if a field copy omit the norms

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      After https://issues.apache.org/jira/browse/LUCENE-3796, it is not possible to set a boost to a field that has the "omitNorms" set to true. This is making Solr's document index-time boost to fail when a field that doesn't omit norms is copied (with copyField) to a field that does omit them and document boost is used. For example:

      <field name="author" type="text" indexed="true" stored="false" omitNorms="false"/>
      <field name="author_display" type="string" indexed="true" stored="true" omitNorms="true"/>
      <copyField source="author" dest="author_display"/>

      I'm attaching a possible fix.

      1. SOLR-3241.patch
        11 kB
        Robert Muir
      2. SOLR-3241.patch
        8 kB
        Tomás Fernández Löbbe
      3. SOLR-3241.patch
        6 kB
        Robert Muir
      4. SOLR-3241.patch
        3 kB
        Robert Muir
      5. SOLR-3241.patch
        3 kB
        Tomás Fernández Löbbe

        Activity

        Hide
        Tomás Fernández Löbbe added a comment -

        This still doesn't work for Types with subtypes, like LatLonType and CurrencyType

        Show
        Tomás Fernández Löbbe added a comment - This still doesn't work for Types with subtypes, like LatLonType and CurrencyType
        Hide
        Hoss Man added a comment -

        part of me things we should just remove the error checking for omitNorms && boost != 1.0F from DocumentBuilder.toDocument (added in /LUCENE-3796) and just silently ignore any boost on a SolrInputField where omitNorms==true (ie: maybe log a warning, but don't throw an Exception). This would be consistent with the behavior in past releases (except for the warning log if we add that), and wouldn't cause any confusing errors for things like LatLonType (even if they come from third-party plugins we can't contro/test)

        On the other hand... that feels really dirty, and it would be nice to fail fast and loud if the client tries to set a boost on an omitNorms field

        Perhaps a better fix would be to leave DocumentBuilder exactly as it is today, and instead change FieldType.createField to (silently) ignore the boost if omitNorms==true for that SchemaField. if i'm thinking about this right, that would mean the error checking of the SolrInputDocument (and all it's SolrInputFields) in DocumentBuilder.toDocument would still work as designed – so you'd get an error if any client or "high level" plugin like an UpdateProcessor tried to use a field boost on an omitNorms field; but any fields added at a lower level (ie: by copyField or a poly field) would silently ignore those boosts if they were copied/cloned to a field where omitNorms==true.

        Show
        Hoss Man added a comment - part of me things we should just remove the error checking for omitNorms && boost != 1.0F from DocumentBuilder.toDocument (added in / LUCENE-3796 ) and just silently ignore any boost on a SolrInputField where omitNorms==true (ie: maybe log a warning, but don't throw an Exception). This would be consistent with the behavior in past releases (except for the warning log if we add that), and wouldn't cause any confusing errors for things like LatLonType (even if they come from third-party plugins we can't contro/test) On the other hand... that feels really dirty, and it would be nice to fail fast and loud if the client tries to set a boost on an omitNorms field Perhaps a better fix would be to leave DocumentBuilder exactly as it is today, and instead change FieldType.createField to (silently) ignore the boost if omitNorms==true for that SchemaField. if i'm thinking about this right, that would mean the error checking of the SolrInputDocument (and all it's SolrInputFields) in DocumentBuilder.toDocument would still work as designed – so you'd get an error if any client or "high level" plugin like an UpdateProcessor tried to use a field boost on an omitNorms field; but any fields added at a lower level (ie: by copyField or a poly field) would silently ignore those boosts if they were copied/cloned to a field where omitNorms==true.
        Hide
        Robert Muir added a comment -

        Thanks Tomás (and thanks for the patch!) I'll look into this.

        The concept of document boost doesn't really exist anymore, and I don't think we should fail for an omitNorms field
        just because of the document boost. It just won't be applied to that field.

        The point is only to fail if someone explicitly boosts an omitNorms field itself.

        Show
        Robert Muir added a comment - Thanks Tomás (and thanks for the patch!) I'll look into this. The concept of document boost doesn't really exist anymore, and I don't think we should fail for an omitNorms field just because of the document boost. It just won't be applied to that field. The point is only to fail if someone explicitly boosts an omitNorms field itself.
        Hide
        Robert Muir added a comment -

        Perhaps a better fix would be to leave DocumentBuilder exactly as it is today, and instead change FieldType.createField to (silently) ignore the boost if omitNorms==true for that SchemaField. if i'm thinking about this right, that would mean the error checking of the SolrInputDocument (and all it's SolrInputFields) in DocumentBuilder.toDocument would still work as designed – so you'd get an error if any client or "high level" plugin like an UpdateProcessor tried to use a field boost on an omitNorms field; but any fields added at a lower level (ie: by copyField or a poly field) would silently ignore those boosts if they were copied/cloned to a field where omitNorms==true.

        Hoss: this sounds good to me? we can add tests that the basic case is still working.

        The reason the logic was somewhat complicated in DocumentBuilder is because, from the lucene indexer its easy to detect this case, but:

        1. As mentioned before, "document boost" is something only in solr now, so the idea was not to fail just because you had omitNorms field and solr multiplied document boost into field boosts. But this copyField/polyField is an exception as you mention
        2. explicit handling was added to DocumentBuilder because its better for Solr to throw an exception here than rely upon lucene, lucene's exception
          is "nastier" in the sense its a lower-level non-aborting exception that will produce a deleted document.
        Show
        Robert Muir added a comment - Perhaps a better fix would be to leave DocumentBuilder exactly as it is today, and instead change FieldType.createField to (silently) ignore the boost if omitNorms==true for that SchemaField. if i'm thinking about this right, that would mean the error checking of the SolrInputDocument (and all it's SolrInputFields) in DocumentBuilder.toDocument would still work as designed – so you'd get an error if any client or "high level" plugin like an UpdateProcessor tried to use a field boost on an omitNorms field; but any fields added at a lower level (ie: by copyField or a poly field) would silently ignore those boosts if they were copied/cloned to a field where omitNorms==true. Hoss: this sounds good to me? we can add tests that the basic case is still working. The reason the logic was somewhat complicated in DocumentBuilder is because, from the lucene indexer its easy to detect this case, but: As mentioned before, "document boost" is something only in solr now, so the idea was not to fail just because you had omitNorms field and solr multiplied document boost into field boosts. But this copyField/polyField is an exception as you mention explicit handling was added to DocumentBuilder because its better for Solr to throw an exception here than rely upon lucene, lucene's exception is "nastier" in the sense its a lower-level non-aborting exception that will produce a deleted document.
        Hide
        Hoss Man added a comment -

        The reason the logic was somewhat complicated in DocumentBuilder is because, from the lucene indexer its easy to detect this case, but:

        sure ... but i think it's not actually just "Document Boost" is it? if field "foo" is declared with omitNorms==false, and a client sends a doc with a field "foo" using a fieldBoost then that should be totally fine – but if the schema says to copyField from foo->bar where bar has omitNorms==true then i think that will currently cause an from the lucene low level check, corret? (i haven't tried it, i'm going based on tomas's path) likewise if "foo" is a LatLonField (or any other polyfield) and the underlying dynamic field used has omitNorms==true then won't that same low level lucene code throw an error there?

        so multiple error paths from totally sane usage none of which has anything to do with doc boost, right?

        (Truth be told, i didn't even notice the "Document boost" part of the summary, i was just looking at tomas's patch and skimming the summary)

        Show
        Hoss Man added a comment - The reason the logic was somewhat complicated in DocumentBuilder is because, from the lucene indexer its easy to detect this case, but: sure ... but i think it's not actually just "Document Boost" is it? if field "foo" is declared with omitNorms==false, and a client sends a doc with a field "foo" using a fieldBoost then that should be totally fine – but if the schema says to copyField from foo->bar where bar has omitNorms==true then i think that will currently cause an from the lucene low level check, corret? (i haven't tried it, i'm going based on tomas's path) likewise if "foo" is a LatLonField (or any other polyfield) and the underlying dynamic field used has omitNorms==true then won't that same low level lucene code throw an error there? so multiple error paths from totally sane usage none of which has anything to do with doc boost, right? (Truth be told, i didn't even notice the "Document boost" part of the summary, i was just looking at tomas's patch and skimming the summary)
        Hide
        Robert Muir added a comment -

        but if the schema says to copyField from foo->bar where bar has omitNorms==true then i think that will currently cause an from the lucene low level check, corret?

        I have no idea.. does copyField actually copy field boosts too?

        The attached test in the patch is a case of document boosting + copyField only.

        Show
        Robert Muir added a comment - but if the schema says to copyField from foo->bar where bar has omitNorms==true then i think that will currently cause an from the lucene low level check, corret? I have no idea.. does copyField actually copy field boosts too? The attached test in the patch is a case of document boosting + copyField only.
        Hide
        Tomás Fernández Löbbe added a comment -

        I found the issue with copyfields as you mentioned Robert. foo is omitNorms=false, and bar is omitNorms=true. I have a copyfield foo->bar and I add a document like:
        <document boost=X>
        <field name=foo>AAAAAA</field>
        </document>

        This case is fixed by the patch. Testing it, i found a similar situation where a field1 is a poly type with omitNorms=false, and the subtype if it has omitNorms=true. In this case, it fails even without a copyfield just by adding a document like:

        <document boost=X>
        <field name=poly>AAA,BBB</field>
        </document>

        I don't know if it makes sense to have a poly field where the subtype have a different value in the "omitNorms" attribute, probably this should fail even before the document is added.

        Show
        Tomás Fernández Löbbe added a comment - I found the issue with copyfields as you mentioned Robert. foo is omitNorms=false, and bar is omitNorms=true. I have a copyfield foo->bar and I add a document like: <document boost=X> <field name=foo>AAAAAA</field> </document> This case is fixed by the patch. Testing it, i found a similar situation where a field1 is a poly type with omitNorms=false, and the subtype if it has omitNorms=true. In this case, it fails even without a copyfield just by adding a document like: <document boost=X> <field name=poly>AAA,BBB</field> </document> I don't know if it makes sense to have a poly field where the subtype have a different value in the "omitNorms" attribute, probably this should fail even before the document is added.
        Hide
        Robert Muir added a comment -

        updating the patch from Tomás to include an additional test for the field boost+copyField.

        we still need to add tests for this polyField case, and any other possibilities (can you polyField+copyField?)

        Show
        Robert Muir added a comment - updating the patch from Tomás to include an additional test for the field boost+copyField. we still need to add tests for this polyField case, and any other possibilities (can you polyField+copyField?)
        Hide
        Robert Muir added a comment -

        Updated patch with fixes for the polyfield case (untested!)

        After reviewing the code: Tomás had the correct fix for the copyField case, his patch fixes a logic bug, nothing more!

        The polyField case is different: its too late in DocumentBuilder to do anything here after the creation of IndexableFields: moreover we cannot nuke the whole boost for the field because we cannot assume anything just because isPolyField() == true, for example a custom field type might not even be instanceof AbstractSubTypeField!

        Because of this I think these fieldtypes should really treat the fact they use 'real fields' as an impl detail. so I added the logic to their subfield creation.

        Show
        Robert Muir added a comment - Updated patch with fixes for the polyfield case (untested!) After reviewing the code: Tomás had the correct fix for the copyField case, his patch fixes a logic bug, nothing more! The polyField case is different: its too late in DocumentBuilder to do anything here after the creation of IndexableFields: moreover we cannot nuke the whole boost for the field because we cannot assume anything just because isPolyField() == true, for example a custom field type might not even be instanceof AbstractSubTypeField! Because of this I think these fieldtypes should really treat the fact they use 'real fields' as an impl detail. so I added the logic to their subfield creation.
        Hide
        Tomás Fernández Löbbe added a comment -

        I'm attaching another patch with some more tests.
        Also updated the DocumentBuilder to use the existing logic instead of replicating it where the fix is applied.

        Show
        Tomás Fernández Löbbe added a comment - I'm attaching another patch with some more tests. Also updated the DocumentBuilder to use the existing logic instead of replicating it where the fix is applied.
        Hide
        Robert Muir added a comment -

        Thanks, patch looks good!

        I'll wait a bit for any other input.

        Show
        Robert Muir added a comment - Thanks, patch looks good! I'll wait a bit for any other input.
        Hide
        Hoss Man added a comment -

        patch looks fine ... i wish there was a way to make it easier for poly fields so they wouldn't have do do the check themselves, but when i tried the idea i had it didn't work, so better to go with this for now and maybe refactor a helper method later.

        the few changes i would make:

        1) make the new tests grab the IndexSchema obejct and assert that every field (that the cares about) has the expected omitNorms value – future proof ourselves against someone nuetering the test w/o realizing by tweaking the test schema because they don't know that there is a specific reason for those omitNorm settings

        2) add a test that explicitly verifies the failure case of someone setting field boost on a field with omitNorms==true, assert that we get the expected error mesg (doesn't look like this was added when LUCENE-3796 was commited, and we want to make sure we don't inadvertantly break that error check)

        Show
        Hoss Man added a comment - patch looks fine ... i wish there was a way to make it easier for poly fields so they wouldn't have do do the check themselves, but when i tried the idea i had it didn't work, so better to go with this for now and maybe refactor a helper method later. the few changes i would make: 1) make the new tests grab the IndexSchema obejct and assert that every field (that the cares about) has the expected omitNorms value – future proof ourselves against someone nuetering the test w/o realizing by tweaking the test schema because they don't know that there is a specific reason for those omitNorm settings 2) add a test that explicitly verifies the failure case of someone setting field boost on a field with omitNorms==true, assert that we get the expected error mesg (doesn't look like this was added when LUCENE-3796 was commited, and we want to make sure we don't inadvertantly break that error check)
        Hide
        Robert Muir added a comment -

        updated patch with hossman's suggested test improvements. I'll commit soon.

        Show
        Robert Muir added a comment - updated patch with hossman's suggested test improvements. I'll commit soon.
        Hide
        Robert Muir added a comment -

        Thanks Tomás!

        Show
        Robert Muir added a comment - Thanks Tomás!

          People

          • Assignee:
            Robert Muir
            Reporter:
            Tomás Fernández Löbbe
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development