Lucene - Core
  1. Lucene - Core
  2. LUCENE-2235

implement PerFieldAnalyzerWrapper.getOffsetGap

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      PerFieldAnalyzerWrapper does not delegates calls to getOffsetGap(Fieldable), instead it returns the default values from the implementation of Analyzer. (Similar to LUCENE-659 "PerFieldAnalyzerWrapper fails to implement getPositionIncrementGap")

      1. PerFieldAnalyzerWrapper.patch
        0.7 kB
        Javier Godoy
      2. LUCENE-2235.patch
        1 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Thanks, Nick!

          Show
          Uwe Schindler added a comment - Thanks, Nick!
          Hide
          Nick Pellow added a comment -

          Thanks for the clarification, Uwe. I wasn't sure if null Analyzers were meant to be accepted or not. I was upgrading some existing code from 3.0.2 to 3.0.3 and stumbled across that, so its good to know.

          I've created LUCENE-2801 to track the real reason the check should be done too!

          Show
          Nick Pellow added a comment - Thanks for the clarification, Uwe. I wasn't sure if null Analyzers were meant to be accepted or not. I was upgrading some existing code from 3.0.2 to 3.0.3 and stumbled across that, so its good to know. I've created LUCENE-2801 to track the real reason the check should be done too!
          Hide
          Uwe Schindler added a comment -

          To come back to the original issue:

          Should this be checking that a field is indeed analyzed before calling getOffsetGap ?

          In my opinion this should be done (and so this issue would disappear). Can you open another issue requesting this check and link it to this one?

          One problem coming from not checking for "analyzed" is this:
          You add a field indexed and it gets analyzed by PFAW - After that you add the same field name stored-only (which is perfectly legal and often used, e.g. when the stored value is binary or in some other format and does not correspond to the indexed text), the positionIncrement is increased. After that you again add another instance of the same field as indexed-only, which also increases posIncr. So you have 2 times the gap between both indexed sub-fields. This is definitely wrong.

          Show
          Uwe Schindler added a comment - To come back to the original issue: Should this be checking that a field is indeed analyzed before calling getOffsetGap ? In my opinion this should be done (and so this issue would disappear). Can you open another issue requesting this check and link it to this one? One problem coming from not checking for "analyzed" is this: You add a field indexed and it gets analyzed by PFAW - After that you add the same field name stored-only (which is perfectly legal and often used, e.g. when the stored value is binary or in some other format and does not correspond to the indexed text), the positionIncrement is increased. After that you again add another instance of the same field as indexed-only, which also increases posIncr. So you have 2 times the gap between both indexed sub-fields. This is definitely wrong.
          Hide
          Uwe Schindler added a comment -

          Hi Nick,
          thanks for reporting this. Your problem only occurs since the missing method was added (before PFAW only returned some default, now it throws NPE) in that case.

          In general, Lucene does not support null analyzers anywhere (not as ctor argument in IW/IWC) or e.g. here. You should always add a "simple" analyzer to IndexWriter (WhitespaceAnalyzer, SimpleAnalyzer, KeywordAnalyzer) or other methods taking Analyzer.

          To really fix this, we have to review all places that don't need to call Analyzers. There are e.g. other places, like when you directly pass the TokenStream to the Field with "new Field(name, TokenStream), it also calls the analyzer, so you have to implement it.

          Show
          Uwe Schindler added a comment - Hi Nick, thanks for reporting this. Your problem only occurs since the missing method was added (before PFAW only returned some default, now it throws NPE) in that case. In general, Lucene does not support null analyzers anywhere (not as ctor argument in IW/IWC) or e.g. here. You should always add a "simple" analyzer to IndexWriter (WhitespaceAnalyzer, SimpleAnalyzer, KeywordAnalyzer) or other methods taking Analyzer. To really fix this, we have to review all places that don't need to call Analyzers. There are e.g. other places, like when you directly pass the TokenStream to the Field with "new Field(name, TokenStream), it also calls the analyzer, so you have to implement it.
          Hide
          Nick Pellow added a comment -

          I just upgraded to 3.0.3 and we started getting NullPointerExceptions coming from PerFieldAnalyzerWrapper.
          We have a PerFieldAnalyzerWrapper that has a null defaultAnalyzer:

          private final PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(null);
          

          We add analyzers to all fields that are analyzed. ie: field.isAnalyzed() == true.
          getOffsetGap on PerFieldAnalyzerWrapper is being called, even for these non-analyzed fields. Is this expected behaviour?

          Lines 200-203 of DocInverterPerField are:

                  if (anyToken)
                    fieldState.offset += docState.analyzer.getOffsetGap(field);
                  fieldState.boost *= field.getBoost();
                }
          
          

          Should this be checking that a field is indeed analyzed before calling getOffsetGap ?

          Show
          Nick Pellow added a comment - I just upgraded to 3.0.3 and we started getting NullPointerExceptions coming from PerFieldAnalyzerWrapper. We have a PerFieldAnalyzerWrapper that has a null defaultAnalyzer: private final PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper( null ); We add analyzers to all fields that are analyzed. ie: field.isAnalyzed() == true. getOffsetGap on PerFieldAnalyzerWrapper is being called, even for these non-analyzed fields. Is this expected behaviour? Lines 200-203 of DocInverterPerField are: if (anyToken) fieldState.offset += docState.analyzer.getOffsetGap(field); fieldState.boost *= field.getBoost(); } Should this be checking that a field is indeed analyzed before calling getOffsetGap ?
          Hide
          Uwe Schindler added a comment -

          Committed 3.0 branch revision: 1028915
          Committed 2.9 branch revision: 1028918

          Show
          Uwe Schindler added a comment - Committed 3.0 branch revision: 1028915 Committed 2.9 branch revision: 1028918
          Hide
          Robert Muir added a comment -

          reopening for possible 2.9.4/3.0.3 backport.

          Show
          Robert Muir added a comment - reopening for possible 2.9.4/3.0.3 backport.
          Hide
          Uwe Schindler added a comment -

          Committed revision: 903368

          Thanks Javier!

          Show
          Uwe Schindler added a comment - Committed revision: 903368 Thanks Javier!
          Hide
          Uwe Schindler added a comment -

          Updates patch, please create the svn diff against the trunk (root of checkout) directory next time!

          Also added missing import statement and changes.txt entry.

          Show
          Uwe Schindler added a comment - Updates patch, please create the svn diff against the trunk (root of checkout) directory next time! Also added missing import statement and changes.txt entry.
          Hide
          Uwe Schindler added a comment -

          Looks good! Will commit this in a day or two.

          Show
          Uwe Schindler added a comment - Looks good! Will commit this in a day or two.
          Hide
          Javier Godoy added a comment -

          Patch

          Show
          Javier Godoy added a comment - Patch

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Javier Godoy
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development