Solr / SOLR-532

WordDelimiterFilter ignores payloads

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels:
      None

      Description

      When a WordDelimiterFilter ingests a token stream and creates a new token (newTok), it appears to copy most of the old token's attributes but not the payload. I believe this is a bug. My solution is for the WordDelimiterFilter to use the Token clone() method to create a carbon copy and then modify only the attributes that differ (offsets and term text).
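
To make the difference concrete, here is a minimal, self-contained sketch of the two ways to build a sub-token. It does not use Lucene: `SimpleToken`, `cloneWith`, `subTokenByConstructor`, and `subTokenByClone` are hypothetical names invented for this illustration, and the offset arithmetic is simplified compared with what WordDelimiterFilter actually does. The point is only that building a fresh token from selected fields silently drops anything not listed, while clone-then-modify carries every attribute forward.

```java
import java.util.Arrays;

final class PayloadCopyDemo {

    /** Hypothetical, minimal token type for illustration; NOT Lucene's Token class. */
    static final class SimpleToken implements Cloneable {
        char[] term;
        int startOffset;
        int endOffset;
        String type;
        byte[] payload; // null when the token carries no payload

        SimpleToken(char[] buf, int off, int len, int startOffset, int endOffset, String type) {
            this.term = Arrays.copyOfRange(buf, off, off + len);
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.type = type;
        }

        /** Copy every field, then overwrite only the term text and offsets. */
        SimpleToken cloneWith(char[] buf, int off, int len, int newStart, int newEnd) {
            try {
                SimpleToken copy = (SimpleToken) super.clone(); // shallow copy of all fields
                copy.term = Arrays.copyOfRange(buf, off, off + len);
                copy.startOffset = newStart;
                copy.endOffset = newEnd;
                if (payload != null) {
                    copy.payload = payload.clone(); // avoid sharing the byte array
                }
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // unreachable: the class implements Cloneable
            }
        }
    }

    /** Field-by-field construction: type and offsets are copied, the payload is forgotten. */
    static SimpleToken subTokenByConstructor(SimpleToken orig, int start, int end) {
        return new SimpleToken(orig.term, start, end - start,
                orig.startOffset + start, orig.startOffset + end, orig.type);
    }

    /** Clone-then-modify: everything is carried over, including the payload. */
    static SimpleToken subTokenByClone(SimpleToken orig, int start, int end) {
        return orig.cloneWith(orig.term, start, end - start,
                orig.startOffset + start, orig.startOffset + end);
    }

    public static void main(String[] args) {
        SimpleToken orig = new SimpleToken("PowerShot".toCharArray(), 0, 9, 0, 9, "word");
        orig.payload = new byte[] {42};

        SimpleToken viaConstructor = subTokenByConstructor(orig, 5, 9);
        SimpleToken viaClone = subTokenByClone(orig, 5, 9);

        System.out.println("constructor keeps payload: " + (viaConstructor.payload != null));
        System.out.println("clone keeps payload: " + (viaClone.payload != null));
    }
}
```

A side benefit of the clone-based approach is that attributes added to the token type later are preserved without having to revisit every call site that creates sub-tokens.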

          Activity

          Tricia Jenkins added a comment -

          Quick fix. Does this need a unit test to go with it?

          Tricia Jenkins added a comment -

          LUCENE-1350 contains a survey of classes that may be affected by payloads. This is one of the classes in Solr proper that may be affected.

          Grant Ingersoll added a comment - edited

          I consolidated this down to take advantage of Lucene's new clone method:

          Index: src/java/org/apache/solr/analysis/WordDelimiterFilter.java
          ===================================================================
          --- src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (revision 706648)
          +++ src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (working copy)
          @@ -236,11 +236,7 @@
                 startOff += start;     
               }
           
          -    Token newTok = new Token(startOff,
          -            endOff,
          -            orig.type());
          -    newTok.setTermBuffer(orig.termBuffer(), start, (end - start));
          -    return newTok;
          +    return (Token)orig.clone(orig.termBuffer(), start, (end - start), startOff, endOff);
             }
          

          I will likely commit today or tomorrow. Let me know if this works for you, Tricia. The tests pass for me.

          Tricia Jenkins added a comment -

          Thanks Grant. That's much cleaner using the new clone method. It works for me after catching up with the new slf4j logging. Thanks too for committing it!

          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4


            People

            • Assignee: Grant Ingersoll
            • Reporter: Tricia Jenkins
            • Votes: 0
            • Watchers: 0
