Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 7.0
    • Fix Version/s: 7.0, 6.5
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Using an EdgeNGramTokenFilter after a DelimitedPayloadTokenFilter discards the payloads, whereas most other filters copy the payload to the new tokens.

      I added a test for this issue and a possible fix at https://github.com/xabbu42/lucene-solr/tree/edgepayloads

      Greetings
      Nathan Gass

          Activity

          thetaphi Uwe Schindler added a comment -

          Hi, could you create a Pull Request and add the link here?

          About your branch: I would not use cloneAttributes() because that's slow for this simple case. cloneAttributes() only helps if you want to modify the attributes in the AttributeSource that was created; it is not useful for simple save/restore use cases.

          For your case, you should simply use captureState(), save the state object, and then call restoreState() instead of clearAttributes(). After restoring, you can adapt the term text and positions/offsets. In addition, when you clone or capture state, the call to clearAttributes() is useless and only slows things down: restoring a state restores everything, so clearing beforehand is not needed.
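
          The save/restore pattern described in this comment can be sketched with a small self-contained model. This is not Lucene code: the class CaptureRestoreSketch, its two methods, and the representation of a token as a map of attribute names to values are all hypothetical stand-ins, chosen only so the example runs without Lucene on the classpath. Copying the map stands in for captureState()/restoreState(), and starting from an empty map stands in for clearAttributes().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the save/restore pattern. NOT Lucene's API:
 * a "token" here is just a map of attribute name -> value.
 */
class CaptureRestoreSketch {

    /**
     * Problematic pattern: every output token starts from a cleared state
     * (models clearAttributes()), so only the term text is set and any
     * other attribute, such as a payload, is lost.
     */
    public static List<Map<String, String>> gramsWithClear(
            Map<String, String> input, int minGram, int maxGram) {
        List<Map<String, String>> out = new ArrayList<>();
        String term = input.get("term");
        for (int n = minGram; n <= Math.min(maxGram, term.length()); n++) {
            Map<String, String> token = new HashMap<>(); // models clearAttributes()
            token.put("term", term.substring(0, n));
            out.add(token);
        }
        return out;
    }

    /**
     * Suggested pattern: save the input token's state once (models
     * captureState()), restore it for every output token (models
     * restoreState()), then adapt only the term text.
     */
    public static List<Map<String, String>> gramsWithRestore(
            Map<String, String> input, int minGram, int maxGram) {
        List<Map<String, String>> out = new ArrayList<>();
        Map<String, String> saved = new HashMap<>(input); // models captureState()
        String term = saved.get("term");
        for (int n = minGram; n <= Math.min(maxGram, term.length()); n++) {
            Map<String, String> token = new HashMap<>(saved); // models restoreState(saved)
            token.put("term", term.substring(0, n)); // only the term is overwritten
            out.add(token);
        }
        return out;
    }
}
```

          In this model, an input token with a "term" and a "payload" entry yields prefix tokens that keep the payload only in gramsWithRestore. No clearing step is needed there, because restoring the saved state already replaces everything.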

          githubbot ASF GitHub Bot added a comment -

          GitHub user xabbu42 opened a pull request:

          https://github.com/apache/lucene-solr/pull/138

          EdgeNGramTokenFilter drops payloads

          Test and fix for https://issues.apache.org/jira/browse/LUCENE-7630.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/xabbu42/lucene-solr edgepayloads

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/138.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #138


          commit 61e45283061ae486acc5882c5a770025c8291222
          Author: Nathan Gass <gass@search.ch>
          Date: 2017-01-09T13:59:31Z

          add test that EdgeNGram filter keeps payloads

          commit 6570e6ecc2b14a28da9873948083791ba47145d0
          Author: Nathan Gass <gass@search.ch>
          Date: 2017-01-09T14:00:21Z

          copy all attributes including payload to new tokens

          commit 01f2a87c67392a86b533d0c76ba7666845d1945f
          Author: Nathan Gass <gass@search.ch>
          Date: 2017-01-13T14:54:07Z

          use captureState and restoreState instead of cloneAttributes


          xabbu42 Nathan Gass added a comment -

          I committed the suggested improvements and opened a pull request: https://github.com/apache/lucene-solr/pull/138

          The NGramTokenFilter probably has the same issue. I can port the fix to that class when everything is correct.

          thetaphi Uwe Schindler added a comment -

          "The NGramTokenFilter probably has the same issue. I can port the fix to that class when everything is correct."

          Please do! You can update the current PR. Otherwise, the PR looks fine.

          xabbu42 Nathan Gass added a comment -

          done

          thetaphi Uwe Schindler added a comment -

          Thanks, I will merge and commit this after some testing!

          jira-bot ASF subversion and git services added a comment -

          Commit c64a01158e972176256e257d6c1d4629b05783a2 in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c64a011 ]

          LUCENE-7630: Fix (Edge)NGramTokenFilter to no longer drop payloads and preserve all attributes
          [merge branch 'edgepayloads' from Nathan Gass https://github.com/xabbu42/lucene-solr]

          Signed-off-by: Uwe Schindler <uschindler@apache.org>

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/lucene-solr/pull/138

          jira-bot ASF subversion and git services added a comment -

          Commit a69c632aa54d064515152145bcbcbe1e869d7061 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a69c632 ]

          LUCENE-7630: Fix (Edge)NGramTokenFilter to no longer drop payloads and preserve all attributes
          [merge branch 'edgepayloads' from Nathan Gass https://github.com/xabbu42/lucene-solr]

          Signed-off-by: Uwe Schindler <uschindler@apache.org>

          thetaphi Uwe Schindler added a comment -

          Thanks Nathan!


            People

            • Assignee:
              thetaphi Uwe Schindler
            • Reporter:
              xabbu42 Nathan Gass
            • Votes:
              0
            • Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                48h
                Remaining:
                48h
                Logged:
                Not Specified