Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.0, 6.2
    • Fix Version/s: 7.0, 6.2
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Originally reported to the mailing list: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201607.mbox/%3cCAJ0VynnMAH7N7byPevTV9Htxo-Nk-B7mwUwRgP4X8gN=V4pYBg@mail.gmail.com%3e

      LUCENE-7355 made a change to CustomAnalyzer.createComponents() such that it uses a different AttributeFactory. https://github.com/apache/lucene-solr/commit/e92a38af90d12e51390b4307ccbe0c24ac7b6b4e#diff-b39a076156e10aa7a4ba86af0357a0feL122

      The previous default was TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY which uses PackedTokenAttributeImpl while the new default is now AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY which does not use PackedTokenAttributeImpl.

      Uwe Schindler asked me to open an issue for this.

      1. LUCENE-7382.patch
        1.0 kB
        Uwe Schindler

        Issue Links

          Activity

          thetaphi Uwe Schindler added a comment -

          Hi Terry,
          thanks for opening the issue. The default used by LUCENE-7355 is just wrong. I did not review the change closely. As 6.2 has not been released yet, we can change this easily. I will post a patch later.

          shebiki Terry Smith added a comment -

          Thanks, I didn't realize this would hit 6.2. I have nightly builds that follow the 6.2.0-SNAPSHOT and 7.0.0-SNAPSHOT artifacts on the ASF snapshot maven repo and this didn't hit my 6.2 branch yet.

          thetaphi Uwe Schindler added a comment -

          Simple patch.

          thetaphi Uwe Schindler added a comment -

          This problem affected all Tokenizers, which would now suddenly use the "slower" default factory.

          thetaphi Uwe Schindler added a comment -

          I think the Maven artifacts are not yet up to date. This was committed not long ago.

          dsmiley David Smiley added a comment -

          Why do we have both; why the "slow" one?

          thetaphi Uwe Schindler added a comment -

          Sorry, it's not really slow; it just uses more memory and produces more objects. We have the "generic" one for all uses of AttributeFactory where we don't deal with Tokens, e.g. FuzzyQuery's term enums or other use cases. And there are many!

          The Token-specific one is just more efficient, both memory- and speed-wise, for TokenStreams, and because of that it is defined there. It just optimizes the case of standard token attributes like term, offsets, positions, etc. Otherwise it inherits/delegates to the default, so we still need the default.
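
The delegation described above can be illustrated with a minimal sketch. It is not Lucene code: every name in it (`AttributeFactorySketch`, `GENERIC`, `TOKEN`, `PackedImpl`, `Source`, and the attribute interfaces) is a hypothetical stand-in. It assumes only what the comment states: the generic factory creates one object per attribute, while the token-specific factory serves the standard token attributes from a single packed object and delegates everything else to the generic one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins only; none of these types are Lucene classes.
final class AttributeFactorySketch {
    interface Attribute {}
    interface TermAttr extends Attribute {}
    interface OffsetAttr extends Attribute {}
    interface PositionAttr extends Attribute {}
    interface FlagsAttr extends Attribute {} // assumed NOT covered by the packed implementation

    static final class TermImpl implements TermAttr {}
    static final class OffsetImpl implements OffsetAttr {}
    static final class PositionImpl implements PositionAttr {}
    static final class FlagsImpl implements FlagsAttr {}

    // One object that backs all "standard" token attributes of this sketch.
    static final class PackedImpl implements TermAttr, OffsetAttr, PositionAttr {}

    interface Factory {
        Attribute create(Class<? extends Attribute> type);
    }

    // Generic factory: a separate implementation object per attribute type.
    static final Factory GENERIC = type -> {
        if (type == TermAttr.class) return new TermImpl();
        if (type == OffsetAttr.class) return new OffsetImpl();
        if (type == PositionAttr.class) return new PositionImpl();
        if (type == FlagsAttr.class) return new FlagsImpl();
        throw new IllegalArgumentException("unknown attribute: " + type);
    };

    // Token-specific factory: packed object for standard attributes, otherwise delegate.
    static final Factory TOKEN = type ->
        type.isAssignableFrom(PackedImpl.class) ? new PackedImpl() : GENERIC.create(type);

    // Minimal attribute holder: an instance is shared by every attribute interface it implements.
    static final class Source {
        private final Factory factory;
        private final Map<Class<?>, Attribute> byType = new HashMap<>();

        Source(Factory factory) {
            this.factory = factory;
        }

        <T extends Attribute> T add(Class<T> type) {
            Attribute instance = byType.get(type);
            if (instance == null) {
                instance = factory.create(type);
                // Register the new instance under each attribute interface it implements.
                for (Class<?> iface : instance.getClass().getInterfaces()) {
                    if (Attribute.class.isAssignableFrom(iface)) {
                        byType.putIfAbsent(iface, instance);
                    }
                }
            }
            return type.cast(instance);
        }

        // Number of distinct implementation objects currently held (identity-based).
        int distinctInstances() {
            Set<Attribute> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            seen.addAll(byType.values());
            return seen.size();
        }
    }
}
```

In this sketch, a `Source` built with `GENERIC` holds one object per requested attribute, while a `Source` built with `TOKEN` holds a single `PackedImpl` for the term, offset and position attributes plus one delegated object for `FlagsAttr`.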

          jira-bot ASF subversion and git services added a comment -

          Commit 2585c9f3ff750b8e551f261412625aef0e7d4a4b in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2585c9f ]

          LUCENE-7382: Fix bug introduced by LUCENE-7355 that used the wrong default AttributeFactory for new Tokenizers

          jira-bot ASF subversion and git services added a comment -

          Commit d71a358601ad7438d9052861b816d151d11d471b in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d71a358 ]

          LUCENE-7382: Fix bug introduced by LUCENE-7355 that used the wrong default AttributeFactory for new Tokenizers

          thetaphi Uwe Schindler added a comment -

          Thanks Terry for reporting!

          jpountz Adrien Grand added a comment -

          Thanks Uwe and Terry!

          mikemccand Michael McCandless added a comment -

          Bulk close resolved issues after 6.2.0 release.


            People

            • Assignee: thetaphi Uwe Schindler
            • Reporter: shebiki Terry Smith
            • Votes: 0
            • Watchers: 5
