Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      While working on LUCENE-6652, I figured out that there were so many test with wrongly implemented TermsToBytesRefAttribute. In addition, the whole concept back from Lucene 4.0 was no longer correct:

      • We don't return the hash code anymore; it is calculated by BytesRefHash
      • The interface is horrible to use. It tends to reuse the BytesRef instance but the whole thing is not correct.

      Instead we should remove the fillBytesRef() method from the interface and let getBytesRef() populate and return the BytesRef. It does not matter if the attribute reuses the BytesRef or returns a new one. It just get consumed like a standard CharTermAttribute. You get a BytesRef and can use it until you call incrementToken().

      As the TermsToBytesRefAttribute is marked experimental, I see no reason why we should not change the semantics to be more easy to understand and behave like all other attributes. I will add a note to the backwards incompatible changes in Lucene 5.3.

      1. LUCENE-6653.patch
        68 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Patch with the cleanup of the attribute.

          For LUCENE-6653, it also contains the removal of test's various variants of Byte(s)TermAttributes. I moved a common variant of the attribute to the lucene/core/.../tokenattributes/BytesTermAttribute

          Show
          Uwe Schindler added a comment - Patch with the cleanup of the attribute. For LUCENE-6653 , it also contains the removal of test's various variants of Byte(s)TermAttributes. I moved a common variant of the attribute to the lucene/core/.../tokenattributes/BytesTermAttribute
          Hide
          Uwe Schindler added a comment -

          All tests pass (Lucene + Solr).

          Show
          Uwe Schindler added a comment - All tests pass (Lucene + Solr).
          Hide
          Michael McCandless added a comment -

          +1, thanks for cleaning up all those dup'd binary token streams Uwe Schindler!

          Show
          Michael McCandless added a comment - +1, thanks for cleaning up all those dup'd binary token streams Uwe Schindler !
          Hide
          Uwe Schindler added a comment -

          Mike, are you also fine with the changes to the TermToBytesRefAttribute? I would backport those and mention the change of "workflow" in the backwards incompatible changes. People will get a compile error in any case if they define own attributes using this interface, but it will for sure not affect many users (maybe only those who wnated to get binary terms), which is now easy

          Show
          Uwe Schindler added a comment - Mike, are you also fine with the changes to the TermToBytesRefAttribute? I would backport those and mention the change of "workflow" in the backwards incompatible changes. People will get a compile error in any case if they define own attributes using this interface, but it will for sure not affect many users (maybe only those who wnated to get binary terms), which is now easy
          Hide
          Robert Muir added a comment -

          +1, this patch is great!

          Show
          Robert Muir added a comment - +1, this patch is great!
          Hide
          Michael McCandless added a comment -

          Mike, are you also fine with the changes to the TermToBytesRefAttribute?

          Yes, big +1 to the new simpler API and to backport hard break to 5.x.

          Show
          Michael McCandless added a comment - Mike, are you also fine with the changes to the TermToBytesRefAttribute? Yes, big +1 to the new simpler API and to backport hard break to 5.x.
          Hide
          ASF subversion and git services added a comment -

          Commit 1688830 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1688830 ]

          LUCENE-6653, LUCENE-6652: Refactor TermToBytesRefAttribute; add oal.analysis.tokenattributes.BytesTermAttribute; remove code duplication in tests

          Show
          ASF subversion and git services added a comment - Commit 1688830 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1688830 ] LUCENE-6653 , LUCENE-6652 : Refactor TermToBytesRefAttribute; add oal.analysis.tokenattributes.BytesTermAttribute; remove code duplication in tests
          Hide
          ASF subversion and git services added a comment -

          Commit 1688845 from Uwe Schindler in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1688845 ]

          Merged revision(s) 1688830 from lucene/dev/trunk:
          LUCENE-6653, LUCENE-6652: Refactor TermToBytesRefAttribute; add oal.analysis.tokenattributes.BytesTermAttribute; remove code duplication in tests

          Show
          ASF subversion and git services added a comment - Commit 1688845 from Uwe Schindler in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1688845 ] Merged revision(s) 1688830 from lucene/dev/trunk: LUCENE-6653 , LUCENE-6652 : Refactor TermToBytesRefAttribute; add oal.analysis.tokenattributes.BytesTermAttribute; remove code duplication in tests
          Hide
          Uwe Schindler added a comment -

          Thanks to Robert Muir and Michael McCandless for review!

          Show
          Uwe Schindler added a comment - Thanks to Robert Muir and Michael McCandless for review!
          Hide
          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release

          Show
          Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development