Lucene - Core
  1. Lucene - Core
  2. LUCENE-5097

Add utility method to Analyzer: public final TokenStream tokenStream(String fieldName,String text)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.3.1
    • Fix Version/s: 4.4, 6.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It might be a good idea to remove tons of useless code from tests:
      Most people use TokenStreams and Analyzers by only passing a String, wrapped by a StringReader. It would make life easier, if Analyzer would have an additional public (and final!!!) method that simply does the wrapping with StringReader by itsself. It might maybe not even needed to throw IOException (not sure)

      1. LUCENE-5097.patch
        110 kB
        Uwe Schindler
      2. LUCENE-5097.patch
        15 kB
        Uwe Schindler
      3. LUCENE-5097.patch
        13 kB
        Uwe Schindler

        Activity

        Hide
        Robert Muir added a comment -

        +1

        Show
        Robert Muir added a comment - +1
        Hide
        Uwe Schindler added a comment -

        Another suggestion here:
        Currently we have a crazy reuseable reader in Field.java. This one could go away, instead the Analyzer would store a resuseable reader in TokenStreamComponents/the TS cache. Field.java would be simplier as it would just call this method to get the TS from a String field.

        Show
        Uwe Schindler added a comment - Another suggestion here: Currently we have a crazy reuseable reader in Field.java. This one could go away, instead the Analyzer would store a resuseable reader in TokenStreamComponents/the TS cache. Field.java would be simplier as it would just call this method to get the TS from a String field.
        Hide
        Uwe Schindler added a comment -

        Quick patch for demonstration purposes:

        • Moved the ReusableStringReader out of Field.java to the analysis package (pkg-private - could also be a inner class in Analyzer; I did this because I wanted a separate test)
        • added a second tokenStream method that lazy inits the reusable reader and stores it in a hidden transient field of TokenStreamComponents

        This is all still a little bit hackish, but shows my idea. By this you can reuse the StringReader (without synchronization cost) and we dont need extra code in Field.java handling the field reuse.

        Show
        Uwe Schindler added a comment - Quick patch for demonstration purposes: Moved the ReusableStringReader out of Field.java to the analysis package (pkg-private - could also be a inner class in Analyzer; I did this because I wanted a separate test) added a second tokenStream method that lazy inits the reusable reader and stores it in a hidden transient field of TokenStreamComponents This is all still a little bit hackish, but shows my idea. By this you can reuse the StringReader (without synchronization cost) and we dont need extra code in Field.java handling the field reuse.
        Hide
        Uwe Schindler added a comment -

        New patch making BaseTokenStreamTestcase use this method

        Show
        Uwe Schindler added a comment - New patch making BaseTokenStreamTestcase use this method
        Hide
        Robert Muir added a comment -

        I just took a quick glance and this looks fantastic

        Show
        Robert Muir added a comment - I just took a quick glance and this looks fantastic
        Hide
        Uwe Schindler added a comment -

        Patch removing all new StringReader(...= where not needed. It got huge, so I want to commit this asap, once tests are done.

        Show
        Uwe Schindler added a comment - Patch removing all new StringReader(...= where not needed. It got huge, so I want to commit this asap, once tests are done.
        Hide
        ASF subversion and git services added a comment -

        Commit 1500862 from Uwe Schindler
        [ https://svn.apache.org/r1500862 ]

        LUCENE-5097: Analyzer now has an additional tokenStream(String fieldName, String text) method, so wrapping by StringReader for common use is no longer needed. This method uses an internal reuseable reader, which was previously only used by the Field class.

        Show
        ASF subversion and git services added a comment - Commit 1500862 from Uwe Schindler [ https://svn.apache.org/r1500862 ] LUCENE-5097 : Analyzer now has an additional tokenStream(String fieldName, String text) method, so wrapping by StringReader for common use is no longer needed. This method uses an internal reuseable reader, which was previously only used by the Field class.
        Hide
        ASF subversion and git services added a comment -

        Commit 1500864 from Uwe Schindler
        [ https://svn.apache.org/r1500864 ]

        Merged revision(s) 1500862 from lucene/dev/trunk:
        LUCENE-5097: Analyzer now has an additional tokenStream(String fieldName, String text) method, so wrapping by StringReader for common use is no longer needed. This method uses an internal reuseable reader, which was previously only used by the Field class.

        Show
        ASF subversion and git services added a comment - Commit 1500864 from Uwe Schindler [ https://svn.apache.org/r1500864 ] Merged revision(s) 1500862 from lucene/dev/trunk: LUCENE-5097 : Analyzer now has an additional tokenStream(String fieldName, String text) method, so wrapping by StringReader for common use is no longer needed. This method uses an internal reuseable reader, which was previously only used by the Field class.
        Hide
        Uwe Schindler added a comment -

        Thanks Robert for help & discussion!

        Show
        Uwe Schindler added a comment - Thanks Robert for help & discussion!
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Uwe Schindler
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development