Lucene - Core / LUCENE-2145

TokenStream.close() is called multiple times per TokenStream instance

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.9, 2.9.1, 3.0
    • None
    • None
    • Environment: Solr 1.4.0
    • Lucene Fields: New

    Description

      I have a Tokenizer that uses an external resource, and I wrote it so that the resource is released in its close() method.
      This should work because close() is supposed to be called when the caller is done with the TokenStream (Tokenizer is a subclass of TokenStream). The TokenStream API documentation <http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/analysis/TokenStream.html> states:

      6. The consumer calls close() to release any resource when finished using the TokenStream. 
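      The lifecycle this report relies on can be modeled without Lucene. The sketch below is a hypothetical, Lucene-free stand-in: ExternalResource, ResourceTokenizer and ConsumerWorkflow are illustrative names, not Lucene classes, and reset(String), incrementToken() and term() only loosely mirror the real API. It assumes that the resource cannot be reacquired once released, which is the situation described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the external resource a Tokenizer depends on
// (for example a native dictionary handle). Not a Lucene class.
class ExternalResource {
    private boolean open = true;

    boolean isOpen() { return open; }

    void release() { open = false; }

    // Using the resource after release() is a bug, so fail loudly.
    String[] split(String text) {
        if (!open) {
            throw new IllegalStateException("resource already released");
        }
        String t = text.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }
}

// Hypothetical, Lucene-free model of a Tokenizer that releases its
// resource in close(), as described in this report.
class ResourceTokenizer {
    private final ExternalResource resource = new ExternalResource();
    private String[] tokens = new String[0];
    private int pos = 0;
    private String term;

    // Loosely mirrors resetting a Tokenizer onto new input.
    void reset(String text) {
        tokens = resource.split(text);
        pos = 0;
    }

    // Loosely mirrors TokenStream.incrementToken().
    boolean incrementToken() {
        if (pos >= tokens.length) {
            return false;
        }
        term = tokens[pos++];
        return true;
    }

    String term() { return term; }

    // Step 6 of the documented workflow: release resources when finished.
    void close() { resource.release(); }

    boolean isResourceOpen() { return resource.isOpen(); }
}

class ConsumerWorkflow {
    // Runs the consumer workflow once (reset, iterate, close) and returns the terms.
    static List<String> consume(ResourceTokenizer ts, String text) {
        List<String> terms = new ArrayList<>();
        ts.reset(text);
        while (ts.incrementToken()) {
            terms.add(ts.term());
        }
        ts.close();
        return terms;
    }
}
```

      Under these assumptions, consume(ts, "reusable token stream") returns the three terms and leaves the resource released; any later ts.reset(...) on the same instance throws IllegalStateException.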
      

      When I used my Tokenizer from Solr 1.4.0, it did not work as expected. Error analysis suggests that an instance of my Tokenizer is used even after close() has been called and the external resource has been released. After further analysis, it seems that it is not Solr but Lucene itself that is breaking the contract.

      This is happening in two places.

      src/java/org/apache/lucene/queryParser/QueryParser.java:

      protected Query getFieldQuery(String field, String queryText) throws ParseException {
        // Use the analyzer to get all the tokens, and then build a TermQuery,
        // PhraseQuery, or nothing based on the term count

        TokenStream source;
        try {
          source = analyzer.reusableTokenStream(field, new StringReader(queryText));
          source.reset();
        ...
        try {
          // rewind the buffer stream
          buffer.reset();

          // close original stream - all tokens buffered
          source.close(); // <---- HERE
        }
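      The QueryParser path can be reduced to a small model, assuming (as this report does) that reusableTokenStream() hands back the same cached instance on every call. ClosingTokenizer, CachingAnalyzer and QueryParserModel are hypothetical names, and the model keeps only the "buffer every token, then close the source" shape of getFieldQuery().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free stand-in: a tokenizer whose external resource
// is released by close() and cannot be reacquired afterwards.
class ClosingTokenizer {
    private boolean resourceOpen = true;
    private String[] tokens = new String[0];
    private int pos;

    void reset(String text) {
        if (!resourceOpen) {
            throw new IllegalStateException("tokenizer reused after close()");
        }
        String t = text.trim();
        tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
        pos = 0;
    }

    // Returns the next token, or null when the input is exhausted.
    String next() { return pos < tokens.length ? tokens[pos++] : null; }

    void close() { resourceOpen = false; }
}

// Hypothetical analyzer that, like a reusable token stream cache,
// returns the same instance on every call.
class CachingAnalyzer {
    private final ClosingTokenizer cached = new ClosingTokenizer();

    ClosingTokenizer reusableTokenStream(String text) {
        cached.reset(text);
        return cached;
    }
}

class QueryParserModel {
    // Mirrors the shape of getFieldQuery(): buffer every token,
    // then close the source stream because all tokens are buffered.
    static List<String> analyze(CachingAnalyzer analyzer, String queryText) {
        ClosingTokenizer source = analyzer.reusableTokenStream(queryText);
        List<String> buffer = new ArrayList<>();
        for (String tok = source.next(); tok != null; tok = source.next()) {
            buffer.add(tok);
        }
        source.close(); // the flagged call: the analyzer still caches `source`
        return buffer;
    }
}
```

      In this model the first call to QueryParserModel.analyze succeeds, and the second call on the same analyzer throws IllegalStateException inside reusableTokenStream, which is the use-after-close described above.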

      src/java/org/apache/lucene/index/DocInverterPerField.java:

      public void processFields(final Fieldable[] fields,
                                final int count) throws IOException {
        ...
          } finally {
            stream.close();
          }
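      The same effect can be sketched for the try/finally in processFields(), which, per the excerpt above, closes the stream after each field instance. The model assumes a document with several values for one field name, all analyzed through one cached stream; OneShotTokenizer, CachedStreamAnalyzer and InverterModel are hypothetical names, and the real inversion logic is omitted.

```java
// Hypothetical, Lucene-free stand-in for a tokenizer with a one-shot resource.
class OneShotTokenizer {
    private boolean resourceOpen = true;
    private String[] tokens = new String[0];
    private int pos;

    void reset(String text) {
        if (!resourceOpen) {
            throw new IllegalStateException("tokenizer reused after close()");
        }
        String t = text.trim();
        tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
        pos = 0;
    }

    // Returns the next token, or null when the input is exhausted.
    String next() { return pos < tokens.length ? tokens[pos++] : null; }

    void close() { resourceOpen = false; }
}

// Hypothetical analyzer that always returns the same cached instance.
class CachedStreamAnalyzer {
    private final OneShotTokenizer cached = new OneShotTokenizer();

    OneShotTokenizer reusableTokenStream(String text) {
        cached.reset(text);
        return cached;
    }
}

class InverterModel {
    // Mirrors the try/finally shape of processFields(): one close() per field value.
    static int processFields(CachedStreamAnalyzer analyzer, String... fieldValues) {
        int tokenCount = 0;
        for (String value : fieldValues) {
            OneShotTokenizer stream = analyzer.reusableTokenStream(value);
            try {
                while (stream.next() != null) {
                    tokenCount++;
                }
            } finally {
                stream.close(); // runs even though the analyzer still caches `stream`
            }
        }
        return tokenCount;
    }
}
```

      With a single field value the model counts its tokens normally; with two values for the same field, the second reusableTokenStream call fails because the finally block already closed the cached instance.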

      Calling close() is fine if the TokenStream is not a reusable one. But when it is reusable, it may be used again, so the resource associated with the TokenStream instance should not be released. close() should be called selectively, only when the caller knows that the stream will not be reused.
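      One way to call close() selectively can be sketched under stated assumptions: the consumer closes only streams it obtained as one-off instances, and the owner of the cached instance releases that resource when it is itself closed. This is a hypothetical design, not Lucene's API; SharedTokenizer, OwningAnalyzer, isReusable() and SelectiveConsumer are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: a stream that knows whether an owner will hand it out again.
class SharedTokenizer {
    private boolean resourceOpen = true;
    private final boolean reusable;
    private String[] tokens = new String[0];
    private int pos;

    SharedTokenizer(boolean reusable) { this.reusable = reusable; }

    boolean isReusable() { return reusable; }

    boolean isResourceOpen() { return resourceOpen; }

    void reset(String text) {
        if (!resourceOpen) {
            throw new IllegalStateException("reused after close()");
        }
        String t = text.trim();
        tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
        pos = 0;
    }

    // Returns the next token, or null when the input is exhausted.
    String next() { return pos < tokens.length ? tokens[pos++] : null; }

    void close() { resourceOpen = false; }
}

// Hypothetical owner of the cached instance; it, not the consumer,
// releases the cached resource once no further reuse is possible.
class OwningAnalyzer {
    private final SharedTokenizer cached = new SharedTokenizer(true);

    SharedTokenizer reusableTokenStream(String text) {
        cached.reset(text);
        return cached;
    }

    SharedTokenizer tokenStream(String text) {
        SharedTokenizer fresh = new SharedTokenizer(false);
        fresh.reset(text);
        return fresh;
    }

    void close() { cached.close(); }
}

class SelectiveConsumer {
    // Consume all tokens; close only streams nobody will hand out again.
    static List<String> consume(SharedTokenizer stream) {
        List<String> terms = new ArrayList<>();
        for (String tok = stream.next(); tok != null; tok = stream.next()) {
            terms.add(tok);
        }
        if (!stream.isReusable()) {
            stream.close();
        }
        return terms;
    }
}
```

      The design choice here is to move the release of a cached instance's resource to whoever owns the cache, so a consumer never has to guess whether a stream will be handed out again.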


            People

              Assignee: Unassigned
              Reporter: Kuro Kurosaka (tkurosaka)
              Votes: 1
              Watchers: 2
