Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6682

StandardTokenizer performance bug: buffer is unnecessarily copied when maxTokenLength doesn't change

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      From Piotr Idzikowski on java-user mailing list http://markmail.org/message/af26kr7fermt2tfh:

      I am developing own analyzer based on StandardAnalyzer.
      I realized that tokenizer.setMaxTokenLength is called many times.

      protected TokenStreamComponents createComponents(final String fieldName,
      final Reader reader) {
          final StandardTokenizer src = new StandardTokenizer(getVersion(),
      reader);
          src.setMaxTokenLength(maxTokenLength);
          TokenStream tok = new StandardFilter(getVersion(), src);
          tok = new LowerCaseFilter(getVersion(), tok);
          tok = new StopFilter(getVersion(), tok, stopwords);
          return new TokenStreamComponents(src, tok) {
            @Override
            protected void setReader(final Reader reader) throws IOException {
              src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
              super.setReader(reader);
            }
          };
        }
      

      Does it make sense if length stays the same? I see it finally calls this
      one( in StandardTokenizerImpl ):

      public final void setBufferSize(int numChars) {
           ZZ_BUFFERSIZE = numChars;
           char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
           System.arraycopy(zzBuffer, 0, newZzBuffer, 0,
      Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
           zzBuffer = newZzBuffer;
         }
      

      So it just copies old array content into the new one.

        Attachments

          Activity

            People

            • Assignee:
              steve_rowe Steve Rowe
              Reporter:
              steve_rowe Steve Rowe
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: