SOLR-1662: BufferedTokenStream incorrect cloning

      Description

      As part of writing tests for SOLR-1657, I rewrote one of the base classes (BaseTokenTestCase) to use the new TokenStream API, and added some extra safety as well.

       public static String tsToString(TokenStream in) throws IOException {
          StringBuilder out = new StringBuilder();
          TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
          // extra safety: clear the attributes and assign bogus values, so that a
          // stream which preserves state without cloning shows up as a failure
          in.clearAttributes();
          termAtt.setTermBuffer("bogusTerm");
          while (in.incrementToken()) {
            if (out.length() > 0)
              out.append(' ');
            out.append(termAtt.term());
            in.clearAttributes();
            termAtt.setTermBuffer("bogusTerm");
          }

          in.close();
          return out.toString();
        }

      Setting the term text to bogus values helps find bugs in token streams that do not clear or clone properly. In this case the problem is with AB_AAB_Stream, a token stream in TestBufferedTokenStream: it converts A B -> A A B but does not clone, so the values get overwritten.
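
      As a rough illustration (not the exact code from TestBufferedTokenStream), the problematic pattern looks something like the sketch below, assuming the process()/write() API of BufferedTokenStream described in the comments:

       // Hypothetical sketch: on seeing "A", the stream buffers the very same
       // Token instance it is about to return, so both "copies" share state and
       // a consumer that overwrites the returned token also corrupts the queue.
       public static class AB_AAB_Stream extends BufferedTokenStream {
         public AB_AAB_Stream(TokenStream input) { super(input); }

         protected Token process(Token t) throws IOException {
           if ("A".equals(t.term()))
             write(t);   // BUG: should buffer a clone, not the returned instance
           return t;
         }
       }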

      This can be fixed in two ways:

      • BufferedTokenStream does the cloning
      • subclasses are responsible for the cloning

      The question is: which one should it be?

          Activity

          Uwe Schindler added a comment -

          Just the short description from the API side in Lucene:
          Lucene's documentation of TokenStream.next() says: "The returned Token is a "full private copy" (not re-used across calls to next())". AB_AAB_Stream.process() duplicates the token by just putting it uncloned into the outQueue. Since the consumer of the BufferedTokenStream assumes that the Token is private, it is allowed to change it - and by doing so it also changes the token in the outQueue. If you e.g. put another TokenFilter in front of this AB_AAB_Stream and modify the token there, it would break.
          In my opinion, the responsibility to clone lies with AB_AAB_Stream; BufferedTokenStream will never return the same token twice by itself. So it is a bug in the test. But Robert told me that e.g. RemoveDuplicates has a similar problem.
          The general contract for writing such streams is: whenever you return a Token from next(), never put it somewhere else uncloned, because the caller can change it.

          The fix is to do: write((Token) t.clone());
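
          Applied to the sketch of AB_AAB_Stream above (still a hypothetical reconstruction, not the actual test source), that one-line fix would look like this:

           protected Token process(Token t) throws IOException {
             if ("A".equals(t.term()))
               write((Token) t.clone());  // buffer a private copy; the returned token stays independent
             return t;
           }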

          Robert Muir added a comment -

          but Robert told me that e.g. RemoveDuplicates has a similar problem.

          Right, there is no cloning in RemoveDuplicates. CommonGrams creates a new Token() when it grams, but it's not clear that this one is correct either.

          So if we decide it's the responsibility of the subclass, these implementations need thorough tests to see whether they are OK or not.
          If we add the cloning to BufferedTokenStream itself, then we know they are OK...
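
          For comparison, pushing the cloning into BufferedTokenStream itself would roughly mean a defensive write() - a sketch against an assumed outQueue field, not the actual Solr source:

           // Hypothetical defensive variant: every buffered token is cloned, so a
           // subclass can never accidentally share an instance with its caller,
           // at the cost of a clone even when the token is ultimately thrown away.
           protected void write(Token t) {
             outQueue.add((Token) t.clone());
           }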

          Shalin Shekhar Mangar added a comment -

          So if we decide it's the responsibility of the subclass, these implementations need thorough tests to see whether they are OK or not.
          If we add the cloning to BufferedTokenStream itself, then we know they are OK...

          I think cloning should be done by subclasses before writing. If BufferedTokenStream clones the token, then every subclass pays the price even though the use case may just be to throw the token away.

          Uwe Schindler added a comment -

          I think cloning should be done by subclasses before writing. If BufferedTokenStream clones the token, then every subclass pays the price even though the use case may just be to throw the token away.

          +1, that was what I said in my first comment, too: BufferedTokenStream itself never reuses the token. The problem is the test and RemoveDuplicates, which push the same instance twice into the queue.

          Robert Muir added a comment -

          Cool, I will work up a patch with javadoc wording (I think we need this at least - it's not obvious and there is no mention of it) and any fixes.

          Robert Muir added a comment - edited

          Attached is a patch to fix the javadocs and the test (this same test is also an example in the javadocs header...).

          Edit: also, by the way, I finished converting the tests to assertTokenStreamContents for the other BufferedTokenStreams; there are no problems in DuplicateFilter or CommonGrams, so this should be all we need.
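
          For context, a test converted to assertTokenStreamContents would read roughly like the sketch below; the exact helper signature and tokenizer construction depend on the base test class and Lucene version in use, so treat them as assumptions:

           // Hypothetical converted test: feed "A B" through the stream and assert
           // the full expected term sequence in one call, instead of concatenating
           // terms by hand as tsToString() does.
           public void testABAAB() throws Exception {
             TokenStream ts = new AB_AAB_Stream(
                 new WhitespaceTokenizer(new StringReader("A B")));
             assertTokenStreamContents(ts, new String[] { "A", "A", "B" });
           }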

          Uwe Schindler added a comment -

          +1 Looks good!

          Shalin Shekhar Mangar added a comment -

          Committed revision 891889.

          Thanks Robert and Uwe!

          Hoss Man added a comment -

          Correcting Fix Version based on CHANGES.txt; see this thread for more details...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Hoss Man added a comment -

          Committed revision 949472.

          Merged to branch-1.4 for 1.4.1.


            People

            • Assignee: Shalin Shekhar Mangar
            • Reporter: Robert Muir