Index: src/java/org/apache/lucene/analysis/TokenStream.java
===================================================================
--- src/java/org/apache/lucene/analysis/TokenStream.java (revision 829452)
+++ src/java/org/apache/lucene/analysis/TokenStream.java (working copy)
@@ -31,14 +31,14 @@
* A TokenStream enumerates the sequence of tokens, either from
* {@link Field}s of a {@link Document} or from query text.
*
- * This is an abstract class. Concrete subclasses are:
+ * This is an abstract class; concrete subclasses are:
 *
 * {@link Tokenizer}, a TokenStream whose input is a Reader; and
 * {@link TokenFilter}, a TokenStream whose input is another
 * TokenStream.
 * The TokenStream API has been introduced with Lucene 2.9. This API
- * has moved from being {@link Token} based to {@link Attribute} based. While
+ * has moved from being {@link Token}-based to {@link Attribute}-based. While
* {@link Token} still exists in 2.9 as a convenience class, the preferred way
* to store the information of a {@link Token} is to use {@link AttributeImpl}s.
 *
@@ -54,14 +54,14 @@
 *
 * Instantiation of TokenStream/{@link TokenFilter}s which add/get
 * attributes to/from the {@link AttributeSource}.
- * TokenStream
+ * using the TokenStream.
*
* To make sure that filters and consumers know which attributes are available,
* the attributes must be added during instantiation. Filters and consumers are
@@ -72,7 +72,7 @@
* Javadoc.
*
* Sometimes it is desirable to capture a current state of a
- This is an abstract class.
- NOTE: subclasses must override
- {@link #incrementToken()} if the new TokenStream API is used
- and {@link #next(Token)} or {@link #next()} if the old
- TokenStream API is used.
-
+ This is an abstract class; subclasses must override {@link #incrementToken()}.
NOTE: Subclasses overriding {@link #incrementToken()} must
call {@link AttributeSource#clearAttributes()} before
setting attributes.
- Subclasses overriding {@link #next(Token)} must call
- {@link Token#clear()} before setting Token attributes.
*/
-
public abstract class Tokenizer extends TokenStream {
/** The text source for this Tokenizer. */
protected Reader input;
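
For orientation while reviewing these Javadoc changes, here is a minimal sketch of a consumer of the attribute-based API described above. It assumes Lucene 2.9; WhitespaceAnalyzer, TermAttribute and OffsetAttribute are used purely for illustration.

  import java.io.IOException;
  import java.io.StringReader;

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.WhitespaceAnalyzer;
  import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  public class TokenStreamConsumerExample {
    public static void main(String[] args) throws IOException {
      Analyzer analyzer = new WhitespaceAnalyzer();
      TokenStream stream = analyzer.tokenStream("field", new StringReader("the quick brown fox"));

      // Obtain attribute references once; they are updated in place by incrementToken().
      TermAttribute term = stream.addAttribute(TermAttribute.class);
      OffsetAttribute offset = stream.addAttribute(OffsetAttribute.class);

      stream.reset();                    // prepare the stream for consumption
      while (stream.incrementToken()) {  // advance to the next token
        System.out.println(term.term() + " [" + offset.startOffset() + "-" + offset.endOffset() + "]");
      }
      stream.end();                      // end-of-stream operations
      stream.close();                    // release any resources
    }
  }
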
Index: src/java/org/apache/lucene/analysis/Token.java
===================================================================
--- src/java/org/apache/lucene/analysis/Token.java (revision 829452)
+++ src/java/org/apache/lucene/analysis/Token.java (working copy)
@@ -37,7 +37,7 @@
The start and end offsets permit applications to re-associate a token with
its source text, e.g., to display highlighted query terms in a document
- browser, or to show matching text fragments in a KWIC (KeyWord In Context)
+ browser, or to show matching text fragments in a KWIC
display, etc.
The type is a string, assigned by a lexical analyzer
@@ -59,9 +59,9 @@
- Tokenizers and filters should try to re-use a Token
+ Tokenizers and TokenFilters should try to re-use a Token
instance when possible for best performance, by
- implementing the {@link TokenStream#next(Token)} API.
+ implementing the {@link TokenStream#incrementToken()} API.
Failing that, to create a new Token you should first use
one of the constructors that starts with null text. To load
the token from a char[] use {@link #setTermBuffer(char[], int, int)}.
@@ -75,30 +75,30 @@
set the length of the term text. See LUCENE-969
 for details.
- Typical reuse patterns:
+ Typical Token reuse patterns:
TokenStream,
- * e.g. for buffering purposes (see {@link CachingTokenFilter},
+ * e.g., for buffering purposes (see {@link CachingTokenFilter},
* {@link TeeSinkTokenFilter}). For this usecase
* {@link AttributeSource#captureState} and {@link AttributeSource#restoreState}
* can be used.
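
A rough sketch of that captureState/restoreState buffering pattern follows; the RepeatTokenFilter shown here is hypothetical and only illustrates the calls, it is not part of Lucene or of this patch.

  import java.io.IOException;

  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.util.AttributeSource;

  /**
   * Hypothetical filter that emits every token twice by capturing the
   * stream's state and restoring it on the following call.
   */
  public final class RepeatTokenFilter extends TokenFilter {
    private AttributeSource.State saved;   // buffered state of the last token

    public RepeatTokenFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (saved != null) {
        restoreState(saved);               // replay the buffered token's attributes
        saved = null;
        return true;
      }
      if (!input.incrementToken()) {
        return false;
      }
      saved = captureState();              // buffer the current token's attributes
      return true;
    }

    @Override
    public void reset() throws IOException {
      super.reset();
      saved = null;
    }
  }
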
@@ -101,7 +101,7 @@
}
/**
- * Consumers (ie {@link IndexWriter}) use this method to advance the stream to
+ * Consumers (i.e., {@link IndexWriter}) use this method to advance the stream to
* the next token. Implementing classes must implement this method and update
* the appropriate {@link AttributeImpl}s with the attributes of the next
* token.
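
As a concrete illustration of that contract (call clearAttributes() before populating the attributes of the next token), a minimal Tokenizer might look like the sketch below. The WholeInputTokenizer is hypothetical, written against the 2.9 attribute API; it is not code from this patch.

  import java.io.IOException;
  import java.io.Reader;

  import org.apache.lucene.analysis.Tokenizer;
  import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  /** Hypothetical single-token Tokenizer: returns the whole input as one token. */
  public final class WholeInputTokenizer extends Tokenizer {
    private final TermAttribute termAtt = addAttribute(TermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private boolean done = false;

    public WholeInputTokenizer(Reader input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (done) {
        return false;
      }
      done = true;
      clearAttributes();                   // required before setting new attribute values

      // Read the whole Reader into the term text.
      StringBuilder sb = new StringBuilder();
      char[] buf = new char[256];
      int len;
      while ((len = input.read(buf)) != -1) {
        sb.append(buf, 0, len);
      }
      termAtt.setTermBuffer(sb.toString());
      offsetAtt.setOffset(correctOffset(0), correctOffset(sb.length()));
      return sb.length() > 0;
    }

    @Override
    public void reset(Reader input) throws IOException {
      super.reset(input);
      done = false;
    }
  }
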
Index: src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java
===================================================================
--- src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java (revision 829452)
+++ src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java (working copy)
@@ -53,7 +53,7 @@
d.add(new Field("f3", final3));
d.add(new Field("f4", final4));
*
- * In this example, sink1 and sink2 will both get tokens from both
+ * In this example, sink1 and sink2 will both get tokens from both
* reader1 and reader2 after whitespace tokenizer
* and now we can further wrap any of these in extra analysis, and more "sources" can be inserted if desired.
* It is important, that tees are consumed before sinks (in the above example, the field names must be
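
For readers of this hunk, the wiring that the fragment above belongs to looks roughly as follows. This is a sketch assuming the 2.9 TeeSinkTokenFilter API; LowerCaseFilter and StandardFilter merely stand in for whatever further analysis is wanted.

  import java.io.Reader;

  import org.apache.lucene.analysis.LowerCaseFilter;
  import org.apache.lucene.analysis.TeeSinkTokenFilter;
  import org.apache.lucene.analysis.TeeSinkTokenFilter.SinkTokenStream;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.WhitespaceTokenizer;
  import org.apache.lucene.analysis.standard.StandardFilter;

  public class TeeSinkExample {
    public static void wire(Reader reader1, Reader reader2) {
      // Each tee splits one token source into the main stream plus any number of sinks.
      TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader1));
      TeeSinkTokenFilter source2 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader2));

      // sink1 and sink2 receive tokens from both source1 and source2.
      SinkTokenStream sink1 = source1.newSinkTokenStream();
      SinkTokenStream sink2 = source1.newSinkTokenStream();
      source2.addSinkTokenStream(sink1);
      source2.addSinkTokenStream(sink2);

      // Sources and sinks can each be wrapped in further analysis.
      TokenStream final1 = new LowerCaseFilter(source1);
      TokenStream final2 = source2;
      TokenStream final3 = new LowerCaseFilter(sink1);
      TokenStream final4 = new StandardFilter(sink2);

      // final1..final4 would then be added to the Document as fields f1..f4, as in the
      // Javadoc fragment above; the tee fields (f1, f2) must be consumed before the
      // sink fields (f3, f4).
    }
  }
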
Index: src/java/org/apache/lucene/analysis/Tokenizer.java
===================================================================
--- src/java/org/apache/lucene/analysis/Tokenizer.java (revision 829452)
+++ src/java/org/apache/lucene/analysis/Tokenizer.java (working copy)
@@ -24,20 +24,14 @@
/** A Tokenizer is a TokenStream whose input is a Reader.
-
-
+
return reusableToken.reinit(string, startOffset, endOffset[, type]);
+
return reusableToken.reinit(string, 0, string.length(), startOffset, endOffset[, type]);
return reusableToken.reinit(buffer, 0, buffer.length, startOffset, endOffset[, type]);
return reusableToken.reinit(buffer, start, end - start, startOffset, endOffset[, type]);
return reusableToken.reinit(source.termBuffer(), 0, source.termLength(), source.startOffset(), source.endOffset()[, source.type()]);
@@ -108,7 +108,7 @@
 TokenStreams can be chained, one cannot assume that the Token's current type is correct.
- This is an abstract class.
- NOTE: subclasses must override
- {@link #incrementToken()} if the new TokenStream API is used
- and {@link #next(Token)} or {@link #next()} if the old
- TokenStream API is used.
-
- See {@link TokenStream}
+ This is an abstract class; subclasses must override {@link #incrementToken()}.
+ @see TokenStream
 */
public abstract class TokenFilter extends TokenStream {
  /** The source of tokens for this filter. */
Index: src/java/org/apache/lucene/analysis/CharArraySet.java
===================================================================
--- src/java/org/apache/lucene/analysis/CharArraySet.java (revision 829452)
+++ src/java/org/apache/lucene/analysis/CharArraySet.java (working copy)
@@ -32,7 +32,7 @@
 * is in the set without the necessity of converting it
 * to a String first.
 *
- * Please note: This class implements {@link Set} but
+ * Please note: This class implements {@link java.util.Set Set} but
 * does not behave like it should in all cases. The generic type is
 * {@code Set