+ An alternative set of classes for highlighting matching terms in search results.
+ The repository for the Highlighter2 contribution.
+Javascript library to support client-side query-building. Provides support for a user interface similar to
Index: docs/benchmarks.html
===================================================================
--- docs/benchmarks.html (revision 791907)
+++ docs/benchmarks.html (working copy)
@@ -50,7 +50,7 @@
+-->
+ An alternative set of classes for highlighting matching terms in search results.
+The repository for the Highlighter2 contribution.
Javascript library to support client-side query-building. Provides support for a user interface similar to
@@ -394,7 +406,7 @@ repository for the Javascript Query Constructor files.
-
+Javascript library to support client-side query validation. Lucene doesn't like malformed queries and tends to
@@ -408,7 +420,7 @@ repository for the Javascript Query Validator files.
-
+The miscellaneous package is for classes that don't fit anywhere else. The only class in it right now determines
@@ -422,7 +434,7 @@ repository for miscellaneous classes.
-
+RAM-based index that enables much faster searching than RAMDirectory.
Index: docs/scoring.html
===================================================================
--- docs/scoring.html (revision 791907)
+++ docs/scoring.html (working copy)
@@ -52,7 +52,7 @@
+-->
+/**
+ * FieldTermStack is a stack that keeps query terms in the specified field
+ * of the document to be highlighted.
+ */
+public class FieldTermStack {
+
+ private final String fieldName;
+  LinkedList
+To explain the algorithm, let's use the following sample text
+(to be highlighted) and user query:
+| Sample Text | Lucene is a search engine library. |
+| User Query  | Lucene^2 OR "search library"~1     |
+The user query is a BooleanQuery that consists of TermQuery("Lucene")
+with a boost of 2 and PhraseQuery("search library") with a slop of 1.
+For your convenience, here are the offsets and positions of the
+sample text:
++--------+-----------------------------------+
+|        |          1111111111222222222233333|
+|  offset|01234567890123456789012345678901234|
++--------+-----------------------------------+
+|document|Lucene is a search engine library. |
++--------*-----------------------------------+
+|position|0      1  2 3      4      5        |
++--------*-----------------------------------+
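The offset and position rows in the table above can be reproduced with a few lines of Java. This is a minimal sketch, assuming a hypothetical helper class, SampleTokens, which is not part of H2. It treats each maximal run of letters as one token. A real Lucene analyzer may split, normalize, or drop tokens differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of H2: a naive tokenizer that treats each
// maximal run of letters as one token and numbers tokens from position 0.
final class SampleTokens {

    /** A token with its start offset (inclusive), end offset (exclusive) and position. */
    static final class Token {
        final String text;
        final int start;
        final int end;
        final int position;

        Token(String text, int start, int end, int position) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.position = position;
        }

        @Override
        public String toString() {
            return "\"" + text + "\"(" + start + "," + end + "," + position + ")";
        }
    }

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++; // skip spaces and punctuation such as the final '.'
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i), start, i, position++));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Lucene is a search engine library.")) {
            System.out.println(t);
        }
    }
}
```

Running main prints the following lines, one per token:

- "Lucene"(0,6,0)
- "is"(7,9,1)
- "a"(10,11,2)
- "search"(12,18,3)
- "engine"(19,25,4)
- "library"(26,33,5)

This is the same (startOffset,endOffset,position) notation used by the FieldTermStack in Step 2. End offsets are exclusive, which is why "Lucene" ends at 6 rather than 5.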
+In Step 1, H2 generates {@link org.apache.lucene.search.highlight2.FieldQuery.QueryPhraseMap} from the user query.
+QueryPhraseMap consists of the following members:
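+import java.util.Map;
+
+public class QueryPhraseMap {
+  boolean terminal;  // true if a query term or phrase ends at this map
+  int slop;          // valid if terminal == true and phraseHighlight == true
+  float boost;       // valid if terminal == true
+  Map<String, QueryPhraseMap> subMap;  // term text -> subsequent QueryPhraseMap
+}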
+
+Each QueryPhraseMap has a subMap. The key of the subMap is a term
+text in the user query, and the value is the subsequent QueryPhraseMap.
+If the query is a term (not a phrase), the subsequent QueryPhraseMap
+is marked as terminal. If the query is a phrase, the subsequent QueryPhraseMap
+is not terminal. Instead, its subMap holds the next term text in the phrase.
+Only the map reached after the phrase's last term is terminal.
+From the sample user query, the following QueryPhraseMap
+will be generated:
+  QueryPhraseMap
++--------+-+  +-------+-+
+|"Lucene"|o+->|boost=2|*|  * : terminal
++--------+-+  +-------+-+
+
++--------+-+  +---------+-+  +-------+------+-+
+|"search"|o+->|"library"|o+->|boost=1|slop=1|*|
++--------+-+  +---------+-+  +-------+------+-+
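Each subMap key leads to the next QueryPhraseMap, so the structure behaves like a trie keyed by term text. The sketch below shows one way such a map could be built for the sample query.

It is a hypothetical illustration. PhraseMapSketch, its Node class, and the add method are illustrative names whose fields only mirror the QueryPhraseMap members listed above. H2's own construction, including the phraseHighlight option, is not reproduced.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: field names mirror QueryPhraseMap's members, but this
// class and its add() method are illustrative, not H2's API.
final class PhraseMapSketch {

    static final class Node {
        boolean terminal;  // true if a query term or phrase ends here
        int slop;          // meaningful only when terminal
        float boost;       // meaningful only when terminal
        final Map<String, Node> subMap = new HashMap<>();
    }

    /** Adds a term (one-element list) or a phrase (several terms) under root. */
    static void add(Node root, List<String> terms, float boost, int slop) {
        Node node = root;
        for (String term : terms) {
            // reuse the existing child for this term text, or create it
            node = node.subMap.computeIfAbsent(term, k -> new Node());
        }
        // the map reached after the last term marks the end of the query
        node.terminal = true;
        node.boost = boost;
        node.slop = slop;
    }

    public static void main(String[] args) {
        Node root = new Node();
        add(root, List.of("Lucene"), 2f, 0);             // TermQuery("Lucene"), boost 2
        add(root, List.of("search", "library"), 1f, 1); // PhraseQuery, slop 1
        Node lucene = root.subMap.get("Lucene");
        Node library = root.subMap.get("search").subMap.get("library");
        System.out.println("Lucene: terminal=" + lucene.terminal + " boost=" + lucene.boost);
        System.out.println("search library: terminal=" + library.terminal
                + " boost=" + library.boost + " slop=" + library.slop);
    }
}
```

For the sample query it prints "Lucene: terminal=true boost=2.0" and "search library: terminal=true boost=1.0 slop=1". The intermediate "search" map stays non-terminal, as in the diagram.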
+In Step 2, H2 generates {@link org.apache.lucene.search.highlight2.FieldTermStack}.
+H2 uses {@link org.apache.lucene.index.TermFreqVector} data to generate it, so the field's
+term vectors must be stored {@link org.apache.lucene.document.Field.TermVector#WITH_POSITIONS_OFFSETS}.
+FieldTermStack keeps only the terms of the field that also appear in the user query,
+together with their offsets and positions. In this sample case, "is", "a", and "engine"
+are dropped, so H2 generates the following FieldTermStack:
+   FieldTermStack
++------------------+
+|"Lucene"(0,6,0)   |
++------------------+
+|"search"(12,18,3) |
++------------------+
+|"library"(26,33,5)|
++------------------+
+where : "termText"(startOffset,endOffset,position)
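The filtering described in Step 2 can be sketched as follows. This is a hypothetical illustration with these assumptions:

- The term vector is simulated by a hard-coded list instead of being read from TermFreqVector data.
- TermStackSketch, TermInfo, and build are illustrative names.
- Query terms are kept in position order, so the top of the stack is the first query term in the field.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of Step 2; the term vector is simulated as a plain list.
final class TermStackSketch {

    static final class TermInfo {
        final String text;
        final int start;    // start offset (inclusive)
        final int end;      // end offset (exclusive)
        final int position;

        TermInfo(String text, int start, int end, int position) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.position = position;
        }

        @Override
        public String toString() {
            return "\"" + text + "\"(" + start + "," + end + "," + position + ")";
        }
    }

    /** Keeps only terms that occur in the query; the head of the deque has the lowest position. */
    static Deque<TermInfo> build(List<TermInfo> termVector, Set<String> queryTerms) {
        return termVector.stream()
                .filter(t -> queryTerms.contains(t.text))
                .sorted(Comparator.comparingInt(t -> t.position))
                .collect(Collectors.toCollection(ArrayDeque::new));
    }

    public static void main(String[] args) {
        List<TermInfo> termVector = List.of(
                new TermInfo("Lucene", 0, 6, 0), new TermInfo("is", 7, 9, 1),
                new TermInfo("a", 10, 11, 2), new TermInfo("search", 12, 18, 3),
                new TermInfo("engine", 19, 25, 4), new TermInfo("library", 26, 33, 5));
        Deque<TermInfo> stack = build(termVector, Set.of("Lucene", "search", "library"));
        while (!stack.isEmpty()) {
            System.out.println(stack.pop()); // head of the deque first
        }
    }
}
```

Popping the resulting stack yields "Lucene"(0,6,0), "search"(12,18,3), and "library"(26,33,5), in that order.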
+In Step 3, H2 generates {@link org.apache.lucene.search.highlight2.FieldPhraseList}
+by matching the terms in the FieldTermStack against the QueryPhraseMap.
+   FieldPhraseList
++----------------+-----------------+---+
+|"Lucene"        |[(0,6)]          |w=2|
++----------------+-----------------+---+
+|"search library"|[(12,18),(26,33)]|w=1|
++----------------+-----------------+---+
+Each entry is a WeightedPhraseInfo, which consists of an array of
+term offsets and a weight. H2 calculates the weight from the query boost,
+and the weight is taken into account when H2 creates
+{@link org.apache.lucene.search.highlight2.FieldFragList} in the next step.
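Step 3 can be sketched as walking the query trie over consecutive entries of the FieldTermStack. At each starting term, the sketch keeps the longest match that ends on a terminal map.

This is a hypothetical illustration rather than H2's implementation, with these assumptions:

- PhraseListSketch and its nested classes are illustrative names.
- WeightedPhrase only stands in for WeightedPhraseInfo.
- The slop test is a simplification, not Lucene's exact slop semantics. A phrase is accepted if at most slop positions are skipped between consecutive terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of Step 3; class and method names are illustrative, not H2's API.
// Assumed slop rule: at most `slop` positions may be skipped between consecutive phrase terms.
final class PhraseListSketch {
    static final class Node {
        boolean terminal;
        int slop;
        float boost;
        final Map<String, Node> subMap = new HashMap<>();
    }
    static final class TermInfo {
        final String text;
        final int start, end, position;
        TermInfo(String text, int start, int end, int position) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.position = position;
        }
    }
    static final class WeightedPhrase { // stand-in for WeightedPhraseInfo
        final List<TermInfo> terms;
        final float weight;
        WeightedPhrase(List<TermInfo> terms, float weight) {
            this.terms = terms;
            this.weight = weight;
        }
        @Override
        public String toString() {
            String text = terms.stream().map(t -> t.text).collect(Collectors.joining(" "));
            String offsets = terms.stream().map(t -> "(" + t.start + "," + t.end + ")")
                    .collect(Collectors.joining(",", "[", "]"));
            return "\"" + text + "\"" + offsets + " w=" + weight;
        }
    }

    static void add(Node root, List<String> terms, float boost, int slop) {
        Node node = root;
        for (String term : terms) {
            node = node.subMap.computeIfAbsent(term, k -> new Node());
        }
        node.terminal = true;
        node.boost = boost;
        node.slop = slop;
    }

    static boolean withinSlop(List<TermInfo> candidate, int slop) {
        for (int i = 1; i < candidate.size(); i++) {
            if (candidate.get(i).position - candidate.get(i - 1).position - 1 > slop) {
                return false;
            }
        }
        return true;
    }

    static List<WeightedPhrase> build(Node root, List<TermInfo> stack) {
        List<WeightedPhrase> phrases = new ArrayList<>();
        int i = 0;
        while (i < stack.size()) { // find the longest valid match starting at stack[i]
            Node node = root;
            List<TermInfo> candidate = new ArrayList<>();
            List<TermInfo> best = null;
            float bestBoost = 0f;
            for (int j = i; j < stack.size(); j++) {
                node = node.subMap.get(stack.get(j).text);
                if (node == null) {
                    break; // no query term or phrase continues with this term
                }
                candidate.add(stack.get(j));
                if (node.terminal && withinSlop(candidate, node.slop)) {
                    best = new ArrayList<>(candidate);
                    bestBoost = node.boost; // weight taken from the query boost
                }
            }
            if (best == null) {
                i++; // nothing in the query matches here; try the next term
            } else {
                phrases.add(new WeightedPhrase(best, bestBoost));
                i += best.size();
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        Node root = new Node();
        add(root, List.of("Lucene"), 2f, 0);
        add(root, List.of("search", "library"), 1f, 1);
        List<TermInfo> stack = List.of(new TermInfo("Lucene", 0, 6, 0),
                new TermInfo("search", 12, 18, 3), new TermInfo("library", 26, 33, 5));
        for (WeightedPhrase p : build(root, stack)) {
            System.out.println(p);
        }
    }
}
```

With the sample stack, it prints "Lucene"[(0,6)] w=2.0 and "search library"[(12,18),(26,33)] w=1.0. The phrase is accepted because only one position ("engine" at position 4) lies between "search" and "library", which fits within a slop of 1.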
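+In Step 4, H2 creates a FieldFragList from the
+FieldPhraseList. In this sample case, both phrases fall into one fragment,
+whose totalBoost of 3 is the sum of the phrase weights (2 + 1), so the following
+FieldFragList will be generated: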
+   FieldFragList
++---------------------------------+
+|"Lucene"[(0,6)]                  |
+|"search library"[(12,18),(26,33)]|
+|totalBoost=3                     |
++---------------------------------+
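Grouping phrases into fragments and summing their weights can be sketched as follows. The class and field names are illustrative. This is not necessarily how H2 chooses fragment boundaries. The sketch makes these assumptions:

- A hypothetical fixed window size, fragCharSize, is used (100 characters in the usage below).
- Phrases arrive sorted by start offset.
- Each phrase is reduced to the span from its first term's start offset to its last term's end offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Step 4: pack weighted phrases into fixed-size
// fragments and sum their weights into each fragment's totalBoost.
final class FragListSketch {

    static final class Phrase {
        final String text;
        final int start;   // start offset of the first term
        final int end;     // end offset of the last term
        final float weight;

        Phrase(String text, int start, int end, float weight) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.weight = weight;
        }
    }

    static final class Fragment {
        final int start;
        final List<Phrase> phrases = new ArrayList<>();
        float totalBoost;

        Fragment(int start) {
            this.start = start;
        }
    }

    /** Expects phrases sorted by start offset; opens a new fragment when a phrase would overflow. */
    static List<Fragment> build(List<Phrase> phrases, int fragCharSize) {
        List<Fragment> fragments = new ArrayList<>();
        Fragment current = null;
        for (Phrase p : phrases) {
            if (current == null || p.end - current.start > fragCharSize) {
                current = new Fragment(p.start);
                fragments.add(current);
            }
            current.phrases.add(p);
            current.totalBoost += p.weight; // totalBoost is the sum of phrase weights
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<Phrase> phrases = List.of(
                new Phrase("Lucene", 0, 6, 2f),
                new Phrase("search library", 12, 33, 1f));
        for (Fragment f : build(phrases, 100)) {
            System.out.println("phrases=" + f.phrases.size() + " totalBoost=" + f.totalBoost);
        }
    }
}
```

With a 100-character window, both sample phrases land in one fragment, and the program prints "phrases=2 totalBoost=3.0". A window smaller than 33 characters would split them into two fragments.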
+In Step 5, H2 uses the FieldFragList and the stored field value
+to create the highlighted snippets!
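The final step can be sketched as inserting tags around the matched offsets of the stored field value. SnippetSketch.highlight and the <b>/</b> tags are hypothetical choices for this illustration. The sketch assumes the offset ranges are sorted and non-overlapping. It ignores fragment trimming and HTML escaping of the stored text.

```java
import java.util.List;

// Hypothetical sketch of Step 5: wrap each matched [start, end) range of the
// stored field value in preTag/postTag. Ranges must be sorted and non-overlapping.
final class SnippetSketch {

    static String highlight(String stored, List<int[]> offsets, String preTag, String postTag) {
        StringBuilder sb = new StringBuilder();
        int last = 0; // end of the text copied so far
        for (int[] range : offsets) {
            sb.append(stored, last, range[0])       // text before the match
              .append(preTag)
              .append(stored, range[0], range[1])   // the matched term
              .append(postTag);
            last = range[1];
        }
        return sb.append(stored, last, stored.length()).toString();
    }

    public static void main(String[] args) {
        String stored = "Lucene is a search engine library.";
        List<int[]> offsets = List.of(new int[] {0, 6}, new int[] {12, 18}, new int[] {26, 33});
        System.out.println(highlight(stored, offsets, "<b>", "</b>"));
    }
}
```

For the sample, it prints <b>Lucene</b> is a <b>search</b> engine <b>library</b>.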