Index: lucene/highlighter/src/test/org/apache/lucene/search/vectorhighlight/WeightedFragListBuilderTest.java
===================================================================
--- lucene/highlighter/src/test/org/apache/lucene/search/vectorhighlight/WeightedFragListBuilderTest.java (revision 0)
+++ lucene/highlighter/src/test/org/apache/lucene/search/vectorhighlight/WeightedFragListBuilderTest.java (revision 0)
@@ -0,0 +1,35 @@
+package org.apache.lucene.search.vectorhighlight;
+
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+public class WeightedFragListBuilderTest extends AbstractTestCase {
+
+ public void test2WeightedFragList() throws Exception {
+
+ makeIndexLongMV();
+
+ FieldQuery fq = new FieldQuery( pqF( "the", "both" ), true, true );
+ FieldTermStack stack = new FieldTermStack( reader, 0, F, fq );
+ FieldPhraseList fpl = new FieldPhraseList( stack, fq );
+ WeightedFragListBuilder wflb = new WeightedFragListBuilder();
+ FieldFragList ffl = wflb.createFieldFragList( fpl, 100 );
+ assertEquals( 1, ffl.getFragInfos().size() );
+ assertEquals( "subInfos=(theboth((195,203)))/0.86791086(189,289)", ffl.getFragInfos().get( 0 ).toString() );
+ }
+
+}
Index: lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/WeightedFieldFragList.java
===================================================================
--- lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/WeightedFieldFragList.java (revision 0)
+++ lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/WeightedFieldFragList.java (revision 0)
@@ -0,0 +1,76 @@
+package org.apache.lucene.search.vectorhighlight;
+
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+
+import org.apache.lucene.search.vectorhighlight.FieldFragList.WeightedFragInfo.SubInfo;
+import org.apache.lucene.search.vectorhighlight.FieldPhraseList.WeightedPhraseInfo;
+import org.apache.lucene.search.vectorhighlight.FieldTermStack.TermInfo;
+
+/**
+ * A weighted implementation of {@link FieldFragList}.
+ */
+public class WeightedFieldFragList extends FieldFragList {
+
+ /**
+ * a constructor.
+ *
+ * @param fragCharSize the length (number of chars) of a fragment
+ */
+ public WeightedFieldFragList( int fragCharSize ) {
+ super( fragCharSize );
+ }
+
+ /* (non-Javadoc)
+ * @see org.apache.lucene.search.vectorhighlight.FieldFragList#add( int startOffset, int endOffset, List The type of each entry is WeightedPhraseInfo that consists of
-an array of terms offsets and weight. The weight (Fast Vector Highlighter uses query boost to
-calculate the weight) will be taken into account when Fast Vector Highlighter creates
-{@link org.apache.lucene.search.vectorhighlight.FieldFragList} in the next step.
In Step 4, Fast Vector Highlighter creates FieldFragList by reference to
FieldPhraseList. In this sample case, the following
@@ -137,6 +136,62 @@
|totalBoost=3 |
+---------------------------------+
+
+
+The calculation for each FieldFragList.WeightedFragInfo.totalBoost (weight)
+depends on the implementation of FieldFragList.add( ... ):
+
+
+ public void add( int startOffset, int endOffset, List<WeightedPhraseInfo> phraseInfoList ) {
+ float totalBoost = 0;
+ List<SubInfo> subInfos = new ArrayList<SubInfo>();
+ for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
+ subInfos.add( new SubInfo( phraseInfo.getText(), phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
+ totalBoost += phraseInfo.getBoost();
+ }
+ getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, subInfos, totalBoost ) );
+ }
+
+
+The implementation of FieldFragList that is used is specified in BaseFragListBuilder.createFieldFragList( ... ):
+
+
+ public FieldFragList createFieldFragList( FieldPhraseList fieldPhraseList, int fragCharSize ){
+ return createFieldFragList( fieldPhraseList, new SimpleFieldFragList( fragCharSize ), fragCharSize );
+ }
+
+
+Currently there are two approaches available:
+
+SimpleFragListBuilder using SimpleFieldFragList: sum-of-boosts approach. The totalBoost is calculated by summing the query boosts per term. By default, a term is boosted by 1.0.
+WeightedFragListBuilder using WeightedFieldFragList: sum-of-distinct-weights approach. The totalBoost is calculated by summing the IDF weights of distinct terms.
+
+Comparison of the two approaches:
| Terms in fragment | sum-of-distinct-weights | sum-of-boosts |
|---|---|---|
| das alte testament | 5.339621 | 3.0 |
| das alte testament | 5.339621 | 3.0 |
| das testament alte | 5.339621 | 3.0 |
| das alte testament | 5.339621 | 3.0 |
| das testament | 2.9455688 | 2.0 |
| das alte | 2.4759595 | 2.0 |
| das das das das | 1.5015357 | 4.0 |
| das das das | 1.3003681 | 3.0 |
| das das | 1.061746 | 2.0 |
| alte | 1.0 | 1.0 |
| alte | 1.0 | 1.0 |
| das | 0.7507678 | 1.0 |
| das | 0.7507678 | 1.0 |
| das | 0.7507678 | 1.0 |
| das | 0.7507678 | 1.0 |
| das | 0.7507678 | 1.0 |
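The two columns above can be reproduced with a small standalone sketch. It does not use the Lucene classes and is not the code from this patch. It rests on assumptions: the per-term weights are inferred from the table rows ("das", "alte", and "das testament"), and the scaling by the square root of the number of term occurrences is inferred from the "das", "das das", "das das das" rows, since the text above does not state it. The class and method names (FragWeightSketch, sumOfBoosts, sumOfDistinctWeights) are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FragWeightSketch {

  // Hypothetical per-term weights, inferred from the comparison table
  // (not read from an index).
  static final Map<String, Double> WEIGHTS = Map.of(
      "das", 0.7507678,
      "alte", 1.0,
      "testament", 1.332064);

  // sum-of-boosts: every term occurrence contributes its query boost,
  // assumed here to be the default of 1.0.
  static double sumOfBoosts(List<String> terms) {
    return terms.size() * 1.0;
  }

  // sum-of-distinct-weights: each distinct term contributes its weight once.
  // Assumption: the sum is then scaled by sqrt(number of term occurrences),
  // which matches the table values but is not stated in the text.
  static double sumOfDistinctWeights(List<String> terms) {
    Set<String> seen = new HashSet<>();
    double sum = 0.0;
    for (String term : terms) {
      if (seen.add(term)) {
        sum += WEIGHTS.get(term);
      }
    }
    return sum * Math.sqrt(terms.size());
  }

  public static void main(String[] args) {
    List<List<String>> fragments = List.of(
        List.of("das", "alte", "testament"),
        List.of("das", "das", "das", "das"),
        List.of("das"));
    for (List<String> fragment : fragments) {
      System.out.println(String.join(" ", fragment)
          + " | " + sumOfDistinctWeights(fragment)
          + " | " + sumOfBoosts(fragment));
    }
  }
}
```

Under these assumptions, a fragment that repeats a single common term four times scores below a fragment with three different terms in the weighted approach, while the sum-of-boosts approach ranks it higher.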
In Step 5, by using FieldFragList and the field stored data,
Fast Vector Highlighter creates highlighted snippets!