Key: LUCENE-1285
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Otis Gospodnetic
Reporter: Andrzej Bialecki
Votes: 0
Watchers: 0
Lucene - Java

WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types

Created: 15/May/08 01:01 PM   Updated: 11/Oct/08 12:49 PM
Component/s: contrib/highlighter
Affects Version/s: 2.4
Fix Version/s: 2.4

Time Tracking:
Not Specified

File Attachments:
highlighter-test.patch (text file, 1 kB, licensed for inclusion in ASF works), 2008-05-15 01:51 PM, Mark Miller
highlighter.patch (text file, 3 kB, licensed for inclusion in ASF works), 2008-05-15 01:14 PM, Andrzej Bialecki
Issue Links:
Reference
 

Lucene Fields: New, Patch Available
Resolution Date: 27/May/08 04:10 PM


Description
Given a BooleanQuery with multiple clauses, if a term occurs both in a Span / Phrase query and in a TermQuery, the results of term extraction are unpredictable and depend on the order of the clauses. Consequently, the highlighting results are incorrect.

Example text: t1 t2 t3 t4 t2
Example query: t2 t3 "t1 t2"
Current highlighting: [t1 t2] [t3] t4 t2
Correct highlighting: [t1 t2] [t3] t4 [t2]

The problem comes from the fact that we keep a Map<termText, WeightedSpanTerm>. If a term occurs in a Phrase or Span query, the resulting WeightedSpanTerm has positionSensitive=true, whereas terms added from a TermQuery have positionSensitive=false. Because the map holds only one entry per term, the end result for this particular term depends on the order in which the clauses are processed.
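The order dependence can be reproduced outside Lucene with a plain map. The following is only a minimal sketch: SimpleSpanTerm and OrderDependenceDemo are hypothetical stand-ins invented for illustration, not the highlighter's actual classes, and the only thing modelled is the positionSensitive flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for WeightedSpanTerm; only the flag matters here.
class SimpleSpanTerm {
    final String term;
    final boolean positionSensitive;

    SimpleSpanTerm(String term, boolean positionSensitive) {
        this.term = term;
        this.positionSensitive = positionSensitive;
    }
}

class OrderDependenceDemo {
    // Simulates extraction into a plain map keyed by term text:
    // a later clause silently overwrites an earlier one for the same term.
    static boolean flagFor(String term, SimpleSpanTerm... clauses) {
        Map<String, SimpleSpanTerm> terms = new HashMap<>();
        for (SimpleSpanTerm clause : clauses) {
            terms.put(clause.term, clause);
        }
        return terms.get(term).positionSensitive;
    }

    public static void main(String[] args) {
        SimpleSpanTerm fromTermQuery = new SimpleSpanTerm("t2", false);
        SimpleSpanTerm fromPhraseQuery = new SimpleSpanTerm("t2", true);

        // Same clauses, different order, different flag for t2.
        System.out.println(flagFor("t2", fromTermQuery, fromPhraseQuery));
        System.out.println(flagFor("t2", fromPhraseQuery, fromTermQuery));
    }
}
```

When the phrase clause is processed last, t2 ends up position sensitive, so the standalone t2 at the end of the example text would not be highlighted.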

My fix is to use a subclass of Map, which on put() always sets the result to the most lax setting, i.e. if we already have a term with positionSensitive=true, and we try to put() a term with positionSensitive=false, we set the result positionSensitive=false, as it will match both cases.
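The idea can be sketched roughly as follows. This is not the attached patch: SketchSpanTerm and PositionLaxMap are hypothetical names, and the sketch assumes the flag is mutable so that put() can relax it in place.

```java
import java.util.HashMap;

// Hypothetical stand-in for WeightedSpanTerm with a mutable flag.
class SketchSpanTerm {
    final String term;
    boolean positionSensitive;

    SketchSpanTerm(String term, boolean positionSensitive) {
        this.term = term;
        this.positionSensitive = positionSensitive;
    }
}

// Map whose put() keeps the most lax setting seen so far for a term:
// once any clause has contributed positionSensitive=false, it stays false.
class PositionLaxMap extends HashMap<String, SketchSpanTerm> {
    @Override
    public SketchSpanTerm put(String key, SketchSpanTerm value) {
        SketchSpanTerm previous = super.put(key, value);
        if (previous != null && !previous.positionSensitive) {
            // An earlier clause was position insensitive; relax the new entry too.
            value.positionSensitive = false;
        }
        return previous;
    }
}

class PositionLaxMapDemo {
    public static void main(String[] args) {
        PositionLaxMap termFirst = new PositionLaxMap();
        termFirst.put("t2", new SketchSpanTerm("t2", false)); // from TermQuery
        termFirst.put("t2", new SketchSpanTerm("t2", true));  // from PhraseQuery

        PositionLaxMap phraseFirst = new PositionLaxMap();
        phraseFirst.put("t2", new SketchSpanTerm("t2", true));
        phraseFirst.put("t2", new SketchSpanTerm("t2", false));

        // Both orders now agree on the flag for t2.
        System.out.println(termFirst.get("t2").positionSensitive);
        System.out.println(phraseFirst.get("t2").positionSensitive);
    }
}
```

One caveat with this approach: only put() is overridden here, so bulk operations such as putAll() may bypass the relaxing logic, depending on the JDK's HashMap implementation.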



Comments
Andrzej Bialecki added a comment - 15/May/08 01:14 PM
A patch to fix the issue.

Mark Miller added a comment - 15/May/08 01:32 PM
Nice catch and the fix looks great.

Thanks Andrzej.


Mark Miller added a comment - 15/May/08 01:51 PM
Test that exposes the problem. The posted patch makes the test pass.
- Mark

Otis Gospodnetic added a comment - 20/May/08 07:22 PM
Mark, are you done with this/would you like to commit this? Or should I? (Asking because of SOLR-553)

Mark Miller added a comment - 25/May/08 11:40 AM
Just had a go at committing this. Looks good to me.

Otis Gospodnetic added a comment - 27/May/08 04:10 PM
It looks like Mark already committed this, but forgot to resolve this issue, so I'm marking it as Fixed.