Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 8.8.2
- None
- None
Description
Intro
In a multi-field search where the text analysis of each field produces a different number of tokens:
sow=true causes the minimum should match (mm) to be applied "per document",
i.e. to be a match, a document must contain all the mm query terms at least once, in any of the fields
sow=false causes the minimum should match to be applied "per field",
i.e. to be a match, a document must contain all the mm query terms at least once within a single field
When the parsed query moves from being term-centric (sow=true) to field-centric (sow=false with different text analysis per field), mm means two different things:
sow=true
mm=2
qf=author subjects_as_same_term
q=united kingdom
defType=edismax
"parsedquery_toString": "+(((author:united | subjects_as_same_term:united) (author:kingdom | subjects_as_same_term:kingdom))~2)"
"response":{"numFound":2,"start":0,"maxScore":7.757958,"numFoundExact":true,"docs":[
  { "id":"888888", "author":"united", "subjects":["kingdom"], "score":7.757958},
  { "id":"77777", "author":"united kingdom", "score":5.874222}] },
Minimum number of query terms matched within the same field (i.e. all the required query terms must be found in a single field)
"PER FIELD"
sow=false
mm=2
qf=author subjects_as_same_term
q=united kingdom
defType=edismax
"parsedquery_toString": "+(((author:united author:kingdom)~2) | (((subjects_as_same_term:uk subjects_as_same_term:"united kingdom" subjects_as_same_term:england subjects_as_same_term:london subjects_as_same_term:british subjects_as_same_term:britain))~1))"
Here (author:united author:kingdom)~2 means that both clauses must match for a document to be a candidate, in disjunction with
((subjects_as_same_term:uk subjects_as_same_term:"united kingdom" subjects_as_same_term:england subjects_as_same_term:london subjects_as_same_term:british subjects_as_same_term:britain))~1, which means that at least one clause must match (because synonym expansion collapsed the two original terms into a single one)
"response":{"numFound":1,"start":0,"maxScore":5.874222,"numFoundExact":true,"docs":[ { "id":"77777", "author":"united kingdom", "score":5.874222}] }
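The difference between the two parsed queries above can be reproduced with a small toy model. This is only a sketch and not Solr code: the documents are reduced to sets of indexed terms per field, subjects_as_same_term is assumed to hold the raw subject terms of the document, and the function names are made up for illustration.

```python
# Toy model of the two parsed queries above; this is NOT Solr code.
# Assumption: each document maps a field name to the set of terms indexed in it,
# and subjects_as_same_term holds the document's raw subject terms.
DOCS = {
    "888888": {"author": {"united"}, "subjects_as_same_term": {"kingdom"}},
    "77777": {"author": {"united", "kingdom"}},
}


def has(doc, field, term):
    """True if the term is indexed in the given field of the document."""
    return term in doc.get(field, set())


def term_centric(doc, fields, terms, mm):
    """sow=true shape: one clause per term, each satisfied by any field."""
    matched = sum(1 for term in terms if any(has(doc, f, term) for f in fields))
    return matched >= mm


def field_centric(doc, groups):
    """sow=false shape: one clause group per field, each with its own mm."""
    return any(
        sum(1 for term in terms if has(doc, field, term)) >= mm
        for field, terms, mm in groups
    )


FIELDS = ["author", "subjects_as_same_term"]
sow_true_hits = sorted(
    doc_id
    for doc_id, doc in DOCS.items()
    if term_centric(doc, FIELDS, ["united", "kingdom"], 2)
)

# Per-field clauses and mm values copied from the sow=false parsed query.
GROUPS = [
    ("author", ["united", "kingdom"], 2),
    (
        "subjects_as_same_term",
        ["uk", "united kingdom", "england", "london", "british", "britain"],
        1,
    ),
]
sow_false_hits = sorted(
    doc_id for doc_id, doc in DOCS.items() if field_centric(doc, GROUPS)
)

print(sow_true_hits)   # ['77777', '888888']
print(sow_false_hits)  # ['77777']
```

Document 888888 is lost with sow=false because neither field reaches its own mm: author matches only one of its two clauses, and none of the synonym terms is present in subjects_as_same_term.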
Problem
When the text analysis of a field is incompatible with part of the query text, mm is not fully respected:
sow=false
mm=100%
qf=text numeric_i
q=terminator 100
defType=edismax
"parsedquery_toString": "+(((text:terminator text:100)~2) | (numeric_i:100)~1))"
A document containing just '100' in the numeric_i field is returned as a search result, even though it does not satisfy mm=100%: only one of the two query terms matches it.
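The problem can be illustrated with the same kind of toy model. Again this is a sketch under assumptions, not Solr code: the document ids are hypothetical, the per-field mm is computed from the percentage by simple rounding down, and each field only keeps the query terms its analysis accepts (numeric_i drops 'terminator').

```python
# Toy model of the field-centric query from the Problem section; NOT Solr code.
# Hypothetical documents: field name -> set of indexed terms.
DOCS = {
    "both_terms": {"text": {"terminator", "100"}},
    "number_only": {"numeric_i": {"100"}},
}

QUERY_TERMS = ["terminator", "100"]

# Clauses that survive each field's analysis, as in the parsed query:
# numeric_i cannot represent 'terminator', so only '100' remains there.
GROUPS = [
    ("text", ["terminator", "100"]),
    ("numeric_i", ["100"]),
]


def clauses_required(clause_count, mm_percent):
    """Simplified mm: percentage of the field's own clauses, rounded down."""
    return clause_count * mm_percent // 100


def field_centric(doc, groups, mm_percent):
    """A document matches if any field group reaches its own per-field mm."""
    for field, terms in groups:
        matched = sum(1 for term in terms if term in doc.get(field, set()))
        if matched >= clauses_required(len(terms), mm_percent):
            return True
    return False


def user_terms_matched(doc):
    """How many of the original query terms appear anywhere in the document."""
    indexed = set().union(*doc.values())
    return sum(1 for term in QUERY_TERMS if term in indexed)


hits = sorted(d for d, doc in DOCS.items() if field_centric(doc, GROUPS, 100))
print(hits)  # ['both_terms', 'number_only']
print(user_terms_matched(DOCS["number_only"]))  # 1
```

Because mm=100% is resolved against the single surviving numeric_i clause, the requirement for that field becomes 1, so number_only is accepted although it matches only one of the two terms the user typed.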