Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 3.1, 3.2, 4.0-ALPHA
None
None
Description
Currently, if you boost individual values in a multivalue field at index time, the boosts are consolidated into a single boost for the whole field, and the per-value boosts are lost.
For example, take a list of photos with a multivalue field "keywords". The boost for a keyword assigned to a photo corresponds to the number of times that photo was downloaded after searching for that particular keyword. This gives documents like the following:
photo1: Photo of a cat by itself
keywords: [ cat:600 feline:100 ] => boost total = 700
photo2: Photo of a cat driving a truck
keywords: [ cat:100 feline:90 animal:80 truck:1000 ] => boost total = 1270
If you search for "cat feline", photo2 will rank higher, since the boosts of its "cat-like" words were consolidated with the anomalous "truck" boost. photo1, which has more downloads for "cat" and "feline", ranks lower because its consolidated boost is lower, even though its total boost for the relevant keywords (700) is higher than photo2's (190).
Intuitively, the boosts should be separate, so only the boosts for the terms searched will be counted.
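The difference between the two behaviours can be sketched with the numbers from the example above. This is only an illustration of the arithmetic as the description presents it (boosts added together); it is not Lucene's or Solr's actual scoring formula, and the function names are hypothetical.

```python
# Index-time boosts per keyword, copied from the two example photos.
photos = {
    "photo1": {"cat": 600, "feline": 100},
    "photo2": {"cat": 100, "feline": 90, "animal": 80, "truck": 1000},
}

def consolidated_boost(keywords):
    """One boost for the whole field: every value contributes, matched or not."""
    return sum(keywords.values())

def per_term_boost(keywords, query_terms):
    """Only the boosts of keywords that appear in the query are counted."""
    return sum(boost for term, boost in keywords.items() if term in query_terms)

def rank(score):
    """Return photo ids ordered from highest to lowest score."""
    return sorted(photos, key=lambda pid: score(photos[pid]), reverse=True)

query = {"cat", "feline"}
consolidated_ranking = rank(consolidated_boost)
per_term_ranking = rank(lambda kw: per_term_boost(kw, query))

print(consolidated_ranking)  # ['photo2', 'photo1']  (1270 vs 700)
print(per_term_ranking)      # ['photo1', 'photo2']  (700 vs 190)
```

With consolidated boosts the "truck" value lifts photo2 above photo1; with separate boosts only "cat" and "feline" count, and photo1 comes first.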
Given the current behaviour, you are forced to do one of the following:
1. Assemble all of the multi-values into a string, and use payloads in place of boosts.
2. Use dynamic fields, such as keyword_*, and boost them independently.
Neither of these solutions is ideal: using payloads requires writing your own BoostingTermQuery, and defining a new dynamic field per value makes searching more difficult than with multivalue fields.
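To make the two workarounds concrete, here is a rough sketch of how the same keyword boosts might be shaped for each one before indexing. The `term|weight` encoding, the `keyword_` prefix, and all function names are assumptions for illustration; the actual delimiter and field names depend on the analyzer and schema in use.

```python
def to_payload_string(keywords, delimiter="|"):
    """Workaround 1: one string field, each term carrying its weight as a payload."""
    return " ".join(f"{term}{delimiter}{weight}" for term, weight in keywords.items())

def to_dynamic_fields(keywords, prefix="keyword_"):
    """Workaround 2: one dynamic field per keyword, each boosted independently."""
    return {
        f"{prefix}{term}": {"value": term, "boost": weight}
        for term, weight in keywords.items()
    }

def dynamic_query_fields(query_terms, prefix="keyword_"):
    """With dynamic fields, each query term has to target its own field."""
    return [f"{prefix}{term}:{term}" for term in query_terms]

photo1_keywords = {"cat": 600, "feline": 100}

payload_value = to_payload_string(photo1_keywords)
dynamic_fields = to_dynamic_fields(photo1_keywords)
query_clauses = dynamic_query_fields(["cat", "feline"])

print(payload_value)   # cat|600 feline|100
print(query_clauses)   # ['keyword_cat:cat', 'keyword_feline:feline']
```

The last helper hints at why the dynamic-field option complicates searching: a query for "cat feline" has to be expanded into one clause per field instead of a single clause against "keywords".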
There's a blog entry that describes the current behaviour:
http://blog.kapilchhabra.com/2008/01/solr-index-time-boost-facts-2