[SOLR-13056] SortableTextField is trappy for faceting - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 7.6
Fix Version/s: None
Component/s: search
Labels: None

Description

Using SortableTextField for distributed faceting can lead to wrong results. This can be demonstrated by installing the cloud-version of the gettingstarted sample with

./solr -e cloud

accepting the defaults throughout, except for the number of shards, which should be 3. After that, a corpus can be indexed with

( echo '[' ; for J in $(seq 0 99); do ID=$((J)) ; echo "{\"id\":\"$ID\",\"facet_t_sort\":\"a b $J\"},"; done ; echo '{"id":"duplicate_1","facet_t_sort":"a b"},{"id":"duplicate_2","facet_t_sort":"a b"}]' ) | curl -s -d @- -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/gettingstarted/update?commit=true'

This will index 100 documents with a single-valued field facet_t_sort:"a b X", where X is the document number, plus 2 documents with facet_t_sort:"a b". The call

curl 'http://localhost:8983/solr/gettingstarted/select?facet.field=facet_t_sort&facet.limit=5&facet=on&q=*:*&rows=0'

should return "a b" as the top facet term with count 2, but returns

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":13,
    "params":{
      "facet.limit":"5",
      "q":"*:*",
      "facet.field":"facet_t_sort",
      "rows":"0",
      "facet":"on"}},
  "response":{"numFound":102,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "facet_t_sort":[
        "a b",36,
        "a b 0",1,
        "a b 1",1,
        "a b 10",1,
        "a b 11",1]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

The problem lies in the second phase of simple faceting, where the fine-counting happens. In the first phase, "a b" is returned from 1 or 2 of the 3 shards. It wins the popularity contest, as there are 2 "a b" terms and only 1 of each of the other terms. The 1 or 2 shards that did not deliver "a b" in the first phase are then queried for their count of "a b", which takes the form of a facet_t_sort:"a b" lookup. It seems that this lookup goes through the analyzer chain and is therefore evaluated against the tokens a and b, which matches every document in that shard (approximately 102/3), because all the indexed values begin with "a b".
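
The suspected mechanism can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions, not Solr code: the whitespace tokenizer stands in for the real analyzer chain, the shard layout is hand-picked (Solr's hash routing will distribute documents differently), and all function names are hypothetical. It shows how refining with an analyzed lookup inflates the count, while an exact match on the original string does not.

```python
from collections import Counter

def tokenize(value):
    # Stand-in for the analyzer chain (assumption): lowercase + whitespace split.
    return value.lower().split()

def phrase_matches(query, value):
    # True if the query tokens occur as a consecutive run in the value tokens.
    q, v = tokenize(query), tokenize(value)
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))

def first_phase(shard, limit):
    # Per-shard top terms, counted on the original (untokenized) strings.
    counts = Counter(shard)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

def refine(shard, term, analyzed):
    # Second phase: ask a shard for the count of one specific term.
    if analyzed:
        return sum(phrase_matches(term, v) for v in shard)  # suspected buggy path
    return sum(v == term for v in shard)                    # exact string match

def distributed_count(shards, term, limit, analyzed):
    total = 0
    for shard in shards:
        top = dict(first_phase(shard, limit))
        # Shards that did not return the term in phase one are refined.
        total += top[term] if term in top else refine(shard, term, analyzed)
    return total

# Hypothetical layout: 34 documents per shard, one "a b" on each of two shards.
values = [f"a b {j}" for j in range(100)]
shards = [values[0:33] + ["a b"], values[33:66] + ["a b"], values[66:100]]

print(distributed_count(shards, "a b", limit=5, analyzed=True))   # 36
print(distributed_count(shards, "a b", limit=5, analyzed=False))  # 2
```

With this particular layout the analyzed refinement yields 1 + 1 + 34 = 36, which happens to equal the count in the response above; the actual distribution of documents across shards in the reported run is not known.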

Attachments

SOLR-13056.patch (2 kB, 01/Mar/19 21:03, Michael Gibney)

Issue Links

is caused by: SOLR-11916 new SortableTextField using docValues built from the original string input (Closed)

relates to: SOLR-16139 [Regression] JSON stat facet functions not working on analysed String (SortableTextField) (Open)

People

Assignee: Unassigned
Reporter: Toke Eskildsen
Votes: 0
Watchers: 6

Dates

Created: 11/Dec/18 08:25
Updated: 01/Apr/22 18:36