Details
Bug
Status: Open
Major
Resolution: Unresolved
4.0
None
None
Description
In some cases, the Solr spell checker improperly reports query terms as being misspelled.
Using the Solr example for 4.0, I added these mini documents:
curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type:application/csv' -d '
id,name
spel-1,aardvark abacus ball bill cat cello
spel-2,abate accord band bell cattle check
spel-3,adorn border clean clock'
I then issued this request:
curl "http://localhost:8983/solr/spell/?q=check&indent=true"
The spell checker falsely concluded that "check" was misspelled and improperly corrected it to "clock":
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="check">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <int name="origFreq">1</int>
      <arr name="suggestion">
        <lst>
          <str name="word">clock</str>
          <int name="freq">1</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">clock</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="check">clock</str>
      </lst>
    </lst>
  </lst>
</lst>
And if I query for "clock", it gets corrected to "check"!
curl "http://localhost:8983/solr/spell/?q=clock&indent=true"
<lst name="suggestions">
  <lst name="clock">
    <int name="numFound">1</int>
    <int name="startOffset">0</int>
    <int name="endOffset">5</int>
    <int name="origFreq">1</int>
    <arr name="suggestion">
      <lst>
        <str name="word">check</str>
        <int name="freq">1</int>
      </lst>
    </arr>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <lst name="collation">
    <str name="collationQuery">check</str>
    <int name="hits">1</int>
    <lst name="misspellingsAndCorrections">
      <str name="clock">check</str>
    </lst>
  </lst>
</lst>
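The symptom in both responses is the same: the term has origFreq > 0 (it exists in the index) yet still receives a suggestion. A minimal Python sketch for spotting that pattern in a saved response follows. The function name suspicious_corrections and the SAMPLE constant are hypothetical helpers for illustration, not part of Solr; the sample is abbreviated from the response above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the "check" response shown above (assumption: the
# response body has been saved or fetched as a string).
SAMPLE = """
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="check">
      <int name="numFound">1</int>
      <int name="origFreq">1</int>
      <arr name="suggestion">
        <lst>
          <str name="word">clock</str>
          <int name="freq">1</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>
"""


def suspicious_corrections(xml_text):
    """Return (term, origFreq, suggested words) for every query term that
    is present in the index (origFreq > 0) but was still given suggestions."""
    root = ET.fromstring(xml_text)
    flagged = []
    # root.iter() also visits the root itself, so this works whether the
    # snippet starts at "spellcheck" or directly at "suggestions".
    for block in root.iter("lst"):
        if block.get("name") != "suggestions":
            continue
        for term in block.findall("lst"):
            orig = term.find("int[@name='origFreq']")
            if orig is None:
                continue  # e.g. the "collation" entry has no origFreq
            words = [
                w.text
                for w in term.findall("arr[@name='suggestion']/lst/str[@name='word']")
            ]
            if int(orig.text) > 0 and words:
                flagged.append((term.get("name"), int(orig.text), words))
    return flagged


print(suspicious_corrections(SAMPLE))
```

Any tuple in the result is a term that the index contains but the spell checker still tried to correct, which is the behavior reported here.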
Note: This appears to happen only because "clock" is so close to "check". In a multi-term query, the other terms are left alone and only "check" is corrected:
curl "http://localhost:8983/solr/spell/?q=cattle+abate+check&indent=true"
<lst name="suggestions">
  <lst name="check">
    <int name="numFound">1</int>
    <int name="startOffset">13</int>
    <int name="endOffset">18</int>
    <int name="origFreq">1</int>
    <arr name="suggestion">
      <lst>
        <str name="word">clock</str>
        <int name="freq">1</int>
      </lst>
    </arr>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <lst name="collation">
    <str name="collationQuery">cattle abate clock</str>
    <int name="hits">2</int>
    <lst name="misspellingsAndCorrections">
      <str name="cattle">cattle</str>
      <str name="abate">abate</str>
      <str name="check">clock</str>
    </lst>
  </lst>
</lst>
However, the response inappropriately lists "cattle" and "abate" in the "misspellingsAndCorrections" section even though no suggestions were offered for them.
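To make the "so close" observation concrete, here is a small Python sketch that measures plain Levenshtein edit distance between the indexed terms. This is only an illustration: the distance measure actually used depends on how the spell checker is configured, and the helper names levenshtein, neighbors, and INDEXED_TERMS are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance
    (insertions, deletions, substitutions, each costing 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# The terms from the three mini documents indexed above.
INDEXED_TERMS = [
    "aardvark", "abacus", "ball", "bill", "cat", "cello",
    "abate", "accord", "band", "bell", "cattle", "check",
    "adorn", "border", "clean", "clock",
]


def neighbors(term, vocabulary, max_edits=2):
    """Other vocabulary terms within max_edits of term.
    max_edits=2 is an assumption chosen for this illustration."""
    return [w for w in vocabulary
            if w != term and levenshtein(term, w) <= max_edits]


print(levenshtein("check", "clock"))       # 2
print(neighbors("check", INDEXED_TERMS))   # ['clock']
```

Under this measure "clock" is the only indexed term within two edits of "check", which is consistent with it being the only suggestion returned.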
Finally, I can work around this issue by removing the following line from solrconfig.xml:
<str name="spellcheck.alternativeTermCount">5</str>
With that line removed, the previous request returns:
<lst name="suggestions">
  <bool name="correctlySpelled">false</bool>
</lst>
This makes the original problem go away, although it raises the question of why my 100% correct query is still tagged as "correctlySpelled" = "false". That is a separate Jira.