Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 4.8.1, 4.9, 5.3.1, 6.6.2, 7.1
- Fix Version/s: None
- Component/s: None
Description
Setup:
- Schema version is 1.5
- Field config:
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
- Stop words:
http https ftp www
So the setup is very simple. The index contains:
- twitter.com/testuser
All of these queries match:
- twitter.com/testuser
- com/testuser
- testuser
But none of these do:
- https://twitter.com/testuser
- https://www.twitter.com/testuser
- www.twitter.com/testuser
Debug output shows the following, where the "?" marks the position left behind by the removed stop word:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
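To make the difference between the two parsed queries easier to see, here is a minimal Python sketch that imitates the analyzer chain from the field config (split on `[^\w]+`, drop stop words case-insensitively, lowercase) while keeping track of token positions. It is only a simulation under stated assumptions, not Solr or Lucene code: the names `analyze` and `render_phrase` are hypothetical, the stop word set is copied from the list above, and the "?" rendering simply mimics what the debug output prints for an empty position.

```python
import re

# Assumption: the contents of url_stopwords.txt, copied from the report.
STOP_WORDS = {"http", "https", "ftp", "www"}


def analyze(text):
    """Return (term, position) pairs, keeping the gaps left by removed stop words."""
    # Tokenizer step: split on runs of non-word characters, drop empty pieces.
    # Note: Python's \w is Unicode-aware; for these ASCII inputs that makes no difference.
    raw_tokens = [t for t in re.split(r"[^\w]+", text) if t]
    result = []
    for position, token in enumerate(raw_tokens):
        # Stop filter step with ignoreCase="true": the token is dropped,
        # but the position it occupied is not reused.
        if token.lower() in STOP_WORDS:
            continue
        # Lowercase filter step.
        result.append((token.lower(), position))
    return result


def render_phrase(terms):
    """Render a phrase like the debug output, using '?' for empty positions."""
    if not terms:
        return '""'
    by_position = {position: term for term, position in terms}
    slots = [by_position.get(p, "?") for p in range(max(by_position) + 1)]
    return '"' + " ".join(slots) + '"'


for query in ["twitter.com/testuser", "https://twitter.com/testuser"]:
    print(query, "->", render_phrase(analyze(query)))
```

Under these assumptions the first query renders as "twitter com testuser" and the second as "? twitter com testuser", which is the same shape as the two debug outputs above.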
Complete debug outputs:
- a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
- an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
I could not find a clear explanation of how we are supposed to upgrade Solr. There is no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who rely on this feature.
Attachments
Issue Links
- is related to
  - SOLR-8089 Support query parsers being able to set enablePositionIncrements (Open)
- relates to
  - LUCENE-8036 ShingleFilter should have an option to skip filler tokens (e.g. stop words) (Patch Available)