Solr
  1. Solr
  2. SOLR-1321

Support for efficient leading wildcards search

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      This patch is an implementation of the "reversed tokens" strategy for efficient leading wildcards queries.

      ReversedWildcardsTokenFilter reverses tokens and returns both the original token (optional) and the reversed token (with positionIncrement == 0). Reversed tokens are prepended with a marker character to avoid collisions between legitimate tokens and the reversed tokens - e.g. "DNA" would become "and", thus colliding with the regular term "and", but with the marker character it becomes "\u0001and".

      This TokenFilter can be added to the analyzer chain that it used during indexing.

      SolrQueryParser has been modified to detect the presence of such fields in the current schema, and treat them in a special way. First, SolrQueryParser examines the schema and collects a map of fields where these reversed tokens are indexed. If there is at least one such field, it also sets QueryParser.setAllowLeadingWildcards(true). When building a wildcard query (in getWildcardQuery) the term text may be optionally reversed to put wildcards further along the term text. This happens when the field uses the reversing filter during indexing (as detected above), AND if the wildcard characters are either at 0-th or 1-st position in the term. Otherwise the term text is processed as before, i.e. turned into a regular wildcard query.

      Unit tests are provided to test the TokenFilter and the query parsing.

      1. SOLR-1321.patch
        23 kB
        Robert Muir
      2. SOLR-1321.patch
        23 kB
        Grant Ingersoll
      3. SOLR-1321.patch
        22 kB
        Grant Ingersoll
      4. wildcards.patch
        15 kB
        Andrzej Bialecki
      5. wildcards-2.patch
        19 kB
        Andrzej Bialecki
      6. wildcards-3.patch
        20 kB
        Andrzej Bialecki

        Issue Links

          Activity

          Hide
          Andrzej Bialecki added a comment -

          Patch containing the new filter, example schema and unit tests.

          Show
          Andrzej Bialecki added a comment - Patch containing the new filter, example schema and unit tests.
          Hide
          Robert Muir added a comment -

          andrej, i really like this feature.

          one question though, the way you have overridden getWildcardQuery() i am not certain how I can have both this feature and say, override getWildCardQuery to produce an AutomatonWildcardQuery: LUCENE-1606
          what I want is both , because AutomatonWildCardQuery cannot help at all with leading *, (it can with leading ? for example).

          but don't let this really get in your way, just a side note.

          Show
          Robert Muir added a comment - andrej, i really like this feature. one question though, the way you have overridden getWildcardQuery() i am not certain how I can have both this feature and say, override getWildCardQuery to produce an AutomatonWildcardQuery: LUCENE-1606 what I want is both , because AutomatonWildCardQuery cannot help at all with leading *, (it can with leading ? for example). but don't let this really get in your way, just a side note.
          Hide
          Andrzej Bialecki added a comment -

          If you follow the logic in getWildcardQuery, a field has to meet specific requirements for this reversal to occur - namely, it needs to declare in its indexing analysis chain that it uses ReversedWildcardFilter. This filter does very special kind of reversal (prepending the marker) so it's unlikely that anyone would use it for other purpose than to explicitly support leading wildcards. So for now I'd say that users should consciously choose between this method of supporting leading wildcards and the automaton wildcard query.

          Show
          Andrzej Bialecki added a comment - If you follow the logic in getWildcardQuery, a field has to meet specific requirements for this reversal to occur - namely, it needs to declare in its indexing analysis chain that it uses ReversedWildcardFilter. This filter does very special kind of reversal (prepending the marker) so it's unlikely that anyone would use it for other purpose than to explicitly support leading wildcards. So for now I'd say that users should consciously choose between this method of supporting leading wildcards and the automaton wildcard query.
          Hide
          Robert Muir added a comment - - edited

          andrzej i see what you are saying. I think its a great feature the way it is!

          In the future I will take a look at finding a way to do both, this way complex cases like *abcde?f get reversed by this feature into \u0001f?edcba*, 
          but then implemented with automaton so that it doesn't have to enumerate all tokens that start with \u0001f. 
          

          this is bad example hope you see what i mean. the biggest challenge would be preventing suboptimal cases, like reversing g?abcde* into \u2001*edcba?g, (at least I think).
          the first is actually more efficient, I think regardless of the wildcard impl.

          I wonder if in your patch you could have an additional check, if something is in the 1st position but the last character is also a wildcard, not to reverse it?
          in the example above even with the default lucene wildcard query, at least it would only enumerate the tokens starting with g, so its better not to reverse it.

          if its in the 0th position it doesnt matter if you reverse it or not but I think that one case can be optimized.

          Thanks,
          Robert

          Show
          Robert Muir added a comment - - edited andrzej i see what you are saying. I think its a great feature the way it is! In the future I will take a look at finding a way to do both, this way complex cases like *abcde?f get reversed by this feature into \u0001f?edcba*, but then implemented with automaton so that it doesn't have to enumerate all tokens that start with \u0001f. this is bad example hope you see what i mean. the biggest challenge would be preventing suboptimal cases, like reversing g?abcde* into \u2001*edcba?g, (at least I think). the first is actually more efficient, I think regardless of the wildcard impl. I wonder if in your patch you could have an additional check, if something is in the 1st position but the last character is also a wildcard, not to reverse it? in the example above even with the default lucene wildcard query, at least it would only enumerate the tokens starting with g, so its better not to reverse it. if its in the 0th position it doesnt matter if you reverse it or not but I think that one case can be optimized. Thanks, Robert
          Hide
          Andrzej Bialecki added a comment -

          I think your example of g?abcde* could be handled if we assigned different "costs" of expanding ? and *, the latter being more costly. There could be also a rule that prevents the reversing if a trailing "costly" wildcard is used.

          This quickly gets more and more complicated, so ultimately we may want to put this logic elsewhere, in a class that knows the best how to make such decisions (ReversedWildcardFilter ?). I'll try to modify the patch in this direction.

          Show
          Andrzej Bialecki added a comment - I think your example of g?abcde* could be handled if we assigned different "costs" of expanding ? and *, the latter being more costly. There could be also a rule that prevents the reversing if a trailing "costly" wildcard is used. This quickly gets more and more complicated, so ultimately we may want to put this logic elsewhere, in a class that knows the best how to make such decisions (ReversedWildcardFilter ?). I'll try to modify the patch in this direction.
          Hide
          Robert Muir added a comment -

          Andrzej, with the costs logic, you wouldn't have to limit the reversing to just 0th or 1st position wildcards either, right?

          for example if a term was 10 characters long with a wildcard in the 3rd position, reversing it is helpful.

          Show
          Robert Muir added a comment - Andrzej, with the costs logic, you wouldn't have to limit the reversing to just 0th or 1st position wildcards either, right? for example if a term was 10 characters long with a wildcard in the 3rd position, reversing it is helpful.
          Hide
          Andrzej Bialecki added a comment -

          Exactly, that's the reason to put this logic in an isolated well-defined place, with some configurable knobs. One parameter would be the max. position of the leading wildcard, another would be relative cost of ? and *, or whether we allow wildcards at any positions except the 0-th (pure suffix search).

          Show
          Andrzej Bialecki added a comment - Exactly, that's the reason to put this logic in an isolated well-defined place, with some configurable knobs. One parameter would be the max. position of the leading wildcard, another would be relative cost of ? and *, or whether we allow wildcards at any positions except the 0-th (pure suffix search).
          Hide
          Robert Muir added a comment -

          sounds perfect, great idea. Thanks!

          Show
          Robert Muir added a comment - sounds perfect, great idea. Thanks!
          Hide
          Andrzej Bialecki added a comment -

          Updated patch with more configurable knobs. See javadoc of ReversedWildcardsFilterFactory and unit tests.

          Show
          Andrzej Bialecki added a comment - Updated patch with more configurable knobs. See javadoc of ReversedWildcardsFilterFactory and unit tests.
          Hide
          Robert Muir added a comment -

          Andrzej, did you accidentally leave out ReversedWildcardsFilterFactory.java in the patch?

          Show
          Robert Muir added a comment - Andrzej, did you accidentally leave out ReversedWildcardsFilterFactory.java in the patch?
          Hide
          Andrzej Bialecki added a comment -

          Previous patch mistakenly included other stuff instead of ReversedWildcardFilterFactory.

          Show
          Andrzej Bialecki added a comment - Previous patch mistakenly included other stuff instead of ReversedWildcardFilterFactory.
          Hide
          Robert Muir added a comment -

          andrzej, thanks, I like this design.

          Show
          Robert Muir added a comment - andrzej, thanks, I like this design.
          Hide
          Grant Ingersoll added a comment -

          Since this is a new filter, we might as well use the new incrementToken capability and reusable stuff as well as avoiding other deprecated analysis calls.

          Show
          Grant Ingersoll added a comment - Since this is a new filter, we might as well use the new incrementToken capability and reusable stuff as well as avoiding other deprecated analysis calls.
          Hide
          Grant Ingersoll added a comment -

          Also no need to do the string round trip in the reverse method, right? See the ReverseStringFilter in Lucene contrib/analysis. Perhaps we should just patch that and add some config options to it? Then all Solr would need is the QP change and the FilterFactory change, no?

          Show
          Grant Ingersoll added a comment - Also no need to do the string round trip in the reverse method, right? See the ReverseStringFilter in Lucene contrib/analysis. Perhaps we should just patch that and add some config options to it? Then all Solr would need is the QP change and the FilterFactory change, no?
          Hide
          Robert Muir added a comment -

          i do have one comment on the reverse() present here: it is not unicode-safe (it will create unpaired surrogates).
          We need to think thru the implications of this, fyi i have a patch for lucene's ReverseStringFilter (similar problem) sitting in LUCENE-1689

          Show
          Robert Muir added a comment - i do have one comment on the reverse() present here: it is not unicode-safe (it will create unpaired surrogates). We need to think thru the implications of this, fyi i have a patch for lucene's ReverseStringFilter (similar problem) sitting in LUCENE-1689
          Hide
          Andrzej Bialecki added a comment -

          Since this is a new filter, we might as well use the new incrementToken capability and reusable stuff as well as avoiding other deprecated analysis calls.

          Indeed, I'll fix this.

          Also no need to do the string round trip in the reverse method, right? See the ReverseStringFilter in Lucene contrib/analysis.

          Roundtrip ... you mean the allocation of new char[] buffer, or conversion to String? I assume the latter - the former is needed because we add the marker char in front. Yeah, I can return char[] and convert to String only in QP.

          Perhaps we should just patch that and add some config options to it? Then all Solr would need is the QP change and the FilterFactory change, no?

          Hmm. After adding the marker-related stuff the code in ReverseStringFilter won't be so nice as it is now. I'd keep in mind the specific use case of this filter ...

          Show
          Andrzej Bialecki added a comment - Since this is a new filter, we might as well use the new incrementToken capability and reusable stuff as well as avoiding other deprecated analysis calls. Indeed, I'll fix this. Also no need to do the string round trip in the reverse method, right? See the ReverseStringFilter in Lucene contrib/analysis. Roundtrip ... you mean the allocation of new char[] buffer, or conversion to String? I assume the latter - the former is needed because we add the marker char in front. Yeah, I can return char[] and convert to String only in QP. Perhaps we should just patch that and add some config options to it? Then all Solr would need is the QP change and the FilterFactory change, no? Hmm. After adding the marker-related stuff the code in ReverseStringFilter won't be so nice as it is now. I'd keep in mind the specific use case of this filter ...
          Hide
          Grant Ingersoll added a comment -

          After adding the marker-related stuff the code in ReverseStringFilter won't be so nice as it is now

          Of course, it would be optional.

          Yeah, I can return char[] and convert to String only in QP

          Right, no need to materialize a String if we don't need to. FWIW, it also seemed like the reverse code in ReverseStringFilter was faster than the patch, but I didn't compare quantitatively.

          Show
          Grant Ingersoll added a comment - After adding the marker-related stuff the code in ReverseStringFilter won't be so nice as it is now Of course, it would be optional. Yeah, I can return char[] and convert to String only in QP Right, no need to materialize a String if we don't need to. FWIW, it also seemed like the reverse code in ReverseStringFilter was faster than the patch, but I didn't compare quantitatively.
          Hide
          Andrzej Bialecki added a comment -

          FWIW, it also seemed like the reverse code in ReverseStringFilter was faster than the patch, but I didn't compare quantitatively.

          It better be - it can reverse in-place, while we have to allocate a new buffer because of the marker char in front. That's what I meant by messy code - we would need both the in-place and the out-of-place method depending on an option.

          Show
          Andrzej Bialecki added a comment - FWIW, it also seemed like the reverse code in ReverseStringFilter was faster than the patch, but I didn't compare quantitatively. It better be - it can reverse in-place, while we have to allocate a new buffer because of the marker char in front. That's what I meant by messy code - we would need both the in-place and the out-of-place method depending on an option.
          Hide
          Yonik Seeley added a comment -

          Use resizeTermBuffer() first... and then do it in place. The buffer may already be big enough and so resizeTermBuffer would just be a check.

          Show
          Yonik Seeley added a comment - Use resizeTermBuffer() first... and then do it in place. The buffer may already be big enough and so resizeTermBuffer would just be a check.
          Hide
          Robert Muir added a comment -

          btw, i found apache harmony has a nice impl of in-place reversing that is unicode safe (AbstractStringBuilder.reverse0)
          it will treat surrogate pairs as one character for the op just like java 5 StringBuilder.reverse()

          since lucene wildcard does not properly handle these, it would probably be bad to be "unicode correct" at this point.
          but perhaps a TODO or comment is desirable, as I hope to improve this situation in the future!

          Show
          Robert Muir added a comment - btw, i found apache harmony has a nice impl of in-place reversing that is unicode safe (AbstractStringBuilder.reverse0) it will treat surrogate pairs as one character for the op just like java 5 StringBuilder.reverse() since lucene wildcard does not properly handle these, it would probably be bad to be "unicode correct" at this point. but perhaps a TODO or comment is desirable, as I hope to improve this situation in the future!
          Hide
          Grant Ingersoll added a comment -

          So, now that this is in Lucene, we likely just need to update the Factory in Solr, right? As well as the Lucene libs, if they haven't been already.

          Show
          Grant Ingersoll added a comment - So, now that this is in Lucene, we likely just need to update the Factory in Solr, right? As well as the Lucene libs, if they haven't been already.
          Hide
          Andrzej Bialecki added a comment -

          I'll update the patch, assuming the presence of the updated filter in Lucene, but I'd rather leave updating the libs to someone more intimate with Solr internals ...

          Show
          Andrzej Bialecki added a comment - I'll update the patch, assuming the presence of the updated filter in Lucene, but I'd rather leave updating the libs to someone more intimate with Solr internals ...
          Hide
          Grant Ingersoll added a comment -

          I can take care of the libs.

          Show
          Grant Ingersoll added a comment - I can take care of the libs.
          Hide
          Andrzej Bialecki added a comment -

          Updated patch that uses new TokenAttribute API and uses (as much as possible) the new ReverseStringFilter.

          Show
          Andrzej Bialecki added a comment - Updated patch that uses new TokenAttribute API and uses (as much as possible) the new ReverseStringFilter.
          Hide
          Grant Ingersoll added a comment -

          Added ASL headers.

          I don't understand, in the Test, the comment:

          // XXX note: this should be false, but for now we return true for any field,
          // XXX if at least one field uses the reversing
          assertTrue(parserThree.getAllowLeadingWildcard());

          Seems like this needs to be fixed before committing.

          Show
          Grant Ingersoll added a comment - Added ASL headers. I don't understand, in the Test, the comment: // XXX note: this should be false, but for now we return true for any field, // XXX if at least one field uses the reversing assertTrue(parserThree.getAllowLeadingWildcard()); Seems like this needs to be fixed before committing.
          Hide
          Andrzej Bialecki added a comment -

          This comment refers to the limitation of Lucene's QueryParser - there is only a single flag there to decide whether it accepts leading wildcards or not, regardless of field. Consequently, after checking the schema in SolrQueryParser we turn on this flag if any field type supports leading wildcards. The end effect of this is that parsers for any field, which are created with IndexSchema.getSolrQueryParser(), will return true if any field type supports leading wildcards, not neccessarily the one for which the parser was created..

          I don't see a way to fix this. I can clarify the comment, though, so that it's clear that this is a limitation in Lucene QueryParser.

          Show
          Andrzej Bialecki added a comment - This comment refers to the limitation of Lucene's QueryParser - there is only a single flag there to decide whether it accepts leading wildcards or not, regardless of field. Consequently, after checking the schema in SolrQueryParser we turn on this flag if any field type supports leading wildcards. The end effect of this is that parsers for any field, which are created with IndexSchema.getSolrQueryParser(), will return true if any field type supports leading wildcards, not neccessarily the one for which the parser was created.. I don't see a way to fix this. I can clarify the comment, though, so that it's clear that this is a limitation in Lucene QueryParser.
          Hide
          Yonik Seeley added a comment -

          I've always been in favor of just allowing leading wildcards for all fields, but I think others disagreed on that point.

          Show
          Yonik Seeley added a comment - I've always been in favor of just allowing leading wildcards for all fields, but I think others disagreed on that point.
          Hide
          Grant Ingersoll added a comment -

          OK, Andrzej. That makes sense. We may want to push down into Lucene eventually. The only outcome of this is that people who are expecting "efficient" wildcard searches on fields that don't use the Reverse stuff would see slower perf.

          I feel comfortable committing this patch now.

          Show
          Grant Ingersoll added a comment - OK, Andrzej. That makes sense. We may want to push down into Lucene eventually. The only outcome of this is that people who are expecting "efficient" wildcard searches on fields that don't use the Reverse stuff would see slower perf. I feel comfortable committing this patch now.
          Hide
          Robert Muir added a comment -

          one comment, i think you should be able to use char[] versus String here pretty easily. it might have some impact on indexing speed.

          but this could be a later improvement.

          Show
          Robert Muir added a comment - one comment, i think you should be able to use char[] versus String here pretty easily. it might have some impact on indexing speed. but this could be a later improvement.
          Hide
          Grant Ingersoll added a comment -

          Hmm, anyone else having troubles with the other tests? I don't think this is handling dynamic fields properly.

          Show
          Grant Ingersoll added a comment - Hmm, anyone else having troubles with the other tests? I don't think this is handling dynamic fields properly.
          Hide
          Grant Ingersoll added a comment -

          Notwithstanding the fact that other tests that use the QP fail with this patch (and the old one), here's a patch that uses char[] instead of Strings.

          Show
          Grant Ingersoll added a comment - Notwithstanding the fact that other tests that use the QP fail with this patch (and the old one), here's a patch that uses char[] instead of Strings.
          Hide
          Robert Muir added a comment - - edited

          Hmm, anyone else having troubles with the other tests?

          yes, I can confirm this... btw shouldnt the qp be looking at .getQueryAnalyzer() versus .getAnalyzer()?

          nevermind, since you dont want to reverse at query time...

          Show
          Robert Muir added a comment - - edited Hmm, anyone else having troubles with the other tests? yes, I can confirm this... btw shouldnt the qp be looking at .getQueryAnalyzer() versus .getAnalyzer()? nevermind, since you dont want to reverse at query time...
          Hide
          Grant Ingersoll added a comment -

          Problem seems to stem from : queries.

          Show
          Grant Ingersoll added a comment - Problem seems to stem from : queries.
          Hide
          Yonik Seeley added a comment -

          btw shouldnt the qp be looking at .getQueryAnalyzer() versus .getAnalyzer()?

          The QueryParser should be using the query analyzer in general, but in this case, it's looking at the index analyzer to see if it indexed reverse tokens.

          Show
          Yonik Seeley added a comment - btw shouldnt the qp be looking at .getQueryAnalyzer() versus .getAnalyzer()? The QueryParser should be using the query analyzer in general, but in this case, it's looking at the index analyzer to see if it indexed reverse tokens.
          Hide
          Robert Muir added a comment -

          grant's patch, but with the logic to handle matchalldocs

          Show
          Robert Muir added a comment - grant's patch, but with the logic to handle matchalldocs
          Hide
          Grant Ingersoll added a comment -

          Only annoying thing about this solution is now that * check gets done twice, once in the SolrQueryParser method and once in the QueryParser method.

          Also, why not just && the two clauses (I realize it is a cut and paste from the parent).

          I'll fix and commit.

          Show
          Grant Ingersoll added a comment - Only annoying thing about this solution is now that * check gets done twice, once in the SolrQueryParser method and once in the QueryParser method. Also, why not just && the two clauses (I realize it is a cut and paste from the parent). I'll fix and commit.
          Hide
          Grant Ingersoll added a comment -

          Committed revision 813830.

          Show
          Grant Ingersoll added a comment - Committed revision 813830.
          Hide
          Ravi Kiran Bhaskar added a comment -

          While using ReversedWildcardFilterFactory with KeywordTokenizerFactory I get the following error for the fieldType

          <fieldType name="keywordText" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
          <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.TrimFilterFactory" />
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
          </analyzer>
          <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.TrimFilterFactory" />
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
          </analyzer>
          </fieldType>

          ERROR
          --------
          HTTP Status 500 - org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer; java.lang.AbstractMethodError: org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer; at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74) at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364) at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543) at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:153) at org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:807) at org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:794) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1425) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1313) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1241) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1230) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:176) at org.apache.solr.search.DisMaxQParser.getUserQuery(DisMaxQParser.java:195) at org.apache.solr.search.DisMaxQParser.addMainQuery(DisMaxQParser.java:158) at org.apache.solr.search.DisMaxQParser.parse(DisMaxQParser.java:74) at org.apache.solr.search.QParser.getQuery(QParser.java:131) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:297) at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:271) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:202) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:206) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:150) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080) at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:272) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:637) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:568) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:813) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:341) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:263) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:214) at com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:265) at com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run(SSLWorkerThread.java:106)

          Show
          Ravi Kiran Bhaskar added a comment - While using ReversedWildcardFilterFactory with KeywordTokenizerFactory I get the following error for the fieldType <fieldType name="keywordText" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.TrimFilterFactory" /> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" /> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.TrimFilterFactory" /> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" /> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType> ERROR -------- HTTP Status 500 - org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer; java.lang.AbstractMethodError: org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer; at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74) at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364) at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543) at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:153) at org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:807) at org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:794) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1425) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1313) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1241) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1230) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:176) at org.apache.solr.search.DisMaxQParser.getUserQuery(DisMaxQParser.java:195) at org.apache.solr.search.DisMaxQParser.addMainQuery(DisMaxQParser.java:158) at org.apache.solr.search.DisMaxQParser.parse(DisMaxQParser.java:74) at org.apache.solr.search.QParser.getQuery(QParser.java:131) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:297) at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:271) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:202) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:206) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:150) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080) at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:272) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:637) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:568) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:813) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:341) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:263) at com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:214) at com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:265) at com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run(SSLWorkerThread.java:106)
          Hide
          Yonik Seeley added a comment -

          Works for me in the latest trunk - perhaps you have some old class files laying around somewhere? That's normally the case with something like AbstractMethodError.

          Show
          Yonik Seeley added a comment - Works for me in the latest trunk - perhaps you have some old class files laying around somewhere? That's normally the case with something like AbstractMethodError.
          Hide
          Ravi Kiran Bhaskar added a comment -

          Thanks Yonik for the very prompt reply. Yes, you were right. The lucene related jars had a 'dev' suffix previously and now the 'dev' was removed and hence my eclipse had jar resolution problems while building. I tested it and works great as always. BTW you folks did a wonderful job with solr

          Show
          Ravi Kiran Bhaskar added a comment - Thanks Yonik for the very prompt reply. Yes, you were right. The lucene related jars had a 'dev' suffix previously and now the 'dev' was removed and hence my eclipse had jar resolution problems while building. I tested it and works great as always. BTW you folks did a wonderful job with solr
          Hide
          Ravi Kiran Bhaskar added a comment -

          Is there a way to get only the actual instead of both actual and reversed texts (I see that we can just get the reserved without the actual. Is the other way round possible ?).

          The reason being, I tried to facet the field and it returned be both as follows

          <int name="#1;ssergnoC">78</int>
          <int name="Congress">78</int>
          <int name="#1;.W egroeG ,hsuB">28</int>
          <int name="Bush, George W.">28</int>
          <int name="#1;detinU">263</int>
          <int name="United">263</int>

          Show
          Ravi Kiran Bhaskar added a comment - Is there a way to get only the actual instead of both actual and reversed texts (I see that we can just get the reserved without the actual. Is the other way round possible ?). The reason being, I tried to facet the field and it returned be both as follows <int name="#1;ssergnoC">78</int> <int name="Congress">78</int> <int name="#1;.W egroeG ,hsuB">28</int> <int name="Bush, George W.">28</int> <int name="#1;detinU">263</int> <int name="United">263</int>
          Hide
          Grant Ingersoll added a comment -

          Bulk close Solr 1.4 issues

          Show
          Grant Ingersoll added a comment - Bulk close Solr 1.4 issues

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Andrzej Bialecki
            • Votes:
              1 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development