Solr / SOLR-3143

Supply a phrase-oriented QueryConverter for Suggesters

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: spellchecker
    • Labels:
      None

      Description

      The supplied QueryConverter makes sense for Spellcheckers:
      it tries to parse out the 'meat' of the query (using e.g. identifier rules),
      and analyzes each parsed 'word' with the configured analyzer (separate tokenstream).

      // Pseudocode: every parsed word gets its own token stream.
      words = splitByIdentifierRules(query)
      for (each word in words) {
        ts = analyzer.tokenStream(word)
        for (each analyzedWord in ts) {
          tokens.add(analyzedWord)
        }
      }
      

      However, for Suggesters this is not really optimal, because in the general
      case they do not work one word at a time: they aren't really suggesting
      individual words but instead an entire 'query' that matches a prefix.

      So instead, I think we just want a QueryConverter that creates a
      single string containing all the 'meat', and we pass the whole thing to
      the analyzer, then the suggester.
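
The difference between the two conversion strategies can be sketched in plain Java. This is only an illustration, not the code from the attached patch: the class name `PhraseConverterSketch`, its methods, and the use of a `Function<String, List<String>>` as a stand-in for the analyzer are all hypothetical, and whitespace splitting is a simplification of the identifier rules mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from Solr or the patch.
class PhraseConverterSketch {

    // Spellchecker-style conversion: split the query into words and
    // analyze each word separately (whitespace split is an assumption).
    static List<String> convertPerWord(String query,
                                       Function<String, List<String>> analyzer) {
        List<String> tokens = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.addAll(analyzer.apply(word));
            }
        }
        return tokens;
    }

    // Suggester-style conversion: hand the whole string to the analyzer once.
    static List<String> convertWholePhrase(String query,
                                           Function<String, List<String>> analyzer) {
        return analyzer.apply(query);
    }

    // Stand-in analyzer that mimics "keyword tokenizer + lowercase":
    // the entire input becomes a single lowercased token.
    static List<String> keywordLowercase(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String query = "Chris Ho";
        System.out.println(convertPerWord(query, PhraseConverterSketch::keywordLowercase));
        System.out.println(convertWholePhrase(query, PhraseConverterSketch::keywordLowercase));
    }
}
```

With the per-word strategy the suggester would see two separate tokens, while the whole-phrase strategy keeps the prefix intact as one token, which is what a phrase suggester needs to match against.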

      The current workaround on the wiki is to ask the user to write custom
      code (http://wiki.apache.org/solr/Suggester#Tips_and_tricks). I think that's not
      great, since phrase-based suggesting is really the primary use case for
      suggesters.

      1. SOLR-3143.patch
        19 kB
        Robert Muir

        Activity

        Robert Muir added a comment -

        Wow, phrase suggestions are ridiculously complicated to get working.

        I think we need to add some configuration to the example (maybe commented out), because in my opinion this is really the default use case... but it's a lot of configuration, and the biggest traps IMO are:

        1. You need to write a custom QueryConverter in Java code (I provide one in this patch), configured as a plugin and set as queryConverter (is this global, or is there a way to set this per-suggester?!)
        2. You need to make sure onlyMorePopular is true. Even though it says it doesn't affect file-based spellcheckers, that's a lie: this controls whether results are alpha-sorted or ordered by relevance!
        3. (Assuming your queryConverter is well-behaved and respects the analyzer) You need to define a custom fieldType in schema.xml, even though it's likely not used by any actual Solr fields, that uses KeywordTokenizer + lowercase or whatever you want, and set this via queryAnalyzerFieldType. If you don't do this, it will default to WhitespaceTokenizer.

        Anyway, attached is my patch. Basically it's a QueryConverter that just passes the whole string as-is to the query analyzer.

        In my test analyzer config, I added a horrible regexp that tries to emulate what Google's autocomplete seems to do: lowercase, collapse runs of whitespace, remove query syntax, etc.

        But maybe for a lot of people that's even overkill and they could just use Keyword+Lowercase or whatever.
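
The kind of normalization described above (lowercase, collapse runs of whitespace, remove query syntax) might look roughly like the following in plain Java. This is a sketch under assumptions and not the regexp from the patch: the class `SuggestNormalizerSketch`, the `normalize` method, the particular set of "query syntax" characters, and the decision to trim the result are all illustrative choices.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; the patch's actual analyzer config is not reproduced here.
class SuggestNormalizerSketch {

    // Assumed set of query-syntax characters to strip (illustrative only).
    private static final Pattern SYNTAX =
            Pattern.compile("[+!(){}\\[\\]^\"~*?:\\\\]");

    // One or more whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String normalize(String input) {
        String s = input.toLowerCase(Locale.ROOT);   // lowercase
        s = SYNTAX.matcher(s).replaceAll(" ");       // replace syntax chars with spaces
        s = WHITESPACE.matcher(s).replaceAll(" ");   // collapse runs of whitespace
        return s.trim();                             // assumption: drop edge whitespace
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Title:\"Hello   WORLD\"  "));
    }
}
```

Syntax characters are replaced with a space rather than deleted so that `title:"hello` does not fuse into a single word. Whether trailing whitespace should be kept (it can matter for prefix matching) is left open here.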


          People

          • Assignee:
            Robert Muir
            Reporter:
            Robert Muir
          • Votes:
            1
            Watchers:
            0