Solr / SOLR-5379

Query-time multi-word synonym expansion


Details

    Description

      When synonyms are applied at query time, Solr fails to handle multi-word synonyms, for two reasons:

      • First, the Lucene query parser tokenizes the user query on whitespace, so a multi-word term is split into separate terms before it reaches the synonym filter. The synonym filter therefore cannot recognize the multi-word term and expand it.
      • Second, if the synonym filter expands a term into several alternatives that include a multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to handle the synonyms. But MultiPhraseQuery does not work when the alternatives have different numbers of words.
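
The first problem can be reproduced without Lucene. The sketch below is only an illustration: the class name, the synonym table, and the per-chunk lookup are hypothetical stand-ins for the query parser and the synonym filter, not Solr's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WhitespaceSplitDemo {
    // Hypothetical synonym table: a multi-word key mapped to its synonyms.
    static final Map<String, List<String>> SYNONYMS =
            Map.of("sea biscuit", List.of("seabiscuit"));

    // Mimics a parser that splits on whitespace first and only then
    // looks up synonyms, one chunk at a time.
    static List<String> expandPerChunk(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.add(chunk);
            out.addAll(SYNONYMS.getOrDefault(chunk, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        // The key "sea biscuit" is never looked up as a whole,
        // so "seabiscuit" is not added.
        System.out.println(expandPerChunk("sea biscuit")); // prints [sea, biscuit]
    }
}
```

Because the lookup only ever sees "sea" and "biscuit" separately, the two-word key cannot match.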

      For the first problem, we can automatically quote every multi-word synonym that appears in the user query, so that the Lucene query parser does not split it. A related JIRA issue is https://issues.apache.org/jira/browse/LUCENE-2605.
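
A minimal sketch of the quoting idea, assuming the list of multi-word synonyms is already available as plain strings. The class and method names are hypothetical and this is not the code in the attached patches. It skips phrases that already sit directly next to a double quote, but it does not handle a synonym nested inside a longer quoted phrase.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SynonymQuoter {
    // Wraps each known multi-word synonym found in the query in double quotes.
    static String quoteMultiWord(String query, Collection<String> phrases) {
        if (phrases.isEmpty()) {
            return query;
        }
        // Longest phrase first, so a longer synonym wins over a shorter one it contains.
        List<String> sorted = new ArrayList<>(phrases);
        sorted.sort(Comparator.comparingInt(String::length).reversed());
        String alternation = sorted.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Match only whole phrases that are not adjacent to a word character or a quote.
        Pattern p = Pattern.compile("(?<![\\w\"])(?:" + alternation + ")(?![\\w\"])");
        return p.matcher(query).replaceAll("\"$0\"");
    }

    public static void main(String[] args) {
        System.out.println(quoteMultiWord("sea biscuit movie", List.of("sea biscuit")));
        // prints "sea biscuit" movie
    }
}
```

With the phrase quoted, a whitespace-splitting parser hands the whole phrase to the analysis chain as one unit.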

      For the second problem, we can replace MultiPhraseQuery with a BooleanQuery whose SHOULD clauses are PhraseQuery instances, one per synonym alternative, whenever the token stream contains a multi-word synonym.
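
The shape of that replacement query can be sketched without Lucene classes. The example below is an assumption-laden illustration: it renders each synonym alternative as a bare term or a quoted phrase and joins them with OR, which the classic Lucene query syntax treats as optional (SHOULD) clauses. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class SynonymClauseSketch {
    // One alternative: a single token stays a bare term,
    // several tokens become a quoted phrase.
    static String renderAlternative(List<String> tokens) {
        return tokens.size() == 1
                ? tokens.get(0)
                : "\"" + String.join(" ", tokens) + "\"";
    }

    // Any alternative may match, regardless of how many words it has.
    static String anyOf(List<List<String>> alternatives) {
        return alternatives.stream()
                .map(SynonymClauseSketch::renderAlternative)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(anyOf(List.of(
                List.of("sea", "biscuit"),
                List.of("seabiscuit"))));
        // prints ("sea biscuit" OR seabiscuit)
    }
}
```

Each alternative is a separate clause, so alternatives of different lengths no longer have to line up position by position as they would in a single MultiPhraseQuery.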

      Attachments

        1. synonym-expander-4_8_1.patch
          25 kB
          Jeremy Anderson
        2. synonym-expander.patch
          16 kB
          Tien Nguyen Manh
        3. solr-5379-version-4.10.3.patch
          57 kB
          Rafał Kuć
        4. quoted-4_8_1.patch
          21 kB
          Jeremy Anderson
        5. quoted.patch
          21 kB
          Tien Nguyen Manh
        6. conf-test-files-4_8_1.patch
          6 kB
          Jeremy Anderson

            People

              Assignee: Unassigned
              Reporter: Tien Nguyen Manh (tiennm)
              Votes: 20
              Watchers: 40
