Solr / SOLR-908

Port of Nutch CommonGrams filter to Solr

Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      Phrase queries containing common words are extremely slow. We are reluctant to simply use stop words, because they cause false hits and make some queries impossible to search (for example "to be or not to be", "the who", and "man in the moon" vs. "man on the moon").
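
The stop-word problem described above can be illustrated with a minimal sketch. The stop list and the whitespace tokenizer here are assumptions chosen for illustration, not Solr's actual analysis chain:

```python
# Hypothetical stop list, for illustration only; real stop lists vary per deployment.
STOP_WORDS = {"in", "on", "the", "to", "be", "or", "not"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop every stop word."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

print(remove_stop_words("man in the moon"))     # ['man', 'moon']
print(remove_stop_words("man on the moon"))     # ['man', 'moon']
print(remove_stop_words("to be or not to be"))  # []
```

With this stop list the two "moon" phrases become indistinguishable (a false hit), and "to be or not to be" leaves nothing to search for.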

      Several postings about slow phrase queries have suggested the approach used by Nutch. Perhaps someone with more Java/Solr experience might take this on.

      It should be possible to port the Nutch CommonGrams code to Solr and create a suitable Solr FilterFactory so that it could be used in Solr by listing it in the Solr schema.xml.

      "Construct n-grams for frequently occurring terms and phrases while indexing. Optimize phrase queries to use the n-grams. Single terms are still indexed too, with n-grams overlaid."
      http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html
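
The index-time behaviour quoted above can be sketched roughly as follows. This is an illustrative sketch, not the Nutch or Solr code: the common-word list, the "_" separator, the whitespace tokenizer, and the function name `common_grams` are all assumptions. Each token is emitted with a position increment; a bigram is overlaid on its first word (increment 0) whenever either word of the pair is common.

```python
# Hypothetical list of common words, for illustration only.
COMMON_WORDS = frozenset({"the", "in", "on", "to", "be", "or", "not"})

def common_grams(text, common=COMMON_WORDS, sep="_"):
    """Return (term, position_increment) pairs.

    Every single term is kept (increment 1). When a term or its successor
    is a common word, a bigram is added at the same position (increment 0).
    """
    tokens = text.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        out.append((tok, 1))
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common or nxt in common:
                out.append((tok + sep + nxt, 0))
    return out

print(common_grams("the who"))
# [('the', 1), ('the_who', 0), ('who', 1)]
print([term for term, _ in common_grams("man in the moon")])
# ['man', 'man_in', 'in', 'in_the', 'the', 'the_moon', 'moon']
```

Under these assumptions, "man in the moon" and "man on the moon" produce different bigrams ("man_in" vs. "man_on"), so a phrase query can match on the rarer bigram terms instead of intersecting the long posting lists of the individual common words.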

      Attachments

        1. CommonGramsPort.zip (14 kB, Tom Burton-West)
        2. SOLR-908.patch (38 kB, Jason Rutherglen)
        3. SOLR-908.patch (37 kB, Jason Rutherglen)
        4. SOLR-908.patch (36 kB, Jason Rutherglen)
        5. SOLR-908.patch (38 kB, Jason Rutherglen)
        6. SOLR-908.patch (43 kB, Jason Rutherglen)
        7. SOLR-908.patch (45 kB, Jason Rutherglen)
        8. SOLR-908.patch (91 kB, Jason Rutherglen)
        9. SOLR-908.patch (47 kB, Tom Burton-West)
        10. SOLR-908.patch (46 kB, Tom Burton-West)


          People

            Assignee: Yonik Seeley
            Reporter: Tom Burton-West
            Votes: 3
            Watchers: 5

