Solr / SOLR-744

Patch to make ShingleFilter.outputUnigramsIfNoShingles (LUCENE-1370) available in Solr schema files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1, 4.0-ALPHA
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: Schema and Analysis
    • Labels: None

      Attachments

      1. SOLR-744.patch (1 kB, Chris Harris)
      2. SOLR-744.patch (3 kB, Steve Rowe)

          Activity

          Chris Harris created issue -
          Chris Harris made changes -
          Attachment: SOLR-744.patch [ 12389261 ]
          Chris Harris made changes -
          Link: This issue relates to LUCENE-1370 [ LUCENE-1370 ]
          Tom Burton-West added a comment -

          I applied both this patch and LUCENE-1370, and there seems to be a problem passing arguments from the ShingleFilterFactory to the ShingleFilter. The admin analyzer says that outputUnigramIfNoNgram=true:

          org.apache.solr.analysis.ShingleFilterFactory

          {outputUnigrams=false, outputUnigramIfNoNgram=true}

          However, this does not seem to be getting set within the ShingleFilter: the admin analyzer shows nothing coming out of the ShingleFilterFactory when analyzing a query with a single word. When I use the admin interface to query a single word, I also get no results.

          If I hack the patch to always set outputUnigramIfNoNgrams to true, everything works fine (see below).

          If I am missing something or obviously doing something wrong, please let me know. In the meantime I will try to write a unit test and track down the problem. Is there an already existing unit test I could use as a model?

          Tom Burton-West
          ------------------------------------------------------

          Hack

          public void init(Map<String, String> args) {
            super.init(args);
            maxShingleSize = getInt("maxShingleSize", ShingleFilter.DEFAULT_MAX_SHINGLE_SIZE);
            outputUnigrams = getBoolean("outputUnigrams", true);
            outputUnigramIfNoNgrams = true; // tbw hack: always true; original code: getBoolean("outputUnigramIfNoNgram", false);
          }
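To make the expected single-word behavior concrete, here is a simplified Java model of which terms come out when a token list is shingled. It is a sketch under stated assumptions, not Lucene's ShingleFilter: the class ShingleSketch and its shingles method are hypothetical, shingles are joined with a single space, the minimum shingle size is assumed to be 2, and positions, offsets, and token types are ignored. It only illustrates that with outputUnigrams=false a one-token input yields nothing unless the fallback flag (outputUnigramIfNoNgram in this patch, later renamed outputUnigramsIfNoShingles) is enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of shingle output: which terms are emitted, in order.
// Not Lucene's ShingleFilter; positions, offsets, and token types are ignored.
public class ShingleSketch {

    public static List<String> shingles(List<String> tokens, int maxShingleSize,
                                        boolean outputUnigrams, boolean outputUnigramsIfNoShingles) {
        List<String> out = new ArrayList<>();
        boolean anyShingle = false;
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start)); // the single token at this position
            }
            // Multi-word shingles starting here, sizes 2..maxShingleSize (assumed minimum of 2).
            for (int size = 2; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
                anyShingle = true;
            }
        }
        // Fallback: no multi-word shingle was possible (e.g. a one-word query) and
        // unigrams were suppressed, so emit the unigrams instead of nothing.
        if (!anyShingle && !outputUnigrams && outputUnigramsIfNoShingles) {
            out.addAll(tokens);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> oneWord = List.of("solr");
        System.out.println(shingles(oneWord, 2, false, false)); // []
        System.out.println(shingles(oneWord, 2, false, true));  // [solr]
        System.out.println(shingles(List.of("please", "divide", "this"), 2, false, true));
        // [please divide, divide this]
    }
}
```

With two or more input tokens the fallback never fires in this model, because at least one two-word shingle exists.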
          Chris Harris added a comment -

          Tom,

          The Lucene half of this patch pair adds unit tests to src/test/org/apache/lucene/analysis/shingle/ShingleFilterTest.java. Do those tests pass when you run them on your custom Lucene build, after applying LUCENE-1370? (cd to the top level of the Lucene source tree and run "ant test -Dtestcase=ShingleFilterTest".) I didn't add any tests for the Solr half of the patch pair, and I'm not sure how it could be tested productively.

          Tom Burton-West added a comment -

          Hi Chris,

          Thanks for your kind reply. The Lucene unit tests passed. It turns out that a configuration error left an unpatched version of ShingleFilter on the classpath when Solr started up. Once we made sure that the patched version was loading, everything worked just fine.

          Tom

          Steve Rowe made changes -
          Assignee: Steven Rowe [ steve_rowe ]
          Steve Rowe made changes -
          Fix Version/s: 3.1 [ 12314371 ]
          Fix Version/s: 4.0 [ 12314992 ]
          Affects Version/s: 3.1 [ 12314371 ]
          Affects Version/s: 4.0 [ 12314992 ]
          Steve Rowe added a comment -

          Updated the patch to reflect the option name change in LUCENE-1370 (outputUnigramIfNoNgram -> outputUnigramsIfNoShingles). Added a simple test to TestShingleFilterFactory.java for the single-input-token case. Added a solr/CHANGES.txt entry.

          Unless there are objections, I will commit this in a couple of days, after LUCENE-1370 has been committed.
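Because the rename changes the attribute a schema must use, here is a minimal sketch of factory-style argument handling, assuming the args map simply holds the attribute strings from the schema's filter element (as the admin analyzer output earlier in this thread suggests). The class ShingleArgsSketch and its getInt/getBoolean helpers are hypothetical stand-ins, not Solr's ShingleFilterFactory, and the default of 2 for maxShingleSize is an assumption mirroring ShingleFilter.DEFAULT_MAX_SHINGLE_SIZE.

```java
import java.util.Map;

// Hypothetical stand-in for a shingle filter factory's argument parsing; not Solr code.
public class ShingleArgsSketch {
    // Assumed default, mirroring ShingleFilter.DEFAULT_MAX_SHINGLE_SIZE.
    static final int DEFAULT_MAX_SHINGLE_SIZE = 2;

    public final int maxShingleSize;
    public final boolean outputUnigrams;
    public final boolean outputUnigramsIfNoShingles;

    public ShingleArgsSketch(Map<String, String> args) {
        maxShingleSize = getInt(args, "maxShingleSize", DEFAULT_MAX_SHINGLE_SIZE);
        outputUnigrams = getBoolean(args, "outputUnigrams", true);
        // Post-rename attribute name; when absent, the fallback stays off.
        outputUnigramsIfNoShingles = getBoolean(args, "outputUnigramsIfNoShingles", false);
    }

    static int getInt(Map<String, String> args, String name, int defaultValue) {
        String value = args.get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    static boolean getBoolean(Map<String, String> args, String name, boolean defaultValue) {
        String value = args.get(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        ShingleArgsSketch renamed = new ShingleArgsSketch(
            Map.of("outputUnigrams", "false", "outputUnigramsIfNoShingles", "true"));
        System.out.println(renamed.outputUnigramsIfNoShingles); // true

        // The pre-rename key is never looked up, so the default (false) applies.
        ShingleArgsSketch oldName = new ShingleArgsSketch(
            Map.of("outputUnigrams", "false", "outputUnigramIfNoNgram", "true"));
        System.out.println(oldName.outputUnigramsIfNoShingles); // false
    }
}
```

In this sketch an unrecognized key is simply ignored, so a schema still using the pre-rename attribute would silently get the default of false; whether a particular Solr version warns about unknown attributes is outside what this sketch shows.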

          Steve Rowe made changes -
          Attachment: SOLR-744.patch [ 12456518 ]
          Steve Rowe made changes -
          Component/s: Schema and Analysis [ 12312520 ]
          Steve Rowe added a comment -

          Committed: trunk revision 1006191, branch_3x revision 1006199

          Steve Rowe made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
          Steve Rowe made changes -
          Summary: "Patch to make ShingleFilter.outputUnigramIfNoNgrams (LUCENE-1370) available in Solr schema files" -> "Patch to make ShingleFilter.outputUnigramIfNoShingles (LUCENE-1370) available in Solr schema files"
          Steve Rowe made changes -
          Summary: "Patch to make ShingleFilter.outputUnigramIfNoShingles (LUCENE-1370) available in Solr schema files" -> "Patch to make ShingleFilter.outputUnigramsIfNoShingles (LUCENE-1370) available in Solr schema files"
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Grant Ingersoll made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]

            People

            • Assignee: Steve Rowe
              Reporter: Chris Harris
            • Votes: 2
              Watchers: 4
