  1. Lucene - Core
  2. LUCENE-1903

Incorrect ShingleFilter behavior when outputUnigrams == false

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      ShingleFilter does not behave as expected when outputUnigrams == false: it still emits unigrams at least some of the time, even though only shingles should appear in the output.

      I'll attach a patch to ShingleFilterTest.java that adds some test cases that demonstrate the problem.
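
      To make the expected behavior concrete, here is a minimal plain-Java sketch. It does not use Lucene and is not the ShingleFilter implementation; the class ShingleSketch and the helper shingles() are hypothetical names introduced for illustration only. It assumes a maximum shingle size of 2 and a single space as the token separator, and shows which tokens one would expect with outputUnigrams set to true and to false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Hypothetical helper (not a Lucene API): models the expected output of a
    // shingle filter over an already-tokenized input.
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            // A unigram is emitted only when the flag is set.
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            // Build shingles of size 2..maxShingleSize starting at position i.
            StringBuilder sb = new StringBuilder(tokens.get(i));
            for (int n = 2; n <= maxShingleSize && i + n - 1 < tokens.size(); n++) {
                sb.append(' ').append(tokens.get(i + n - 1)); // assumed separator: one space
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("please", "divide", "this", "sentence");
        // With unigrams: single tokens interleaved with bigrams.
        System.out.println(shingles(input, 2, true));
        // Without unigrams: bigrams only, which is the behavior this issue expects.
        System.out.println(shingles(input, 2, false));
    }
}
```

      Under these assumptions, the second call yields only "please divide", "divide this", and "this sentence". Any single-word token in the outputUnigrams == false case would be an instance of the bug described here.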

      I haven't checked this, but I suspect that the behavior for outputUnigrams == false changed when the class was upgraded to the new TokenStream API.

        Attachments

        1. LUCENE-1903_testcases_lucene2_4_1_version.patch
          5 kB
          Chris Harris
        2. LUCENE-1903_testcases.patch
          5 kB
          Chris Harris
        3. LUCENE-1903.patch
          8 kB
          Chris Harris
        4. LUCENE-1903.patch
          7 kB
          Uwe Schindler
        5. TEST-org.apache.lucene.analysis.shingle.ShingleFilterTest.xml
          15 kB
          Chris Harris

              People

              • Assignee:
                thetaphi Uwe Schindler
                Reporter:
                ryguasu Chris Harris
              • Votes:
                0
                Watchers:
                1
