Lucene - Core / LUCENE-2199

ShingleFilter skips over tri-shingles if outputUnigrams is set to false

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9, 2.9.1, 3.0
    • Fix Version/s: 2.9.2, 3.0.1, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Spinoff from http://lucene.markmail.org/message/uq4xdjk26yduvnpa

      I noticed that if I set outputUnigrams to false, ShingleFilter gives me the
      same output for maxShingleSize=2 and maxShingleSize=3:

      "please divide", "divide this", "this sentence"

      When I set maxShingleSize to 4, the output is:

      "please divide", "please divide this sentence", "divide this", "this sentence"

      I was expecting the following output with maxShingleSize=3 and
      outputUnigrams=false:

      "please divide this", "divide this sentence"
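      The reported symptom can be made concrete with a small sketch. This is plain Java, not Lucene's ShingleFilter source; the class ShingleSketch and method shingles are hypothetical, and the sketch assumes the filter emits, for each start position, the unigram (only when outputUnigrams is true) followed by shingles of size 2 through maxShingleSize. Under that assumption, raising maxShingleSize from 2 to 3 should add tri-shingles rather than leave the output unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShingleSketch {
    /**
     * Hypothetical helper (not Lucene API): builds word shingles under the
     * assumed semantics. For each start position it optionally emits the
     * unigram, then every shingle of size 2..maxShingleSize that fits.
     */
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            // Shortest shingle first; stop when the shingle would run past the last token.
            for (int size = 2; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        for (int max = 2; max <= 4; max++) {
            System.out.println("maxShingleSize=" + max + ": " + shingles(tokens, max, false));
        }
    }
}
```

      For maxShingleSize=3 the sketch prints [please divide, please divide this, divide this, divide this sentence, this sentence], so the tri-shingles "please divide this" and "divide this sentence" are the tokens the reported output is missing. The sketch has no minimum shingle size, so it also keeps the bi-shingles that the expected output above leaves out.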

      Attachments:
      1. LUCENE-2199.patch (6 kB, Simon Willnauer)
      2. LUCENE-2199.patch (6 kB, Simon Willnauer)

        Activity

        Simon Willnauer created issue -
        Simon Willnauer made changes -
        Field Original Value New Value
        Attachment LUCENE-2199.patch [ 12429778 ]
        Simon Willnauer made changes -
        Attachment LUCENE-2199.patch [ 12429787 ]
        Simon Willnauer made changes -
        Assignee Simon Willnauer [ simonw ]
        Robert Muir made changes -
        Assignee Simon Willnauer [ simonw ] Robert Muir [ rcmuir ]
        Robert Muir made changes -
        Fix Version/s 2.9.2 [ 12314342 ]
        Fix Version/s 3.0.1 [ 12314401 ]
        Affects Version/s 2.4 [ 12312681 ]
        Affects Version/s 2.4.1 [ 12313516 ]
        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Mark Thomas made changes -
        Workflow jira [ 12486421 ] Default workflow, editable Closed status [ 12563474 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563474 ] jira [ 12585071 ]
        Shai Erera made changes -
        Component/s modules/analysis [ 12310230 ]
        Component/s contrib/analyzers [ 12312333 ]

          People

          • Assignee:
            Robert Muir
            Reporter:
            Simon Willnauer
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
