Lucene - Core / LUCENE-3717

Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Recently lots of issues about broken offsets have been fixed, but it would be nice to improve the
      test coverage and verify that offsets work across the board (especially with charfilters).

      In BaseTokenStreamTestCase.checkRandomData, we can sometimes pass the analyzer a reader wrapped
      in a "MockCharFilter" (the one in the patch sometimes doubles characters). If the analyzer does
      not call correctOffset() or does incorrect "offset math" (LUCENE-3642, etc.), then eventually
      this will produce invalid offsets and the test will fail.

      Other than test bugs, this found 2 real bugs: ICUTokenizer did not call correctOffset() in its end(),
      and ThaiWordFilter did incorrect offset math.
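
      To illustrate why a character-doubling filter exposes missing offset correction, here is a
      minimal plain-Java sketch. It is not the MockCharFilter from the patch and does not use any
      Lucene classes: DoublingFilterSketch, tokenOffsets and offsetsValid are hypothetical names
      invented for this example, and the validity check is only an assumption about the kind of
      consistency a test could assert.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a char filter that doubles one character (not Lucene code). */
final class DoublingFilterSketch {
    private final String filtered;
    // originalOffsets[i] is the offset in the original text for offset i in the filtered text.
    private final int[] originalOffsets;

    DoublingFilterSketch(String original, char doubled) {
        StringBuilder sb = new StringBuilder();
        List<Integer> map = new ArrayList<>();
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            sb.append(c);
            map.add(i);
            if (c == doubled) {
                sb.append(c); // the inserted copy maps back to the same original character
                map.add(i);
            }
        }
        map.add(original.length()); // mapping for the end-of-text offset
        filtered = sb.toString();
        originalOffsets = map.stream().mapToInt(Integer::intValue).toArray();
    }

    String filtered() {
        return filtered;
    }

    /** Maps an offset in the filtered text back to the original text. */
    int correctOffset(int filteredOffset) {
        return originalOffsets[filteredOffset];
    }

    /** Whitespace tokenizer over the filtered text; returns {start, end} pairs. */
    List<int[]> tokenOffsets(boolean correct) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        while (i < filtered.length()) {
            while (i < filtered.length() && filtered.charAt(i) == ' ') i++;
            int start = i;
            while (i < filtered.length() && filtered.charAt(i) != ' ') i++;
            if (i > start) {
                out.add(correct
                        ? new int[] {correctOffset(start), correctOffset(i)}
                        : new int[] {start, i});
            }
        }
        return out;
    }

    /** Assumed check: offsets are ordered and stay within the original text. */
    static boolean offsetsValid(List<int[]> offsets, int originalLength) {
        int lastStart = 0;
        for (int[] o : offsets) {
            if (o[0] < lastStart || o[1] < o[0] || o[1] > originalLength) return false;
            lastStart = o[0];
        }
        return true;
    }

    public static void main(String[] args) {
        String original = "ab cab";
        DoublingFilterSketch f = new DoublingFilterSketch(original, 'a');
        // With correction, offsets refer to the original text.
        System.out.println(offsetsValid(f.tokenOffsets(true), original.length()));  // true
        // Without correction, the last end offset (8) exceeds the original length (6).
        System.out.println(offsetsValid(f.tokenOffsets(false), original.length())); // false
    }
}
```

      Because the filtered text is longer than the original, a tokenizer that skips the correction
      step reports offsets past the end of the original input, which is the kind of inconsistency
      a randomized test can catch.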

      1. LUCENE-3717_ngram.patch
        22 kB
        Robert Muir
      2. LUCENE-3717_more.patch
        39 kB
        Robert Muir
      3. LUCENE-3717.patch
        14 kB
        Robert Muir

        Activity

        Robert Muir created issue
        Robert Muir made changes
          Field        Original Value    New Value
          Attachment                     LUCENE-3717.patch [ 12511456 ]
        Robert Muir made changes
          Status       Open [ 1 ]        Resolved [ 5 ]
          Resolution                     Fixed [ 1 ]
        Robert Muir made changes
          Attachment                     LUCENE-3717_more.patch [ 12511473 ]
        Robert Muir made changes
          Resolution   Fixed [ 1 ]
          Status       Resolved [ 5 ]    Reopened [ 4 ]
        Robert Muir made changes
          Attachment                     LUCENE-3717_ngram.patch [ 12511656 ]
        Robert Muir made changes
          Status       Reopened [ 4 ]    Resolved [ 5 ]
          Resolution                     Fixed [ 1 ]
        Uwe Schindler made changes
          Status       Resolved [ 5 ]    Closed [ 6 ]

          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 1
