Lucene - Core / LUCENE-2207

CJKTokenizer generates tokens with incorrect offsets

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      If I index a multi-valued Japanese document with CJKTokenizer and highlight a term with FastVectorHighlighter, the output snippets highlight the wrong string. I'll attach a program that reproduces the problem soon.

        Attachments

        1. LUCENE-2207.patch
          6 kB
          Koji Sekiguchi
        2. LUCENE-2207.patch
          4 kB
          Robert Muir
        3. LUCENE-2207.patch
          2 kB
          Robert Muir
        4. LUCENE-2207.patch
          0.7 kB
          Robert Muir
        5. TestCJKOffset.java
          7 kB
          Koji Sekiguchi

              People

              • Assignee:
                rcmuir Robert Muir
                Reporter:
                koji Koji Sekiguchi
              • Votes:
                0
                Watchers:
                1
