Lucene - Core / LUCENE-2207

CJKTokenizer generates tokens with incorrect offsets

Details

    • New, Patch Available

    Description

      If I index a Japanese document with a multi-valued field using CJKTokenizer and then highlight a term with FastVectorHighlighter, the output snippets highlight the wrong string. I'll attach a program that reproduces the problem soon.

      Attachments

        1. TestCJKOffset.java
          7 kB
          Koji Sekiguchi
        2. LUCENE-2207.patch
          0.7 kB
          Robert Muir
        3. LUCENE-2207.patch
          2 kB
          Robert Muir
        4. LUCENE-2207.patch
          4 kB
          Robert Muir
        5. LUCENE-2207.patch
          6 kB
          Koji Sekiguchi

            People

              rcmuir Robert Muir
              koji Koji Sekiguchi
              Votes: 0
              Watchers: 1
