Lucene - Core / LUCENE-973

CJKTokenizer returns an empty ("") token + new TestCJKTokenizer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.9
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      An empty string ("") is returned as a Token at the boundary between a two-byte character and a one-byte character.

      This is not a problem in CJKAnalyzer.
      It becomes a problem when CJKTokenizer is used on its own (for example,
      with Solr).
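
To illustrate the boundary case described above, here is a minimal, library-free sketch of bigram tokenization over mixed one-byte and two-byte text. It is not Lucene's CJKTokenizer and not the attached patch. The class name BigramSketch and both helper methods are hypothetical, and the character classes are deliberately simplified: ASCII letters and digits stand in for the one-byte side, and non-ASCII letters for the two-byte side. Surrogate pairs are not handled. The point it shows is that a token is added only after at least one character has been consumed, so switching between the two character classes cannot produce "".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BigramSketch {

    // Hypothetical helper: ASCII letters and digits (the "one byte" side).
    static boolean isSingleByte(char c) {
        return c < 128 && Character.isLetterOrDigit(c);
    }

    // Hypothetical helper: non-ASCII letters (the "two byte" side).
    static boolean isDoubleByte(char c) {
        return c >= 128 && Character.isLetter(c);
    }

    // Splits text into lowercase ASCII words and overlapping two-byte bigrams.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isSingleByte(c)) {
                // Consume the whole ASCII run as one token.
                int start = i;
                while (i < text.length() && isSingleByte(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
            } else if (isDoubleByte(c)) {
                // Consume the whole two-byte run, then emit overlapping bigrams.
                int start = i;
                while (i < text.length() && isDoubleByte(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    // A lone two-byte character becomes a single-character token.
                    tokens.add(text.substring(start, i));
                } else {
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else {
                // Separator: skip it without emitting anything.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "abc" followed directly by three hiragana characters, a space, then "def".
        List<String> tokens = tokenize("abc\u3042\u3044\u3046 def");
        System.out.println(tokens.size() + " tokens, contains empty: " + tokens.contains(""));
    }
}
```

For the input in main, the sketch returns four tokens: "abc", the two overlapping hiragana bigrams, and "def". A regression test in the spirit of TestCJKTokenizer could assert the same property against the real tokenizer: no returned token has empty text.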

      Attachments

        1. LUCENE-973.patch
          12 kB
          Michael McCandless
        2. LUCENE-973.patch
          12 kB
          Koji Sekiguchi
        3. LUCENE-973.patch
          11 kB
          Steven Rowe
        4. with-patch.jpg
          52 kB
          Koji Sekiguchi
        5. without-patch.jpg
          56 kB
          Koji Sekiguchi
        6. CJKTokenizer20070807.patch
          6 kB
          Toru Matsuzawa

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Toru Matsuzawa (toru)
            Votes: 2
            Watchers: 2
