Lucene - Core / LUCENE-324

org.apache.lucene.analysis.cn.ChineseTokenizer missing offset decrement

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: modules/analysis
    • Labels: None
    • Environment: Operating System: All, Platform: All
    • Bugzilla Id: 32687

    Description

      In ChineseTokenizer, it appears that offset should be decremented together
      with bufferIndex when the character type is Character.OTHER_LETTER. The
      missing decrement directly affects the startOffset and endOffset values of
      the tokens produced.

      Correct offsets are critical for Highlighter, which marks matching text
      based on these values.
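The effect of the missing decrement can be illustrated with a minimal, self-contained sketch. This is not the Lucene source: the class OffsetPushbackSketch, the tokenize method, and the decrementOffset flag are hypothetical names introduced here, and the loop only assumes the behavior described above (a character of type OTHER_LETTER that ends a pending token is pushed back by decrementing bufferIndex, and start offsets are derived from offset).

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetPushbackSketch {

    /**
     * Hypothetical, simplified tokenizer loop (an assumption, not Lucene code).
     * Each token is returned as "text[start,end)".
     * decrementOffset = true applies the fix proposed in this issue.
     */
    public static List<String> tokenize(String input, boolean decrementOffset) {
        List<String> tokens = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        int bufferIndex = 0; // index of the next character to read
        int offset = 0;      // count of characters consumed so far
        int start = 0;       // start offset of the token being built

        while (bufferIndex < input.length()) {
            char c = input.charAt(bufferIndex++);
            offset++;

            if (Character.getType(c) == Character.OTHER_LETTER) {
                if (term.length() > 0) {
                    // Push the character back so it is re-read as its own token.
                    bufferIndex--;
                    if (decrementOffset) {
                        offset--; // keep offset in step with bufferIndex
                    }
                    emit(tokens, term, start);
                    continue;
                }
                // An OTHER_LETTER character forms a single-character token.
                start = offset - 1;
                term.append(c);
                emit(tokens, term, start);
            } else if (Character.isLetterOrDigit(c)) {
                if (term.length() == 0) {
                    start = offset - 1;
                }
                term.append(Character.toLowerCase(c));
            } else if (term.length() > 0) {
                // Any other character ends the pending token.
                emit(tokens, term, start);
            }
        }
        if (term.length() > 0) {
            emit(tokens, term, start);
        }
        return tokens;
    }

    private static void emit(List<String> tokens, StringBuilder term, int start) {
        tokens.add(term + "[" + start + "," + (start + term.length()) + ")");
        term.setLength(0);
    }

    public static void main(String[] args) {
        String text = "ab\u4e2d\u6587"; // two Latin letters, then two CJK ideographs
        System.out.println("with decrement:    " + tokenize(text, true));
        System.out.println("without decrement: " + tokenize(text, false));
    }
}
```

In this sketch, with the decrement the first ideograph is reported at [2,3), which matches its position in the input. Without it, the same character is reported at [3,4), and every later token is shifted as well, so a highlighter relying on these offsets would mark the wrong span.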

          People

            Assignee: Unassigned
            Reporter: Ray Tsang (saturnism@gmail.com)
            Votes: 0
            Watchers: 0