Lucene - Core / LUCENE-10059

Assertion error in JapaneseTokenizer / KoreanTokenizer backtrace

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.8
    • Fix Version/s: 8.x, 9.0, 9.2
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      A rare case triggers an AssertionError in the backtrace step of JapaneseTokenizer. We (Amazon Product Search) found it in our tests.

      If a text span of length 1024 (determined by MAX_BACKTRACE_GAP) passes without a regular backtrace, a forced backtrace is applied. If the best partial path at that point happens to end at the last position, the final backtrace, which is always applied at the end, tries to backtrace from and to the same position. This causes an AssertionError in RollingCharBuffer.get() when it is asked to generate an empty buffer.

      We are fixing it by returning early from the backtrace() method when the from and to positions are the same:

          if (endPos == lastBackTracePos) {
            return;
          }
      

      The backtrace() method is essentially a no-op when this condition holds, so when assertions are disabled (no -ea) the tokenizer still outputs the correct tokens.
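
      The failure mode and the guard can be illustrated with a minimal sketch. This is a toy model, not Lucene code: the class BacktraceGuardSketch, its get() helper, and the guard flag are all hypothetical, and the real tokenizer state is reduced to a single lastBackTracePos counter. The empty-read check throws an explicit exception instead of using assert, so the demonstration does not depend on -ea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of this is Lucene code.
final class BacktraceGuardSketch {
    private final char[] text;
    private final List<String> emitted = new ArrayList<>();
    private int lastBackTracePos = 0;

    public BacktraceGuardSketch(String input) {
        this.text = input.toCharArray();
    }

    // Hypothetical stand-in for a buffer read that rejects empty ranges.
    // An explicit exception keeps the behaviour independent of the -ea flag.
    private char[] get(int start, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("empty read at pos " + start);
        }
        return Arrays.copyOfRange(text, start, start + length);
    }

    // Emits the text between the previous backtrace position and endPos.
    // With guard == true, a backtrace from and to the same position returns early.
    public void backtrace(int endPos, boolean guard) {
        if (guard && endPos == lastBackTracePos) {
            return; // nothing new to emit
        }
        emitted.add(new String(get(lastBackTracePos, endPos - lastBackTracePos)));
        lastBackTracePos = endPos;
    }

    public List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        BacktraceGuardSketch guarded = new BacktraceGuardSketch("abcdef");
        guarded.backtrace(6, true); // forced backtrace reaches the last position
        guarded.backtrace(6, true); // final backtrace: same position, skipped
        System.out.println(guarded.emitted());

        BacktraceGuardSketch unguarded = new BacktraceGuardSketch("abcdef");
        unguarded.backtrace(6, false);
        try {
            unguarded.backtrace(6, false); // final backtrace requests an empty read
        } catch (IllegalArgumentException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
    }
}
```

      In this model the guarded run emits the single fragment once and the second call does nothing, while the unguarded run fails on the second call because it requests a zero-length read.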

      We will open a PR for this issue.

      Attachments

        1. LUCENE-10059-nori-9x.patch
          0.9 kB
          Tomoko Uchida

          People

            Assignee: Unassigned
            Reporter: Anh Dung Bui (dungba)
              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 40m