  1. Lucene - Core
  2. LUCENE-8175

ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: trunk, 7.4
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      I was digging into some testRandomHugeStrings failures that have occurred since the upgrade to ICU4J 60.2. They boil down to this ICU bug: http://bugs.icu-project.org/trac/ticket/13512, which is fixed but not released yet.

      In short, an int[] is shared across several threads when it shouldn't be. As a consequence, ICUTokenizer may sometimes return corrupt tokens. I commented on the ICU ticket to ask when a release containing the fix is expected.
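
      The failure mode described above can be illustrated with a minimal sketch. This is not the ICU4J code: the class SharedBufferSketch, its methods, and the 64-element buffer are hypothetical names chosen only to show how a scratch int[] shared between threads can yield wrong results, and how a per-call buffer avoids that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from ICU4J or Lucene.
public class SharedBufferSketch {
    private static final int SIZE = 64;

    // Stand-in for the problematic state: one scratch array used by every caller.
    private static final int[] SHARED = new int[SIZE];

    // Unsafe under concurrency: another thread may overwrite SHARED
    // between this call's write loop and its read loop.
    static int sumUnsafe(int seed) {
        for (int i = 0; i < SIZE; i++) SHARED[i] = seed;
        int sum = 0;
        for (int i = 0; i < SIZE; i++) sum += SHARED[i];
        return sum;
    }

    // Safe: each call owns its scratch array, so no other thread can touch it.
    static int sumSafe(int seed) {
        int[] scratch = new int[SIZE];
        for (int i = 0; i < SIZE; i++) scratch[i] = seed;
        int sum = 0;
        for (int i = 0; i < SIZE; i++) sum += scratch[i];
        return sum;
    }

    // Runs the chosen variant from several threads and counts wrong results.
    static int countMismatches(boolean useShared, int threads, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int seed = t + 1;
                Callable<Integer> task = () -> {
                    int bad = 0;
                    for (int i = 0; i < iterations; i++) {
                        int got = useShared ? sumUnsafe(seed) : sumSafe(seed);
                        if (got != seed * SIZE) bad++;
                    }
                    return bad;
                };
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : futures) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The shared variant may or may not report mismatches on a given run;
        // data races are timing dependent. The per-call variant always reports 0.
        System.out.println("shared buffer mismatches:   " + countMismatches(true, 4, 100_000));
        System.out.println("per-call buffer mismatches: " + countMismatches(false, 4, 100_000));
    }
}
```

      Because the race is timing dependent, the shared variant can pass many runs before failing, which is consistent with the bug surfacing only in a randomized test such as testRandomHugeStrings.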

      Attachments

        1. LUCENE-8175.patch
          7 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Adrien Grand (jpountz)
            Votes: 1
            Watchers: 4