  1. Lucene - Core
  2. LUCENE-8175

ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      I was digging into some testRandomHugeStrings failures that have occurred since the upgrade to ICU4J 60.2. They boil down to this bug: http://bugs.icu-project.org/trac/ticket/13512, which is fixed but not yet released.

      In short, an int[] is shared across several threads when it shouldn't be. As a consequence, ICUTokenizer may sometimes return corrupt tokens. I commented on the ICU ticket to ask when a release containing the fix is expected.
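
To illustrate the kind of failure described above, here is a minimal sketch. It is not the ICU4J code: BoundaryScanner, its methods, and the sample strings are hypothetical names made up for this example. Two scanners record token offsets in an int[] scratch buffer, and their steps are interleaved by hand on a single thread so that the outcome is deterministic. With real threads the same overwrite would only happen under an unlucky schedule, which is why the test failures are intermittent.

```java
import java.util.Arrays;

public class SharedBufferSketch {

    /** Hypothetical scanner that records token offsets in a scratch buffer. */
    static final class BoundaryScanner {
        private final String text;
        private final int[] scratch; // may or may not be shared with other scanners

        BoundaryScanner(String text, int[] scratch) {
            this.text = text;
            this.scratch = scratch;
        }

        /** Step 1: write the start and end offsets of the first space-delimited word. */
        void findFirstWord() {
            int end = text.indexOf(' ');
            scratch[0] = 0;
            scratch[1] = end < 0 ? text.length() : end;
        }

        /** Step 2: read the offsets back and cut the token out of the text. */
        String readToken() {
            int start = scratch[0];
            int end = Math.min(scratch[1], text.length());
            return text.substring(start, end);
        }
    }

    /** Interleaves two scanners the way an unlucky thread schedule could. */
    static String[] interleave(BoundaryScanner a, BoundaryScanner b) {
        a.findFirstWord(); // "thread A" writes its offsets
        b.findFirstWord(); // "thread B" overwrites them if the buffer is shared
        return new String[] { a.readToken(), b.readToken() };
    }

    /** Both scanners use the same int[]: the first token comes back truncated. */
    static String[] withSharedBuffer() {
        int[] shared = new int[2];
        return interleave(new BoundaryScanner("lucene core", shared),
                          new BoundaryScanner("icu tokenizer", shared));
    }

    /** Each scanner owns its int[]: both tokens are correct. */
    static String[] withPrivateBuffers() {
        return interleave(new BoundaryScanner("lucene core", new int[2]),
                          new BoundaryScanner("icu tokenizer", new int[2]));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withSharedBuffer()));   // prints [luc, icu]
        System.out.println(Arrays.toString(withPrivateBuffers())); // prints [lucene, icu]
    }
}
```

In the shared case the first scanner returns "luc" instead of "lucene" because the second scanner's offsets replaced its own before it read them back. Giving each scanner (or each thread) its own buffer removes the interference.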


            People

            • Assignee:
              Unassigned
            • Reporter:
              Adrien Grand (jpountz)
            • Votes:
              0
            • Watchers:
              2
