PDFBox / PDFBOX-1622

TextNormalize init not thread-safe, may lead to infinite loop

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.8.3, 2.0.0
    • Component/s: Utilities
    • Labels:
      None

      Description

      TextNormalize fills a static HashMap (DIACHASH) from a method (populateDiacHash) called by the TextNormalize constructor.

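      If the constructor is called from two threads at the same time, the HashMap may be written by both threads concurrently, which can and does cause infinite loops: java.util.HashMap is not thread-safe, and an unsynchronized concurrent put (especially one that triggers a resize) can corrupt its internal bucket chains into a cycle that later put or get calls traverse forever.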
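To make the race concrete, here is a minimal sketch of the pattern described above. The class `RacyNormalizer`, its field `DIAC_MAP`, and the three sample mappings are hypothetical stand-ins, not the actual PDFBox source. The point is the shape: a static, non-thread-safe `HashMap` shared by all instances and written from every constructor call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class illustrating the racy pattern described in this issue;
// it is NOT the actual PDFBox TextNormalize source.
class RacyNormalizer {
    // Shared by all instances, like TextNormalize.DIACHASH.
    private static final Map<Integer, String> DIAC_MAP = new HashMap<>();

    RacyNormalizer() {
        // Every constructor call writes to the shared map without any
        // synchronization. Two threads doing this at once can corrupt
        // the HashMap, for example while it is resizing.
        populateDiacMap();
    }

    private static void populateDiacMap() {
        // Illustrative entries only: spacing accent -> combining accent.
        DIAC_MAP.put(0x0060, "\u0300"); // grave accent
        DIAC_MAP.put(0x00B4, "\u0301"); // acute accent
        DIAC_MAP.put(0x02C6, "\u0302"); // modifier circumflex
    }

    static int mappingCount() {
        return DIAC_MAP.size();
    }
}
```

Called from a single thread this works, because repeated puts of the same keys just overwrite; the failure only appears when two constructors run concurrently.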

      We see the CPU at 100%, and jstack shows four threads, all stuck at:

      "Thread-2" prio=10 tid=0x00007f6e94499000 nid=0x347 runnable [0x00007f6e925d6000]
      java.lang.Thread.State: RUNNABLE
      at java.util.HashMap.put(HashMap.java:391)
      at org.apache.pdfbox.util.TextNormalize.populateDiacHash(TextNormalize.java:82)
      at org.apache.pdfbox.util.TextNormalize.<init>(TextNormalize.java:41)
      at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:193)

      A patch to fix this is attached; it simply moves the initialization into a static initializer block, so the map is populated exactly once, when the class is initialized.
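The approach of the patch can be sketched as follows. This is not the attached patch itself; `SafeNormalizer`, `DIAC_MAP`, `combiningFormOf`, and the sample entries are hypothetical names chosen for illustration. The map is built in a static initializer, which the JVM runs once under its class-initialization lock (JLS 12.4.2), and the constructor no longer writes to shared state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class sketching the fix described in this issue: the shared
// map is filled once in a static initializer instead of in the constructor.
class SafeNormalizer {
    private static final Map<Integer, String> DIAC_MAP;

    static {
        // Runs exactly once, when the JVM initializes the class. The JVM
        // holds an initialization lock while doing so, so no other thread
        // can use the class before the map is fully populated.
        Map<Integer, String> m = new HashMap<>();
        m.put(0x0060, "\u0300"); // grave accent (illustrative entry)
        m.put(0x00B4, "\u0301"); // acute accent (illustrative entry)
        m.put(0x02C6, "\u0302"); // modifier circumflex (illustrative entry)
        // A read-only view guarantees no later code path writes to it.
        DIAC_MAP = Collections.unmodifiableMap(m);
    }

    SafeNormalizer() {
        // The constructor no longer touches the shared map.
    }

    // Returns the combining form for a spacing accent, or null if unmapped.
    String combiningFormOf(int codePoint) {
        return DIAC_MAP.get(codePoint);
    }
}
```

Any number of threads may now construct `SafeNormalizer` concurrently, since they only read a map that was fully built and safely published during class initialization. A `ConcurrentHashMap` would also avoid the hang, but because the table never changes after startup, a one-time static initializer is the simpler choice.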

      Please apply to the 1.8.3 and 2.0.0 branches.

              People

              • Assignee: Andreas Lehmkühler (lehmi)
              • Reporter: Florent Guillaume (fguillaume)
              • Votes: 0
              • Watchers: 2
