Tika / TIKA-2812

NPE when parsing text with write limit set on IBM JDK

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.20
    • Fix Version/s: None
    • Component/s: core
    • Labels:
    • Environment: IBM JDK 8

      Description

      We updated Tika from version 1.14 to the recently released 1.20 and now hit a NullPointerException on IBM JDK 8 when parsing text with a write limit set (we use WriteOutContentHandler).

      Test class TikaTest.java and test file test.txt are attached.

      The issue occurs on IBM JDK 8 (output-ibm-jdk-tika-1.20.txt), but not on Oracle JDK 8 (output-oracle-jdk-tika-1.20.txt) or OpenJDK 8 (output-open-jdk-tika-1.20.txt).

      With Tika 1.14 we did not have this issue (output-ibm-jdk-tika-1.14.txt).

      Analysis:
      The fix for TIKA-2668 (https://github.com/apache/tika/commit/89a588e4d8d2aa44a9d3c965d514c18c7d3c134d#diff-5a28529cf32968d35a5036172cd8f74fL41) removed a line from the constructor of the TaggedSAXException class:

      initCause(original); // SAXException has it's own chaining mechanism!
      

      Restoring the line solves our issue on JDK 8, but breaks things on JDK 11 (output-oracle-jdk-11-tika-1.20.txt).
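
      For context, a plausible reason an unconditional initCause call can fail is the documented contract of Throwable.initCause: it throws IllegalStateException when the cause was already supplied through the constructor. The sketch below shows that contract with plain java.lang.Exception only. It assumes, without confirmation from the attached logs, that this is the mechanism behind the JDK 11 failure; the class and method names are hypothetical.

```java
// Demonstrates the Throwable.initCause contract using only java.lang classes.
// InitCauseContractDemo and its methods are illustrative names, not Tika code.
class InitCauseContractDemo {

    /** Returns true if initCause rejects a cause when one was set at construction. */
    static boolean rejectsSecondCause() {
        Exception original = new Exception("original");
        // The cause is passed to the constructor, so it is already initialized.
        Exception wrapper = new Exception("wrapper", original);
        try {
            wrapper.initCause(original);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Returns true if initCause succeeds when no cause was given at construction. */
    static boolean acceptsFirstCause() {
        Exception original = new Exception("original");
        Exception wrapper = new Exception("wrapper");
        wrapper.initCause(original);
        return wrapper.getCause() == original;
    }

    public static void main(String[] args) {
        System.out.println("second cause rejected: " + rejectsSecondCause());
        System.out.println("first cause accepted: " + acceptsFirstCause());
    }
}
```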

      Is there any chance that TaggedSAXException can be made compatible with both JDK 8 and JDK 11 (Oracle/OpenJDK as well as the IBM one)?
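
      One possible direction, offered as a sketch rather than a tested patch, is to chain the cause manually only when the runtime's SAXException has not already exposed it. The class below is a hypothetical stand-in for TaggedSAXException (not the actual Tika source). It assumes that on the affected IBM JDK 8 getCause() returns null after construction, which the attached outputs do not confirm.

```java
import java.io.Serializable;
import org.xml.sax.SAXException;

// Hypothetical sketch; PortableCauseSketch and its members are illustrative names.
class PortableCauseSketch {

    /** Stand-in for a tagged SAX exception that sets its cause defensively. */
    static class PortableTaggedSAXException extends SAXException {
        private static final long serialVersionUID = 1L;
        private final Serializable tag;

        PortableTaggedSAXException(SAXException original, Serializable tag) {
            super(original.getMessage(), original);
            this.tag = tag;
            // Chain manually only if this runtime did not expose the cause itself.
            if (getCause() == null) {
                try {
                    initCause(original);
                } catch (IllegalStateException alreadySet) {
                    // The cause was fixed at construction time; nothing more to do.
                }
            }
        }

        Serializable getTag() {
            return tag;
        }
    }

    /** Wraps the given exception and returns the cause reported by the wrapper. */
    static Throwable causeOfWrapped(SAXException original) {
        return new PortableTaggedSAXException(original, "tag").getCause();
    }

    public static void main(String[] args) {
        SAXException original = new SAXException("write limit reached");
        System.out.println(causeOfWrapped(original) == original);
    }
}
```

      The guard avoids calling initCause where the cause is already visible, and the catch block covers runtimes where the cause is set internally but not yet reported. Whether this resolves the NPE on IBM JDK 8 would need to be checked against the attached TikaTest.java.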

      Thank you in advance!

      Kind regards
      Sergiy Shyrkov

        Attachments

        1. output-ibm-jdk-tika-1.20.txt
          0.9 kB
          Sergiy Shyrkov
        2. output-ibm-jdk-tika-1.14.txt
          0.3 kB
          Sergiy Shyrkov
        3. test.txt
          0.0 kB
          Sergiy Shyrkov
        4. output-oracle-jdk-tika-1.20.txt
          0.3 kB
          Sergiy Shyrkov
        5. output-open-jdk-tika-1.20.txt
          0.3 kB
          Sergiy Shyrkov
        6. TikaTest.java
          2 kB
          Sergiy Shyrkov
        7. output-oracle-jdk-11-tika-1.20.txt
          4 kB
          Sergiy Shyrkov

            People

            • Assignee: Tim Allison (tallison@apache.org)
            • Reporter: Sergiy Shyrkov (shyrkov)