  Tika / TIKA-1956

NPE in WordParser when trying to getPicOffset

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11
    • Fix Version/s: 2.0, 1.13
    • Component/s: parser
    • Labels:
      None
    • Environment:

      Ubuntu 14.04, tika-server-1.11.jar

      Description

      Tika-server returns a 422 error:
      /rmeta returns a 422 error.
      /tika returns text, but it is only partial.
      The text is parsed up to the beginning of an image.
      This is the last text that is parsed:
      <h4>17.5.7.1
      BM-SC Initiated Multicast Service Deactivation
      </h4>
      <p>
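
      The behaviour described above could be checked with a small client along the following lines. This is only a sketch under assumptions that the report does not state: the server is taken to listen on http://localhost:9998, the document is sent with PUT, and the class name TikaServerProbe and its helper buildRequest are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for reproducing the report; not part of Tika.
final class TikaServerProbe {

    // Builds a PUT request that carries the document bytes to one endpoint.
    static HttpRequest buildRequest(String baseUrl, String endpoint, byte[] document) {
        return HttpRequest.newBuilder(URI.create(baseUrl + endpoint))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: TikaServerProbe <path-to-doc>");
            return;
        }
        byte[] document = Files.readAllBytes(Path.of(args[0]));
        HttpClient client = HttpClient.newHttpClient();
        // Assumed base URL; adjust to wherever the server is actually running.
        String baseUrl = "http://localhost:9998";
        for (String endpoint : new String[] {"/rmeta", "/tika"}) {
            HttpResponse<String> response = client.send(
                    buildRequest(baseUrl, endpoint, document),
                    HttpResponse.BodyHandlers.ofString());
            // Print the status code and body size so a 422 or a short body stands out.
            System.out.println(endpoint + " -> HTTP " + response.statusCode()
                    + ", " + response.body().length() + " chars");
        }
    }
}
```

      Comparing the status code and body length of the two endpoints is enough to see whether /rmeta fails outright while /tika returns truncated text.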

        Activity

        ramitwadhwa Ramit Wadhwa added a comment -

        The sample file can be downloaded and unzipped from "http://www.3gpp.org/ftp/Specs/2015-03/Rel-10/29_series/29061-ac0.zip"

        tallison@mitre.org Tim Allison added a comment -

        Thank you for raising this issue and supplying a triggering document!

        We were trusting that POI would return a non-null value from field.getMarkEndCharacterRun(r), on which we then called .getPicOffset() to label the attachment.

        I added a null check, and the full document is parsed. I'll commit this shortly.

        Thank you, again.
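
        The kind of null check described in this comment might look roughly like the sketch below. It is not the committed change: FieldStub and CharacterRunStub are hypothetical stand-ins for the POI types so that the example compiles on its own, the stub drops the range argument that the real getMarkEndCharacterRun(r) call takes, and the "_" + offset label format is an assumption.

```java
import java.util.Optional;

// Hypothetical stand-in for POI's character run; only the method used here.
final class CharacterRunStub {
    private final int picOffset;

    CharacterRunStub(int picOffset) {
        this.picOffset = picOffset;
    }

    int getPicOffset() {
        return picOffset;
    }
}

// Hypothetical stand-in for POI's field; the mark-end run may be null.
final class FieldStub {
    private final CharacterRunStub markEnd;

    FieldStub(CharacterRunStub markEnd) {
        this.markEnd = markEnd;
    }

    CharacterRunStub getMarkEndCharacterRun() {
        return markEnd;
    }
}

final class PicOffsetGuard {

    // Returns a label for the attachment, or empty if there is no mark-end run.
    static Optional<String> attachmentLabel(FieldStub field) {
        CharacterRunStub markEnd = field.getMarkEndCharacterRun();
        if (markEnd == null) {
            // Without this guard, the getPicOffset() call below would throw an NPE.
            return Optional.empty();
        }
        // Assumed label format, for illustration only.
        return Optional.of("_" + markEnd.getPicOffset());
    }

    public static void main(String[] args) {
        System.out.println(attachmentLabel(new FieldStub(new CharacterRunStub(1234))));
        System.out.println(attachmentLabel(new FieldStub(null)));
    }
}
```

        Returning an empty Optional lets the caller skip labelling the attachment and carry on with the rest of the document, which matches the observation that the full document is parsed once the check is in place.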

        tallison@mitre.org Tim Allison added a comment -

        Fixed. Thank you again, Ramit Wadhwa, for opening this and sharing a triggering document.

        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-trunk-jdk1.7 #959 (See https://builds.apache.org/job/tika-trunk-jdk1.7/959/)
        TIKA-1956 – prevent NPE when trying to get embedded image offset in (tallison: rev dab10395ab59a36be52529d3afa7ed370ce60eef)

        • tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
        • CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in tika-2.x #85 (See https://builds.apache.org/job/tika-2.x/85/)
        TIKA-1956 – prevent NPE when trying to get embedded image offset in (tallison: rev 20834d0b09c85ab680e868317fdec017dfb96061)

        • tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
        • CHANGES.txt

          People

          • Assignee:
            tallison@mitre.org Tim Allison
            Reporter:
            ramitwadhwa Ramit Wadhwa