TIKA-2258: Unable to parse .pub files - java.lang.ArrayIndexOutOfBoundsException: 88

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.13
    • Fix Version/s: None
    • Component/s: core, parser
    • Labels: None
    • Environment: Windows 7

    Description

      When I try to parse the attached .pub file, parsing fails with the following exception:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 88
      at org.apache.poi.util.LittleEndian.getUShort(LittleEndian.java:343)
      at org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:215)
      at org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:176)
      at org.apache.poi.hpbf.model.qcbits.QCPLCBit.createQCPLCBit(QCPLCBit.java:90)
      at org.apache.poi.hpbf.model.QuillContents.<init>(QuillContents.java:71)
      at org.apache.poi.hpbf.HPBFDocument.<init>(HPBFDocument.java:67)
      at org.apache.poi.hpbf.extractor.PublisherTextExtractor.<init>(PublisherTextExtractor.java:45)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:141)
      at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      ... 28 more
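
      The trace ends in a two-byte little-endian read that runs past the end of a byte array. The sketch below is only an illustration of that failure pattern and of a defensive bounds check. The class and method names are hypothetical and are not POI or Tika code, and the 88-byte buffer length is an assumption taken from the index in the exception message.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration; not part of Apache POI or Tika.
final class UShortBoundsSketch {

    // Unchecked read: throws ArrayIndexOutOfBoundsException when
    // offset or offset + 1 lies outside the array.
    static int readUShortLE(byte[] data, int offset) {
        int b0 = data[offset] & 0xFF;      // low byte
        int b1 = data[offset + 1] & 0xFF;  // high byte
        return (b1 << 8) | b0;
    }

    // Checked read: returns empty instead of throwing when fewer than
    // two bytes are available at the given offset.
    static OptionalInt tryReadUShortLE(byte[] data, int offset) {
        if (offset < 0 || offset > data.length - 2) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(readUShortLE(data, offset));
    }

    public static void main(String[] args) {
        byte[] record = new byte[88]; // assumed length; valid indices are 0..87
        System.out.println(tryReadUShortLE(record, 86).isPresent());
        System.out.println(tryReadUShortLE(record, 88).isPresent());
        try {
            readUShortLE(record, 88);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message text differs between JDK versions.
            System.out.println("unchecked read failed: " + e.getMessage());
        }
    }
}
```

      A parser that validated offsets this way could skip or report a malformed record instead of aborting the whole extraction. Whether that is the right fix for this file is not established here.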

      Attachments

        1. Roc.pub
          117 kB
          Sharath Kumar

          People

            Assignee: Unassigned
            Reporter: Sharath Kumar (mnsk07)
            Votes: 0
            Watchers: 2
