  1. PDFBox
  2. PDFBOX-4521

Missing Info value from file trailer: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.15
    • Fix Version/s: 2.0.16, 3.0.0 PDFBox
    • Component/s: Parsing
    • Labels:
      None

      Description

      The following exception

      Cause: java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
          at org.apache.pdfbox.pdmodel.PDDocument.getDocumentInformation(PDDocument.java:740)
          at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:242)
          at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
          at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)

      is thrown for PDF documents whose file trailer has no value for the Info key, e.g.:

      << /Size 50/Root 8 0 R/Info /ID >>
      

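      According to the PDF specification, the Info entry in the trailer is optional. PDFBox correctly handles a trailer that has no Info key at all, but here the key is present without a value, so the name that follows it (/ID) ends up being treated as the value of Info, which would explain the COSName in the exception.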
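
      As an illustration of the failure and of one possible defensive check, here is a minimal, self-contained sketch. The classes PdfObject, PdfName and PdfDictionary are hypothetical stand-ins written for this example; they are not the PDFBox COS classes, and this is not necessarily how the fix in 2.0.16 was implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for PDF object types; these are NOT the PDFBox classes.
final class TrailerInfoSketch {

    interface PdfObject {}

    static final class PdfName implements PdfObject {
        final String name;
        PdfName(String name) { this.name = name; }
    }

    static final class PdfDictionary implements PdfObject {
        final Map<String, PdfObject> items = new LinkedHashMap<>();
    }

    /** Unguarded lookup: throws ClassCastException when Info holds a name. */
    static PdfDictionary infoUnchecked(PdfDictionary trailer) {
        return (PdfDictionary) trailer.items.get("Info");
    }

    /** Guarded lookup: treats a missing or non-dictionary Info as absent. */
    static PdfDictionary infoOrEmpty(PdfDictionary trailer) {
        PdfObject value = trailer.items.get("Info");
        if (value instanceof PdfDictionary) {
            return (PdfDictionary) value;
        }
        // Assumption for this sketch: fall back to an empty dictionary.
        return new PdfDictionary();
    }

    public static void main(String[] args) {
        // Models the trailer << /Size 50/Root 8 0 R/Info /ID >>,
        // where the name /ID is read as the value of Info.
        PdfDictionary trailer = new PdfDictionary();
        trailer.items.put("Info", new PdfName("ID"));

        try {
            infoUnchecked(trailer);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed");
        }
        System.out.println("guarded lookup is empty: " + infoOrEmpty(trailer).items.isEmpty());
    }
}
```

      The guarded variant checks the runtime type before casting, so a malformed Info value is handled the same way as a missing one instead of aborting metadata extraction.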

        Attachments

        1. Editathon_cheat_sheet_(EN)_MetaDefender.pdf
          162 kB
          Oliver Mannion
        2. Editathon_cheat_sheet_(EN).pdf
          164 kB
          Oliver Mannion

            People

            • Assignee: Tilman Hausherr (tilman)
            • Reporter: Oliver Mannion (oliman)
            • Votes: 0
            • Watchers: 3

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified