PDFBox / PDFBOX-1606

NonSequentialPDFParser produces garbage text in document info

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.8.1
    • Fix Version/s: 1.8.3, 2.0.0
    • Component/s: Parsing
    • Labels: None
    • Environment: Windows 7, JRE 1.7.0_15-b03

    Description

      For some documents, NonSequentialPDFParser produces a PDDocumentInformation whose fields (title, author, producer, etc.) contain binary garbage. Calling the PDDocumentInformation.getXXXDate() methods on those documents fails with "IOException:Error converting date".

      The classic PDFParser handles the same documents without problems.

      Attachments

        1. PDFBOX-1606.patch (2 kB, Sebastian Nagel)
        2. 00-214 EU Data Protection Directive Update 12-1.pdf (30 kB, Alex Alishevskikh)

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Alex Alishevskikh (alexeya)
              Votes: 1
              Watchers: 4
