[PDFBOX-1557] NonSequentialPDFParser incorrectly parsing document info

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.0
    • Fix Version/s: 1.8.1
    • Component/s: Parsing
    • Labels: None
    • Environment: Mac OS X 10.6.8, Eclipse Version: Juno Service Release 2 (Build id: 20130225-0426), Java SE 6 (1.6.0)

    Description

      When using the NonSequentialPDFParser, every entry in the PDDocumentInformation returned by getDocumentInformation() appears to be null. This does not happen with the standard PDFParser. I have a large batch of PDF files with assorted oddities that occasionally make the standard parser fail, so I was experimenting with the non-sequential parser and ran into this issue.

      I'll attach some test code and a test PDF file with which I can reproduce the issue.
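
      For anyone triaging a similar report, here is a minimal sketch of how the symptom could be checked once the info fields from each parser have been collected. It is not the attached TestParsers.java. The class DocumentInfoDiff and its methods are hypothetical names introduced for illustration, and the sketch assumes the caller has already copied the values of PDDocumentInformation getters such as getTitle() and getAuthor() into plain maps, one map per parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of PDFBox): compares document-info fields
 * that the caller gathered from two different parsers into plain maps.
 */
final class DocumentInfoDiff {

    private DocumentInfoDiff() {
    }

    /** Returns true when the map has at least one field and every value is null. */
    static boolean allNull(Map<String, String> info) {
        if (info.isEmpty()) {
            return false;
        }
        for (String value : info.values()) {
            if (value != null) {
                return false;
            }
        }
        return true;
    }

    /**
     * Lists, in the reference map's iteration order, the fields that have a
     * value in the reference map but are null or absent in the candidate map.
     */
    static List<String> missingFields(Map<String, String> reference,
                                      Map<String, String> candidate) {
        List<String> missing = new ArrayList<String>();
        for (Map.Entry<String, String> entry : reference.entrySet()) {
            if (entry.getValue() != null && candidate.get(entry.getKey()) == null) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

      With the standard parser's fields as the reference and the non-sequential parser's fields as the candidate, allNull(candidate) returning true while missingFields(reference, candidate) is non-empty would match the behaviour described above.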

      Attachments

        1. TestParsers.java
          1.0 kB
          Robert Bartlett-Schneider
        2. JIRA-1557.patch
          1 kB
          Eric Leleu
        3. aa.pdf
          404 kB
          Robert Bartlett-Schneider

            People

              Assignee: lehmi Andreas Lehmkühler
              Reporter: tullisar Robert Bartlett-Schneider