PDFBox / PDFBOX-465

invalid date formats


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0-incubator
    • Fix Version/s: 1.8.3, 2.0.0
    • Component/s: Parsing
    • Labels: None

    Description

      This is with the latest from svn, revision 773978.

      From a sample of 13304 PDF documents generated in a very wide variety of ways, I got 94 invalid date formats.

      All of these appear to produce the following stack trace:

      Caused by: java.io.IOException: Error converting date:Friday, July 11, 2008
      at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:240)
      at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:120)
      at org.apache.pdfbox.cos.COSDictionary.getDate(COSDictionary.java:783)
      at org.apache.pdfbox.pdmodel.PDDocumentInformation.getCreationDate(PDDocumentInformation.java:218)
      at message_analyzer.extractor.PDFExtractor.getContent(PDFExtractor.java:50)

      Some examples of invalid dates are:

      20070430193647+713'00'
      Tue Aug 21 10:35:22 2007
      Tuesday, November 04, 2008
      200712172:2:3
      Unknown
      20090319 200122
      9:47 5/12/2008

      I don't think there is any hope of parsing all of these date formats. It would be nice if this were not a fatal error, so that the parser could continue without a creation date.

      Is the policy of PDFBox to be as forgiving as possible when reading PDF documents? Maybe toCalendar should return a new Calendar instance if parsing fails, rather than throwing.
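
To illustrate the non-fatal behaviour requested above, here is a minimal sketch of a forgiving parser that tries a few patterns and gives up quietly. It is not PDFBox's DateConverter: the class name LenientDateParser, the method parseOrNull, and the pattern list are hypothetical, chosen only to cover some of the sample strings in this report. It assumes English month and day names, UTC, and month-before-day order for slash-separated dates.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of PDFBox.
class LenientDateParser {

    // Assumed patterns matching some of the malformed samples in this report.
    private static final String[] PATTERNS = {
        "yyyyMMddHHmmss",            // 20070430193647
        "yyyyMMdd HHmmss",           // 20090319 200122
        "EEEE, MMMM dd, yyyy",       // Tuesday, November 04, 2008
        "EEE MMM dd HH:mm:ss yyyy",  // Tue Aug 21 10:35:22 2007
        "H:mm M/d/yyyy",             // 9:47 5/12/2008 (assumes month/day order)
    };

    // Returns the parsed date, or null when no pattern matches the whole string.
    // It never throws for malformed input.
    static Calendar parseOrNull(String text) {
        if (text == null) {
            return null;
        }
        String trimmed = text.trim();
        TimeZone utc = TimeZone.getTimeZone("UTC");
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject out-of-range fields such as month 13
            format.setTimeZone(utc);
            ParsePosition position = new ParsePosition(0);
            Date date = format.parse(trimmed, position);
            // Accept only if the pattern consumed the entire string.
            if (date != null && position.getIndex() == trimmed.length()) {
                Calendar calendar = new GregorianCalendar(utc, Locale.ENGLISH);
                calendar.setTime(date);
                return calendar;
            }
        }
        return null;
    }
}
```

A caller would treat a null result as "no creation date" and carry on with text extraction. The sketch returns null rather than a default Calendar because java.util.Calendar is abstract, and Calendar.getInstance() would silently report the current time as the creation date.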

      Attachments

        1. SimpleDateParsingTest.java
          5 kB
          Peter_Lenahan@ibi.com


          People

            Assignee: lehmi Andreas Lehmkühler
            Reporter: sgbridges Sean Bridges