Description
PDFBox's date parser can throw a StringIndexOutOfBoundsException when the date string it is asked to parse is empty or contains only spaces. A few of my test PDFs have this "feature."
Until PDFBOX-1803 is resolved, we can add an extra catch to prevent this from causing problems in Tika:
@@ -171,6 +171,9 @@
addMetadata(metadata, TikaCoreProperties.CREATED, info.getCreationDate());
} catch (IOException e) {
// Invalid date format, just ignore
+ } catch (StringIndexOutOfBoundsException e){
+ // remove after PDFBOX-1803 is fixed
+ // Invalid date format, just ignore
}
try {
Calendar modified = info.getModificationDate();
@@ -178,6 +181,9 @@
addMetadata(metadata, TikaCoreProperties.MODIFIED, modified);
} catch (IOException e) {
// Invalid date format, just ignore
+ } catch (StringIndexOutOfBoundsException e){
+ // remove after PDFBOX-1803 is fixed
+ // Invalid date format, just ignore
}
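For anyone who wants to see the pattern outside of the Tika source, here is a minimal, self-contained sketch of the same workaround. Everything in it is hypothetical: `SafeDateDemo`, `DateSource`, `fragileParse` and `safeDate` are illustration-only names, and `fragileParse` is just a stand-in that indexes into the string without checking its length. It is not the PDFBox parser, only an assumption about the kind of code that fails this way.

```java
import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class SafeDateDemo {

    /** Hypothetical stand-in for a getter such as info.getCreationDate(). */
    public interface DateSource {
        Calendar get() throws IOException;
    }

    /**
     * Hypothetical fragile parser (NOT the PDFBox implementation).
     * It expects input of the form "D:YYYY" and never checks the length
     * before calling substring, so empty or blank input throws
     * StringIndexOutOfBoundsException instead of IOException.
     */
    public static Calendar fragileParse(String text) throws IOException {
        String trimmed = text.trim();
        // Throws StringIndexOutOfBoundsException if fewer than 2 characters remain.
        if (!trimmed.substring(0, 2).equals("D:")) {
            throw new IOException("Unrecognized date: " + text);
        }
        int year = Integer.parseInt(trimmed.substring(2, 6));
        return new GregorianCalendar(year, Calendar.JANUARY, 1);
    }

    /**
     * Mirrors the patch: treat both the checked IOException and the
     * unchecked StringIndexOutOfBoundsException as "invalid date, ignore".
     */
    public static Calendar safeDate(DateSource source) {
        try {
            return source.get();
        } catch (IOException e) {
            return null; // invalid date format, just ignore
        } catch (StringIndexOutOfBoundsException e) {
            return null; // workaround for the unchecked parser failure
        }
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"", "   ", "not a date", "D:2013"}) {
            Calendar parsed = safeDate(() -> fragileParse(raw));
            String shown = (parsed == null)
                    ? "ignored"
                    : String.valueOf(parsed.get(Calendar.YEAR));
            System.out.println("[" + raw + "] -> " + shown);
        }
    }
}
```

The two catch blocks are kept separate, as in the patch, so the workaround can be deleted on its own once the upstream fix lands.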
Issue Links
- depends upon
  - PDFBOX-2823 StringIndexOutOfBoundsException when doing DateConverter.parseDate() (Closed)
  - PDFBOX-1803 StringIndexOutOfBound on DateConverter.toCalendar (Closed)