Description
PDFs created by ScanSoft's PDF driver contain a lone '-' where a number is required, as in 1 0 0 1 - 783 Tm. See http://markmail.org/message/r63jfd5wybejzbkr for details.
Proposal: interpret a lone '-' as 0. Patch:
--- /tmp/x/pdfbox-0.8.0-incubating/src/main/java/org/apache/pdfbox/pdfparser/PDFStreamParser.java 2009-09-14 19:39:44.000000000 -0400
+++ src/main/java/org/apache/pdfbox/pdfparser/PDFStreamParser.java 2010-01-07 00:14:45.000000000 -0500
@@ -252,7 +252,12 @@
dotNotRead = false;
}
}
- retval = COSNumber.get( buf.toString() );
+ String number = buf.toString();
+ /* accommodate PDF files (such as ScanSoft-created ones) that output '-'
+ * where a number is expected. Substitute a 0. */
+ if ("-".equals(number))
+ number = "0";
+ retval = COSNumber.get( number );
break;
}
case 'B':
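
For illustration only, the substitution the patch performs can be sketched outside PDFBox. The class LoneMinusDemo and the helper normalizeNumberToken are hypothetical names, not part of PDFBox, and BigDecimal stands in for COSNumber.get so the sketch runs without the library. It assumes the token has already been collected into a string, as buf.toString() does in the patch.

```java
import java.math.BigDecimal;

// Hypothetical stand-alone sketch; not part of PDFBox.
class LoneMinusDemo {

    // Returns "0" when the token is a lone '-', otherwise the token unchanged.
    static String normalizeNumberToken(String token) {
        return "-".equals(token) ? "0" : token;
    }

    public static void main(String[] args) {
        // Operands of the malformed operator from the report: 1 0 0 1 - 783 Tm
        String[] operands = {"1", "0", "0", "1", "-", "783"};
        for (String operand : operands) {
            // BigDecimal is used here in place of COSNumber.get (an assumption
            // made only to keep the sketch self-contained).
            BigDecimal value = new BigDecimal(normalizeNumberToken(operand));
            System.out.println(operand + " -> " + value);
        }
    }
}
```

Only a token that is exactly '-' is rewritten; negative numbers such as -783 pass through unchanged, which matches the equality check in the patch.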