Details
- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Fixed
Description
I'm about a quarter of the way through the run against govdocs1 with PDFBox 2.0.0 trunk. The PDFBox log file alone (threshold=ERROR) already weighs in at 2 GB. Much of it appears to be lines like this:
1488429 2015-07-08 19:44:49,460 [pool-2-thread-14] ERROR org.apache.pdfbox.pdmodel.font.FontMapper - Using last-resort fallback for TTF font 'Times-Roman'
Are these truly "error" level events? If so, should they be happening this often? I realize govdocs1 is an aging corpus...
By comparison, on a recent full run with 1.8.9, the PDFBox error log file was 6.5 MB.
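To see whether a handful of fonts account for most of the volume, one could tally the fallback messages per font and the ERROR lines per logger. Below is a minimal Python sketch. The line layout is inferred from the single sample line above (the leading number may be log4j's elapsed-milliseconds field, but that is an assumption), and `tally` is a hypothetical helper, not part of Tika or PDFBox.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed layout, inferred from the one sample line in this report:
# <number> <date> <time> [<thread>] <LEVEL> <logger> - <message>
LINE_RE = re.compile(
    r"^\S+ \S+ \S+ \[[^\]]*\] (?P<level>[A-Z]+) (?P<logger>\S+) - (?P<msg>.*)$"
)
# Matches e.g. "Using last-resort fallback for TTF font 'Times-Roman'"
FALLBACK_RE = re.compile(r"Using last-resort fallback for \w+ font '(?P<font>[^']*)'")


def tally(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count ERROR lines per logger and last-resort fallbacks per font name.

    Lines that do not match the assumed layout, or are not ERROR level,
    are skipped.
    """
    per_logger: Counter = Counter()
    per_font: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m or m.group("level") != "ERROR":
            continue
        per_logger[m.group("logger")] += 1
        fb = FALLBACK_RE.search(m.group("msg"))
        if fb:
            per_font[fb.group("font")] += 1
    return per_logger, per_font
```

Usage: open the log with `open(path, encoding="utf-8", errors="replace")`, pass the file object to `tally`, and print `fonts.most_common(10)`. Because the file is streamed line by line, a 2 GB log does not need to fit in memory.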
Issue Links
- relates to: TIKA-1285 Upgrade to PDFBox 2.0.0 when available (Closed)