Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Implemented
-
0.7.3, 0.8.0-incubator
-
None
-
None
-
Tried in Java 5 and 6
Description
Having the same problem with 0.7.3, 0.7.4-dev and 0.8.0 - in 0.7.3 I get text with nulls, e.g. "Dermoapo made 'interactive updates' a key part onullits stratenull nullr launnull chinnulla new skincare rannull in a competitive market. nulle resultnullIncreased sales nullr pharmacies that used the updates." while in 0.8.0 it appears as "Dermoapo made 'interactive updates' a key part o?its strate? ?r laun?
chin?a new skincare ran? in a competitive market. ?e result?Increased
sales ?r pharmacies that used the updates."
Maybe this is a font problem? Or encoding? I debugged the code in PDFTextStripper and and these appear in the charactersByArticle field even before normalization.
In 0.8.0 I get some info logs from the engine:
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: re
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: W
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: n
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: cs
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: scn
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: f
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: CS
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: SCN
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: M
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: m
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: l
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: S
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: BDC
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: c
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: v
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: y
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: h
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: g
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: G
SP INFO 20:52:12 PDFStreamEngine - unsupported/disabled operation: EMC
I got the same error with icu4j 3.6.1 and 4.2.1
Attachments
Attachments
Issue Links
- requires
-
PDFBOX-619 Adobe CFF/Type2 font encoding enhancements
- Closed
-
PDFBOX-542 Support for Adobe CFF/Type2 fonts
- Closed