- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.12, 2.0.13
- Fix Version/s: 2.0.14
- Component/s: Text extraction
- Labels: None
This was detected by reviewing the results of a regression test kindly run by [~tallison@apache.org] (see the end of PDFBOX-4371) as part of his work on TIKA-2779. The results contained many new words, but some of them were missing their spaces. The cause is a wrong text angle (180 instead of 0): the font matrix was not taken into account, and for Type 3 fonts it often contains a rotation or a flip.
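To illustrate the effect, here is a minimal sketch in plain Java. It is not the actual PDFBox fix. The class name `TextAngleSketch`, the helper `angleDegrees`, and all matrix values are hypothetical and chosen only for the demonstration. It assumes a Type 3 font whose font matrix flips both axes, used with a text matrix that flips them back, so the glyphs render upright even though the text matrix alone suggests 180 degrees.

```java
import java.awt.geom.AffineTransform;

public class TextAngleSketch {
    // Direction (0..359 degrees) of the x unit vector after applying the matrix.
    public static int angleDegrees(AffineTransform m) {
        double[] in = {1, 0};
        double[] out = new double[2];
        // deltaTransform ignores the translation part, only direction matters here
        m.deltaTransform(in, 0, out, 0, 1);
        long deg = Math.round(Math.toDegrees(Math.atan2(out[1], out[0])));
        return (int) ((deg + 360) % 360);
    }

    public static void main(String[] args) {
        // Hypothetical text matrix: 12pt text, both axes flipped
        AffineTransform textMatrix = new AffineTransform(-12, 0, 0, -12, 100, 700);
        // Hypothetical Type 3 font matrix: the usual 1/1000 scale, also flipped
        AffineTransform fontMatrix = new AffineTransform(-0.001, 0, 0, -0.001, 0, 0);

        // Glyph space -> text space -> user space: the font matrix is applied first
        AffineTransform combined = new AffineTransform(textMatrix);
        combined.concatenate(fontMatrix);

        System.out.println("text matrix only: " + angleDegrees(textMatrix));
        System.out.println("with font matrix: " + angleDegrees(combined));
    }
}
```

Under these assumed values, the angle derived from the text matrix alone is 180, while the angle derived from the combined matrix is 0, which matches the symptom described above.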
- relates to
  - PDFBOX-4371 Improve ExtractText utility so that it can extract rotated text automatically (Closed)
  - TIKA-2779 Integrate/parameterize new rotated text handling in PDFBox (Resolved)