Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 2.0.12, 3.0.0 PDFBox
- None
Description
In a first step, detect all rotation angles on the page by analyzing the effective text rendering matrix. In a second step, run one text extraction per detected angle: prepend an appropriate transform to the page content stream (so that the text at that angle ends up with angle == 0), then filter out any text that is still rotated. Test file: the file from PDFBOX-4368.
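The two steps can be sketched roughly as follows. This is only an illustration, not the PDFBox code: it uses `java.awt.geom.AffineTransform` as a stand-in for the text rendering matrix, works on a plain list of matrices instead of a real page content stream, and the class and method names (`RotationSketch`, `angleOf`, `detectAngles`, `uprightAfterUndoing`) are hypothetical. It also assumes angles are rounded to whole degrees.

```java
import java.awt.geom.AffineTransform;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the two-step idea; not the PDFBox implementation. */
class RotationSketch {

    /** Angle of a text matrix in whole degrees, normalized to [0, 360). */
    static int angleOf(AffineTransform m) {
        // A rotation maps the x unit vector (1, 0) to (scaleX, shearY).
        double radians = Math.atan2(m.getShearY(), m.getScaleX());
        int degrees = (int) Math.round(Math.toDegrees(radians));
        return ((degrees % 360) + 360) % 360;
    }

    /** Step 1: collect every distinct angle seen among the text matrices. */
    static Set<Integer> detectAngles(List<AffineTransform> textMatrices) {
        Set<Integer> angles = new TreeSet<>();
        for (AffineTransform m : textMatrices) {
            angles.add(angleOf(m));
        }
        return angles;
    }

    /**
     * Step 2, for one angle: prepend the inverse rotation to each matrix,
     * then keep only the text that is upright (angle == 0) afterwards.
     */
    static List<AffineTransform> uprightAfterUndoing(int angle,
                                                     List<AffineTransform> textMatrices) {
        AffineTransform undo = AffineTransform.getRotateInstance(Math.toRadians(-angle));
        List<AffineTransform> kept = new ArrayList<>();
        for (AffineTransform m : textMatrices) {
            AffineTransform combined = new AffineTransform(undo);
            combined.concatenate(m); // combined = undo * m: m is applied first, then undo
            if (angleOf(combined) == 0) {
                kept.add(combined);
            }
        }
        return kept;
    }
}
```

Calling `uprightAfterUndoing` once per angle returned by `detectAngles` mirrors the "one extraction per rotation" loop described above; in a real extraction the kept matrices would correspond to the text positions handed to the stripper for that pass.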
Attachments
Issue Links
- is duplicated by
  - PDFBOX-878 Incorrect text extraction when text rotation is not 0,90,180,270 (Resolved)
- is related to
  - PDFBOX-4390 ExtractText loses spaces when rotationMagic option is used (Closed)
  - TIKA-2779 Integrate/parameterize new rotated text handling in PDFBox (Resolved)