PDFBox / PDFBOX-4390

ExtractText loses spaces when rotationMagic option is used

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.12, 2.0.13
    • Fix Version/s: 2.0.14
    • Component/s: Text extraction
    • Labels: None

      Description

      This was detected by looking at the results of a regression test kindly run by Tim Allison (see the end of PDFBOX-4371) for his work on TIKA-2779. There were many new words, but some were missing the spaces between them. The cause is a wrong angle (180 instead of 0), computed because the font matrix was not taken into account; for Type 3 fonts, the font matrix often contains a rotation or a flip.
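
      To illustrate the effect described above, here is a minimal, self-contained sketch using java.awt.geom.AffineTransform rather than PDFBox classes. The class name AngleSketch, the method angleDegrees, and the sample matrices are hypothetical and chosen only for illustration; the angle convention (atan2 of the shear-y and scale-y components) is one simple choice and is not necessarily what the committed fix does.

```java
import java.awt.geom.AffineTransform;

// Hypothetical illustration only; not code from PDFBox.
class AngleSketch {

    /**
     * Returns the text angle in whole degrees, in the range -180..180.
     * The font matrix is applied first (glyph space to text space),
     * then the text matrix, so both contribute to the result.
     */
    static int angleDegrees(AffineTransform textMatrix, AffineTransform fontMatrix) {
        AffineTransform m = new AffineTransform(textMatrix); // copy, leave the input untouched
        m.concatenate(fontMatrix);                           // m = textMatrix x fontMatrix
        double radians = Math.atan2(m.getShearY(), m.getScaleY());
        return (int) Math.round(Math.toDegrees(radians));
    }

    public static void main(String[] args) {
        // Assumed sample values: a text matrix rotated by 180 degrees and a
        // Type 3 style font matrix that is itself rotated by 180 degrees.
        AffineTransform textMatrix = new AffineTransform(-12, 0, 0, -12, 100, 700);
        AffineTransform fontMatrix = new AffineTransform(-0.001, 0, 0, -0.001, 0, 0);

        // Ignoring the font matrix (identity passed instead).
        System.out.println(angleDegrees(textMatrix, new AffineTransform()));
        // Taking the font matrix into account.
        System.out.println(angleDegrees(textMatrix, fontMatrix));
    }
}
```

      With these sample values, the first call yields 180 and the second yields 0: the two 180 degree rotations cancel, so the glyphs are actually upright. A flipped pair such as a text matrix of [12 0 0 -12 0 0] with a font matrix of [0.001 0 0 -0.001 0 0] behaves the same way under this convention.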

        Attachments

        1. PDFBOX-4390-082220-p1.pdf (137 kB, Tilman Hausherr)

              People

              • Assignee: Tilman Hausherr
              • Reporter: Tilman Hausherr
              • Votes: 0
              • Watchers: 3