PDFBox / PDFBOX-4390

ExtractText loses spaces when rotationMagic option is used

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.12, 2.0.13
    • Fix Version/s: 2.0.14
    • Component/s: Text extraction
    • Labels: None

    Description

      This was detected in the results of a regression test kindly run by tallison@apache.org (see the end of PDFBOX-4371) as part of his work on TIKA-2779. The output contained many new words, but some of them were missing their spaces. The cause is a wrong angle (180 instead of 0): the font matrix was not taken into account when the text angle was computed, and for Type 3 fonts the font matrix is often a rotation or a flip.
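
      The effect described above can be reproduced with plain matrix arithmetic. The following is only a minimal sketch, not the actual patch: it uses java.awt.geom.AffineTransform as a stand-in for PDFBox's own matrix class, and the class name, method names and matrix values are hypothetical. It assumes the angle is read from the rotation part of the matrix with atan2, and it compares the result for a text matrix alone against the result after concatenating a Type 3 style font matrix that itself rotates by 180 degrees.

```java
import java.awt.geom.AffineTransform;

// Hypothetical illustration; names and values are assumptions, not PDFBox code.
final class FontMatrixAngleSketch {

    // Reads the rotation angle (in whole degrees) from the linear part of a matrix.
    static int angleOf(AffineTransform m) {
        return (int) Math.round(Math.toDegrees(Math.atan2(m.getShearY(), m.getScaleY())));
    }

    // The problematic approach: only the text matrix is inspected.
    static int angleIgnoringFontMatrix(AffineTransform textMatrix) {
        return angleOf(textMatrix);
    }

    // The approach suggested by the description: the font matrix maps glyph
    // space to text space first, so it is concatenated before reading the angle.
    static int angleWithFontMatrix(AffineTransform textMatrix, AffineTransform fontMatrix) {
        AffineTransform combined = new AffineTransform(textMatrix);
        combined.concatenate(fontMatrix);
        return angleOf(combined);
    }

    public static void main(String[] args) {
        // Assumed example values: a text matrix rotated by 180 degrees ...
        AffineTransform textMatrix = new AffineTransform(-1.0, 0.0, 0.0, -1.0, 100.0, 700.0);
        // ... and a Type 3 font matrix that also rotates by 180 degrees,
        // so the glyphs end up upright on the page.
        AffineTransform fontMatrix = new AffineTransform(-0.001, 0.0, 0.0, -0.001, 0.0, 0.0);

        System.out.println("text matrix only:  " + angleIgnoringFontMatrix(textMatrix));
        System.out.println("with font matrix:  " + angleWithFontMatrix(textMatrix, fontMatrix));
    }
}
```

      With these assumed values the first call reports 180 and the second reports 0, which matches the symptom in this issue: text that is visually upright gets sorted into the wrong angle bucket when the font matrix is ignored.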

      Attachments

        1. PDFBOX-4390-082220-p1.pdf
          137 kB
          Tilman Hausherr

            People

              Assignee: Tilman Hausherr
              Reporter: Tilman Hausherr