PDFBOX-4387: Parsing typographic ligatures

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.9, 2.0.12
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:
      None

      Description

      Hello everybody. I tried to parse the attached PDF but ran into a problem with ligatures: PDFBox adds an extra space after each of them.

      In the attached PDF the issue is visible in the word "flüssig" under "Persil powder".

      Other ligatures are affected too.
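
      Until the underlying extraction problem is resolved, the extracted text could be cleaned up after the fact. The following is only a minimal sketch, assuming the extracted string contains Unicode ligature presentation forms (U+FB00 to U+FB06) followed by one spurious space; the report does not confirm that this is how the output looks. The class and method names (LigatureCleanup, clean) are hypothetical, and it uses only the Java standard library, not any PDFBox API.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class LigatureCleanup {

    // ASSUMPTION: the spurious space directly follows a Latin ligature
    // code point (U+FB00..U+FB06) and is itself followed by a letter.
    private static final Pattern LIGATURE_GAP =
            Pattern.compile("([\\uFB00-\\uFB06]) (?=\\p{L})");

    static String clean(String extracted) {
        // Drop the single space that follows a ligature inside a word.
        String joined = LIGATURE_GAP.matcher(extracted).replaceAll("$1");
        // NFKC expands compatibility characters, so the "fl" ligature
        // becomes the two plain letters "fl".
        return Normalizer.normalize(joined, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The "fl" ligature, a space, then "üssig", as the bug is assumed to produce it.
        System.out.println(clean("\uFB02 \u00FCssig"));
    }
}
```

      Note that this heuristic also joins words when a ligature legitimately ends a word, and NFKC rewrites other compatibility characters as well, so it is a workaround to evaluate on real output rather than a fix.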

       

        Attachments

        1. test.pdf
          7.20 MB
          Oleksandr Skoryi

              People

              • Assignee: Unassigned
              • Reporter: Oleksandr Skoryi (AlexFaster)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved: