PDFBox / PDFBOX-4387

Parsing typographic ligatures

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.9, 2.0.12
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

    Description

      Hello everybody. I tried to parse the attached PDF but ran into a problem with ligatures: PDFBox adds an extra space after each of them.

      In the attached PDF, the issue is visible in the word "flüssig" under "Persil powder".

      Other ligatures are affected too.
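
      To list the affected words in already-extracted text, a small check like the following might help. It is only a sketch under assumptions: the class name LigatureSpaceCheck and the sample string are invented for illustration, and it assumes the ligature code point (U+FB00 to U+FB06) survives extraction. If the extractor decomposes ligatures into plain letters, the pattern will not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: scans extracted text for a Latin
// ligature that is followed by a single space and a lowercase letter.
final class LigatureSpaceCheck {
    // U+FB00..U+FB06 are the Latin ligatures (ff, fi, fl, ffi, ffl, long s-t, st).
    // The surrounding non-whitespace runs capture the rest of the split word.
    private static final Pattern SUSPECT =
            Pattern.compile("\\S*[\\uFB00-\\uFB06] \\p{Ll}\\S*");

    // Returns every fragment that looks like a word split after a ligature.
    static List<String> findSuspects(String extracted) {
        List<String> hits = new ArrayList<>();
        Matcher m = SUSPECT.matcher(extracted);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented sample imitating the reported symptom; not taken from test.pdf.
        String sample = "Persil \uFB02 \u00FCssig Pulver";
        System.out.println(findSuspects(sample));
    }
}
```

      The check is a heuristic: a ligature that genuinely ends a word before a lowercase word would also be reported, so the hits need manual review.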

       

      Attachments

        1. test.pdf
          7.20 MB
          Oleksandr Skoryi

            People

              Assignee: Unassigned
              Reporter: Oleksandr Skoryi (AlexFaster)