  1. PDFBox
  2. PDFBOX-1017

Some Ligatures in a PDF file are not recognised.


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Text extraction
    • Environment: Mac OS X 10.6.7, java version "1.6.0_24"

    Description

      In the attached file, some ligatures (Qu, Th, ch, ck, fft, ft, tt) are not decomposed into their component letters during text extraction. They remain in the extracted text as Unicode characters in the Private Use Area (U+E0xx), for example: "...im rabbinisen Sritum in untersiedlien Kontexten und dort,..."
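
      One possible workaround on the caller's side is to post-process the extracted text. The sketch below uses only the JDK and is not part of the PDFBox API. The class name PuaLigatures and the two table entries (U+E000 for "ch", U+E001 for "ft") are hypothetical: the report only gives the range as U+E0xx, so the real code points would have to be read from the font in the attached file. Private-use characters without a table entry are made visible as [U+XXXX] instead of being dropped silently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: expand private-use ligature code points in already extracted text. */
final class PuaLigatures {
    private PuaLigatures() {}

    /** True if the code point lies in the BMP Private Use Area (U+E000..U+F8FF). */
    static boolean isPrivateUse(int codePoint) {
        return codePoint >= 0xE000 && codePoint <= 0xF8FF;
    }

    /**
     * Replaces every code point that has an entry in the table with its letters.
     * Private-use code points without an entry are rendered as [U+XXXX] so that
     * they can be spotted and added to the table later.
     */
    static String expand(String text, Map<Integer, String> table) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            String replacement = table.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else if (isPrivateUse(cp)) {
                out.append(String.format("[U+%04X]", cp));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    /** Hypothetical table: these code points are assumptions, not taken from the PDF. */
    static Map<Integer, String> exampleTable() {
        Map<Integer, String> table = new LinkedHashMap<>();
        table.put(0xE000, "ch");
        table.put(0xE001, "ft");
        return table;
    }
}
```

      Under these assumed code points, PuaLigatures.expand(text, PuaLigatures.exampleTable()) would turn the extracted form of "rabbinischen Schrifttum" back into plain letters. The table is font-specific, so it cannot be reused across documents without checking.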

      Attachments

        1. Ligatures.pdf
          295 kB
          Thomas Fischer
        2. Ligatures.txt
          38 kB
          Thomas Fischer

          People

            Assignee: Unassigned
            Reporter: Thomas Fischer (thomas_gb)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: