PDFBox / PDFBOX-970

TeX-created ligatures and umlauts are not recognised

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Text extraction
    • Environment: Mac OS X 10.6.6, Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)

    Description

      Ligatures in a TeX-created document are lost, although v. 1.4 recognised them, e.g.
      1.4       1.5
      official  ocial
      effort    e ort
      fields    elds
      first     rst
      In addition, German umlauts (ä, ö, ü) are represented as ( a, o, u).
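
      The word pairs above suggest a simple way to spot the symptom when comparing two extraction outputs. The following is only a rough sketch, not part of PDFBox: the class name LigatureDiff and the method droppedLigature are hypothetical, and it assumes a lost ligature shows up either as nothing or as a single space, as in the examples reported here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a PDFBox class): given a word as extracted by the
// older version and the same word as extracted by the newer one, report which
// Latin f-ligature appears to have been dropped.
final class LigatureDiff {
    // Longest sequences first so that "ffi" is tried before "ff" and "fi".
    private static final List<String> LIGATURES =
            Arrays.asList("ffi", "ffl", "ff", "fi", "fl");

    /**
     * Returns the ligature whose removal (or replacement by one space) turns
     * expected into extracted, or null if no single ligature explains the
     * difference.
     */
    static String droppedLigature(String expected, String extracted) {
        for (String lig : LIGATURES) {
            int at = expected.indexOf(lig);
            while (at >= 0) {
                String head = expected.substring(0, at);
                String tail = expected.substring(at + lig.length());
                if (extracted.equals(head + tail)
                        || extracted.equals(head + " " + tail)) {
                    return lig;
                }
                at = expected.indexOf(lig, at + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Word pairs taken from the report (1.4 output, 1.5 output).
        String[][] pairs = {
            {"official", "ocial"}, {"effort", "e ort"},
            {"fields", "elds"}, {"first", "rst"}
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " / " + p[1] + ": "
                    + droppedLigature(p[0], p[1]));
        }
    }
}
```

      Running such a check over the word lists of Test2.1.4.txt and the newer output would only flag candidates; it says nothing about why the glyphs are missing.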

      Attachments

        1. A Python Library for Provenance Recording and Querying.txt
          28 kB
          Thomas Fischer
        2. A Python Library for Provenance Recording and Querying.txt
          28 kB
          Thomas Fischer
        3. Test.pdf
          58 kB
          Thomas Fischer
        4. Test.pdf
          58 kB
          Thomas Fischer
        5. Test2.1.4.txt
          25 kB
          Thomas Fischer
        6. Test2.pdf
          329 kB
          Thomas Fischer
        7. Test2-1.6.txt
          28 kB
          Thomas Fischer

          People

            Assignee: Unassigned
            Reporter: Thomas Fischer (thomas_gb)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: