PDFBox / PDFBOX-970

TeX-created ligatures and umlauts are not recognised

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:
    • Environment:
      Mac OS X 10.6.6, Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)

      Description

      Ligatures in a TeX-created document are lost in v. 1.5, although v. 1.4 recognised them, e.g.
      v. 1.4      v. 1.5
      official    ocial
      effort      e ort
      fields      elds
      first       rst
      In addition, German umlauts (ä, ö, ü) are represented as ( a, o, u).
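
      One way to check whether the affected characters are still present in the extracted text, only under other code points, is to run the output through Unicode normalization. The sketch below is an assumption about how such a check could look, not part of PDFBox: the class LigatureCheck and its normalize method are hypothetical names, and only java.text.Normalizer from the JDK is used. It helps only if the extractor emits ligature code points (for example U+FB01) or a base letter followed by a combining diaeresis (U+0308); it cannot recover characters that were dropped entirely.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of PDFBox.
final class LigatureCheck {

    // NFKC replaces compatibility characters such as the "fi" ligature
    // (U+FB01) with plain letters, and composes a base letter plus a
    // combining mark (a + U+0308) into a single character (U+00E4).
    static String normalize(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Sample strings built by hand; they are assumptions, not taken
        // from the attached files.
        System.out.println(normalize("o\uFB03cial"));  // contains the ffi ligature
        System.out.println(normalize("\uFB01rst"));    // contains the fi ligature
        System.out.println(normalize("Ma\u0308rz"));   // a + combining diaeresis
    }
}
```

      If the normalized text of a 1.5 extraction matches the 1.4 output, the characters were extracted and the difference lies in how they are encoded or displayed rather than in missing glyphs.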

        Attachments

        1. A Python Library for Provenance Recording and Querying.txt
          28 kB
          Thomas Fischer
        2. A Python Library for Provenance Recording and Querying.txt
          28 kB
          Thomas Fischer
        3. Test.pdf
          58 kB
          Thomas Fischer
        4. Test.pdf
          58 kB
          Thomas Fischer
        5. Test2.1.4.txt
          25 kB
          Thomas Fischer
        6. Test2.pdf
          329 kB
          Thomas Fischer
        7. Test2-1.6.txt
          28 kB
          Thomas Fischer

            People

            • Assignee:
              Unassigned
              Reporter:
              thomas_gb Thomas Fischer
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: