PDFBox / PDFBOX-3438

only garbage extracted, lots of warnings "No Unicode mapping..."

Details

    • Type: Wish
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

    Description

      When I try to extract text from this PDF, I get lots of warnings "No Unicode mapping for ...", and the output is only garbage.
      The PDF file displays fine in Acrobat Reader, and pdftotext.exe extracts the text without problems.
      The PDF file seems to have an embedded Type 1 font with a custom encoding.
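
      A minimal sketch of how a custom-encoded Type 1 font could lead to this symptom. It assumes the font's encoding uses glyph names that appear in no standard glyph list and that the PDF has no other source of Unicode values; neither is confirmed for test.pdf. The class name, the glyph names G48 and G69, and the fallback of emitting the raw character code are hypothetical and are not taken from PDFBox.

```java
import java.util.HashMap;
import java.util.Map;

class GlyphMappingSketch {
    // Stand-in for a standard glyph list (glyph name -> Unicode text).
    static final Map<String, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put("H", "H");
        GLYPH_LIST.put("i", "i");
    }

    // Decode character codes via: code -> glyph name (font encoding) -> Unicode (glyph list).
    static String decode(int[] codes, Map<Integer, String> encoding) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            String glyphName = encoding.get(code);
            String unicode = glyphName == null ? null : GLYPH_LIST.get(glyphName);
            if (unicode == null) {
                // Hypothetical fallback: warn and emit the raw code, which yields garbage.
                System.err.println("No Unicode mapping for " + glyphName + " (" + code + ")");
                text.append((char) code);
            } else {
                text.append(unicode);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        int[] codes = {1, 2};

        // Encoding with standard glyph names: the lookup succeeds.
        Map<Integer, String> standardNames = new HashMap<>();
        standardNames.put(1, "H");
        standardNames.put(2, "i");

        // Custom encoding with made-up glyph names: every lookup fails.
        Map<Integer, String> customNames = new HashMap<>();
        customNames.put(1, "G48");
        customNames.put(2, "G69");

        System.out.println(decode(codes, standardNames));
        System.out.println(decode(codes, customNames));
    }
}
```

      The glyphs themselves still render correctly in a viewer, because rendering only needs the glyph outlines; extraction is the step that needs a name-to-Unicode mapping.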

      Attachments

        1. PDFBOX-3438.diff (0.8 kB, Tilman Hausherr)
        2. PDFBOX-3438.txt (2 kB, Tilman Hausherr)
        3. test.pdf (43 kB, Oliver Steinau)

          People

            Assignee: Unassigned
            Reporter: Oliver Steinau
            Votes: 0
            Watchers: 4