  PDFBox / PDFBOX-3438

only garbage extracted, lots of warnings "No Unicode mapping..."

    Details

    • Type: Wish
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:
      None

      Description

      When I try to extract text from this PDF, I get lots of "No Unicode mapping for ..." warnings, and the output is only garbage.
      The PDF file displays fine in Acrobat Reader, and pdftotext.exe extracts the text without problems.
      The PDF file seems to have an embedded Type 1 font with a custom encoding.
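
To illustrate why a custom encoding can lead to this kind of output, here is a minimal sketch in plain Java, not PDFBox code. It assumes that text extraction maps each character code to a glyph name through the font's encoding, then looks the glyph name up in a glyph list to obtain Unicode. The class `GlyphLookupSketch`, the tiny glyph list, the glyph names `g1` to `g3`, and the U+FFFD placeholder are all hypothetical; the actual glyph names in test.pdf and the fallback a real extractor uses may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphLookupSketch {
    // Tiny stand-in for a glyph list (glyph name -> Unicode); real lists hold thousands of entries.
    static final Map<String, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put("H", "H");
        GLYPH_LIST.put("i", "i");
        GLYPH_LIST.put("exclam", "!");
    }

    // Decode character codes: code -> glyph name (font encoding) -> Unicode (glyph list).
    static String decode(int[] codes, Map<Integer, String> encoding) {
        StringBuilder out = new StringBuilder();
        for (int code : codes) {
            String name = encoding.get(code);
            String unicode = (name == null) ? null : GLYPH_LIST.get(name);
            if (unicode == null) {
                // Hypothetical warning, modelled on the message quoted in the report.
                System.err.println("No Unicode mapping for " + name + " (" + code + ")");
                out.append('\uFFFD'); // placeholder chosen for this sketch only
            } else {
                out.append(unicode);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] codes = {1, 2, 3};

        // Encoding that uses standard glyph names: the lookup succeeds.
        Map<Integer, String> standardNames = new HashMap<>();
        standardNames.put(1, "H");
        standardNames.put(2, "i");
        standardNames.put(3, "exclam");

        // Custom encoding with made-up glyph names: the lookup fails for every code.
        Map<Integer, String> customNames = new HashMap<>();
        customNames.put(1, "g1");
        customNames.put(2, "g2");
        customNames.put(3, "g3");

        System.out.println(decode(codes, standardNames));
        System.out.println(decode(codes, customNames));
    }
}
```

A viewer can still render such a file correctly because it draws the embedded glyph outlines directly and never needs the Unicode values, which would be consistent with the PDF displaying fine in Acrobat Reader.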

        Attachments

        1. PDFBOX-3438.diff
          0.8 kB
          Tilman Hausherr
        2. PDFBOX-3438.txt
          2 kB
          Tilman Hausherr
        3. test.pdf
          43 kB
          Oliver Steinau

            People

            • Assignee: Unassigned
            • Reporter: Oliver Steinau
            • Votes: 0
            • Watchers: 4
