PDFBox / PDFBOX-3792

Getting lots of "No Unicode mapping for..." warnings when extracting text

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.5
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:
      None
    • Flags:
      Important

      Description

      When I use PDFBox to extract text, I get lots of warnings and the output is only garbage. When I use Adobe Acrobat to export the attached PDF file to text, it works fine. I have attached the original PDF file, the text output, and the log with the warnings. The PDF file seems to have a Type 1 font embedded with a custom encoding. I have checked many reports on the JIRA issue tracker but still found no way to solve it.
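
      To illustrate why a custom encoding can produce this kind of warning and garbage output, here is a minimal, library-free sketch. It does not use the PDFBox API. The class name, the two lookup tables, the glyph name "G02", and the fallback of emitting the raw character code are all assumptions made for illustration, not the actual PDFBox implementation or the contents of the attached file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: models a two-step lookup
// (character code -> glyph name -> Unicode) with plain maps.
public class UnicodeMappingSketch {
    // Assumed stand-in for a standard glyph list: glyph name -> Unicode text.
    static final Map<String, String> GLYPH_LIST = new HashMap<>();
    // Assumed custom font encoding: character code -> glyph name.
    static final Map<Integer, String> CUSTOM_ENCODING = new HashMap<>();

    static {
        GLYPH_LIST.put("A", "A");
        GLYPH_LIST.put("space", " ");

        CUSTOM_ENCODING.put(1, "A");
        CUSTOM_ENCODING.put(2, "G02");   // made-up name, absent from the glyph list
        CUSTOM_ENCODING.put(3, "space");
    }

    // Returns the Unicode text for a code, or null if either lookup fails.
    static String toUnicode(int code) {
        String glyphName = CUSTOM_ENCODING.get(code);
        if (glyphName == null) {
            return null;
        }
        return GLYPH_LIST.get(glyphName);
    }

    // Converts a sequence of character codes to text. When no mapping exists,
    // it logs a warning and falls back to the raw code as a character
    // (an assumed fallback, chosen to show how garbage can appear).
    static String extract(int[] codes) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            String unicode = toUnicode(code);
            if (unicode == null) {
                System.err.println("No Unicode mapping for code " + code);
                text.append((char) code);
            } else {
                text.append(unicode);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String result = extract(new int[] {1, 2, 3});
        // Print code points so the unmapped control character is visible.
        result.chars().forEach(c -> System.out.println("U+" + String.format("%04X", c)));
    }
}
```

      In this sketch, codes 1 and 3 resolve to readable text, while code 2 triggers a warning and contributes a control character. A viewer that recovers text by other means could still export such a file correctly.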

        Attachments

        1. OutputText.txt
          0.3 kB
          sunny xia
        2. IssueLog.txt
          10 kB
          sunny xia
        3. FileWithIssue.pdf
          21 kB
          sunny xia

            People

            • Assignee:
              Unassigned
            • Reporter:
              sunny1992 sunny xia
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: