  1. PDFBox
  2. PDFBOX-3792

Getting lots of warnings "No Unicode mapping for..." when extracting text

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.5
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None
    • Flags: Important

    Description

      When I use PDFBox to extract text, I get lots of warnings and the output is only garbage. But when I use Adobe Acrobat to export the attached PDF file to text, it works fine. I have attached the original PDF file, the text output, and the log with the warnings. The PDF file also seems to have an embedded Type 1 font with a custom encoding. I have checked lots of reports on the JIRA issue tracker but still found no way to solve it.
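
      For readers hitting the same warning, here is a minimal sketch of why a simple font with a custom encoding can end up with no Unicode mapping. It is a simplified model, not PDFBox's actual code: the class `GlyphLookupSketch`, the three-entry glyph table, and the glyph name `G02` are all hypothetical. It assumes the usual two-step lookup (character code to glyph name through the font's encoding, then glyph name to Unicode through a glyph list) and that the font carries no other Unicode information. Whether the attached file fails for exactly this reason is not confirmed here.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of glyph-name based text extraction (illustration only). */
class GlyphLookupSketch {
    // Tiny stand-in for a glyph list that maps standard glyph names to Unicode.
    static final Map<String, String> GLYPH_LIST = new HashMap<>();
    static {
        GLYPH_LIST.put("A", "A");
        GLYPH_LIST.put("B", "B");
        GLYPH_LIST.put("space", " ");
    }

    /**
     * Resolves a character code to Unicode via the font's encoding.
     * Returns null when the code has no glyph name or the name is unknown.
     */
    static String toUnicode(int code, Map<Integer, String> encoding) {
        String glyphName = encoding.get(code);
        if (glyphName == null) {
            return null;
        }
        return GLYPH_LIST.get(glyphName);
    }

    public static void main(String[] args) {
        // Hypothetical custom encoding: code 1 uses a standard glyph name,
        // code 2 uses a made-up name that no glyph list knows about.
        Map<Integer, String> customEncoding = new HashMap<>();
        customEncoding.put(1, "A");
        customEncoding.put(2, "G02");

        for (int code : new int[] {1, 2}) {
            String unicode = toUnicode(code, customEncoding);
            if (unicode == null) {
                System.out.println("No Unicode mapping for "
                        + customEncoding.get(code) + " (" + code + ")");
            } else {
                System.out.println(code + " -> " + unicode);
            }
        }
    }
}
```

      In this model the second code cannot be resolved, so an extractor can only warn and emit a placeholder or nothing for it. A viewer that exports the same file correctly may be relying on other information or heuristics.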

      Attachments

        1. OutputText.txt
          0.3 kB
          sunny xia
        2. IssueLog.txt
          10 kB
          sunny xia
        3. FileWithIssue.pdf
          21 kB
          sunny xia

          People

            Assignee: Unassigned
            Reporter: sunny xia (sunny1992)
            Votes: 0
            Watchers: 2
