  PDFBox / PDFBOX-4806

Text extraction from this PDF returns garbled characters instead of readable text


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.19
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None
    • Environment: Java

    Description

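      I am trying to extract the text from this PDF, but the output contains control characters and garbled strings instead of readable text. I need help. Sample output and log warnings follow.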

       

      ,"word_identifier":"\u0015\u0013\u0014\u001c"} 

      "word_identifier":"(LQJDQJVVWHPSHO"}

       

      Apr 02, 2020 4:39:05 PM org.apache.pdfbox.pdmodel.font.PDFont loadUnicodeCmap

      WARNING: Invalid ToUnicode CMap in font DVWIYK+font00000000242bcd5a

      Apr 02, 2020 4:39:05 PM org.apache.pdfbox.pdmodel.font.PDFont loadUnicodeCmap

      WARNING: Invalid ToUnicode CMap in font BBTJPM+font00000000242bcd5a
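
The two `word_identifier` samples above look like raw character codes rather than Unicode text, which would be consistent with the "Invalid ToUnicode CMap" warnings. As a quick diagnostic, and only as an observation about these two strings, adding 29 to each character code yields readable text. The sketch below illustrates that check. The class name `GlyphShiftDemo`, the method `shift`, and the offset 29 are illustrative assumptions inferred from the samples; none of them is a PDFBox API, and a fixed offset is unlikely to hold for other fonts or documents.

```java
import java.util.List;

// Diagnostic sketch only: the offset of 29 is inferred from the two sample
// strings in this report, not read from the PDF and not provided by PDFBox.
final class GlyphShiftDemo {

    // Adds a fixed offset to every UTF-16 code unit of the input string.
    static String shift(String raw, int offset) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            out.append((char) (raw.charAt(i) + offset));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int offset = 29; // assumption derived from the samples above
        List<String> samples = List.of(
                "\u0015\u0013\u0014\u001c", // first word_identifier sample
                "(LQJDQJVVWHPSHO");         // second word_identifier sample
        for (String raw : samples) {
            System.out.println(shift(raw, offset));
        }
    }
}
```

Run against the two samples, this prints `2019` and `Eingangsstempel`, which suggests the extracted values are untranslated character codes and that the missing mapping lies in the fonts' ToUnicode data rather than in the extraction call itself.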

      Attachments

        1. ESt_1_A_2019.pdf
          75 kB
          Tilman Hausherr

          People

            Assignee: Unassigned
            Reporter: Cherry Sri (cherrysri)
            Votes: 0
            Watchers: 2
