  1. PDFBox
  2. PDFBOX-4265

Not able to extract text from Japanese PDF

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None
    • Environment: Windows 10, Region settings set to Japanese

    Description

      Not able to extract text from the attached Japanese PDF (jpn.pdf).

      However, extraction works well with another Japanese PDF.

      Also, is there an overloaded method that accepts an encoding for text extraction? If so, please let us know.

      Thank you.
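
      On the encoding question: in PDFBox 2.0, PDFTextStripper.getText(PDDocument) returns a Java String, so an encoding normally only comes into play when the text is written out, for example through the Writer passed to PDFTextStripper.writeText(PDDocument, Writer). The sketch below uses only the JDK to show where that choice would be made. It does not call PDFBox or read jpn.pdf; the class name EncodingSketch, the helper encode, and the sample string are hypothetical stand-ins for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSketch {

    // Hypothetical helper: encodes already-extracted text with the given charset.
    // Assumption: with PDFBox 2.0 the same Writer could instead be handed to
    // PDFTextStripper.writeText(PDDocument, Writer).
    static byte[] encode(String extractedText, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(extractedText);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for stripper output: three Japanese characters, written as escapes
        // so the source file compiles regardless of the platform default encoding.
        String text = "\u65E5\u672C\u8A9E";
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length + " bytes as UTF-8, " + utf16.length + " bytes as UTF-16BE");
    }
}
```

      Choosing the charset on the Writer rather than on the extraction call keeps the extracted String independent of the Windows region settings. Whether this has any bearing on the problem with jpn.pdf is not established by this report.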

      Attachments

        1. jpn.pdf
          293 kB
          Viral Valand
        2. jpn.txt
          5 kB
          Viral Valand
        3. CommandLine.txt
          66 kB
          Viral Valand

          People

            Assignee: Unassigned
            Reporter: Viral Valand
            Votes: 0
            Watchers: 2
