PDFBOX-4265: Not able to extract text from Japanese PDF

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None
    • Environment: Windows 10, region settings set to Japanese

      Description

      I am not able to extract text from the attached Japanese PDF (jpn.pdf), although extraction works well with another Japanese PDF.

      Also, is there an overloaded method that accepts an encoding for text extraction? If so, please let us know.

      Thank you.
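
      Note on the encoding question: a minimal sketch, assuming the PDFBox 2.x API, where `PDFTextStripper.getText(PDDocument)` returns a `java.lang.String` and `writeText(PDDocument, Writer)` writes to a caller-supplied `Writer`. Under that assumption the stripper itself takes no encoding. The charset is chosen when the text is written out. The class `EncodingSketch`, the helper `encode`, and the sample string are hypothetical and stand in for real extracted text, so the sketch runs with the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSketch {
    // Hypothetical helper: writes already-extracted text through a Writer
    // bound to the requested charset and returns the encoded bytes.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            // Assumption: with PDFBox 2.x one would call
            // new PDFTextStripper().writeText(document, writer) here instead.
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for extracted text: three Japanese characters.
        String sample = "\u65e5\u672c\u8a9e";
        System.out.println("UTF-8 bytes:    " + encode(sample, StandardCharsets.UTF_8).length);
        System.out.println("UTF-16BE bytes: " + encode(sample, StandardCharsets.UTF_16BE).length);
    }
}
```

      The same string yields different byte sequences depending on the charset given to the `Writer`, so the output encoding can be controlled without an encoding parameter on the extraction call. This does not address why jpn.pdf itself fails to extract.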

        Attachments

        1. jpn.pdf (293 kB, Viral Valand)
        2. jpn.txt (5 kB, Viral Valand)
        3. CommandLine.txt (66 kB, Viral Valand)

            People

            • Assignee: Unassigned
            • Reporter: Viral Valand (viralvaland)
            • Votes: 0
            • Watchers: 2
