  1. PDFBox
  2. PDFBOX-4834

Wrong read characters for Hindi conjuncts

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.19
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None
    • Environment: Windows 10, Java 9

    Description

      When reading this Hindi PDF book using PDFBox 2.0.19:

      https://dl.dropboxusercontent.com/s/laixlb5omvjqr7y/Hindi%20Book.pdf?dl=0

       

      PDFBox extracts some conjuncts as wrong characters, as shown in this output file:

      https://dl.dropboxusercontent.com/s/efyxz2eg37gvn4c/Text%20read%20by%20PDFBox%202.0.19.txt?dl=0
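
When comparing the extracted text against the expected text, it can help to look at Unicode code points rather than rendered glyphs, since a Devanagari conjunct is stored as consonant + virama (U+094D) + consonant. The following is a minimal sketch using only the Java standard library; the class `CodePointDump` and its methods are hypothetical helper names for illustration and are not part of PDFBox. The sample string is the conjunct क्ष, not text taken from the attached book.

```java
import java.util.StringJoiner;

// Hypothetical helper for inspecting extracted Devanagari text; not a PDFBox class.
class CodePointDump {
    // Devanagari virama (halant), the sign that joins consonants into a conjunct.
    static final int VIRAMA = 0x094D;

    // Returns the code points of the text as space-separated "U+XXXX" values.
    static String toCodePoints(String text) {
        StringJoiner joiner = new StringJoiner(" ");
        text.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    // Counts viramas; a lower count in extracted text than in the expected
    // text suggests that conjuncts were lost or mapped to other characters.
    static long countViramas(String text) {
        return text.codePoints().filter(cp -> cp == VIRAMA).count();
    }

    public static void main(String[] args) {
        String expected = "\u0915\u094D\u0937"; // the conjunct "ksha": ka + virama + ssa
        System.out.println(toCodePoints(expected)); // prints "U+0915 U+094D U+0937"
        System.out.println(countViramas(expected)); // prints "1"
    }
}
```

Running the same two methods on a word from the extracted text file and on the same word typed by hand would show exactly which code points differ.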

      Attachments

        1. PDFBOX-4834-Hindi.pdf
          168 kB
          Tilman Hausherr

          People

            Assignee: Unassigned
            Reporter: Hesham (hesham)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: