PDFBox / PDFBOX-3445

Cannot read PDF correctly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: FontBox, Text extraction
    • Labels: None

    Description

      Hi Team,
      I have two PDFs in the Gujarati language that use different fonts: the first uses the Shruti font and the second uses the LMG-RUPE font. The Shruti PDF is read correctly by the Tika parser and gives the correct output, but the LMG-RUPE PDF gives the wrong output. The metadata is the same for both PDFs.
      1) https://drive.google.com/open?id=0B4Sse_x7pvrqRnRETzNsUk1BY0k (Shruti font)
      2) https://drive.google.com/open?id=0B4Sse_x7pvrqVC0zb2NqTzNvYVU (LMG-RUPE font)

      Attachments

        1. PDFBOX-3445-rupen.pdf
          48 kB
          Tilman Hausherr
        2. PDFBOX-3445-rupen-debugger.png
          118 kB
          Tilman Hausherr

            People

              Assignee: Unassigned
              Reporter: gopalbhalala
              Votes: 0
              Watchers: 2