PDFBox / PDFBOX-938

Wrong extracted text using PDFBox 1.4


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: Text extraction
    • Labels: None

    Description

      Hello,

      I am using PDFBox v1.4 to extract text from a PDF, but some words are not extracted correctly.
      For example:
      "Nefteiugansk" is extracted as "Nežeiugansk"
      "fiancee" is extracted as "Äancée"
      "first" is extracted as "Ärst"
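      To narrow down which characters are affected, each expected/extracted pair can be compared by stripping their longest common prefix and suffix. The following is a minimal JDK-only sketch of that idea; the class `LigatureDiff` and the method `differingSpan` are hypothetical helpers, not part of PDFBox. Applied to two of the reported pairs, it isolates the letter pairs "ft" and "fi", which are often typeset as ligatures. This suggests, but does not prove, that the problem lies in how those glyphs are mapped to Unicode.

```java
public class LigatureDiff {
    /**
     * Returns {expectedSpan, actualSpan}: what remains of each string after
     * removing their longest common prefix and longest common suffix.
     * (Hypothetical diagnostic helper, not a PDFBox API.)
     */
    static String[] differingSpan(String expected, String actual) {
        int max = Math.min(expected.length(), actual.length());

        // Longest common prefix.
        int prefix = 0;
        while (prefix < max && expected.charAt(prefix) == actual.charAt(prefix)) {
            prefix++;
        }

        // Longest common suffix, never overlapping the prefix already matched.
        int suffix = 0;
        while (suffix < max - prefix
                && expected.charAt(expected.length() - 1 - suffix)
                   == actual.charAt(actual.length() - 1 - suffix)) {
            suffix++;
        }

        return new String[] {
            expected.substring(prefix, expected.length() - suffix),
            actual.substring(prefix, actual.length() - suffix)
        };
    }

    public static void main(String[] args) {
        // Pairs taken from the report; non-ASCII characters are written as
        // Unicode escapes so the source compiles under any file encoding.
        String[][] pairs = {
            {"Nefteiugansk", "Ne\u017Eeiugansk"}, // "Nežeiugansk"
            {"first", "\u00C4rst"},               // "Ärst"
        };
        for (String[] p : pairs) {
            String[] d = differingSpan(p[0], p[1]);
            System.out.println("\"" + d[0] + "\" extracted as \"" + d[1] + "\"");
        }
    }
}
```

      On a UTF-8 console, `main` reports that "ft" was extracted as "ž" and "fi" as "Ä".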

      Please see the attached file to reproduce the problem.

      Best regards

      Attachments

        1. Sample.zip
          8.83 MB
          Hesham
        2. Another+book+-+Wrong+extracted+f+char.txt
          2 kB
          Andreas Lehmkühler
        3. Another book - Wrong extracted f char.pdf
          32 kB
          Hesham
        4. Wrong extracted f char.pdf
          455 kB
          Hesham

        Activity

          People

            Assignee: Andreas Lehmkühler (lehmi)
            Reporter: Hesham (hesham)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: