PDFBOX-4431: PDFBox recognizes only a few words

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Environment:
      OS: Windows 10.
      IDE: Oxygen.3a Release (4.7.3a)
      PDF version: Adobe Acrobat Pro DC - 2019.010.20069.49826

      Description

      The code I have posted takes five arguments, including the path to a PDF document and a search term. It is meant to parse the PDF, find every match for the keyword, and return the locations of those matches in the format specified by the last argument.

      For some reason, the code recognizes only a few words and raises an error on others. I am not sure why.

      The words that work and the words that fail do not seem to differ in font, size, location, etc.
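
      For illustration only, the search step described above (take extracted text and a keyword, report where each match occurs) might be sketched as follows. This is not the reporter's code, which is not reproduced on this page. It assumes the text of each page has already been extracted into a string (for example with PDFBox's PDFTextStripper), and the names KeywordLocator, Match and findAll are hypothetical. It reports a page number and a character offset rather than page coordinates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: locates a keyword in already-extracted page text.
public class KeywordLocator {

    /** One hit: 1-based page number and 0-based character offset within that page's text. */
    public static final class Match {
        public final int page;
        public final int offset;

        public Match(int page, int offset) {
            this.page = page;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return "page " + page + ", offset " + offset;
        }
    }

    /**
     * Returns every case-insensitive occurrence of keyword.
     * Assumption: pageTexts.get(0) holds the extracted text of page 1, and so on.
     */
    public static List<Match> findAll(List<String> pageTexts, String keyword) {
        List<Match> matches = new ArrayList<>();
        if (keyword == null || keyword.isEmpty()) {
            return matches; // nothing to search for
        }
        int len = keyword.length();
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i);
            // Try every start position; regionMatches compares without copying or re-casing the text,
            // so the reported offsets refer to the original string.
            for (int start = 0; start + len <= text.length(); start++) {
                if (text.regionMatches(true, start, keyword, 0, len)) {
                    matches.add(new Match(i + 1, start));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for two extracted pages.
        List<String> pages = Arrays.asList("Invoice total: 42", "No total here, but Total there");
        for (Match m : findAll(pages, "total")) {
            System.out.println(m);
        }
    }
}
```

      A search like this only finds a keyword that appears as one contiguous run in the extracted text, so comparing the extracted text against the visible page is a reasonable first check when some words are not found.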

        Attachments

        1. RS13170.pdf
          11.77 MB
          Krutheeka Rajkumar
        2. RS13170.txt
          80 kB
          Tilman Hausherr

            People

            • Assignee:
              Unassigned
              Reporter:
              K_TorontoVic Krutheeka Rajkumar
            • Votes:
              0
              Watchers:
              3
