PDFBox / PDFBOX-4322

Text extraction is not working for some parts of a PDF


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.0.11
    • Fix Version/s: 2.0.12, 3.0.0 PDFBox
    • Component/s: Text extraction
    • Labels:
      None

      Description

      The text extraction feature does not properly extract text from the attached PDF.

      Text inside rectangular boxes (e.g. the value of Lending Specialist and others) is not extracted.

        Attachments

        1. PDFBOX-4322-Q3FOMIEI6S2BMGSRZUNRBP2OZQ4BPSKY.pdf
          80 kB
          Tilman Hausherr
        2. PDFBOX-4322-Empty-ToUnicode-reduced.pdf
          21 kB
          Tilman Hausherr
        3. pdf__1.pdf.xml
          4 kB
          Tim Allison
        4. pdf__1.pdf
          524 kB
          Amit Maheshwari

              People

              • Assignee: Tilman Hausherr
              • Reporter: Amit Maheshwari