PDFBox / PDFBOX-4532

PDFTextStripper replacing the decimal with white space


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.15
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:

      Description

      To be specific, I'm using PDFTextStripperByArea to extract a particular area of the document.

      In the output, most of the numbers (all but one) have their decimal point replaced by a space. When I copy and paste the same text in Adobe Reader or Chrome, the decimal points are preserved.
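      The reporter's own code is attached only as a screenshot (code_textStripper.PNG), so what follows is a minimal sketch, not the reporter's code, of extracting one area with PDFTextStripperByArea in PDFBox 2.0.x. The class name `RegionExtract`, the region name `"area"` and the rectangle coordinates are hypothetical placeholders. The rectangle is assumed to be in PDF units (1/72 inch) measured from the top-left corner of the page.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

// Hypothetical helper class (assumption); not the reporter's code.
class RegionExtract {

    // Returns the text PDFBox finds inside `area` on the page at `pageIndex` (0-based).
    static String extractRegion(PDDocument doc, int pageIndex, Rectangle2D area) throws IOException {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        // Sort text by its position on the page rather than by content-stream order.
        stripper.setSortByPosition(true);
        // The region name is arbitrary; it is only used to look the result up again.
        stripper.addRegion("area", area);
        PDPage page = doc.getPage(pageIndex);
        stripper.extractRegions(page);
        return stripper.getTextForRegion("area");
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the PDF, e.g. FSUSA00BDD.pdf
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            // Placeholder rectangle (x, y, width, height); adjust it to cover the affected numbers.
            Rectangle2D area = new Rectangle2D.Float(0, 0, 300, 200);
            System.out.println(extractRegion(doc, 0, area));
        }
    }
}
```

      Run it as `java RegionExtract FSUSA00BDD.pdf` with PDFBox 2.0.15 on the classpath, after adjusting the rectangle in `main` to the area that contains the numbers.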

        Attachments

        1. code_textStripper.PNG (10 kB, Akash Gupta)
        2. FSUSA00BDD.pdf (275 kB, Akash Gupta)
        3. numbers_without_decimal.PNG (5 kB, Akash Gupta)
        4. PDFBOX-4532-reduced.pdf (85 kB, Tilman Hausherr)


            People

            • Assignee: Unassigned
            • Reporter: Akash Gupta (akashsgpgi)
            • Votes: 0
            • Watchers: 3
