PDFBox / PDFBOX-1575

PDFTextStripper adds spaces after detached words

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.8.1
    • Fix Version/s: None
    • Component/s: Text extraction
    • Environment: Linux 64bit

    Description

      Hello dear developers,

      I noticed that PDFTextStripper sometimes adds a space after completely detached words.
      For example, if you run text extraction on the attached file, you will see that PDFTextStripper adds one space after the words "Qty " and "Unit Price " but not after "Description" and "Line Total".
      I think this is a bug, because there should be no whitespace after the words "Qty " and "Unit Price ".
      Can you please fix it?
      (see attachment)

      Thank you very much,
      Vitalie
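
      For anyone hitting the same behaviour, a possible workaround is to post-process the extracted string. The following is only a minimal sketch: it does not call PDFBox at all, and it assumes the text has already been obtained (for example from PDFTextStripper.getText). The class name ExtractedTextCleaner, the method name stripTrailingWhitespace, and the sample string (which imitates the column headers described above) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: cleans up text that was
// already extracted elsewhere (e.g. by PDFTextStripper).
class ExtractedTextCleaner {

    // Removes trailing spaces and tabs from every line.
    // Line breaks of any kind are normalised to "\n" in the result.
    static String stripTrailingWhitespace(String extracted) {
        // "\\R" matches any line break; limit -1 keeps trailing empty lines.
        String[] lines = extracted.split("\\R", -1);
        List<String> cleaned = new ArrayList<>();
        for (String line : lines) {
            cleaned.add(line.replaceAll("[ \\t]+$", ""));
        }
        return String.join("\n", cleaned);
    }

    public static void main(String[] args) {
        // Assumed sample imitating the headers from the report.
        String sample = "Qty \nDescription\nUnit Price \nLine Total";
        System.out.println(stripTrailingWhitespace(sample));
    }
}
```

      This only hides the symptom on the caller's side; it does not change how the stripper decides where to emit spaces.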

      Attachments

        1. example.pdf
          61 kB
          Vitalie Bureanu

          People

            Assignee: Andreas Lehmkühler (lehmi)
            Reporter: Vitalie Bureanu (vitalie_bureanu)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified