PDFBOX-1575

PDFTextStripper adds spaces after detached words

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.8.1
    • Fix Version/s: None
    • Component/s: Text extraction
    • Environment: Linux 64bit

      Description

      Hello dear developers,

      I noticed that PDFTextStripper sometimes adds a space after completely detached words.
      For example, if you extract the text of the attached file, you will see that PDFTextStripper adds one space after the words "Qty" and "Unit Price", but not after "Description" and "Line Total".
      I think this is a bug, because there should be no whitespace after the words "Qty" and "Unit Price".
      Can you please fix it? (See the attachment.)

      Thank you very much,
      Vitalie
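      Since the issue is resolved as "Not A Problem", any cleanup of these trailing spaces falls to the caller. One possible workaround, which is not something suggested in this issue, is to strip trailing spaces and tabs from each line of the string returned by PDFTextStripper.getText(). A minimal plain-Java sketch follows; the class and method names (TrailingSpaceTrimmer, trimLineEnds) are hypothetical, and the sample string only resembles the reported output, it is not actual output from example.pdf.

```java
import java.util.regex.Pattern;

public class TrailingSpaceTrimmer {
    // Spaces or tabs directly before a line break (\n or \r\n) or at the end of the input.
    private static final Pattern TRAILING = Pattern.compile("[ \\t]+(?=\\r?\\n|$)");

    // Removes trailing spaces/tabs from every line, leaving the line breaks themselves intact.
    public static String trimLineEnds(String text) {
        return TRAILING.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample resembling the reported output; not real output from example.pdf.
        String extracted = "Qty \nDescription\nUnit Price \nLine Total\n";
        System.out.print(trimLineEnds(extracted));
    }
}
```

      Spaces inside a line (for example between "Unit" and "Price") are left alone, because the pattern only matches whitespace that is immediately followed by a line break or the end of the text.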

        Attachments

        1. example.pdf (61 kB, Vitalie Bureanu)

            People

            • Assignee: Andreas Lehmkühler (lehmi)
            • Reporter: Vitalie Bureanu (vitalie_bureanu)
            • Votes: 0
            • Watchers: 2

            Time Tracking

            • Estimated: 2h
            • Remaining: 2h
            • Logged: Not specified