PDFBox / PDFBOX-1447

wasted work in PDFMarkedContentExtractor.processTextPosition()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.8.0
    • Component/s: None

      Description

      The problem appears in version 1.7.1 and in revision 1409864. I
      attached a one-line patch that fixes it.

      In method "PDFMarkedContentExtractor.processTextPosition", the loop
      over "sameTextCharacters" should break immediately after
      "suppressCharacter" is set to "true". Every iteration after that
      point does no useful work; at best it sets "suppressCharacter" to
      "true" again.

      Method "processTextPosition" in class "PDFTextStripper" has a similar
      loop that already breaks immediately after "suppressCharacter" is
      set to "true", exactly as the proposed patch does.
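      A minimal Java sketch of the early exit described above, assuming a
      simplified setting: "DuplicateCheck", "Glyph", "shouldSuppress",
      "within" and the "iterations" counter are hypothetical names for
      illustration, not PDFBox's actual classes or the attached patch. Once
      an overlapping character is found, the loop breaks, so later entries
      in "sameTextCharacters" are never visited.

```java
import java.util.List;

/**
 * Sketch of duplicate-character suppression with an early exit.
 * DuplicateCheck and Glyph are hypothetical stand-ins, not PDFBox classes.
 */
public class DuplicateCheck {

    /** Hypothetical simplified text position: a character and its coordinates. */
    static final class Glyph {
        final String character;
        final float x;
        final float y;

        Glyph(String character, float x, float y) {
            this.character = character;
            this.x = x;
            this.y = y;
        }
    }

    /** Number of loop iterations in the last call (for demonstration only). */
    static int iterations = 0;

    /** True if a and b differ by at most tolerance. */
    static boolean within(float a, float b, float tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    /**
     * Returns true if any earlier occurrence of the same character overlaps
     * "text" within the tolerance. The loop stops at the first match, since
     * further iterations could only set suppressCharacter to true again.
     */
    static boolean shouldSuppress(Glyph text, List<Glyph> sameTextCharacters, float tolerance) {
        boolean suppressCharacter = false;
        iterations = 0;
        for (Glyph character : sameTextCharacters) {
            iterations++;
            if (character.character != null
                    && within(character.x, text.x, tolerance)
                    && within(character.y, text.y, tolerance)) {
                suppressCharacter = true;
                break; // early exit: the result can no longer change
            }
        }
        return suppressCharacter;
    }

    public static void main(String[] args) {
        Glyph incoming = new Glyph("a", 10.0f, 20.0f);
        List<Glyph> seen = List.of(
                new Glyph("a", 10.1f, 20.0f),  // overlaps: matched on the first iteration
                new Glyph("a", 10.2f, 20.1f),  // also overlaps, but never visited
                new Glyph("a", 50.0f, 20.0f)); // far away
        boolean suppress = shouldSuppress(incoming, seen, 0.5f);
        System.out.println(suppress + " after " + iterations + " iteration(s)");
    }
}
```

      Running "main" prints "true after 1 iteration(s)": the second
      overlapping glyph is never examined. Without the break, the loop
      would visit all three entries and still return the same result.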

        Attachments

        1. patch.diff (0.6 kB, Adrian Nistor)


            People

            • Assignee: Timo Boehme (tboehme)
            • Reporter: Adrian Nistor (adriannistor)
            • Votes: 0
            • Watchers: 2
