PDFBOX-1447: wasted work in PDFMarkedContentExtractor.processTextPosition()

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.8.0
    • None

    Description

      The problem appears in version 1.7.1 and in revision 1409864. I
      attached a one-line patch that fixes it.

      In method "PDFMarkedContentExtractor.processTextPosition", the loop
      over "sameTextCharacters" should break immediately after
      "suppressCharacter" is set to "true". None of the iterations after
      that point perform any useful work; at best they just set
      "suppressCharacter" to "true" again.

      Method "processTextPosition" in class "PDFTextStripper" has a similar
      loop, and this loop breaks immediately after "suppressCharacter" is
      set to "true", just like in the proposed patch.
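
The early exit described above can be sketched as follows. This is a minimal, self-contained illustration and not the PDFBox source or the attached patch: the class DuplicateGlyphCheck, the Glyph type, the within() helper, and the tolerance parameter are all hypothetical stand-ins. Only the names "sameTextCharacters" and "suppressCharacter" come from the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the loop pattern discussed in the report.
class DuplicateGlyphCheck {

    // Assumed stand-in for a previously seen text position.
    static final class Glyph {
        final float x;
        final float y;

        Glyph(float x, float y) {
            this.x = x;
            this.y = y;
        }
    }

    // Assumed helper: true when a and b differ by less than tolerance.
    static boolean within(float a, float b, float tolerance) {
        return Math.abs(a - b) < tolerance;
    }

    // Returns true if a glyph at (x, y) overlaps one already seen.
    static boolean shouldSuppress(List<Glyph> sameTextCharacters,
                                  float x, float y, float tolerance) {
        boolean suppressCharacter = false;
        for (Glyph seen : sameTextCharacters) {
            if (within(seen.x, x, tolerance) && within(seen.y, y, tolerance)) {
                suppressCharacter = true;
                // The flag is never cleared inside the loop, so no later
                // iteration can change the result: stop here.
                break;
            }
        }
        return suppressCharacter;
    }

    public static void main(String[] args) {
        List<Glyph> seen = new ArrayList<>();
        seen.add(new Glyph(10.0f, 20.0f));
        seen.add(new Glyph(50.0f, 20.0f));
        System.out.println(shouldSuppress(seen, 10.1f, 20.0f, 0.5f)); // overlaps the first glyph
        System.out.println(shouldSuppress(seen, 30.0f, 20.0f, 0.5f)); // overlaps nothing
    }
}
```

Without the break, the method returns the same value but always walks the whole list, even when the first element already matches.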

      Attachments

        1. patch.diff
          0.6 kB
          Adrian Nistor

          People

            Assignee: Timo Boehme (tboehme)
            Reporter: Adrian Nistor (adriannistor)
            Votes: 0
            Watchers: 2