[PDFBOX-1447] wasted work in PDFMarkedContentExtractor.processTextPosition() - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.7.1
Fix Version/s: 1.8.0
Component/s: None
Labels:
- patch
- perfomance

Description

The problem appears in version 1.7.1 and in revision 1409864. I have
attached a one-line patch that fixes it.

In method "PDFMarkedContentExtractor.processTextPosition", the loop
over "sameTextCharacters" should break immediately after
"suppressCharacter" is set to "true". Every iteration after that point
does no useful work; at best, it sets "suppressCharacter" to "true"
again.

Method "processTextPosition" in class "PDFTextStripper" has a similar
loop, and this loop breaks immediately after "suppressCharacter" is
set to "true", just like in the proposed patch.
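
For illustration, here is a minimal, self-contained sketch of the loop shape with the early exit. It is not the PDFBox source: the class "SuppressLoopSketch", the "Glyph" type (a stand-in for a text position), the "shouldSuppress" helper, the "tolerance" parameter, and the "iterations" counter are all hypothetical names introduced for this example. Only "sameTextCharacters" and "suppressCharacter" come from the description above.

```java
import java.util.Arrays;
import java.util.List;

public class SuppressLoopSketch {

    /** Hypothetical stand-in for a text position: a string plus x/y coordinates. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Counts loop iterations so the effect of the early exit is observable. */
    static int iterations = 0;

    /**
     * Returns true if the candidate overlaps an already-seen glyph with the
     * same text. The overlap test is an assumption made for this sketch.
     */
    static boolean shouldSuppress(List<Glyph> sameTextCharacters,
                                  Glyph candidate, float tolerance) {
        boolean suppressCharacter = false;
        for (Glyph seen : sameTextCharacters) {
            iterations++;
            if (seen.text.equals(candidate.text)
                    && Math.abs(seen.x - candidate.x) < tolerance
                    && Math.abs(seen.y - candidate.y) < tolerance) {
                suppressCharacter = true;
                break; // later iterations could only set the flag to true again
            }
        }
        return suppressCharacter;
    }

    public static void main(String[] args) {
        List<Glyph> seen = Arrays.asList(
                new Glyph("a", 10f, 10f),
                new Glyph("a", 50f, 10f),
                new Glyph("a", 90f, 10f));
        boolean suppress = shouldSuppress(seen, new Glyph("a", 10.2f, 10.1f), 1.0f);
        System.out.println(suppress + " after " + iterations + " iteration(s)");
    }
}
```

In this sketch the candidate matches the first entry, so the loop stops after one iteration. Without the "break" it would visit all three entries and return the same result.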

Attachments

patch.diff
15/Nov/12 18:50
0.6 kB
Adrian Nistor

People

Assignee: Timo Boehme

Reporter: Adrian Nistor

Votes: 0

Watchers: 2

Dates

Created: 15/Nov/12 18:49

Updated: 23/Mar/13 12:56

Resolved: 18/Nov/12 14:26