PDFBox / PDFBOX-2800

PDFTextStripper calculates the character bounding boxes incorrectly


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.9
    • Fix Version/s: None
    • Component/s: Utilities
    • Environment: java version "1.6.0_35"
      Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
      Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

    Description

      For a specific file, the coordinates provided by the TextPosition objects stored in the charactersByArticle field do not match the actual positions of the characters on the page. Some of the rectangles have zero height, and others appear shifted vertically. I am attaching files that illustrate the issue: the sample file itself and an image highlighting the mismatched bounding rectangles on the 2nd page.
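      When checking a report like this, it helps to be explicit about how a highlight rectangle is derived from a TextPosition. The following is a minimal, self-contained sketch in plain Java (no PDFBox dependency). It assumes the rectangle is built from the values returned by TextPosition's getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir(), and it treats getYDirAdj() as the glyph baseline measured from the top-left corner of the page. The class GlyphBoxes, its nested Box type, and its methods are hypothetical. The sketch also converts the box to PDF user space (origin at the bottom-left) and flags the zero-height case described in the report.

```java
/**
 * Hypothetical helper for turning the per-glyph values of a PDFBox
 * TextPosition (getXDirAdj, getYDirAdj, getWidthDirAdj, getHeightDir)
 * into highlight rectangles. Plain Java; PDFBox is not required.
 */
final class GlyphBoxes {

    /** Axis-aligned rectangle; (x, y) is the corner with the smallest coordinates. */
    static final class Box {
        final float x;
        final float y;
        final float width;
        final float height;

        Box(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return "Box[x=" + x + ", y=" + y + ", w=" + width + ", h=" + height + "]";
        }
    }

    private GlyphBoxes() {}

    /**
     * Box in top-left-origin page space (y grows downward). Assumes yDirAdj is
     * the glyph baseline, so the box extends upward from it by heightDir.
     */
    static Box topLeftBox(float xDirAdj, float yDirAdj, float widthDirAdj, float heightDir) {
        return new Box(xDirAdj, yDirAdj - heightDir, widthDirAdj, heightDir);
    }

    /** Re-expresses a top-left-origin box in PDF user space (origin bottom-left). */
    static Box toUserSpace(Box box, float pageHeight) {
        // The lower edge in top-left space (largest y) becomes the bottom edge in user space.
        float bottom = pageHeight - (box.y + box.height);
        return new Box(box.x, bottom, box.width, box.height);
    }

    /** True when the box has zero (or negative) height, the degenerate case from the report. */
    static boolean hasZeroHeight(Box box) {
        return box.height <= 0f;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // US Letter height in points (assumed page size)
        Box glyph = topLeftBox(72f, 100f, 6f, 8f);
        Box flat = topLeftBox(78f, 100f, 6f, 0f);

        System.out.println("top-left space: " + glyph);
        System.out.println("user space:     " + toUserSpace(glyph, pageHeight));
        System.out.println("zero height?    " + hasZeroHeight(flat));
    }
}
```

      Feeding the four getter values of each TextPosition into topLeftBox and checking hasZeroHeight is one way to list the zero-height boxes; a vertical shift would show up as a consistent offset between the computed boxes and the rendered glyphs.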

      Attachments

        1. C9002.pdf (89 kB, Evgeny Chesnokov)
        2. C9002-highlighted.png (63 kB, Evgeny Chesnokov)


          People

            Assignee: Unassigned
            Reporter: Evgeny Chesnokov (echesnokov)
            Votes: 0
            Watchers: 1
