  1. PDFBox
  2. PDFBOX-2749

Annotations character bounding boxes size 3 times higher than expected

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.8.4
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels:
      None

      Description

      After text extraction, the character bounding boxes are about 3 times taller than expected. For example, see the first few character bounding boxes below:
      [90.1,46,6.64,40.06],[96.7,46,5.09,40.06],[101.79,46,5.8,40.06].
      The values are x, y, width, height. The character widths are between 5 and 7 pixels, but the height of every character is 40.06 pixels. The actual height of each line of text appears to be about 12 pixels. An example PDF document is attached.
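
      The mismatch can be quantified from the values quoted above. The following is a minimal sketch in plain Java, not PDFBox code: the class name BoundingBoxCheck and its helper methods are hypothetical, and the 12-pixel expected height is only the approximate figure given in this report. It parses the reported boxes and compares each height with the expected line height.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it does not use or reproduce PDFBox.
public class BoundingBoxCheck {

    // Parses text such as "[90.1,46,6.64,40.06],[96.7,46,5.09,40.06]"
    // into arrays of the form {x, y, width, height}.
    static List<double[]> parseBoxes(String text) {
        List<double[]> boxes = new ArrayList<>();
        Matcher m = Pattern.compile("\\[([^\\]]+)\\]").matcher(text);
        while (m.find()) {
            String[] parts = m.group(1).split(",");
            if (parts.length != 4) {
                throw new IllegalArgumentException("Expected 4 values, got: " + m.group(1));
            }
            double[] box = new double[4];
            for (int i = 0; i < 4; i++) {
                box[i] = Double.parseDouble(parts[i].trim());
            }
            boxes.add(box);
        }
        return boxes;
    }

    // Ratio of the reported box height to the height the reader expected.
    static double heightRatio(double[] box, double expectedHeight) {
        return box[3] / expectedHeight;
    }

    public static void main(String[] args) {
        // Values copied from the description of this issue.
        String reported = "[90.1,46,6.64,40.06],[96.7,46,5.09,40.06],[101.79,46,5.8,40.06]";
        // Assumption: roughly 12 pixels per line, as estimated in the report.
        double expectedLineHeight = 12.0;

        for (double[] box : parseBoxes(reported)) {
            System.out.printf(Locale.ROOT, "x=%.2f width=%.2f height=%.2f ratio=%.2f%n",
                    box[0], box[2], box[3], heightRatio(box, expectedLineHeight));
        }
    }
}
```

      With the reported height of 40.06 and an expected height of about 12, the ratio comes out slightly above 3, which matches the "3 times" figure in the summary.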

        Attachments

        1. RESULT.pdf
          155 kB
          Hayk Hayryan

            People

            • Assignee:
              Unassigned
              Reporter:
              hhayryan Hayk Hayryan
            • Votes:
              0
              Watchers:
              2
