PDFBox / PDFBOX-577

TextPosition should expose its bounding box

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PDModel
    • Labels: None

    Description

      It does not seem to be possible to calculate the bounding box of a TextPosition.

      IIUC, TextPosition#getY is the baseline of the text and TextPosition#getHeight is the absolute height of the text. When I subtract the latter from the former I get a top line, but this is only correct if the text does not contain descender characters.

      Below is a screenshot (AFM-getHeight.png) which shows the bounding boxes of TextPositions calculated as

      {#getX(), #getY() - #getHeight(), #getWidth(), #getHeight()}

      painted in random colors. For example, the bounding boxes of parentheses are severely misplaced, which makes line-by-line text extraction impossible.
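
The rectangle used for the screenshot can be sketched roughly as follows. This is only an illustration: it assumes a coordinate system where y increases downward (which the subtraction implies), it takes plain float arguments in place of a real TextPosition, and the class and method names are hypothetical rather than part of PDFBox.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox: builds the rectangle described above
// from the four values a TextPosition reports.
final class NaiveTextBox {

    // x, baselineY, width and height stand in for getX(), getY(), getWidth()
    // and getHeight(). Assumes y increases downward.
    static Rectangle2D.Float of(float x, float baselineY, float width, float height) {
        // The top edge is the baseline moved up by the full height, so the
        // rectangle always ends exactly at the baseline.
        return new Rectangle2D.Float(x, baselineY - height, width, height);
    }

    public static void main(String[] args) {
        Rectangle2D.Float box = of(100f, 200f, 6f, 12f);
        // The bottom edge coincides with the baseline, so any part of a glyph
        // that hangs below the baseline falls outside this rectangle.
        System.out.println("top=" + box.getMinY() + " bottom=" + box.getMaxY());
    }
}
```

Because the bottom edge is pinned to the baseline, the rectangle cannot cover descenders regardless of the height value supplied.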

      Right now I've solved the problem by tweaking the AFM FontMetrics code so that it returns BoundingBox#getUpperRightY instead of BoundingBox#getHeight when queried via PDSimpleFont#getFontHeight(byte[], int, int). Another screenshot (AFM-getUpperRightY.png) shows how this restores the previously broken text extraction ability.
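
One way to see why the swap helps is to compare the two top edges for a glyph that descends below the baseline. The sketch below is self-contained and does not use PDFBox: GlyphBox is a hypothetical stand-in for an AFM character bounding box, the metric values are invented for illustration, and y is again assumed to increase downward.

```java
// Hypothetical stand-in for an AFM character bounding box, in 1/1000 em units.
final class GlyphBox {
    final float lowerLeftY;   // negative when the glyph descends below the baseline
    final float upperRightY;  // distance from the baseline up to the glyph's top

    GlyphBox(float lowerLeftY, float upperRightY) {
        this.lowerLeftY = lowerLeftY;
        this.upperRightY = upperRightY;
    }

    // Full box height: includes the part below the baseline.
    float height() {
        return upperRightY - lowerLeftY;
    }

    // Top edge obtained by subtracting the full box height from the baseline.
    float topFromHeight(float baselineY, float fontSize) {
        return baselineY - height() * fontSize / 1000f;
    }

    // Top edge obtained by subtracting only the part above the baseline.
    float topFromUpperRightY(float baselineY, float fontSize) {
        return baselineY - upperRightY * fontSize / 1000f;
    }

    public static void main(String[] args) {
        // Invented metrics for a parenthesis-like glyph: 700 units above the
        // baseline and 200 units below it.
        GlyphBox paren = new GlyphBox(-200f, 700f);
        float baselineY = 100f;
        float fontSize = 10f;
        System.out.println("top from height:      " + paren.topFromHeight(baselineY, fontSize));
        System.out.println("top from upperRightY: " + paren.topFromUpperRightY(baselineY, fontSize));
    }
}
```

With these assumed numbers, the height-based top edge sits above the upperRightY-based one by exactly the scaled descent, which would be consistent with the misplaced parentheses in the first screenshot.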

      It seems like a good idea to rework TextPosition so that it would be aware of its bounding box:
      *) Replace methods PDSimpleFont#getFontWidth(byte[], int, int) and PDSimpleFont#getFontHeight(byte[], int, int) with a single method PDSimpleFont#getFontBoundingBox(byte[], int, int)
      *) Replace the constructor TextPosition(Matrix, Matrix) with TextPosition(Matrix, BoundingBox)
      *) Add new methods TextPosition#getBoundingBox and TextPosition#getBoundingBoxDir. This shouldn't affect existing client applications, because TextPosition#getY and TextPosition#getHeight remain in place.
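
The shape of the proposed change might look something like the sketch below. It is not PDFBox code: BoxedTextPosition is a hypothetical class, its constructor takes already-scaled ascent and descent values instead of the Matrix and BoundingBox arguments proposed above, and y is assumed to increase downward.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical illustration of a text position that knows its bounding box.
final class BoxedTextPosition {
    private final float x;
    private final float baselineY;
    private final float width;
    private final float ascent;   // extent above the baseline, already scaled
    private final float descent;  // extent below the baseline, already scaled

    BoxedTextPosition(float x, float baselineY, float width, float ascent, float descent) {
        this.x = x;
        this.baselineY = baselineY;
        this.width = width;
        this.ascent = ascent;
        this.descent = descent;
    }

    // Existing-style accessors stay unchanged, so current callers keep working.
    float getX() { return x; }
    float getY() { return baselineY; }
    float getWidth() { return width; }
    float getHeight() { return ascent + descent; }

    // New accessor: the box starts above the baseline by the ascent and
    // extends below it by the descent.
    Rectangle2D.Float getBoundingBox() {
        return new Rectangle2D.Float(x, baselineY - ascent, width, ascent + descent);
    }
}
```

A caller that needs line-by-line grouping could then compare the vertical ranges of these boxes instead of deriving a rectangle from getY and getHeight.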

      Attachments

        1. textposition-randombg.zip
          3 kB
          Villu Ruusmann
        2. 0001-PDFont.java-Add-methods-to-retreive-the-Ascent-and-D.patch
          4 kB
          Karl Ward
        3. AFM-getUpperRightY.png
          1.07 MB
          Villu Ruusmann
        4. AFM-getHeight.png
          1.07 MB
          Villu Ruusmann

          People

            Assignee: Unassigned
            Reporter: Villu Ruusmann (vfed)
            Votes: 1
            Watchers: 2
