  1. PDFBox
  2. PDFBOX-3175

PDFTextStreamEngine probably miscalculates text height

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Text extraction
    • Labels:
      None
    • Flags:
      Patch

      Description

      When parsing a PDF document, TextPosition is created with a constant text height, about half the character width, regardless of font size.
      The following workaround to calculate dyDisplay fixes the issue:

      float verticalScaling = 1/1000f;
      if (font instanceof PDType3Font)
      {
          Matrix fontMatrix = font.getFontMatrix();
          verticalScaling = fontMatrix.getValue(1, 1);
      }
      float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;
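
      To illustrate why this formula makes the height follow the font size, here is a minimal, self-contained sketch of the same arithmetic. It does not use any PDFBox classes: the class name TextHeightSketch, the helper displayHeight, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes that non-Type 3 fonts use 1000 glyph units per text-space unit, and that a Type 3 font supplies its own vertical scale through its font matrix.

```java
public class TextHeightSketch {
    // Assumed default for non-Type 3 fonts: 1000 glyph units per text-space unit.
    static final float DEFAULT_VERTICAL_SCALING = 1 / 1000f;

    // Hypothetical helper that mirrors the workaround's formula:
    // bounding-box height (glyph units) * font size * vertical scaling.
    static float displayHeight(float bboxHeight, float fontSize, float verticalScaling) {
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        // Same illustrative bounding box at two font sizes: the height doubles with the size.
        float small = displayHeight(700f, 10f, DEFAULT_VERTICAL_SCALING);
        float large = displayHeight(700f, 20f, DEFAULT_VERTICAL_SCALING);

        // Illustrative Type 3 case: the scale is taken from the font matrix
        // (here an assumed value of 0.01) instead of the 1/1000 default.
        float type3 = displayHeight(70f, 10f, 0.01f);

        System.out.println("height at size 10: " + small);
        System.out.println("height at size 20: " + large);
        System.out.println("Type 3 height at size 10: " + type3);
    }
}
```

      With this formula the computed height is proportional to the font size, in contrast to the constant height described in the report.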

        Attachments

        1. MarketT_140815-1-marked-1.png
          509 kB
          Tilman Hausherr
        2. MarketT_140815-1-marked-1-18.png
          88 kB
          Tilman Hausherr
        3. PDFBOX-3175-reduced.pdf
          14 kB
          Tilman Hausherr
        4. snapshot.png
          100 kB
          Leo

              People

              • Assignee:
                Tilman Hausherr
                Reporter:
                Leo
              • Votes:
                0
                Watchers:
                3
