PDFBox / PDFBOX-3175

PDFTextStreamEngine probably miscalculates text height

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Text extraction
    • Labels: None
    • Flags: Patch

    Description

      When parsing a PDF document, each TextPosition is created with a constant text height, about half the character width, regardless of font size.
      The following workaround to calculate dyDisplay fixes the issue:

      float verticalScaling = 1 / 1000f;
      if (font instanceof PDType3Font)
      {
          Matrix fontMatrix = font.getFontMatrix();
          verticalScaling = fontMatrix.getValue(1, 1);
      }
      float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;
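To make the arithmetic in the workaround concrete, here is a minimal standalone sketch that does not depend on PDFBox. The class name `TextHeightSketch`, the method `displayHeight`, and all sample numbers are hypothetical and chosen only for illustration. It assumes the bounding-box height is given in glyph units and that the vertical scaling is 1/1000 for ordinary fonts, or the (1, 1) entry of the font matrix for Type 3 fonts, as in the snippet above.

```java
public class TextHeightSketch {

    // Scale assumed for non-Type 3 fonts: 1000 glyph units per text-space unit.
    static final float DEFAULT_VERTICAL_SCALING = 1 / 1000f;

    // Mirrors the workaround's formula: bbox height * font size * vertical scaling.
    static float displayHeight(float bboxHeight, float fontSize, float verticalScaling) {
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        float bboxHeight = 1000f; // hypothetical bounding-box height in glyph units

        // With the formula above, the height grows with the font size
        // instead of staying constant.
        for (float fontSize : new float[] {6f, 12f, 24f}) {
            float height = displayHeight(bboxHeight, fontSize, DEFAULT_VERTICAL_SCALING);
            System.out.println("fontSize=" + fontSize + " -> height=" + height);
        }

        // Hypothetical Type 3 font whose font matrix has 0.01 at (1, 1)
        // and whose bounding box is 100 glyph units tall.
        float type3Height = displayHeight(100f, 12f, 0.01f);
        System.out.println("Type 3 example -> height=" + type3Height);
    }
}
```

In this sketch, a 12 pt font with a 1000-unit bounding box yields a height of roughly 12, and doubling the font size doubles the height, which is the behavior the report says is missing.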

      Attachments

        1. snapshot.png
          100 kB
          Leo
        2. PDFBOX-3175-reduced.pdf
          14 kB
          Tilman Hausherr
        3. MarketT_140815-1-marked-1-18.png
          88 kB
          Tilman Hausherr
        4. MarketT_140815-1-marked-1.png
          509 kB
          Tilman Hausherr

            People

              Assignee: Tilman Hausherr
              Reporter: Leo
              Votes: 0
              Watchers: 3
