[PDFBOX-3405] Display font size - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2, 2.0.3, 3.0.0 PDFBox
Fix Version/s: 2.0.3, 3.0.0 PDFBox
Component/s: Documentation, Text extraction
Labels: None

Description

I (along with others) have found the font size of text very useful for tasks such as recovering the structure of PDFs, for example in heuristics like 'text with a large font size is probably a title'. However, I noticed a few cases where getFontSizeInPt or getFontSize returns seemingly very inaccurate results. For example, in the attached PDF, getFontSizeInPt for the title text is over 500.

After digging into this a little, my understanding is that neither of these methods returns a font size scaled to the display space. getFontSize returns the "raw" encoded font size, and getFontSizeInPt returns the font size scaled by the text matrix but not by the current transformation matrix.

Basically, in order to get reliable font information, it would be helpful if either
1) getFontSizeInPt includes the effect of using the current transformation matrix
2) A new method like getDisplayFontSize is added that returns the font size scaled to the display space
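
To illustrate what a display-space font size could look like, here is a minimal sketch in plain Java that does not use PDFBox at all. It assumes the usual PDF convention that a matrix is stored as [a b c d e f] and applied to row vectors, and it takes the display font size to be the raw font size times the vertical scale of (text matrix x current transformation matrix). The class DisplayFontSizeSketch, its method names, and the numbers in main are hypothetical and chosen only for illustration; they are not taken from the attached PDFs or from the PDFBox API.

```java
public class DisplayFontSizeSketch {

    /**
     * Multiplies two PDF affine matrices given as [a, b, c, d, e, f].
     * The result applies m1 first and then m2 (row-vector convention).
     */
    public static double[] multiply(double[] m1, double[] m2) {
        return new double[] {
            m1[0] * m2[0] + m1[1] * m2[2],
            m1[0] * m2[1] + m1[1] * m2[3],
            m1[2] * m2[0] + m1[3] * m2[2],
            m1[2] * m2[1] + m1[3] * m2[3],
            m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
            m1[4] * m2[1] + m1[5] * m2[3] + m2[5]
        };
    }

    /**
     * Hypothetical "display font size": the raw font size scaled by both
     * the text matrix and the current transformation matrix (CTM).
     */
    public static double displayFontSize(double rawFontSize,
                                         double[] textMatrix,
                                         double[] ctm) {
        double[] combined = multiply(textMatrix, ctm);
        // The unit vertical vector (0, 1) maps to (c, d); its length is the
        // vertical scale, which stays correct for rotated text as well.
        double verticalScale = Math.hypot(combined[2], combined[3]);
        return rawFontSize * verticalScale;
    }

    public static void main(String[] args) {
        // Illustrative values: a raw size of 1, a text matrix that scales
        // by 600, and a CTM that scales the whole page down by 0.02.
        double rawFontSize = 1.0;
        double[] textMatrix = {600, 0, 0, 600, 72, 700};
        double[] ctm = {0.02, 0, 0, 0.02, 0, 0};

        // Text-matrix-only scaling, analogous to what the description
        // above attributes to getFontSizeInPt.
        double textMatrixOnly = rawFontSize * Math.hypot(textMatrix[2], textMatrix[3]);

        System.out.println("raw font size:          " + rawFontSize);
        System.out.println("scaled by text matrix:  " + textMatrixOnly);
        System.out.println("scaled to display space: "
                + displayFontSize(rawFontSize, textMatrix, ctm));
    }
}
```

With these made-up values the text-matrix-only size is in the hundreds, while the size including the current transformation matrix is about 12, which is the kind of gap described above.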

As a side note, I have seen several users (including myself) assume that getFontSize returns the font size as it would be observed when one opens the PDF, and then be confused when these methods occasionally do not return the expected results. I think getFontSize would benefit from a clear note that its result might not include scaling factors that were applied when the text was rendered.

Attachments

bad-font-2.pdf, 30/Jun/16 22:19, 103 kB, Christopher Clark
bad-font-p1.pdf, 28/Jun/16 23:19, 102 kB, Christopher Clark

People

Assignee: Tilman Hausherr

Reporter: Christopher Clark

Votes: 0

Watchers: 4

Dates

Created: 28/Jun/16 23:19

Updated: 25/Mar/17 18:12

Resolved: 03/Jul/16 10:03