Bug 54081 - Properly tag hyphenated words
Status: RESOLVED FIXED
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: pdf
Version: all
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: fop-dev
Reported: 2012-10-31 16:59 UTC by Vincent Hennebert
Modified: 2012-11-02 21:01 UTC
Description Vincent Hennebert 2012-10-31 16:59:28 UTC
If a hyphenated word is stored as-is in the PDF output, a screen reader will read it differently from the same word when it is not hyphenated. This can result in incomprehensible text.

To fix that problem, a hyphenated word should be tagged as such. This can be done in two ways.

The first possibility is to add an 'ActualText' entry to the property list of the corresponding marked-content sequence. Its value would basically be the whole text minus the last hyphen character.
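
As a rough illustration of the 'ActualText' approach, here is a minimal Java sketch. It is not the code committed to FOP: the class and method names are hypothetical, the use of a /Span tag for the marked-content sequence is an assumption, and only text that needs no encoding beyond a PDF literal string is handled (non-ASCII text would normally require a UTF-16BE string with a byte order mark).

```java
/** Hypothetical helper, for illustration only; not part of FOP. */
final class ActualTextSketch {

    private ActualTextSketch() { }

    /** Returns the fragment without its trailing hyphenation character, if it has one. */
    static String actualText(String fragment, char hyphenationChar) {
        int last = fragment.length() - 1;
        if (last >= 0 && fragment.charAt(last) == hyphenationChar) {
            return fragment.substring(0, last);
        }
        return fragment;
    }

    /** Escapes backslashes and parentheses so the text can sit inside a PDF literal string. */
    static String escapePdfLiteral(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds the operator line that opens the marked-content sequence (assumed /Span tag). */
    static String beginMarkedContent(String fragment, char hyphenationChar) {
        return "/Span <</ActualText ("
                + escapePdfLiteral(actualText(fragment, hyphenationChar))
                + ")>> BDC";
    }
}
```

Under these assumptions, calling beginMarkedContent("hyphen-", '-') yields the line "/Span <</ActualText (hyphen)>> BDC", which would be followed by the text-showing operators for the visible "hyphen-" and a closing EMC.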

The second possibility is to replace the last hyphen with a soft hyphen character (U+00AD), which screen readers recognize, so that the split word is read as one. This works only if the font has a glyph for the soft hyphen character.
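
The soft hyphen substitution could look roughly like the following Java sketch. The names are hypothetical, the hasGlyph predicate is only a stand-in for whatever glyph lookup the font actually provides, and treating U+002D as the only hyphen is an assumption made for the example.

```java
import java.util.function.IntPredicate;

/** Hypothetical helper, for illustration only; not part of FOP. */
final class SoftHyphenSketch {

    static final char HYPHEN_MINUS = '-';      // U+002D, assumed to be the visible hyphen
    static final char SOFT_HYPHEN = '\u00AD';  // U+00AD SOFT HYPHEN

    private SoftHyphenSketch() { }

    /**
     * Replaces a trailing hyphen with a soft hyphen when the font can render it.
     * hasGlyph stands in for the font's glyph lookup (an assumption of this sketch).
     * The fragment is returned unchanged if it does not end with a hyphen
     * or if the font has no soft hyphen glyph.
     */
    static String withSoftHyphen(String fragment, IntPredicate hasGlyph) {
        int last = fragment.length() - 1;
        if (last >= 0
                && fragment.charAt(last) == HYPHEN_MINUS
                && hasGlyph.test(SOFT_HYPHEN)) {
            return fragment.substring(0, last) + SOFT_HYPHEN;
        }
        return fragment;
    }
}
```

When the fragment comes back unchanged, the caller still has to tag the word some other way, for instance through an 'ActualText' entry.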

The latter possibility is the recommended way to handle hyphenated words. The former can be implemented as a fallback for when there is no available glyph for the soft hyphen, or when the hyphenation character is not actually a hyphen (this can be customized through the hyphenation-character property).
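
The choice between the two approaches described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum and method names are hypothetical, the glyph availability is passed in as a plain boolean, and only U+002D is treated as "actually a hyphen".

```java
/** Hypothetical decision helper, for illustration only; not part of FOP. */
final class HyphenTaggingChoice {

    /** The two ways of tagging a hyphenated word described in this report. */
    enum Strategy { SOFT_HYPHEN, ACTUAL_TEXT }

    private HyphenTaggingChoice() { }

    /**
     * Prefers the soft hyphen, and falls back to ActualText when the font lacks
     * a soft hyphen glyph or the configured hyphenation character is not a hyphen.
     */
    static Strategy choose(char hyphenationChar, boolean fontHasSoftHyphenGlyph) {
        boolean isHyphen = hyphenationChar == '-'; // assumption: only U+002D counts
        return (isHyphen && fontHasSoftHyphenGlyph)
                ? Strategy.SOFT_HYPHEN
                : Strategy.ACTUAL_TEXT;
    }
}
```

For example, a hyphenation-character of '=' would select ACTUAL_TEXT here regardless of the font, because replacing it with a soft hyphen would change what is displayed.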
Comment 1 Vincent Hennebert 2012-11-02 21:01:51 UTC
Fixed in rev. 1405158:
http://svn.apache.org/viewvc?rev=1405158&view=rev