Summary: | [PATCH] Hyphenation | ||
---|---|---|---|
Product: | Fop - Now in Jira | Reporter: | Luca Furini <lfurini> |
Component: | page-master/layout | Assignee: | fop-dev |
Status: | CLOSED FIXED | ||
Severity: | enhancement | ||
Priority: | P3 | ||
Version: | trunk | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Windows XP | ||
Attachments: |
patch created using cvs diff
FO sample file to test the behavior of fop before and after the patch updated diff proposed patch to HyphenationTree test fo file: words with punctuation marks and parenthesis |
Description
Luca Furini
2004-03-18 12:53:51 UTC
Created attachment 10868 [details]
patch created using cvs diff
Created attachment 10869 [details]
FO sample file to test the behavior of fop before and after the patch
Hi all I apologize for having posted some proposed changes without a line explaining what I was changing and why. Really sorry, I'm going to try and do it now. First of all, the condition now tested in the LineLayoutManager.getNextBreakPoss() to decide whether or not to try hyphenation is: if (bTextAlignment == TextAlign.JUSTIFY || prevBP == null) { ... } So hyphenation is tried for *every* justified block of text; only if the "language" attribute is not set the getHyphenContext() method returns null and hyphenation is not actually done. My suggestion is to change it so that the fo property hyphenate (7.9.4 in the recommendation, and already implemented in the CommonHyphenation class) is tested instead: if (hyphProps.hyphenate == Constants.TRUE) { ... } In the TextInfo class there is a boolean attribute called "bCanHyphenate", but it is given a default value of true, and it is no more modified. It seems to me that it is quite useless. Is is used in the TextLayoutManager.getHyphenIPD() method: if (textArray.length < iStopIndex || foText.textInfo.bCanHyphenate == false) { ... } but this method is called only if context.tryHyphenate() is true; with my suggested change, context.tryHyphenate() returns true only if the fo property hyphenate is true, and so it is no more necessary to test foText.textInfo.bCanHyphenate. The second problem is that the hyphenation character is not shown: to fix this I added in the AreaInfo class (which stores information about the area that will be generated) a boolean attribute "bHyphenated". Is is set according to the value of the flags of the corresponding BreakPoss, and it is tested in the TextLayoutManager.addAreas() method to decide whether to add the hyphen. Last (and least): I tried to implement the fo property hyphenation-character (7.9.5 in the recommendation). As it applies to blocks, I thought to store it as a character variable in the TextInfo class, with initial value "-". It is set in the PropertyManager.getTextLayoutProps() method, together with the other TextInfo attribute. Is is used to calculate the hyphIPD in the TextLayoutManager constructor, and in the addAreas method. Bye Luca Sorry, I'm here again (hope I am not making too much a mess!) :-) I'm going to attach an updated proposed patch, as I noticed there was an error in the one I posted before. In the TextLayoutManager.addAreas() method, the hyphen size must be added to the total inline progression dimension of the line *before* the adjustment calculation; this involves also moving a few lines of code. Bye Luca Created attachment 11138 [details]
updated diff
Hi Luca, Your patch looks good. It works fine on your test fo, and it also works fine on a test fo I had lying around. There are a few things to note: AreaInfo.bHyphenated is identical to (flags & BreakPoss.HYPHENATED); ai.ipdArea = MinOptMax.add(ai.ipdArea, new MinOptMax(hyphIPD)) is equivalent to bp.setStackingSize(MinOptMax.add(ipd, new MinOptMax(hyphIPD))). In other words, the same information is present in the BreakPoss, which is held by the LineLM. Unfortunately, this information is not accessible in the addAreas method of TextLM. This is so because posIter.next() returns the position member of the BreakPoss, not the BreakPoss itself. I believe that is an error in PositionIterator which eventually needs correction. For now your solution is a good one. Your proposed TextInfo.hypChar is also known as LineLM.hyphProps.hyphenationChar. This information is not passed on to childLMs and thus not known to TextLM. Storing it in TextInfo seems a good idea; perhaps it is not even needed in hyphProps. It is not completely clear that + if (ai.bHyphenated) { + str += foText.textInfo.hyphChar; + ai.ipdArea = MinOptMax.add(ai.ipdArea, new MinOptMax(hyphIPD)); + } is always correct. This happens on the last AI returned by posIter.next(), which is not necessarily the last one in the line. But it is hard to conceive of an AI which is last in its LM and also hyphenated; therefore it will work. Ideally, the LineLM indicates which BP is the one ending the line; perhaps this needs correction at some time. Regards, Simon Yes, I was quite surprised to see that all the information stored in the BreakPoss was "thrown away" before adding areas; I chose to duplicate the needed values because this involved fewer and less radical changes. I have found another small bug concerning hyphenation in the HyphenationTree.hyphenate() method. Before checking the exception list or using the algorithm, the function "normalizes" the word: during this phase, if a non-letter character is found null is returned. // normalize word char[] c = new char[2]; for (i = 1; i <= len; i++) { c[0] = w[offset + i - 1]; int nc = classmap.find(c, 0); if (nc < 0) { // found a non-letter character, abort return null; } word[i] = (char)nc; } I think the condition (nc < 0) is too strong: at the moment words followed by punctuation marks, or in parenthesis, are not hyphenated. This is how I tried to fix this problem: - non-letter characters at the beginning are not copied into word[] - if a non-letter character is found which is not at the beginning, it is not copied into word[] and a boolean variable becomes true - if a letter-character is found when the variable is true, null is returned; otherwise, word[] is used to find hyphenation points I have also added a little optimization: if, after the normalization and the non-letter character removal, the word size is less than (remainCharCount + pushCharCount), null is returned, without checking the exception list and performing the algorithm. I'm going to attach the proposed patch and a test fo file which shows a few examples. Regards Luca Created attachment 11237 [details]
proposed patch to HyphenationTree
Created attachment 11238 [details]
test fo file: words with punctuation marks and parenthesis
Applied. Thanks. Simon batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed |