I have found a small bug concerning hyphenation in the HyphenationTree.hyphenate() method.

Before checking the exception list or using the algorithm, the method "normalizes" the word; during this phase, if a non-letter character is found, null is returned:

    // normalize word
    char[] c = new char[2];
    for (i = 1; i <= len; i++) {
        c[0] = w[offset + i - 1];
        int nc = classmap.find(c, 0);
        if (nc < 0) {
            // found a non-letter character, abort
            return null;
        }
        word[i] = (char)nc;
    }

I think the condition (nc < 0) is too strong: at the moment, words followed by punctuation marks, or enclosed in parentheses, are not hyphenated. So, for example, the word "suggestion" can be hyphenated, but "suggestion." and "(suggestion)," cannot.

This is how I tried to fix this problem:

- non-letter characters at the beginning are not copied into word[];
- if a non-letter character is found which is not at the beginning, it is not copied into word[] and a boolean variable becomes true;
- if a letter character is found when the variable is true, null is returned; otherwise, word[] is used to find hyphenation points.

I have also added a small optimization: if, after normalization and removal of the non-letter characters, the word size is less than (remainCharCount + pushCharCount), null is returned without checking the exception list or running the algorithm.

I'm going to attach the proposed patch and a test fo file which shows a few examples.

Regards,
Luca
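The normalization rule proposed above can be sketched as a stand-alone Java method. This is not the actual patch: `classify()` is a hypothetical stand-in for `classmap.find()` built on `Character.isLetter`/`Character.toLowerCase`, letters are stored 0-based rather than starting at `word[1]`, and the names `NormalizeSketch`, `Normalized` and `ignoredAtBeginning` are illustrative only. The values 2 and 2 used for remainCharCount and pushCharCount are example values.

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposed normalization rule (not FOP's code). */
public class NormalizeSketch {

    /** Normalized letters plus the number of leading non-letters that were skipped. */
    static final class Normalized {
        final char[] letters;
        final int ignoredAtBeginning;

        Normalized(char[] letters, int ignoredAtBeginning) {
            this.letters = letters;
            this.ignoredAtBeginning = ignoredAtBeginning;
        }
    }

    // Hypothetical stand-in for classmap.find(): normalized letter, or -1 for a non-letter.
    static int classify(char ch) {
        return Character.isLetter(ch) ? Character.toLowerCase(ch) : -1;
    }

    /** Returns null if the word must not be hyphenated. */
    static Normalized normalize(char[] w, int offset, int len,
                                int remainCharCount, int pushCharCount) {
        char[] word = new char[len];
        int n = 0;                    // letters copied so far
        int ignoredAtBeginning = 0;   // leading non-letters skipped, e.g. "("
        boolean endOfLetters = false; // becomes true once a non-letter follows letters
        for (int i = 0; i < len; i++) {
            int nc = classify(w[offset + i]);
            if (nc < 0) {
                if (n == 0) {
                    ignoredAtBeginning++;  // leading punctuation: not copied
                } else {
                    endOfLetters = true;   // trailing punctuation: not copied
                }
            } else if (endOfLetters) {
                return null;               // a letter after trailing punctuation, e.g. "e.g."
            } else {
                word[n++] = (char) nc;
            }
        }
        // The optimization: too short to leave room for any hyphenation point.
        if (n < remainCharCount + pushCharCount) {
            return null;
        }
        return new Normalized(Arrays.copyOf(word, n), ignoredAtBeginning);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"suggestion", "suggestion.", "(suggestion),", "e.g.", "is"}) {
            Normalized r = normalize(s.toCharArray(), 0, s.length(), 2, 2);
            System.out.println(s + " -> " + (r == null
                    ? "null"
                    : new String(r.letters) + " (skipped " + r.ignoredAtBeginning + ")"));
        }
    }
}
```

Under these assumptions, "suggestion." and "(suggestion)," both normalize to "suggestion" (the latter with one character skipped at the beginning), "e.g." is rejected because a letter follows a non-letter, and "is" is rejected by the length check before any lookup.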
Created attachment 11258: proposed patch to HyphenationTree
Created attachment 11259: test fo file: words with punctuation marks and parentheses
Luca,

The patch works well.

I do not find the name bAfterLetter very clear. It really is bNonLetterAfterLetters, but that is too long. I find bEndOfLetters a reasonable choice. The 'else if (!bAfterLetter)' might as well be just 'else'.

The sting is in the tail. I do not know the details of this part of hyphenation. Your addition of 'iIgnoreAtBeginning' seems OK. I think you should also add 'iIgnoreAtBeginning' in the if branch (hyphenation exceptions), but the results of a test fo file are not quite favourable. Perhaps you can have a look into this.

I added a long comment explaining various features, perhaps mostly for my own benefit. I added cases to the test fo file showing a word that is too short (if you add debug logging, you can see the effect), and 4 cases with a hyphenation exception word.

Regards,
Simon
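Why the number of skipped leading characters also matters in the exception branch can be illustrated with a small sketch, assuming break points are kept as character offsets into the normalized word. The class `ShiftSketch`, its helpers, and the break points {3, 6} for "sug-ges-tion" are hypothetical, not FOP's API or exception data.

```java
/** Hypothetical sketch: mapping break points from the normalized word back to the original text. */
public class ShiftSketch {

    // Shifts offsets measured in the normalized word by the number of skipped leading characters.
    static int[] toOriginalPositions(int[] normalizedPoints, int ignoredAtBeginning) {
        int[] shifted = new int[normalizedPoints.length];
        for (int k = 0; k < normalizedPoints.length; k++) {
            shifted[k] = normalizedPoints[k] + ignoredAtBeginning;
        }
        return shifted;
    }

    // Inserts '-' at each break position (ascending order assumed), for display only.
    static String mark(String text, int[] points) {
        StringBuilder sb = new StringBuilder(text);
        for (int k = points.length - 1; k >= 0; k--) {
            sb.insert(points[k], '-');   // right to left, so earlier offsets stay valid
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] points = {3, 6};           // "sug-ges-tion" in the normalized word (assumed)
        String text = "(suggestion),";
        System.out.println(mark(text, points));                          // (su-gge-stion),
        System.out.println(mark(text, toOriginalPositions(points, 1)));  // (sug-ges-tion),
    }
}
```

Without the shift, every break point lands one character too early in "(suggestion),", which is the misplacement that adding 'iIgnoreAtBeginning' to the exception branch is meant to avoid.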
Created attachment 11264: an expanded test fo file
Created attachment 11265: a slightly modified patch
Your assumptions appear correct: I checked the Washington Post newspaper and saw that hyphenation does indeed occur in words that have a period or comma at the end.

Glen
Simon,

Concerning names, the unnecessary "if", etc., I agree with you.

It seems to me that your change concerning hyphenation exceptions works; otherwise the hyphenation points would appear in the wrong place because of the punctuation marks. The strange PDF output is due, IMO, to a couple of problems:

-1- In the last test case the text is (quite oddly) divided among 3 TextLMs:
"**[...]** (philanthrop"
"ic)."
" "
With the property linefeed-treatment="ignore" specified, all the text is in a single TLM. After removing the linefeed after "(philanthropic)." from the test file, the text is still split into two parts:
"***************************"
"*************************************** (philanthropic)."
So it seems there is an irksome bug affecting text splitting.

-2- The last line in a justified paragraph is sometimes justified too (bug 28314). The "phantom linefeed" is treated as a space by default, and so it is adjusted.

Anyway, I was pleased to notice that, although shattered, the word is correctly collected and hyphenated.

Regards,
Luca
Luca,

I agree with item 1: the break-up of the word is incorrect. I also agree with item 2: the spaces before the hyphenated word are often incorrect. However, these problems are not caused by the patch; they are just revealed by it. The word is hyphenated correctly, and the patch solves the problem it set out to solve.

The patch can be applied. The break-up and spacing problems should be solved in a separate effort.

Simon
Patch applied. Thanks. Simon
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed