Bug 53316 - Obscure algorithm of lines break in output PDF
Summary: Obscure algorithm of lines break in output PDF
Status: RESOLVED WORKSFORME
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: page-master/layout
Version: 1.0
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-29 12:04 UTC by Nawa
Modified: 2012-05-29 15:37 UTC (History)
0 users



Attachments
screenshots (272.56 KB, application/x-zip-compressed)
2012-05-29 12:04 UTC, Nawa
new samples (43.42 KB, application/x-zip-compressed)
2012-05-29 13:36 UTC, Nawa

Description Nawa 2012-05-29 12:04:51 UTC
Created attachment 28853
screenshots

Some words that would fit on one line are moved to the following line. In my screenshot (line-width-problem.png) there is a large gap after the word "interface". The word "its" would fit on the first line of the paragraph, since the page width allows it, but it was moved to the second line.

I tried shortening the word "aaaaaaaaa-aaaaaaaa-integration-pack.zip" in the middle of the paragraph to "pack.zip", and as a result the word "its" moved from the second line to the first (line-width-problem2.png).

The code for the paragraph looks like this:

<fo:block font-size="10pt">
     For performing your tasks, you use the public Web service interface of BSS. This interface its documentation and additional resources, templates, samples, and utilities are provided in the BSS integration package (aaaaaaaaa-aaaaaaaa-integration-pack.zip file). A detailed documentation for these resources is provided as Javadoc. By opening the readme.htm file of the integration package, you can access the available Javadoc documentation as well as the resources themselves.
</fo:block>

Could you explain what affects the line breaks?
Comment 1 Pascal Sancho 2012-05-29 13:08:45 UTC
The screenshot doesn't help to reproduce what you describe.
Please attach (do not copy/paste into a comment) a short XSL-FO file and the PDF output that demonstrate what you describe.
Comment 2 Nawa 2012-05-29 13:36:39 UTC
Created attachment 28854
new samples

input1.fo contains simplified input with the word "aaaaaaaaa-aaaaaaaa-integration-pack.zip". input2.fo contains input with the word "pack.zip" instead.
Comment 3 Manuel Mall 2012-05-29 14:13:46 UTC
What you are observing is most likely the effect of the FOP line-breaking algorithm, which is based on the so-called Knuth model (http://wiki.apache.org/xmlgraphics-fop/KnuthsModel). Put very informally, it attempts to find the most 'visually pleasing' line breaks, and as such it does not attempt to fill every line to its maximum. Instead it tries to find an optimum that leaves similar amounts of whitespace on each line. If you don't use justification, it will avoid creating paragraphs with 'badly ragged' right margins; if you do use justification, it will make the inter-word gaps on each line a similar size.
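To illustrate the idea of optimizing over the whole paragraph rather than filling each line, here is a minimal sketch. It is not FOP's implementation and it is much simpler than the Knuth model (no stretchable glue, no penalties, no hyphenation). The function name `balanced_breaks`, the use of character counts as widths, and the cost function (squared trailing space per line, last line free) are all assumptions made for this illustration.

```python
def balanced_breaks(words, width):
    """Break words into lines so that the sum of squared trailing
    space over all lines except the last is minimal.

    Toy model only: widths are character counts, and a word longer
    than `width` is placed alone on its own line.
    """
    n = len(words)
    # best[i] is the minimal cost of laying out words[i:];
    # next_start[i] is the index of the first word of the following line.
    best = [float("inf")] * (n + 1)
    next_start = [n] * (n + 1)
    best[n] = 0

    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width and j > i:
                break  # words[i..j] no longer fit on one line
            slack = max(width - length, 0)
            # The last line of the paragraph is not penalized.
            cost = 0 if j == n - 1 else slack * slack
            total = cost + best[j + 1]
            if total < best[i]:
                best[i] = total
                next_start[i] = j + 1

    # Follow the recorded break points to build the lines.
    lines = []
    i = 0
    while i < n:
        j = next_start[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines
```

With the words `["aaa", "bb", "cc", "ddddd"]` and a width of 6, this sketch returns `["aaa", "bb cc", "ddddd"]`: "bb" would fit on the first line, but moving it down avoids a nearly empty second line. That is the same kind of effect as the word "its" moving to the second line in the report.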
Comment 4 Nawa 2012-05-29 14:22:29 UTC
Thanks, Manuel.

That seems to be right. Is it possible to change the line-breaking behavior from the Knuth model to a simple one in my XSL-FO?
Comment 5 Manuel Mall 2012-05-29 14:52:57 UTC
AFAIK you cannot switch off the Knuth algorithm or fall back to a simpler algorithm. Looking at your output1.pdf, it would be interesting to figure out what a first-fit algorithm would produce. For example, the word 'its' moves to the first line. 'in the', but probably not 'BSS', moves to the second line. That is not enough to move 'documentation' to the 3rd line, leaving you with a very short 3rd line. My gut feeling is that it would look inferior to what you have now. But I could be wrong.
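For comparison, a first-fit (greedy) breaker can be sketched in a few lines. This is only an illustration of the technique named above, not something FOP provides; the function name `first_fit_breaks` and the use of character counts as widths are assumptions.

```python
def first_fit_breaks(words, width):
    """Greedy line breaking: put each word on the current line if it
    fits, otherwise start a new line.

    Toy model only: widths are character counts, and a word longer
    than `width` is placed alone on its own line.
    """
    lines = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate  # the word fits (or the line was empty)
        else:
            lines.append(current)  # close the full line
            current = word         # the word starts the next line
    if current:
        lines.append(current)
    return lines
```

With the words `["aaa", "bb", "cc", "ddddd"]` and a width of 6, this returns `["aaa bb", "cc", "ddddd"]`. The first line is filled completely, but the second line holds a single short word, which is the kind of uneven result a first-fit strategy can produce.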
Comment 6 Glenn Adams 2012-05-29 15:37:38 UTC
i'm closing since this is not a bug, but a feature request; if someone wants to create another bug report asking for the addition of one or more other line breaking algorithms along with a new extension property, fox:line-breaking-strategy, or, alternatively, implement the CSS3 Text Module's line-break property [1], then please do so; however, unless such a request is backed up by implementation activity leading to the submission of a patch, then there will likely be little done on such a bug report;

i would prefer that no bug be filed unless it is accompanied by a patch to add new features in this area; instead, it would be better to add it to the "wanted features" wiki [2];

if you want more information about the current implementation, see [3] and [4]

[1] http://dev.w3.org/csswg/css3-text/#line-break
[2] http://wiki.apache.org/xmlgraphics-fop/MostWantedFeatures
[3] http://wiki.apache.org/xmlgraphics-fop/KnuthsModel
[4] http://wiki.apache.org/xmlgraphics-fop/LineBreaking