Bug 44328 - [PATCH] orphans/widows not respected in some cases
[PATCH] orphans/widows not respected in some cases
Status: NEW
Product: Fop - Now in Jira
Classification: Unclassified
Component: general
trunk
All All
: P2 normal
: ---
Assigned To: fop-dev
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2008-01-30 09:03 UTC by Andrew McFarland
Modified: 2012-04-11 03:19 UTC (History)
2 users (show)



Attachments
XSL-FO file that displays the problem (1.08 KB, text/x-xslfo)
2008-01-30 09:04 UTC, Andrew McFarland
Details
patch fixing some of the issues with widows/orphans (13.22 KB, patch)
2011-03-27 02:36 UTC, Andreas L. Delmelle
Details | Diff
extended testcase (6.85 KB, text/plain)
2011-03-27 02:38 UTC, Andreas L. Delmelle
Details
updated test, showing additional issue (8.99 KB, text/plain)
2011-03-28 16:52 UTC, Andreas L. Delmelle
Details
updated test case (9.17 KB, text/plain)
2011-03-29 17:52 UTC, Andreas L. Delmelle
Details
new patch, handling widows/orphans at block-level (43.30 KB, patch)
2011-03-31 14:23 UTC, Andreas L. Delmelle
Details | Diff
Updated patch, eliminating some code duplication (9.87 KB, patch)
2011-04-02 15:32 UTC, Andreas L. Delmelle
Details | Diff
patch for the layout unit tests (24.29 KB, patch)
2011-04-02 15:36 UTC, Andreas L. Delmelle
Details | Diff
updated patch, including two tests for orphans/widows (24.65 KB, patch)
2011-04-02 17:14 UTC, Andreas L. Delmelle
Details | Diff
updated test case, showing additional issue for tables (10.59 KB, application/octet-stream)
2011-04-02 17:23 UTC, Andreas L. Delmelle
Details
updated patch, addresses the described issue with nested tables/lists (46.63 KB, patch)
2011-04-04 11:28 UTC, Andreas L. Delmelle
Details | Diff
updated test FO, including tables/lists (15.28 KB, text/plain)
2011-04-04 11:30 UTC, Andreas L. Delmelle
Details
updated patch, containing fix for combinations of widows/orphans and keep-* (57.32 KB, patch)
2011-05-24 20:45 UTC, Andreas L. Delmelle
Details | Diff
updated patch, fixing potential issues with higher orphans values (57.38 KB, patch)
2011-05-25 22:30 UTC, Andreas L. Delmelle
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew McFarland 2008-01-30 09:03:14 UTC
When I process the following fo, I get a PDF with a one-line widow at the start
of the second page, even though widows for that fo:block is set to 4.

<?xml version="1.0" encoding="ISO-8859-1"?>

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

<fo:layout-master-set>
  <fo:simple-page-master master-name="A4">
    <fo:region-body />
  </fo:simple-page-master>
</fo:layout-master-set>

<fo:page-sequence master-reference="A4">
  <fo:flow flow-name="xsl-region-body">
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block>Paragraph</fo:block>
    <fo:block widows="4" linefeed-treatment="preserve" >line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
line
</fo:block>
  </fo:flow>
</fo:page-sequence>

</fo:root>
Comment 1 Andrew McFarland 2008-01-30 09:04:31 UTC
Created attachment 21452 [details]
XSL-FO file that displays the problem
Comment 2 Andreas L. Delmelle 2011-03-27 02:34:26 UTC
Actually, this is a more general issue, too: orphans and widows only work reliably at the lowest block level and could lead to break-decisions that cause violations at higher levels in some cases (e.g. blocks consisting of subparagraphs --preserved linefeeds or nested blocks-- consisting of a single line)

Attachments to follow:
- a new, extended test case demonstrating some of the issues
- a patch for a partial fix

The fix so far, has forced me to update a few test cases (changes included in the patch). Those changes all seemed to follow naturally from the corrected behavior: if a block generates less than 4 lines of content, then that is a virtual keep-together with default orphans/widows setting. This is now reflected in the element lists, requiring some shifts in tests on element list structure.
Comment 3 Andreas L. Delmelle 2011-03-27 02:36:03 UTC
Created attachment 26800 [details]
patch fixing some of the issues with widows/orphans
Comment 4 Andreas L. Delmelle 2011-03-27 02:38:24 UTC
Created attachment 26801 [details]
extended testcase


Still a plain FO... proper test case TBD, so not yet added in the patch.
Comment 5 Andreas L. Delmelle 2011-03-27 02:46:57 UTC
Note: not entirely sure about the very last case.
It could be an overflow situation, too, as it is not possible to satisfy all constraints at the same time otherwise. Forcing orphans to "1" on the outermost block would definitely create the situation that is described in the comment. If my interpretation is correct, that is...
Comment 6 Andreas L. Delmelle 2011-03-28 16:52:00 UTC
Created attachment 26807 [details]
updated test, showing additional issue


The partial fix does not yet correct the behavior in the newly added second case. If a regular text paragraph is followed by a child block, "widows" would not apply to the last 2 lines of the first paragraph, as they are not 'in the last area' generated by the parent block. The algorithm would be allowed to break before the last line of that paragraph.

A similar issue is demonstrated for orphans in the fourth test.

Conclusion: orphans/widows can only be reliably taken into account by the LineLayoutManager if there really is no more content preceding/following. If there is, it seems best to leave it up to the parent BlockLayoutManager to sort out how many 'lines'[*] need to be kept together.

[*] Note: quoted, since it is ambiguous for nested inline-block content, especially for lists and tables, where the boxes do not really represent 'lines', but rather list-item- or table-parts. Perhaps list/table content should be excluded in some way, since it is already handled by fox:widow-content-limit and fox:orphan-content-limit...?
Comment 7 Andreas L. Delmelle 2011-03-29 17:52:39 UTC
Created attachment 26813 [details]
updated test case


Corrected the last case to reflect the situation described in the comment. orphans="1" should be set on the outermost block, inherited by the first level, and reset on the second level to trigger that particular side-effect.

Also reformatted the source file to reflect the eventual line layout.
Comment 8 Andreas L. Delmelle 2011-03-31 14:23:09 UTC
Created attachment 26820 [details]
new patch, handling widows/orphans at block-level


The new patch should result in a correct rendering of the test. 20 pages demonstrating some possibilities of widows/orphans.
One scenario I still need to look into is the case of tables/list nested in blocks. All seems to work pretty well when they are direct flow descendants, but I somehow still expect trouble...
All unit tests pass, nevertheless.

Notes:
Trying to fix this, I was tempted to try and reuse the existing code in ElementListUtils.removeLegalBreaks(), that is used for fox:orphan-content-limit and fox:widow-content-limit, but got to wondering... Why not at the same time try to improve/optimize a little bit? For now, I added similar code inside BlockLayoutManager, using a slightly different approach. The duplication I am still going to address, but it might go a bit further than that. 
We should be able to share code more efficiently by:
- using a plain removeLegalBreaks() method, period. No fromEnd/fromStart, or differentiation between content-limit in length or lines. That should be determined by the caller.
- use one pre-process iteration to determine the limit, then just hand off the corresponding sub-list to the above method. No need to juggle with next/previous = less confusion.

Also, the following additions are made in comparison to the existing variant in ElementListUtils:
In the existing method, If we encounter a penalty (or BreakElement), we set it to infinite to avoid a break. If I interpret correctly, and the element appears in between two boxes, the more appropriate response would be to just remove it (or if possible, as it happens currently in LineLM: do not add it in the first place). This would make the eventual element lists simpler for the BreakingAlgorithm. By design, a box is never a legal break, so adding an infinite penalty in between two boxes is plain redundancy, and is bound to confuse the algorithm in some scenarios.

In dealing with glues, I noticed the following two sequences:

box - box[aux., w=x] - box
box - penalty[inf, w=0] - glue[w=x] - box

Again, an opportunity for optimization here. The above two are equivalent in case the glue is not stretchable, and if I interpret correctly, there currently is no case where --during raw element list generation, i.e. /before/ space-resolution-- stretchable glues are used in block-level layout. Less elements means less positions, means less computations...

Having chosen the above approach to remove the legal breaks, I did manage to upset quite a few test cases. I have already included them in the patch.
In the end for some, I chose the 'lazy' fix, which means "set widows and orphans to 1" to allow breaking before the last line or after the first. Just a convenient way to trigger the same behavior as before. Others I have adapted to the correct behavior, like region-body_column-count_bug37828. Interesting case, that one, since it shows that the change upsets the balance, but I believe it is entirely correct to do that. No column-break allowed, so we get an empty second column.
Comment 9 Andreas L. Delmelle 2011-04-01 15:03:20 UTC
(In reply to comment #8)
> One scenario I still need to look into is the case of tables/list nested in
> blocks. All seems to work pretty well when they are direct flow descendants,
> but I somehow still expect trouble...

Pondering some more, I think that all will work splendid as long as it is all single-line blocks, which most of our list/table tests consist of. As soon as we get multi-line content, the game will change.

If we have a table/list nested in a block, we generate merged boxes for the table or list content, respecting the widows/orphans constraints of the descendant blocks. The latter means that lists/tables, by design, now always generate multi-line boxes as their first/last elements, which the higher-level BlockLM will take to be single lines, to which it applies its own constraints... That may lead to unnecessarily keeping together the first N table rows, and in some extreme cases, move entire tables/lists to their own page.

As for the best way to avoid it, I don't like the idea of checking for instanceof TableLM or ListBlockLM in BlockLM, but would not like to move the problem to those former classes either, since it is ultimately a shared concern between lists and tables. Perhaps, a special type of box that offers easy access to the maximum number of lines contained in it? We could then, instead of just incrementing by one, increase the boxCount with that number... 

I'll see if I can whip up a proof-of-concept and already add the necessary logic to the BlockLM, then later adapt the other LMs.

I also still have to generate a scenario where the above would have a discernible and at the same time undesirable impact. It would likely involve a few levels of nesting before it becomes really annoying...
Comment 10 Andreas L. Delmelle 2011-04-02 15:32:20 UTC
Created attachment 26842 [details]
Updated patch, eliminating some code duplication


Attached patch cleans up the previous one. Unrelated fixups in LineLM were already committed to allow better focusing on the relevant changes. The changes to test cases are also not included for the same reason. I will attach them as a separate patch, for those who need it.

The loop duplication has been reduced as described:
- one additional removeLegalBreaks() method in ElementListUtils that just accepts one parameter, namely the list to remove the breaks from
- remaining logic (i.e. determining the sublist that the breaks should be removed from) stays in BlockLM

I was also looking at using the same method for the fox:widow/orphan-content-limit case, and I bumped into the fact that widows/orphans was implemented in reverse here. 
That is, removeLegalBreaksFromEnd() is called for orphan-content-limit, while according to the spec, orphans tells you "the minimum number of lines of a paragraph that must be left at the bottom of a page" or "the minimum number of line-areas in the first area generated by the formatting object". So, it should be exactly the other way around. Nobody ever noticed this... 8-)
Comment 11 Andreas L. Delmelle 2011-04-02 15:36:05 UTC
Created attachment 26843 [details]
patch for the layout unit tests
Comment 12 Andreas L. Delmelle 2011-04-02 17:14:10 UTC
Created attachment 26845 [details]
updated patch, including two tests for orphans/widows

added two new tests to the main patch
Comment 13 Andreas L. Delmelle 2011-04-02 17:23:43 UTC
Created attachment 26846 [details]
updated test case, showing additional issue for tables


Added a new test to show the effect on tables (same to be expected for list-blocks): the last cell/row can be split, but isn't because the BlockLM mistakes the two merged boxes for lines, so keeps them together, which strictly speaking, is not necessary.

This effect is expected to become more annoying with higher orphans/widows settings, where it may yield pages with too much white-space...
Comment 14 Andreas L. Delmelle 2011-04-04 11:28:29 UTC
Created attachment 26851 [details]
updated patch, addresses the described issue with nested tables/lists


Patch includes two new test cases to check for behavior in case a list-block or table is nested in a block.

A lineCount member was added to KnuthBlockBox, which is now used, if available, and TableStepper and ListItemLM now always generate KnuthBlockBoxes (as is the case for the LineLM).

Note: while addressing the issue for tables, I noticed that currently, the header and footer box are not marked as auxiliary. I tried to change this, as it seemed to make sense, and it did not seem to have an effect on the unit tests. It would influence widows/orphans behavior: if the header/footer are not marked auxiliary, their lines would potentially be counted as well. I do not believe this is the intended result... The change to TableContentLM is not yet included in the patch, as I have some conflicting changes to work out first, and I do not have a test to demonstrate the effect, yet.
Comment 15 Andreas L. Delmelle 2011-04-04 11:30:00 UTC
Created attachment 26852 [details]
updated test FO, including tables/lists
Comment 16 Andreas L. Delmelle 2011-05-24 20:45:46 UTC
Created attachment 27055 [details]
updated patch, containing fix for combinations of widows/orphans and keep-*


Further testing revealed odd behavior in case the first child block generates less lines of content than the orphans constraint *and* has a keep-with-next set. The pending keeps were not properly cleared, as the call to super() in BlockLM.addInBetweenBreak() was skipped.
Incidentally, this made me aware that the previous patch was also flawed in that it only (potentially) skips this for the second childLM. That would assume the total number of lines generated by the first two childLMs is enough to satisfy the constraint --which obviously works pretty well if you check only initial values... ;-) Test case TBD, but should already be accounted for in the patch.

I was slightly concerned about moving to KnuthBlockBox in all situations, as it reserves space for a few extra references per instance. To mitigate that somewhat, the footnote-related information was put in a static class, so that those two references at least are compounded. No footnotes means only one unused reference instead of two. None would be even better, but that requires a bit of clever restructuring in the KnuthBox hierarchy (insert a LineBox in between KnuthBox and KnuthBlockBox?). On my mind, but I have not quite figured it all out yet.
Comment 17 Andreas L. Delmelle 2011-05-25 22:30:55 UTC
Created attachment 27066 [details]
updated patch, fixing potential issues with higher orphans values
Comment 18 Glenn Adams 2012-04-07 01:44:20 UTC
resetting P2 open bugs to P3 pending further review
Comment 19 Glenn Adams 2012-04-11 03:19:54 UTC
increase priority for bugs with a patch