Bug 51304

Summary: footnotes in multi-column region-body overlap
Product: Fop - Now in Jira Reporter: Andreas L. Delmelle <adelmelle>
Component: page-master/layoutAssignee: fop-dev
Status: NEW ---    
Severity: normal    
Priority: P3    
Version: trunk   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: sample FO demonstrating two sides of the same issue
intermediate patch, containing partial fix
The requested PDF
Sample FO file demonstrating the two-column footnote problem
PDF output demonstrating the two-column footnote problem

Description Andreas L. Delmelle 2011-05-31 19:21:27 UTC
Created attachment 27098 [details]
sample FO demonstrating two sides of the same issue

Creating this bug to track the status of a known issue, for which we already have a disabled test case. 
The test comment says 'may overlap', as if it is the exception, but in multi-column documents containing a significant amount of footnotes, it appears to be the rule.
The effect is not always visible, and in some cases there just is no break possibility that leaves enough content for the overlap to be triggered, but for the remainder...

Further investigation reveals that the issue is that each column break is determined by only taking into account the height of the footnote content associated with _that_ particular column, which works fine for single-column pages only.
If there are footnotes in the first column, this will cause overlaps in all _but_ the first. If there are (also) footnotes in following columns, there will (also) be overlap in the first.

The first part, I believe, can be fixed by a very small change in PageBreakingAlgorithm.computeDifference(), at line 506 and following:

if (footnotesPending) {
    // compute the total length of the footnotes not yet inserted
    int allFootnotes = totalFootnotesLength;

    // if activeNode is not a page-break, the footnotes for preceding
    // column(s) should be counted as well, otherwise we need to subtract
    // the footnotes already inserted
    if (pageProvider.endPage(activeNode.line - 1)) {
        allFootnotes -= pageNode.totalFootnotes;
    }
...

This renders the first sample correctly (also for more than two columns), and gives no complaints from the test suite. However, the third case --overlaps in both columns-- leads to an infinite loop.
Comment 1 Andreas L. Delmelle 2011-05-31 19:23:52 UTC
The second part will take a bit more effort to address. As soon as newFootnotes are detected, the algorithm should restart from the last page-break, and take into account the additional footnotes to revisit the previous column-breaks.
Comment 2 Andreas L. Delmelle 2011-06-10 20:11:07 UTC
Still in the process of figuring this out, but thought I'd already float some ideas. 

The change as described in an earlier comment --fixing the overlaps in following columns if an earlier one had footnotes-- does not fix the behavior after checking closer. 
For columns beyond the second, it yields impossible values for KnuthPageNode.totalFootnotes, which is set to PBA.insertedFootnotesLength, which in turn depends on that subtraction in case all footnotes fit. That alteration needs to be slightly more invasive.

As for fully avoiding overlaps in multi-column layout with footnotes, as far as I can see at the moment, it would require a new mode/type of restart.
In reality, a footnote can only be deferred to a following page (= the footnote area spans all columns), so the footnote deferral mechanism should definitely be revised in order to make this work.

In this particular context, we would have to keep track of the footnotes that were added before the restart, instead of just throwing them away, and redo the page, but using the full length of all footnotes. When we arrive at the next unavoidable page-break, it then needs to be checked whether the citation of the last footnote would still appear on the page. If that is not the case, the process can be repeated with slightly less footnote content.

I am also looking into switching to one flat footnoteList, that would merge the footnote content. This in conjunction with a list of indices pointing to the boundaries. 
Not sure yet if that will simplify things, but it would offer the potential benefit of being able to better handle space-resolution between footnotes (which is currently not supported). At any rate, at this level, a list of lists with a separate listIndex and elementIndex just seems to add to the confusion. (Not even counting the potential bugs arising from those indices being even the slightest bit out of synch with the list... Infinite loops, AIOOBExceptions and the like)

Intermediate patch containing some initial work/cleanups to follow soon.
Comment 3 Andreas L. Delmelle 2011-06-19 17:53:46 UTC
Created attachment 27176 [details]
intermediate patch, containing partial fix


Intermediate patch contains a more proper fix to address the overlaps in columns following one that had footnotes (test included). The node whose totalFootnotes is subtracted will now always be the last page break, so that following columns will still take into account the footnote height added by previous columns on the same page.

Aside from the fix, the patch also contains some general minor cleanups/restructuring. Most important is probably the gathering of all footnote-related variables/methods in PageBreakingAlgorithm.Footnotes. I find this makes the whole process much easier to debug (one watch to get all relevant info), and would facilitate potential further extraction/refactoring of this logic.
Comment 4 Andreas L. Delmelle 2011-06-21 19:22:53 UTC
Just summarizing again what I think would work (helps me get a clearer picture in my head if I force myself to write it down...):
[starting from the situation after attached patch]

1° Footnotes.handleFootnotes()
  - should get a helper list of boxes whose footnotes are present in the list
    (will in fact only contain the boxes that are 'in scope' for the current page)
  - should check this list, and if the boxes' footnotes were already added, just skip
    (if yes, then we would be restarting due to a new-footnotes-before-last-column condition (see below)

2° Footnotes.getDifference() (still going to rename that one, as it is a bit confusing, I think... That was just a quick rename after extraction from PBA.computeDifference())
  - if we are restarting due to footnotes before the last column, then just add the height for all "current" footnotes
    (at that point, those will include the footnotes that were encountered later on the page the first time around)

3° PBA.restartFrom()
If there is a footnotes-before-last-column condition, then:
  - do not reset the footnotes, but
  - restart from the last break that started a page
    (this should then trigger the special treatment under 1° and 2°)
If there is no such condition, then the helper list in Footnotes should be cleared too, so that we get a 'proper' restart.

Still left to figure out what needs to happen once the second pass is complete. Situations can arise where the anchor(s) of the last footnote(s) would be pushed off the page, due to the earlier breaks in the first columns, if we would include all footnotes. 
This can probably be checked (at least approximately) right before restarting (i.e. at that point, we could already decide to defer a few footnote-lines to the next pass?)
Comment 5 Andreas L. Delmelle 2011-07-05 19:03:29 UTC
Update: there was still something lacking in the overall picture as sketched in the previous comment.
The additional step should actually take the form of a *forced* restart, at the point of the next unavoidable page-break.

The reason I misread it is that, at first, I overlooked that the algorithm behaves slightly differently if the available width is such that each break fits exactly (e.g. 72pt available, and all lines are 12pt --assuming no stretch/shrink).
As soon as the width is increased or decreased (say to 73, resp. 71pt), there is a restart on every unavoidable page/column-break. When a line is reached that would generate overflow, there is a break-plus-restart from the position after the last line that still fits. In such a scenario, it is conceivable to alter the mode of the restart, so it jumps back to the last page-break, and retries the complete page, with a different set of parameters.

If every unavoidable break is 'perfect', however, a restart is never triggered, and thus it would have to be forced somehow. Given that the cycle is just node-activation and -deactivation, it likely means overriding deactivateNode() in PageBreakingAlgorithm. Around every unavoidable break, there is a brief moment at which there are two active nodes, which then gets reduced to one, so the outer loop in findBreakingPoints() just continues with the next element.
At the point where a page-break node is being deactivated, there can be a rather simple check, going back to the first column-break and comparing that node's totalFootnotes with the footnotes for the node that is about to be deactivated. 
If they are not equal, it means footnotes were added after the first column. The next node should then be deactivated as well, so that we end up with 0 active nodes, and a restart is forced higher up. Provided that we then also properly set the lastTooShort and/or lastDeactivated node(s), the rest of the solution as sketched earlier should suffice to handle the rest.
I am even thinking that we can already infer at that point whether the last box with anchors --for the last (set of) footnote(s)-- can still fit with all footnote content, or whether part of the last footnote needs to be deferred. Updated patch likely to follow soon.
Comment 6 Glenn Adams 2012-04-07 01:37:02 UTC
resetting severity from major to normal pending further review
Comment 7 Glenn Adams 2012-04-07 01:42:17 UTC
resetting P2 open bugs to P3 pending further review
Comment 8 Glenn Adams 2012-04-08 05:18:52 UTC
please provide output PDF file(s) that correspond to supplied input file that demonstrate problem
Comment 9 Glenn Adams 2012-04-24 05:43:17 UTC
(In reply to comment #8)
> please provide output PDF file(s) that correspond to supplied input file that
> demonstrate problem

andreas?
Comment 10 Olivier Scherler 2012-05-09 06:31:08 UTC
Created attachment 28746 [details]
The requested PDF

Output of 27098. Not that it was very difficult to produce.
Comment 11 Olivier Scherler 2012-05-09 06:50:41 UTC
Created attachment 28747 [details]
Sample FO file demonstrating the two-column footnote problem
Comment 12 Olivier Scherler 2012-05-09 06:51:35 UTC
Created attachment 28748 [details]
PDF output demonstrating the two-column footnote problem
Comment 13 Olivier Scherler 2012-05-09 06:54:22 UTC
It’s a pretty serious problem, as the two attachments I just uploaded (“Sample FO file demonstrating the two-column footnote problem” and “PDF output demonstrating the two-column footnote problem”) demonstrate.

Basically, if you use footnotes in a two-column layout, you’re playing a lottery where Mr. Murphy has a pretty high chance to win, and none of the alternatives (reverting to one column or doing away with footnotes) is very viable.
Comment 14 Glenn Adams 2012-05-09 06:55:34 UTC
(In reply to comment #10)
> Created attachment 28746 [details]
> The requested PDF
> 
> Output of 27098. Not that it was very difficult to produce.

thanks; i understand it isn't difficult to produce the PDF, and I could take the time to do so, but I'm trying to encourage (enforce) that submitters provide full and accurate data in maximally minimal form; otherwise, those of us processing these bugs end up spending a lot of time doing things that should have been done by the submitter in the first place
Comment 15 Olivier Scherler 2012-05-09 07:02:10 UTC
OK, understood. :)

The files I just submitted are less minimal, but demonstrate the real-world situation where I first encountered the problem.
Comment 16 Petter Reinholdtsen 2012-08-16 18:03:48 UTC
Could this be the same problem I encountered with xmlto and docbook-xsl and a single column body?  See <URL: http://bugs.debian.org/683197 > for the
minimum docbook file I created to reproduce the problem, and the generated fo file
triggering the bug, and the PDF demonstrating the problem.

Is this the same bug, or should I create a new bug report?