Bug 46486 - page-position="last" causes blocks with span=all to disappear
Summary: page-position="last" causes blocks with span=all to disappear
Status: CLOSED FIXED
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: page-master/layout (show other bugs)
Version: trunk
Hardware: All All
: P2 major
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-07 02:30 UTC by Chris Bowditch
Modified: 2012-04-01 06:57 UTC (History)
0 users



Attachments
Sample FO to recreate bug (3.35 KB, text/plain)
2009-01-07 02:32 UTC, Chris Bowditch
Details
Minimized sample file that demonstrates the issue (1.61 KB, text/plain)
2009-04-03 14:41 UTC, Andreas L. Delmelle
Details
patch not fixing the issue (17.16 KB, patch)
2009-04-15 11:53 UTC, Andreas L. Delmelle
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Bowditch 2009-01-07 02:30:49 UTC
If the FO uses a combination of page-position="last", 2 column layout and blocks with span="all" then some of the fo:blocks disappear and don't make it to the final PDF. Sample FO to reproduce attached.
Comment 1 Chris Bowditch 2009-01-07 02:32:22 UTC
Created attachment 23090 [details]
Sample FO to recreate bug
Comment 2 Andreas L. Delmelle 2009-03-22 02:05:00 UTC
Debugging this, I see that the last page-master triggers a restart of the algorithm, but the restarting point is assumed to be the last span change. The other span changes already added the areas for the earlier blocks to a page, but that page is discarded. Since the layout for the earlier parts was already considered 'finished', they are not revisited and disappear.

The issue presents itself always in case of span changes on the last page, when a last page-master is present. Only the last span will be retained in the output.

A possible fix may be to treat span changes slightly differently than they are now, if there is a last page-master. Instead of triggering addAreas() unconditionally for the whole list, we might be able to limit that to 'everything up to the last break preceding the span change'. 

For the moment, that's still a thought, though... Limiting the list for which to add the areas seems easy enough, but that's only part of the story. It seems possible, but I still have to investigate whether it is also doable.
Comment 3 Andreas L. Delmelle 2009-03-23 16:00:24 UTC
Locally refactored the methods PageBreaker.doPhase3With...() into one restartAt() method.
 
Besides the fact that one slightly larger method body saves about 20 lines in total of essentially duplicate code, this seems to give me a better view on a possible fix. With a bit of perspective, since it becomes apparent that we can basically restart from any given part, and given one extra flag maybe even temporarily disable or bypass addAreas() for the last part, if necessary.

For the moment, the fix could be to temporarily store the blockLists for that last part, to include later with the next sequence. Right now, areas are always added for the whole list. 
This should be made conditional: if there is a last page-master, and there's more content to follow, then only add the areas up to the last break. The rest of the original list should be retained, to be prepended to the following set of block-lists.

Basically the same thing as what happens for the last page right now, but instead of adding the areas, somehow carry the elements over to the next iteration. 
One question mark: will the info regarding span changes be carried over (easily) too...? This might need some additional thought/work.
Comment 4 Andreas L. Delmelle 2009-03-25 13:29:19 UTC
Part of the issue already fixed locally by working with a deferred block list to which the elements of the part after the last break are added before continuing with the next iteration.

This already keeps the content from disappearing, but as I expected, column-balancing is then no longer automatically triggered for the 'flow table' part in the sample.
Comment 5 Andreas L. Delmelle 2009-03-28 08:13:06 UTC
Studying the effects of using a deferred block list, it is definitely not enough to simply carry those blocks over. Since they were created under the assumption of, say, 2-column layout, that number could change for the last page.

What we would actually need is not to defer the block lists, but somehow pass the message to the FlowLayoutManager that it needs to generate the element list, starting from the position corresponding to the first block in the deferred list. Only then are we certain that the line boxes will have a width corresponding to the narrower column-width.

Then again, this would only solve the problem in case of a restart due to a last-page condition or a span change. I'm suddenly not so certain if FOP currently deals gracefully with the case of alternating page-masters with different column-count... Judging from the code, I'd say it doesn't (?) 
Going to check that, and will report back here.
Comment 6 Andreas L. Delmelle 2009-03-31 13:48:13 UTC
For documentation purposes:
Spent quite some time trying to get it working correctly, by deferring the part after the last page-break in case we have a last page-master and more content is to follow, but then realized that that approach is suffering the same deficiency as the implementation as a whole. 
By only working with the prepared block-lists/line-boxes, only the page-breaks not the line-breaks are recomputed.

Checked and verified my suspicion that alternating page-masters with different column-count don't work either, so this is part of a deeper problem.

My further thoughts can only go in the direction of something similar to interleaved line- and page-breaking, so a solution for this issue:
a) would solve a lot of other ones as well, but
b) could cause significant overlap with Vincent's long-term objective

A bit more concrete:
LineLayoutManager should be refactored, and line-breaking should probably modeled along the same lines as page-breaking. See AbstractBreaker->PageBreaker. Makes you wonder why we don't have a LineBreaker (yet)? I get the feeling that such refactoring may make it much easier to restart the line-breaking, using a different reference-ipd. I'm not even sure we really need a LineLayoutManager to begin with. There is, strictly speaking, no PageLayoutManager either...
This may cause some other relocations, as I have the distinct impression that AbstractBreaker is far from 'abstract'. There's too much code in there that seems quite specific to page- (or block-level) breaking. An AbstractPageBreaker as a common superclass for BlockContainerBreaker, StaticContentBreaker...?
If we:
a) introduce a LineBreaker, 
b) expose its 'doLayout()' method, 
c) make sure it is associated with the line-boxes

then it should be possible to restart the LineBreaker, during the page-breaking loop without having to pass via the FlowLayoutManager again.

One suboptimal issue then remains: FlowLayoutManager.getNextKnuthElements() will not return until a span-change or forced break is signaled. That could mean we compute line-breaks for 40 pages of content, where maybe the second page is already different, and 39 pages have to be redone...
Comment 7 Vincent Hennebert 2009-04-01 03:30:29 UTC
Hi Andreas,

I'd suggest you not to worry about changes in the number of columns, otherwise you will indeed bump into the very problems that I am trying to solve at the moment. It's a known limitation that FOP can't handle pages of different widths at the moment, and changes in the number of columns are basically that: different line widths.

Your idea of deferring the calls to addAreas looks promising to me, I think you should pursue in that direction and that may be enough for now. Also, in the sample file the dimensions of the region-body for the last page (PageBack) are the same as for the other pages (PageFront). I've seen a lot of documents for which that was the case, and the only difference was in the static-contents for the page header/footer. It may be worthwhile to check the dimensions of the last page, and if they are the same there's no need to re-do page breaking at all. Or even, if the bpd of the last page is smaller than on other pages but big enough to receive the content for the last part, then no need to re-do page breaking either. That would also solve the column-balancing problem.

HTH,
Vincent


(In reply to comment #6)
> For documentation purposes:
> Spent quite some time trying to get it working correctly, by deferring the part
> after the last page-break in case we have a last page-master and more content
> is to follow, but then realized that that approach is suffering the same
> deficiency as the implementation as a whole. 
> By only working with the prepared block-lists/line-boxes, only the page-breaks
> not the line-breaks are recomputed.
> 
> Checked and verified my suspicion that alternating page-masters with different
> column-count don't work either, so this is part of a deeper problem.
> 
> My further thoughts can only go in the direction of something similar to
> interleaved line- and page-breaking, so a solution for this issue:
> a) would solve a lot of other ones as well, but
> b) could cause significant overlap with Vincent's long-term objective
> 
> A bit more concrete:
> LineLayoutManager should be refactored, and line-breaking should probably
> modeled along the same lines as page-breaking. See
> AbstractBreaker->PageBreaker. Makes you wonder why we don't have a LineBreaker
> (yet)? I get the feeling that such refactoring may make it much easier to
> restart the line-breaking, using a different reference-ipd. I'm not even sure
> we really need a LineLayoutManager to begin with. There is, strictly speaking,
> no PageLayoutManager either...
> This may cause some other relocations, as I have the distinct impression that
> AbstractBreaker is far from 'abstract'. There's too much code in there that
> seems quite specific to page- (or block-level) breaking. An AbstractPageBreaker
> as a common superclass for BlockContainerBreaker, StaticContentBreaker...?
> If we:
> a) introduce a LineBreaker, 
> b) expose its 'doLayout()' method, 
> c) make sure it is associated with the line-boxes
> 
> then it should be possible to restart the LineBreaker, during the page-breaking
> loop without having to pass via the FlowLayoutManager again.
> 
> One suboptimal issue then remains: FlowLayoutManager.getNextKnuthElements()
> will not return until a span-change or forced break is signaled. That could
> mean we compute line-breaks for 40 pages of content, where maybe the second
> page is already different, and 39 pages have to be redone...
Comment 8 Andreas L. Delmelle 2009-04-03 14:41:35 UTC
Created attachment 23440 [details]
Minimized sample file that demonstrates the issue
Comment 9 Andreas L. Delmelle 2009-04-06 11:05:42 UTC
Status update:
Been busy attempting to complete the deferral mechanism. 
By itself, not very complicated changes (only 2 affected classes), but all sorts of other issues keep arising... There is some progress, but not as much as I'd hoped.
It seems to require some more invasive refactoring than I had in mind.

One interesting scenario that keeps boggling me is:
-> a regular block with span="none", taking up a bit more than a page, followed by
-> a block with span="all" that, by itself, would take up exactly one page

The first block will be split, and leaves a deferred part, to which column-balancing should be applied.
I at least got that working for the minimized sample file. To my dismay, then noticed the original one (with two span changes on the last page) exhibits some other weirdness.

Apart from that, my current approach does not solve the case described above. 
Since I do not yet process the trailing sequence until the preceding deferred part is processed, it seems I can no longer accurately determine the last page condition. hasMoreContent() has been refactored here, to take into account the deferred part, so the last page condition would never be reached (it will only return false if the breaker is currently processing the last part, so as long as there's still a deferred part...)
Using childFLM.isFinished() offers relief to trigger last page handling, but then this still requires at least running the trailing sequence through the algorithm /first/, to see if there is more than one page-break in combination with the deferred part. If not, only then can we safely add the deferred areas to what we know will be the last page.

If so, then we will have processed the deferred part three times. Once with the initial run, where the sequence is deferred. A second time to check whether it would cause a page-break in the following sequence, then a third time to actually add the areas.

Writing this, it becomes apparent that I'll probably have to go in the direction of instantiating the BreakingAlgorithm, but deferring it, rather than deferring the original block list...

<TBC>
Comment 10 Andreas L. Delmelle 2009-04-14 16:23:59 UTC
Some more notes... Seems I'm unfortunately going to have to put this aside for the moment. It seems like I either sorely underestimated the complexity of the issue, or I'm overlooking something very obvious. Even the closest attempt yet, still breaks some 50 other tests.

Up to now, my changes are localized to AbstractBreaker and PageBreaker. The basic idea is to somehow recreate the exact sequence of events that would be generated when the FlowLM returns due to a span change. By this, I'm attempting to make use of the fact that AbstractBreaker already has a blockLists member that can hold multiple elements over which the for-loop in doLayout() will iterate. Currently, the blockLists member is cleared with every iteration, so in practice *always* contains one element.
By itself, it is rather easy to have PageBreaker's doPhase3(), under certain conditions, defer the sub-list corresponding to the part after the last break until the next iteration in doLayout().

The hard part --still have not figured that out yet-- is how to get the span changes across. The doPhase3() method has been merged to allow (theoretically) to do a restart with column-balancing for a part on the last page. BlockSequence has been equipped with an extra span member, so the span of the original list is not lost.
The main loop in doLayout() has been refactored to do special processing of the block lists that are remnants of a previous iteration.

Still, not enough... maybe I'm completely misunderstanding the mechanism behind span changes here.
Comment 11 Vincent Hennebert 2009-04-15 08:57:23 UTC
(In reply to comment #7)
> Hi Andreas,
> 
> I'd suggest you not to worry about changes in the number of columns, otherwise
> you will indeed bump into the very problems that I am trying to solve at the
> moment. It's a known limitation that FOP can't handle pages of different widths
> at the moment, and changes in the number of columns are basically that:
> different line widths.
> 
> Your idea of deferring the calls to addAreas looks promising to me, I think you
> should pursue in that direction and that may be enough for now. Also, in the
> sample file the dimensions of the region-body for the last page (PageBack) are
> the same as for the other pages (PageFront). I've seen a lot of documents for
> which that was the case, and the only difference was in the static-contents for
> the page header/footer. It may be worthwhile to check the dimensions of the
> last page, and if they are the same there's no need to re-do page breaking at
> all. Or even, if the bpd of the last page is smaller than on other pages but
> big enough to receive the content for the last part, then no need to re-do page
> breaking either. That would also solve the column-balancing problem.

Thinking more about that, assuming that we leave the changing IPD problem aside for now, no re-layouting should be necessary at all. If the bpd of the last page is smaller than the content, then I think the page should be left as is, and another page should be generated using the page-master for page-position="last" (plus possibly an additional empty one in between to honour the force-page-count property).

In the case where the content fits on the last page, then maybe it's possible to simply 'detach' from the page-area the region-viewport area corresponding to the region-body, and re-attach it to a page-area produced by the page-master for page-position="last". It may be simpler (and also more efficient) than maintaining a list of deferred areas, re-launching the breaking algorithm, etc.

Thoughts?
Vincent

<snip/>
Comment 12 Andreas L. Delmelle 2009-04-15 10:19:13 UTC
FWIW, for the moment, I have decided to do the following:
-> committed most of the changes to PageBreaker.java to the trunk, since they're more cleanup and things that simply make sense (see commit message for the motivations behind the changes)

-> make a patch (should be quite small) with the remaining changes to PageBreaker and AbstractBreaker, which will follow shortly after that in an attachment to this report. Just for the sake of not losing it (although especially last night, I began getting the impression that the attempt was next to worthless)
Comment 13 Chris Bowditch 2009-04-15 10:26:37 UTC
I certainly like your idea Vincent. Changing IPD is a known limitation of the current architecture of FOP so I think any solution to the disappearing content issue can overlook that for now. Most use cases of page-position="last" just involves different headers and footers, so finding a solution that doesn't lose content and works for different headers and footers is the most immediate concern. Solving this problem fully can be left to the redesigned algorithm that copes with changing IPD.

Thanks,

Chris
Comment 14 Andreas L. Delmelle 2009-04-15 10:53:16 UTC
(In reply to comment #11)

Hi Vincent

My next attempt would probably go in that direction. If you look at what BlockContainerBreaker does, for example, this gives a clue as to how it might be solvable (but that's still a vague idea in my head). That breaker implementation explicitly performs the layout but skips area-addition, and keeps the algorithm suspended after phase 2, until it is called upon by the parent LM (see: addContainedAreas()).

The only thing that's still a bit puzzling is the case of multiple span-changes on the last page. 

So, we need to defer the area addition until the last part is reached. 
Actually, a bit more correct:
- first we need to add the areas for the first part (else the following part will not get the correct available BPD)
- then, if the following part produces only one page, we know that both parts fit on one page, and we somehow need to keep a reference to the areas

This will have to be taken into account every time the FlowLM returns for a span-change. The first parts can only safely be let go if the last part produces more than one page. Only one page for the last part means all the parts will fit together on one page, which will then also be the last (or only).

With this, the difficulty rises of determining when we reach the last page condition. Currently, this means 'if the FlowLM is finished'. 
Since we would add a scenario of multiple parts being deferred to the last page, this no longer seems to suffice, so we'll need to come up with another way to check this condition.

At any rate, one lesson learnt: teaching AbstractBreaker to deal with a sequence of BlockSequences (instead of one BlockSequence at a time) including span-changes, does not seem doable. Maybe we'd better remove that intermediary blockLists member then, instead of pretending to loop over it... ;-)
Comment 15 Andreas L. Delmelle 2009-04-15 11:53:03 UTC
Created attachment 23495 [details]
patch not fixing the issue

Tried my best to keep looking at the bright side working on this one. The changes do not affect many classes, so one obvious possibility is that I was too focused on that area, and completely ignored other parts.

In spite of adding the span-value to the BlockSequence, and 'faking' the span change, it produces weird effects. Checking the debug-log, I only get span changes from EN_NONE to EN_NONE (= NOT_SET in the LayoutContext), so column-balancing is not yet triggered, and something is definitely wrong with the BPD updates.
Also, if I interpret correctly, the areas for a given part are possibly added twice (or more, depending on the number of deferred span-changes).
Comment 16 Andreas L. Delmelle 2009-04-15 12:02:19 UTC
BTW: Actually, quite a shame I didn't get near a solution, so far. I still think teaching AbstractBreaker to possibly deal with a sequence of block lists could be a very worthwhile addition (if feasible). It could make things possible like returning from the FlowLM early, as soon as the accumulated block lists' content-height would exceed, say, 5 times the available BPD (a very rough 5 pages). If AbstractBreaker can be made to process those incrementally, we might be able to finish and release pages sooner.
Comment 17 Vincent Hennebert 2010-01-07 08:02:25 UTC
I'm going to have a look at this. I'll investigate the idea mentioned in comment #11 to detach the region-viewport area and re-attach it to the page area corresponding to the last page.

Vincent
Comment 18 Vincent Hennebert 2010-01-18 04:30:13 UTC
Should be fixed in rev. 900364:
http://svn.apache.org/viewvc?rev=900364&view=rev
Comment 19 Glenn Adams 2012-04-01 06:57:20 UTC
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed