Bug 42067 - [PATCH] Exact positioning of PDF internal links
Summary: [PATCH] Exact positioning of PDF internal links
Status: CLOSED FIXED
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: pdf (show other bugs)
Version: trunk
Hardware: All All
: P2 normal
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
: 41532 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-04-07 08:34 UTC by Paul Vinkenoog
Modified: 2012-04-01 07:06 UTC (History)
1 user (show)



Attachments
Proposed patch file (78.16 KB, patch)
2007-04-07 08:37 UTC, Paul Vinkenoog
Details | Diff
Revised patch, no more idfirsts attribute in area tree XML (83.14 KB, patch)
2007-05-04 07:12 UTC, Paul Vinkenoog
Details | Diff
small patch file (1.19 KB, patch)
2007-05-17 08:39 UTC, Adrian Cumiskey
Details | Diff
small patch file (1.19 KB, patch)
2007-05-17 08:45 UTC, Adrian Cumiskey
Details | Diff
Patch to check for comma together with brackets (771 bytes, patch)
2007-05-18 06:57 UTC, Paul Vinkenoog
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Vinkenoog 2007-04-07 08:34:48 UTC
This patch re-implements the full internal link functionality in the PDF output.
Bookmarks, basic-links and named destinations again point to the exact spot
instead of the top of the page. The intermediate area tree XML format is also
supported (except for named destinations).

The target PageViewport is still determined during the layout phase; the on-page
position is calculated in PDFRenderer. Note that the code could be greatly
simplified if we moved everything to the renderer. This wouldn't break anything
in FOP, but it would prevent custom renderers from getting the target
PageViewport key. The patch itself also breaks renderers who pick up that key
(as the type of the InternalLink trait has changed), but not beyond repair: all
that's needed is a minor code change wherever that trait is read and the caller
expects a String.

The vast majority of changes are in PDFRenderer, PDFFactory, XMLRenderer and
AreaTreeParser. But some other modifications were also required (or at least
highly desired). The most important of them are:
- PageViewport needs to know for what IDs it is the first PV, independently of
ATH. This info is needed in the renderer.
- The dummy area generated by a fo:wrapper must be added to the area tree, at
least if it has an ID. (DocBook generates empty wrappers for <indexterm>s.)
[Note: I kept this a "generic" InlineArea. But maybe it would be better to
create a subclass, e.g. DummyInlineArea.]
- Two extra constructors for BookmarkData, for stuff read back from the area
tree XML. Also made the tagging of subdata as "unresolved" conditional, because
bookmarks read from AT XML are already resolved.
- To avoid code duplication (and divergence), the actual adding of an
INTERNAL_LINK trait to an area is now done exclusively in the LinkResolver.
- Created an additional constructor for PDFDestination.
- New method PDFState.getBaseTransform() returns the transformation to be used
for fixed-positioned areas.

I also added some helper functions here and there to move duplicated code into
(esp. in PDFFactory). And on at least two occasions I changed existing comments
because they puzzled me rather than helping me understand the purpose of the
commented-upon object. Whether all these changes are acceptable is of course up
to the team. If any changes or extensions are required, I'm willing to provide them.
Comment 1 Paul Vinkenoog 2007-04-07 08:37:54 UTC
Created attachment 19922 [details]
Proposed patch file
Comment 2 Jeremias Maerki 2007-04-29 05:28:10 UTC
First of all, thanks a lot for working on this patch, Paul. I've looked at it
during the last two days since I realized I couldn't really apply my own changes
[1] in this area without processing your patch first because of possible
conflicts. Anyway, in my own tests and your patch locally applied, none of the
GoTo Actions get properly positioned since PageViewport.setFirstWithID() is
never called (except from the AreaTreeParser which is only active in special
environments. PDFRenderer.getTargetableID() never returns an ID because those
"first IDs" are not filled. Can you comment on that?

I'm wondering if it weren't simpler just to track "first IDs" inside PDFRenderer
only, i.e. whenever an area with an ID is encountered its position is recorded.
If later the same ID is recorded but from a page logically before the earlier
page (out-of-order rendering), the position would be overwritten. It could
possibly also remove the necessity to extend the area tree XML format with the
"idfirsts" trait. Just a thought.

[1] Just FYI, my changes I mentioned above are targeted at restoring proper
package dependencies, as the pdf package currently relies on the area package
which is bad if we plan to extract the PDF library to XML Graphics Commons at
some point. 
Comment 3 Paul Vinkenoog 2007-04-30 05:35:05 UTC
Hi Jeremias,

> Anyway, in my own tests and your patch locally applied, none of the
> GoTo Actions get properly positioned since PageViewport.setFirstWithID()
> is never called (except from the AreaTreeParser which is only active in
> special environments.

It's also called from AreaTreeHandler.associateIDWithPageViewport (near the top
of the patch file). Maybe this didn't make it into your source because there are
several conflicts with a more recent commit. They're all trivial though; easy to
resolve.

> I'm wondering if it weren't simpler just to track "first IDs" inside
> PDFRenderer only, i.e. whenever an area with an ID is encountered its
> position is recorded. If later the same ID is recorded but from a page
> logically before the earlier page (out-of-order rendering), the position
> would be overwritten.

Yes, this was one of the options I presented on fop-dev while I was working on
this. It would make PDFRenderer more complicated, but you could kick the entire
link resolution stuff out of the layout code (PV, ATH), so overall the code
would become simpler. The problem is that this would break other (custom)
renderers that expect the INTERNAL_LINK trait to contain the target PV key.

> It could possibly also remove the necessity to extend the area tree XML
> format with the "idfirsts" trait. Just a thought.

I take it you don't like that attribute. Neither do I. It's ugly, and every time
I look at it I get the feeling that it shouldn't be necessary. But it was the
best I could come up with given the chosen approach (= resolve to page level in
layout phase, determine exact location in renderer). There's a workaround
possible though (in PDFRenderer): store the first occurrence of each ID on every
PV in a map (with the ID as key, and a list of PV->location maps as value).
Whenever an internal link trait is parsed, the ID plus the key of the first PV
containing that ID are extracted. At that moment, if the location of that ID on
that PV is already in the map, the resolution can be completed. If not, it'll
have to wait (but this is also the case in the current patch). The renderer
wouldn't even have to worry about which page comes first, because the link trait
already contains that information.

If you want me to implement this, I will.
Comment 4 Jeremias Maerki 2007-04-30 06:16:19 UTC
(In reply to comment #3)
> Hi Jeremias,
> 
> > Anyway, in my own tests and your patch locally applied, none of the
> > GoTo Actions get properly positioned since PageViewport.setFirstWithID()
> > is never called (except from the AreaTreeParser which is only active in
> > special environments.
> 
> It's also called from AreaTreeHandler.associateIDWithPageViewport (near the top
> of the patch file). Maybe this didn't make it into your source because there are
> several conflicts with a more recent commit. They're all trivial though; easy to
> resolve.

You're right. Looks like I clicked too quickly through the "apply patch" dialog.

<snip/>

> > I'm wondering if it weren't simpler just to track "first IDs" inside
> > PDFRenderer only, i.e. whenever an area with an ID is encountered its
> > position is recorded. If later the same ID is recorded but from a page
> > logically before the earlier page (out-of-order rendering), the position
> > would be overwritten.
> 
> Yes, this was one of the options I presented on fop-dev while I was working on
> this. It would make PDFRenderer more complicated, but you could kick the entire
> link resolution stuff out of the layout code (PV, ATH), so overall the code
> would become simpler. The problem is that this would break other (custom)
> renderers that expect the INTERNAL_LINK trait to contain the target PV key.

Well, that's me not being able to catch up with everything lately. :-( I get the
impression that it should be possible to do this without breaking
backwards-compatibility too much. Maybe we should do an inventory of people who
are maintaining external renderers so we have an idea who would be affected and
how much.

> > It could possibly also remove the necessity to extend the area tree XML
> > format with the "idfirsts" trait. Just a thought.
> 
> I take it you don't like that attribute. Neither do I. It's ugly, and every time
> I look at it I get the feeling that it shouldn't be necessary. But it was the
> best I could come up with given the chosen approach (= resolve to page level in
> layout phase, determine exact location in renderer). There's a workaround
> possible though (in PDFRenderer): store the first occurrence of each ID on every
> PV in a map (with the ID as key, and a list of PV->location maps as value).
> Whenever an internal link trait is parsed, the ID plus the key of the first PV
> containing that ID are extracted. At that moment, if the location of that ID on
> that PV is already in the map, the resolution can be completed. If not, it'll
> have to wait (but this is also the case in the current patch). The renderer
> wouldn't even have to worry about which page comes first, because the link trait
> already contains that information.
> 
> If you want me to implement this, I will.

I'm in no position to "want" anything. :-) I can only make wishes and be happy
that there are people around helping us out. But consider the removal of the
idfirsts attribute a wish. *g* Since I'll be leaving for ApacheCon in a few
hours I may not have enough time to finish applying the current patch. If you
could look into the issue until the end of the week that'd be awesome. But don't
worry if you don't make it.

BTW, another wish if I may: Would you please submit an ICLA (and CCLA if
necessary) [2]? Your patch is a borderline case where an ICLA on file becomes
necessary. Thanks a lot! You can submit the document by fax or by sending a
scanned version to BOTH secretary.at.apache.org AND legal-archive.at.apache.org.

[2] http://www.apache.org/licenses/#clas
Comment 5 Paul Vinkenoog 2007-04-30 06:41:56 UTC
(In reply to comment #4)

> I get the impression that it should be possible to do this without breaking
> backwards-compatibility too much. Maybe we should do an inventory of people
> who are maintaining external renderers so we have an idea who would be
> affected and how much.

Yes, if nobody picks up this trait, then we don't have to worry about it either.
Within FOP, only PDFRenderer uses it. Notice that my patch does break custom
renderers at this point, but they can be fixed with a one-line code change. The
only way not to break them at all would be to pass PV key and ID in separate traits.

>> It could possibly also remove the necessity to extend the area tree XML
>> format with the "idfirsts" trait. Just a thought.

> I'm in no position to "want" anything. :-) I can only make wishes and be happy
> that there are people around helping us out. But consider the removal of the
> idfirsts attribute a wish. *g*

Well, I think I can do it this week. The more I look at that attribute, the more
I hate it, so I guess I'm motivated :-)

> Since I'll be leaving for ApacheCon in a few hours I may not have enough time
> to finish applying the current patch.

I wish you all a great time at the conference and in Amsterdam (my hometown). At
least the weather is fine!

> BTW, another wish if I may: Would you please submit an ICLA (and CCLA if
> necessary)

OK, I'll take care of it.
Comment 6 Paul Vinkenoog 2007-05-04 07:08:55 UTC
OK, I've removed the idfirsts attribute from the area tree XML. I found a
simpler and better solution than tracking the ID positions on all pages in the
PDFRenderer though. Since the AT XML is always parsed in document order, all
that's needed is for the AreaTreeParser.Handler to call
currentPageViewport.setFirstWithID() whenever a new PROD_ID is encountered. This
way PageViewport.isFirstWithID() remains usable no matter whether the source is
rendered straight from the XSL-FO or from an XML area tree. Otherwise the effort
to determine the correct PV would have to be duplicated in every interested
renderer.

I'll upload a revised patch against the current source tree.
Comment 7 Paul Vinkenoog 2007-05-04 07:12:56 UTC
Created attachment 20131 [details]
Revised patch, no more idfirsts attribute in area tree XML
Comment 8 Jeremias Maerki 2007-05-07 05:25:28 UTC
Thanks a lot, Paul. I'm quite comfortable with the new patch, and so I applied it:
http://svn.apache.org/viewvc?view=rev&rev=535866

BTW, your ICLA has already been recorded. Thanks for that, too.
Comment 9 Paul Vinkenoog 2007-05-07 06:29:51 UTC
> I'm quite comfortable with the new patch, and so I applied it:

Thanks, I'm very glad that my patch is of use!
Comment 10 Jeremias Maerki 2007-05-11 00:28:44 UTC
*** Bug 41532 has been marked as a duplicate of this bug. ***
Comment 11 Adrian Cumiskey 2007-05-17 08:39:58 UTC
Created attachment 20211 [details]
small patch file

Patch file to fix j2se1.4 dependency.
Comment 12 Adrian Cumiskey 2007-05-17 08:43:27 UTC
I discovered recently that FOP still aims to be JDK1.3 compliant so I updated my
Eclipse IDE environment today and discovered this small J2SE1.4 dependency on
the use of String.split().  This small patch should fix this.
Comment 13 Adrian Cumiskey 2007-05-17 08:45:10 UTC
Created attachment 20212 [details]
small patch file

checkstyled :-)
Comment 14 Jeremias Maerki 2007-05-18 05:00:55 UTC
(In reply to comment #12)
> I discovered recently that FOP still aims to be JDK1.3 compliant so I updated my
> Eclipse IDE environment today and discovered this small J2SE1.4 dependency on
> the use of String.split().  This small patch should fix this.

Thanks for spotting this, Adrian. Looks like I'm not compiling against J2SE 1.3
often enough. Patch applied (this one anyway).
Comment 15 Paul Vinkenoog 2007-05-18 06:55:12 UTC
(In reply to comment #14)
>> I discovered recently that FOP still aims to be JDK1.3 compliant
Sorry, I wasn't aware of this.
>> This small patch should fix this.
It does, but the patched code also throws a StringIndexOutOfBoundsException if
for some reason the attribute value is in brackets but doesn't contain a comma.
I think it's best to add a comma check to the innermost if. This will also bring
the code in line with the description (my original code wasn't, I'm afraid).
So here's an even smaller patch.
Comment 16 Paul Vinkenoog 2007-05-18 06:57:50 UTC
Created attachment 20215 [details]
Patch to check for comma together with brackets
Comment 17 Jeremias Maerki 2007-05-20 23:18:43 UTC
(In reply to comment #16)
> Created an attachment (id=20215) [edit]
> Patch to check for comma together with brackets
> 

Patch applied. Thanks, Paul.
Comment 18 Glenn Adams 2012-04-01 07:06:22 UTC
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed