Fop
  1. Fop
  2. FOP-1348

[PATCH] Exact positioning of PDF internal links

    Details

    • Type: Bug Bug
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: None
    • Component/s: pdf
    • Labels:
      None
    • Environment:
      Operating System: All
      Platform: All
    • External issue ID:
      42067

      Description

      This patch re-implements the full internal link functionality in the PDF output.
      Bookmarks, basic-links and named destinations again point to the exact spot
      instead of the top of the page. The intermediate area tree XML format is also
      supported (except for named destinations).

      The target PageViewport is still determined during the layout phase; the on-page
      position is calculated in PDFRenderer. Note that the code could be greatly
      simplified if we moved everything to the renderer. This wouldn't break anything
      in FOP, but it would prevent custom renderers from getting the target
      PageViewport key. The patch itself also breaks renderers who pick up that key
      (as the type of the InternalLink trait has changed), but not beyond repair: all
      that's needed is a minor code change wherever that trait is read and the caller
      expects a String.

      The vast majority of changes are in PDFRenderer, PDFFactory, XMLRenderer and
      AreaTreeParser. But some other modifications were also required (or at least
      highly desired). The most important of them are:

      • PageViewport needs to know for what IDs it is the first PV, independently of
        ATH. This info is needed in the renderer.
      • The dummy area generated by a fo:wrapper must be added to the area tree, at
        least if it has an ID. (DocBook generates empty wrappers for <indexterm>s.)
        [Note: I kept this a "generic" InlineArea. But maybe it would be better to
        create a subclass, e.g. DummyInlineArea.]
      • Two extra constructors for BookmarkData, for stuff read back from the area
        tree XML. Also made the tagging of subdata as "unresolved" conditional, because
        bookmarks read from AT XML are already resolved.
      • To avoid code duplication (and divergence), the actual adding of an
        INTERNAL_LINK trait to an area is now done exclusively in the LinkResolver.
      • Created an additional constructor for PDFDestination.
      • New method PDFState.getBaseTransform() returns the transformation to be used
        for fixed-positioned areas.

      I also added some helper functions here and there to move duplicated code into
      (esp. in PDFFactory). And on at least two occasions I changed existing comments
      because they puzzled me rather than helping me understand the purpose of the
      commented-upon object. Whether all these changes are acceptable is of course up
      to the team. If any changes or extensions are required, I'm willing to provide them.

      1. Trait.diff
        1 kB
        Adrian Cumiskey
      2. Trait.diff
        1 kB
        Adrian Cumiskey
      3. internal-destination.patch
        78 kB
        Paul Vinkenoog
      4. internal-destination.patch
        83 kB
        Paul Vinkenoog
      5. check-comma.patch
        0.8 kB
        Paul Vinkenoog

        Issue Links

          Activity

          Hide
          Paul Vinkenoog added a comment -

          Attachment internal-destination.patch has been added with description: Proposed patch file

          Show
          Paul Vinkenoog added a comment - Attachment internal-destination.patch has been added with description: Proposed patch file
          Hide
          Jeremias Maerki added a comment -

          First of all, thanks a lot for working on this patch, Paul. I've looked at it
          during the last two days since I realized I couldn't really apply my own changes
          [1] in this area without processing your patch first because of possible
          conflicts. Anyway, in my own tests and your patch locally applied, none of the
          GoTo Actions get properly positioned since PageViewport.setFirstWithID() is
          never called (except from the AreaTreeParser which is only active in special
          environments. PDFRenderer.getTargetableID() never returns an ID because those
          "first IDs" are not filled. Can you comment on that?

          I'm wondering if it weren't simpler just to track "first IDs" inside PDFRenderer
          only, i.e. whenever an area with an ID is encountered its position is recorded.
          If later the same ID is recorded but from a page logically before the earlier
          page (out-of-order rendering), the position would be overwritten. It could
          possibly also remove the necessity to extend the area tree XML format with the
          "idfirsts" trait. Just a thought.

          [1] Just FYI, my changes I mentioned above are targeted at restoring proper
          package dependencies, as the pdf package currently relies on the area package
          which is bad if we plan to extract the PDF library to XML Graphics Commons at
          some point.

          Show
          Jeremias Maerki added a comment - First of all, thanks a lot for working on this patch, Paul. I've looked at it during the last two days since I realized I couldn't really apply my own changes [1] in this area without processing your patch first because of possible conflicts. Anyway, in my own tests and your patch locally applied, none of the GoTo Actions get properly positioned since PageViewport.setFirstWithID() is never called (except from the AreaTreeParser which is only active in special environments. PDFRenderer.getTargetableID() never returns an ID because those "first IDs" are not filled. Can you comment on that? I'm wondering if it weren't simpler just to track "first IDs" inside PDFRenderer only, i.e. whenever an area with an ID is encountered its position is recorded. If later the same ID is recorded but from a page logically before the earlier page (out-of-order rendering), the position would be overwritten. It could possibly also remove the necessity to extend the area tree XML format with the "idfirsts" trait. Just a thought. [1] Just FYI, my changes I mentioned above are targeted at restoring proper package dependencies, as the pdf package currently relies on the area package which is bad if we plan to extract the PDF library to XML Graphics Commons at some point.
          Hide
          Paul Vinkenoog added a comment -

          Hi Jeremias,

          > Anyway, in my own tests and your patch locally applied, none of the
          > GoTo Actions get properly positioned since PageViewport.setFirstWithID()
          > is never called (except from the AreaTreeParser which is only active in
          > special environments.

          It's also called from AreaTreeHandler.associateIDWithPageViewport (near the top
          of the patch file). Maybe this didn't make it into your source because there are
          several conflicts with a more recent commit. They're all trivial though; easy to
          resolve.

          > I'm wondering if it weren't simpler just to track "first IDs" inside
          > PDFRenderer only, i.e. whenever an area with an ID is encountered its
          > position is recorded. If later the same ID is recorded but from a page
          > logically before the earlier page (out-of-order rendering), the position
          > would be overwritten.

          Yes, this was one of the options I presented on fop-dev while I was working on
          this. It would make PDFRenderer more complicated, but you could kick the entire
          link resolution stuff out of the layout code (PV, ATH), so overall the code
          would become simpler. The problem is that this would break other (custom)
          renderers that expect the INTERNAL_LINK trait to contain the target PV key.

          > It could possibly also remove the necessity to extend the area tree XML
          > format with the "idfirsts" trait. Just a thought.

          I take it you don't like that attribute. Neither do I. It's ugly, and every time
          I look at it I get the feeling that it shouldn't be necessary. But it was the
          best I could come up with given the chosen approach (= resolve to page level in
          layout phase, determine exact location in renderer). There's a workaround
          possible though (in PDFRenderer): store the first occurrence of each ID on every
          PV in a map (with the ID as key, and a list of PV->location maps as value).
          Whenever an internal link trait is parsed, the ID plus the key of the first PV
          containing that ID are extracted. At that moment, if the location of that ID on
          that PV is already in the map, the resolution can be completed. If not, it'll
          have to wait (but this is also the case in the current patch). The renderer
          wouldn't even have to worry about which page comes first, because the link trait
          already contains that information.

          If you want me to implement this, I will.

          Show
          Paul Vinkenoog added a comment - Hi Jeremias, > Anyway, in my own tests and your patch locally applied, none of the > GoTo Actions get properly positioned since PageViewport.setFirstWithID() > is never called (except from the AreaTreeParser which is only active in > special environments. It's also called from AreaTreeHandler.associateIDWithPageViewport (near the top of the patch file). Maybe this didn't make it into your source because there are several conflicts with a more recent commit. They're all trivial though; easy to resolve. > I'm wondering if it weren't simpler just to track "first IDs" inside > PDFRenderer only, i.e. whenever an area with an ID is encountered its > position is recorded. If later the same ID is recorded but from a page > logically before the earlier page (out-of-order rendering), the position > would be overwritten. Yes, this was one of the options I presented on fop-dev while I was working on this. It would make PDFRenderer more complicated, but you could kick the entire link resolution stuff out of the layout code (PV, ATH), so overall the code would become simpler. The problem is that this would break other (custom) renderers that expect the INTERNAL_LINK trait to contain the target PV key. > It could possibly also remove the necessity to extend the area tree XML > format with the "idfirsts" trait. Just a thought. I take it you don't like that attribute. Neither do I. It's ugly, and every time I look at it I get the feeling that it shouldn't be necessary. But it was the best I could come up with given the chosen approach (= resolve to page level in layout phase, determine exact location in renderer). There's a workaround possible though (in PDFRenderer): store the first occurrence of each ID on every PV in a map (with the ID as key, and a list of PV->location maps as value). Whenever an internal link trait is parsed, the ID plus the key of the first PV containing that ID are extracted. At that moment, if the location of that ID on that PV is already in the map, the resolution can be completed. If not, it'll have to wait (but this is also the case in the current patch). The renderer wouldn't even have to worry about which page comes first, because the link trait already contains that information. If you want me to implement this, I will.
          Hide
          Jeremias Maerki added a comment -

          (In reply to comment #3)
          > Hi Jeremias,
          >
          > > Anyway, in my own tests and your patch locally applied, none of the
          > > GoTo Actions get properly positioned since PageViewport.setFirstWithID()
          > > is never called (except from the AreaTreeParser which is only active in
          > > special environments.
          >
          > It's also called from AreaTreeHandler.associateIDWithPageViewport (near the top
          > of the patch file). Maybe this didn't make it into your source because there are
          > several conflicts with a more recent commit. They're all trivial though; easy to
          > resolve.

          You're right. Looks like I clicked too quickly through the "apply patch" dialog.

          <snip/>

          > > I'm wondering if it weren't simpler just to track "first IDs" inside
          > > PDFRenderer only, i.e. whenever an area with an ID is encountered its
          > > position is recorded. If later the same ID is recorded but from a page
          > > logically before the earlier page (out-of-order rendering), the position
          > > would be overwritten.
          >
          > Yes, this was one of the options I presented on fop-dev while I was working on
          > this. It would make PDFRenderer more complicated, but you could kick the entire
          > link resolution stuff out of the layout code (PV, ATH), so overall the code
          > would become simpler. The problem is that this would break other (custom)
          > renderers that expect the INTERNAL_LINK trait to contain the target PV key.

          Well, that's me not being able to catch up with everything lately. I get the
          impression that it should be possible to do this without breaking
          backwards-compatibility too much. Maybe we should do an inventory of people who
          are maintaining external renderers so we have an idea who would be affected and
          how much.

          > > It could possibly also remove the necessity to extend the area tree XML
          > > format with the "idfirsts" trait. Just a thought.
          >
          > I take it you don't like that attribute. Neither do I. It's ugly, and every time
          > I look at it I get the feeling that it shouldn't be necessary. But it was the
          > best I could come up with given the chosen approach (= resolve to page level in
          > layout phase, determine exact location in renderer). There's a workaround
          > possible though (in PDFRenderer): store the first occurrence of each ID on every
          > PV in a map (with the ID as key, and a list of PV->location maps as value).
          > Whenever an internal link trait is parsed, the ID plus the key of the first PV
          > containing that ID are extracted. At that moment, if the location of that ID on
          > that PV is already in the map, the resolution can be completed. If not, it'll
          > have to wait (but this is also the case in the current patch). The renderer
          > wouldn't even have to worry about which page comes first, because the link trait
          > already contains that information.
          >
          > If you want me to implement this, I will.

          I'm in no position to "want" anything. I can only make wishes and be happy
          that there are people around helping us out. But consider the removal of the
          idfirsts attribute a wish. g Since I'll be leaving for ApacheCon in a few
          hours I may not have enough time to finish applying the current patch. If you
          could look into the issue until the end of the week that'd be awesome. But don't
          worry if you don't make it.

          BTW, another wish if I may: Would you please submit an ICLA (and CCLA if
          necessary) [2]? Your patch is a borderline case where an ICLA on file becomes
          necessary. Thanks a lot! You can submit the document by fax or by sending a
          scanned version to BOTH secretary.at.apache.org AND legal-archive.at.apache.org.

          [2] http://www.apache.org/licenses/#clas

          Show
          Jeremias Maerki added a comment - (In reply to comment #3) > Hi Jeremias, > > > Anyway, in my own tests and your patch locally applied, none of the > > GoTo Actions get properly positioned since PageViewport.setFirstWithID() > > is never called (except from the AreaTreeParser which is only active in > > special environments. > > It's also called from AreaTreeHandler.associateIDWithPageViewport (near the top > of the patch file). Maybe this didn't make it into your source because there are > several conflicts with a more recent commit. They're all trivial though; easy to > resolve. You're right. Looks like I clicked too quickly through the "apply patch" dialog. <snip/> > > I'm wondering if it weren't simpler just to track "first IDs" inside > > PDFRenderer only, i.e. whenever an area with an ID is encountered its > > position is recorded. If later the same ID is recorded but from a page > > logically before the earlier page (out-of-order rendering), the position > > would be overwritten. > > Yes, this was one of the options I presented on fop-dev while I was working on > this. It would make PDFRenderer more complicated, but you could kick the entire > link resolution stuff out of the layout code (PV, ATH), so overall the code > would become simpler. The problem is that this would break other (custom) > renderers that expect the INTERNAL_LINK trait to contain the target PV key. Well, that's me not being able to catch up with everything lately. I get the impression that it should be possible to do this without breaking backwards-compatibility too much. Maybe we should do an inventory of people who are maintaining external renderers so we have an idea who would be affected and how much. > > It could possibly also remove the necessity to extend the area tree XML > > format with the "idfirsts" trait. Just a thought. > > I take it you don't like that attribute. Neither do I. It's ugly, and every time > I look at it I get the feeling that it shouldn't be necessary. But it was the > best I could come up with given the chosen approach (= resolve to page level in > layout phase, determine exact location in renderer). There's a workaround > possible though (in PDFRenderer): store the first occurrence of each ID on every > PV in a map (with the ID as key, and a list of PV->location maps as value). > Whenever an internal link trait is parsed, the ID plus the key of the first PV > containing that ID are extracted. At that moment, if the location of that ID on > that PV is already in the map, the resolution can be completed. If not, it'll > have to wait (but this is also the case in the current patch). The renderer > wouldn't even have to worry about which page comes first, because the link trait > already contains that information. > > If you want me to implement this, I will. I'm in no position to "want" anything. I can only make wishes and be happy that there are people around helping us out. But consider the removal of the idfirsts attribute a wish. g Since I'll be leaving for ApacheCon in a few hours I may not have enough time to finish applying the current patch. If you could look into the issue until the end of the week that'd be awesome. But don't worry if you don't make it. BTW, another wish if I may: Would you please submit an ICLA (and CCLA if necessary) [2] ? Your patch is a borderline case where an ICLA on file becomes necessary. Thanks a lot! You can submit the document by fax or by sending a scanned version to BOTH secretary.at.apache.org AND legal-archive.at.apache.org. [2] http://www.apache.org/licenses/#clas
          Hide
          Paul Vinkenoog added a comment -

          (In reply to comment #4)

          > I get the impression that it should be possible to do this without breaking
          > backwards-compatibility too much. Maybe we should do an inventory of people
          > who are maintaining external renderers so we have an idea who would be
          > affected and how much.

          Yes, if nobody picks up this trait, then we don't have to worry about it either.
          Within FOP, only PDFRenderer uses it. Notice that my patch does break custom
          renderers at this point, but they can be fixed with a one-line code change. The
          only way not to break them at all would be to pass PV key and ID in separate traits.

          >> It could possibly also remove the necessity to extend the area tree XML
          >> format with the "idfirsts" trait. Just a thought.

          > I'm in no position to "want" anything. I can only make wishes and be happy
          > that there are people around helping us out. But consider the removal of the
          > idfirsts attribute a wish. g

          Well, I think I can do it this week. The more I look at that attribute, the more
          I hate it, so I guess I'm motivated

          > Since I'll be leaving for ApacheCon in a few hours I may not have enough time
          > to finish applying the current patch.

          I wish you all a great time at the conference and in Amsterdam (my hometown). At
          least the weather is fine!

          > BTW, another wish if I may: Would you please submit an ICLA (and CCLA if
          > necessary)

          OK, I'll take care of it.

          Show
          Paul Vinkenoog added a comment - (In reply to comment #4) > I get the impression that it should be possible to do this without breaking > backwards-compatibility too much. Maybe we should do an inventory of people > who are maintaining external renderers so we have an idea who would be > affected and how much. Yes, if nobody picks up this trait, then we don't have to worry about it either. Within FOP, only PDFRenderer uses it. Notice that my patch does break custom renderers at this point, but they can be fixed with a one-line code change. The only way not to break them at all would be to pass PV key and ID in separate traits. >> It could possibly also remove the necessity to extend the area tree XML >> format with the "idfirsts" trait. Just a thought. > I'm in no position to "want" anything. I can only make wishes and be happy > that there are people around helping us out. But consider the removal of the > idfirsts attribute a wish. g Well, I think I can do it this week. The more I look at that attribute, the more I hate it, so I guess I'm motivated > Since I'll be leaving for ApacheCon in a few hours I may not have enough time > to finish applying the current patch. I wish you all a great time at the conference and in Amsterdam (my hometown). At least the weather is fine! > BTW, another wish if I may: Would you please submit an ICLA (and CCLA if > necessary) OK, I'll take care of it.
          Hide
          Paul Vinkenoog added a comment -

          OK, I've removed the idfirsts attribute from the area tree XML. I found a
          simpler and better solution than tracking the ID positions on all pages in the
          PDFRenderer though. Since the AT XML is always parsed in document order, all
          that's needed is for the AreaTreeParser.Handler to call
          currentPageViewport.setFirstWithID() whenever a new PROD_ID is encountered. This
          way PageViewport.isFirstWithID() remains usable no matter whether the source is
          rendered straight from the XSL-FO or from an XML area tree. Otherwise the effort
          to determine the correct PV would have to be duplicated in every interested
          renderer.

          I'll upload a revised patch against the current source tree.

          Show
          Paul Vinkenoog added a comment - OK, I've removed the idfirsts attribute from the area tree XML. I found a simpler and better solution than tracking the ID positions on all pages in the PDFRenderer though. Since the AT XML is always parsed in document order, all that's needed is for the AreaTreeParser.Handler to call currentPageViewport.setFirstWithID() whenever a new PROD_ID is encountered. This way PageViewport.isFirstWithID() remains usable no matter whether the source is rendered straight from the XSL-FO or from an XML area tree. Otherwise the effort to determine the correct PV would have to be duplicated in every interested renderer. I'll upload a revised patch against the current source tree.
          Hide
          Paul Vinkenoog added a comment -

          Attachment internal-destination.patch has been added with description: Revised patch, no more idfirsts attribute in area tree XML

          Show
          Paul Vinkenoog added a comment - Attachment internal-destination.patch has been added with description: Revised patch, no more idfirsts attribute in area tree XML
          Hide
          Jeremias Maerki added a comment -

          Thanks a lot, Paul. I'm quite comfortable with the new patch, and so I applied it:
          http://svn.apache.org/viewvc?view=rev&rev=535866

          BTW, your ICLA has already been recorded. Thanks for that, too.

          Show
          Jeremias Maerki added a comment - Thanks a lot, Paul. I'm quite comfortable with the new patch, and so I applied it: http://svn.apache.org/viewvc?view=rev&rev=535866 BTW, your ICLA has already been recorded. Thanks for that, too.
          Hide
          Paul Vinkenoog added a comment -

          > I'm quite comfortable with the new patch, and so I applied it:

          Thanks, I'm very glad that my patch is of use!

          Show
          Paul Vinkenoog added a comment - > I'm quite comfortable with the new patch, and so I applied it: Thanks, I'm very glad that my patch is of use!
          Hide
          Jeremias Maerki added a comment -
              • FOP-1305 has been marked as a duplicate of this bug. ***
          Show
          Jeremias Maerki added a comment - FOP-1305 has been marked as a duplicate of this bug. ***
          Hide
          Adrian Cumiskey added a comment -

          Patch file to fix j2se1.4 dependency.

          Show
          Adrian Cumiskey added a comment - Patch file to fix j2se1.4 dependency.
          Hide
          Adrian Cumiskey added a comment -

          Attachment Trait.diff has been added with description: small patch file

          Show
          Adrian Cumiskey added a comment - Attachment Trait.diff has been added with description: small patch file
          Hide
          Adrian Cumiskey added a comment -

          I discovered recently that FOP still aims to be JDK1.3 compliant so I updated my
          Eclipse IDE environment today and discovered this small J2SE1.4 dependency on
          the use of String.split(). This small patch should fix this.

          Show
          Adrian Cumiskey added a comment - I discovered recently that FOP still aims to be JDK1.3 compliant so I updated my Eclipse IDE environment today and discovered this small J2SE1.4 dependency on the use of String.split(). This small patch should fix this.
          Hide
          Adrian Cumiskey added a comment -

          checkstyled

          Show
          Adrian Cumiskey added a comment - checkstyled
          Hide
          Adrian Cumiskey added a comment -

          Attachment Trait.diff has been added with description: small patch file

          Show
          Adrian Cumiskey added a comment - Attachment Trait.diff has been added with description: small patch file
          Hide
          Jeremias Maerki added a comment -

          (In reply to comment #12)
          > I discovered recently that FOP still aims to be JDK1.3 compliant so I updated my
          > Eclipse IDE environment today and discovered this small J2SE1.4 dependency on
          > the use of String.split(). This small patch should fix this.

          Thanks for spotting this, Adrian. Looks like I'm not compiling against J2SE 1.3
          often enough. Patch applied (this one anyway).

          Show
          Jeremias Maerki added a comment - (In reply to comment #12) > I discovered recently that FOP still aims to be JDK1.3 compliant so I updated my > Eclipse IDE environment today and discovered this small J2SE1.4 dependency on > the use of String.split(). This small patch should fix this. Thanks for spotting this, Adrian. Looks like I'm not compiling against J2SE 1.3 often enough. Patch applied (this one anyway).
          Hide
          Paul Vinkenoog added a comment -

          (In reply to comment #14)
          >> I discovered recently that FOP still aims to be JDK1.3 compliant
          Sorry, I wasn't aware of this.
          >> This small patch should fix this.
          It does, but the patched code also throws a StringIndexOutOfBoundsException if
          for some reason the attribute value is in brackets but doesn't contain a comma.
          I think it's best to add a comma check to the innermost if. This will also bring
          the code in line with the description (my original code wasn't, I'm afraid).
          So here's an even smaller patch.

          Show
          Paul Vinkenoog added a comment - (In reply to comment #14) >> I discovered recently that FOP still aims to be JDK1.3 compliant Sorry, I wasn't aware of this. >> This small patch should fix this. It does, but the patched code also throws a StringIndexOutOfBoundsException if for some reason the attribute value is in brackets but doesn't contain a comma. I think it's best to add a comma check to the innermost if. This will also bring the code in line with the description (my original code wasn't, I'm afraid). So here's an even smaller patch.
          Hide
          Paul Vinkenoog added a comment -

          Attachment check-comma.patch has been added with description: Patch to check for comma together with brackets

          Show
          Paul Vinkenoog added a comment - Attachment check-comma.patch has been added with description: Patch to check for comma together with brackets
          Hide
          Jeremias Maerki added a comment -

          (In reply to comment #16)
          > Created an attachment (id=20215) [edit]
          > Patch to check for comma together with brackets
          >

          Patch applied. Thanks, Paul.

          Show
          Jeremias Maerki added a comment - (In reply to comment #16) > Created an attachment (id=20215) [edit] > Patch to check for comma together with brackets > Patch applied. Thanks, Paul.
          Hide
          Glenn Adams added a comment -

          batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed

          Show
          Glenn Adams added a comment - batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed

            People

            • Assignee:
              fop-dev
              Reporter:
              Paul Vinkenoog
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development