41959 – External links are broken if pdf is encryped

Bug 41959 - External links are broken if pdf is encryped

Summary: External links are broken if pdf is encryped

Status:	NEW

Alias:	None

Product:	Fop - Now in Jira
Classification:	Unclassified
Component:	pdf (show other bugs)
Version:	trunk
Hardware:	All All

Importance:	P3 normal
Target Milestone:	---
Assignee:	fop-dev

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-03-27 06:19 UTC by Cornelius Hald
Modified:	2012-11-26 22:16 UTC (History)
CC List:	4 users (show)

Attachments
Proposed way to approach this problem. (30.27 KB, patch) 2008-12-01 02:24 UTC, Jeremias Maerki	Details \| Diff
updated patch (37.56 KB, patch) 2008-12-28 04:32 UTC, Andreas L. Delmelle	Details \| Diff
another update (38.50 KB, patch) 2008-12-28 13:24 UTC, Andreas L. Delmelle	Details \| Diff
Show Obsolete (2) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Cornelius Hald 2007-03-27 06:19:32 UTC

If pdf encryption is turned on links to external resources are scrambled. The
link destination varies with every conversion, so it looks like the decryption
or encryption of the link is not working properly.

Steps to recreate this bug:

1. Create a fop document with a link like this:
<fo:basic-link external-destination="url(http://www.apache.org)">

2. Convert this document to pdf with turned on encryption:
java -jar fop.jar in.fo out.pdf -u secret

Tested with fop 0.92 and 0.93 with Adobe Reader 7.0 and 8.0 under WindowsXP.

Comment 1 Andreas L. Delmelle 2007-03-28 12:15:40 UTC

Confirmed. Running examples/fo/basic/newlinktest.fo even crashed my Adobe Reader just now... :/

No immediate idea on the cause or fix, though.

Comment 2 Andreas L. Delmelle 2008-11-29 05:37:02 UTC

Still present in FOP Trunk

Comment 3 Andreas L. Delmelle 2008-11-30 03:44:59 UTC

Recently reported again on fop-users@ by Peter Coppens, who made progress towards a fix:

---
I have something that seems to be working, but I am not sure it is always
correct and I doubt it's the most optimal approach.

Here are the changes I did

1. PDFLink#toPDFString
Added

       this.action.setParent(this);

(would be better to do this when creating the PDFUri object )

2. PDFUri#getAction

   public String getAction() {
       return "<< /URI (" + uri + ")\n/S /URI >>";

   }


-->

   public String getAction() {
         String uriString=new String(this.encodeText2(uri));

         return "<< /URI " + uriString + "\n /S /URI >>";

       }


3. Added PDFObject#encodeText2


   protected byte[] encodeText2(String text)  {
       if (getDocumentSafely().isEncryptionActive()) {
           final byte[] buf = PDFText.encode(text);
           byte[] enc = getDocument().getEncryption().encrypt(buf, this);
           return PDFText.toHex(enc,true).getBytes();

        } else {
           return encode(PDFText.escapeText(text, false));

        }

   }


Perhaps this can be used as something to start from for a patch for bug
41959 ?

Thanks,

Peter
---

Comment 4 Andreas L. Delmelle 2008-11-30 03:47:49 UTC

Adding my response to Peter's suggestions here too:

---
> Here are the changes I did
>
> 1. PDFLink#toPDFString
> Added
>
>      this.action.setParent(this);
>
> (would be better to do this when creating the PDFUri object )

Or maybe, we should do this in PDFLink#setAction() ?

That seems to be the point where the PDFAction (PDFUri) is associated with the PDFLink.

I already checked whether this had unwanted side-effects when the same URI is used in multiple links, but that does not seem to be the case. The parent link is only used to get to the ancestor document, which is obviously the same for all links.

> 2. PDFUri#getAction
> <snip />
> 3. Added PDFObject#encodeText2

I'm wondering to what extent this new, separate method could be merged with the original encodeText(), maybe by changing the signature, and adding a second parameter to distinguish this case from the one covered by the original method.
---

Comment 5 Andreas L. Delmelle 2008-11-30 14:12:01 UTC

I started playing with it myself, and I had the idea of registering the URI as a separate object, to be referenced by the links.

Not sure, but it seems a cleaner and more comprehensive approach for cases where an identical URI is referenced by many links. 
OTOH, it could have drawbacks, in case a large number of different URIs is referenced by only one link at a time (?)

Changes made in a nutshell:

1. PDFDocument 
  -> added List of uris + accompanying findUriAction() method
2. PDFFactory 
  -> added getUriAction() method, which depends on 1. 
  -> changed getExternalAction() accordingly: replace instantiations to go through getUriAction()
3. PDFUri 
  -> change getAction() implementation to return the PDF object reference (like PDFGoTo does)
  -> change toPDFString() to contain the code contained in Peter's suggested PDFObject.encodeText2()

I'll try to post a patch shortly, but it seems I've made a rather large number of other unrelated changes (mostly minor cleanups) to the code, which would only confuse people...

Comment 6 Jeremias Maerki 2008-12-01 01:05:13 UTC

I'm sorry that I'm a little late here but I fear the approach you both have taken is probably not the best one. Some background: When I write the PDF-in-PDF extension I needed generic PDF structures (arrays, dictionaries etc.) to I could easily transform the PDFBox model into something that can be processed by FOP's PDF library. That's when I introduced PDFArray, PDFDictionary & Co. The thing is: I've only converted some of the existing PDF classes to use the generic structures, yet. Now, fixing the encryption problem only for this special case seems like a bad idea. You may remember we had a similar problem with the PDF outlines (fo:bookmark) in the past. So we're essentially fixing the same bug in different places. A better idea would be to convert the existing PDF classes to use the generic structures when they need to be touched.

Yesterday, I've had some train time (a rare resource lately) and I've converted all the classes related to Actions to use PDFDictionary. That reduces the number of lines of code quite a bit and made the code more readable. I have one remaining problem before my changes are in a state to be committed: I noticed (a little late) that there are two different string types: strings and text strings. String objects are currently treated as if they are "text strings", i.e. allow Unicode. But "strings" (as used by the "URI" dict entry) cannot use Unicode. So I probably have to introduce a "PDFString" class that simply carries a "string" object and handles the encoding (especially in the encryption case) correctly.

BTW, Andreas, the reuse of URI actions may not always be a good idea since the URI alone may not uniquely identify a URI action. There's also a dictionary entry "IsMap" which offers additional functionality. I'm also not entirely sure if this should be a functionality PDFDocument has to offer. This can easily be done outside the PDF library. I'm not saying this is wrong but I wonder if we don't bloat the whole PDF library too much with stuff like this. OTOH, even in PDF 1.7 the URI action doesn't support more than the "URI" and "IsMap" entries, so if your look-up method includes both the parameters, it should be safe.

Comment 7 Peter Coppens 2008-12-01 01:28:52 UTC

I kind of already wondered whether there would not be other cases where this issue would be present (which seems to come from 'forgetting' to invoke an encoding/encryption method before writing out the PDF). I guess that is confirmed now and obviously architectural improvements where it is just no longer possible to create this bug are to be preferred compared to an adhoc fix. That is why I said (..I doubt it's the most optimal approach...).

But for now I am all set (having a local fix on 0.95) and I guess the next release will have the issue fixed (in a better way).

Thanks for moving this forward (and helping me out).

Comment 8 Jeremias Maerki 2008-12-01 02:24:21 UTC

Created attachment 22969 [details]
Proposed way to approach this problem.

I've attached a patch that shows how I would approach the problem. I've done some testing but not enough to simply commit the changes. For example, I haven't checked FileSpecs which I changed, too, as they are also using "string" objects (they would have the same problem when encrypted). I assume Andreas could combine this with his changes. Anyway, a second pair of eyes would be good here. Let me know I should commit the changes.

A more critical part could be the equals() methods that I removed from many classes I converted. I added an equals()/hashcode() pair to PDFDictionary to compensate but I'm not sure if any of this is needed and if it still works as before.

Comment 9 Andreas L. Delmelle 2008-12-03 13:39:11 UTC

(In reply to comment #8)
> Created an attachment (id=22969) [details]
> Proposed way to approach this problem.

Nice! Some time ago, when considering the extension to allow injecting custom PDF dictionaries into the output, I already felt a PDFString object to be absent. 

<snip />
> I assume Andreas could combine this with his changes. 

Yes, seems to be no problem.

> Anyway, a second pair of eyes would be
> good here. Let me know I should commit the changes.
> 
> A more critical part could be the equals() methods that I removed from many
> classes I converted. I added an equals()/hashcode() pair to PDFDictionary to
> compensate but I'm not sure if any of this is needed and if it still works as
> before.
> 

Seems more than OK at the very first glance. Certainly beneficial that the equals()/hashCode() pair is centralized in PDFDictionary. 

For a code-compression competition, maybe equals could be:

public boolean equals(Object o) {
  return this == o
    || (o != null
      && this.getClass() == o.getClass()
      && (this.entries == ((PDFDictionary)o).entries
          || (this.entries != null 
              && this.entries.equals(((PDFDictionary)o).entries))));
}

... but this is more a personal favorite. I'm not sure whether such logical short-circuits actually have any benefit after compilation. I just find them to be clear and concise code-wise... :-)

One thing that could lead to issues, would be when a PDFDictionary can contain PDFObjects that fall back on the default Object.equals(). Not sure if this is a serious issue, but if so, fixing that for the object types in question shouldn't be too difficult. Quickly redeclaring equals() abstract in PDFObject revealed some 19 types that don't provide an override (hence equals() will only return true in case of identity). 
A few of those will be taken care of by your patch. For the others, we should probably first determine if changing them too would make sense (if they can be used in dictionaries)
One type that concerned me, for example, is PDFName, which could act as key in the dictionary. Then I noticed that this is implemented with Java Strings, rather than PDFName objects, so nothing to worry about there. If a PDFName would be used as a value, however, I think we could end up with two dictionaries that have equal-yet-non-identical entries, so entries.equals() would return false because one of the value pairs are of a type that invokes Object.equals().

Comment 10 Andreas L. Delmelle 2008-12-28 04:32:38 UTC

Created attachment 23053 [details]
updated patch


Basically the same as Jeremias' patch-proposal, but with additional hashCode/equals pairs for some other types that are used as values in the dictionaries but are not PDFDictionary subclasses themselves (PDFName, PDFNumber, PDFReference, PDFArray...) .

Still not sure if I got them all. PDFArray can contain any type of Object it seems, so it could well be that I'm still missing some...

Comment 11 Andreas L. Delmelle 2008-12-28 13:24:27 UTC

Created attachment 23054 [details]
another update

Same as the previous, only with the addition of creating only one URI action for each distinct URI. Those are referenced by the links through their PDF object number. A benefit for cases where separate links with the same URI occur in a lot of places in the document. 
Only small thing to add is support for the isMap entry.

Comment 12 Marc Haesen 2009-02-12 09:02:59 UTC

(In reply to comment #11)
> Created an attachment (id=23054) [details]
> another update
> Same as the previous, only with the addition of creating only one URI action
> for each distinct URI. Those are referenced by the links through their PDF
> object number. A benefit for cases where separate links with the same URI occur
> in a lot of places in the document. 
> Only small thing to add is support for the isMap entry.

I had the same problem and I tried this patch. It solved the link to an external web-site problem but a lot of internal links were not working anymore.

I have also tried the patch suggested in Comment #3, and this seems to solve the external link problem without breaking the internal links.

Comment 13 Andreas L. Delmelle 2009-02-22 13:04:28 UTC

(In reply to comment #12)

Apologies for the rather late response...

> 
> I had the same problem and I tried this patch. It solved the link to an
> external web-site problem but a lot of internal links were not working anymore.

Anything particular about the internal links? I assume that by 'a lot of' you mean 'not all'. I just tried a small sample, and could not reproduce this immediately (latest FOP Trunk with the patch applied). Can you provide a small sample document, perhaps?

> I have also tried the patch suggested in Comment #3, and this seems to solve
> the external link problem without breaking the internal links.

That is expected, since Peter's quick fix only touches the code related to external links (PDF URI Actions). The attached patch proposal is a refactoring of all PDF Actions, including internal links, bookmarks etc.

Comment 14 James Kauczka 2011-04-29 15:12:20 UTC

What is the current status of this bug / patch?
Any thoughts on applying Peter's code from comment 3?
I'm still running 0.95 and I haven't found any mention of this being looked at or fixed in 1.0. I'm thinking if I do try Peter's suggestion I should at least move to 1.0 beforehand. I just hate putting a self-code patch in and then worry that any new release might contain a modification that would break is. 
Duck

Comment 15 Glenn Adams 2012-04-07 01:41:29 UTC

resetting P2 open bugs to P3 pending further review

Comment 16 Glenn Adams 2012-04-11 06:17:10 UTC

change status from ASSIGNED to NEW for consistency