PDFBOX-912

PDF signing interface and improvements

    Details

      Description

      This is a first version of a signing interface for PDFBox. There are some design issues I could not handle without rewriting too much of the code.

      Here we go:

      • incremental update support (tested for signatures with PDF/A compatibility); not compatible with encrypted documents or with xref streams
      • COS object improvements
        • COSString: ability to force writing a given string in hexadecimal form
        • COSBase: ability to write an object directly into a dictionary (if this is set, no indirect object is written; needed for incremental update to reduce the number of indirect objects)
        • COSBase: ability to force writing an object (this hook helps the COSWriter write the objects needed for an incremental update)
        • COSName: new names added
        • COSDocument: some getters and setters for handling the new signature and incremental update features
      • SignatureException with some exceptions for handling the new possible errors
      • Parser improvements
        • PDFParser now saves the position of the last startxref
        • VisualSignatureParser (hook for parsing visual signature templates; it is only for a prepared visualisation that should be merged with the document)
      • IO improvements
        • COSFilterInput helps to find the proper content that should be hashed / signed
        • COSStandardOutputStream is tricky; it helps the writer jump to the right place in the document
        • COSWriter got some improvements for incremental update
        • COSWriterXRefEntry, needed for incremental updates and for writing the new xref table
      • PDDocument
        • got a new method addSignature with the needed implementation (does the whole signature work)
        • cleanup
      • Fields and Annotations
        • PDSignature represents the signature dictionary
        • PDSignatureField / Annotation are the visible and invisible signature representations
      • Signature Interface and options
        • SignatureInterface, the interface that shall be implemented for proper signing
        • SignatureOptions, some additional options for signing
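
To illustrate the COSString item above: a PDF string can be written either as a literal string in parentheses or as a hexadecimal string in angle brackets, and a flag that forces hex output selects the second form. The sketch below is not the patch's code; `PdfStringForms` and its two methods are hypothetical names used only to show the two serializations.

```java
// Hypothetical helper, not part of PDFBox: shows the two ways a PDF string can be serialized.
class PdfStringForms {

    // Literal form: bytes between parentheses, with '(', ')' and '\' escaped by a backslash.
    // For brevity this sketch passes all other bytes through unchanged.
    static String literal(byte[] data) {
        StringBuilder sb = new StringBuilder("(");
        for (byte b : data) {
            char c = (char) (b & 0xFF);
            if (c == '(' || c == ')' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Hexadecimal form: two hex digits per byte between angle brackets.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }
}
```

The hex form has a predictable length of two characters per byte plus the two brackets, which is convenient when space for a value has to be reserved before the value is known.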

      The patch is split into pieces.

      Sorry for the spelling; I didn't have a spellchecker for English.

      1. writer_patch.txt
        28 kB
        Thomas Chojecki
      2. test.pdf
        23 kB
        Thomas Chojecki
      3. test_signed.pdf
        61 kB
        Thomas Chojecki
      4. signature_exception_patch.txt
        3 kB
        Thomas Chojecki
      5. parser_patch.txt
        11 kB
        Thomas Chojecki
      6. main_documents_patch.txt
        20 kB
        Thomas Chojecki
      7. io_patch.txt
        6 kB
        Thomas Chojecki
      8. interface_options_patch.txt
        2 kB
        Thomas Chojecki
      9. fields_annotations_patch.txt
        17 kB
        Thomas Chojecki
      10. cos_object_improvement_patch.txt
        7 kB
        Thomas Chojecki

          Activity

          Ralf Hauser added a comment -

          see also PDFBOX-1842 re the combination of signing and encryption

          Adam added a comment -

          I don't personally need that library, but I'd encourage you to propose
          it on the PDFBOX developer mailing list. All the people I talked to on
          there were very helpful in giving feedback on how to do it in a way
          which would be most useful to others.

          From what I have experienced, if you need an open source library to do
          something, the best thing to do is:
          1.) Ask on the developer's mailing list if the feature already exists
          2.) Ask for suggestions on where/how the feature should be added
          3.) Whether or not you get a response to either of the inquiries above, just
          implement the feature yourself and post the patches to the mailing list.
          The worst-case scenario is that the patches don't get accepted, but the
          library does what you need it to do, and the patches are in the mailing
          list archives for other people who really need the changes.

          Best of luck in your coding & signatures.

          Regards,
          Adam

          vakhtang koroghlishvili added a comment -

          As far as I know, at the moment, if you need a digital signature with a visible signature, you must provide a template stream containing the visible field (an image, text, or something like that). I've just written a library for PDFBox. You just create one class; you can set the image, image size, image zoom, image location and so on, and in the end you have a template (with a fake signature) containing the visible signature. You can then use that template to create a digital signature with a visible signature too. Do you need that library? You just set an image and everything else happens automatically. This way there are no template files (a temporary template is created in memory in the background, and that stream is added automatically too). Is this a good approach? What do you think? I can write that code.

          Jeffrey Dare added a comment -

          Thanks a lot Thomas!!

          I was able to sign based on the example given. But is there a direct API in PDFBox that takes the certificate and signs without using Bouncy Castle?

          Regards,
          Jeff

          Thomas Chojecki added a comment -

          Hi Jeffrey,
          please look at the comment from 4 Apr 2011. There you should find a link to an older implementation for signing with Bouncy Castle. Maybe someone can add this to the PDFBox example package.

          Best regards
          Thomas

          Jeffrey Dare added a comment -

          Hi,

          Can someone please add an example of how to digitally sign a PDF? I was not able to find one anywhere on the net. I have a certificate, and I need to sign the PDF with it.

          Thanks for your help!
          Jeff

          Fanny PUAUD added a comment -

          Great ! Thank you very much for the response.
          I will wait for the release !
          Best regards
          Fanny

          Thomas Chojecki added a comment -

          Yes, it will be available in the next release (1.6.0).

          I think the next release will come very soon, maybe this month.

          Best regards
          Thomas

          Fanny PUAUD added a comment -

          Hello,
          Will this new feature be available in the next release, 1.6.0?
          By the way, do you know when the next release will be available? I didn't find the information.
          Thank you for the response.
          Regards,
          Fanny Puaud

          Adam Nichols added a comment -

          Thanks for adding the license headers, Andreas, and thanks for the feedback, Thomas. I'll leave this open until we're sure that recompiling will fix your project, but it looks like this is pretty much all wrapped up.

          Andreas Lehmkühler added a comment -

          I added the missing license header to 4 files in revision 1095174.

          The affected files were:

          org/apache/pdfbox/pdfwriter/COSFilterInputStream.java
          org/apache/pdfbox/pdmodel/interactive/digitalsignature/SignatureInterface.java
          org/apache/pdfbox/pdmodel/interactive/digitalsignature/SignatureOptions.java
          org/apache/pdfbox/pdfparser/VisualSignatureParser.java

          Thomas Chojecki added a comment -

          I checked out the head of the project and compiled PDFBox. I used the generated jar to test my sample, and it works like a charm.
          I also tried replacing my old PDFBox 1.3 snapshot in the signing implementation at work with this one, and that failed with "java.lang.IncompatibleClassChangeError". I think I need to recompile the project at work and test it again.

          But for now, Adam, I would like to thank you for including these changes in the head of PDFBox. I think this wasn't so easy.

          My next plan is to solve some small open issues I found. I think this issue, PDFBOX-912, can be closed for now, and I will open another one for additional improvements.

          Adam Nichols added a comment -

          ...and the files I forgot to commit in the first batch were committed by revision 1092858. I'm pretty sure I got all the changes for this task, and none of the changes I made for the conforming parser that I'm working on. I even checked out the head tag and made sure it compiled, all of the test cases passed, and used the jar file in the SignExample program to make sure everything works. I really don't want to be "the guy who broke the build"! If someone could just verify that I did everything properly, we can go ahead and mark this as closed.

          Adam Nichols added a comment -

          Updated/merged with the latest code locally, tested with the example linked above (thank you Thomas) and my custom pdfbox.jar lib, and verified the results using Adobe Reader 9 (Linux). Everything worked exactly as expected. Committed to CVS in revision 1092855

          Thomas Chojecki added a comment -

          The following example demonstrates a simple signature as specified in ISO 32000-1:2008.

          I zipped my Eclipse example folder with all needed libs and an older PDFBox 1.3 snapshot (I can't currently build the latest version; Maven can't handle my last update).

          All needed sample files, a self-signed PKCS#12 keystore and other stuff can be found in the resources folder. All needed libraries are in the libs folder.

          Have fun

          http://media-nation.de/~rayman2200/PDFBox-SignExample.zip

          Thomas Chojecki added a comment -

          Hi Adam,
          sorry for the late response.

          I can't provide the original implementation due to copyright, so I need to write a new implementation.

          For example, I used the following imports from BC:
          import org.bouncycastle.asn1.cms.Attribute;
          import org.bouncycastle.asn1.cms.AttributeTable;
          import org.bouncycastle.asn1.ess.ESSCertID;
          import org.bouncycastle.asn1.ess.ESSCertIDv2;
          import org.bouncycastle.asn1.ess.SigningCertificate;
          import org.bouncycastle.asn1.ess.SigningCertificateV2;
          import org.bouncycastle.asn1.x509.AlgorithmIdentifier;
          import org.bouncycastle.cms.CMSException;
          import org.bouncycastle.cms.CMSProcessableByteArray;
          import org.bouncycastle.cms.CMSSignedData;
          import org.bouncycastle.cms.CMSSignedDataGenerator;
          import org.bouncycastle.cms.SignerId;
          import org.bouncycastle.cms.SignerInformation;

          But the org.bouncycastle.asn1.ess.* imports are only needed for CAdES / PAdES signatures. So from BC you need the BCProvider and the SMIME jar.

          Best regards
          Thomas

          PS:
          I hope I can provide a small example for basic signatures in the next few days. Maybe a zip with all the files (without libs) you need for creating a signature.

          Adam Nichols added a comment -

          Thomas, would you be able to provide a sample signature class so we can run some end-to-end tests? I'd like to see what classes are being used from BC (e.g. CmsEnvelopedDataStreamGenerator, BinaryWriter, etc.).

          Adam Nichols added a comment - - edited

          Reviewed the remaining patch files. In VisualSignatureParser::skipToNextObj() it looks like the loop will be infinite if both if statements are false: read in data, unread data, loop. Why unread it? Everything else seems to be okay.

          Can you give us a class which implements the signature interface and a class we can use to test signing and verifying the signatures? This would help for regression testing as well.

          Thomas Chojecki added a comment - - edited

          Hi Adam, and a happy new year.
          1. The user can't pass a COSFilterInputStream, because they don't know the ByteRange needed to parametrize the stream.
          2. Memory efficiency isn't easy with the crypto library BouncyCastle (BC). I can't find a method that accepts an InputStream of the content that should be signed. Only an OutputStream is offered. Really crappy, because the output is a small signature. I don't know how to fix this without modifying BC. That's why I recommend providing an additional jar file for the implementation, because there isn't a good way to use the code on cellphones or mobile devices.

          Optimization
          1. Good work, the RandomAccess file will help a lot in saving some memory.
          To look inside a PDF you don't need to decompress it. Use a hex editor and look at the end of the file for startxref and the offset [1]. Then you can jump to that offset and you will find the xref table [2].

          2. See item 2 above. But you're right, the people involved can answer it better.

          3. Documents with many objects are mostly linearized, because the xref table grows and xref streams are compressed. Also, every web-optimized document is linearized. The benefit is that the document is split into chunks and the browser doesn't need to load the whole document to display the current page.

          To implement this xref stream, the developer needs some knowledge of PNG compression.
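
The remark about PNG compression presumably refers to the PNG predictor functions that PDF can apply to stream data before Flate compression, and which cross-reference streams commonly use. As a minimal sketch under that assumption, the following undoes the PNG "Up" filter on rows that have already been inflated. `PngUpPredictor` is a hypothetical name, not a PDFBox class, and only filter type 2 is handled.

```java
// Hypothetical helper, not part of PDFBox: reverses the PNG "Up" filter (type 2).
class PngUpPredictor {

    // Each input row is one filter-type byte followed by `columns` data bytes.
    // With the Up filter, every data byte stores the difference to the byte directly
    // above it in the previous decoded row (the row above the first row is all zeros).
    static byte[] decode(byte[] filtered, int columns) {
        int rowLength = columns + 1;
        if (filtered.length % rowLength != 0) {
            throw new IllegalArgumentException("input is not a whole number of rows");
        }
        int rows = filtered.length / rowLength;
        byte[] out = new byte[rows * columns];
        for (int r = 0; r < rows; r++) {
            int filterType = filtered[r * rowLength] & 0xFF;
            if (filterType != 2) {
                throw new IllegalArgumentException("this sketch only handles PNG Up (type 2)");
            }
            for (int c = 0; c < columns; c++) {
                int above = (r == 0) ? 0 : out[(r - 1) * columns + c] & 0xFF;
                int delta = filtered[r * rowLength + 1 + c] & 0xFF;
                out[r * columns + c] = (byte) (above + delta); // addition modulo 256
            }
        }
        return out;
    }
}
```

A writer would apply the inverse step (subtracting the row above) before compressing, which is why a signing implementation that has to emit xref streams needs this kind of code as well.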

          Attachment
          [1] http://media-nation.de/~rayman2200/pdf1.png
          [2] http://media-nation.de/~rayman2200/pdf2.png
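
The hex-editor procedure from Optimization item 1 above can also be done in code. The sketch below searches the last 1024 bytes of a file for the final startxref keyword and parses the number after it; the class name `StartXrefLocator` and the 1024-byte window are assumptions made for illustration, not something taken from the patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: finds the offset stored after the last "startxref".
class StartXrefLocator {

    // Returns the byte offset written after the last "startxref" keyword, or -1 if none is found.
    static long findStartXref(byte[] pdf) {
        int tailStart = Math.max(0, pdf.length - 1024);
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        String tail = new String(pdf, tailStart, pdf.length - tailStart, StandardCharsets.ISO_8859_1);
        int keyword = tail.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
            i++;
        }
        int digitsStart = i;
        while (i < tail.length() && tail.charAt(i) >= '0' && tail.charAt(i) <= '9') {
            i++;
        }
        if (i == digitsStart) {
            return -1;
        }
        return Long.parseLong(tail.substring(digitsStart, i));
    }
}
```

For a document with a classic cross-reference table, the bytes at the returned offset should spell the xref keyword, which is a quick sanity check on the raw, undecoded file.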

          Adam Nichols added a comment -

          Danke, und glückliches neues Jahr (thanks, and happy new year)

          Re: don't pass the int[] to the signature interface, it'll make it more complex than necessary
          1. In retrospect, I agree that passing in the int[] seems like a bit much. But we could pass in a COSFilterInputStream instead, this keeps it simple since they can just read from the stream and not be bothered by the fact that it is split into multiple chunks.
          2. The reason to pass it in as a stream is that it is more memory efficient. Some devices simply don't have the memory available, and the ones that do have lots of memory don't want to use the memory unnecessarily.

          Optimization
          1. I've started working on a conforming parser which will create a RandomAccess object. The current problem I'm having is I can't figure out what the PDF spec means when it says "the byte offset in the decoded stream from the beginning of the file to the beginning of the xref keyword in the last cross-reference section." in reference to startxref (section 7.5.5). I tried using various tools to decompress PDFs, but the startxref value doesn't match up to the offset where the xref table starts. Once I figure out what needs to be decoded and how to get that offset, I should be able to get moving quickly on the conforming parser. It's just that this one little problem is really holding me up.
          2. I'd imagine so, but there's not much code there and it's only called if you're actually signing or verifying a signature, so there's very little overhead by keeping it within PDFBox. I remember there was talk about splitting up PDFBox into different sections so it'd be smaller (mainly for devices with limited memory like cellphones). The people involved in that would be the best ones to answer this.
          3. Good to know. Are linearized documents very common? I don't think I've run into them yet. Either way, it's better to have limited support for signatures than no support.

          Thomas Chojecki added a comment -

          First of all, merry Xmas and a happy new year.
          Sorry for the late answer. I relocated this month and am still waiting for my internet connection.

          @ Adam Nichols added a comment - 17/Dec/10 05:26 AM
          Passing the int[] isn't a good idea.
          1. The signature interface should be as abstract as possible, so that it helps the user plug in a crypto library as easily as possible.
          2. The byte range can only be calculated inside PDFBox. Why inform the signing library about the byte range and make things harder by having to implement a filter input stream for each implementation?

          Adam Nichols added a comment - 23/Dec/10 11:14 AM
          setContents(new byte[...]) is set to that large size because we used the implementation with some signature cards that had a large certificate chain.
          The number of bytes can be set lower, or, better, the certificate chain size can be calculated before writing it into the signature.

          Another way is to give PDFBox a size for the signature via the SignatureOptions class, so the user can set their own size.

          Sorry for the German comments inside the code. There is so much code that I can't handle it myself without comments, and German comments are the easiest and fastest way for me. But I will write English comments for new code.

          PS: your German seems to be good.

          Optimization:
          1. The method saveIncremental(...) should accept the same parameters as the save() method. The reason I used an input and an output stream as parameters is that I need to read the whole file again to write the signature at the right place. There is no way to mix an input and an output stream, so maybe a random access file is a better solution.

          2. Can the signature implementation be its own jar file? Not everyone needs this implementation in PDFBox. I think the incremental update will not work in the regular code at all.

          3. The signature implementation doesn't work with encrypted or linearized documents. For linearized documents an xref stream writer is needed as well. Encrypted documents need to be decrypted before signing.

          I hope you can understand my English; I'm on my second Xmas drink (Glühwein / Eierpunsch) now and can hardly write.
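
The random-access idea from Optimization item 1 above can be sketched as follows: the document is first written with a zero-filled hex placeholder, and the real signature is then written over the start of that placeholder in place, without a second input/output stream pair. `PlaceholderPatcher`, its methods and the sample content are hypothetical and are not the patch's saveIncremental code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of PDFBox: overwrites a reserved hex placeholder in place.
class PlaceholderPatcher {

    // Writes the hex form of `signature` at `offset`; the rest of the reserved
    // area keeps the '0' padding that was written with the document.
    static void patch(Path file, long offset, int reservedHexChars, byte[] signature) {
        StringBuilder hex = new StringBuilder();
        for (byte b : signature) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        if (hex.length() > reservedHexChars) {
            throw new IllegalArgumentException("signature does not fit the reserved space");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(hex.toString().getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small demonstration on a temporary file; returns the patched file content.
    static String demo() {
        try {
            Path file = Files.createTempFile("placeholder", ".bin");
            try {
                Files.write(file, "/Contents <0000000000>".getBytes(StandardCharsets.US_ASCII));
                // The first hex digit sits at offset 11, right after "/Contents <".
                patch(file, 11, 10, new byte[] {0x0A, (byte) 0xFF});
                return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the placeholder has a fixed length, patching it does not shift any byte offsets, so cross-reference entries written earlier remain valid.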

          Show
          Thomas Chojecki added a comment - first of all, merry xmas and a happy new year. sry for the late answer. i'm relocated this month and still waiting for my internet. @ Adam Nichols added a comment - 17/Dec/10 05:26 AM to pass the int[] isn't a good idea. 1. the signature interface should be abstract as possible. so it should help the user to implement a crypto library as easy as possible. 2. the byterange can only be calculated inside the pdfbox, why inform the sign library about the byterange and make it harder to implement a filterinputstream. for each implementation. Adam Nichols added a comment - 23/Dec/10 11:14 AM setConents(new byte [...] ) is set to the large size because we used the implementation for some signature cards that used a large certificate chain. the amount of bytes can be set lower or a better way is to calculate the certificate chain size before writing it down into the signature. a other way is to give the pdfbox a size for the signature via the class signatureoptions. so the user can set his own size. Sry for the german comments inside the code. there is so much code i can't handle it self without comments and the easiest and fastest way are german comments for me But i will do some english comments for new code. ps: your german seams to be good Optimization: 1. The method saveIncremental(...) should accept the same param as the save() method. The cause why i used a input and output stream for params is, because i need to read the whole file again for writing the signature on the right place. There is no way to mix a input/outputstream so maybe a random access file is a better solution. 2. can the signature implemention be an own jar file? not every person need this implementation for the pdfbox. the incremental update i thing will not working in the regular code at all. 3. the signature implementation doesn't work with encrypted or with linearized documents. for linearized documents there need to be a xrefstream writer as well. 
encrypted documents need to be decrypted before signing it. hope you can understand my english got my secound xmas drink (glühwein / eierpunsch) now and can hardly write.
          Adam Nichols added a comment -

          Reviewed COSDocument and PDDocument. In PDDocument there's a line that says "sigObject.setContents(new byte[0x2500 * 2 + 2]);" which uses up a large amount of memory, which simply isn't going to work on systems with limited memory available (e.g. cellphones, tablet computers, or perhaps even netbooks, laptops, desktops and servers). I need to read more from the spec to figure out a way to make this into a stream as opposed to holding everything in memory all at once. Once I figure out how to use streams for signing, I'll also be converting the comments to English so more people can read them ("Mein Deutsch ist nicht so gut", i.e. my German is not so good).
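
          For reference, the size of that placeholder can be worked out directly. The sketch below assumes that the "* 2 + 2" stands for two hex digits per signature byte plus the two delimiters of a hex string; that reading is an assumption, and the class and method names are hypothetical.

```java
/** Hypothetical helper (not part of PDFBox) for sizing the /Contents placeholder. */
class SignaturePlaceholder {
    /**
     * Bytes reserved for a signature of at most maxSignatureBytes,
     * assuming 2 hex digits per byte plus the '<' and '>' delimiters.
     */
    static int reservedBytes(int maxSignatureBytes) {
        return maxSignatureBytes * 2 + 2;
    }
}
```

          Under that assumption, SignaturePlaceholder.reservedBytes(0x2500) evaluates to 9472 * 2 + 2 = 18946 bytes.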

          Adam Nichols added a comment -

          Just going down the list of patched files... SignatureException looks fine.

          Adam Nichols added a comment -

          I looked over the changes to COSWriter and COSWriterXRefEntry and everything looks OK except for one thing which bothers me. In doWriteSignature(COSDocument doc) there is the following line: byte[] pdfContent = bytes.toByteArray(); This is going to consume a great deal of memory if the PDF is large. I see that's passed into SignatureInterface, so it seems that we should be able to pass the input stream, along with the integer array which is used by the COSFilterInputStream, to the SignatureInterface and then just read the data in there a chunk at a time. Of course this would change the API of the SignatureInterface, but seeing as signing non-sequential sections seems to be common (see the ByteRange key on page 468 of ISO32000-1:2008), this seems like a reasonable thing to do. If anyone has any comments either for or against changing the SignatureInterface to take an InputStream and int[], speak now or forever hold your peace.

          I made trivial updates such as removing code which was added but commented out, expanding some comments to reference sections in the PDF spec (ISO32000-1:2008), changing the indentation of the new code to 4 spaces (to be consistent with the old code), and other non-functional changes. I'll continue to review each line of these enhancements as I have time. I'm going to hold off on committing until everything is reviewed, then I'll commit it all at once (assuming someone else doesn't beat me to it).
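
          A rough sketch of what reading "a chunk at a time" from an InputStream plus int[] could look like is given below. It is not the SignatureInterface from the patch: the class ChunkedRangeDigest and its method are hypothetical, it only computes a message digest (a real signer would feed the same chunks to its crypto library), and it assumes the ranges are offset/length pairs in ascending, non-overlapping order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of PDFBox): digests byte ranges without buffering the file. */
class ChunkedRangeDigest {
    static byte[] digest(InputStream in, int[] byteRange, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192]; // fixed-size chunk; memory use does not grow with the PDF
        long pos = 0;                // absolute position in the stream

        for (int i = 0; i + 1 < byteRange.length; i += 2) {
            long start = byteRange[i];
            long remaining = byteRange[i + 1];

            // Read and discard everything between the previous range and this one.
            while (pos < start) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, start - pos));
                if (n < 0) {
                    throw new IOException("Stream ended before offset " + start);
                }
                pos += n;
            }

            // Feed the range to the digest one chunk at a time.
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException("Stream ended inside a byte range");
                }
                md.update(buf, 0, n);
                remaining -= n;
                pos += n;
            }
        }
        return md.digest();
    }
}
```

          The only state kept between reads is the 8 KB buffer and the digest itself, which is the point of the proposal; the trade-off is that every signer has to understand the byte range.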

          Andreas Lehmkühler added a comment -

          IMO we need to run some more tests before we can include that stuff.

          Thomas Chojecki added a comment -

          OK, sorry. I have only worked with an ITIL Jira, where an issue has to be sent to the product manager first.

          The PDFBox architecture isn't a good base for incremental updates. The posted solution needs to be reworked to support real incremental updates.
          The COSWriter needs a starting object; in the case of a signature it is the root dictionary, followed by the fields, the annotation and the signature dictionary.

          I didn't test it with document updates such as adding a new page incrementally. It can work if all objects between the root and the new object are marked with the method needToBeUpdate() (this method indicates that an existing object was modified and needs to be written again in the incremental update), but I don't have much hope that this will work right now. The worst case is that no update is done at all.

          The next thing is that the whole document needs to be read fully before doing an incremental update, so this won't save memory. This is because we need to know which objects are new and need to be written, and which are old and should be ignored.

          The whole code is only optimized for signing.

          The main reason for keeping the signing code separate is that there are many crypto libraries out there, and some people don't want to use only open source libraries like Bouncy Castle for their signing solution. Maybe it would be a good subproject.

          Please wait until tomorrow, and please implement it on a new branch at first. The code needs some optimization before it is really stable.

          I will comment on some things tomorrow, but I need sleep right now.
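
          The marking scheme described in this comment (flag every object between the root and the changed object, then write only the flagged ones) can be sketched as follows. This is a simplified model under stated assumptions: PdfNode and IncrementalMarker are hypothetical classes, not PDFBox COS classes, and the object graph is reduced to a tree with parent links.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a PDF object; not a PDFBox COS class. */
class PdfNode {
    final String name;
    final PdfNode parent;  // null for the root dictionary
    boolean needsUpdate;   // plays the role of the flag described in the comment

    PdfNode(String name, PdfNode parent) {
        this.name = name;
        this.parent = parent;
    }
}

class IncrementalMarker {
    /** Flags the changed object and every ancestor up to and including the root. */
    static void markPathToRoot(PdfNode changed) {
        for (PdfNode n = changed; n != null; n = n.parent) {
            n.needsUpdate = true;
        }
    }

    /** Returns the names of the objects an incremental writer would emit, in input order. */
    static List<String> objectsToWrite(List<PdfNode> all) {
        List<String> out = new ArrayList<>();
        for (PdfNode n : all) {
            if (n.needsUpdate) {
                out.add(n.name);
            }
        }
        return out;
    }
}
```

          In this model, adding a signature field under the form dictionary flags the field, the form dictionary and the root, while untouched objects such as the page tree are left out of the update.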

          Adam Nichols added a comment -

          We'll mark the issue as resolved when the code is committed to SVN. Leaving it open reminds us that we still need to get these patches committed.

          I don't have time to look into all the changes right now, but I'm looking forward to the incremental support. That should help reduce memory usage when merging two documents together (I'd expect roughly a 50% reduction in memory used if both files are the same size, since the first file wouldn't require any modifications at all).

          I'm also looking forward to seeing how the signing is done and seeing if we can have an implementation included in PDFBox instead of just an interface. It doesn't really make sense for each of us to implement the same functions outside of PDFBox on our own. Obviously not all possible methods of signing, but at least some common ones.

          Thomas Chojecki added a comment -

          Forgot to comment.

          This code was used in beta and production environments, based on a PDFBox 1.2 snapshot, for about 3 months and worked just fine.

          I have added some sample files, original and signed.

          The keystore that was used for this signature is outdated but works fine.


            People

            • Assignee:
              Adam Nichols
              Reporter:
              Thomas Chojecki
            • Votes:
              0
              Watchers:
              6
