  1. PDFBox
  2. PDFBOX-1613

The ability to inject the time/random component into the COSWriter process to write a PDF document allows some advanced signature creation scenarios where the signature is generated on a separate server that does not hold the full PDF document.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.1
    • Fix Version/s: 1.8.3, 2.0.0
    • Component/s: Writing
    • Environment: Any

    Description

      I have developed a prototype server-based signing service for the Swedish national eID infrastructure.
      I'll skip the details, but I recently switched to PDFBox for the PDF signing process and it works great. However, I had to modify the COSWriter class to get this working.

      I'm writing to ask whether you would consider adding the functionality I need to a future version of PDFBox.

      The problem is that the signature service only produces the signature; it is not trusted to handle the PDF document.
      The government service that has the PDF document signed uses PDFBox in a two-step process:

      1) Produce the SignedAttributes DER object of the CMS signature to be created. This is the part that is hashed and signed by the signature service.
      2) After receiving the signature and signature certs from the signature service, complete the PDF signature by delivering the complete PKCS#7 object to PDFBox, using the externally generated signature value and certs.

      There is probably a purer way to handle this, but since PDFBox allows me to create a signature interface that produces the SignedData, I found the easiest way was to run the signature process twice:
      1st pass using a dummy key and dummy certs. This is only to obtain the SignedAttributes.
      2nd pass delivering a SignedData object that includes the signature value and certs produced by the signature service.

      To do this, I have to control the random seed added by the COSWriter, or else the signature created by the signature service will not match the hash in the SignedAttributes produced in the second pass.
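The reason the seed has to match across both passes can be illustrated with a toy model. This is only a sketch under loud assumptions: `TwoPassSketch`, `bytesToSign`, `remoteSign`, `verify`, and `newServiceKey` are hypothetical names, the "document" is a plain string with the seed appended, and a raw RSA signature from `java.security` stands in for the CMS SignedAttributes handling. None of it is PDFBox API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

class TwoPassSketch {

    /** Toy stand-in for "write the document and return the bytes that get signed". */
    static byte[] bytesToSign(String content, long idSeed) {
        // The seed ends up in the written bytes, just as the ID ends up in the PDF trailer.
        return (content + "/ID:" + idSeed).getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Stand-in for the remote signature service: it only sees the bytes to sign. */
    static byte[] remoteSign(KeyPair serviceKey, byte[] toSign) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(serviceKey.getPrivate());
            signer.update(toSign);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks whether the signature matches the bytes produced by a later pass. */
    static boolean verify(KeyPair serviceKey, byte[] signedBytes, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(serviceKey.getPublic());
            verifier.update(signedBytes);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Generates a throwaway RSA key pair for the demonstration. */
    static KeyPair newServiceKey() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair key = newServiceKey();
        long seed = 1370000000000L; // arbitrary fixed seed shared by both passes

        // Pass 1: produce the bytes to sign and send them to the service.
        byte[] signature = remoteSign(key, bytesToSign("example document", seed));

        // Pass 2: the bytes are regenerated; the signature only fits if the seed is the same.
        System.out.println(verify(key, bytesToSign("example document", seed), signature));
        System.out.println(verify(key, bytesToSign("example document", seed + 1), signature));
    }
}
```

With the same seed the second pass reproduces the signed bytes and verification succeeds; with a different seed the bytes differ and verification fails.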

      My modification is provided below.
      I simply added an extra input parameter to the write function through which I can provide the long seed.

      I then added a backwards-compatible write function where the long seed is the current time.

      By providing the same seed to pass 1 and pass 2, I can get the externally created signature to match the SignedAttributes produced in the first pass.
      The write function below is identical to the original COSWriter function except that it takes the idTime value from the function input parameter instead of getting it from System.currentTimeMillis().

      Modified functions of COSWriter:

      /**
       * This will write the pdf document.
       *
       * @param doc The document to write.
       *
       * @throws COSVisitorException If an error occurs while generating the data.
       */
      public void write(PDDocument doc) throws COSVisitorException {
          write(doc, System.currentTimeMillis());
      }

      /**
       * This will write the pdf document.
       *
       * @param doc The document to write.
       * @param idTime The time seed used to generate the id
       *
       * @throws COSVisitorException If an error occurs while generating the data.
       */
      public void write(PDDocument doc, long idTime) throws COSVisitorException {
          document = doc;
          if (incrementalUpdate) {
              prepareIncrement(doc);
          }

          // if the document says we should remove encryption, then we shouldn't encrypt
          if (doc.isAllSecurityToBeRemoved()) {
              this.willEncrypt = false;
              // also need to get rid of the "Encrypt" in the trailer so readers
              // don't try to decrypt a document which is not encrypted
              COSDocument cosDoc = doc.getDocument();
              COSDictionary trailer = cosDoc.getTrailer();
              trailer.removeItem(COSName.ENCRYPT);
          } else {
              SecurityHandler securityHandler = document.getSecurityHandler();
              if (securityHandler != null) {
                  try {
                      securityHandler.prepareDocumentForEncryption(document);
                      this.willEncrypt = true;
                  } catch (IOException e) {
                      throw new COSVisitorException(e);
                  } catch (CryptographyException e) {
                      throw new COSVisitorException(e);
                  }
              } else {
                  this.willEncrypt = false;
              }
          }

          COSDocument cosDoc = document.getDocument();
          COSDictionary trailer = cosDoc.getTrailer();
          COSArray idArray = (COSArray) trailer.getDictionaryObject(COSName.ID);
          if (idArray == null || incrementalUpdate) {
              try {
                  // algorithm says to use time/path/size/values in doc to generate
                  // the id. We don't have path or size, so do the best we can
                  MessageDigest md = MessageDigest.getInstance("MD5");

                  md.update(Long.toString(idTime).getBytes("ISO-8859-1"));
                  COSDictionary info = (COSDictionary) trailer.getDictionaryObject(COSName.INFO);
                  if (info != null) {
                      Iterator<COSBase> values = info.getValues().iterator();
                      while (values.hasNext()) {
                          md.update(values.next().toString().getBytes("ISO-8859-1"));
                      }
                  }
                  idArray = new COSArray();
                  COSString id = new COSString(md.digest());
                  idArray.add(id);
                  idArray.add(id);
                  trailer.setItem(COSName.ID, idArray);
              } catch (NoSuchAlgorithmException e) {
                  throw new COSVisitorException(e);
              } catch (UnsupportedEncodingException e) {
                  throw new COSVisitorException(e);
              }
          }
          cosDoc.accept(this);
      }
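The ID derivation inside the method above can be isolated as a small standalone sketch, which makes it easy to see that the same seed and the same info values always yield the same ID. The class name `DocumentIdSketch`, the helper `deriveId`, and the sample info values are assumptions made for illustration; plain strings stand in for the COS objects, so this is not the COSWriter code itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

class DocumentIdSketch {

    /** MD5 over the decimal seed followed by each info value, mirroring the method above. */
    static byte[] deriveId(long idTime, List<String> infoValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(Long.toString(idTime).getBytes(StandardCharsets.ISO_8859_1));
            for (String value : infoValues) {
                md.update(value.getBytes(StandardCharsets.ISO_8859_1));
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> info = Arrays.asList("Example title", "Example author"); // made-up values
        long seed = 1370000000000L; // arbitrary fixed seed

        byte[] pass1 = deriveId(seed, info);
        byte[] pass2 = deriveId(seed, info);
        byte[] other = deriveId(seed + 1, info);

        System.out.println("same seed, same ID:      " + Arrays.equals(pass1, pass2));
        System.out.println("different seed, same ID: " + Arrays.equals(pass1, other));
    }
}
```

Because the seed is the only input that changes between two writes of an otherwise identical document, fixing it is enough to make the ID reproducible in this model.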

      Finally, the way I use this in my signature process is through the altered static function saveIncremental below, which is derived from the saveIncremental method of the PDDocument class.
      Since this function is static, I just call this duplicate instead of the one in the PDDocument class.
      Here I use my altered COSWriter (CsCOSWriter).

      /**
       * Save the pdf as incremental. This method is a modification of the same
       * method of PDDocument. This method uses an altered COSWriter that allows
       * control over the time used to create the ID of the document. This way it
       * is possible to perform two consecutive signature generation passes that
       * produce the same document hash.
       *
       * @param doc The document being written with signature creation
       * @param input An input file stream of the document being written
       * @param output An output file stream for the result document
       * @param idTime The time in milliseconds from Jan 1st, 1970 GMT when the
       * signature is created. This time is also used to calculate the ID of the
       * document.
       * @throws IOException if something went wrong
       * @throws COSVisitorException if something went wrong
       */
      public static void saveIncremental(PDDocument doc, FileInputStream input, OutputStream output, long idTime) throws IOException, COSVisitorException {
          // update the count in case any pages have been added behind the scenes.
          doc.getDocumentCatalog().getPages().updateCount();
          CsCOSWriter writer = null;
          try {
              // Sometimes the original file will be missing a newline at the end
              // In order to avoid having %%EOF the first object on the same line
              // as the %%EOF, we put a newline here. If there's already one at
              // the end of the file, an extra one won't hurt. PDFBOX-1051
              output.write("\r\n".getBytes());
              writer = new CsCOSWriter(output, input);
              writer.write(doc, idTime);
              writer.close();
          } finally {
              if (writer != null) {
                  writer.close();
              }
          }
      }

          Activity

            Thomas Chojecki (tchojecki) added a comment:

            Fixed in revision 1488049.

            I've made some changes to the code. PDDocument now has a new getter and setter, getDocumentId() and setDocumentId(Long id). If this value isn't null, it will be used for the ID generation in the COSWriter, so it should be more intuitive to use (I hope).

            If this isn't what you were looking for, let me know. I will also reply to your mail; please be patient.

            Stefan Santesson (razumain) added a comment:

            Hmm,

            It will work, but I'm not sure this is a desirable way to do it.

            Let me explain, and then I'll let you decide what to do.

            The default behaviour is that each time you create a document from the PDDocument object, it gets a unique ID.
            My intention was not to change that.

            If you set the documentId, you practically turn off that behaviour, and from then on the PDDocument object will use a static ID.
            If this PDDocument object is used over time as the document source for some reason, it will always use the same ID.

            I don't know whether this may lead to undesirable and unintended effects for other users of PDDocument. Perhaps not.

            So what I asked for was just a way to inject the ID into the document writing process at the moment of writing, when and only if that makes sense.
            Just consider that and do what you feel is best. I can certainly live with the update as you have done it.

            /Stefan

            Thomas Chojecki (tchojecki) added a comment:

            Maybe I'm missing a use case, or it was just late and I made a mistake.

            Here is the relevant part of the code from COSWriter:
            Long idTime = doc.getDocumentId() == null ? System.currentTimeMillis() : doc.getDocumentId();

            If documentId was not set (the default case), we use the current timestamp; otherwise we use the documentId that the user set.
            The document ID will only be fixed if someone sets the documentId through the setter; otherwise it will always be derived from the timestamp.

            The normal use case is creating or loading a PDF. After the document is altered and saved, a new ID is generated each time.
            The only case where the ID stays fixed is when the user sets this ID through the setter and saves the document. Normally a user will save the document once. After that, they may open the document again and create a new PDDocument object, which will use the default behavior (current timestamp) and not the fixed one.

            If a user uses the same PDDocument object for saving more than once (I don't know a reason for this), they can set the documentId to null ( #setDocumentId(null) ) to get the old behavior back. I think this setter turns on a specific feature, so the user will know what they are doing.

            Please correct me if I'm wrong. I have no feeling for whether this is good or bad, but it makes it possible to set an ID using only the convenience classes.

            Your original code injects the ID in the COSWriter, which is a low-level API call. Normally a user does not call COSWriter.write(pdDocument) directly. To do so, they would need to understand what happens behind the scenes ( #saveIncremental(....) ).

            Best regards
            Thomas
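The fallback described in the comment above can be modeled in isolation. In this minimal sketch, `Doc` is a hypothetical stand-in that only carries the optional seed (it is not the real PDDocument), and `resolveIdTime` is an assumed helper that takes the clock value as a parameter so the behaviour can be checked deterministically.

```java
class DocumentIdFallbackSketch {

    /** Minimal stand-in for the document; it only models the optional ID seed. */
    static final class Doc {
        private Long documentId; // null means "not set"

        Long getDocumentId() {
            return documentId;
        }

        void setDocumentId(Long id) {
            documentId = id;
        }
    }

    /** Same decision as the quoted COSWriter line, with the clock value passed in. */
    static long resolveIdTime(Doc doc, long now) {
        Long id = doc.getDocumentId();
        return id == null ? now : id;
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        long now = System.currentTimeMillis();

        // Default case: nothing set, so the clock value is used.
        System.out.println(resolveIdTime(doc, now) == now);

        // Seed set: the fixed value is used for every save of this object.
        doc.setDocumentId(1370000000000L);
        System.out.println(resolveIdTime(doc, now) == 1370000000000L);

        // Resetting to null restores the default behaviour.
        doc.setDocumentId(null);
        System.out.println(resolveIdTime(doc, now) == now);
    }
}
```

The sketch reflects the design point made in the thread: the fixed seed is opt-in per document object, and clearing it returns to timestamp-based IDs.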

            Stefan Santesson (razumain) added a comment:

            Great,

            I accept your reasoning.

            What I was unsure about was the case you mention, where a user saves the same PDDocument object more than once.

            I wasn't sure whether some implementations did that or not. I don't know the product as well as you do.
            Hearing you say that this is not a normal use case makes the issue go away.

            I'm happy with the update.

            /Stefan

            Andreas Lehmkühler (lehmi) added a comment:

            Merged into 1.8-branch in revision 1542711.

            Andreas Lehmkühler (lehmi) added a comment:

            Closed after releasing 1.8.3.

            People

              Assignee: Thomas Chojecki (tchojecki)
              Reporter: Stefan Santesson (razumain)