PDFBox / PDFBOX-1133

Refactoring PDFParser.parseHeader() method

    Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0, 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: Parsing

      Description

      Refactor the parseHeader() method to support an extra header declaration: "%CSO-".

      This header can be found in some XFDF documents, when the stamp appearance stream is defined as a complete COSDocument. This special document uses "%CSO-1.0" as its header declaration.

      I therefore propose enhancing the PDF reader so that it can parse this kind of document.
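The idea can be illustrated with a minimal sketch of a header check that accepts "%CSO-" next to the usual markers. This is not the attached patch and not PDFBox code: the class name `HeaderSniffer`, the method `findHeader`, and the 1024-byte search window are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: finds a header declaration in raw bytes.
class HeaderSniffer {
    // "%PDF-" and "%FDF-" are the usual markers; "%CSO-" is the extra one proposed here.
    private static final List<String> MARKERS = Arrays.asList("%PDF-", "%FDF-", "%CSO-");

    /** Returns the header line (for example "%CSO-1.0"), or null if no marker is found. */
    static String findHeader(byte[] data) {
        // Only look at the start of the data (assumed window size).
        int limit = Math.min(data.length, 1024);
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding.
        String head = new String(data, 0, limit, StandardCharsets.ISO_8859_1);
        for (String marker : MARKERS) {
            int start = head.indexOf(marker);
            if (start >= 0) {
                // The header runs up to the next end-of-line character.
                int end = start;
                while (end < head.length() && head.charAt(end) != '\r' && head.charAt(end) != '\n') {
                    end++;
                }
                return head.substring(start, end).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] sample = "%CSO-1.0\n1 0 obj\n<<>>\nendobj\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findHeader(sample)); // prints "%CSO-1.0"
    }
}
```

Keeping the markers in a list means supporting another declaration is a one-line change, which is the spirit of the proposed refactoring.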

      Pierre Huttin

      1. PDFParser.java.patch
        7 kB
        Pierre Huttin
      2. sample.cdata_decoded.cos
        7 kB
        Pierre Huttin
      3. sample.xfdf
        8 kB
        Pierre Huttin

        Activity

        Pierre Huttin added a comment -

        Patch proposal for the PDF reader enhancement.

        Andreas Lehmkühler added a comment -

        Please provide us with a sample file.

        Pierre Huttin added a comment -

        Sample document that contains a CSO document.

        Pierre Huttin added a comment -

        The file sample.xfdf is the XFDF file, and the file sample.cdata_decoded.cos is the content of the CDATA element from the XFDF document. To decode it, I just converted the hex string to a byte array and inflated the byte array.

        This file was generated by a comment server library from Adobe named manhattan_core.jar.
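The decoding steps described above (hex string to byte array, then inflate) could look roughly like the following sketch. It assumes the data is zlib-wrapped deflate, which the comment does not state explicitly; the class `CdataDecoder` and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: decodes hex-encoded, deflate-compressed CDATA content.
class CdataDecoder {
    /** Converts a hex string (whitespace is ignored) to a byte array. */
    static byte[] hexToBytes(String hex) {
        String clean = hex.replaceAll("\\s+", "");
        if (clean.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[clean.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(clean.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Inflates the data. Assumption: zlib-wrapped deflate, the java.util.zip.Inflater default. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the input is truncated or needs a preset dictionary.
                    throw new IllegalArgumentException("truncated or unsupported deflate data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Round trip on synthetic data, since the real CDATA content is not reproduced here.
        byte[] original = "%CSO-1.0\n".getBytes(StandardCharsets.ISO_8859_1);
        Deflater deflater = new Deflater();
        deflater.setInput(original);
        deflater.finish();
        byte[] buf = new byte[64];
        int n = deflater.deflate(buf);
        deflater.end();
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < n; i++) {
            hex.append(String.format("%02x", buf[i]));
        }
        byte[] decoded = inflate(hexToBytes(hex.toString()));
        System.out.print(new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```

If the real content turned out to be raw deflate without the zlib wrapper, `new Inflater(true)` would be the variant to try instead.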

        John Hewson added a comment -

        An FDF appearance stream is not a PDF file, and PDFParser should not treat it as such, so this patch should not be applied. Without a description of what the actual problem is, I can't propose anything better, so I'm just going to close this issue.

        Pierre Huttin added a comment -

        I agree that an FDF stamp appearance stream is not a real "PDF file", but it can be a complete COSDocument, as in the sample I've attached to this issue. The problem is that PDFBox lacks a real COSDocument parser. This usually comes up with dynamic stamps, such as the standard ones provided by Adobe Reader (Approved by, Reviewed + date, etc.). Also, when you add custom stamps, Adobe Reader asks you to import them as PDF and then includes the content of the PDF file in the appearance attribute of the stamp.

        John Hewson added a comment -

        I'm reopening this issue as an "AcroForm" issue, as based on your comments we could handle FDF import in PDFBox, even if we don't use this specific patch.

        Maruan Sahyoun added a comment -

        Andreas Lehmkühler, I’ve assigned it to you as there is a parsing question with respect to the header. Other than that, I think this could be moved to a later release, potentially together with a look at how we generate/handle annotations, which this feature relates to.


          People

          • Assignee:
            Andreas Lehmkühler
            Reporter:
            Pierre Huttin
          • Votes:
            0
            Watchers:
            3
