PDFBox / PDFBOX-1056

Integration of a PDF/A validator in PDFBox

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.0
    • Component/s: Utilities
    • Labels:
      None

      Description

      We (Atos Worldline) are donating our PDF/A validator to the PDFBox project. The product is based on PDFBox and a JavaCC parser. It was already distributed under the Apache License 2 before this donation. Its current name is padaf.
      Padaf complies with the Isartor test suite.
      This version depends on the standard PDFBox 1.5.0 release. Only one test class does not compile against the current HEAD (as of 27 June); all other test cases pass.

      Padaf is composed of two modules:

      • preflight: the validator
      • xmpbox: another implementation of an XMP parser and writer. We made that choice because we did not have the time to propose all the necessary modifications to jempbox.

      The attached tarball contains:

      • sources of the two modules
      • JUnit tests for each one
      • a parent (that will soon disappear) already depending on pdfbox-parent

      SHA1 checksums for each attached file:
      b9bb323fa73e1416a8b282fe2a687cebf1ac2145 padaf-apache.tgz
      e9e5fb05105799b9884be0ae6c060323aed3211a pdfbox160.patch
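
A checksum like the ones above can be verified with `sha1sum`, or with a few lines of Java. The following is a minimal sketch rather than anything shipped with padaf or PDFBox; the class name `Sha1Check` and its methods are hypothetical and use only the standard `java.security.MessageDigest` API.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of padaf or PDFBox.
class Sha1Check {

    /** Returns the lowercase hexadecimal SHA-1 digest of the given bytes. */
    public static String sha1Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Compares the digest of the data with a published checksum, ignoring case. */
    public static boolean matches(byte[] data, String expectedHex) {
        return sha1Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: Sha1Check <file> <expected-sha1>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(data, args[1]) ? "OK" : "MISMATCH");
    }
}
```

For example, running it with `padaf-apache.tgz` and the first checksum listed above as arguments should print `OK` if the download is intact.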

      Attachments

      1. stax.patch (0.6 kB, Guillaume Bailleul)
      2. rat-report.txt (47 kB, Andreas Lehmkühler)
      3. pdfbox160.patch (4 kB, Guillaume Bailleul)
      4. padaf-apache.tgz (239 kB, Guillaume Bailleul)

        Activity

        Guillaume Bailleul added a comment -

        patch on padaf-apache to make it work with PDFBox 1.6.0

        Andreas Lehmkühler added a comment -

        Please add an MD5 or a SHA1 checksum for the tarball so that everybody can check its integrity.

        Andreas Lehmkühler added a comment -

        I've started with checking the poms to see if there are any problematic dependencies because of their license and I've found two!

        <dependency>
          <groupId>com.lowagie</groupId>
          <artifactId>itext</artifactId>
          <version>2.1.7</version>
          <scope>test</scope>
        </dependency>

        I couldn't find any code which uses itext. Did I miss something, or is this dependency just obsolete?

        <dependency>
          <groupId>org.pdfa</groupId>
          <artifactId>isartor</artifactId>
          <version>1.0-20080813</version>
          <scope>test</scope>
        </dependency>

        We have to ensure that this artifact won't be a part of the released software, as the included test files are not available under a suitable license.

        Guillaume Bailleul added a comment -

        The dependency on itext is obsolete.
        It was only used for tests at the beginning of the project to check some odd behaviour with PDFBox 0.8; we did not want to generate and test PDFs with the same product.
        I just built the project without this dependency and it worked, so it can be removed.

        The isartor dependency is an artifact we created that contains the Isartor PDF/A test files.
        The only class using it is a JUnit test: org.apache.padaf.preflight.TestIsartorValidationFromClasspath
        I think the right approach is to remove the dependency and the JUnit test class.
        I will propose a new version of the test that downloads the external resources, as is already done for some other things in the PDFBox build.
        What do you think?

        Andreas Lehmkühler added a comment -

        I had a similar idea. We should make the test optional so that every developer can download the test files if needed. We must ensure that these files won't be part of our releases. We can discuss the details once the source is in our svn.

        Guillaume Bailleul added a comment -

        Here is some information on the SHA1 software I used:

        I ran sha1sum on a Fedora 13 operating system:

        [yugui@desktop ~]$ sha1sum --v
        sha1sum (GNU coreutils) 8.4
        Copyright (C) 2010 Free Software Foundation, Inc.
        License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
        This is free software: you are free to change and redistribute it.
        There is NO WARRANTY, to the extent permitted by law.

        Written by Ulrich Drepper, Scott Miller, and David Madore.

        [yugui@desktop ~]$ uname -a
        Linux desktop 2.6.34.7-56.fc13.x86_64 #1 SMP Wed Sep 15 03:36:55 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

        Andreas Lehmkühler added a comment -

        I checked all files for suitable license headers using RAT [1]. Please find the resulting rat-report attached. Everything looks fine. There are only a few files left, which can be updated after the checkin.

        [1] http://incubator.apache.org/rat/

        Guillaume Bailleul added a comment -

        These files should not be checked in:

        ./padaf-parent/.classpath
        ./padaf-parent/.project

        They are leftovers from preparing the donation.

        Andreas Lehmkühler added a comment -

        As there were no objections to the IP-Clearance process [1], I added the donated code to the pdfbox repository.

        In detail, I completed the following subtasks:

        • added xmpbox in revision 1150371
        • added preflight in revision 1150373
        • didn't add empty directories and the parent folder
        • integrated the two modules into the pdfbox build (it works but there are still some improvements to do)
        • added missing license headers
        • excluded some binary files from the rat-report
        • excluded the isartor files from the pom (we need to find another place to store them)
        • modified the isartor test case so that it won't fail without input
        • updated the LICENSE and NOTICE files
        • removed the java6 dependency by adding activation and stax as maven dependencies

        [1] http://incubator.apache.org/ip-clearance/pdfbox-padaf.html
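
The idea of a test that does not fail without input can be sketched in plain Java as below. This is only an illustration under stated assumptions, not the actual change made to the isartor test case: the class name `OptionalIsartorInput` and the system property `isartor.dir` are hypothetical. In a JUnit 4 test, the same decision could be expressed with `org.junit.Assume.assumeTrue`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the real preflight test class.
class OptionalIsartorInput {

    /** Hypothetical system property naming a directory of locally downloaded test files. */
    static final String DIR_PROPERTY = "isartor.dir";

    /** Returns the PDF files in the directory, or an empty list if it is unset or missing. */
    public static List<File> findPdfs(String dirPath) {
        if (dirPath == null) {
            return Collections.emptyList();
        }
        File[] files = new File(dirPath).listFiles(
                (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".pdf"));
        if (files == null) {
            // listFiles returns null when the path is not a readable directory.
            return Collections.emptyList();
        }
        List<File> result = new ArrayList<>(Arrays.asList(files));
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<File> pdfs = findPdfs(System.getProperty(DIR_PROPERTY));
        if (pdfs.isEmpty()) {
            // No input available: report and return instead of failing.
            System.out.println("No test files found, skipping validation");
            return;
        }
        for (File pdf : pdfs) {
            System.out.println("Would validate " + pdf.getName());
        }
    }
}
```

Keeping the files outside the source tree and locating them through a property is one way to make sure they never end up in a release archive.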

        Guillaume Bailleul added a comment -

        I propose a patch for the xmpbox test problem.
        Including only stax-api is not enough, because an implementation is also needed.
        In our Java 5 compilation, we used stax from codehaus (which is open source, Apache License 2).
        The file stax.patch makes the replacement.
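
Whether a StAX implementation is actually available at runtime can be checked with the standard `javax.xml.stream` API. The sketch below is an illustration only; the class name `StaxCheck` is hypothetical. `XMLInputFactory.newInstance()` throws `FactoryConfigurationError` when no implementation can be located, which is the situation that can arise when only the API jar is on the classpath.

```java
import javax.xml.stream.FactoryConfigurationError;
import javax.xml.stream.XMLInputFactory;

// Hypothetical diagnostic, not part of xmpbox.
class StaxCheck {

    /** Returns the class name of the StAX implementation in use, or null if none is found. */
    public static String staxImplementation() {
        try {
            return XMLInputFactory.newInstance().getClass().getName();
        } catch (FactoryConfigurationError e) {
            // Raised when the API is present but no provider can be loaded.
            return null;
        }
    }

    public static void main(String[] args) {
        String impl = staxImplementation();
        System.out.println(impl != null
                ? "StAX implementation: " + impl
                : "No StAX implementation on the classpath");
    }
}
```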

        Guillaume Bailleul added a comment -

        Replaces the stax-api dependency with stax:
        499d45f7951e2bc8cddd789d3a8f0d0a stax.patch

        Andreas Lehmkühler added a comment -

        I updated the stax dependency in revision 1151225. Thanks for the hint.

        I guess we are done here. Let's get to work.

        Thanks again to Guillaume, Eric and Germain for coming up with the idea of donating PaDaF to the ASF.


          People

          • Assignee:
            Andreas Lehmkühler
            Reporter:
            Guillaume Bailleul