PDFBox / PDFBOX-1540

Add XML output option to preflight

    Details

      Description

      As part of a recent SPRUCE hackathon (http://wiki.opf-labs.org/display/SPR/Home) we added XML output to preflight. It would be good if preflight offered this sort of output by default. Example outputs from our code are at https://github.com/petecliff/pdfeh/tree/master/sample_preflight_outputs and our XML output code is at https://github.com/willp-bl/preflight-app-mod (you may prefer to implement it your own way).

      As an aside, we have a format corpus of test files at https://github.com/openplanets/format-corpus and we encourage both use of the files and contributions!

      Thanks

      1. resultxml.patch
        13 kB
        Guillaume Bailleul
      2. resultxml-2.patch
        15 kB
        Guillaume Bailleul
      3. resultxml-3.patch
        13 kB
        Guillaume Bailleul

        Activity

        William Palmer created issue -
        Guillaume Bailleul added a comment -

        This is an interesting idea. I will look at how to do that.

        I don't think it will make the 1.8.0 release, because development for that version will be frozen on Monday and I will not have time to work on it before then.

        I have one question. The XML format of your example looks completely free-form. Is there a standard format for this kind of output? What is the minimum information the response should provide?

        Thanks for the idea.

        Guillaume Bailleul made changes -
        Field Original Value New Value
        Assignee Guillaume Bailleul [ gbm.bailleul ]
        Eric Leleu added a comment -

        In addition to the XML format, maybe JSON output would be useful?

        BR,
        Eric

        William Palmer added a comment -

        We would like as much (relevant) information about the files as possible. We just changed what was already output to the console and wrapped it in XML. It was handy to add a count for identical error code/details as some happened quite a lot and it dramatically reduced the output size.

        I think the only element we would definitely want would be an <isValid>, as in the examples, with an attribute noting pdf type/version. Run time is also a useful metric to have, if possible.

        There is a PLANETS ontology here: http://sourceforge.net/projects/xcltools/ but I have not had a chance to look at it.

        Thanks for your interest
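
A minimal sketch of the kind of report described in the comment above, written with the JDK's built-in DOM and transformer classes. Only the `<isValid>` element with a PDF type attribute and the run-time metric come from the discussion; the root element `preflight`, the attribute name `type`, the element name `executionTimeMS`, and the class and method names are assumptions made for illustration, not the schema of the attached patches.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PreflightXmlSketch {

    /** Builds a small XML report. All names except isValid are hypothetical. */
    static String render(String fileName, boolean valid, String pdfType, long runTimeMillis)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Hypothetical root element carrying the checked file's name.
        Element root = doc.createElement("preflight");
        root.setAttribute("name", fileName);
        doc.appendChild(root);

        // The one element the reporter asked for: validity plus the PDF type/version.
        Element isValid = doc.createElement("isValid");
        isValid.setAttribute("type", pdfType);
        isValid.setTextContent(Boolean.toString(valid));
        root.appendChild(isValid);

        // Run time of the validation, in milliseconds.
        Element runTime = doc.createElement("executionTimeMS");
        runTime.setTextContent(Long.toString(runTimeMillis));
        root.appendChild(runTime);

        // Serialize the DOM tree to a string without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("sample.pdf", false, "PDF/A-1b", 42L));
    }
}
```

Building the document through DOM rather than string concatenation means file names and messages are escaped automatically, which matters because error details can contain arbitrary text taken from the PDF.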

        Guillaume Bailleul added a comment -

        Here is a proposal for the XML output option.
        I have not committed it because I am not fully satisfied with the XML schema.
        Does anyone have ideas?

        Guillaume Bailleul made changes -
        Attachment resultxml.patch [ 12574006 ]
        William Palmer added a comment -

        Guillaume,

        The output looks good to me. <count> in our output was the number of occurrences of that error in the file that was being checked. Would it be possible to add that information to what you already have in the xml?

        Thanks

        Will

        Guillaume Bailleul added a comment -

        This is a second version of the patch.
        The count attribute now gives the number of occurrences of each error.

        If it works, I will check it in.

        Guillaume Bailleul made changes -
        Attachment resultxml-2.patch [ 12578455 ]
        William Palmer added a comment -

        Hi Guillaume,

        I tested the latest version you checked in and it looks good. I have noticed two things about the xml outputs though:

        • error counts are always 1
        • some error codes occur several times with different details/messages, but not all of these occurrences are shown. For example, error 3.1.3 shows up only once, for one font, even if there are errors for multiple fonts.

        Thanks for your work on this

        Will
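
Both observations in the comment above concern how errors are grouped before they are written out. The following is a sketch of one possible approach, assuming Java 16 or later for records: errors are counted per distinct (code, details) pair, so repeated errors get a count above 1 while errors that share a code but differ in details stay separate. The `Err` record, the class name, and the sample messages are hypothetical stand-ins, not PDFBox classes or real preflight output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ErrorCountSketch {

    /** Hypothetical stand-in for one validation error (not a PDFBox class). */
    record Err(String code, String details) {}

    /** Counts occurrences per distinct (code, details) pair, in first-seen order. */
    static Map<Err, Integer> countOccurrences(List<Err> errors) {
        Map<Err, Integer> counts = new LinkedHashMap<>();
        for (Err e : errors) {
            // Records compare by value, so identical code and details share one entry.
            counts.merge(e, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up messages: same code, two different fonts, one error repeated.
        List<Err> errors = List.of(
                new Err("3.1.3", "problem in font A"),
                new Err("3.1.3", "problem in font B"),
                new Err("3.1.3", "problem in font A"));
        countOccurrences(errors).forEach((e, n) ->
                System.out.println(e.code() + " (count=" + n + "): " + e.details()));
    }
}
```

Keying on the error code alone would collapse the two fonts into a single entry, which matches the second symptom reported above.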

        Guillaume Bailleul added a comment -

        This new patch takes into account your last remarks.

        KR

        Guillaume Bailleul made changes -
        Attachment resultxml-3.patch [ 12587849 ]
        Guillaume Bailleul added a comment (edited) -

        The patch has been applied on TRUNK (r1494083)

        Guillaume Bailleul made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Andreas Lehmkühler made changes -
        Fix Version/s 2.0.0 [ 12319281 ]
        Andreas Lehmkühler made changes -
        Fix Version/s 1.8.3 [ 12324576 ]
        Andreas Lehmkühler added a comment -

        Merged into 1.8-branch in revision 1542711

        Andreas Lehmkühler added a comment -

        Closed after releasing 1.8.3

        Andreas Lehmkühler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Guillaume Bailleul
          • Reporter: William Palmer
          • Votes: 7
          • Watchers: 5