  1. PDFBox
  2. PDFBOX-1685

Verify interpretation of rdf:about for PDF/A

Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.4, 2.0.0
    • Component/s: Preflight
    • Labels: None

    Description

      There was a discussion on the PDF Association's mailing list about how to handle rdf:about in PDF/A validation, which I'm allowed to share:

      <snip>
      In this case we have a PDF with an XMP metadata stream containing two
      <rdf:RDF> entries, one with rdf:about set to a blank string, the other with
      it set to a UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2)
      simply says that the stream must conform to the "XMP specification 2004
      revision" which reads (p21):

      The rdf:about attribute on the rdf:Description element is a required
      attribute that identifies the resource whose metadata this XMP describes.
      The value of this attribute must follow URI syntax and may be either:

      ● an empty string (as in the example above), which means that the XMP is
      physically local to the resource being described. Applications must rely on
      knowledge of the file format to correctly associate the XMP with the
      resource.

      ● a unique instance ID that is generated every time a file is saved. The
      next section gives guidelines for creating instance IDs.

      The XMP packet must describe a single entity, and my reading of the above
      is a combination of empty-string and a unique UUID can meet this
      requirement - this is how both our software and Acrobat X and XI behave.
      However it's ambiguous, and this clause was revised in the 2012 revision
      (ISO 16684-1:2011(E) para 7.4) to this:

      If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI
      shall be the value of an rdf:about attribute in each top-level
      rdf:Description element. Otherwise, the rdf:about attributes for all top-
      level rdf:Description elements shall be present with an empty value. The
      rdf:about attribute shall not be used in more deeply nested rdf:Description
      elements.
      For compatibility with very early XMP usage, it is recommended that XMP
      readers tolerate a missing rdf:about attribute and treat it as present with
      an empty value. It is also recommended that XMP readers tolerate a mix of
      empty and non-empty rdf:about values, as long as all non-empty values are
      identical.

      Which means that an empty string and a unique UUID are technically
      incorrect, but it's recommended they be tolerated for compatibility
      purposes.

      </snip>

      It might be good to check our interpretation, as

      <snip>
      BFO and Acrobat X and XI think this is valid, PDFBox and
      pdf-tools.com online validator lean the other and classify this document
      as invalid.
      </snip>

      to see if we should change our interpretation. If there is new input on the pdfa.org mailing list, I'll capture it here too.
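
      As an illustration of the tolerant reading recommended in ISO 16684-1 para 7.4 (quoted above), here is a minimal sketch using only the JDK's DOM parser. It is not the Preflight or xmpbox implementation; the class name RdfAboutCheck and its two methods are hypothetical. It treats a missing rdf:about as empty, ignores nested rdf:Description elements, and accepts a packet when all non-empty values on top-level rdf:Description elements are identical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of PDFBox: checks rdf:about consistency in an XMP packet.
final class RdfAboutCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Collects the distinct non-empty rdf:about values of top-level rdf:Description elements.
    static Set<String> nonEmptyAboutValues(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmp)));

        Set<String> values = new LinkedHashSet<>();
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            Node parent = description.getParentNode();
            // Only direct children of rdf:RDF count as top-level descriptions.
            boolean topLevel = parent != null
                    && RDF_NS.equals(parent.getNamespaceURI())
                    && "RDF".equals(parent.getLocalName());
            if (!topLevel) {
                continue;
            }
            // getAttributeNS returns "" when the attribute is absent,
            // which matches "treat a missing rdf:about as present with an empty value".
            String about = description.getAttributeNS(RDF_NS, "about");
            if (!about.isEmpty()) {
                values.add(about);
            }
        }
        return values;
    }

    // Tolerant check: a mix of empty and non-empty values is accepted
    // as long as all non-empty values are identical.
    static boolean isTolerable(String xmp) throws Exception {
        return nonEmptyAboutValues(xmp).size() <= 1;
    }
}
```

      Under this reading, a packet like the one described above (one empty rdf:about, one UUID) would pass, while a packet with two different non-empty values would be rejected. Whether PDF/A-1 validation should apply this tolerance is exactly the open question of this issue.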

      Attachments

        1. test-bfo.pdf
          541 kB
          Maruan Sahyoun

          People

            leleueri Eric Leleu
            msahyoun Maruan Sahyoun