[PDFBOX-1685] Verify interpretation of rdf:about for PDF/A - ASF JIRA

Details

Type: Task
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.8.4, 2.0.0
Component/s: Preflight
Labels: None

Description

There was a discussion about handling rdf:about for PDF/A validation on the PDF Association's mailing list, which I'm allowed to share:

<snip>
In this case we have a PDF with an XMP metadata stream containing two
<rdf:RDF> entries, one with rdf:about set to a blank string, the other with
it set to a UUID. The PDF/A specification (ISO-19005-1:2005(E) para 6.7.2)
simply says that the stream must conform to the "XMP specification 2004
revision" which reads (p21):

The rdf:about attribute on the rdf:Description element is a required
attribute that identifies the resource whose metadata this XMP describes.
The value of this attribute must follow URI syntax and may be either:

● an empty string (as in the example above), which means that the XMP is
physically local to the resource being described. Applications must rely on
knowledge of the file format to correctly associate the XMP with the
resource.

● a unique instance ID that is generated every time a file is saved. The
next section gives guidelines for creating instance IDs.

The XMP packet must describe a single entity, and my reading of the above
is a combination of empty-string and a unique UUID can meet this
requirement - this is how both our software and Acrobat X and XI behave.
However it's ambiguous, and this clause was revised in the 2012 revision
(ISO 16684-1:2011(E) para 7.4) to this:

If the XMP data model has an AboutURI (6.1, “XMP packets”), that same URI
shall be the value of an rdf:about attribute in each top-level
rdf:Description element. Otherwise, the rdf:about attributes for all
top-level rdf:Description elements shall be present with an empty value. The
rdf:about attribute shall not be used in more deeply nested rdf:Description
elements.
For compatibility with very early XMP usage, it is recommended that XMP
readers tolerate a missing rdf:about attribute and treat it as present with
an empty value. It is also recommended that XMP readers tolerate a mix of
empty and non-empty rdf:about values, as long as all non-empty values are
identical.

Which means that an empty string and a unique UUID are technically
incorrect, but it's recommended they be tolerated for compatibility
purposes.

</snip>

It might be good to check our interpretation of rdf:about in Preflight to see if we should change it. If there is new input on the pdfa.org mailing list I'll capture it here too.
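
To make the two readings concrete, here is a minimal sketch of the check described in the ISO 16684-1 excerpt above. It is not the Preflight implementation; the function names `about_values` and `is_consistent` and the `tolerant` flag are hypothetical and exist only for illustration. The strict mode requires every top-level rdf:Description to carry the same rdf:about value; the tolerant mode treats a missing attribute as empty and accepts a mix of empty and non-empty values as long as all non-empty values are identical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TAG = "{%s}RDF" % RDF_NS
DESCRIPTION_TAG = "{%s}Description" % RDF_NS
ABOUT_ATTR = "{%s}about" % RDF_NS


def about_values(xmp_xml):
    """Return rdf:about of every top-level rdf:Description (None if absent)."""
    root = ET.fromstring(xmp_xml)
    values = []
    # iter() also yields the root itself, so a bare rdf:RDF root works too.
    for rdf in root.iter(RDF_TAG):
        # findall() with a plain tag matches direct children only,
        # so nested rdf:Description elements are ignored.
        for desc in rdf.findall(DESCRIPTION_TAG):
            values.append(desc.get(ABOUT_ATTR))
    return values


def is_consistent(values, tolerant=True):
    """Check rdf:about values against the strict or the tolerant reading."""
    if tolerant:
        # Missing counts as empty; all non-empty values must be identical.
        non_empty = {v for v in values if v}
        return len(non_empty) <= 1
    # Strict: attribute present everywhere and one single value overall.
    if any(v is None for v in values):
        return False
    return len(set(values)) <= 1


# Hypothetical packet resembling the case discussed: one empty value, one UUID.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""/>
  </rdf:RDF>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="uuid:00000000-0000-0000-0000-000000000000"/>
  </rdf:RDF>
</x:xmpmeta>"""

if __name__ == "__main__":
    vals = about_values(SAMPLE)
    print("tolerant:", is_consistent(vals, tolerant=True))
    print("strict:  ", is_consistent(vals, tolerant=False))
```

Under these assumptions, a packet like the one in the attached case passes the tolerant check and fails the strict one, which is the distinction a validator would have to decide on.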

Attachments

test-bfo.pdf
07/Aug/13 15:44
541 kB
Maruan Sahyoun

People

Assignee: Eric Leleu

Reporter: Maruan Sahyoun

Votes: 0

Watchers: 4

Dates

Created: 07/Aug/13 15:41

Updated: 31/Jan/14 06:46

Resolved: 27/Nov/13 21:18