Description
While working on TIKA-1678, I was trying to generate a test PDF with a dc:title in the XMP, using XMPBox from PDFBox's trunk. I modified the code from CreatePDFA by adding this:
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setTitle("this is the title");
The generated PDF doesn't appear to have a compliant dc:title entry in the XMP.
tilman noted the divergence from the standard.
What PDFBox does:
<dc:title>
<rdf:Alt>
<dc:li>this is the title</dc:li>
</rdf:Alt>
</dc:title>
It should be:
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">this is the title</rdf:li>
</rdf:Alt>
</dc:title>
Error message from the PDF-Tools validator:
'dc:li' is not allowed in arrays. The elements must be rdf:li or rdf:_N, where N is a positive number.
There is only one RDF resource allowed in XMP.
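The first validator message can be reproduced without PDF-Tools. Below is a minimal sketch, using only the JDK's DOM parser, that lists every child of an rdf:Alt, rdf:Bag or rdf:Seq that is not rdf:li or rdf:_N. The names XmpArrayCheck, findInvalidArrayItems and isRdfItem are hypothetical and are not part of XMPBox or Preflight. The snippets above omit namespace declarations, so the sketch assumes they are wrapped in an rdf:RDF element that declares the usual rdf and dc namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMPBox or Preflight.
final class XmpArrayCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns the qualified names of array children that are not rdf:li / rdf:_N.
    static List<String> findInvalidArrayItems(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to compare namespace URIs
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP fragment", e);
        }

        List<String> invalid = new ArrayList<>();
        for (String container : new String[] {"Alt", "Bag", "Seq"}) {
            NodeList arrays = doc.getElementsByTagNameNS(RDF_NS, container);
            for (int i = 0; i < arrays.getLength(); i++) {
                NodeList children = arrays.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip whitespace text nodes and comments.
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue;
                    }
                    if (!isRdfItem(child)) {
                        invalid.add(child.getNodeName());
                    }
                }
            }
        }
        return invalid;
    }

    // True for rdf:li and rdf:_N, where N is a positive number.
    static boolean isRdfItem(Node node) {
        if (!RDF_NS.equals(node.getNamespaceURI())) {
            return false;
        }
        String local = node.getLocalName();
        return "li".equals(local) || local.matches("_[1-9][0-9]*");
    }

    public static void main(String[] args) {
        // Assumed wrapper: the snippets in this report omit the declarations.
        String open = "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">";
        String close = "</rdf:RDF>";

        String current = open + "<dc:title><rdf:Alt>"
                + "<dc:li>this is the title</dc:li>"
                + "</rdf:Alt></dc:title>" + close;
        String expected = open + "<dc:title><rdf:Alt>"
                + "<rdf:li xml:lang=\"x-default\">this is the title</rdf:li>"
                + "</rdf:Alt></dc:title>" + close;

        System.out.println(findInvalidArrayItems(current));  // prints [dc:li]
        System.out.println(findInvalidArrayItems(expected)); // prints []
    }
}
```

This only checks element names inside RDF arrays. It does not check the xml:lang="x-default" attribute, and it does not cover the second message about the number of RDF resources.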
Issue Links
- relates to
  - PDFBOX-2897 Preflight not flagging bad xml generated by XMPBox for dc:title (Open)
  - TIKA-1678 PDF metadata extraction fails to spot UTF-16 encoded title (Resolved)