I have a PDF in which the title has been set twice, once as Dublin Core metadata:
and again in the PDF DocInfo section:
When I use Tika to transform the PDF into HTML
it outputs this metadata:
and this <title> tag:
meaning we no longer have access to the DocInfo title.
Is there some way you could adapt Tika to copy this PDF DocInfo forward during a conversion under a new type of metadata, e.g.
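
To make the request more concrete, here is a rough sketch in plain Java of the behaviour I have in mind. It does not use Tika's API at all. The class `TitleMergeSketch`, the method `mergeTitles`, and the key name `pdf:docinfo:title` are all hypothetical, chosen only for illustration. The idea is simply that the Dublin Core title keeps its usual key, while the DocInfo title is kept under a separate key instead of being dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a plain-Java model of the requested behaviour, not Tika code.
class TitleMergeSketch {

    // Hypothetical key names, picked just for this example.
    static final String DC_TITLE = "dc:title";
    static final String DOCINFO_TITLE = "pdf:docinfo:title";

    /**
     * Builds the output metadata from the two title sources found in the PDF.
     * The Dublin Core title is stored under its usual key, and the DocInfo
     * title is stored under its own key so that neither value is lost.
     * A null argument means that source had no title.
     */
    static Map<String, String> mergeTitles(String dublinCoreTitle, String docInfoTitle) {
        Map<String, String> out = new LinkedHashMap<>();
        if (dublinCoreTitle != null) {
            out.put(DC_TITLE, dublinCoreTitle);
        }
        if (docInfoTitle != null) {
            out.put(DOCINFO_TITLE, docInfoTitle);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> metadata =
                mergeTitles("Title from Dublin Core", "Title from DocInfo");
        // Print every key/value pair that survived the merge.
        metadata.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With something along these lines, a consumer of the HTML output could still read the DocInfo title from its own metadata entry, even though the `<title>` tag carries the Dublin Core value.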