I have a PDF in which the title has been set twice, once as Dublin Core metadata:
Consumer credit cards - conditions of use
and again in the PDF DocInfo section:
/Title(Consumer Credit Card - Conditions of Use)
When I use Tika to transform the PDF into HTML
java -jar tika-app-1.13.jar int_Consumer_Conditions_of_use.pdf
it outputs this metadata:
<meta name="dc:title" content="Consumer credit cards - conditions of use"/>
and this <title> tag:
<title>Consumer credit cards - conditions of use</title>
meaning we no longer have access to the DocInfo title.
Could Tika be adapted to carry the PDF DocInfo values forward during conversion under a separate metadata prefix, e.g.
<meta name="docinfo:title" content="Consumer Credit Card - Conditions of Use"/>
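To make the request concrete, here is a minimal sketch in plain Java of the mapping I have in mind. It is not Tika code: the class DocInfoMeta, its methods, and the rule of lowercasing the DocInfo key are my own assumptions for illustration. Only the "docinfo:" prefix and the title value come from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Tika): turns DocInfo entries into
// <meta> tags under a separate "docinfo:" prefix so they cannot be
// overwritten by the Dublin Core values.
class DocInfoMeta {
    static final String PREFIX = "docinfo:";

    // Escape the characters that are unsafe inside an HTML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Assumed naming rule: prefix + lowercased DocInfo key, e.g. Title -> docinfo:title.
    static List<String> toMetaTags(Map<String, String> docInfo) {
        List<String> tags = new ArrayList<>();
        for (Map.Entry<String, String> e : docInfo.entrySet()) {
            String name = PREFIX + e.getKey().toLowerCase(Locale.ROOT);
            tags.add("<meta name=\"" + name + "\" content=\"" + escape(e.getValue()) + "\"/>");
        }
        return tags;
    }

    public static void main(String[] args) {
        // The DocInfo entry from the PDF described above.
        Map<String, String> docInfo = new LinkedHashMap<>();
        docInfo.put("Title", "Consumer Credit Card - Conditions of Use");
        toMetaTags(docInfo).forEach(System.out::println);
    }
}
```

With this, both titles would survive the conversion side by side: dc:title from the Dublin Core metadata and docinfo:title from the DocInfo section.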