Description
The 2.x migration doc mentions that author is generally, and automatically, mapped to its dc:creator equivalent when returned by Tika 2.x. That doesn't seem to be happening for HTML files: in the rmeta output below, the value comes back only under the author key, with no dc:creator entry. Can this be fixed?
$ curl -X PUT --upload-file /mnt/c/tmp/author.html --header "Content-Disposition: attachment; filename=\"author.html\"" --header "Accept:Application/json" http://localhost:9998/rmeta/text | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1152 100 716 100 436 685 417 0:00:01 0:00:01 --:--:-- 1102
[
{
"Content-Encoding": "UTF-8",
"Content-Length": "436",
"Content-Type": "text/html; charset=UTF-8",
"X-TIKA:Parsed-By": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.html.HtmlParser"
],
"X-TIKA:Parsed-By-Full-Set": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.html.HtmlParser"
],
"X-TIKA:content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAll meta information goes in the head section...\n\n\n",
"X-TIKA:content_handler": "ToTextContentHandler",
"X-TIKA:embedded_depth": "0",
"X-TIKA:parse_time_millis": "886",
"author": "John Doe",
"description": "Free Web tutorials",
"keywords": "HTML,CSS,XML,JavaScript",
"resourceName": "author.html",
"title": "OldMetaTitle",
"viewport": "width=device-width, initial-scale=1.0"
}
]
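To check a response like the one above without reading it by eye, something along these lines could work. It is only a sketch: the function name `find_unmapped_keys` and the `EXPECTED_MAPPINGS` table are made up for illustration, and the only mapping taken from this report is author to dc:creator. It takes the JSON text returned by /rmeta/text and lists every metadata object that carries the legacy key without its expected equivalent.

```python
import json

# Legacy key -> expected Tika 2.x key. Only author -> dc:creator comes from
# this report; any other pair added here would be an assumption.
EXPECTED_MAPPINGS = {"author": "dc:creator"}


def find_unmapped_keys(rmeta_json, mappings=EXPECTED_MAPPINGS):
    """Return (index, legacy_key, expected_key) for each metadata object
    that has the legacy key but lacks the expected one."""
    problems = []
    # /rmeta returns a JSON array with one metadata object per document
    # (the container first, then any embedded documents).
    for index, metadata in enumerate(json.loads(rmeta_json)):
        for legacy_key, expected_key in mappings.items():
            if legacy_key in metadata and expected_key not in metadata:
                problems.append((index, legacy_key, expected_key))
    return problems


if __name__ == "__main__":
    # Trimmed copy of the response shown in this report.
    sample = (
        '[{"author": "John Doe", "title": "OldMetaTitle", '
        '"resourceName": "author.html"}]'
    )
    for index, legacy_key, expected_key in find_unmapped_keys(sample):
        print(f"object {index}: found '{legacy_key}' but no '{expected_key}'")
```

Run against the output in this report, the check flags object 0, since author is present and dc:creator is not.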