Affects Version/s: None
Fix Version/s: None
Whatever attributes the meta tags in the input HTML carry, the meta elements emitted by Tika can only have a "name" and a "content" attribute. This produces invalid HTML meta tags in the output.
For instance, the following valid HTML file
is transformed into a SAX stream corresponding to the following HTML:
(the redirection, content-type, and content-encoding are all specified in a non-standard way)
The information that the original file had an "http-equiv" meta tag is lost, and replaced by a generic "meta name=" tag.
This is a problem when working with classes that expect a valid meta redirection, such as Nutch's HTMLMetaProcessor.
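To make the impact concrete, here is a minimal sketch of a consumer that looks for meta redirections in a SAX stream. It is not Nutch's actual code: the class MetaRefreshDemo, the method findRefreshTargets, and the two sample documents are made up for illustration, and it uses only the JDK's SAX parser on well-formed XHTML rather than Tika itself. It assumes the consumer identifies a redirection by the "http-equiv" attribute, which is how such a tag is normally written.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MetaRefreshDemo {

    /**
     * Hypothetical consumer: returns the "content" value of every
     * <meta http-equiv="refresh"> element found in the given XHTML.
     */
    public static List<String> findRefreshTargets(String xhtml) {
        List<String> targets = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Only a meta element carrying http-equiv="refresh" counts
                // as a redirection; getValue returns null if it is absent.
                if ("meta".equalsIgnoreCase(qName)
                        && "refresh".equalsIgnoreCase(atts.getValue("http-equiv"))) {
                    targets.add(atts.getValue("content"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Invented stand-in for an input document with a valid meta redirection.
        String original = "<html><head>"
                + "<meta http-equiv=\"refresh\" content=\"5; url=http://example.com/\"/>"
                + "</head><body/></html>";
        // The same tag once "http-equiv" has been replaced by "name".
        String rewritten = "<html><head>"
                + "<meta name=\"refresh\" content=\"5; url=http://example.com/\"/>"
                + "</head><body/></html>";

        System.out.println(findRefreshTargets(original));
        System.out.println(findRefreshTargets(rewritten));
    }
}
```

With the first document the consumer finds one redirection target; with the second it finds none, because the only attribute that marked the tag as a redirection is gone.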