Description
Maybe I'm doing something wrong, but it seems to me that the NUTCH-1434 patch only works when using Tika's parser. When using parse-html, the "robots" metatag is only populated if the parse-metatags plugin is enabled, and it is stored with the prefix "metatag.". So parseData.getMeta("robots") returns nothing when not using Tika.
I guess the simplest solution would be to provide a fallback: if parseData.getMeta("robots") is null, get parseData.getMeta("metatag.robots") instead.
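A minimal sketch of that fallback, for illustration only. To stay self-contained it uses a plain Map as a stand-in for the values parseData.getMeta(...) would return; the class name RobotsMetaFallback and the helpers robotsDirective and isNoIndex are hypothetical and not part of Nutch or of the NUTCH-1434 patch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class RobotsMetaFallback {

    // Hypothetical helper: `meta` stands in for the parse metadata that
    // parseData.getMeta(key) would read from. Not actual Nutch code.
    static String robotsDirective(Map<String, String> meta) {
        // Key described in the report as populated when parsing with Tika.
        String robots = meta.get("robots");
        if (robots == null) {
            // Fallback: key written by parse-metatags when using parse-html.
            robots = meta.get("metatag.robots");
        }
        return robots;
    }

    // Assumption: a "noindex" token anywhere in the directive means
    // the document should not be indexed.
    static boolean isNoIndex(Map<String, String> meta) {
        String robots = robotsDirective(meta);
        return robots != null
                && robots.toLowerCase(Locale.ROOT).contains("noindex");
    }

    public static void main(String[] args) {
        Map<String, String> htmlParsed = new HashMap<>();
        htmlParsed.put("metatag.robots", "NOINDEX, nofollow");
        System.out.println(isNoIndex(htmlParsed));
    }
}
```

With this approach the unprefixed key still takes precedence when both are present, so behaviour with Tika would stay unchanged.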
Also, the dependency of this property on the parse-metatags plugin when using parse-html would be worth documenting somewhere (nutch-default.xml?).
Thanks!
Attachments
Issue Links
- breaks: NUTCH-1434 Indexer to delete robots noIndex (Closed)