HTML redirections using meta tags are supported in Nutch. They work well when files are parsed with parse-html. However, when parse-tika is used, they are not detected.
This is caused by TIKA-2652: https://issues.apache.org/jira/browse/TIKA-2652
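For context, a meta refresh redirect carries its target in the tag's content attribute, typically as "<delay>; url=<target>". The following is a minimal sketch of extracting that target. It is not the code used by Nutch or Tika; the class name MetaRefreshSketch and the method redirectTarget are hypothetical, and the accepted syntax is an assumption (numeric delay, ";" or "," separator, optional quotes around the URL).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Nutch or Tika.
final class MetaRefreshSketch {

    // Assumed syntax: "<delay>; url=<target>", case-insensitive,
    // with optional single or double quotes around the target.
    private static final Pattern REFRESH = Pattern.compile(
        "^\\s*(\\d+)\\s*[;,]\\s*url\\s*=\\s*['\"]?([^'\"]+?)['\"]?\\s*$",
        Pattern.CASE_INSENSITIVE);

    // Returns the redirect target, or empty if the content value
    // carries no URL (for example a plain "5", which only reloads the page).
    static Optional<String> redirectTarget(String content) {
        if (content == null) {
            return Optional.empty();
        }
        Matcher m = REFRESH.matcher(content);
        return m.matches() ? Optional.of(m.group(2).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("0; url=http://example.com/new"));
        System.out.println(redirectTarget("5"));
    }
}
```

A parser can only apply this kind of extraction if it first recognizes the meta tag as a refresh tag, which is where the mismatch described in this issue comes in.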
Tika emits redirection meta tags as:
whereas org.apache.nutch.parse.tika.HTMLMetaProcessor expects meta tags in the following format:
The bug can be reproduced with the following nutch-site.xml:
and fetching this URL: http://www.google.com/policies/technologies/ads/
With parse-tika, the resulting status is
whereas with parse-html, the resulting status is