Rules from a redirected robots.txt are also cached for the redirect target host. This may mean that the correct robots.txt rules for that host are never fetched. E.g., http://wyomingtheband.com/robots.txt redirects to https://www.facebook.com/wyomingtheband/robots.txt. Because fetching the redirect target fails with a 404, bots are allowed to crawl wyomingtheband.com. But these permissive rules are erroneously also cached for the redirect target host www.facebook.com, which is explicit in its robots.txt rules and does not allow crawling.
Nutch should cache redirected robots.txt rules for the redirect target host only if the path part of the redirect target URL (when in doubt, including the query) is exactly /robots.txt.
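A minimal sketch of the proposed check (a hypothetical helper, not existing Nutch code): the redirected rules are cacheable for the target host only if the target URL's path is exactly /robots.txt and, to be safe, there is no query string.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RobotsRedirectCheck {

    /**
     * Returns true if robots.txt rules fetched via a redirect may also be
     * cached for the redirect target host: the target's path must be exactly
     * /robots.txt and (to be safe) it must carry no query string.
     */
    public static boolean isCacheableRedirectTarget(String redirectUrl) {
        try {
            URL u = new URL(redirectUrl);
            return "/robots.txt".equals(u.getPath()) && u.getQuery() == null;
        } catch (MalformedURLException e) {
            // Unparsable target: do not cache for it
            return false;
        }
    }

    public static void main(String[] args) {
        // Path is exactly /robots.txt: cacheable for the target host
        System.out.println(isCacheableRedirectTarget("https://www.example.com/robots.txt"));
        // Path is /wyomingtheband/robots.txt, not /robots.txt: not cacheable
        System.out.println(isCacheableRedirectTarget("https://www.facebook.com/wyomingtheband/robots.txt"));
    }
}
```

With this check, the redirect in the example above would no longer poison the cache entry for www.facebook.com, because the target path /wyomingtheband/robots.txt is not /robots.txt.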