Description
When Fetcher runs in parsing mode, CrawlDatum metadata is correctly passed to the new CrawlDatum created for a content-level redirect (HTML meta "Refresh" tag). When Fetcher runs in non-parsing mode and ParseSegment is run as a separate step, no metadata other than "repr" is passed to the new CrawlDatum.
As a result, any filter that relies on metadata, such as DepthScoringFilter or URLMetaScoringFilter, will not work for URLs reached through such redirects.
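The difference between the two behaviours can be illustrated with a minimal, self-contained sketch. It does not use Nutch classes: `Datum` is a hypothetical stand-in for CrawlDatum with a plain string map as metadata, the method names are invented for illustration, and the "depth" key is an assumed example of metadata a scoring filter might depend on. Only the "repr" key name comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: hypothetical stand-ins, not Nutch's actual classes or code paths. */
public class RedirectMetadataSketch {

    /** Hypothetical simplified datum: just a metadata map. */
    static final class Datum {
        final Map<String, String> metadata = new HashMap<>();
    }

    /** Models the reported behaviour: only "repr" reaches the redirect target. */
    static Datum redirectKeepingOnlyRepr(Datum source) {
        Datum target = new Datum();
        String repr = source.metadata.get("repr");
        if (repr != null) {
            target.metadata.put("repr", repr);
        }
        return target;
    }

    /** Models the expected behaviour: all metadata reaches the redirect target. */
    static Datum redirectKeepingAllMetadata(Datum source) {
        Datum target = new Datum();
        target.metadata.putAll(source.metadata);
        return target;
    }

    public static void main(String[] args) {
        Datum source = new Datum();
        source.metadata.put("repr", "http://example.com/");
        source.metadata.put("depth", "2"); // assumed key, for illustration

        // A filter that reads "depth" finds it only in the second case.
        System.out.println(redirectKeepingOnlyRepr(source).metadata.containsKey("depth"));
        System.out.println(redirectKeepingAllMetadata(source).metadata.containsKey("depth"));
    }
}
```

In this sketch, a filter that looks up "depth" on the redirect target gets nothing in the first case, which mirrors why metadata-dependent filters break when ParseSegment runs as a separate step.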
Issue Links
- Blocked: NUTCH-685 Content-level redirect status lost in ParseSegment (Open)