Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Patch Available
Description
Large crawls cannot be restricted to HTML by the URL suffix filter alone, because URLs often hide their MIME types. The patch for this issue passes only records whose Content-Type matches a regex.
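The idea described above can be illustrated with a minimal Java sketch. It is not the attached patch and does not use any Nutch API: the class `ContentTypeFilter`, its methods, the example URLs, and the regex are all hypothetical, and it assumes each record carries a Content-Type string (possibly null when none was recorded).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: keeps only records whose
// Content-Type matches a configured regex.
final class ContentTypeFilter {
    private final Pattern allowed;

    public ContentTypeFilter(String regex) {
        // MIME types are case-insensitive, so match accordingly.
        this.allowed = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    // Assumption of this sketch: a record with no Content-Type is rejected.
    public boolean accepts(String contentType) {
        return contentType != null && allowed.matcher(contentType.trim()).find();
    }

    // Returns the URLs whose Content-Type passes the regex, in input order.
    public List<String> filter(Map<String, String> urlToContentType) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> e : urlToContentType.entrySet()) {
            if (accepts(e.getValue())) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example regex (an assumption): HTML and XHTML only.
        ContentTypeFilter htmlOnly =
            new ContentTypeFilter("^(text/html|application/xhtml\\+xml)\\b");

        Map<String, String> records = new LinkedHashMap<>();
        records.put("http://example.com/page", "text/html; charset=UTF-8");
        // The URL has no suffix, so a suffix filter cannot tell this is a PDF.
        records.put("http://example.com/download?id=7", "application/pdf");

        System.out.println(htmlOnly.filter(records));
    }
}
```

The second record shows the case a suffix filter misses: nothing in the URL indicates a PDF, but the Content-Type does.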
Attachments
Issue Links
- is superseded by NUTCH-2231 Jexl support in generator job (Closed)