Details
- Improvement
- Status: Closed
- Minor
- Resolution: Fixed
- 1.10
- None
- None
Description
ParserChecker and IndexingFiltersChecker have evolved from simple tools for checking parsers and parse filters (ParserChecker) or indexing filters (IndexingFiltersChecker) into powerful tools which emulate the crawling of a single URL/document:
- check robots.txt (NUTCH-2002)
- follow redirects (NUTCH-2004)
Keeping both tools in sync takes extra work (cf. NUTCH-1757/NUTCH-2006; also, NUTCH-2002 and NUTCH-2004 are done only for parsechecker). It's time to merge them:
- either into one general debugging tool, keeping parsechecker and indexchecker as aliases
- or by centralizing common code in one utility class
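The second option could look roughly like the sketch below. It is only an illustration and does not reflect actual Nutch code: `Response`, `CheckerSupport`, `ParseCheckerSketch`, `IndexCheckerSketch` and `CheckerDemo` are hypothetical names, and the fetcher is an in-memory function standing in for the real protocol layer. The point is that redirect handling (and, in the same way, a robots.txt check) is written once and used by both checkers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical response type: a status code plus a redirect target or a body. */
final class Response {
    final int status;
    final String location; // redirect target, null unless this is a redirect
    final String body;     // content, null for redirects

    Response(int status, String location, String body) {
        this.status = status;
        this.location = location;
        this.body = body;
    }
}

/** Hypothetical utility class holding the fetch logic shared by both checkers. */
final class CheckerSupport {
    private final Function<String, Response> fetcher; // stand-in for a real protocol plugin
    private final int maxRedirects;

    CheckerSupport(Function<String, Response> fetcher, int maxRedirects) {
        this.fetcher = fetcher;
        this.maxRedirects = maxRedirects;
    }

    /** Fetches the URL, following at most maxRedirects redirects. */
    Response fetch(String url) {
        String current = url;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            Response r = fetcher.apply(current);
            boolean redirect = r.status >= 300 && r.status < 400 && r.location != null;
            if (!redirect) {
                return r;
            }
            current = r.location;
        }
        throw new IllegalStateException("Too many redirects for " + url);
    }
}

/** Hypothetical parse checker: only the parsing step is specific to it. */
final class ParseCheckerSketch {
    private final CheckerSupport support;

    ParseCheckerSketch(CheckerSupport support) {
        this.support = support;
    }

    String check(String url) {
        return "parsed: " + support.fetch(url).body;
    }
}

/** Hypothetical index checker: reuses the same fetch logic, adds its own step. */
final class IndexCheckerSketch {
    private final CheckerSupport support;

    IndexCheckerSketch(CheckerSupport support) {
        this.support = support;
    }

    String check(String url) {
        return "indexed: " + support.fetch(url).body;
    }
}

final class CheckerDemo {
    public static void main(String[] args) {
        // In-memory "web": one redirect leading to one document.
        Map<String, Response> pages = new HashMap<>();
        pages.put("http://example.org/old", new Response(301, "http://example.org/new", null));
        pages.put("http://example.org/new", new Response(200, null, "hello"));

        CheckerSupport support = new CheckerSupport(pages::get, 3);
        System.out.println(new ParseCheckerSketch(support).check("http://example.org/old")); // parsed: hello
        System.out.println(new IndexCheckerSketch(support).check("http://example.org/old")); // indexed: hello
    }
}
```

With this layout, a fix such as redirect following only has to be made in `CheckerSupport` to reach both tools, which is the synchronization problem described above.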
Issue Links
- incorporates
  - NUTCH-2145 parse/index checker fail to fetch valid percent-encoded URLs (Closed)
  - NUTCH-2554 parserchecker can't fetch some URLs (Closed)
- links to