Fetcher modifies the URLs being fetched (introduced with NUTCH-2375 in c93d908):
FetcherThread 22 fetching http://nutch.apache.org:-1/ (queue crawl delay=5000ms)
This makes it hard to trace URLs in the log files and likely causes other issues, because the URLs in CrawlDb and in the segments no longer match (http://nutch.apache.org/ in CrawlDb but http://nutch.apache.org:-1/ in the segment).
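A ":-1" port of this kind typically appears when a URL string is reassembled from the parts of a java.net.URL, because getPort() returns -1 when the URL carries no explicit port. The sketch below only illustrates that behavior; it is not the actual Fetcher code, and the class and method names (PortSketch, parse, naiveRebuild, safeRebuild) are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class PortSketch {

    // Hypothetical helper: parse a URL string, wrapping the checked exception.
    static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Naive reassembly: always appends the port, even when it is unset (-1).
    static String naiveRebuild(URL u) {
        return u.getProtocol() + "://" + u.getHost() + ":" + u.getPort() + u.getFile();
    }

    // Safer reassembly: append the port only when one was given explicitly.
    static String safeRebuild(URL u) {
        String port = (u.getPort() == -1) ? "" : ":" + u.getPort();
        return u.getProtocol() + "://" + u.getHost() + port + u.getFile();
    }

    public static void main(String[] args) {
        URL u = parse("http://nutch.apache.org/");
        System.out.println(naiveRebuild(u)); // http://nutch.apache.org:-1/
        System.out.println(safeRebuild(u));  // http://nutch.apache.org/
    }
}
```

If the cause is along these lines, a fix would either guard the port as in safeRebuild or avoid rebuilding the URL string at all and keep the original string from CrawlDb.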
Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce
GitHub Pull Request #317