[NUTCH-2553] Fetcher not to modify URLs to be fetched - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.15
Fix Version/s: 1.15
Component/s: fetcher
Labels: None

Description

The Fetcher modifies the URLs being fetched (introduced with NUTCH-2375 in c93d908):

FetcherThread 22 fetching http://nutch.apache.org:-1/ (queue crawl delay=5000ms)

This makes it hard to trace URLs in the log files and likely causes other issues, because the URLs in the CrawlDb and in the segments no longer match (http://nutch.apache.org/ in the CrawlDb and http://nutch.apache.org:-1/ in the segment).
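
The `:-1` in the log line matches the value Java returns from `getPort()` when a URL carries no explicit port. The following is a minimal sketch of how such a URL string can arise and how to avoid it. It is not the actual Nutch code: the class `PortRebuildSketch` and the methods `naiveRebuild` and `safeRebuild` are hypothetical names used only for illustration, and `java.net.URI` is used here for parsing.

```java
import java.net.URI;

public class PortRebuildSketch {

  // Hypothetical helper: rebuilds a URL string from its parts without
  // checking the port. URI.getPort() returns -1 when no port is given,
  // so the result contains ":-1".
  static String naiveRebuild(String url) {
    URI u = URI.create(url);
    return u.getScheme() + "://" + u.getHost() + ":" + u.getPort()
        + u.getRawPath() + rawQuery(u);
  }

  // Hypothetical helper: emits the port only if one was explicitly set,
  // so a URL without a port is reproduced unchanged.
  static String safeRebuild(String url) {
    URI u = URI.create(url);
    String port = u.getPort() == -1 ? "" : ":" + u.getPort();
    return u.getScheme() + "://" + u.getHost() + port
        + u.getRawPath() + rawQuery(u);
  }

  // Returns "?query" if the URI has a query part, otherwise "".
  private static String rawQuery(URI u) {
    return u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
  }

  public static void main(String[] args) {
    String url = "http://nutch.apache.org/";
    System.out.println(naiveRebuild(url)); // http://nutch.apache.org:-1/
    System.out.println(safeRebuild(url));  // http://nutch.apache.org/
  }
}
```

Under these assumptions, only the guarded variant returns a string equal to the URL stored in the CrawlDb, which is the property this issue asks the Fetcher to preserve.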

Issue Links

is caused by

NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce (Closed)

links to

GitHub Pull Request #317

People

Assignee: Sebastian Nagel

Reporter: Sebastian Nagel

Votes: 0

Watchers: 4

Dates

Created: 09/Apr/18 08:26

Updated: 01/Oct/19 14:29

Resolved: 21/Apr/18 16:36