[NUTCH-1284] Add site fetcher.max.crawl.delay as log output by default. - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Trivial
Resolution: Fixed
Affects Version/s: nutchgora, 1.5
Fix Version/s: 1.7, 2.2
Component/s: fetcher
Labels: None

Patch Info: Patch Available

Description

Currently, when manually scanning our log output, we cannot tell which pages are subject to a crawl delay between successive fetches from the same site. The value should be made available in a form like:

2012-02-19 12:33:33,031 INFO  fetcher.Fetcher - fetching http://nutch.apache.org/ (crawl.delay=XXXms)

This way we can quickly determine whether or not the fetcher is having to apply a crawl delay.
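A minimal, hypothetical sketch of the proposed logging, not the code from the attached patches. It assumes two things. First, `fetcher.max.crawl.delay` is expressed in seconds. Second, a value of -1 means "never skip, always honour the robots.txt Crawl-Delay", which is the case the linked NUTCH-1042 concerns. The class and method names (`CrawlDelayLog`, `formatFetchMessage`, `exceedsMaxCrawlDelay`) are invented for illustration. It uses `java.util.logging` only to stay self-contained, whereas Nutch itself logs through its own logging framework.

```java
import java.util.logging.Logger;

/**
 * Hypothetical illustration only (not Nutch's Fetcher): builds the
 * "fetching <url> (crawl.delay=XXXms)" message proposed in this issue and
 * applies a fetcher.max.crawl.delay-style limit.
 */
public class CrawlDelayLog {
    private static final Logger LOG = Logger.getLogger(CrawlDelayLog.class.getName());

    /** Formats the proposed log line; crawlDelayMs is the delay applied to this site. */
    public static String formatFetchMessage(String url, long crawlDelayMs) {
        return "fetching " + url + " (crawl.delay=" + crawlDelayMs + "ms)";
    }

    /**
     * Returns true if the robots.txt Crawl-Delay exceeds the configured maximum.
     * Assumption: maxCrawlDelaySec is in seconds and any negative value (e.g. -1)
     * disables the limit, so the page is never skipped.
     */
    public static boolean exceedsMaxCrawlDelay(long robotsDelayMs, long maxCrawlDelaySec) {
        if (maxCrawlDelaySec < 0) {
            return false; // no upper limit: wait for the robots.txt delay, however long
        }
        return robotsDelayMs > maxCrawlDelaySec * 1000L;
    }

    public static void main(String[] args) {
        long robotsDelayMs = 5000L;  // example Crawl-Delay of 5 s read from robots.txt
        long maxCrawlDelaySec = 30L; // example value for fetcher.max.crawl.delay
        String url = "http://nutch.apache.org/";
        if (exceedsMaxCrawlDelay(robotsDelayMs, maxCrawlDelaySec)) {
            LOG.info("skipping " + url + " (crawl.delay=" + robotsDelayMs + "ms exceeds maximum)");
        } else {
            LOG.info(formatFetchMessage(url, robotsDelayMs));
        }
    }
}
```

Including the delay in the same line as the URL means a single grep over the fetcher log is enough to see which sites are being throttled.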

Attachments

- NUTCH-1284-trunk.v1.patch (2 kB), Tejas Patil, 20/Jan/13 10:41
- NUTCH-1284-2.x.v1.patch (2 kB), Tejas Patil, 22/Jan/13 02:08
- NUTCH-1284.patch (1 kB), Tejas Patil, 22/Dec/12 10:52

Issue Links

incorporates: NUTCH-1042 Fetcher.max.crawl.delay property not taken into account correctly when set to -1 (Closed)

People

Assignee: Tejas Patil

Reporter: Lewis John McGibbney

Votes: 0

Watchers: 3

Dates

Created: 19/Feb/12 18:58

Updated: 22/May/13 03:54

Resolved: 28/Jan/13 08:04