Description
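The 'Modified time' in the crawldb is invalid. Instead of a real modification time, it is set to (0 - timezone difference), i.e. the Unix epoch shifted by the local timezone offset.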
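The "(0 - timezone difference)" value is what an epoch timestamp of 0 looks like once it is rendered in a local timezone. The sketch below illustrates this with plain `java.time`. It assumes the stored modified time is simply left at `0L` when nothing sets it. The class and method names are hypothetical, and the zone IDs are only examples.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical illustration: render an unset (0) modified time in different zones.
public class EpochDisplay {
    // Convert epoch milliseconds to a date-time in the given zone.
    static ZonedDateTime render(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        long unsetModifiedTime = 0L; // assumption: default value when nothing sets it
        // In UTC this is exactly 1970-01-01 00:00.
        System.out.println(render(unsetModifiedTime, "UTC"));
        // West of UTC the same instant shows up as the evening of 1969-12-31,
        // i.e. 0 minus the timezone difference.
        System.out.println(render(unsetModifiedTime, "America/New_York"));
    }
}
```

A dumped date near 1970-01-01 is therefore a sign that the field was never populated, not a genuine modification time.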
How to verify/reproduce:
Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the contents of the 'yy' output directory.
The following improvements can be made:
1. Set the modified time in DefaultFetchSchedule.
2. Set ProtocolStatus.lastModified if the modified time is available in the protocol response headers.
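The two improvements together amount to "take the modification time from the protocol if it is known, otherwise use something better than 0". The following is a minimal standalone sketch of that logic, not Nutch's actual code. It assumes the value comes from an HTTP `Last-Modified` header in RFC 1123 format and that falling back to the fetch time is acceptable. `parseLastModified` and `chooseModifiedTime` are hypothetical helpers, not Nutch APIs.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical sketch; not the Nutch DefaultFetchSchedule or ProtocolStatus API.
public class ModifiedTimeSketch {
    // Parse a Last-Modified header value (RFC 1123, e.g. "Wed, 21 Oct 2015 07:28:00 GMT")
    // into epoch milliseconds. Returns 0 when the header is missing or malformed.
    static long parseLastModified(String headerValue) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return 0L;
        }
        try {
            return ZonedDateTime.parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant()
                    .toEpochMilli();
        } catch (DateTimeParseException e) {
            return 0L;
        }
    }

    // Prefer the protocol-reported time; otherwise fall back to the fetch time
    // (an assumption) so the stored modified time is never left at 0.
    static long chooseModifiedTime(long protocolLastModified, long fetchTime) {
        return protocolLastModified > 0 ? protocolLastModified : fetchTime;
    }

    public static void main(String[] args) {
        long fetchTime = System.currentTimeMillis();
        long fromHeader = parseLastModified("Wed, 21 Oct 2015 07:28:00 GMT");
        System.out.println(chooseModifiedTime(fromHeader, fetchTime));
        // No header: the fetch time is used instead of 0.
        System.out.println(chooseModifiedTime(parseLastModified(null), fetchTime));
    }
}
```

Whether the fetch time is the right fallback is a design choice. It avoids the misleading 1970 dates, but it also means that a page without `Last-Modified` appears to have been modified on every fetch.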
This issue is also discussed on the dev mailing list: http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#
Issue Links
- is duplicated by: NUTCH-2242 lastModified not always set (Closed)