Description
During the map phase of the selection step, the generator rejects many (usually most) items for various reasons:
- not yet time for a refetch (returned by the fetch scheduler)
- generator score too low
- status does not match restrict status
- Jexl expression not matched
and some more. It would be useful if these reasons were counted and logged, especially when the CrawlDb gets bigger and multiple options to restrict the selection are used.
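A minimal sketch of the kind of per-reason tally this asks for, in plain Java with no Hadoop or Nutch dependency. The class name `RejectionCounters` and the `Reason` constants are hypothetical and only mirror the reasons listed above; they are not taken from the Nutch code base.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: tallies why the generator's map phase rejected items. */
class RejectionCounters {

  /** Illustrative reason names (assumptions, not Nutch identifiers). */
  enum Reason {
    SCHEDULE_REJECTED, // not yet time for a refetch
    SCORE_TOO_LOW,     // generator score too low
    STATUS_REJECTED,   // status does not match restrict status
    EXPR_REJECTED      // Jexl expression not matched
  }

  private final Map<Reason, Long> counts = new EnumMap<>(Reason.class);

  /** Records one rejected item for the given reason. */
  void reject(Reason reason) {
    counts.merge(reason, 1L, Long::sum);
  }

  /** Returns how many items were rejected for the given reason. */
  long count(Reason reason) {
    return counts.getOrDefault(reason, 0L);
  }

  /** Returns the number of rejected items across all reasons. */
  long total() {
    long sum = 0;
    for (long n : counts.values()) {
      sum += n;
    }
    return sum;
  }

  /** Builds a single log-friendly line listing every reason and its count. */
  String summary() {
    StringBuilder sb = new StringBuilder();
    for (Reason r : Reason.values()) {
      if (sb.length() > 0) {
        sb.append(", ");
      }
      sb.append(r).append('=').append(count(r));
    }
    return sb.toString();
  }
}
```

In an actual Hadoop mapper the same tally would more likely be kept in job counters, for example `context.getCounter("Generator", reason.name()).increment(1)`, where the group name "Generator" is an assumption; the framework then aggregates the values across tasks and reports them with the job.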
Issue Links
- causes NUTCH-2951 Crawl datum with metadata WRITABLE_GENERATE_TIME_KEY awaits fetching forever (Closed)