Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 1.7
- None
- None
Description
IndexerMapReduce can skip not_modified records or delete redirects and gone records, but it only considers the first incoming fetchDatum. Instead, it should consider only the most recent fetchDatum, as determined by CrawlDatum.fetchTime.
This affects indexing of multiple segments only.
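The intended behaviour can be illustrated with a minimal, self-contained sketch. It is not the actual IndexerMapReduce code and not a proposed patch: FetchDatum is a hypothetical stand-in for a crawl_fetch CrawlDatum, the status strings are illustrative labels only, and the tie-breaking rule (keep the first datum seen) is an assumption. The point is that the datum used for the skip/delete decision is the one with the greatest fetch time, regardless of the order in which segments deliver their records.

```java
import java.util.Arrays;
import java.util.List;

public class LatestFetchDatumSketch {

    // Hypothetical stand-in for a crawl_fetch datum; NOT Nutch's CrawlDatum class.
    static final class FetchDatum {
        final String status;   // illustrative label, e.g. "fetch_gone"
        final long fetchTime;  // fetch time in epoch milliseconds

        FetchDatum(String status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    // Returns the datum with the greatest fetchTime, or null if there are none.
    // On equal fetch times the first datum seen is kept (an assumption).
    static FetchDatum latest(Iterable<FetchDatum> values) {
        FetchDatum latest = null;
        for (FetchDatum datum : values) {
            if (latest == null || datum.fetchTime > latest.fetchTime) {
                latest = datum;
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // Records for one URL arriving from three segments, in arbitrary order.
        List<FetchDatum> values = Arrays.asList(
            new FetchDatum("fetch_success", 1000L),
            new FetchDatum("fetch_gone", 3000L),
            new FetchDatum("fetch_notmodified", 2000L));

        FetchDatum datum = latest(values);
        System.out.println(datum.status + " @ " + datum.fetchTime);
    }
}
```

With the input above, taking the first incoming datum would treat the URL as successfully fetched, whereas selecting by fetch time picks the later gone record, so the document would be deleted rather than indexed.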
Attachments
Issue Links
- is related to
  - NUTCH-1416 IndexerMapReduce can index older version of a document instead of latest one (Reopened)
- relates to
  - NUTCH-1616 SegmentMerger missing proper crawl_fetch datum (Closed)