Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Duplicate
-
1.7
-
None
-
None
Description
Merged 26036 vs. unmerged 26038 indexed documents! Two records in the merged segment no longer have a crawl_fetch CrawlDatum with a fetch_success status. Instead, their only crawl_fetch CrawlDatum has the status linked!
The original segment has two crawl_fetch CrawlDatums for each of these records: one with the linked status and one with the fetch_success status.
Without a fetch_success or not_modified status, a record is not going to be indexed.
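The indexing condition described above can be sketched as follows. This is only a simplified model for illustration: the class SegmentIndexCheck, the enum FetchStatus and the method isIndexable are hypothetical names and are not part of Nutch's CrawlDatum or IndexerMapReduce code.

```java
// Hypothetical, simplified model of the statuses named in this issue.
// It does NOT use Nutch's real CrawlDatum API; all names here are assumptions.
class SegmentIndexCheck {

    // Stand-ins for the crawl_fetch statuses mentioned in the description.
    enum FetchStatus { LINKED, FETCH_SUCCESS, NOT_MODIFIED }

    // A record is treated as indexable only if at least one of its
    // crawl_fetch entries has status fetch_success or not_modified.
    static boolean isIndexable(FetchStatus... crawlFetchEntries) {
        for (FetchStatus status : crawlFetchEntries) {
            if (status == FetchStatus.FETCH_SUCCESS || status == FetchStatus.NOT_MODIFIED) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Original segment: both the linked and the fetch_success entry exist.
        System.out.println(isIndexable(FetchStatus.LINKED, FetchStatus.FETCH_SUCCESS)); // true
        // Merged segment: only the linked entry is left, so the record is skipped.
        System.out.println(isIndexable(FetchStatus.LINKED)); // false
    }
}
```

Under this model, the two affected records drop out of the index because the merge left them with only a linked entry.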
Attachments
Issue Links
- duplicates
-
NUTCH-1113 Merging segments causes URLs to vanish from crawldb/index?
- Closed
- is related to
-
NUTCH-1617 IndexerMapReduce to consider latest fetchDatum
- Open
- relates to
-
NUTCH-1113 Merging segments causes URLs to vanish from crawldb/index?
- Closed
-
NUTCH-1520 SegmentMerger looses records
- Closed