Description
When working with distributed teams, we have found that we can sometimes lose data structures which are currently considered critical, e.g. crawldb, linkdb and/or segments.
In my current scenario I have a requirement to index segment data with no accompanying crawldb or linkdb.
The absence of the latter is OK because linkdb is optional; however, crawldb is currently mandatory in IndexerMapReduce.
This ticket should enhance the IndexerMapReduce code to support the use case where you have ONLY segments and want to force indexing of every record present.
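As a rough illustration of the requested behaviour, here is a minimal, self-contained sketch of the decision involved. It does not use the actual Nutch classes: `Record`, `shouldIndex`, `selectForIndexing` and the `crawlDbRequired` switch are hypothetical names invented for this example, and the real IndexerMapReduce logic may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these names come from the Nutch code base.
class SegmentOnlyIndexSketch {

    /** Stand-in for the data collected for one URL before indexing. */
    static final class Record {
        final String url;
        final boolean hasCrawlDbEntry; // false when no crawldb was supplied
        final boolean hasFetchDatum;   // fetch status taken from the segment
        final boolean hasParse;        // parse text and parse data taken from the segment

        Record(String url, boolean hasCrawlDbEntry, boolean hasFetchDatum, boolean hasParse) {
            this.url = url;
            this.hasCrawlDbEntry = hasCrawlDbEntry;
            this.hasFetchDatum = hasFetchDatum;
            this.hasParse = hasParse;
        }
    }

    /**
     * Decides whether a record is indexed. Segment data is always required;
     * a crawldb entry is required only when crawlDbRequired is true
     * (assumed here to model the current, mandatory behaviour).
     */
    static boolean shouldIndex(Record r, boolean crawlDbRequired) {
        if (!r.hasFetchDatum || !r.hasParse) {
            return false; // nothing usable in the segment for this URL
        }
        return r.hasCrawlDbEntry || !crawlDbRequired;
    }

    /** Returns the URLs that would be indexed under the given setting. */
    static List<String> selectForIndexing(List<Record> records, boolean crawlDbRequired) {
        List<String> urls = new ArrayList<>();
        for (Record r : records) {
            if (shouldIndex(r, crawlDbRequired)) {
                urls.add(r.url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Segment-only input: no record has a crawldb entry.
        List<Record> segmentOnly = Arrays.asList(
            new Record("http://example.org/a", false, true, true),
            new Record("http://example.org/b", false, true, false));

        System.out.println("crawldb mandatory: " + selectForIndexing(segmentOnly, true));
        System.out.println("crawldb optional:  " + selectForIndexing(segmentOnly, false));
    }
}
```

In this sketch, making the crawldb check optional is a single boolean switch, so the existing behaviour would remain the default and segment-only indexing would be an explicit opt-in.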
Attachments
Issue Links
- incorporates NUTCH-2456 Allow to index pages/URLs not contained in CrawlDb (Closed)
- links to