Details
Description
When recrawling, some URLs may have expired (i.e. they no longer exist and now return 404).
This patch adds a new command to the indexer that scans the crawldb for these URLs and issues delete commands to Solr.
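The scan-and-delete idea can be sketched roughly as follows. This is not the patch's code: `FetchStatus`, `ExpiredUrlCleaner`, and the in-memory `Map` standing in for the crawldb are all hypothetical names chosen for illustration, and the sketch assumes the Solr document id is the page URL. It only builds the XML delete message and does not send it to Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-URL status recorded in the crawldb.
enum FetchStatus { FETCHED, GONE }

class ExpiredUrlCleaner {

    // Scan the (in-memory, hypothetical) crawldb and collect URLs marked GONE,
    // e.g. pages that returned 404 on the last fetch.
    static List<String> findExpired(Map<String, FetchStatus> crawlDb) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, FetchStatus> entry : crawlDb.entrySet()) {
            if (entry.getValue() == FetchStatus.GONE) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }

    // Build a Solr XML delete-by-id message, assuming the document id is the URL.
    static String toDeleteXml(List<String> ids) {
        StringBuilder sb = new StringBuilder("<delete>");
        for (String id : ids) {
            sb.append("<id>").append(escapeXml(id)).append("</id>");
        }
        return sb.append("</delete>").toString();
    }

    // Minimal XML escaping; URLs frequently contain '&' in query strings.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A caller would typically skip the request entirely when `findExpired` returns an empty list, and batch the ids when the list is large.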
Attachments
Issue Links
- relates to
  - NUTCH-979 Add support for deleting Solr documents with ProtocolStatusCodes.NOTFOUND (Closed)