Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
1.9
None
None
Description
Using the standard bin/crawl script, Solr is never informed when a previously indexed document has been deleted.
"bin/nutch update" sets db_gone status in the crawl db for requests returning HTTP 404 status.
"bin/nutch dedup" removes entries with status db_gone from the crawl db.
As a result, "bin/nutch clean" never sees the db_gone status, so it never tells Solr to delete the document.
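
The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not Nutch code: the crawl db is modelled as a plain dict, and the functions update, dedup, clean and run are hypothetical stand-ins that mimic the behaviour stated in this report. The URLs are made up for the example.

```python
# Toy model of the step ordering described in this report; not Nutch code.
# All names here are hypothetical stand-ins for the real jobs.
DB_FETCHED = "db_fetched"
DB_GONE = "db_gone"


def update(crawl_db, fetch_results):
    """Mark URLs whose fetch returned HTTP 404 as db_gone."""
    for url, http_status in fetch_results.items():
        crawl_db[url] = DB_GONE if http_status == 404 else DB_FETCHED


def dedup(crawl_db):
    """Drop db_gone entries, as the report says dedup does."""
    for url in [u for u, s in crawl_db.items() if s == DB_GONE]:
        del crawl_db[url]


def clean(crawl_db):
    """Return the URLs for which a delete would be sent to Solr."""
    return [u for u, s in crawl_db.items() if s == DB_GONE]


def run(order):
    """Run update, then the given steps in order; return clean's deletes."""
    crawl_db = {
        "http://example.com/a": DB_FETCHED,
        "http://example.com/b": DB_FETCHED,
    }
    update(crawl_db, {"http://example.com/a": 200, "http://example.com/b": 404})
    deletes = []
    for step in order:
        if step == "dedup":
            dedup(crawl_db)
        elif step == "clean":
            deletes = clean(crawl_db)
    return deletes


if __name__ == "__main__":
    print(run(["dedup", "clean"]))  # order described in this report
    print(run(["clean", "dedup"]))  # alternative order, for comparison only
```

In this model, running dedup before clean leaves clean with nothing to report, while the reverse order lets clean see the gone URL. Whether reordering the steps is the appropriate fix for bin/crawl is not stated in this report.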