Redirect targets are indexed under their "representative URL" (repr URL):
- in the Fetcher, the repr URL is determined by URLUtil.chooseRepr() and stored in the CrawlDatum (CrawlDb). The repr URL is either the source or the target URL of the redirect pair.
- the NutchField "url" is filled with the repr URL by the basic indexing filter.
- the "id" field, used as the unique key, is filled from "url" as configured in solrindex-mapping.xml.
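The chain above (repr URL, then "url", then "id") can be modelled in a few lines. This is a minimal sketch, not Nutch code: `chooseReprStub` is a hypothetical stand-in for URLUtil.chooseRepr() that simply lets the caller say which side of the redirect pair was picked, and `buildDoc` is a hypothetical helper that uses a plain Map in place of a NutchDocument. The example URLs are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class ReprUrlIdSketch {

    // Hypothetical stand-in for URLUtil.chooseRepr(). The real method applies its
    // own heuristics; here the caller states which side of the pair is chosen.
    static String chooseReprStub(String source, String target, boolean pickSource) {
        return pickSource ? source : target;
    }

    // Builds the fields of the document for the fetched redirect target.
    static Map<String, String> buildDoc(String reprUrl) {
        Map<String, String> doc = new HashMap<>();
        doc.put("url", reprUrl);       // indexing filter: "url" <- repr URL
        doc.put("id", doc.get("url")); // index mapping: unique key "id" <- "url"
        return doc;
    }

    public static void main(String[] args) {
        String source = "http://example.com/";
        String target = "http://example.com/index.html";

        // If the source is chosen as repr URL, the target's document gets the source's id.
        Map<String, String> doc = buildDoc(chooseReprStub(source, target, true));
        System.out.println("id of target document: " + doc.get("id"));
    }
}
```

With the source chosen as repr URL, the target's content ends up under the id `http://example.com/`, i.e. the same key as the redirect source.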
Deletion of redirects is done in IndexerMapReduce.reduce() by key, which is the URL of the redirect source. If the source URL is chosen as the repr URL, the redirect target is indexed with the source URL as its id, so deleting the redirect source may erroneously delete the redirect target.
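The collision can be reproduced with a toy index keyed by "id". This is a hypothetical sketch, not the IndexerMapReduce code: the HashMap stands in for the search index, and `indexTarget`/`deleteRedirectSource` are illustrative names. The sketch applies the delete after the target has been indexed, which is the case where the target is lost.

```java
import java.util.HashMap;
import java.util.Map;

public class RedirectDeleteSketch {

    // Hypothetical in-memory stand-in for the index, keyed by the unique "id".
    private final Map<String, String> index = new HashMap<>();

    // Index the redirect target's document under its repr URL (its "id").
    void indexTarget(String reprUrl, String content) {
        index.put(reprUrl, content);
    }

    // Simplified model of the deletion step: the redirect source is deleted
    // by the reduce key, i.e. the source URL.
    void deleteRedirectSource(String sourceUrl) {
        index.remove(sourceUrl);
    }

    boolean contains(String id) {
        return index.containsKey(id);
    }

    public static void main(String[] args) {
        String source = "http://example.com/";
        String target = "http://example.com/index.html";

        // repr URL == source URL: the delete for the source hits the target's document.
        RedirectDeleteSketch a = new RedirectDeleteSketch();
        a.indexTarget(source, "target page content");
        a.deleteRedirectSource(source);
        System.out.println("repr=source, target still indexed: " + a.contains(source));

        // repr URL == target URL: the delete for the source leaves the target alone.
        RedirectDeleteSketch b = new RedirectDeleteSketch();
        b.indexTarget(target, "target page content");
        b.deleteRedirectSource(source);
        System.out.println("repr=target, target still indexed: " + b.contains(target));
    }
}
```

The first case prints `false` and the second `true`: the delete is only harmless when the repr URL differs from the redirect source.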