Description
I am using Nutch to crawl sites and have combined it
with Solr, pushing the Nutch index using the solrindex command. I have
set it up as specified on the wiki, using a copyField from url to id in
the schema. Whilst this works fine, it messes up my inputs from other
sources in Solr (e.g. using the Solr data import handler), as they have
both ids and urls. I have a patch that implements a Nutch XML schema
defining what the basic Nutch fields map to in your Solr push.
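
To illustrate the idea of mapping fields on the Nutch side rather than with a schema-wide copyField, here is a minimal sketch in Java. It is not the patch itself: the class name FieldMappingSketch, the method applyMapping, and the example mapping of url to both id and url are all assumptions made for illustration, and documents are modelled as plain maps rather than real Nutch or Solr types.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and do not come from
// Nutch, Solr, or the attached patch.
final class FieldMappingSketch {

    // Copy each source field to every destination field listed for it.
    // Fields without a mapping entry keep their original name.
    static Map<String, Object> applyMapping(Map<String, Object> doc,
                                            Map<String, List<String>> mapping) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            List<String> targets = mapping.get(field.getKey());
            if (targets == null) {
                out.put(field.getKey(), field.getValue());
            } else {
                for (String target : targets) {
                    out.put(target, field.getValue());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed mapping: the crawled url also becomes the Solr id,
        // so the copyField in the Solr schema would no longer be needed.
        Map<String, List<String>> mapping = new LinkedHashMap<>();
        mapping.put("url", Arrays.asList("id", "url"));

        Map<String, Object> nutchDoc = new LinkedHashMap<>();
        nutchDoc.put("url", "http://example.com/");
        nutchDoc.put("title", "Example");

        System.out.println(applyMapping(nutchDoc, mapping));
    }
}
```

Because the mapping is applied only to documents pushed from Nutch, documents from other sources that already carry distinct id and url values would be left alone.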