This simple initial version is meant to give people an idea of the functionality. It uses Hadoop contrib/index, which in turn uses the Hadoop mapred package. Future versions will be very different. The main difference is that in this version, after a Solr input document is converted to a Lucene document, a Lucene index writer is used to build the index; in future versions, a Solr writer/core will be used instead.
Here are some prerequisites for this issue:
- Hadoop 0.20, which has yet to be released. Two features in 0.20 are important for this issue.
First is the new mapreduce package. The flexibility of the new mapreduce API makes it possible to use a Solr writer/core in mapper tasks.
Second is the upgrade to Jetty 6 (6.1.14). The current release, 0.19, uses Jetty 5.
- There are a couple of changes required in Solr.
First is to make SolrCore support an indexing-only mode (i.e. no search). Only then is it feasible to use it for indexing in a map task.
Second is to upgrade from Jetty 6.1.3 to Jetty 6.1.14. Hadoop 0.20 uses a feature that is not available in 6.1.3.
What do you think about making "SolrCore support an indexing-only mode"?
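To make the question more concrete, here is a rough sketch of how a map task might drive an indexing-only writer. It is only an illustration under assumptions: IndexOnlyWriter, InMemoryWriter and IndexingTask are hypothetical names, not existing Solr or Hadoop classes, and the setup/map/cleanup methods merely mirror the task lifecycle that the new mapreduce API exposes. The writer is held for the whole task, fed one document per input record, and closed once at the end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexingTaskSketch {

    /** Hypothetical stand-in for a search-free Solr writer/core; NOT a real Solr API. */
    interface IndexOnlyWriter extends AutoCloseable {
        void add(Map<String, String> doc);

        @Override
        void close();
    }

    /** In-memory writer that exists only to make this sketch runnable. */
    static class InMemoryWriter implements IndexOnlyWriter {
        final List<Map<String, String>> docs = new ArrayList<>();
        boolean closed = false;

        @Override
        public void add(Map<String, String> doc) {
            if (closed) {
                throw new IllegalStateException("writer is closed");
            }
            docs.add(doc);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical task that mirrors a setup / map / cleanup lifecycle. */
    static class IndexingTask {
        private final IndexOnlyWriter writer;

        IndexingTask(IndexOnlyWriter writer) {
            this.writer = writer;
        }

        void setup() {
            // A real task would open index resources here; nothing to do in the sketch.
        }

        void map(String id, String text) {
            // Build one document per input record and hand it to the writer.
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", id);
            doc.put("text", text);
            writer.add(doc);
        }

        void cleanup() {
            // Close the writer exactly once, after all records have been seen.
            writer.close();
        }

        void run(Map<String, String> records) {
            setup();
            try {
                for (Map.Entry<String, String> e : records.entrySet()) {
                    map(e.getKey(), e.getValue());
                }
            } finally {
                cleanup();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> records = new LinkedHashMap<>();
        records.put("1", "first document");
        records.put("2", "second document");

        InMemoryWriter writer = new InMemoryWriter();
        new IndexingTask(writer).run(records);

        System.out.println("indexed=" + writer.docs.size() + " closed=" + writer.closed);
    }
}
```

The point of the sketch is that the writer never needs to serve a search: it only accepts documents and is closed at the end of the task, which is the part an indexing-only SolrCore would have to support.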