- An HdfsDirectory implementation that uses a BlockDirectory to cache (read/write) hdfs blocks.
The default index codec currently supports append-only filesystems, so the impl is fairly straightforward and effective. It would be interesting if we could easily tell whether a codec is append-only.
- An HdfsDirectoryFactory to hook this into Solr.
Now that Directory is a first class citizen in Solr, this allows pretty much everything to work on hdfs with few other tweaks, including Replication.
Adds a new option to DirectoryFactory to have Searchers explicitly reserve commit points, because hdfs has neither delete on last close like unix nor a delete that fails while the file is in use like windows.
- An HdfsUpdateLog that allows writing the transaction log to hdfs as well.
I talked to Yonik a while back and I think we are in agreement that we don't currently want to support a pluggable UpdateLog, so this one is built in and is triggered by using an hdfs:// prefixed update log path.
Also includes a simple impl to write lock files to hdfs rather than the local filesystem.
Includes the work for SOLR-4566 - while a good general improvement, this is also important for this patch because we use the node name in hdfs paths - if a different machine takes over for that path, it's awkward to have the address of another machine as part of it.
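
To illustrate the update log trigger described above, here is a minimal sketch of choosing a log implementation from the path prefix. It is not the code from the patch: UpdateLogSelector, LogKind and select are hypothetical names made up for this example, and the only detail taken from the description is the hdfs:// prefix check.

```java
/**
 * Hypothetical helper (not part of Solr or of this patch) showing how a
 * built-in update log could be selected from the configured log path.
 */
final class UpdateLogSelector {

    /** Which kind of transaction log to use. Hypothetical enum. */
    enum LogKind { LOCAL, HDFS }

    private static final String HDFS_PREFIX = "hdfs://";

    /** Returns HDFS when the path starts with hdfs://, LOCAL otherwise. */
    static LogKind select(String updateLogPath) {
        if (updateLogPath != null && updateLogPath.startsWith(HDFS_PREFIX)) {
            return LogKind.HDFS;
        }
        // A null or non-hdfs path falls back to the local filesystem log.
        return LogKind.LOCAL;
    }

    public static void main(String[] args) {
        // The host, port and directories below are made-up example values.
        System.out.println(select("hdfs://namenode:8020/solr/tlog")); // prints HDFS
        System.out.println(select("/var/solr/data/tlog"));            // prints LOCAL
    }
}
```

Keeping the decision in one place like this means no extra plugin configuration is needed, which matches the intent of not making UpdateLog pluggable.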
There are a few new tests specifically written for HDFS. There are also a bunch of new tests that simply run the current pertinent SolrCloud tests against hdfs. Because the SolrCloud tests are already so long, this can greatly increase the test run time on a slower machine. There is almost no noticeable slowdown on my 6 core machine, but it's pretty awful on my 2 core machine. To deal with this, in my patch, the tests that are functionally equivalent to current tests but run against hdfs only run nightly.