You just need to read these publicly, right? Perhaps just add public accessors?
Testing of the HdfsCheckIndex looks pretty minimal... can we reuse TestCheckIndex in some way? I'm thinking of changing each test in there to take a directory that you pass in. In Lucene we would pass newDirectory; in your test we would pass an HdfsDirectory. Thoughts?
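To make the suggestion concrete, here is a rough sketch of the shape I have in mind. Everything in it is a stand-in: SketchDirectory, InMemorySketchDirectory, and SharedDirectoryTestSketch are hypothetical names, not the real Lucene Directory, newDirectory, or TestCheckIndex. It only illustrates the idea of a shared test body that receives its directory from the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical shared test body: it never creates its own directory,
// it only uses the one handed to it by the caller.
final class SharedDirectoryTestSketch {
    // Writes one file and reads it back through whatever directory was passed in.
    static boolean roundTripSucceeds(SketchDirectory dir) {
        dir.writeFile("segments_1", "payload");
        return "payload".equals(dir.readFile("segments_1"));
    }

    // Each test suite supplies its own factory (in-memory here; an HDFS-backed
    // one would be supplied by the HDFS test instead).
    static boolean runWith(Supplier<SketchDirectory> factory) {
        return roundTripSucceeds(factory.get());
    }

    public static void main(String[] args) {
        System.out.println(runWith(InMemorySketchDirectory::new));
    }
}

// Stand-in for a directory abstraction. This is NOT Lucene's Directory API.
interface SketchDirectory {
    void writeFile(String name, String contents);
    String readFile(String name);
}

// Stand-in for the directory the existing tests would use by default.
final class InMemorySketchDirectory implements SketchDirectory {
    private final Map<String, String> files = new HashMap<>();

    @Override
    public void writeFile(String name, String contents) {
        files.put(name, contents);
    }

    @Override
    public String readFile(String name) {
        return files.get(name);
    }
}
```

The HDFS test would call the same shared body with a factory that returns an HDFS-backed directory, so the assertions themselves would not need to be duplicated.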
So... this is a good idea in theory, but in practice it gets really difficult to do. TestCheckIndex isn't visible from the Solr test classes unless we start publishing Lucene test artifacts, which I don't think we want to do. I think we can get away with minimal testing here because we aren't changing any of the functionality, and that's all covered in the Lucene test suite. For our purposes, I think it is enough to establish that if you have an HDFS cluster, you can point this tool at it, and it will run.
Any plans to write a MapReduce Tool to do this?
Sure, after this gets committed I'll open up a new JIRA and we can discuss there.