Okay, before we talk about the boot code, let me address some of the confusion about Hadoop.
In Hadoop, there are things called Jobs. A Job combines a Map operation, a Reduce operation, and the InputFormat configuration you specify, and it is run across a bunch of machines.
A Task is an individual Map or Reduce operation run on one of those machines (so every Job has many Tasks). By default, every new Task gets its own freshly booted JVM.
This is actually okay, distributed-systems-wise, because it keeps all the Tasks from interfering with one another.
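To make the Job/Task/JVM relationship concrete, here is a minimal sketch in plain Java. It is a hypothetical model, not Hadoop's API: `JobModel`, `Task`, and `plan` are illustrative names. It assumes the default of one freshly booted JVM per Task, one map Task per InputSplit, and a fixed number of reduce Tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Hadoop's API: a Job fans out into Tasks,
// and (by default) every Task is launched in a freshly booted JVM.
public class JobModel {

    // One Task = one Map or Reduce operation run on one machine.
    record Task(String kind, int index, String host, int jvmId) {}

    // Expands a Job into Tasks: one map Task per input split (the InputFormat
    // decides the splits and where their data lives) plus numReduces reduce Tasks.
    static List<Task> plan(List<String> splitHosts, int numReduces) {
        List<Task> tasks = new ArrayList<>();
        int nextJvmId = 0; // a new JVM per Task, so JVM ids are never shared
        for (int i = 0; i < splitHosts.size(); i++) {
            tasks.add(new Task("map", i, splitHosts.get(i), nextJvmId++));
        }
        for (int r = 0; r < numReduces; r++) {
            // Reduce Tasks are not tied to a split; the scheduler picks a host.
            tasks.add(new Task("reduce", r, "any", nextJvmId++));
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> tasks = plan(List.of("node1", "node2", "node1", "node3"), 2);
        tasks.forEach(System.out::println);
        long jvms = tasks.stream().map(Task::jvmId).distinct().count();
        System.out.println(tasks.size() + " Tasks, " + jvms + " JVM boots");
    }
}
```

Note that node1 hosts two map Tasks but still gets two separate JVMs, so nothing in one Task's heap is visible to the other.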
It does, however, make our job harder: a Task (and thus the Hadoop code in these patches) has no way to access the runtime of a Cassandra node already running on the machine, because the two live in separate JVMs!
HBase, as I mentioned above, solves this problem by first starting up HBase on those remote machines and then having each Task create an HTable object from the InputSplit handed to it. That HTable object connects to the local HBase process. (Of course, the same thing happens in the JVM that creates the InputSplits.)
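The HBase-style approach boils down to a client/server split: the long-running node stays in its own process, and each Task only opens a connection to it using the location recorded in its split. The sketch below is illustrative only. It does not use the HBase or Cassandra APIs, the one-line "GET key" protocol is made up, and `Split`, `startFakeNode`, and `readThroughLocalNode` are hypothetical names. A thread with a ServerSocket stands in for the already-running node so the example runs on its own, but the Task side talks to it only over TCP, as it would across JVMs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LocalNodeClient {

    // Hypothetical split: records where the node holding the data listens, and which key to read.
    record Split(String host, int port, String key) {}

    // Stand-in for the node already running on the machine.
    // Serves a single "GET <key>" request on a loopback port, then stops.
    static ServerSocket startFakeNode() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String request = in.readLine();
                if (request != null && request.startsWith("GET ")) {
                    out.println("VALUE-OF " + request.substring(4));
                }
            } catch (IOException e) {
                // A real node would log this; the sketch just stops serving.
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    // What a Task does: connect to the node named in its split instead of booting its own.
    static String readThroughLocalNode(Split split) throws IOException {
        try (Socket s = new Socket(split.host(), split.port());
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println("GET " + split.key());
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket node = startFakeNode()) {
            Split split = new Split(node.getInetAddress().getHostAddress(), node.getLocalPort(), "row42");
            System.out.println(readThroughLocalNode(split)); // VALUE-OF row42
        }
    }
}
```

The Task pays only for a connection, not for a node boot; the hard part, as the rest of this comment argues, is exposing the internal information we need through such a remote interface.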
So, here's my deal. There is no way for the system as currently designed to work efficiently in a distributed setting, because we have to boot a brand-new Cassandra process on machines that might already be running one (and need it, if hardware is limited). Cassandra's boot time alone is a big time sink. And consider how these extra nodes would interoperate with the "stable", non-Hadoop nodes that would start sending them data. Ugh.
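To put rough numbers on the boot-time argument, here is a back-of-the-envelope cost model. Every timing in it is a hypothetical placeholder, not a measurement, and `BootOverhead` is an illustrative name. It only sums startup cost over Tasks (Task-seconds), ignoring parallelism and the interoperation problem with the stable nodes.

```java
// Back-of-the-envelope model of per-Job startup overhead.
// All timings are hypothetical placeholders, not measurements.
public class BootOverhead {

    // Startup Task-seconds for one Job when every Task boots its own embedded Cassandra node.
    static double embeddedNodeOverhead(int tasks, double jvmBootSecs, double cassandraBootSecs) {
        return tasks * (jvmBootSecs + cassandraBootSecs);
    }

    // Same Job when each Task just connects to the node already running on its machine.
    static double connectToLocalNodeOverhead(int tasks, double jvmBootSecs, double connectSecs) {
        return tasks * (jvmBootSecs + connectSecs);
    }

    public static void main(String[] args) {
        int tasks = 200;          // hypothetical Job size
        double jvm = 1.0;         // hypothetical JVM startup, seconds
        double cassandra = 30.0;  // hypothetical Cassandra boot, seconds
        double connect = 0.05;    // hypothetical client connect, seconds
        System.out.printf("embedded: %.0f Task-s, connect: %.0f Task-s%n",
                embeddedNodeOverhead(tasks, jvm, cassandra),
                connectToLocalNodeOverhead(tasks, jvm, connect));
        // embedded: 6200 Task-s, connect: 210 Task-s
    }
}
```

Whatever the real numbers are, the embedded design multiplies the Cassandra boot time by the number of Tasks, while the connect design pays it zero times.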
We can avoid all of this boot-time drama if we can come up with a good way of remotely accessing the internal information we need from the Cassandra node that is already running. So far I have not been able to come up with a way to do that, or with any other alternative.
There is something called "Task reuse" (more precisely, JVM reuse) that can be configured into a Hadoop deployment. However, the "reuse" only means that one JVM can run more than one Task for the same Job. So it's basically just another complicating factor in our boot-loading code (one of the reasons there are both BootUp.boot() and BootUp.bootUnsafe()), and it doesn't help with our problem.
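JVM reuse is controlled by the `mapred.job.reuse.jvm.num.tasks` property (default 1; -1 means no limit), but a reused JVM only ever runs Tasks of the same Job. The sketch below counts JVM boots on one node under that rule. `JvmReuseModel` and its methods are hypothetical names, and it assumes a single task slot per node; a node with several slots would run several JVMs in parallel.

```java
// Hypothetical model of Hadoop JVM reuse on one node with one task slot:
// a JVM may run up to N Tasks, but only Tasks of the same Job.
public class JvmReuseModel {

    // JVM boots for one Job that runs tasksOnNode Tasks on this node.
    // tasksPerJvm == -1 means "no limit", matching the Hadoop convention.
    static int bootsForJob(int tasksOnNode, int tasksPerJvm) {
        if (tasksOnNode == 0) return 0;
        if (tasksPerJvm == -1) return 1;
        if (tasksPerJvm < 1) throw new IllegalArgumentException("tasksPerJvm must be >= 1 or -1");
        return (tasksOnNode + tasksPerJvm - 1) / tasksPerJvm; // ceiling division
    }

    // Reuse never crosses Job boundaries, so boots add up Job by Job.
    static int bootsForJobs(int[] tasksOnNodePerJob, int tasksPerJvm) {
        int total = 0;
        for (int t : tasksOnNodePerJob) total += bootsForJob(t, tasksPerJvm);
        return total;
    }

    public static void main(String[] args) {
        int[] jobs = {12, 5, 8}; // hypothetical Tasks per Job on this node
        System.out.println(bootsForJobs(jobs, 1));  // 25: default, one JVM per Task
        System.out.println(bootsForJobs(jobs, 4));  // 7: 3 + 2 + 2
        System.out.println(bootsForJobs(jobs, -1)); // 3: still one per Job
    }
}
```

Even with unlimited reuse, every Job still boots at least one fresh JVM on the node, and therefore, in the current design, at least one fresh Cassandra node.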