Comment from ravi:
Right now, the default reducer we use is KeyValueSortReducer. It aggregates the mapper output and writes it as HFiles into a directory that the user has passed as an argument to the job. After the MapReduce job completes, the LoadIncrementalHFiles class moves the HFiles into the HBase table.
A couple of things can happen in the above workflow:
a) If all mappers succeed:
A happy path. HFiles get loaded into the HTable and we then update the state to ACTIVE.
b) If some mappers fail:
The job is marked as a failure. Some mappers may have succeeded, but because one or more failed, the whole job is marked as a failure. The output produced by the successful mappers is discarded as part of MapReduce's cleanup, so the entire job has to be re-run.
It'd be good to make the mapper task a kind of independent chunk of work which, once complete, would not need to be run again if the job is restarted.
Are you referring to the second case above? If so, I am not sure there is an easy way to have the output of mappers persist beyond the job. We could address this by pushing the output KeyValue pairs onto a persistent store like HDFS rather than writing to local filesystem directories. However, I see a challenge when the job is run again: some records (in no particular order in the primary table) have already been processed and written to the persistent store, and if we do not want the mappers to process them again, there needs to be some flag or signal to skip them. Mapper tasks usually align with region boundaries, so if the re-run keeps the same region boundaries, we could hack into the framework to run mappers only on those regions where the earlier mappers failed.
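The region-level re-run idea could be sketched as follows. This is a minimal, self-contained simulation, not actual HBase/Hadoop code: the region names, the `completedRegions` marker set, and `processRegion` are all hypothetical stand-ins (in practice the markers could be per-region _SUCCESS-style files on HDFS). Each simulated mapper records its region in the marker set only on success; on re-run, regions that already have a marker are skipped, so only the previously failed region is processed again.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RegionCheckpointSketch {
    // Hypothetical persistent marker store; stands in for per-region
    // success markers on HDFS. TreeSet used only for deterministic output.
    static Set<String> completedRegions = new TreeSet<>();

    // Simulated per-region mapper work; 'fail' forces a simulated failure.
    static boolean processRegion(String region, boolean fail) {
        if (fail) {
            return false; // no marker written: region must be re-run
        }
        // ... write this region's KeyValue output to a persistent store ...
        completedRegions.add(region); // mark region done only on success
        return true;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("region-1", "region-2", "region-3");

        // First run: region-2 fails, so the job as a whole fails,
        // but the other regions' markers survive.
        for (String r : regions) {
            processRegion(r, r.equals("region-2"));
        }
        System.out.println("after run 1: " + completedRegions);

        // Re-run: skip regions whose marker already exists; only the
        // previously failed region is processed again.
        for (String r : regions) {
            if (completedRegions.contains(r)) {
                continue;
            }
            processRegion(r, false);
        }
        System.out.println("after run 2: " + completedRegions);
    }
}
```

This only works as long as the re-run sees the same region boundaries as the original run, which is exactly the constraint noted above.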