Why do you say that? (I don't disagree, but a list of whys would help figure out what the fit criteria for closing this issue are.)
Stack, first up, I didn't mean to start to flame - I'm sure you know that.
FWIW, talking to folks around, isolation and support for prioritization to ensure a single user/application cannot hog an HBase cluster (or parts thereof) is something I've heard as a concern. This dovetails very well with our experience running both HDFS and MapReduce at scale, as a shared resource. Again, this isn't to claim it's a solved problem in Hadoop core, just something we've focussed on for a while now.
Hence, my thinking was that we could use YARN as an intermediate solution. I discussed this idea with Andrew at the Summit and he didn't give me the impression that I was off my rocker; maybe he was just being polite and has a great poker face!
Thanks for pointing me to HBASE-4120, that seems related - I wasn't aware. It's a lot to digest, I'll try to spend some time on it. If the HBase community decides to focus on the multi-tenancy/isolation problem (via HBASE-4120 etc.) - great! We can close this discussion. If not, I'd like to brainstorm with you guys for an intermediate solution.
It really depends where you guys want to focus your energies.
Meantime, where I work, mapreduce is the problem (smile). We're messing with cgroups to contain mapreduce so it doesn't steal resources from hdfs (and hbase).
I'm sure - MR needs more work, I'm painfully aware of this!
We plan to go the cgroups route sometime right after we ship 0.23; we could share notes and ideas.
You want us to get into the nextgen mr container because then there is one place to go to do accounting?
The idea is that iff the HBase community wants to use this as an intermediate solution, using the RM will ensure the resource usage of HBase is accounted for w.r.t. the applications/queues/organizations etc.