HBASE-4329

Use NextGen Hadoop to deploy HBase

    Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Currently (circa 2011), with due respect, it's not practical to run shared, multi-tenant HBase clusters on the largest Hadoop installs (of 4000+ nodes).

      As an interim step, I'd like to brainstorm using NextGen Hadoop (MAPREDUCE-279) to deploy HBase for focussed sets of applications/users/organizations. Thus, one could deploy a smaller instance of HBase (100s of nodes) in a large Hadoop cluster and use it for a set of applications.

      The other advantage is that the resource usage of HBase (master, region servers, etc.) is accounted for in the overall utilization of the cluster, which conceivably aids resource tracking, capacity planning, etc.


      Thoughts?

          Activity

          Arun C Murthy added a comment -

          Potentially this is related to Andrew's ideas in HBASE-4047 for using NextGen Hadoop to run generic co-processors.

          stack added a comment -

          > Currently (circa 2011), with due respect, it's not practical to run shared, multi-tenant HBase clusters on the largest Hadoop installs (of 4000+ nodes).

          Why you say that? (I don't disagree but a list of why's would help figure what the fit criteria for closing this issue are).

          Up to now, in our ignorance, we've been thinking a fat hbase install w/ multitenancy enabled via hbase security acls. Regards resources consumed by the running hbase, there is an interesting contribution over in HBASE-4120 that is provocative but I'm thinking needs a bit of work before it'd be committed. Meantime, where I work, mapreduce is the problem (smile). We're messing with cgroup containing mapreduce so it doesn't steal resources from hdfs (and hbase).

          You want us to get into the nextgen mr container because then there is one place to go to do accounting? I need to do some background reading over on mapreduce-279 to see what we're missing.

          Good on you Arun.

          Arun C Murthy added a comment -

          > Why you say that? (I don't disagree but a list of why's would help figure what the fit criteria for closing this issue are).

          Stack, first up, I didn't mean to start a flame war - I'm sure you know that.

          FWIW, talking to folks around, isolation and support for prioritization to ensure a single user/application cannot hog an HBase cluster (or parts thereof) is something I've heard as a concern. This dovetails very well with our experience running both HDFS and MapReduce at scale, as a shared resource. Again, this isn't to claim it's a solved problem in Hadoop core, just something we've focussed on for a while now.

          Hence, my thinking was we could use YARN as an intermediate solution. I discussed this idea with Andrew at the Summit and he didn't give me the impression that I was off my rocker; maybe he was just being polite and has a great poker face!

          Thanks for pointing me to HBASE-4120, that seems related - I wasn't aware. It's a lot to digest, I'll try to spend some time on it. If the HBase community decides to focus on the multi-tenancy/isolation problem (via HBASE-4120 etc.) - great! We can close this discussion. If not, I'd like to brainstorm with you guys for an intermediate solution.

          It really depends where you guys want to focus your energies.

          > Meantime, where I work, mapreduce is the problem (smile). We're messing with cgroup containing mapreduce so it doesn't steal resources from hdfs (and hbase).

          I'm sure - MR needs more work, I'm painfully aware of this!

          We plan to go the cgroups route sometime right after we ship 0.23; we could share notes and ideas.

          > You want us to get into the nextgen mr container because then there is one place to go to do accounting?

          The idea is that iff the HBase community wants to use this as an intermediate solution, using the RM will ensure the resource usage of HBase is accounted for w.r.t. the applications/queues/organizations etc.

          stack added a comment -

          > Hence, my thinking was we could use YARN as an intermediate solution.

          Why would it be only an intermediate soln Arun? What else needs to be done?

          > If the HBase community decides to focus on the multi-tenancy/isolation problem ... great! We can close this discussion. If not, I'd like to brainstorm with you guys for an intermediate solution.

          Well, we want to play nice with the neighbours.

          I saw the show a few times – at the hadoop summit – but I haven't yet read the book. Will be back when I learn more about YARN container.

          Andrew Purtell added a comment -

          > I discussed this idea with Andrew at the Summit and he didn't give me the impression that I was off my rocker

          No, certainly not. With YARN, Hadoop generalizes resource management. It could well make sense to use YARN to partition resources for HBase or other components. It may not be the only story but it makes sense to look at certainly.

          Arun C Murthy added a comment -

          > It may not be the only story but it makes sense to look at certainly.

          Agree completely. Thanks.

          Devaraj Das added a comment -

          I've been thinking about it, and I'll upload a patch soon.

          Arun C Murthy added a comment -

          Awesome! Thanks for picking up one of my favorite jiras...


            People

            • Assignee: Devaraj Das
            • Reporter: Arun C Murthy
            • Votes: 1
            • Watchers: 29
