Bigtop / BIGTOP-955

HBase installation should advertise its location and configuration

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

      Description

      Integration between HBase and other data tools can be improved. Work on HIVE-2055, PIG-2786 and HCATALOG-621 will allow those tools' launch scripts (project-local bin/XXX) to include HBase jars and configuration on the classpath when HBASE_HOME and HBASE_CONF_DIR are defined. I believe it is BigTop's responsibility to provide that environment for HBase deployments, similar to how HADOOP_HOME is provided.

        Activity

        Roman Shaposhnik added a comment -

        Nick, can you please elaborate on what kind of advertising you are talking about? One of the goals of Bigtop is to integrate well with the underlying OS, so you don't have to guess where things are installed: all the binary scripts are always in /usr/bin, all the Java bits are always in /usr/lib/<component>, etc.

        Is this not enough for what you're trying to accomplish?

        Andrew Purtell added a comment -

        Currently different components build the classpaths for launching out of binscripts or for MR job submissions in a brittle way that sometimes causes pain when dependencies change elsewhere. I wonder if it is possible to at least address the binscripts part and also to provide an environment that the components can use if available. What about an "alternatives" like way to contribute classpath fragments? Is that a crazy idea?
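
One way to picture the "classpath fragments" idea is sketched below. This is only an illustration, not anything Bigtop ships: the fragment directory, the `.classpath` file suffix, and the function name `collect_classpath_fragments` are all hypothetical. Each component would drop a file listing its classpath entries, one per line, and a launch script would join them.

```shell
#!/bin/sh
# Hypothetical sketch: the fragment directory layout and the *.classpath
# suffix are assumptions made for illustration only.

# collect_classpath_fragments DIR
# Prints every non-empty line of every DIR/*.classpath file, joined with ':'.
collect_classpath_fragments() {
    fragment_dir="$1"
    result=""
    for fragment in "$fragment_dir"/*.classpath; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$fragment" ] || continue
        # The "|| [ -n ... ]" keeps a last line that lacks a trailing newline.
        while IFS= read -r entry || [ -n "$entry" ]; do
            [ -n "$entry" ] || continue
            result="${result:+$result:}$entry"
        done < "$fragment"
    done
    printf '%s\n' "$result"
}
```

A launch script could then append the output of `collect_classpath_fragments` to its own classpath instead of hard-coding the jars of other components.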

        Nick Dimiduk added a comment -

        I was hoping I could just provide a patch, but I think that'll take a bit more time than I hoped. Here's what I've found though.

        1. HBase should have an /etc/default/hbase, analogous to /etc/default/hadoop, that exports the two environment variables HBASE_HOME and HBASE_CONF_DIR. (BigTop provides)
        2. Pig, Hive, HCat, etc. wrapper scripts (i.e., ${PREFIX}/$BIN_DIR/hcat in install_hcatalog.sh:115) would source that file if it exists. (BigTop provides)
        3. project-specific bin scripts detect the presence of that file when HBase is installed, source it, and can do what they need to from there (project responsibility)
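
A minimal sketch of steps 1 and 2, under stated assumptions: the helper name `source_component_defaults` is hypothetical, and the example values shown for /etc/default/hbase are guesses based on the /usr/lib/<component> convention, not the contents of the actual patch.

```shell
#!/bin/sh
# Sketch only. The helper name and the example values below are assumptions.
#
# Step 1: what /etc/default/hbase might contain (assumed values):
#   export HBASE_HOME="/usr/lib/hbase"
#   export HBASE_CONF_DIR="/etc/hbase/conf"

# Step 2: a wrapper script sources the defaults file only if it exists,
# so the wrapper keeps working on machines without HBase installed.
source_component_defaults() {
    defaults_file="$1"
    if [ -f "$defaults_file" ]; then
        . "$defaults_file"
    fi
}

source_component_defaults /etc/default/hbase
```

After this runs, the project's own bin script sees HBASE_HOME and HBASE_CONF_DIR in its environment when HBase is installed, and sees nothing new when it is not.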

        This should be sufficient to replace the "workaround" in install_pig.sh:137.

        That said, I don't know how this relates to BIGTOP-477.

        Nick Dimiduk added a comment -

        Sorry, #3 should have been "project-specific bin scripts now have the environment variables set for them, can proceed to do with them what they must (project responsibility)."

        Basically, the idea is to continue the API already established by Hadoop for providing bare-essential discoverability to dependent projects via the /etc/default/foo file and a convention for environment variable names.

        Nick Dimiduk added a comment -

        "Is that a crazy idea?"

        Andrew Purtell, this is beyond what I was hoping for, but I like where your head is at. We (BigTop) should be normalizing the dependencies across the framework as much as possible, forcing version updates on projects at specific versions where necessary, and providing single deployment locations for those common dependencies in well-known locations. For example, every project bin script should be able to construct a classpath that references back to that single copy and version of apache-commons-lang sitting in a well-known place (/usr/lib/hadoop/lib, or maybe /usr/share/jvm/apache-commons-lang/...). I don't know precisely what the paths should be. The point is to include a single source of truth for dependencies in a well-known location, just like a Linux distro does with shared libraries.

        This is entirely beyond the scope of what I had in mind. Just a defaults file for HBase that the other projects can source will suit my needs.

        Nick Dimiduk added a comment -

        I worked my way around the code. This is roughly what I had in mind.

        Nick Dimiduk added a comment -

        Bump. Any opinions on this patch?

        stack added a comment -

        Patch LGTM. I defer to bigtoppers on whether it is a pattern that is encouraged around these parts.

        Sean Mackrory added a comment -

        I have a vague recollection of something similar to this being discouraged in the past, but as I can't find it right now I'll just go with my opinion. I would +1 the idea and the patch in general. Sometime in the next few days I will try it out a bit and commit it if there are no objections.

        Sean Mackrory added a comment -

        +1 (committer)! I've tested the patch, specifically with respect to Pig's integration, and the classpath is indeed built up the way it appears Pig expects it to be upstream. I will commit this if no one else in the community disagrees soon. This is kind of a new pattern so there may be those who disagree, but I see this as similar to the way we source /etc/default/hadoop for everything. I think more and more HBase is seen as part of the ecosystem "kernel" because it's often used as the storage layer, and I think this practice is just a natural reflection of that. Everything I see in /etc/default/hbase is explicitly HBase-specific, but if it's common for people to throw generic JVM environment stuff in these files this could be confusing, hence me waiting for a bit to make sure I'm not insane. Otherwise I'll commit this today.

        Nick Dimiduk added a comment -

        Thanks for the review, Sean Mackrory. I agree on the idea of HBase being a core piece of infrastructure, but I'm biased! I'm also surprised by the idea of putting generic JVM settings into /etc/default/hbase; aren't there better places for such things? I can kind of see using /etc/default/hadoop to set configs for child processes, but that's an anti-pattern.

        Sean Mackrory added a comment -

        Yeah - I wouldn't expect there to be JVM settings in /etc/default/hbase - just sanity checking myself. I'm a little surprised that HBase showed up in Pig 0.11.1's classpath given that PIG-2786 appears to only be fixed for subsequent releases. I also see Hive doesn't appear to be trying to use either of these variables (although HCat does), so although I don't think it's dangerous to include this, I'm a bit confused on the purpose of sourcing it for Hive. Any comments?

        Nick Dimiduk added a comment -

        Funny you should ask

        I have a wider goal of making it transparent for users to consume HBase from {Pig,Hive,HCat}, at least as far as establishing classpath and dependency jars are concerned. They shouldn't need to know about HBase-specific classpath entries or dependencies shipped with their MR job. I want it to be as "easy" to use HBase as it is to use HDFS in this regard. If the distro (BigTop) can provide well-known "hooks," i.e., these environment variables, then the launch scripts can detect the presence of HBase and inspect it for additional necessary details. There's more work to be done for both the launch scripts and the dependency shipping, but this is an important component.
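
To illustrate the kind of detection a launch script could do once the variables are exported, here is a rough sketch. It is not what Pig, Hive, or HCat actually implement: the function name `hbase_classpath_entries` is hypothetical, and the jar layout it scans ($HBASE_HOME/*.jar and $HBASE_HOME/lib/*.jar) is an assumption.

```shell
#!/bin/sh
# Sketch only. The function name and the assumed jar layout are illustrative.

# hbase_classpath_entries
# Prints "conf_dir:jar:jar:..." when HBASE_HOME and HBASE_CONF_DIR both name
# existing directories; prints nothing when HBase is not detected.
hbase_classpath_entries() {
    if [ -z "${HBASE_HOME:-}" ] || [ ! -d "$HBASE_HOME" ]; then
        return 0
    fi
    if [ -z "${HBASE_CONF_DIR:-}" ] || [ ! -d "$HBASE_CONF_DIR" ]; then
        return 0
    fi
    entries="$HBASE_CONF_DIR"
    for jar in "$HBASE_HOME"/*.jar "$HBASE_HOME"/lib/*.jar; do
        # An unmatched glob stays literal, so only keep real files.
        if [ -f "$jar" ]; then
            entries="$entries:$jar"
        fi
    done
    printf '%s\n' "$entries"
}

# Usage: extend CLASSPATH only when HBase was detected.
extra=$(hbase_classpath_entries)
if [ -n "$extra" ]; then
    CLASSPATH="${CLASSPATH:+$CLASSPATH:}$extra"
fi
```

The user never names an HBase jar; the script either finds HBase through the two variables or leaves the classpath untouched.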

        See also HIVE-2055, HIVE-2379, HCATALOG-621, PIG-2786, PIG-3285, HBASE-8438. Some of these will need to be readdressed for proper Windows support as well.

        Andrew Purtell added a comment -

        Just want to add a belated +1 from the HBase side. Good stuff.

        Sean Mackrory added a comment -

        Ah - I see the Hive-related issues appear to be in 0.11 releases and later. Your "wider goal" is certainly compatible with other goals of the Bigtop community, and since I don't see any regression that could be caused here, I think it's a good idea to include this even though the benefit may not be fully realized until Bigtop ships later releases of these components.

        Committed.

        Nick Dimiduk added a comment -

        It's a bit of a cart vs horse scenario. Thanks Sean Mackrory and Andrew Purtell!


          People

          • Assignee: Sean Mackrory
          • Reporter: Nick Dimiduk
          • Votes: 0
          • Watchers: 6