Bigtop / BIGTOP-811

Add /var/lib/bigtop as a location to install SQL connectors and other plug-ins

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      For components that require additional artifacts that Bigtop cannot distribute (due to license incompatibility or other issues), Bigtop should provide both a standardized location to which the artifacts can be installed and the ability to auto-detect locations where they may already be installed.

      Specifically, Sqoop and Hive (and possibly other components) require the installation of the MySQL-Java connector when using MySQL for imports, exports, or metadata storage. This needs to be done by copying the JAR to a directory that is already in the classpath for those components (such as /usr/lib/sqoop/lib and /usr/lib/hive/lib). Some repositories distribute packages that install the jar to /usr/share/java.
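A minimal sketch of the manual step described above: copy a connector JAR into each component lib directory that actually exists. The helper name `install_connector` is hypothetical (Bigtop does not ship it), and the JAR file name in the usage comment is an assumption about what a distro package installs.

```shell
#!/bin/sh
# Hypothetical helper: copy one connector JAR into every given lib directory.
# Directories that do not exist (component not installed) are skipped.
install_connector() {
  jar=$1
  shift
  if [ ! -f "$jar" ]; then
    echo "connector not found: $jar" >&2
    return 1
  fi
  for libdir in "$@"; do
    if [ -d "$libdir" ]; then
      cp "$jar" "$libdir/"
    fi
  done
}

# Example (JAR name assumed; lib paths taken from the description):
# install_connector /usr/share/java/mysql-connector-java.jar \
#   /usr/lib/sqoop/lib /usr/lib/hive/lib
```

Copying (rather than linking) matches the description's point that the JAR must sit in a directory already on the component's classpath.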

          Activity

          Sean Mackrory added a comment -

          After taking a closer look at the various components that would benefit from this, it appears that they all expect the JAR to already be in a specific location (<component>_HOME/lib). I propose that bigtop-detect-javahome start looking in likely locations for any extra artifacts (just mysql-connector-java at first) and create symlinks to them in /var/lib/bigtop (unless valid symlinks already exist). Any package that can use those artifacts can then ship with symlinks to the appropriate file in /var/lib/bigtop. Thoughts?
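A rough sketch of the proposed link creation, assuming detection is just a glob for `mysql-connector-java*.jar` in a list of candidate directories. `link_connectors` is a hypothetical name. A link that already resolves (or a regular file with the same name) is left alone; only missing or dangling links are created or replaced, which is one reading of "unless valid symlinks already exist".

```shell
#!/bin/sh
# Hypothetical sketch: link connector JARs found in candidate directories
# into a common target directory such as /var/lib/bigtop.
link_connectors() {
  target_dir=$1          # e.g. /var/lib/bigtop
  shift                  # remaining arguments: directories to search
  mkdir -p "$target_dir" || return 1
  for dir in "$@"; do
    for jar in "$dir"/mysql-connector-java*.jar; do
      [ -f "$jar" ] || continue          # glob matched nothing
      link="$target_dir/$(basename "$jar")"
      # -e follows symlinks: true for a valid link or a real file.
      if [ -e "$link" ]; then
        continue
      fi
      ln -sf "$jar" "$link"              # create, or replace a dangling link
    done
  done
}

# Example (search location from the description):
# link_connectors /var/lib/bigtop /usr/share/java
```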

          Mark Grover added a comment -

          The intent and idea is great. Thanks, Sean!
          A couple of things:
          1. May I propose that we provide a configuration property that can be used to override the location (unless we already do so) of mysql-connector-java.
          2. Should we call it mysql-connector-java, or keep it generic (like sql-connector-java) so the same idea can apply to other SQL connectors that aren't shipped with the distribution?
          3. Can you think of a scenario where different components would want to use different versions of connectors (likely connecting to different databases)? As a good engineering practice, I'd expect users to use the same relational database for, say, the Sqoop and Hive metastores, but if they have separate metastores for Sqoop and Hive and their versions mismatch, they would need separate connectors for each. If you consider this a reasonable use case, we obviously can't help them here, so let's make sure we don't make their lives more difficult either.
          Thoughts?

          Sean Mackrory added a comment -

          >> 1. May I propose that we provide a configuration property that can be used to override the location (unless we already do so) of mysql-connector-java.

          Absolutely. I'm envisioning something very similar to what we do with JAVA_HOME: you can set the variable in /etc/default/bigtop-utils and it will override any auto-detection.
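A sketch of that JAVA_HOME-style precedence, assuming a hypothetical `MYSQL_CONNECTOR_JAR` variable: a value from the environment or the defaults file is returned as-is, and candidate directories are searched only when it is empty. `DEFAULTS_FILE` and `find_mysql_connector` are also illustrative names, parameterized so the logic can be tried outside a real install.

```shell
#!/bin/sh
# Hypothetical names throughout; not what bigtop-utils ships.
DEFAULTS_FILE=${DEFAULTS_FILE:-/etc/default/bigtop-utils}

find_mysql_connector() {
  # 1. A value set in the environment or the defaults file wins.
  if [ -r "$DEFAULTS_FILE" ]; then
    . "$DEFAULTS_FILE"
  fi
  if [ -n "$MYSQL_CONNECTOR_JAR" ]; then
    echo "$MYSQL_CONNECTOR_JAR"
    return 0
  fi
  # 2. Otherwise auto-detect: first match in the candidate directories.
  for dir in "$@"; do
    for jar in "$dir"/mysql-connector-java*.jar; do
      if [ -f "$jar" ]; then
        echo "$jar"
        return 0
      fi
    done
  done
  return 1   # nothing configured and nothing found
}

# Example:
# jar=$(find_mysql_connector /usr/share/java) || echo "no connector" >&2
```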

          >> 2. Should we call it mysql-connector-java, or keep it generic (like sql-connector-java) so the same idea can apply to other SQL connectors that aren't shipped with the distribution?

          I hadn't seen "sql-connector-java" anywhere, so you may know better how it's used. However, it's entirely possible that people will want to use MySQL for Sqoop while using Postgres for the Hive metastore, so even if we do support a general SQL connector link, I think there's definitely value in also having all the specific links we can.

          >> 3. Can you think of a scenario where different components would want to use different versions of connectors (likely connecting to different databases)? As a good engineering practice, I'd expect users to use the same relational database for, say, the Sqoop and Hive metastores, but if they have separate metastores for Sqoop and Hive and their versions mismatch, they would need separate connectors for each. If you consider this a reasonable use case, we obviously can't help them here, so let's make sure we don't make their lives more difficult either.

          Interesting point. I'll have to think about that use case a bit and see what ideas I can come up with. Let me know if you have any other thoughts.

          Sean Mackrory added a comment -

          Attached one possibility. I'm liking the idea of creating links less and less, because if Bigtop guesses wrong, the user has to clean up. This method is very similar to JAVA_HOME detection: the user can set the variable manually however they want and override detection. If it's not set manually, the script will just slurp everything it can into one classpath. The only catch is getting this into the classpath for the daemon (not very easy, especially when using bigtop-tomcat). But for simplicity's sake, if the environment's classpath just bled through to the child environment, you could do this:

          . /usr/lib/bigtop-utils/bigtop-detect-classpath
          export CLASSPATH=$BIGTOP_CLASSPATH
          

          If we do end up having to make links, this at least gives us a good starting point for finding the targets of said links.
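To make the "slurp everything" behavior concrete, here is a minimal sketch of how such a script could populate `BIGTOP_CLASSPATH`, assuming it simply joins every `.jar` in the given directories with `:` and keeps any value the user has already set. This is not the attached script; `build_bigtop_classpath` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch of classpath detection with a user override.
build_bigtop_classpath() {
  if [ -n "$BIGTOP_CLASSPATH" ]; then
    return 0                      # user override: keep it untouched
  fi
  acc=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue   # skip unmatched globs and dangling links
      acc="${acc:+$acc:}$jar"     # add ':' only between entries
    done
  done
  BIGTOP_CLASSPATH=$acc
  export BIGTOP_CLASSPATH
}

# Example:
# build_bigtop_classpath /var/lib/bigtop
# export CLASSPATH=$BIGTOP_CLASSPATH
```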

          Mark Grover added a comment -

          I took a quick look. Can you please update https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/bigtop-utils/bigtop-utils.default with a comment like we have for other overridable environment variables?

          Sean Mackrory added a comment -

          Will do once I have a component successfully using this mechanism!

          Roman Shaposhnik added a comment -

          Guys, may I suggest a slightly different approach? Would it be possible to convince upstream (and/or hack it in Bigtop) to follow a plugins.d approach for registering plugin jars? Here's a good example of an upstream project following that rule: FLUME-1845

          Thoughts?

          Sean Mackrory added a comment -

          Roman Shaposhnik I think that would be great to do eventually, but it seems like for the immediate goal of simplifying JDBC driver installation, it's a bit much. I definitely think we should implement this solution with an eye toward making it more comprehensive in the future, though.

          Attached is an alternate idea that does two things:

          1. Searches for these libraries in places where distros commonly put them (just /usr/share/java for now) and creates symlinks to them in /var/lib/bigtop. This behavior is easily disabled in /etc/default/bigtop-utils, and it will not overwrite existing symlinks.

          2. Adds the contents of /var/lib/bigtop to the classpath for Hive and Oozie (the latter I was unable to fully verify because of some other problems, but it appears correct).

          I'd like to get some thoughts on this approach, and will then apply it to other components as appropriate (I think Sqoop 2, when committed, will be the only other one that would use this immediately).

          Some other ideas for improvement include component-specific sub-directories (although we already have that with /usr/lib/*, IMHO), and perhaps adding libraries to the classpath in reverse-alphabetical order to ensure higher versions take precedence, and something plugins.d-like later.
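The reverse-ordering idea relies on the JVM loading a class from the first classpath entry that contains it. Plain reverse-alphabetical order breaks once a version component reaches two digits (it would place `5.1.9` ahead of `5.1.10`), so this sketch uses `sort -V` version sorting, which GNU coreutils provides but older non-GNU `sort` implementations may lack. `ordered_classpath` is a hypothetical helper.

```shell
#!/bin/sh
# List the JARs in one directory as a ':'-separated classpath with the
# highest version first, so newer copies of a class shadow older ones.
ordered_classpath() {
  for jar in "$1"/*.jar; do
    # Skip the literal pattern left behind when nothing matches.
    [ -e "$jar" ] && printf '%s\n' "$jar"
  done | sort -rV | paste -sd: -
}

# Example:
# CLASSPATH=$(ordered_classpath /var/lib/bigtop)
```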

          Thoughts on this approach? I've heard concerns about things ending up in the classpath that shouldn't, so maybe we want to disable the symlinking by default (although I like that, in distros that ship it, the mysql-connector-java package will work with Bigtop out of the box), and we should definitely document its intended purpose well.

          Sean Mackrory added a comment -

          Just a note that as the new Sqoop version has been committed, we'll need to take a similar approach with it as we do with Oozie, whenever we reach a consensus here.

          Sean Mackrory added a comment -

          I'll take the silence as approval. I've updated the patch so it applies cleanly over BIGTOP-935 and so that it also applies to Sqoop. I did some more exhaustive testing, and it all appears to work flawlessly.

          Roman Shaposhnik added a comment -

          Sorry for the belated reply, but after thinking about this for a little while, I'm against the harvesting approach that is currently in the first half of bigtop-detect-classpath. First, this script would have to run as the bigtop user to work properly; second, I don't think harvesting belongs in the same script that we'd recommend users source unconditionally.
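One way to act on this, sketched under the assumption that link creation moves into a separate, explicitly invoked step: that step checks up front whether it can write the target directory and skips with a message otherwise, while the script users source only reads. `can_harvest` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical guard for a separate link-creation step. The sourced
# classpath script would never call this.
can_harvest() {
  target_dir=$1
  if [ ! -d "$target_dir" ]; then
    echo "$target_dir does not exist; skipping connector links" >&2
    return 1
  fi
  if [ ! -w "$target_dir" ]; then
    echo "no write access to $target_dir; skipping connector links" >&2
    return 1
  fi
  return 0
}

# Example:
# if can_harvest /var/lib/bigtop; then
#   : # create the symlinks here
# fi
```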

          Sean Mackrory added a comment -

          Will implement Roman's feedback and remove the symlink-creation loop...

          Sean Mackrory added a comment - - edited

          So this really just provides a common place to install libraries for Hive, Sqoop and Oozie. To be able to detect these drivers in their default install locations consistently again, we'll need to make Tomcat deployments more dynamic (see BIGTOP-939). So I think this is just a good first step.

          Sean Mackrory added a comment -

          Just changing the title of the JIRA to reflect the narrower focus. BIGTOP-939 is a larger change that will make more dynamic "auto-detection" more consistently possible - at which time I'll revisit the issue of auto-detecting packages like RHEL 6's mysql-connector-java. I think adding /var/lib/bigtop as I have is a good first step for now.

          Sean Mackrory added a comment -

          Just updating the patch to have the new title in the commit message.

          Mark Grover added a comment -

          Sean,
          This looks good to me overall.

          A few minor suggestions:
          1. The patch doesn't apply cleanly anymore. Can you please update it?
          2. I would still personally like to see a template entry for how to override BIGTOP_CLASSPATH in bigtop-utils.default (https://github.com/apache/bigtop/blob/master/bigtop-packages/src/common/bigtop-utils/bigtop-utils.default)
          Maybe something like:

          # Override Bigtop classpath for including common jars (e.g. database connectors, etc.) in classpath of various components of Bigtop distribution. Only needed for including jars outside of /var/lib/bigtop.
          # export BIGTOP_CLASSPATH
          

          What do you think?

          It's also worth pointing out somewhere in code, comments, or documentation that BIGTOP_CLASSPATH is currently used only by Hive. Sqoop and Oozie bypass it and add things directly to Tomcat's classpath without observing what's in BIGTOP_CLASSPATH.
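For a component whose launcher does read it, the consuming side can be as small as appending `BIGTOP_CLASSPATH` to an existing classpath variable. This sketch is hypothetical: `append_classpath` is not part of Bigtop, and using `HADOOP_CLASSPATH` as the target in the usage comment is an assumption about a Hadoop-based launcher such as Hive's. It avoids leading or trailing `:` separators, which would add an empty classpath entry.

```shell
#!/bin/sh
# Hypothetical helper: join two classpath strings without stray ':'.
append_classpath() {
  # $1 = existing classpath, $2 = entries to add
  if [ -z "$2" ]; then
    echo "$1"
  elif [ -z "$1" ]; then
    echo "$2"
  else
    echo "$1:$2"
  fi
}

# Example (target variable is an assumption):
# HADOOP_CLASSPATH=$(append_classpath "$HADOOP_CLASSPATH" "$BIGTOP_CLASSPATH")
# export HADOOP_CLASSPATH
```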

          Sean Mackrory added a comment -

          I will do #1 and #2. One of the goals of BIGTOP-939 is to make it possible for Sqoop and Oozie to observe BIGTOP_CLASSPATH as well - so hopefully we can settle on a solution there too.

          Mark Grover added a comment -

          Thanks! I will take a look at BIGTOP-939.

          Sean Mackrory added a comment -

          Mark Grover: Updated the patch to apply cleanly. Also - there's already a template in /etc/default.

          Mark Grover added a comment -

          Thanks, Sean. Minor nit: git complains about a whitespace error (at line 58 of the patch) when applying it.

          <stdin>:58: new blank line at EOF.
          +
          warning: 1 line adds whitespace errors.
          

          Can you please remove the extra line at the end of bigtop-detect-classpath?

          Sean Mackrory added a comment -

          I don't see a problem. UNIX text files are supposed to end with a newline.
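Both positions can hold at once: POSIX defines a line as ending in a newline, so a single trailing newline is expected, while git's "new blank line at EOF" warning refers to an additional empty line after it. A small hypothetical checker, `eof_status`, makes the distinction explicit.

```shell
#!/bin/sh
# Classify how a file ends (hypothetical helper).
eof_status() {
  if [ ! -s "$1" ]; then
    echo empty
  elif [ "$(tail -c 1 "$1" | wc -l | tr -d ' ')" -eq 0 ]; then
    echo missing-newline        # last byte is not '\n'
  elif [ "$(tail -c 2 "$1" | wc -l | tr -d ' ')" -eq 2 ]; then
    echo blank-line-at-eof      # ends with "\n\n": an extra empty line
  else
    echo ok                     # exactly one terminating newline
  fi
}
```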

          Mark Grover added a comment -

          I don't really feel strongly about it. Let's see if anyone else has an opinion here. If we don't hear back from anyone, I can give you a +1.
          Konstantin Boudnik, Bruno Mahé any thoughts about the extra new line?

          Sean Mackrory added a comment -

          Adding patch with no additional new-line at the end.

          Mark Grover added a comment -

          +1

          Mark Grover added a comment -

          Sean, looks good. Can you please commit it?

          Sean Mackrory added a comment -

          Committed.

          Konstantin Boudnik added a comment -

          yeah. looks fine. Thanks!


            People

            • Assignee: Sean Mackrory
              Reporter: Sean Mackrory
            • Votes: 0
              Watchers: 4
