Hadoop Common
  1. Hadoop Common
  2. HADOOP-2867

Add a task's cwd to it's LD_LIBRARY_PATH

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added task's cwd to its LD_LIBRARY_PATH.

      Description

      HADOOP-1660 added the task's cwd to it's java.library.path which means only java task's can take advantage via System.load or System.loadLibrary... we should enhance it to support Hadoop Pipes applications by adding it to the LD_LIBRARY_PATH so they can use dlopen/dlsym etc.

      1. patch-2867.txt
        3 kB
        Amareshwari Sriramadasu
      2. patch-2867.txt
        3 kB
        Amareshwari Sriramadasu
      3. patch-2867.txt
        2 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Hide
          Amareshwari Sriramadasu added a comment -

          Preliminary patch for adding LD_LIBRARY_PATH. Meanwhile I will do the testing.

          Show
          Amareshwari Sriramadasu added a comment - Preliminary patch for adding LD_LIBRARY_PATH. Meanwhile I will do the testing.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382104/patch-2867.txt
          against trunk revision 656585.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12382104/patch-2867.txt against trunk revision 656585. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/console This message is automatically generated.
          Hide
          Arun C Murthy added a comment -

          Amareshwari, please do not blindly add it to the environment - you need to append cwd to the existing LD_LIBRARY_PATH, especially in-light of HADOOP-2838.

          Show
          Arun C Murthy added a comment - Amareshwari, please do not blindly add it to the environment - you need to append cwd to the existing LD_LIBRARY_PATH, especially in-light of HADOOP-2838 .
          Hide
          Arun C Murthy added a comment -

          Also, please ensure you have a test-case for this. Thanks!

          Show
          Arun C Murthy added a comment - Also, please ensure you have a test-case for this. Thanks!
          Hide
          Arun C Murthy added a comment -

          Amareshwari, while you working on this I think it would be great to fix HADOOP-3828 too! smile

          Show
          Arun C Murthy added a comment - Amareshwari, while you working on this I think it would be great to fix HADOOP-3828 too! smile
          Hide
          Christian Kunz added a comment -

          The -program option in pipes command line allows to add files beyond the executable (comma separated) and they will all end up in the local cache. Some of them might be shared libraries. So far, one has to create a wrapper to configure the appropriate LD_LIBRARY_PATH based on mapred.cache.localFiles.
          Therefore, it would be nice if the task could be configured to add all parent directories of mapred.cache.localFiles to LD_LIBRARY_PATH.

          Show
          Christian Kunz added a comment - The -program option in pipes command line allows to add files beyond the executable (comma separated) and they will all end up in the local cache. Some of them might be shared libraries. So far, one has to create a wrapper to configure the appropriate LD_LIBRARY_PATH based on mapred.cache.localFiles. Therefore, it would be nice if the task could be configured to add all parent directories of mapred.cache.localFiles to LD_LIBRARY_PATH.
          Hide
          Amareshwari Sriramadasu added a comment -

          The -program option in pipes command line allows to add files beyond the executable (comma separated) and they will all end up in the local cache.

          The -program option in pipes is expected to take only executable URI. Comma separated files ending up in local cache looks like a hack. Should this be supported? Either we support or not, I think this needs to be documented.

          it would be nice if the task could be configured to add all parent directories of mapred.cache.localFiles to LD_LIBRARY_PATH.

          Christian, do you mean the immediate parents of all mapred.cache.localFiles or the whole parent hierarchy of mapred.cache.localFiles?

          Show
          Amareshwari Sriramadasu added a comment - The -program option in pipes command line allows to add files beyond the executable (comma separated) and they will all end up in the local cache. The -program option in pipes is expected to take only executable URI. Comma separated files ending up in local cache looks like a hack. Should this be supported? Either we support or not, I think this needs to be documented. it would be nice if the task could be configured to add all parent directories of mapred.cache.localFiles to LD_LIBRARY_PATH. Christian, do you mean the immediate parents of all mapred.cache.localFiles or the whole parent hierarchy of mapred.cache.localFiles?
          Hide
          Christian Kunz added a comment -

          The -program option in pipes is expected to take only executable URI. Comma separated files ending up in local cache looks like a hack. Should this be supported? Either we support or not, I think this needs to be documented.

          We need the support for sure, whether implemented less hacky or not.

          do you mean the immediate parents of all mapred.cache.localFiles or the whole parent hierarchy of mapred.cache.localFiles?

          files in mapred.cache.localFiles usually have their own specific subdirectory, i.e. when one of them is a shared library, then it should be sufficient to include its parent directory into LD_LIBRARY_PATH

          Show
          Christian Kunz added a comment - The -program option in pipes is expected to take only executable URI. Comma separated files ending up in local cache looks like a hack. Should this be supported? Either we support or not, I think this needs to be documented. We need the support for sure, whether implemented less hacky or not. do you mean the immediate parents of all mapred.cache.localFiles or the whole parent hierarchy of mapred.cache.localFiles? files in mapred.cache.localFiles usually have their own specific subdirectory, i.e. when one of them is a shared library, then it should be sufficient to include its parent directory into LD_LIBRARY_PATH
          Hide
          Amareshwari Sriramadasu added a comment -

          Here is a patch adding task's cwd and parents of mapred.cache.localFiles to the LD_LIBRARY_PATH. The patch is tested with simple c program which loads a shared library.

          Show
          Amareshwari Sriramadasu added a comment - Here is a patch adding task's cwd and parents of mapred.cache.localFiles to the LD_LIBRARY_PATH. The patch is tested with simple c program which loads a shared library.
          Hide
          Alejandro Abdelnur added a comment -

          @@ -170,6 +175,8 @@

          There is no need to add the native libs that have been softlinked to the LD_LIBRARY_PATH, they are already being added to the 'java.library.path' property.

          @@ -387,9 +394,15 @@

          You should not start with an empty Map for the environment, that will trash the existing environment. You should get from ProcessBuilder the Map with the current environment and modify that one.

          You should get the current value of LD_LIBRARY_PATH (actually to do it properly you should check if you are in windows or unix and get the proper env variable) and append to it the current working dir.

          For the separator make sure you use 'System.getProperty("path.separator")' instead ':'

          Show
          Alejandro Abdelnur added a comment - @@ -170,6 +175,8 @@ There is no need to add the native libs that have been softlinked to the LD_LIBRARY_PATH, they are already being added to the 'java.library.path' property. @@ -387,9 +394,15 @@ You should not start with an empty Map for the environment, that will trash the existing environment. You should get from ProcessBuilder the Map with the current environment and modify that one. You should get the current value of LD_LIBRARY_PATH (actually to do it properly you should check if you are in windows or unix and get the proper env variable) and append to it the current working dir. For the separator make sure you use 'System.getProperty("path.separator")' instead ':'
          Hide
          Amareshwari Sriramadasu added a comment -

          You should not start with an empty Map for the environment, that will trash the existing environment. You should get from ProcessBuilder the Map with the current environment and modify that one.

          In org.apache.hadoop.util.Shell.java, the environment passed is set using

                builder.environment().putAll(this.environment);
          

          This (http://java.sun.com/j2se/1.5.0/docs/api/java/lang/ProcessBuilder.html#environment()) say that the a process builder is created, the environment is initialized to a copy of the current process environment. And the returned object may be modified using ordinary Map operations. So, Map.putAll will add the new environment to the existing environment, will not replace.

          Show
          Amareshwari Sriramadasu added a comment - You should not start with an empty Map for the environment, that will trash the existing environment. You should get from ProcessBuilder the Map with the current environment and modify that one. In org.apache.hadoop.util.Shell.java, the environment passed is set using builder.environment().putAll( this .environment); This ( http://java.sun.com/j2se/1.5.0/docs/api/java/lang/ProcessBuilder.html#environment( )) say that the a process builder is created, the environment is initialized to a copy of the current process environment. And the returned object may be modified using ordinary Map operations. So, Map.putAll will add the new environment to the existing environment, will not replace.
          Hide
          Amareshwari Sriramadasu added a comment -

          Here is patch after incorporating review comments.

          @@ -170,6 +175,8 @@ There is no need to add the native libs that have been softlinked to the LD_LIBRARY_PATH, they are already being added to the 'java.library.path' property.

          On second thought, adding the directories that may have stuff that is not native libs at all, kind of does not make senese to add them blindly. Since the users can provide symlinks in the cwd for the cached libraries. This is not necessary. I removed the code which adds parent directories of all mapred.cache.localFiles

          You should get the current value of LD_LIBRARY_PATH (actually to do it properly you should check if you are in windows or unix and get the proper env variable) and append to it the current working dir.

          LD_LIBRARY_PATH is honored by Cygwin, though not by windows. And we support only Cygwin. So, we dont need a check for windows explicitly for LD_LIBRARY_PATH environment variable.

          Show
          Amareshwari Sriramadasu added a comment - Here is patch after incorporating review comments. @@ -170,6 +175,8 @@ There is no need to add the native libs that have been softlinked to the LD_LIBRARY_PATH, they are already being added to the 'java.library.path' property. On second thought, adding the directories that may have stuff that is not native libs at all, kind of does not make senese to add them blindly. Since the users can provide symlinks in the cwd for the cached libraries. This is not necessary. I removed the code which adds parent directories of all mapred.cache.localFiles You should get the current value of LD_LIBRARY_PATH (actually to do it properly you should check if you are in windows or unix and get the proper env variable) and append to it the current working dir. LD_LIBRARY_PATH is honored by Cygwin, though not by windows. And we support only Cygwin. So, we dont need a check for windows explicitly for LD_LIBRARY_PATH environment variable.
          Hide
          Arun C Murthy added a comment -

          On second thought, adding the directories that may have stuff that is not native libs at all, kind of does not make senese to add them blindly. Since the users can provide symlinks in the cwd for the cached libraries. This is not necessary. I removed the code which adds parent directories of all mapred.cache.localFiles

          +1

          Christian, does this work for you?

          Show
          Arun C Murthy added a comment - On second thought, adding the directories that may have stuff that is not native libs at all, kind of does not make senese to add them blindly. Since the users can provide symlinks in the cwd for the cached libraries. This is not necessary. I removed the code which adds parent directories of all mapred.cache.localFiles +1 Christian, does this work for you?
          Hide
          Christian Kunz added a comment -

          I only asked for pre-configuring LD_LIBRARY_PATH by the framework for the convenience of the application programmer not to have to write a wrapper when the executable needs some shared libraries not already deployed on the nodes.
          No big deal.

          Show
          Christian Kunz added a comment - I only asked for pre-configuring LD_LIBRARY_PATH by the framework for the convenience of the application programmer not to have to write a wrapper when the executable needs some shared libraries not already deployed on the nodes. No big deal.
          Hide
          Christian Kunz added a comment -

          Arun just told me about mapred.create.symlink I was not aware about.
          All my comments in this issue can be completely disregarded.

          Show
          Christian Kunz added a comment - Arun just told me about mapred.create.symlink I was not aware about. All my comments in this issue can be completely disregarded.
          Hide
          Christian Kunz added a comment -

          +1 for the patch

          Show
          Christian Kunz added a comment - +1 for the patch
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382373/patch-2867.txt
          against trunk revision 658704.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12382373/patch-2867.txt against trunk revision 658704. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/console This message is automatically generated.
          Hide
          Devaraj Das added a comment -

          I just committed this. Thanks, Amareshwari!

          Show
          Devaraj Das added a comment - I just committed this. Thanks, Amareshwari!
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #499 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/499/ )
          Hide
          Amareshwari Sriramadasu added a comment -

          Testing this feature requires a shared library, which has to be copied to DistributedCache. Creating a sharedlib is platform dependent and hence I havent put a testcase for this. This change affects pipes and streaming also, since the environment of pipes and streaming tasks comes from the parent java tasks.
          I tested this feature by running a streaming application which loads shared library.

          Show
          Amareshwari Sriramadasu added a comment - Testing this feature requires a shared library, which has to be copied to DistributedCache. Creating a sharedlib is platform dependent and hence I havent put a testcase for this. This change affects pipes and streaming also, since the environment of pipes and streaming tasks comes from the parent java tasks. I tested this feature by running a streaming application which loads shared library.

            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Arun C Murthy
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development