Issue Details (XML | Word | Printable)

Key: HADOOP-2867
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Amareshwari Sriramadasu
Reporter: Arun C Murthy
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Add a task's cwd to it's LD_LIBRARY_PATH

Created: 21/Feb/08 05:14 AM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: 0.17.0
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works patch-2867.txt 2008-05-20 11:43 AM Amareshwari Sriramadasu 3 kB
Text File Licensed for inclusion in ASF works patch-2867.txt 2008-05-20 05:00 AM Amareshwari Sriramadasu 3 kB
Text File Licensed for inclusion in ASF works patch-2867.txt 2008-05-15 12:01 PM Amareshwari Sriramadasu 2 kB
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Release Note: Added task's cwd to its LD_LIBRARY_PATH.
Resolution Date: 22/May/08 04:16 AM


 Description  « Hide
HADOOP-1660 added the task's cwd to it's java.library.path which means only java task's can take advantage via System.load or System.loadLibrary... we should enhance it to support Hadoop Pipes applications by adding it to the LD_LIBRARY_PATH so they can use dlopen/dlsym etc.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Amareshwari Sriramadasu added a comment - 15/May/08 12:01 PM
Preliminary patch for adding LD_LIBRARY_PATH. Meanwhile I will do the testing.

Hadoop QA added a comment - 15/May/08 02:23 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12382104/patch-2867.txt
against trunk revision 656585.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2479/console

This message is automatically generated.


Arun C Murthy added a comment - 15/May/08 03:45 PM
Amareshwari, please do not blindly add it to the environment - you need to append cwd to the existing LD_LIBRARY_PATH, especially in-light of HADOOP-2838.

Arun C Murthy added a comment - 15/May/08 03:46 PM
Also, please ensure you have a test-case for this. Thanks!

Arun C Murthy added a comment - 16/May/08 04:15 PM
Amareshwari, while you working on this I think it would be great to fix HADOOP-3828 too! smile

Christian Kunz added a comment - 16/May/08 05:14 PM
The -program option in pipes command line allows to add files beyond the executable (comma separated) and they will all end up in the local cache. Some of them might be shared libraries. So far, one has to create a wrapper to configure the appropriate LD_LIBRARY_PATH based on mapred.cache.localFiles.
Therefore, it would be nice if the task could be configured to add all parent directories of mapred.cache.localFiles to LD_LIBRARY_PATH.

Amareshwari Sriramadasu added a comment - 19/May/08 09:03 AM

The -program option in pipes command line allows to add files beyond the executable (comma separated) and they will all end up in the local cache.

The -program option in pipes is expected to take only executable URI. Comma separated files ending up in local cache looks like a hack. Should this be supported? Either we support or not, I think this needs to be documented.

it would be nice if the task could be configured to add all parent directories of mapred.cache.localFiles to LD_LIBRARY_PATH.

Christian, do you mean the immediate parents of all mapred.cache.localFiles or the whole parent hierarchy of mapred.cache.localFiles?


Christian Kunz added a comment - 19/May/08 01:56 PM

The -program option in pipes is expected to take only executable URI. Comma separated files ending up in local cache looks like a hack. Should this be supported? Either we support or not, I think this needs to be documented.

We need the support for sure, whether implemented less hacky or not.

do you mean the immediate parents of all mapred.cache.localFiles or the whole parent hierarchy of mapred.cache.localFiles?

files in mapred.cache.localFiles usually have their own specific subdirectory, i.e. when one of them is a shared library, then it should be sufficient to include its parent directory into LD_LIBRARY_PATH


Amareshwari Sriramadasu added a comment - 20/May/08 05:00 AM
Here is a patch adding task's cwd and parents of mapred.cache.localFiles to the LD_LIBRARY_PATH. The patch is tested with simple c program which loads a shared library.

Alejandro Abdelnur added a comment - 20/May/08 05:51 AM
@@ -170,6 +175,8 @@

There is no need to add the native libs that have been softlinked to the LD_LIBRARY_PATH, they are already being added to the 'java.library.path' property.

@@ -387,9 +394,15 @@

You should not start with an empty Map for the environment, that will trash the existing environment. You should get from ProcessBuilder the Map with the current environment and modify that one.

You should get the current value of LD_LIBRARY_PATH (actually to do it properly you should check if you are in windows or unix and get the proper env variable) and append to it the current working dir.

For the separator make sure you use 'System.getProperty("path.separator")' instead ':'


Amareshwari Sriramadasu added a comment - 20/May/08 06:42 AM

You should not start with an empty Map for the environment, that will trash the existing environment. You should get from ProcessBuilder the Map with the current environment and modify that one.

In org.apache.hadoop.util.Shell.java, the environment passed is set using

builder.environment().putAll(this.environment);

This (http://java.sun.com/j2se/1.5.0/docs/api/java/lang/ProcessBuilder.html#environment()) say that the a process builder is created, the environment is initialized to a copy of the current process environment. And the returned object may be modified using ordinary Map operations. So, Map.putAll will add the new environment to the existing environment, will not replace.


Amareshwari Sriramadasu added a comment - 20/May/08 11:43 AM
Here is patch after incorporating review comments.

@@ -170,6 +175,8 @@ There is no need to add the native libs that have been softlinked to the LD_LIBRARY_PATH, they are already being added to the 'java.library.path' property.

On second thought, adding the directories that may have stuff that is not native libs at all, kind of does not make senese to add them blindly. Since the users can provide symlinks in the cwd for the cached libraries. This is not necessary. I removed the code which adds parent directories of all mapred.cache.localFiles

You should get the current value of LD_LIBRARY_PATH (actually to do it properly you should check if you are in windows or unix and get the proper env variable) and append to it the current working dir.

LD_LIBRARY_PATH is honored by Cygwin, though not by windows. And we support only Cygwin. So, we dont need a check for windows explicitly for LD_LIBRARY_PATH environment variable.


Arun C Murthy added a comment - 20/May/08 05:01 PM

On second thought, adding the directories that may have stuff that is not native libs at all, kind of does not make senese to add them blindly. Since the users can provide symlinks in the cwd for the cached libraries. This is not necessary. I removed the code which adds parent directories of all mapred.cache.localFiles

+1

Christian, does this work for you?


Christian Kunz added a comment - 20/May/08 08:13 PM
I only asked for pre-configuring LD_LIBRARY_PATH by the framework for the convenience of the application programmer not to have to write a wrapper when the executable needs some shared libraries not already deployed on the nodes.
No big deal.

Christian Kunz added a comment - 20/May/08 09:41 PM
Arun just told me about mapred.create.symlink I was not aware about.
All my comments in this issue can be completely disregarded.

Christian Kunz added a comment - 20/May/08 09:41 PM
+1 for the patch

Hadoop QA added a comment - 21/May/08 06:20 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12382373/patch-2867.txt
against trunk revision 658704.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2507/console

This message is automatically generated.


Devaraj Das added a comment - 22/May/08 04:16 AM
I just committed this. Thanks, Amareshwari!

Hudson added a comment - 22/May/08 12:24 PM

Amareshwari Sriramadasu added a comment - 30/May/08 04:52 AM
Testing this feature requires a shared library, which has to be copied to DistributedCache. Creating a sharedlib is platform dependent and hence I havent put a testcase for this. This change affects pipes and streaming also, since the environment of pipes and streaming tasks comes from the parent java tasks.
I tested this feature by running a streaming application which loads shared library.