Hadoop Common

Use java.net.preferIPv4Stack to force IPv4

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: scripts
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This was mentioned on HADOOP-3427: there is a JVM system property, java.net.preferIPv4Stack, which you can set to true to make Java networking use IPv4 everywhere.

      As Hadoop doesn't work on IPv6, this should be set to true in the startup scripts. Hopefully this will also ensure that Jetty picks it up.
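The change described above amounts to appending one JVM flag in the startup scripts. A minimal sketch of the idea (the exact placement and variable handling in the real scripts may differ):

```shell
# Hedged sketch of the one-line change the description proposes: append the
# IPv4-only JVM flag to HADOOP_OPTS before any daemon or client JVM starts.
HADOOP_OPTS="${HADOOP_OPTS:-} -Djava.net.preferIPv4Stack=true"
echo "HADOOP_OPTS=$HADOOP_OPTS"
```

Every JVM launched via the scripts then inherits the flag through HADOOP_OPTS.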

      1. hadoop-6056.txt
        0.7 kB
        Todd Lipcon
      2. HADOOP-6056.patch
        1 kB
        Michele Catasta


          Activity

          steve_l added a comment -

          Correction, HADOOP-3437

          Todd Lipcon added a comment -

          Trivial patch. No tests included. Manual test: start a daemon, check ps to make sure the flag got passed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426841/hadoop-6056.txt
          against trunk revision 886645.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/166/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/166/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/166/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/166/console

          This message is automatically generated.

          Todd Lipcon added a comment -

          -1 tests included

          This is a modification to the shell wrapper scripts, and thus isn't under the domain of unit tests.

          Manual testing is slightly tricky, as on a well configured cluster this won't have any particular impact. Steve, as the original reporter of the issue, do you have a system you could test this on?

          steve_l added a comment -

          I have this problem on recent Linux distros that don't let you turn IPv6 off; it happens in my own code often enough to cause problems ( http://jira.smartfrog.org/jira/browse/SFOS-1182 ). It's worse on Ubuntu, as they always insert the local hostname mapped to localhost into /etc/hosts; if that is bound to IPv6 then not only do you get services coming up on localhost only, they come up on IPv6 localhost, which is a worse kind of useless.

          I can now declare whether Jetty may come up on IPv6 as part of the config, and roll back when it isn't meant to yet still gets some IPv6 address.

          We ought to have unit tests for shell scripts; I've mentioned that before. It's not enough just to exec the scripts, we need to make sure that IPv6 doesn't surface down the line.

          Todd Lipcon added a comment -

          Steve, any chance you've had a chance to try this patch to see if it resolves the problems for you?

          I agree that these things should be tested better down the road, but for now let's not let that large project block a simple one-line patch.

          steve_l added a comment -

          I will have a look at this, thanks Todd. I could start the system on the command line and then use jps -v to verify that the right properties were set.
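The jps -v check described here can be approximated by scanning each reported JVM's argument list for the property. A sketch (the sample line below is fabricated for illustration; in practice you would pipe real `jps -v` output in):

```shell
# Illustrative check in the spirit of "jps -v | grep": look for the
# preferIPv4Stack property in a JVM's reported arguments.
# The sample line is made up for demonstration purposes.
sample_jps_line='12345 NameNode -Dproc_namenode -Djava.net.preferIPv4Stack=true -Xmx1000m'

has_ipv4_flag() {
  case "$1" in
    *"-Djava.net.preferIPv4Stack=true"*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_ipv4_flag "$sample_jps_line"; then
  echo "IPv4 stack forced"
else
  echo "flag not set"
fi
```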

          Tom White added a comment -

          Moving to open while this is verified manually.

          What would need to be done to allow Hadoop to work with IPv6?

          steve_l added a comment -

          >What would need to be done to allow Hadoop to work with IPv6?

          1. a valid reason. Within a datacentre, there's a cost to IPv6, as your packet size just got bigger. Most people use 10. subnets except for those of us with more than one Class A subnet to hand.
          2. it will complicate your test and release process setup no-end, as you now need to test on both network families.
          steve_l added a comment -

          One thing that is triggered by IPv6 is HADOOP-3619

          steve_l added a comment -

          With the proposed HADOOP-6474 diagnostics feature, we could automate the test for this; look for the JVM property in the output of the diagnostics .

          Michele Catasta added a comment -

          net.ipv6.bindv6only = 1 is currently breaking Java networking:
          http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=560044
          http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=560056

          To let Hadoop run correctly on Debian Squeeze, I needed both Todd's patch and the workaround proposed in that bug report.
          I'm attaching a trivial patch that checks the parameter value and exits gracefully if it's 1.
          You could comment out the exit part and leave it as a simple warning, but, given my previous experience, that would be a waste of time: daemons take 2 minutes on average to come up, and you only realize something is not working after launching an example job.

          Manually tested on Debian Squeeze and Ubuntu 9.10.
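A sketch of the kind of guard described above (the function name, message text, and structure are illustrative, not the literal patch):

```shell
# Illustrative version of the check Michele describes: read the sysctl value
# and refuse to start when net.ipv6.bindv6only=1, since Java networking is
# known to be broken in that configuration.
check_bindv6only() {
  # $1: the value of net.ipv6.bindv6only (normally read via `sysctl -n`)
  if [ "$1" = "1" ]; then
    echo "Error: net.ipv6.bindv6only is set to 1; Java networking is broken." >&2
    return 1
  fi
  return 0
}

# In the real scripts the value would come from something like:
#   bindv6only=$(sysctl -n net.ipv6.bindv6only 2>/dev/null)
check_bindv6only "${bindv6only:-0}" || exit 1
```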

          Michele Catasta added a comment -

          Re-opening the issue, I mistakenly clicked on "submit a patch" instead of "attach a file". Sorry, got fooled!

          steve_l added a comment -

          Michele, I can see this will be trouble.

          1. I think you should open a separate (related) bug for this, or the patch should merge in both changes.
          2. Also, use git diff --no-prefix for creating patches; it's what Hudson needs.
          3. I would prefer the message to point to a wiki page under wiki.apache.org/Hadoop where we can talk about Hadoop and IPv6, instead of straight to the Debian bug. This lets us change the text over time and provide something integrated with the rest of Hadoop.
          Michele Catasta added a comment -

          Steve, thanks for the feedback.

          1. I attached a diff which also integrates the previous one, so Todd can decide whether he wants to push it upstream as a single patch. If not, we can open a new issue.
          2. Done.
          3. I created http://wiki.apache.org/hadoop/HadoopIPv6 with some placeholder text and changed the link in the patch.

          steve_l added a comment -

          OK, I've added more to the Hadoop page, we should mention it in a NoRouteToHostException page too. Workaround details still need to go on that page for anyone willing to explain what to do.

          Michele, if this patch works for you, I am +1 for this patch. There is no point (currently) trying to come up under IPv6, as things fail.

          steve_l added a comment -

          Cross-linked to/from the existing http://wiki.apache.org/hadoop/NoRouteToHost page. With the text in, this patch looks good to go.

          There is still a problem here: any client app that talks to a Hadoop cluster without using the scripts may have addressing/connectivity problems. Let's wait and see what happens there.

          Todd Lipcon added a comment -

          The "exit 1" seems pretty aggressive. Is there no case where one could use Hadoop and have it work with this sysctl set? On my system I can confirm that clients don't work with it set, but I want to be awfully sure before adding an exit 1 to the wrapper scripts.

          [here's a transcript that shows that it really does break clients]

          todd@todd-laptop:~$ sudo sysctl net.ipv6.bindv6only=1
          net.ipv6.bindv6only = 1
          todd@todd-laptop:~$ hadoop fs -ls /
          10/01/26 19:33:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
          ^Ctodd@todd-laptop:~$ sudo sysctl net.ipv6.bindv6only=0
          net.ipv6.bindv6only = 0
          todd@todd-laptop:~$ hadoop fs -ls /
          Found 2 items
          drwxr-xr-x   - todd supergroup          0 2010-01-26 15:06 /hadoop
          drwxr-xr-x   - todd supergroup          0 2010-01-26 16:23 /test
          
          Michele Catasta added a comment -

          I was about to send the patch with exit 1 commented out, then decided just to note in my JIRA comment that you can get rid of it.
          It's difficult to know whether they will solve the bug before 0.21; the Debian netbase maintainer already admits it won't be trivial: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=560238#154
          OTOH, sooner or later it will be fixed (and, on the positive side, Ubuntu 10.04 still relies on netbase 4.35).

          I tried to reproduce what I saw in your transcript (my environment: Debian testing, netbase 4.40, 2.6.32-trunk-amd64).
          Clients work, the DFS web interface and file preview work, but an example job fails:

          10/01/27 13:34:14 WARN mapred.JobClient: Error reading task output http://MY_NODE:50060/tasklog?plaintext=true&taskid=attempt_201001271246_0002_m_000387_0&filter=stdout
          10/01/27 13:34:14 WARN mapred.JobClient: Error reading task output http://MY_NODE:50060/tasklog?plaintext=true&taskid=attempt_201001271246_0002_m_000387_0&filter=stderr
          10/01/27 13:34:29 INFO mapred.JobClient: Task Id : attempt_201001271246_0002_r_000031_0, Status : FAILED
          java.io.IOException: Task process exit with nonzero status of 1.
          	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
          

          That's why I think it would be a waste of time in some cases: you wait till the NN exits safe mode, launch your job... and it fails unexpectedly.

          Todd Lipcon added a comment -

          Hi Michele,

          I think I'd be more comfortable with this going in if there were an environment variable to override the check and force the script to continue even if the bad setting is present. Perhaps something like $HADOOP_ALLOW_IPV6?

          Given that you managed to get a bit farther than I did, I don't think we know for sure that it's not worth giving it a go at all. There may be some users who can run with this on, and since it's a systemwide setting it's worth allowing users the chance to try if they're unable to change the sysctl.

          Michele Catasta added a comment -

          Hi Todd,

          +1 on the trade-off you suggested. I added HADOOP_ALLOW_IPV6 in the attached patch.

          Should we also let it manage the preferIPv4Stack option? (e.g. if ALLOW, then preferIPv4Stack=false). As of today, it won't be of any benefit.

          As a side note, I've no idea how the final merging of the hadoop-env.sh.template files happens, so I just modified the one in hadoop-common.
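The trade-off agreed above can be sketched as follows. This is an illustrative guard combining the bindv6only check with the HADOOP_ALLOW_IPV6 override discussed in the thread; the logic and messages are invented for this sketch, not the exact patch:

```shell
# Honor HADOOP_ALLOW_IPV6 (variable name from the thread) to let a user skip
# the bindv6only guard when they cannot change the systemwide sysctl.
guard_ipv6() {
  # $1: value of net.ipv6.bindv6only (normally from `sysctl -n`)
  if [ "${HADOOP_ALLOW_IPV6:-}" = "yes" ]; then
    return 0  # user explicitly accepts the risk; skip the check
  fi
  if [ "$1" = "1" ]; then
    echo "net.ipv6.bindv6only=1; set HADOOP_ALLOW_IPV6=yes to start anyway" >&2
    return 1
  fi
  return 0
}
```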

          steve_l added a comment -

          Michele, can you resubmit the patch with the .patch extension? Don't worry about versioning; JIRA should recognise it as being later.

          Michele Catasta added a comment -

          Steve,
          thanks for the feedback. I didn't realize that my last patch was not correctly attached. Seems like JIRA got lost with the ".2" suffix.

          Re-attaching the latest patch now, getting rid of my old ones.

          Steve Loughran added a comment -

          resubmitting

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12439189/HADOOP-6056.patch
          against trunk revision 959501.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/600/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/600/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/600/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/600/console

          This message is automatically generated.

          Michele Catasta added a comment -

          It's a bash-only patch, so the lack of javadoc is self-explanatory.
          @Steve, @Todd: could you please state whether you tested this patch on your clusters as well?

          Steve Loughran added a comment -

          I don't start my machines via the shell scripts, so I can't say whether or not it works; I've tweaked how my own JVMs start and added some in-JVM tests for IPv4.

          Looking at the code, my main concern is what happens if there isn't an /sbin/sysctl on the path, as on Cygwin, OS X, or other systems. We probably need a check there to look for that binary before calling it...

          Todd Lipcon added a comment -

          @Steve, @Todd: could you please state if you tested this patch in your cluster as well?

          I've tested on my laptop, but not out in the wide world of varied deployments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12439189/HADOOP-6056.patch
          against trunk revision 1031422.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/73//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/73//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/73//console

          This message is automatically generated.

          Michele Catasta added a comment -

          Quoting a comment Todd made a few months ago: "this is a modification to the shell wrapper scripts, and thus isn't under the domain of unit tests."
          The patch should still be good to go.

          Steve Loughran added a comment -

          I see that the Facebook scrips have this option set in hadoop-env.sh, and point to a sun bug which causes sockets to hang when being opened on some linux systems as the reason for the setting.

          1. work-around for http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6483406
            export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

          Given that FB are running large Hadoop clusters, we know the java.net option works at scale; the only question is whether everyone is happy with this patch going into 0.22.

          dhruba borthakur added a comment -

          We at FB have been using -Djava.net.preferIPv4Stack=true since the beginning of time.

          While on this topic, we also use -XX:+UseMembar to avoid other JVM bugs.
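
          For reference, a hadoop-env.sh fragment combining the two JVM options mentioned in this thread might look like the sketch below. HADOOP_OPTS is the standard hook the Hadoop wrapper scripts read; treat the exact flag set as site-specific rather than a recommendation from this issue.

```shell
# Sketch of a hadoop-env.sh fragment (site-specific, not part of the patch):
# force IPv4 sockets and enable the membar workaround, preserving any
# options the operator already set.
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.net.preferIPv4Stack=true -XX:+UseMembar"
echo "HADOOP_OPTS=${HADOOP_OPTS}"
```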

          Todd Lipcon added a comment -

          I tested this patch a few ways:

          • echoed HADOOP_OPTS to verify new parameter is there
          • changed the sysctl path to an invalid one to simulate what happens on a system without sysctl - no error is printed or anything, just skips through due to 2>/dev/null
          • changed my sysctl value to 1 and verified error message printed
          • changed sysctl back to 0 and verified hadoop script still runs

          Unless I hear otherwise, I plan to commit this in the next few days.
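
          The guard exercised by those tests can be sketched roughly as follows. This is a hypothetical reconstruction of the behaviour described above, not the committed patch; the real script layout and error wording may differ.

```shell
# Hypothetical sketch of the sysctl guard described above (not the actual
# patch). Reads net.ipv6.bindv6only; a missing sysctl binary is silently
# skipped thanks to 2>/dev/null, matching the behaviour Todd tested.
check_bindv6only() {
  # $1: path to sysctl (parameterised here so the failure modes are testable)
  val=$("$1" -n net.ipv6.bindv6only 2>/dev/null)
  if [ "$val" = "1" ]; then
    echo 'Error: "net.ipv6.bindv6only" is set to 1 - Java networking may be broken' >&2
    return 1
  fi
  return 0   # value 0, empty output, or no sysctl at all: carry on
}

# A nonexistent sysctl path is quietly ignored:
check_bindv6only /sbin/no-such-sysctl && echo "missing sysctl: skipped quietly"
```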

          Jakob Homan added a comment -

          Todd, are you then +1ing HADOOP-6056.patch from 2010-03-18 12:47 PM? If so, please note it.

          Todd Lipcon added a comment -

          yes, +1.

          Konstantin Shvachko added a comment -

          +1. Marking it for 0.22. Do we need to make any related changes to the hdfs and/or mapred scripts as well? Todd, did you have a chance to test this? Maybe start a NN / DN with IPv6 on and check for the error message?

          Konstantin Shvachko added a comment -

          I just committed this. Thank you Michele.

          Hudson added a comment -

          Integrated in Hadoop-Common-22-branch #17 (See https://hudson.apache.org/hudson/job/Hadoop-Common-22-branch/17/)
          HADOOP-6056. Merge -r 1062542:1062543 from trunk to branch 0.22.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #585 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/585/)
          HADOOP-6056. Use java.net.preferIPv4Stack to force IPv4. Contributed by Michele Catasta.

          Steve Loughran added a comment -

          My main worry is what happens on Cygwin: whether there is a /sbin/sysctl and what it says; that stuff may break.

          Does anyone build/test the trunk on a windows VM?

          Todd Lipcon added a comment -

          I didn't test on a VM, but I did edit the code to point to /sbin/sdfjsidjfs to see what happens. It gets happily ignored - the 2>/dev/null hides the error output.
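
          The same experiment fits in two lines: with a bogus path under command substitution, stderr is discarded, the substitution yields an empty string, and the script simply carries on. (The bogus path is the one from Todd's test; any nonexistent command behaves the same way.)

```shell
# A nonexistent command under command substitution: the 2>/dev/null
# discards the "not found" error, $out ends up empty, and execution
# continues without aborting the script.
out=$(/sbin/sdfjsidjfs -n net.ipv6.bindv6only 2>/dev/null)
echo "captured: '${out}'"
```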

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #492 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/492/)


            People

            • Assignee: Michele Catasta
            • Reporter: Steve Loughran
            • Votes: 0
            • Watchers: 8