Hadoop Common / HADOOP-6311

Add support for unix domain sockets to JNI libs

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: None
    • Component/s: native
    • Labels: None
    • Target Version/s:

      Description

      For HDFS-347 we need to use unix domain sockets. This JIRA is to include a library in common which adds an o.a.h.net.unix package based on the code from Android (Apache 2 license).
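
      As a rough illustration of the kind of API this adds (a hedged sketch based on the Android classes the patch adapts; the constructors and addressing in the o.a.h.net.unix port may differ):

        import org.apache.hadoop.net.unix.LocalServerSocket;
        import org.apache.hadoop.net.unix.LocalSocket;
        import org.apache.hadoop.net.unix.LocalSocketAddress;

        // Sketch only: signatures follow the Android originals and may not
        // match the adapted o.a.h.net.unix classes exactly.
        public class LocalSocketExample {
          public static void main(String[] args) throws Exception {
            LocalServerSocket server = new LocalServerSocket("hadoop-example");
            LocalSocket client = new LocalSocket();
            client.connect(new LocalSocketAddress("hadoop-example"));
            LocalSocket accepted = server.accept();

            client.getOutputStream().write(42);                    // client writes a byte
            System.out.println(accepted.getInputStream().read());  // server reads it back

            accepted.close();
            client.close();
            server.close();
          }
        }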

      1. HADOOP-6311.028.patch
        118 kB
        Colin Patrick McCabe
      2. HADOOP-6311.027.patch
        118 kB
        Colin Patrick McCabe
      3. HADOOP-6311.024.patch
        118 kB
        Colin Patrick McCabe
      4. HADOOP-6311.023.patch
        116 kB
        Colin Patrick McCabe
      5. design.txt
        16 kB
        Colin Patrick McCabe
      6. HADOOP-6311.022.patch
        144 kB
        Colin Patrick McCabe
      7. HADOOP-6311.021.patch
        137 kB
        Colin Patrick McCabe
      8. HADOOP-6311.020b.patch
        137 kB
        Colin Patrick McCabe
      9. HADOOP-6311.020.patch
        137 kB
        Colin Patrick McCabe
      10. HADOOP-6311.018.patch
        126 kB
        Colin Patrick McCabe
      11. HADOOP-6311.016.patch
        126 kB
        Colin Patrick McCabe
      12. HADOOP-6311.014.patch
        125 kB
        Colin Patrick McCabe
      13. HADOOP-6311-1.patch
        77 kB
        Brock Noland
      14. HADOOP-6311-0.patch
        76 kB
        Brock Noland
      15. 6311-trunk-inprogress.txt
        72 kB
        Todd Lipcon
      16. hadoop-6311.txt
        1.72 MB
        Todd Lipcon

        Issue Links

          Activity

          Todd Lipcon added a comment -

          Attaching preliminary patch. The patch is large due to a lot of autotools-generated files.

          Most of this code comes from Android, with some modifications (added timeouts, got rid of android-specific stuff, cleaned up APIs a little)

          Owen O'Malley added a comment -

          -1

          Forking part of Android into Hadoop, while legal, is a really bad idea. We will become responsible for maintaining the code and will need to track all of the bug fixes that go into it. This is a huge maintenance burden.

          This might be acceptable if you can get the Android project to make releases that we can use. (Or if they split it out into a stand alone project that they also use...)

          For some related history, look at the pain that happened when we used a snapshot build from Apache Commons CLI. We eventually had to roll back to the released version including downgrading all of our usage of the new features.

          Doug Cutting added a comment -

          Forking is certainly often not an optimal path. That said, forking is quite different than using a snapshot build. In that case we committed the jar. Here, Todd's proposing we commit the source, properly forking. We'd be taking on the maintenance of this code, simply using the Android code as a starting point. That might be warranted if Android is not a project that's in the business of producing independent utility jar file releases, which it does not appear to be.

          If you agree with the above analysis, then the question has little to do with the source of the code but rather with whether Hadoop's codebase should include unix domain socket utilities. My general preference is to refactor later rather than earlier: until we have another use of unix domain sockets within Hadoop, or at least have another expected use, keep the code with its single user in HDFS. Do you have another expected use?

          Todd Lipcon added a comment -

          This might be acceptable if you can get the Android project to make releases that we can use. (Or if they split it out into a stand alone project that they also use...)

          I think there's little chance of this. The original code has (a) some Android-specific things like a special Android socket namespace, (b) reliance on their custom build system, and (c) reliance on a bunch of other utility code elsewhere in Android. This is also such a small part of Android that I am confident they'd have no interest in splitting it out into a separate library - it would just be a waste of time for them.

          Regarding maintenance and bugs, I think if you look at the code you'll see that it's not super complicated; obviously there's always a chance for bugs, especially on different platforms, but I don't see this code evolving much from where it is today (in Android it has had only one edit in the year since it was contributed). Also, since Android targets a very specific platform (and specific nonstandard JVM, even) I don't think we'd see the same bugs they would in practice.

          until we have another use of unix domain sockets within Hadoop, or at least have another expected use, keep the code with its single user in HDFS.

          I put this code in common since that's the easiest integration point for additional native code. If I were to put it in an entirely separate JNI library, we'd have to make changes to (or duplicate) NativeCodeLoader, ask users to install multiple native libraries, add appropriate build infrastructure to the hdfs build, etc. I think it makes most sense to have a single "native-code based optimizations" library that contains these optional extensions in common even if there is currently only a single user.
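
          For context, the single-library pattern being described works roughly like the guard below (a minimal sketch: NativeCodeLoader.isNativeCodeLoaded() is the existing Hadoop check for libhadoop, while the wrapper class itself is purely illustrative):

           import org.apache.hadoop.util.NativeCodeLoader;

           // Sketch: an optional native feature checks the single libhadoop
           // library and falls back to a pure-Java path when it is absent.
           public final class UnixSocketSupport {
             private UnixSocketSupport() {}

             /** Whether the native unix-domain-socket support can be used. */
             public static boolean isAvailable() {
               return NativeCodeLoader.isNativeCodeLoaded();
             }
           }

          A separate .so per feature would mean repeating this loading logic (or duplicating NativeCodeLoader) for every library, which is the duplication being argued against here.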

          Doug Cutting added a comment -

          > I put this code in common since that's the easiest integration point for additional native code.

          Okay, but I'd then vote that we not commit this until either there's another dependency on it that's committed or we're about to commit HDFS-347.

          Longer term, perhaps we'll add native code to the HDFS project and HDFS-347 could become a self-contained HDFS-only patch.

          Doug Cutting added a comment -

          In other words, I view this as a sub-task of HDFS-347. If you agree, perhaps you could convert it?

          Todd Lipcon added a comment -

          Okay, but I'd then vote that we not commit this until either there's another dependency on it that's committed or we're about to commit HDFS-347.

          +1 on not committing until we're ready to do the whole thing.

          However, JIRA won't let me turn it into a subtask of HDFS-347: "Only non-sub-task issues from the same project (HADOOP) can be selected."

          Konstantin Shvachko added a comment -

          I am really hesitant about committing sources from another project into Hadoop. As Owen mentioned, the main concern is supporting the code. You propose to commit MBs of new code, which we know nothing about, and to take on the burden of supporting it, if I understood Doug correctly.

          • OK, so does the purpose justify the sacrifice?

          I'm not sure that even a 30% improvement in read performance justifies this. Looking at your benchmarks, I think the number will drop to 10-20% for MapReduce in general because of data-node overhead and because not all task reads are local.

          • Are there alternatives?

          You probably need at least to consider the alternatives and explain why they don't work. It seems Dhruba and Raghu made attempts to turn this into solving the problems inside the data-node code, which IMO has a good chance of reaching the same results without a massive infusion of alien sources.

          Todd Lipcon added a comment -

          You propose to commit MBs of new code, which we know nothing about, and to take on the burden of supporting it, if I understood Doug correctly.

          As I said above, the patch is huge due to having to rerun autoconf/aclocal/etc. The actual amount of new code in the patch:

           build.xml                                          |   10 +
           .../org/apache/hadoop/net/unix/Credentials.java    |   52 +
           .../apache/hadoop/net/unix/LocalServerSocket.java  |  122 +
           .../org/apache/hadoop/net/unix/LocalSocket.java    |  288 +
           .../apache/hadoop/net/unix/LocalSocketAddress.java |   95 +
           .../apache/hadoop/net/unix/LocalSocketImpl.java    |  499 +
           .../src/org/apache/hadoop/net/unix/JNIHelp.cpp     |   85 +
           .../src/org/apache/hadoop/net/unix/JNIHelp.h       |   76 +
           .../src/org/apache/hadoop/net/unix/Makefile.am     |   51 +
           .../hadoop/net/unix/java_io_FileDescriptor.cpp     |  201 +
           .../org_apache_hadoop_net_unix_LocalSocketImpl.cpp | 1012 +
           .../apache/hadoop/net/unix/TestLocalSockets.java   |  208 +
          

          So, this isn't a tiny patch, but it's certainly not huge either. If you discount comments and whitespace, it's 867 lines of cpp, 628 lines of java (120 are test).

          If you prefer, I can rip out everything but the bare minimum of code required, but I don't consider this a particularly large patch.

          As for alternatives, I think many of them were thrown out due to security concerns in HDFS-347. I will certainly look into Raghu's proposed improvement to FSInputChecker, and if that gets the same gains as this, I'm happy to throw out all of the localsockets complication.

          Owen O'Malley added a comment -

          I'll remove my -1, but stick by a -0. I think this is a very bad idea. Forking code out of existing projects is a very dangerous practice and this is a huge chunk of code to be incorporating into Hadoop. This will become critical code in the heart of HDFS that none of us developed. That is troubling.

          I agree that we should not commit this until HDFS-347 is ready to commit.

          I suspect that putting it into HDFS is harder than it is worth, since HDFS doesn't have a native component already.

          Konstantin Shvachko added a comment -

          > I will certainly look into Raghu's proposed improvement ...

          Thanks, this would be really good. Let's try what we can with internal solutions, then turn to external ones. Is Android the best one? There should be alternatives there as well. Btw., does this library work on Windows, Solaris, etc.?

          I think Dhruba is making a good point in HDFS-347 that until we really understand what causes the slowdown we cannot efficiently fight it.

          On a side note, this project is becoming fairly complicated; should we consider branching it?

          Sanjay Radia added a comment -

          Todd, have you considered simply providing a jar and C library derived from the Android code and making it available for Hadoop? This way one can track the bug fixes and improvements in the original Android.

          Nichole Treadway added a comment -

          Has anyone tested this on the Hadoop 0.20 Append branch? We are trying to build it but haven't been able to get it to build successfully; it always complains about Zlib dependency issues. We can build Hadoop-0.20-Append without the patch, but the build fails when patched with 6311. What are the dependencies for building this?

          Todd Lipcon added a comment -

          I haven't tried applying the 20-based patch in a long time. Here's a patch against trunk from March or so, which compiles at least on my machine. It may be some work to backport onto 20-append.

          (note, this patch isn't particularly tested at the moment, but should be mostly there)

          Jason Rutherglen added a comment -

          It may be some work to backport onto 20-append

          I think I'll try to backport it to 20-append in order to test and benchmark with HBASE-3529. Unless HBase is going to upgrade to a newer Hadoop version soon?

          Jason Rutherglen added a comment -

          Here's a patch for Hadoop 0.20 append. There's one test failure as noted below.

          ant run-test-core

          Test org.apache.hadoop.metrics2.impl.TestSinkQueue FAILED

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12476942/6311-trunk-inprogress.txt
          against trunk revision 1102093.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/437//console

          This message is automatically generated.

          Jason Rutherglen added a comment -

          Sorry, I misspoke: the patch option wasn't selected in JIRA, so there's no file attached; in any case, the patch was mistakenly made against Hadoop trunk. The port to 0.20-append is a bit more complex.

          Todd Lipcon added a comment -

          To reply to a really old comment that I missed:

          Todd, have you considered simply providing a jar and C library derived from the Android code and making it available for Hadoop? This way one can track the bug fixes and improvements in the original Android.

          The issue is that Android is a giant monolithic project – this code comes from its equivalent of the JDK runtime, so it doesn't have its own standalone .so file or jar. I could certainly take this patch and put it on my github (or cloudera's github) as a standalone Java library "junixsockets" or something. In the past, people have been skeptical of this kind of contribution and have preferred to have code pulled into Hadoop itself.

          Taking the code out of Android and adapting it to our needs is also beneficial in that we can re-use other utility code we already have in Hadoop common. For example, the patch in-progress for trunk reuses the NativeIOException that was developed during the security work.

          Regarding tracking bug fixes from Android, it's worth noting that in the nearly two years since this ticket was filed, there was only one very-minor bug fix in the Android code: http://android.git.kernel.org/?p=platform/frameworks/base.git;a=history;f=core/java/android/net/LocalSocket.java;h=3ee8a80e9968e0bf8c8c8b4d0ee3076476db4ec9;hb=HEAD

          So, I don't really foresee major maintenance issues in duplicating the code.

          Robert Joseph Evans added a comment -

          For what it is worth I thought it might be good to review the patch as a second opinion on its quality and maintainability. I did not really look at any of the Makefile/Configure changes as they are mostly auto generated. The following are the issues that I came up with.

          LocalSocket.java: remove the methods that throw UnsupportedOperationException unless we plan on implementing them.

          LocalSocketImpl.java may be very slow for large byte-array reads and writes. I'm not sure how likely this is, but if the array is large enough that the JVM splits it across multiple locations, the data might be copied multiple times to comply with the JNI interface. It is not a correctness issue, just potentially a performance issue.

          LocalSocketImpl.java: potential race condition between reads/writes and closes (the locking around fd is inconsistent); see the sketch after this list.

          LocalSocketImpl.java: remove supportsUrgentData and sendUrgentData unless we plan on implementing them in the future; likewise for getSockAddress().

          LocalSocketImpl.java: in getOption, I understand having a pattern in place to be able to return different types, but it is always int, so can we get rid of the switch or combine the returns?

          JNIHelp.cpp does not have an Apache license header in it.

          org_apache_hadoop_net_unix_LocalSocketImpl.cpp: in make_sockaddr_un and java_opt_to_real, I don't really like writing to stderr; it would be better to throw an exception instead.

          org_apache_hadoop_net_unix_LocalSocketImpl.cpp: in Java_org_apache_hadoop_net_unix_LocalSocketImpl_connectLocal, if an exception occurs when calling jniGetFDFromFileDescriptor, then nameUtf8 appears to be leaked.

          org_apache_hadoop_net_unix_LocalSocketImpl.cpp: in Java_org_apache_hadoop_net_unix_LocalSocketImpl_bindLocal, unlink is called regardless of namespace, and all errors from it are ignored.

          org_apache_hadoop_net_unix_LocalSocketImpl.cpp: JNI_OnLoad still has a reference to Android in a commented-out error log.

          In the test, please address the TODO you left about throwing a timeout exception instead of an IOException.
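
          On the read/close race in particular, the usual fix is to guard every access to the descriptor with the same lock. The following is a minimal, generic sketch of that pattern; the field and method names (including the native calls) are illustrative, not the patch's actual API.

           import java.io.FileDescriptor;
           import java.io.IOException;

           // Sketch: consistent locking around a shared fd so a concurrent
           // close() cannot invalidate the descriptor in the middle of a read.
           class GuardedSocketImpl {
             private final Object fdLock = new Object();
             private FileDescriptor fd;     // set when the socket is created
             private boolean closed = false;

             int read(byte[] buf, int off, int len) throws IOException {
               synchronized (fdLock) {
                 if (closed) {
                   throw new IOException("socket is closed");
                 }
                 return nativeRead(fd, buf, off, len);   // hypothetical native call
               }
             }

             void close() throws IOException {
               synchronized (fdLock) {
                 if (!closed) {
                   closed = true;
                   nativeClose(fd);                      // hypothetical native call
                   fd = null;
                 }
               }
             }

             private native int nativeRead(FileDescriptor fd, byte[] b,
                 int off, int len) throws IOException;
             private native void nativeClose(FileDescriptor fd) throws IOException;
           }

          Note that holding the lock across a blocking native read means close() would wait for the read to finish; a real implementation would need a finer-grained scheme (for example, shutting the socket down before closing the descriptor). The sketch only illustrates the consistency point above.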

          Amr Awadallah added a comment -

          I am out of office on vacation for next couple of days. I will be
          slower than usual in responding to emails. If this is urgent then
          please call my cell phone (or send an SMS), otherwise I will reply to
          your email when I get back.

          Thanks for your patience,

          – amr

          Todd Lipcon added a comment -

          Thanks for reviewing - I agree some more cleanup and removal of code could be done. The efficiency shouldn't matter, as the intended use case here is really only to transfer open file descriptors and not to actually transfer bulk data.

          If people are on board with this general idea, I'll continue the cleanup, reformat the code to conform to our standards, and deal with the issues pointed out above.

          Robert Joseph Evans added a comment -

          There has been no discussion on the patch lately, and there is still some potential code cleanup to be done, so I am canceling the patch.

          Brock Noland added a comment -

          Attached is the patch rebased on trunk with some minor modifications. The tests pass.

          Todd Lipcon added a comment -

          Thanks for updating this, Brock! A few comments looking over the patch:

          • get_pw_buflen now seems to show up twice in NativeIO.c
           • we need to add interface annotations to the new public classes - I guess LimitedPrivate for HDFS is the right choice (see the sketch below)
           • in java_opt_to_real, I think we should use javah-produced constants to get the numeric values corresponding to java.net.SocketOptions here. The case statements with magic-number ints are a little hard to explain

          If you don't have time to do the above, I can rev the patch.
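
          For the annotation point, the change amounts to tagging the new public classes with the standard Hadoop audience annotations, roughly as below. This is only a sketch: the class shown and the stability level chosen are illustrative.

           import org.apache.hadoop.classification.InterfaceAudience;
           import org.apache.hadoop.classification.InterfaceStability;

           // Sketch: restrict the new public API to HDFS rather than exposing it
           // as a general downstream-facing interface.
           @InterfaceAudience.LimitedPrivate({"HDFS"})
           @InterfaceStability.Unstable
           public class LocalSocket {
             // ...
           }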

          Brock Noland added a comment -

          Agreed on all counts. I will update the patch this morning and post it this afternoon.

          Brock Noland added a comment -

          Attached is the updated patch!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12528428/HADOOP-6311-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.net.unix.TestLocalSockets
          org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1016//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1016//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          Hi Brock / Todd,

          Are you guys still working on this? If your attention has been taken by other things, I'll post an updated version of this so we can get it in.

          C.

          Brock Noland added a comment -

          From my perspective, go ahead! FWIW, the tests that failed in TestLocalSockets passed on my RHEL 6.2 machine.

          Colin Patrick McCabe added a comment -

          This patch abstracts out the file descriptor passing code into two classes, FdServer and FdClient. FdServer can publish file descriptors which the FdClient (possibly in another process) can retrieve.

          Unlike UNIX domain sockets, this is not platform-specific. As a side effect, there is no Android-derived code in this patch.

          The FdClient uses a 64-bit random number called a "cookie" to identify the file descriptor it wants to fetch. The idea is that the HDFS Client will receive the cookie from the DataNode via the usual TCP communication. Then, it can be used to fetch the FileDescriptor from the DataNode.

          There are two defenses against a malicious process trying to grab FileDescriptors from the DataNode without authorization: the randomly generated socket path, and the randomly generated 64-bit cookie, which serves as a kind of shared secret. The latter is the stronger defense.
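
          A hedged sketch of the cookie handshake follows; the FdServer/FdClient method names used here are illustrative rather than the patch's exact API.

           import java.security.SecureRandom;

           // Illustrative flow: the server side publishes a descriptor under a
           // random 64-bit cookie; the client later presents the cookie over
           // the local socket to retrieve that descriptor.
           class FdCookies {
             private static final SecureRandom RANDOM = new SecureRandom();

             /** Generate the 64-bit shared secret for one published descriptor. */
             static long newCookie() {
               return RANDOM.nextLong();
             }
           }

           // Hypothetical usage matching the description above:
           //   long cookie = fdServer.publish(fileDescriptor);   // DataNode side
           //   ... cookie travels to the client over the normal TCP channel ...
           //   FileDescriptor fd = fdClient.fetch(cookie);       // client side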

          I also factored out the exception generating code into two separate files, exception.c and native_io_exception.c. The code was formerly integrated into NativeIO.c; however, I did not want to duplicate it. It should prove useful in general for code that needs to raise Java RuntimeException and IOException.

          tree.h is included to implement a red-black tree (similar to a TreeMap in Java). This code is already in the hadoop-hdfs project.

          Finally, there are some pretty extensive unit tests, including a multi-threaded one which really puts the server through its paces.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12542580/HADOOP-6311.014.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 12 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.io.nativeio.TestNativeIO
          org.apache.hadoop.ha.TestZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1363//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1363//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • catch the correct exception in TestNativeIO
          • test a few more edge cases like supplying null to native functions when we're not supposed to, etc.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12542648/HADOOP-6311.016.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 12 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1367//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1367//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • rebase on trunk
          • newRuntimeException: don't call va_end unless va_start was called previously.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12544539/HADOOP-6311.017.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1430//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          fix rebase

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12544546/HADOOP-6311.018.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 12 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1433//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1433//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
           • Use CLOCK_MONOTONIC for timeouts, not wall-clock time. This avoids problems when the local clock changes (see the sketch after this list).
           • Implement timeouts for published file descriptors. This handles the case where the client requests a file descriptor but then crashes before it can be sent.
           • Get rid of the FileDescriptor#incrementAndGetUseCount, etc. stuff. It was dead code.
           • The FdServer now logs debug and info messages using log4j.
           • Get rid of waitForServerStartup. It was not needed, and buggy anyway (since writes to a pipe don't block unless the pipe is full).
           • Get rid of STATIC_ASSERT, since it wasn't needed.
           • TestFdServer: uncomment a lot of tests. Add tests for timeouts.
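
           The change here is in the native code, which now measures its deadlines against CLOCK_MONOTONIC rather than wall-clock time. As a rough Java-side illustration of the same principle (not code from the patch), a deadline computed from System.nanoTime() keeps working even if the system clock jumps:

            import java.util.concurrent.TimeUnit;

            // Sketch: a monotonic deadline that is unaffected by wall-clock changes.
            final class MonotonicDeadline {
              private final long deadlineNanos;

              MonotonicDeadline(long timeout, TimeUnit unit) {
                this.deadlineNanos = System.nanoTime() + unit.toNanos(timeout);
              }

              boolean expired() {
                // Subtracting before comparing keeps the check overflow-safe.
                return System.nanoTime() - deadlineNanos >= 0;
              }

              long remainingMillis() {
                long remaining = deadlineNanos - System.nanoTime();
                return Math.max(0, TimeUnit.NANOSECONDS.toMillis(remaining));
              }
            }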
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12546948/HADOOP-6311.020.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 12 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1539//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1539//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          Colin - thanks for pushing on this; it kind of fell off my radar.

          The original patch from Todd had code derived from Android; is that still the case?

          Colin Patrick McCabe added a comment -

          This one has no code derived from Android.

          Colin Patrick McCabe added a comment -
          • Fix JavaDoc warnings
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547013/HADOOP-6311.020b.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverController
          org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1542//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1542//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          Incorporate some style changes suggested by Andy Isaacson.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548504/HADOOP-6311.021.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1595//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1595//console

          This message is automatically generated.

          Andy Isaacson added a comment -

          High level comments:

          • We need a design doc explaining how all of these bits work together. This Jira has gone on long enough that it does not serve as documentation.
          • I didn't review the red-black and splay tree implementations at all. I'm not sure why we expect this to be big/contended enough to deserve anything beyond a trivial hash table, which takes about 20 lines of C. (ah, I see the code comes from *BSD, so that's good at least. We should document where and what version it came from for future maintainers' sanity.)
          +++ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/fd_server.h
          ...
          +/**
          + * This file includes some common utilities 
          + * for all native code used in hadoop.
          + */
          

          I don't think this comment is accurate.

          +#include <stdint.h>
          

          Please move #includes to the relevant .c file unless they're needed in the .h directly. Doesn't look like it's needed here.

          +    memset(&addr, 0, sizeof(struct sockaddr_un));
          

          I prefer to say memset(&x, 0, sizeof(x)) so that the code is clearly using the correct size. I don't feel too strongly about this though.

          +    addr.sun_family = AF_UNIX;
          +    if (bind(ud->sock, (struct sockaddr*)&addr, sizeof(sa_family_t)) < 0) {
          

          This seems to be using the Linux-proprietary "abstract namespace". If we do this it should be a Linux-specific name, not "unixDomainSock" which implies that the code is portable to other UNIXes such as Darwin/Mac OS or Solaris or FreeBSD.

          The abstract socket API is documented at http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html

          (If I'm wrong and the abstract sockets are supported by other OSes then great! but I'm pretty sure they're not.)

          Talking to Colin offline we confirmed that abstract sockets are Linux-specific, but he pointed out that unixDomainSockCreateAndBind handles both abstract sockets and named sockets (in the if(jpath) branch). So the name is OK but we need a comment calling out the abstract socket use case. The Linux-specific code will compile OK on other OSes, but it might be useful if the exception message says "your OS requires an explicit path" on non-Linux (using an #ifndef __linux__ perhaps).

          The control flow is a little confusing but not too bad, it could use a comment perhaps something like /* Client requested abstract socket (see unix(7) for details) by setting path = null. */ in the abstract path case.
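
          For readers following the thread, here is a minimal sketch of the two bind variants under discussion, assuming illustrative names rather than the patch's actual functions:

          #include <string.h>
          #include <sys/socket.h>
          #include <sys/un.h>

          /* Illustrative sketch only: bind either a Linux abstract-namespace socket
           * (path == NULL requests kernel autobind) or an ordinary named socket. */
          static int bind_unix_sock(int sock, const char *path)
          {
            struct sockaddr_un addr;
            memset(&addr, 0, sizeof(addr));
            addr.sun_family = AF_UNIX;
            if (path == NULL) {
              /* Abstract autobind (Linux-only): passing only sizeof(sa_family_t)
               * asks the kernel to pick a unique abstract name. See unix(7). */
              return bind(sock, (struct sockaddr*)&addr, sizeof(sa_family_t));
            }
            /* Named socket: portable, but the path must live in a writable
             * directory and lingers after close unless unlink()ed. */
            strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
            return bind(sock, (struct sockaddr*)&addr, sizeof(addr));
          }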

          +  if (!jpath) {
          ... 20 lines of code
          +  } else {
          ... 10 lines of code
          +  }
          

          I'd reorder them to

          if (jpath) { ... } else { /* !jpath */ ... } 

          as it's one less bit-flip to think about.

          Could you explain the benefits of using abstract sockets? Why is it better than a named socket? Ideally in a code comment near the implementation, or in this Jira.

          +      jthr = newNativeIOException(env, errno, "error autobinding "
          +                                  "PF_UNIX socket: ");
          

          I don't recognize the phrase "autobinding". Is that something specific to abstract sockets? If so, it can be described in the documentation.

          +      jthr = newNativeIOException(env, EIO, "getsockname():  "
          +               "autobound abstract socket identifier started with / ");
          

          Most of your newNativeIOException texts end with : " but this one ends with / ". Best to be consistent.

          +jthrowable unixDomainSetupSockaddr(JNIEnv *env, const char *id,
          

          I think this function can be static, right?

          +#define RETRY_ON_EINTR(ret, expr) do { \
          +  ret = expr; \
          +} while ((ret == -1) && (errno == EINTR));
          

          This probably wants a maximum retry count (hardcoding 100 or thereabouts should be fine).

          +static ssize_t safe_write(int fd, const void *b, size_t c)
          +{
          +  int res;
          +
          +  while (c > 0) {
          +    res = write(fd, b, c);
          +    if (res < 0) {
          +      if (errno != EINTR)
          +        return -errno;
          +      continue;
          +    }
          +    c -= res;
          +    b = (char *)b + res;
          
          • I'd use a local char *p = b rather than having the cast in the loop.
          • if write returns a too large value (which "cannot happen" but bugs happen) and c underflows, since it's unsigned the loop will spin forever (2^63 is forever). I'd explicitly check if (res > c) return; before decrementing c.
          • write(2) returns the number written. Seems like we should do that here too. If the user calls safe_write(100), we write 50 and then get ENOSPC, we should return 50 I think. But I'm not sure, maybe that's not the right interface contract here.
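
          A sketch of safe_write with these suggestions folded in, keeping the all-or-error contract (illustrative, not the patch's final code):

          #include <errno.h>
          #include <unistd.h>

          /* Write exactly c bytes or return a negative errno.  Uses a typed cursor
           * instead of casting in the loop, and guards against an impossibly large
           * return value so the unsigned count cannot underflow. */
          static ssize_t safe_write(int fd, const void *b, size_t c)
          {
            const char *p = b;
            ssize_t res;

            while (c > 0) {
              res = write(fd, p, c);
              if (res < 0) {
                if (errno != EINTR)
                  return -errno;
                continue;
              }
              if ((size_t)res > c)
                return -EIO;  /* "cannot happen", but fail loudly rather than spin */
              c -= res;
              p += res;
            }
            return 0;
          }
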
          +  if (!cmsg->cmsg_type == SCM_RIGHTS) {
          

          Should be if (cmsg_type != SCM_RIGHTS).

          +  if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout,
          

          no need to cast the pointer, the fourth argument to setsockopt is void *.

          +  /** The raw FileDescriptor */
          +  int rawFd;
          

          I think "raw" means "Posix filedescriptor" here?

          +  pFd->rawFd = dup(rawFd);
          

          Is there any chance we'll see races with fseek on these filedescriptors? Unfortunately two {{dup()}}ed fds share a single file offset pointer. If this is something clients need to worry about we'll have to document it very prominently.
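
          One way to sidestep the shared-offset hazard, assuming clients can be steered toward positional reads (an assumption, not something this patch mandates):

          #include <errno.h>
          #include <unistd.h>

          /* pread() takes an explicit offset, so two dup()ed descriptors never race
           * on the shared file position the way read()+lseek() callers would. */
          static ssize_t read_fully_at(int fd, char *buf, size_t len, off_t off)
          {
            size_t done = 0;
            while (done < len) {
              ssize_t n = pread(fd, buf + done, len - done, off + (off_t)done);
              if (n < 0) {
                if (errno == EINTR)
                  continue;
                return -errno;
              }
              if (n == 0)
                break;  /* EOF */
              done += n;
            }
            return (ssize_t)done;
          }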

          +  // Unpublish any remaining file descriptors.
          +	RB_FOREACH_SAFE(pFd, publishedFdsByCookie,
          +                  &data->publishedByCookie, pFdTmp) {
          +    RB_REMOVE(publishedFdsByCookie, &data->publishedByCookie, pFd);
          +    publishedFdFree(env, pFd);
          +	}
          

          there's some indentation craziness going on here.

          +static jlong generateRandomCookie(void)
          +{
          +  uint64_t lo = ((uint64_t)mrand48()) & 0xffffffffULL;
          +  uint64_t hi = ((uint64_t)mrand48()) << 32;
          +  return (jlong)(lo | hi); 
          

          Given that this is potentially a security-critical cookie, we should use /dev/urandom.
          (not /dev/random due to the entropy pool blocking issues.)
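
          A minimal sketch of what a /dev/urandom-backed generator could look like (error handling left to the caller; illustrative only):

          #include <fcntl.h>
          #include <stdint.h>
          #include <unistd.h>

          /* Fill *cookie with 8 bytes from /dev/urandom.  Returns 0 on success,
           * -1 if the device could not be opened or read in full. */
          static int generate_random_cookie(uint64_t *cookie)
          {
            int fd = open("/dev/urandom", O_RDONLY);
            if (fd < 0)
              return -1;
            ssize_t n = read(fd, cookie, sizeof(*cookie));
            close(fd);
            return (n == (ssize_t)sizeof(*cookie)) ? 0 : -1;
          }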

          +    // See man cmsg3) for more details about how 'ancillary data' works.
          

          missing a ( I think. I would write See cmsg(3) for.

          
          

          Overall the code is pretty solid. I wish we were not open-coding so much JNI argument string stuff, it's just asking for trouble, but I don't have a better option. I'm not entirely convinced that this isn't overdesigned, but I'll wait for the design doc before coming to conclusions.

          Colin Patrick McCabe added a comment -

          Thanks for the review, Andy.

          [red-black, splay tree implementations]

          We already use the same red-black tree implementation in libhdfs. It comes directly out of the BSD kernel and is released under a liberal license. It has to be a tree because hash tables are unordered, and we have to expire old file descriptors that were not claimed within a certain time period. That means ordering by time. The trees are also much more flexible in a lot of ways.

          I think [ unixDomainSetupSockaddr ] can be static, right?

          It's also used in fd_client.c.

          Put a maximum count on [RETRY_ON_EINTR]

          This is basically my version of TEMP_FAILURE_RETRY, a fine macro which happens to be a Linux-ism. http://www.gnu.org/software/libc/manual/html_node/Interrupted-Primitives.html#Interrupted-Primitives TEMP_FAILURE_RETRY doesn't have a maximum count, and as far as I know, there's no reason why you would need one.
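
          For illustration, the macro quoted above is used the same way glibc's TEMP_FAILURE_RETRY is; a hedged example call site:

          #include <errno.h>
          #include <unistd.h>

          #define RETRY_ON_EINTR(ret, expr) do { \
            ret = expr; \
          } while ((ret == -1) && (errno == EINTR))

          /* Retry the read only when it was interrupted by a signal; every other
           * error is surfaced to the caller unchanged. */
          static ssize_t read_retry(int fd, void *buf, size_t len)
          {
            ssize_t ret;
            RETRY_ON_EINTR(ret, read(fd, buf, len));
            return ret;
          }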

          [safe write]

          I'll add the extra check for too-large write() results.

          The contract is "write exactly the given amount or return an error." If we get an error while writing to the socket, the exact amount written prior to that is not interesting (in this case at least.)

          [dup / fseek issues]

          I don't think it's an issue in the particular use cases we have planned (i.e., HDFS-347.) But it should definitely be documented.

          [Use /dev/urandom]

          It's a Linux-ism, but I guess we could do it on that platform.

          Colin Patrick McCabe added a comment -

          We need a design doc explaining how all of these bits work together. This Jira has gone on long enough that it does not serve as documentation.

          I think maybe a long-form comment in fd_server.c might be best? It's conceptually simple, but there are some details which might make more sense in context.

          Colin Patrick McCabe added a comment -
          • use /dev/urandom on platforms where it's available
          • add comments to fd_server.c, unix_domain.c, including overview comment and comment about abstract socket namespace
          • style fixes (indentation, etc)
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548810/HADOOP-6311.022.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1614//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1614//console

          This message is automatically generated.

          Andy Isaacson added a comment -

          Thinking about this further, it seems like the native code should be as minimal as possible, and the database of which FDs are accessible should be managed in Java code. This would remove the need for the red-black tree in C, and it would make the whole patch much smaller. Todd suggested a way to completely avoid needing the cookie, as well.

          Colin Patrick McCabe added a comment -

          I posted a design document which should clarify some of this.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548835/design.txt
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1617//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          Hi Colin,

          Thanks for writing up the design doc. I think it probably should actually go on HDFS-347, which is the overall feature JIRA, rather than this one, which is about one of the implementation subtasks. But, anyway, here are some comments:

          • Portable. Hadoop supports multiple operating systems and environments,
            including Linux, Solaris, and Windows.

          IMO, there is not a requirement that performance enhancements work on all systems. It is up to the maintainers of each port to come up with the most efficient way to do things. Though there is an active effort to get Hadoop working on Windows, it is not yet a requirement.

          So long as we maintain the current TCP-based read (which we have to, anyway, for remote access), we'll have portability. If the Windows port doesn't initially offer this feature, that seems acceptable to me, as they could later add whatever mechanism makes the most sense for them.

          • High performance. If performance is compromised, there is no point to any of
            this work, because clients could simply use the existing, non-short-circuit
            write pathways to access data.

          Should clarify that the performance of the mechanism by which FDs are passed is less important, since the client will cache the open FDs and just re-use them for subsequent random reads against the same file (the primary use case for this improvement). So long as the overhead of passing the FDs isn't huge, we should be OK.

          There are other problems. How would the datanode clients and the server decide
          on a socket path? If it asks every time prior to connecting, that could be
          slow. If the DFSClient cached this socket path, how long should it cache it
          before expiring the cache? What happens if the administrator does not properly
          set up the socket path, as discussed earlier? What happens if the
          administrator wants to put multiple DataNodes on the same node?

          Per above, slowness here is not a concern, since we only need to do the socket-passing on file open. HDFS applications generally open a file once and then perform many many reads against the same block before opening the next block.

          As for how the socket path is communicated, why not do it via an RPC? For example, in your solution #3, we're using an RPC to communicate a cookie. Instead of that, it can just return its abstract namespace socket name. (You seem to propose this under solution #3 below, but reject it here in solution #1.)

          Another option would be to add a new field to the DatanodeId/DatanodeRegistration: when the client gets block locations it could also include the socket paths.

          The response is not a path, but a 64-bit cookie. The DFSClient then connects
          to the DN via a UNIX domain socket, and presents the cookie. In response, he
          receives the file descriptor.

          I don't see the purpose of the cookie, still, since it adds yet another opaque token, and requires the DN code to "publish" the file descriptor with a cookie, and we end up with extra data structures, cached open files, cache expiration policies, etc.


          Choice #3. Blocking FdServer versus non-blocking FdServer.
          Non-blocking servers in C are somewhat more complex than blocking servers.
          However, if I used a blocking server, there would be no obvious way to
          determine how many threads it should use. Because it depends on how busy the
          server is expected to be, only the system administrator can know ahead of time.
          Additionally, many schedulers do not deal well with a large number of threads,
          especially on older versions of Linux and commercial UNIX variants.
          Coincidentally, these happen to be exactly the kind of systems many of our users run.
          users run.

          I don't really buy this. The socket only needs to be active long enough to pass a single fd, which should take a few milliseconds. The number of requests for fd-passing is based on the number of block opens, not the number of reads. So a small handful of threads should be able to handle even significant workloads just fine. We also do fine with threads on the data xceiver path, often configured into the hundreds or thousands.

          Another problem with blocking servers is that shutting them down can be
          difficult. Since there is no time limit on blocking I/O, a message sent to the
          server to terminate may take a while, or possibly forever, to be acted on.
          This may seem like a trivial or unimportant problem, but it is a very real one
          in unit tests. Socket receive and send timeouts can reduce the extra time
          needed to shut down, but never quite eliminate it.

          Again, I don't buy it; we do fine with blocking IO everywhere else. Why is this context different?


          Wire protocol

          The wire protocol should use protobufs, so it can be evolved in the future. I also am still against the cookie approach. I think a more sensible protocol would be the following:

          0) The client somehow obtains the socket path (either by RPC to the DN or by it being part of the DatanodeId)
          1) Client connects to the socket, and sends a message which is a fully formed protobuf message, including the block ID and block token
          2) JNI server receives the message and passes it back to Java, which parses it, opens the block, and passes file descriptors (or an error) back to JNI.
          3) JNI server sends the file descriptors along with a response protobuf
          4) JNI client receives protobuf data and optionally fds (in success), forwards them back to Java where protobuf decode happens

          This eliminates the need for a new cookie construct, and a bunch of data structures on the server side. The JNI code becomes stateless except for its job of accept(), read(), and write(). It's also extensible due to the use of protobufs (which is helpful if the server later needs to provide some extra information about the format of the files, etc)

          The APIs would then become something like:

          class FdResponse {
            FileDescriptor []fds;
            byte[] responseData;
          }
          
          interface FdProvider {
            FdResponse handleRequest(byte[] request);
          }
          
          class FdServer { // implementation in JNI
            FdServer(FdProvider provider);
          }
          

          (the FdServer in C calls back into the FdProvider interface to handle the requests)
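
          Step 3 above ultimately bottoms out in sendmsg() with SCM_RIGHTS ancillary data; a hedged sketch of the sending side (not the actual JNI code):

          #include <string.h>
          #include <sys/socket.h>
          #include <sys/uio.h>

          /* Send one payload byte plus a single file descriptor over a connected
           * AF_UNIX socket.  Returns 0 on success, -1 with errno set on failure. */
          static int send_fd(int sock, int fd_to_send)
          {
            char payload = 0;
            struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
            union {
              struct cmsghdr hdr;
              char buf[CMSG_SPACE(sizeof(int))];
            } ctrl;
            struct msghdr msg;
            struct cmsghdr *cmsg;

            memset(&msg, 0, sizeof(msg));
            memset(&ctrl, 0, sizeof(ctrl));
            msg.msg_iov = &iov;
            msg.msg_iovlen = 1;
            msg.msg_control = ctrl.buf;
            msg.msg_controllen = sizeof(ctrl.buf);

            cmsg = CMSG_FIRSTHDR(&msg);
            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;  /* see cmsg(3) */
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

            return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
          }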


          Security:

          One question: if you use the autobinding in the abstract namespace, does that prevent a later attacker from explicitly picking the same name? ie is the "autobind" namespace fully distinct from the other one? Otherwise I'm afraid of the following attack:

          • malicious client acts like a reader, finds out the socket path of the local DN
          • client starts a while-loop, trying to bind that same socket
          • DN eventually restarts due to normal operations. client seizes the socket
          • other clients on that machine have the cached socket path and get man-in-the-middled

          I think this could be somewhat mitigated by using the credentials-checking facility of domain sockets: the client can know the proper uid of the datanode, and verify against that. Otherwise perhaps there is some token-like scheme we could use. But we should make sure this is foolproof.

          If the above is indeed a possibility, I'm not sure the abstract namespace buys us anything over using a well-defined path (eg inside /var/run), since we'd need to do the credentials check anyway.
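
          The credentials check mentioned above is available on Linux as SO_PEERCRED; a hedged sketch, where the expected uid would have to come from configuration (an assumption):

          #define _GNU_SOURCE  /* for struct ucred */
          #include <sys/socket.h>
          #include <sys/types.h>

          /* Return nonzero if the peer on a connected AF_UNIX socket is running as
           * expected_uid (e.g. the datanode user).  Linux-specific. */
          static int peer_uid_matches(int sock, uid_t expected_uid)
          {
            struct ucred cred;
            socklen_t len = sizeof(cred);

            if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
              return 0;
            return cred.uid == expected_uid;
          }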


          Overall thoughts

          I agree in general with your arguments about the complexity of the unix socket approach and its non-portability, but let me also bring to the table a couple arguments for it:

          As history shows, I was originally one of the proponents of the short-circuit read approach (in HDFS-347, etc). It provided much better performance than loopback TCP, especially for random read. For sequential read, it also uses much less CPU: in particular I see a lot of system CPU being used by loopback TCP sockets for both read and write on the non-short-circuit code paths. Short circuit avoids this.

          But from my experience with HDFS-2246, and from thinking about some other upcoming improvements, there are a couple of downsides as well:

          1) Metrics: because the client gains full control of the file descriptors, the datanode no longer knows anything about the amount of disk IO being used by each of its clients, or even the total aggregate. This makes the metrics under-reported, and I don't see any way around it.

          In addition to throughput metrics, we'll also end up lacking latency statistics against the local disk in the datanode. We now have latency percentiles for disk access, and that will be useful to identify dying/slow disks for applications like HBase. We don't get that with short-circuit.

          2) Fault handling: if there's an actual IO error on the underlying disk, the client is the one who sees it instead of the datanode. This means that the datanode won't necessarily mark the disk as bad. We should figure out how to address this for the short-circuit path (e.g the client could send an RPC to the DN which asks it to move a given block to the front of the block scanner queue upon hitting an error)

          3) Client thickness: there has recently been talk of changing the on-disk format in the datanode, for example to introduce inline checksums in the block data. Currently, such changes would be datanode-only, but with short circuit the client also needs an update. We need to ensure that whatever short circuit code we write, we have suitable fallbacks: eg by transferring some kind of disk format version identifier in the RPC, and having the DN reject the request if the client won't be able to handle the storage format.

          4) Future QoS enhancements: currently there is no QoS/prioritization construct within the datanode, but as mixed workload clusters become more popular (eg HBase + MR) we would like to be able to introduce QoS features. If all IO goes through the datanode, then this is far more feasible. With short-circuit, once we hand out an fd, we've lost all ability to throttle a client.

          So, I'm not against the fd passing approach, but I think these downsides should be called out in the docs, and if there's any way we can think of to mitigate some of them, that would be good to consider.

          I'd also be really interested to see data on performance of the existing datanode code running on either unix sockets or on a system patched with the new "TCP friends" feature which eliminates a lot of the loopback overhead. According to some benchmarks I've read about, it should cut CPU usage by a factor of 2-3, which might make the win of short-circuit much smaller. Has anyone done any prototypes in this area? The other advantage of this approach (non-short-circuit unix sockets instead of TCP) is that it would improve performance of the write pipeline as well, where I currently see a ton of overhead due to TCP in the kernel.

          Colin Patrick McCabe added a comment -

          Thanks for the comments.

          With respect to security: there is always a possibility for a client to open a socket with the same name as the server would have used. This is similar to the problem with TCP/IP sockets of a malicious program grabbing the port before the DataNode could get it (or after the DataNode has died.)

          I guess this is a problem that actually is worse with the abstract socket namespace. With path-based sockets, you can set up the path so that the permissions of the path itself prevent this attack. However, with the abstract socket namespace, there's no way to prevent another process from grabbing the port first.

          I agree that there are downsides to the short-circuit approach. I was very careful to maintain the ability for the server to decline to offer short-circuit local reads in my patch set. This is obviously important for our future flexibility. It might be advisable to allow this on a file-by-file basis as well.

          I don't think that on-disk format changes are that big of a deal for the short-circuit pathway. We tell old clients they can't use short-circuit reads on those files, and fix new clients to understand the new format.

          We should definitely have a way for short-circuit clients to report statistics, disk errors, etc. to the DataNode. However, let's not gate this change on features like that. They can easily be added as features later and aren't really related to the core issue of fixing local reads + security. I think I'll open a separate JIRA for that.

          TCP optimizations are pretty cool, but not when you run on RHEL6, as many folks do. Maybe we should open a separate JIRA to investigate how things like TCP fast open, changed TCP kernel options, etc. might be used with Hadoop in the future. There are also certain performance improvements we could do in the read and write paths on the DataNode, but again, that's out of scope for this JIRA, I think.

          Todd Lipcon added a comment -

          With respect to security: there is always a possibility for a client to open a socket with the same name as the server would have used. This is similar to the problem with TCP/IP sockets of a malicious program grabbing the port before the DataNode could get it (or after the DataNode has died.)

          That's why secure clusters use low (privileged) ports for the data transfer protocol.

          I don't think that on-disk format changes are that big of a deal for the short-circuit pathway. We tell old clients they can't use short-circuit reads on those files, and fix new clients to understand the new format.

          Agreed, just need to make sure the "deny" pathway works and ideally some kind of version number exposed.

          TCP optimizations are pretty cool, but not when you run on RHEL6, as many folks do. Maybe we should open a separate JIRA to investigate how things like TCP fast open, changed TCP kernel options, etc. might be used with Hadoop in the future. There are also certain performance improvements we could do in the read and write paths on the DataNode, but again, that's out of scope for this JIRA, I think.

          Agreed, but my question is more this: let's assume that unix sockets for the data path are 3x as fast as local TCP sockets. If that's the case, then do we still get a big benefit from short-circuit? I think the answer is probably yes for random read, but no for sequential. The point about trying the "tcp friends" in future versions is just one potential way of evaluating this without having to write all the code for a unix socket data path. If "tcp friends" is comparable to short circuit, then unix sockets would probably also be comparable.

          Colin Patrick McCabe added a comment -

          With regard to security: after further reflection, I think we will need to ask system administrators to establish secure directories to hold the UNIX domain sockets. In practice, this means a directory owned by hdfs, where neither it nor any of its parent directories are vulnerable to attack.
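
          One hedged way such a check might look, validating only the socket directory itself (a real check would also walk every parent; the expected owner is an assumption supplied by configuration):

          #include <sys/stat.h>
          #include <sys/types.h>

          /* Accept the socket directory only if it is a directory owned by root or
           * the expected user and is not writable by group or others. */
          static int socket_dir_is_secure(const char *dir, uid_t expected_owner)
          {
            struct stat st;

            if (stat(dir, &st) < 0)
              return 0;
            if (!S_ISDIR(st.st_mode))
              return 0;
            if (st.st_uid != expected_owner && st.st_uid != 0)
              return 0;
            return (st.st_mode & (S_IWGRP | S_IWOTH)) == 0;
          }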

          ... my question is more this: let's assume that unix sockets for the data path are 3x as fast as local TCP sockets. If that's the case, then do we still get a big benefit from short-circuit?

          Oh, I misinterpreted. You were talking about using UNIX domain instead of TCP for data traffic. Yeah, it could be interesting. It's a time-honored way to get better performance on UNIXes. I'll do some tests if I can get the UNIX domain sockets to implement the "standard" interface (and if the resulting combination actually works.) I think there is a good chance that it will...

          Colin Patrick McCabe added a comment -

          This latest patch is a new approach, based on the ideas we discussed here. Basically, the idea is to add a set of generic Socket classes that implement UNIX domain sockets.

          This patch contains no code from Android. The advantages over the previous approaches are:

          • The previous approaches didn't create classes that actually inherit from ServerSocket and Socket. This patch does, which means that DomainSockets can be used in all of the places where InetSockets were previously used (see the HDFS-347 patch for the application); a minimal usage sketch follows this list.
          • This patch has unit tests.
          • We skip implementing a few things that we don't really need, like credential-passing.
          • These classes are fully thread-safe.
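
          As a sketch of why the inheritance matters, code written against java.net.Socket does not need to change when a UNIX-domain-backed subclass is dropped in. The DomainSocket constructor shown in the comments below is hypothetical and only illustrates the approach; see the actual patch for the real API:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class SocketAgnosticWriter {
  // Written purely against the java.net.Socket API, so any Socket subclass
  // (TCP-backed or UNIX-domain-backed) can be passed in unchanged.
  static void sendRequest(Socket sock, byte[] request) throws IOException {
    OutputStream out = sock.getOutputStream();
    out.write(request);
    out.flush();
  }

  public static void main(String[] args) throws IOException {
    // Today: an ordinary TCP socket to a DataNode's data transfer port.
    Socket tcp = new Socket("127.0.0.1", 50010);
    sendRequest(tcp, "hello".getBytes("UTF-8"));
    tcp.close();

    // With this patch (hypothetical constructor, shown for illustration only):
    //   Socket unix = new DomainSocket("/var/run/hdfs/dn_socket");
    //   sendRequest(unix, "hello".getBytes("UTF-8"));
  }
}
```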
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552207/HADOOP-6311.023.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.net.unix.TestDomainSocket
          org.apache.hadoop.hdfs.server.namenode.TestBackupNode

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1710//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1710//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1710//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • use shorter paths for the test socket paths
            (yes, there is a built-in limitation: the sun_path field of struct sockaddr_un is only about 108 bytes on Linux, and smaller on some other platforms; a minimal length check is sketched below)

          I think the BackupNode test failure resulted from "port in use" type noise.
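
          For reference, a minimal sketch of the kind of length check implied above; the 108-byte figure is the usual Linux sun_path size, and the constant and method names here are illustrative rather than the patch's actual API:

```java
import java.nio.charset.StandardCharsets;

public class SocketPathLength {
  // sizeof(sun_path) in struct sockaddr_un is 108 on Linux and smaller on
  // some other platforms; one byte must be reserved for the NUL terminator.
  private static final int ASSUMED_SUN_PATH_LEN = 108;

  static void checkPathLength(String path) {
    int bytes = path.getBytes(StandardCharsets.UTF_8).length;
    if (bytes + 1 > ASSUMED_SUN_PATH_LEN) {
      throw new IllegalArgumentException("socket path " + path + " is " + bytes
          + " bytes, which does not fit in sun_path ("
          + ASSUMED_SUN_PATH_LEN + " bytes including the NUL terminator)");
    }
  }

  public static void main(String[] args) {
    checkPathLength("/tmp/hadoop-test/sock");  // fits comfortably
    // Deeply nested CI/Maven test directories can easily exceed the limit,
    // which is why the test socket paths were shortened.
  }
}
```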

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552310/HADOOP-6311.024.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1712//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1712//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1712//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -
          • fix JavaDoc warnings
          • fix close() bug
          • suppress findbugs warning
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552330/HADOOP-6311.027.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1713//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1713//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          fix another JavaDoc warning.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552900/HADOOP-6311.028.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1727//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1727//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          This support has been added as part of the HDFS-347 branch, which was merged recently. Thanks, everyone, for your contributions.


            People

            • Assignee: Colin Patrick McCabe
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 21
