Hadoop Common
  HADOOP-2239

Security: Need to be able to encrypt Hadoop socket connections

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      This patch adds a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.

      Description

      We need to be able to use Hadoop over hostile networks, both internally and externally to the enterprise. While authentication prevents unauthorized access, encryption should be used to prevent such things as packet snooping across the wire. This means that Hadoop client connections, distcp, etc., would use something such as SSL to protect the TCP/IP packets. Post-Kerberos, it would be useful to use something similar to NFS's krb5p option.
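      The description above asks for TLS protection of client sockets. As a purely illustrative sketch (this is not the mechanism the eventual patch uses — the patch went through HTTPS instead), a Java client socket can be layered over TLS with the standard JSSE API:

```java
import javax.net.ssl.SSLSocketFactory;

// Illustrative only: obtaining a TLS-capable client socket factory with
// plain JSSE. The default factory picks up truststore settings from system
// properties such as javax.net.ssl.trustStore.
public class SslSocketSketch {
  public static SSLSocketFactory clientFactory() {
    // A socket from this factory performs the TLS handshake on connect,
    // so bytes on the wire are encrypted against packet snooping.
    return (SSLSocketFactory) SSLSocketFactory.getDefault();
  }
}
```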

      1. 2239-0.patch
        15 kB
        Chris Douglas
      2. 2239-1.patch
        15 kB
        Chris Douglas
      3. 2239-2.patch
        15 kB
        Chris Douglas
      4. 2239-3.patch
        15 kB
        Chris Douglas

        Issue Links

          Activity

          Allen Wittenauer created issue -
          Robert Chansler made changes -
          Field Original Value New Value
          Component/s dfs [ 12310710 ]
          Owen O'Malley added a comment -

          More precisely, what we need is the hftp file system to optionally go through ssl.

          Doug Cutting added a comment -

          This can be triggered when an "hsftp:" uri is used.

          The following would be required to implement this:

          1. Extend the namenode and datanode's http servers to respond to https connections, adding new configuration properties to name https ports, and configuring jetty accordingly (adding a Jetty SslListener in our StatusHttpServer, specifying certs, etc.).
          2. Change HftpFileSystem to use https connections when the "hsftp" scheme is used.
          3. Change FileDataServlet to redirect to https when https is used to access it.

          Does that sound right?
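
          Step 2 of the plan above (HftpFileSystem switching to https when the secure scheme is used) amounts to simple scheme-based dispatch. A minimal sketch, with class, method, and port values that are illustrative assumptions rather than the actual patch's code:

```java
// Sketch only: choose http vs. https for the data-transfer URL based on
// the requested URI scheme ("hsftp" meaning the secure variant).
public class HftpUrlSketch {
  public static String downloadUrl(String scheme, String host,
                                   int httpPort, int httpsPort, String path) {
    boolean secure = "hsftp".equals(scheme);
    return (secure ? "https" : "http") + "://" + host + ":"
        + (secure ? httpsPort : httpPort) + path;
  }
}
```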

          Allen Wittenauer added a comment -

          While we really need secure distcp Right Now(tm), it would be good if in the future we could encrypt dfs put/get. I'm not so concerned about connectivity between grid nodes (at least right now).

          I suspect this JIRA needs to get broken down into multiple subtasks.

          Raghu Angadi added a comment -

          should it be "hftps:" rather than "hsftp:"?

          Doug Cutting added a comment -

          > it would be good if in the future we could encrypt dfs put/get [ ... ]

          Hftp: uris can be used most places that hdfs: uris can be used, including put/get. The big limitations at present are that hftp does not support:

          1. random access – hftp mapreduce inputs cannot be split; and
          2. locality hints – hftp mapreduce inputs are not localized.

          Currently, to use an hftp filesystem as a mapreduce input, one would have to set mapred.min.split.size=0xffffffffffffffff, but, other than that, it should work.

          So I think making the hftp protocol secure will permit the uses you have in mind, no?
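
          The mapred.min.split.size workaround above relies on the split-size arithmetic of the era's FileInputFormat, max(minSize, min(goalSize, blockSize)): a huge minimum dominates the result, so no file is ever divided into more than one split. A small sketch of that rule (the method here is a standalone illustration, not the framework's actual class):

```java
// Sketch of the max(minSize, min(goalSize, blockSize)) split-size rule,
// showing why an enormous minimum split size makes inputs unsplittable.
public class SplitSizeSketch {
  public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }
}
```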

          Doug Cutting added a comment -

          > should it be "hftps:" rather than "hsftp:"?

          Whatever. Both SFTP and FTPS refer to secure file-transfer protocols. The former does not require certs, while the latter (like HTTPS) does. So maybe FTPS is a better analogy, but HSFTP reads better to me, as the Hadoop Secure File Transfer Protocol.

          Chris Douglas added a comment -

          This roughly follows what Doug outlined above. I added a SunJsseListener to the namenode and datanode StatusHttpServer, initialized iff the keystore location is specified. The keystore properties, including passwords, are specified in another resource named in the config. I added HsftpFileSystem to handle the client side and included a redirect to an SSL-capable datanode port from the NameNode servlet, assumed to be static (avoiding the protocol version bump).
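
          The "initialized iff the keystore location is specified" gating could look like the following sketch. The property key here is a hypothetical placeholder, not the patch's actual configuration name:

```java
import java.util.Map;

// Sketch: enable the SSL listener only when a keystore location appears in
// the configuration ("https.keystore.location" is an illustrative key).
public class SslGateSketch {
  public static boolean httpsEnabled(Map<String, String> conf) {
    String keystore = conf.get("https.keystore.location"); // hypothetical key
    return keystore != null && !keystore.isEmpty();
  }
}
```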

          Chris Douglas made changes -
          Attachment 2239-0.patch [ 12375742 ]
          Tsz Wo Nicholas Sze added a comment -

          2239-0.patch: The code is good; it even improves the original code. Below are some comments:

          • It may not be good to store passwords in the config file. We should at least have an option to let the user enter passwords at runtime (when starting the namenode and datanodes).
          • Need descriptions for the new config properties.
          • Need javadoc for protected members.
          • Need unit tests.
          Chris Douglas added a comment -

          This patch adds some documentation, per Nicholas's recommendation. It does not include any test cases, as the requirements for configuring SSL are somewhat onerous and, in my limited experience, not amenable to automation in a test case. Lacking certs, it was tested with Firefox and appears correct. The passwords are stored in a config file, which is regrettable, but the resource storing them need only be on the classpath. Getting this information is out-of-band as it is, and an auxiliary config file seemed the most expedient and mostly-correct option available. For Right Now(tm), it should suffice.

          Chris Douglas made changes -
          Attachment 2239-1.patch [ 12376886 ]
          Chris Douglas made changes -
          Fix Version/s 0.17.0 [ 12312913 ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12376886/2239-1.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac -1. The applied patch generated 616 javac compiler warnings (more than the trunk's current 614 warnings).

          release audit -1. The applied patch generated 191 release audit warnings (more than the trunk's current 190 warnings).

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/artifact/trunk/build/test/checkstyle-errors.html
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/console

          This message is automatically generated.

          Owen O'Malley made changes -
          Assignee Chris Douglas [ chris.douglas ]
          Owen O'Malley made changes -
          Link This issue is blocked by HADOOP-2981 [ HADOOP-2981 ]
          Owen O'Malley added a comment -

          We need to do the paperwork part of putting encryption into Hadoop before this can be committed.

          Chris Douglas added a comment -

          Replace deprecated SunJsseListener with SslListener

          Chris Douglas made changes -
          Attachment 2239-2.patch [ 12377862 ]
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Chris Douglas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12377862/2239-2.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit -1. The applied patch generated 194 release audit warnings (more than the trunk's current 193 warnings).

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/artifact/trunk/build/test/checkstyle-errors.html
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          +1 Code looks good.

          • The hsftp port is fixed across a cluster, so this feature will not work when running more than one datanode on the same machine. Moreover, we have to make sure that more than one datanode can run on the same machine when this feature is disabled.
          • We probably should support random hsftp ports later.
          • We need to add more documentation later.

          Other minor comments:

          • Remove "import org.mortbay.http.JsseListener" in StatusHttpServer.
          • The code for initializing infoServer in FSNamesystem and DataNode is similar. Why not make a static method in StatusHttpServer?
          Chris Douglas made changes -
          Issue Type Bug [ 1 ] Improvement [ 4 ]
          Chris Douglas made changes -
          Attachment 2239-3.patch [ 12378194 ]
          Chris Douglas added a comment -

          I just committed this

          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #433 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/433/ )
          Chris Douglas made changes -
          Release Note This patch adds a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]

            People

            • Assignee: Chris Douglas
            • Reporter: Allen Wittenauer
            • Votes: 2
            • Watchers: 3

              Dates

              • Created:
                Updated:
                Resolved:
