Issue Details

Key: HADOOP-2239
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Chris Douglas
Reporter: Allen Wittenauer
Votes: 2
Watchers: 3
Hadoop Common

Security: Need to be able to encrypt Hadoop socket connections

Created: 20/Nov/07 11:07 PM   Updated: 08/Jul/09 04:42 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
  2239-0.patch  2008-02-16 02:47 AM  Chris Douglas  15 kB
  2239-1.patch  2008-03-01 01:04 AM  Chris Douglas  15 kB
  2239-2.patch  2008-03-14 03:25 AM  Chris Douglas  15 kB
  2239-3.patch  2008-03-19 01:43 AM  Chris Douglas  15 kB
  (All attachments are licensed for inclusion in ASF works.)
Issue Links:
Blocker
 

Release Note: This patch adds a new FileSystem, HsftpFileSystem, that allows access to HDFS data over HTTPS.
Resolution Date: 19/Mar/08 01:44 AM


Description
We need to be able to use Hadoop over hostile networks, both internally and externally to the enterprise. While authentication prevents unauthorized access, encryption should be used to prevent attacks such as packet snooping on the wire. This means that Hadoop client connections, distcp, etc. would use something such as SSL to protect the TCP/IP traffic. Post-Kerberos, it would be useful to have something similar to NFS's krb5p option.

Owen O'Malley added a comment - 13/Feb/08 10:18 PM
More precisely, what we need is the hftp file system to optionally go through ssl.

Doug Cutting added a comment - 14/Feb/08 06:07 PM
This can be triggered when an "hsftp:" uri is used.

The following would be required to implement this:

  1. Extend the namenode and datanode's http servers to respond to https connections, adding new configuration properties to name https ports, and configuring jetty accordingly (adding a Jetty SslListener in our StatusHttpServer, specifying certs, etc.).
  2. Change HftpFileSystem to use https connections when the "hsftp" scheme is used.
  3. Change FileDataServlet to redirect to https when https is used to access it.

Does that sound right?
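Steps 2 and 3 above amount to choosing the wire protocol from the filesystem URI and preserving it across the namenode's redirect. A minimal Java sketch, assuming the "hsftp" scheme proposed here; the class and method names are hypothetical, and the host and port are placeholders rather than the patch's defaults:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; not the HftpFileSystem/FileDataServlet code.
class HsftpSchemeSketch {
  /** Maps the filesystem scheme to the wire protocol: hsftp -> https, hftp -> http. */
  static String transportScheme(URI fsUri) {
    String scheme = fsUri.getScheme();
    if ("hsftp".equals(scheme)) {
      return "https";
    }
    if ("hftp".equals(scheme)) {
      return "http";
    }
    throw new IllegalArgumentException("Unsupported scheme: " + scheme);
  }

  /** Step 2: the URL the client would actually open for a path on that filesystem. */
  static URI toTransportUri(URI fsUri, String path) {
    return build(transportScheme(fsUri), fsUri.getHost(), fsUri.getPort(), path);
  }

  /** Step 3: a redirect keeps the scheme the request arrived on, so HTTPS is never downgraded. */
  static URI redirectTarget(boolean requestWasSecure, String host, int port, String path) {
    return build(requestWasSecure ? "https" : "http", host, port, path);
  }

  private static URI build(String scheme, String host, int port, String path) {
    try {
      return new URI(scheme, null, host, port, path, null, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    URI fs = URI.create("hsftp://nn.example.com:8443");
    // prints https://nn.example.com:8443/user/data/part-00000
    System.out.println(toTransportUri(fs, "/user/data/part-00000"));
  }
}
```

Keeping the scheme decision in one place means the client and the redirecting servlet cannot disagree about whether a connection should be encrypted.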


Allen Wittenauer added a comment - 14/Feb/08 06:33 PM
While we really need secure distcp Right Now(tm), it would be good if in the future we could encrypt dfs put/get. I'm not so concerned about connectivity between grid nodes (at least right now).

I suspect this JIRA needs to get broken down into multiple subtasks.


Raghu Angadi added a comment - 14/Feb/08 06:44 PM
should it be "hftps:" rather than "hsftp:"?

Doug Cutting added a comment - 14/Feb/08 06:56 PM
> it would be good if in the future we could encrypt dfs put/get [ ... ]

Hftp: uris can be used most places that hdfs: uris can be used, including put/get. The big limitations at present are that hftp does not support:

  1. random access – hftp mapreduce inputs cannot be split; and
  2. locality hints – hftp mapreduce inputs are not localized.

Currently, to use an hftp filesystem as a mapreduce input, one would have to set mapred.min.split.size to the largest long value (9223372036854775807, i.e. Long.MAX_VALUE), but, other than that, it should work.

So I think making the hftp protocol secure will permit the uses you have in mind, no?
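A caveat on the split-size value: an all-ones 64-bit pattern such as 0xffffffffffffffff (2^64 - 1) does not fit in Java's signed long, whose maximum is Long.MAX_VALUE (0x7fffffffffffffff). The plain-Java sketch below does not use Hadoop's Configuration class; the lenient "fall back to the default on a parse error" behaviour is an assumption about how a numeric config value might be read.

```java
// Illustration of signed-long limits; not Hadoop's Configuration code.
class SplitSizeParsingSketch {
  /** Assumed lenient parse: returns defaultValue when the string is not a valid decimal long. */
  static long parseOrDefault(String value, long defaultValue) {
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  /** True if the given hex digits fit in a signed 64-bit long. */
  static boolean fitsSignedLong(String hexDigits) {
    try {
      Long.parseLong(hexDigits, 16);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // "0x..." is not a decimal literal, so a lenient parse silently falls back: prints 1
    System.out.println(parseOrDefault("0xffffffffffffffff", 1L));
    // 2^64 - 1 overflows a signed long even when parsed as hex: prints false
    System.out.println(fitsSignedLong("ffffffffffffffff"));
    // The same bits read as unsigned and viewed as signed are -1: prints -1
    System.out.println(Long.parseUnsignedLong("ffffffffffffffff", 16));
    // The largest usable value: prints 9223372036854775807 == 0x7fffffffffffffff
    System.out.println(Long.MAX_VALUE + " == 0x" + Long.toHexString(Long.MAX_VALUE));
  }
}
```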


Doug Cutting added a comment - 14/Feb/08 07:01 PM
> should it be "hftps:" rather than "hsftp:"?

Whatever. Both SFTP and FTPS refer to secure file-transfer protocols. The former does not require certs, while the latter (like HTTPS) does. So maybe FTPS is a better analogy, but HSFTP reads better to me, as the Hadoop Secure File Transfer Protocol.


Chris Douglas added a comment - 16/Feb/08 02:47 AM
This roughly follows what Doug outlined above. I added a SunJsseListener to the namenode and datanode StatusHttpServer, initialized iff the keystore location is specified. The keystore properties (including passwords) are specified in a separate resource, whose name is given in the config. I added HsftpFileSystem to handle the client side and included a redirect from the NameNode servlet to an SSL-capable datanode port; that port is assumed to be static across the cluster (avoiding a protocol version bump).
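The scheme described in the previous comment (an HTTPS listener created only when a keystore location is configured, with keystore settings read from a separate resource on the classpath) can be sketched with standard JSSE classes. This is not the patch itself, which configured a Jetty listener; the resource name and property keys below are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Properties;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

class SslResourceSketch {
  // Hypothetical names; the patch's real resource and keys are not given in this thread.
  static final String RESOURCE = "ssl-server.properties";
  static final String KEYSTORE_LOCATION = "ssl.keystore.location";
  static final String KEYSTORE_PASSWORD = "ssl.keystore.password";
  static final String KEY_PASSWORD = "ssl.key.password";

  /** Reads the auxiliary resource from the classpath; returns empty properties if absent. */
  static Properties loadResource(String name) throws IOException {
    Properties props = new Properties();
    try (InputStream in = SslResourceSketch.class.getClassLoader().getResourceAsStream(name)) {
      if (in != null) {
        props.load(in);
      }
    }
    return props;
  }

  /** The HTTPS listener is created if and only if a keystore location is configured. */
  static boolean sslEnabled(Properties props) {
    String location = props.getProperty(KEYSTORE_LOCATION);
    return location != null && !location.trim().isEmpty();
  }

  /** Builds a server-side SSLContext from a JKS keystore using standard JSSE calls. */
  static SSLContext buildContext(Properties props) throws IOException, GeneralSecurityException {
    char[] storePass = props.getProperty(KEYSTORE_PASSWORD, "").toCharArray();
    // Assumption: reuse the store password when no separate key password is given.
    char[] keyPass = props.getProperty(KEY_PASSWORD, new String(storePass)).toCharArray();
    KeyStore keyStore = KeyStore.getInstance("JKS");
    try (InputStream in = new FileInputStream(props.getProperty(KEYSTORE_LOCATION))) {
      keyStore.load(in, storePass);
    }
    KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(keyStore, keyPass);
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(kmf.getKeyManagers(), null, null); // null trust managers/random: JSSE defaults
    return context;
  }

  public static void main(String[] args) throws Exception {
    Properties props = loadResource(RESOURCE);
    if (sslEnabled(props)) {
      SSLContext ctx = buildContext(props);
      System.out.println("HTTPS enabled with protocol " + ctx.getProtocol());
    } else {
      System.out.println("No keystore configured; HTTPS listener not started");
    }
  }
}
```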

Tsz Wo (Nicholas), SZE added a comment - 20/Feb/08 09:09 PM
2239-0.patch: The code is good; it even improves the original code. Some comments:
  • It may not be good to store passwords in the config file. We should at least have an option to let the user enter passwords at runtime (when starting the namenode and datanodes).
  • need descriptions for new config properties
  • need javadoc for protected members.
  • need unit tests
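The first point above, entering passwords at runtime instead of storing them in a config file, could look like this minimal sketch: prefer a configured value, otherwise prompt with java.io.Console, which does not echo input. The property key is hypothetical, and checking the config before prompting is an assumed policy, not one the thread settled on.

```java
import java.io.Console;
import java.util.Arrays;
import java.util.Properties;

class PasswordSourceSketch {
  /**
   * Returns the configured password if present, otherwise prompts on the console
   * (input is not echoed). Returns null if neither source is available.
   */
  static char[] resolvePassword(Properties props, String key, Console console) {
    String configured = props.getProperty(key);
    if (configured != null) {
      return configured.toCharArray();
    }
    if (console != null) {
      return console.readPassword("Enter %s: ", key);
    }
    return null;
  }

  public static void main(String[] args) {
    // Hypothetical key; System.console() is null when not attached to a terminal.
    char[] password = resolvePassword(new Properties(), "ssl.keystore.password", System.console());
    if (password == null) {
      System.out.println("No password available; refusing to start the HTTPS listener");
    } else {
      System.out.println("Read a password of " + password.length + " characters");
      Arrays.fill(password, '\0'); // clear the secret once it is no longer needed
    }
  }
}
```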

Chris Douglas added a comment - 01/Mar/08 01:04 AM
This patch adds some documentation, per Nicholas's recommendation. It does not include any test cases, as the requirements for configuring SSL are somewhat onerous and, in my limited experience, not amenable to automation in a test case. Lacking certs, it was tested with Firefox and appears correct. The passwords are stored in a config file, which is regrettable, but the resource storing them need only be on the classpath. Getting this information is out-of-band as it is, and an auxiliary config file seemed the most expedient and mostly-correct option available. For Right Now(tm), it should suffice.

Hadoop QA added a comment - 01/Mar/08 02:53 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376886/2239-1.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 616 javac compiler warnings (more than the trunk's current 614 warnings).

release audit -1. The applied patch generated 191 release audit warnings (more than the trunk's current 190 warnings).

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/artifact/trunk/build/test/checkstyle-errors.html
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/artifact/trunk/current/releaseAuditDiffWarnings.txt
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1881/console

This message is automatically generated.


Owen O'Malley added a comment - 09/Mar/08 08:24 AM
We need to complete the paperwork for putting encryption into Hadoop before this can be committed.

Chris Douglas added a comment - 14/Mar/08 03:25 AM
Replace deprecated SunJsseListener with SslListener

Hadoop QA added a comment - 15/Mar/08 08:01 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377862/2239-2.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit -1. The applied patch generated 194 release audit warnings (more than the trunk's current 193 warnings).

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/artifact/trunk/build/test/checkstyle-errors.html
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/artifact/trunk/current/releaseAuditDiffWarnings.txt
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1969/console

This message is automatically generated.


Tsz Wo (Nicholas), SZE added a comment - 19/Mar/08 01:15 AM
+1. The code looks good.
  • The hsftp port is fixed across a cluster, so this feature will not work when running more than one datanode on the same machine. Moreover, we have to make sure that more than one datanode can still run on the same machine when this feature is disabled.
  • We should probably support random hsftp ports later.
  • We need to add more documentation later.

Other minor comments:

  • Remove "import org.mortbay.http.JsseListener" in StatusHttpServer.
  • The code for initializing infoServer in FSNamesystem and DataNode is similar. Why not make a static method in StatusHttpServer?
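The concern about a cluster-wide fixed hsftp port, and why random (ephemeral) ports avoid it, can be illustrated with plain java.net sockets: a second listener on an already-bound port fails with BindException, while binding port 0 lets the OS pick a free port. This is a generic illustration on the loopback interface, not datanode code; exact bind behaviour can vary with platform and socket options.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

class PortBindingSketch {
  private static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();

  /** Tries to open a listener on the given port; false if the port is already taken. */
  static boolean canBind(int port) {
    try (ServerSocket ignored = new ServerSocket(port, 50, LOOPBACK)) {
      return true;
    } catch (BindException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Simulates two datanodes on one host configured with the same fixed port. */
  static boolean fixedPortConflicts() {
    try (ServerSocket first = new ServerSocket(0, 50, LOOPBACK)) {
      int fixedPort = first.getLocalPort(); // stands in for a cluster-wide configured port
      return !canBind(fixedPort);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Port 0 asks the OS for a free ephemeral port, so two listeners do not collide. */
  static boolean ephemeralPortsDiffer() {
    try (ServerSocket a = new ServerSocket(0, 50, LOOPBACK);
         ServerSocket b = new ServerSocket(0, 50, LOOPBACK)) {
      return a.getLocalPort() != b.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("fixed port conflicts: " + fixedPortConflicts());
    System.out.println("ephemeral ports differ: " + ephemeralPortsDiffer());
  }
}
```

With random ports, the namenode could no longer assume a static SSL port when redirecting, so each datanode would need to report its actual port; that is the protocol change the fixed-port approach avoided.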

Chris Douglas added a comment - 19/Mar/08 01:44 AM
I just committed this.

Hudson added a comment - 19/Mar/08 11:37 AM