Issue Details (XML | Word | Printable)

Key: HADOOP-4453
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Kan Zhang
Reporter: Chris Douglas
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Improve ssl handling for distcp

Created: 18/Oct/08 12:33 AM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: 0.20.0
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File 4453-0.patch 2008-10-18 12:35 AM Chris Douglas 18 kB
Text File Licensed for inclusion in ASF works 4453-2.patch 2008-10-18 03:33 AM Kan Zhang 18 kB
Text File Licensed for inclusion in ASF works 4453-3.patch 2008-11-07 07:22 AM Kan Zhang 18 kB
Text File Licensed for inclusion in ASF works 4453-4.patch 2008-11-07 10:23 PM Kan Zhang 18 kB
Text File Licensed for inclusion in ASF works 4453-5.patch 2008-11-07 11:36 PM Chris Douglas 18 kB
Issue Links:
Reference

Hadoop Flags: Reviewed
Resolution Date: 07/Nov/08 11:37 PM


 Description  « Hide
HsftpFileSystem is an ad hoc way to read from HDFS over ssl, targeting distcp. Its organization can be improved and its support of ssl features expanded.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Chris Douglas added a comment - 18/Oct/08 12:35 AM
Attaching ssl-related section of 4284 on Kan's behalf.

Kan Zhang added a comment - 18/Oct/08 03:41 AM
Chris, thanks for helping me splitting the patch!

Hadoop QA added a comment - 20/Oct/08 04:29 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12392402/4453-2.patch
against trunk revision 705942.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3496/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3496/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3496/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3496/console

This message is automatically generated.


Tsz Wo (Nicholas), SZE added a comment - 22/Oct/08 08:39 PM
setupSsl(...) should not be done in DistCp since DistCp should only uses file system API. How about move the codes to HsftpFileSystem, so that these codes will be executed when a HsftpFileSystem is initialized?

Kan Zhang added a comment - 22/Oct/08 09:14 PM
SSL properties are system properties that affect all SSL connections that the client make, not just those used in HsftpFileSystem.

Chris Douglas added a comment - 22/Oct/08 10:51 PM

SSL properties are system properties that affect all SSL connections that the client make, not just those used in HsftpFileSystem.

This is a good point, but it also highlights the extent of this change. Since the purpose of this JIRA is to improve the existing ssl support for HsftpFileSystem, its side-effects shouldn't extend to all map/reduce tasks. Ideally, it should also maintain distcp's FileSystem agnosticism.

The server code looks good. The client code in Child and DistCp can probably be moved into HsftpFileSystem::initialize, which should also warn if it's called more than once in the same JVM (assuming a second HsftpFileSystem handle can invalidate the first; FileSystem cache hits should be benign). Unfortunately, this creates a dilemma for distcp: its client and task resources may be at different locations, but HsftpFileSystem will use the same property. This is going to be true for many users of HsftpFileSystem.

Lacking a general need for similar, asymmetric configuration, I'd propose adding a static config method to HsftpFileSystem:

public static void setSslConfigLocation(String loc, Configuration conf) {
  conf.set("dfs.https.client.keystore.resource", loc);
}

And calling this before submitting the job from DistCp, if -mapredSslLoc <loc> is specified. If there is a general need, we can look into extending GenericOptionsParser to support something more clever.


Kan Zhang added a comment - 07/Nov/08 06:54 AM
Though I'm not fully convinced, I attached a new patch that follows Chris' suggestion.

Hadoop QA added a comment - 07/Nov/08 08:15 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393492/4453-3.patch
against trunk revision 712102.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3550/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3550/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3550/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3550/console

This message is automatically generated.


Chris Douglas added a comment - 07/Nov/08 08:52 PM
Hsftp::initialize should call super.initialize(name, conf); instead of repeating code from Hftp::initialize, but otherwise +1

Has this been tested with with FsShell?


Kan Zhang added a comment - 07/Nov/08 10:24 PM
> Hsftp::initialize should call super.initialize(name, conf);
corrected in the new patch.

> Has this been tested with with FsShell?
Yes.


Chris Douglas added a comment - 07/Nov/08 11:36 PM
HADOOP-4575 conflicted with this patch; this is the diff committed to trunk.

Chris Douglas added a comment - 07/Nov/08 11:37 PM
I just committed this. Thanks, Kan

Hudson added a comment - 08/Nov/08 04:47 PM
Integrated in Hadoop-trunk #655 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/655/)
. Improve ssl configuration and handling in HsftpFileSystem,
particularly when used with DistCp. Contributed by Kan Zhang.