|
Also note that, this patch changes the way default directories "/logs" and "/static" are added to the Jetty Server. They are now added through the addWebApplication interface instead of the addContext interface. The reason is that we want to get back a WebApplicationHandler object so that we can add filters to it. As a side effect, the observation from
The component for this issue should not be "dfs". It should be something like "http" but it does not exist. Setting it to unknown.
> The component for this issue should not be "dfs".
The feature requested (intercept HSFTP requests) is a dfs-related feature. The implementation is more generic, but I wonder if that's best. The problem with generic URL filters like this is that they can be difficult to maintain. HSFTP URLs are probably stable, since HSFTP is designed to support cross-version access. But other SSL-based services may consider their urls internal, and if folks used this mechanism to filter them then their filters will break when the urls change. I'd prefer that this patch only added extensible filters for HSFTP rather than for all https urls. Doug, I'm not quite sure what you meant by "extensible" filters for HSFTP. If you meant the new addGlobalFilter interface should only add user filters to HSFTP urls, but not to all urls, then it won't serve our purpose, since our security policy requires that we block all other https url requests, and only allow certain HSFTP requests. That means all urls has to be filtered (and blocked if needed), not just HSFTP urls. A client can potentially access via https all files and servlets it can access via http.
> our security policy requires that we block all other https url requests [ ... ]
I see. So maybe another way to address my concern is to add a public method that returns a regex that matches HSFTP urls. That would future-proof filters from changes in HSFTP urls, and provide an example for other services. To properly support such filters we'd need such a method for each service. So, for extra-credit, you could add these for, e.g., the shuffle, the datanode, the namenode, etc. In most cases we should probably define a constant and use it both when adding the servlets and in this new method. Does that make sense? Is this overkill? It would permit us to change our internal urls without breaking filters. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12391035/4284_20080926_79.patch against trunk revision 699444. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3382/testReport/ This message is automatically generated. Doug, I think you do have a valid point and we should try to address it whenever possible. However, in this case, adding a simple regex that matches HSFTP urls may not help. Currently, there are 3 servlets serving the HSFTP interface, namely /listPaths, /data, and /streamFile. The filter needs to know which particular servlet the request is for, not just the fact that the request is a HSFTP request, since the way to obtain request file path (for authorization checking) is different for different servlets. To make it work, we have to standardize on the way file paths are sent in the https requests and any other data we may want to filter on. I feel this is a bigger task than this jira.
> Currently, there are 3 servlets serving the HSFTP interface, namely /listPaths, /data, and /streamFile.
Perhaps we need three regexs, with the convention that the first captured group is the URL? > I feel this is a bigger task than this jira. This issue in effect exposes a new public interface to Hadoop that must be maintained. Its simpler for us to maintain a back-compatible Java API than a URL API, particularly because we already have mechanisms for this. > Perhaps we need three regexs, with the convention that the first captured group is the URL?
I'm not sure I get your point. The /streamFile servlet retrieves its file path from the query string, not from the URL as is the case with the other 2 servlets. Here is what I can do. Add 2 regexes. The first enables the filter to find out if the request is an HSFTP request. The second enables the filter to find out if file path is contained in the query string. If not, then file path must be part of the URL and the filter should get it by calling getPathInfo(). The benefits are 1) if we ever change the names of the servlets, filters are not affected; only need to update the regexes. 2) if we add new servlets or remove old ones, as long as we don't add new ways of getting the file path, filters don't need to be changed; only need to update the regexes. Doug, what do you think?
Kan, this sounds good. Thanks for bearing with me here.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12391176/4284_20080929_83.patch against trunk revision 700322. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3401/testReport/ This message is automatically generated. This looks good. A few suggestions:
Has this been tested at scale? Chris, thanks for your detailed comments.
> dfs.https.permission.file.recheck.interval probably belongs in ssl-server.xml instead of hadoop-default.xml > Would it be possible to make this available to FsShell as well as DistCp using the ssl.client.* config? > The parsing of the X509 distinguished name using String::split > Has this been tested at scale? Your other comments are incorporated in a new patch 4284_20081007_85.patch. Please take a look.
Attached a new patch with added javadoc for DummyHostnameVerifier.
> DistCp.checkSrcPath(...) seems not a good place to set the conf and system properties values. > There are repeating System.setProperty(...) calls in the patch. Could we define a method for all these common codes? >> DistCp.checkSrcPath(...) seems not a good place to set the conf and system properties values.
> Any suggestion? I think it belongs to DistCp.setup(...). Since the new codes are not short, you might want to define a method called setupSsl(...) and involve it in setup(...). >Where to put such a method so that it can be conveniently shared? BTW, what are the files ssl-client.xml.example and ssl-server.xml.example for? They seem templates but not examples.
x.template files are automatically copied to x if x doesn't exist (build.xml:240), which is unnecessary, even undesirable here. This patch fixes the ssl setup so it no longer loads if the config is merely present, but .example is probably still preferred.
The possibilities here are pretty meager... the most straightforward refactoring might take a prefix for the config properties, but that doesn't exactly make the code more readable. A public class for a utility method shared between client, datanode, and namesystem seems like a lot for so little payoff, but I don't have strong objections to this- or another, better- approach.
Users will miss this functionality... since using System.setProperty mandates that there can be only one active instance of HsftpFileSystem, would it make sense to move the client initialization into setConf or initialize? Unfortunately, this complicates the client/mapred.child separation, since HsftpFileSystem would need to know its context. The current organization is OK for now, I think; the quirks of HsftpFileSystem are sufficient to keep it sequestered to distcp. Overall, this looks good to me. There's still no documentation, though. It can be a separate JIRA, but please mark it as a blocker for 0.20. Attached a new patch 4284_20081016_96.patch
> I think it belongs to DistCp.setup(...). > How about defining a new class, say SslUtil in org.apache.hadoop.security? > BTW, what are the files ssl-client.xml.example and ssl-server.xml.example for? They seem templates but not examples. +1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12392321/4284_20081016_96.patch against trunk revision 705430. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3480/testReport/ This message is automatically generated. > I did refactor it into a new method setupSsl() in the new patch.
setupSsl() should not be invoked in DistCp.checkSrcPath(...). Please find a better place. It will be hard to understand why checkSrcPath(...) somehow changes the system properties for setting up ssl. BTW, this issue is supposed to due with global filters but not ssl as stated in the title. Could you create another issue for ssl? Otherwise, parties interested in ssl may miss this.
+1 created +1 This looks good.
I just committed this. Thanks, Kan Integrated in Hadoop-trunk #640 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/640/
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Note that this patch does not fix the bug identified in
HADOOP-4282, which affects the browser facing filters ofHADOOP-3854, but not the global filters introduced in this patch.