Hadoop Common
HADOOP-4284

Support for user configurable global filters on HttpServer

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.0
    • Component/s: security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Introduced HttpServer method to support global filters.

      Description

      HADOOP-3854 introduced a framework for adding filters to browser-facing URLs. Sometimes there is a need to filter all URLs. For example, at Yahoo, we need to open an SSL port on the HttpServer and only accept hsftp requests from clients who can authenticate themselves using a client certificate and are authorized according to a certain policy file. For this to happen, we need a method to add a user-configurable "global" filter, which filters all client requests. For our purposes, such a global filter will block all https requests except those accessing the hsftp interface (it will let all http requests go through, so accesses through the normal http ports are unaffected). Moreover, those hsftp requests will be subject to further authorization checking according to the policy file.
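The allow/deny decision described above can be sketched as a small predicate. This is a minimal illustration, not code from the patch: the class and method names are made up, and the hsftp servlet names (/listPaths, /data, /streamFile) are taken from the discussion later in this issue.

```java
// Sketch of the decision a global filter might make: let all http
// requests through, and over https allow only the hsftp servlets.
public class GlobalFilterSketch {
    // HSFTP-facing servlets; anything else over https is rejected.
    private static final String[] HSFTP_PREFIXES = {"/listPaths", "/data", "/streamFile"};

    public static boolean isAllowed(String scheme, String path) {
        if (!"https".equals(scheme)) {
            return true;  // plain http ports are unaffected by the global filter
        }
        for (String prefix : HSFTP_PREFIXES) {
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return true;  // hsftp request: still subject to policy checks
            }
        }
        return false;  // all other https requests are blocked
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http", "/logs/a.log"));     // true
        System.out.println(isAllowed("https", "/listPaths/foo")); // true
        System.out.println(isAllowed("https", "/logs/a.log"));    // false
    }
}
```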

      1. 4284-7.patch
        13 kB
        Chris Douglas
      2. 4284_20081016_96.patch
        31 kB
        Kan Zhang
      3. 4284_20081016_94.patch
        31 kB
        Kan Zhang
      4. 4284_20081016_93.patch
        31 kB
        Kan Zhang
      5. 4284_20081007_85.patch
        42 kB
        Kan Zhang
      6. 4284_20080929_83.patch
        42 kB
        Kan Zhang
      7. 4284_20080926_79.patch
        40 kB
        Kan Zhang
      8. 4284_20080925_78.patch
        40 kB
        Kan Zhang

          Activity

          Kan Zhang added a comment -

          This patch adds an interface method on HttpServer for adding a global filter. It also includes a global filter example, which filters hsftp requests based on a configurable policy file. The policy file defines the mappings from user names to lists of comma-separated directories/files that the corresponding users are authorized to access. The policy file is periodically checked for updates at a user-configurable interval.
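The user-to-paths mapping described above could be parsed as below. This is a hypothetical sketch: the `user=path1,path2` line format is an assumption for illustration (later comments indicate the patch's actual file is ssl-permission.xml), and all names are invented.

```java
import java.util.*;

// Hypothetical parser for a policy mapping each user name to a
// comma-separated list of directories/files the user may access.
public class PolicyFileSketch {
    public static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> policy = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) continue;                        // skip malformed lines
            String user = line.substring(0, eq).trim();
            Set<String> paths = new HashSet<>();
            for (String p : line.substring(eq + 1).split(",")) {
                if (!p.trim().isEmpty()) paths.add(p.trim());
            }
            policy.put(user, paths);
        }
        return policy;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> p =
            parse(Arrays.asList("alice=/data/a,/data/b", "bob=/tmp"));
        System.out.println(p.get("alice").contains("/data/a")); // true
    }
}
```

A production version would also reload the file on the recheck interval mentioned above.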

          Note that this patch does not fix the bug identified in HADOOP-4282, which affects the browser facing filters of HADOOP-3854, but not the global filters introduced in this patch.

          Kan Zhang added a comment -

          Changed the junit test to try to access files in the /static and /logs directories.

          Kan Zhang added a comment -

          Also note that this patch changes the way the default directories "/logs" and "/static" are added to the Jetty Server. They are now added through the addWebApplication interface instead of the addContext interface, because we want to get back a WebApplicationHandler object so that we can add filters to it. As a side effect, the observation from HADOOP-4282 that you have to test accessing an existent file to catch the bug is no longer valid. With this patch, accessing an existent or a non-existent file in the /logs or /static directory makes no difference: both accesses will fail the test (TestServletFilter.java) and catch the bug (HADOOP-4282). Therefore, in the junit test for this patch (TestGlobalFilter.java) we only try to access non-existent files. For example, accessing the non-existent file "/logs/a.log" passes the test in TestGlobalFilter; doing the same in TestServletFilter fails the test and hence catches the bug.

          Tsz Wo Nicholas Sze added a comment -

          The component for this issue should not be "dfs". It should be something like "http" but it does not exist. Setting it to unknown.

          Doug Cutting added a comment -

          > The component for this issue should not be "dfs".

          The feature requested (intercept HSFTP requests) is a dfs-related feature. The implementation is more generic, but I wonder if that's best.

          The problem with generic URL filters like this is that they can be difficult to maintain. HSFTP URLs are probably stable, since HSFTP is designed to support cross-version access. But other SSL-based services may consider their urls internal, and if folks used this mechanism to filter them then their filters will break when the urls change. I'd prefer that this patch only added extensible filters for HSFTP rather than for all https urls.

          Kan Zhang added a comment -

          Doug, I'm not quite sure what you meant by "extensible" filters for HSFTP. If you meant the new addGlobalFilter interface should only add user filters to HSFTP urls, but not to all urls, then it won't serve our purpose, since our security policy requires that we block all other https url requests, and only allow certain HSFTP requests. That means all urls have to be filtered (and blocked if needed), not just HSFTP urls. A client can potentially access via https all files and servlets it can access via http.

          Doug Cutting added a comment -

          > our security policy requires that we block all other https url requests [ ... ]

          I see.

          So maybe another way to address my concern is to add a public method that returns a regex that matches HSFTP urls. That would future-proof filters from changes in HSFTP urls, and provide an example for other services. To properly support such filters we'd need such a method for each service. So, for extra-credit, you could add these for, e.g., the shuffle, the datanode, the namenode, etc. In most cases we should probably define a constant and use it both when adding the servlets and in this new method. Does that make sense? Is this overkill? It would permit us to change our internal urls without breaking filters.
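The constant-plus-public-method convention Doug describes can be sketched as follows. The class, constant, and method names here are illustrative assumptions, not code from the patch: the point is that the servlet mount point is defined once and exposed to filters, so internal URL changes can't silently break them.

```java
import java.util.regex.Pattern;

// Sketch: define each servlet's mount point as a constant, use it both
// when registering the servlet and in a public method filters can call.
public class HsftpUrls {
    public static final String LIST_PATHS = "/listPaths";

    // Filters match against this instead of hard-coding the URL.
    public static Pattern listPathsPattern() {
        return Pattern.compile("^" + Pattern.quote(LIST_PATHS) + "(/.*)?$");
    }

    public static void main(String[] args) {
        System.out.println(listPathsPattern().matcher("/listPaths/user/x").matches()); // true
        System.out.println(listPathsPattern().matcher("/data/user/x").matches());      // false
    }
}
```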

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12391035/4284_20080926_79.patch
          against trunk revision 699444.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warning.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3382/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3382/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3382/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3382/console

          This message is automatically generated.

          Kan Zhang added a comment -

          Doug, I think you do have a valid point and we should try to address it whenever possible. However, in this case, adding a simple regex that matches HSFTP urls may not help. Currently, there are 3 servlets serving the HSFTP interface, namely /listPaths, /data, and /streamFile. The filter needs to know which particular servlet the request is for, not just the fact that the request is an HSFTP request, since the way to obtain the request file path (for authorization checking) differs between servlets. To make it work, we would have to standardize the way file paths are sent in the https requests, along with any other data we may want to filter on. I feel this is a bigger task than this jira.

          Doug Cutting added a comment -

          > Currently, there are 3 servlets serving the HSFTP interface, namely /listPaths, /data, and /streamFile.

          Perhaps we need three regexes, with the convention that the first captured group is the URL?

          > I feel this is a bigger task than this jira.

          This issue in effect exposes a new public interface to Hadoop that must be maintained. It's simpler for us to maintain a back-compatible Java API than a URL API, particularly because we already have mechanisms for this.

          Kan Zhang added a comment -

          > Perhaps we need three regexs, with the convention that the first captured group is the URL?

          I'm not sure I get your point. The /streamFile servlet retrieves its file path from the query string, not from the URL as is the case with the other 2 servlets.

          Kan Zhang added a comment -

          Here is what I can do: add 2 regexes. The first enables the filter to find out whether the request is an HSFTP request. The second enables the filter to find out whether the file path is contained in the query string. If not, then the file path must be part of the URL and the filter should get it by calling getPathInfo(). The benefits are: 1) if we ever change the names of the servlets, filters are not affected; we only need to update the regexes; 2) if we add new servlets or remove old ones, as long as we don't add new ways of getting the file path, filters don't need to be changed; we only need to update the regexes. Doug, what do you think?
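The two-regex scheme above can be sketched as follows. The patterns and names here are illustrative assumptions, not the ones in the patch: one pattern tells a filter whether a request is an HSFTP request at all, the other whether its file path arrives in the query string (as with /streamFile) rather than in the URL path (as with /listPaths and /data, via getPathInfo()).

```java
import java.util.regex.Pattern;

// Sketch of the two regexes a filter would consult, so servlet renames
// only require updating the patterns, not the filter code.
public class HsftpRegexSketch {
    static final Pattern HSFTP = Pattern.compile("^/(listPaths|data|streamFile)(/.*)?$");
    static final Pattern PATH_IN_QUERY = Pattern.compile("^/streamFile(/.*)?$");

    public static boolean isHsftp(String path) { return HSFTP.matcher(path).matches(); }
    public static boolean pathInQuery(String path) { return PATH_IN_QUERY.matcher(path).matches(); }

    public static void main(String[] args) {
        System.out.println(isHsftp("/data/user/f"));     // true: an hsftp request
        System.out.println(pathInQuery("/streamFile"));  // true: path is in query string
        System.out.println(pathInQuery("/data/user/f")); // false: path via getPathInfo()
    }
}
```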

          Doug Cutting added a comment -

          Kan, this sounds good. Thanks for bearing with me here.

          Kan Zhang added a comment -

          Attaching the latest patch, 4284_20080929_83.patch. Added the 2 regexes discussed above and fixed the findbugs warning. Also, made the ssl permission file parsing more user friendly. Doug, thanks for your suggestions.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12391176/4284_20080929_83.patch
          against trunk revision 700322.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3401/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3401/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3401/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3401/console

          This message is automatically generated.

          Chris Douglas added a comment -

          This looks good. A few suggestions:

          • This could use some forrest documentation (maybe in the distcp guide?) and javadoc for public classes/methods
          • dfs.https.permission.file.recheck.interval probably belongs in ssl-server.xml instead of hadoop-default.xml
          • Iterator/Enumeration loops in HttpsFilter are more readable as foreach loops
          • Would it be possible to make this available to FsShell as well as DistCp using the ssl.client.* config?
          • The comment on HFTP_PATTERN should be javadoc
          • When run with assertions disabled, there's a possible NPE/AIOOBE in HttpsFilter::doFilter; the assertion should be replaced with a runtime check. Also, instead of checking for filterConfig == null, checking isRunning might be more readable/reliable (as volatile)
          • The parsing of the X509 distinguished name using String::split may hit some corner cases in RFC 2253 (the quoting and whitespace, in particular, could be troublesome). Using CANONICAL as the string representation may handle more of the corner cases, which is probably sufficient.
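The RFC 2253 corner case in the last bullet can be shown with the JDK's own DN parsers. This is an illustrative sketch, not code from the patch; the distinguished name is made up. A naive String.split(",") miscounts when a value contains a quoted comma, while javax.naming.ldap.LdapName parses per RFC 2253, and X500Principal's CANONICAL form normalizes case and whitespace for comparison.

```java
import javax.naming.ldap.LdapName;
import javax.security.auth.x500.X500Principal;

// Demonstrates why splitting a distinguished name on "," is fragile.
public class DnParseSketch {
    public static void main(String[] args) throws Exception {
        String dn = "CN=\"Doe, John\",O=Example";

        // Naive split sees three components: the quoted comma fools it.
        System.out.println(dn.split(",").length);              // 3

        // An RFC 2253-aware parse sees the two real RDNs.
        System.out.println(new LdapName(dn).getRdns().size()); // 2

        // CANONICAL normalizes case/whitespace, easing string comparison.
        System.out.println(new X500Principal(dn).getName(X500Principal.CANONICAL));
    }
}
```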

          Has this been tested at scale?

          Kan Zhang added a comment -

          Chris, thanks for your detailed comments.

          > dfs.https.permission.file.recheck.interval probably belongs in ssl-server.xml instead of hadoop-default.xml
          I think it's better in hadoop-default.xml since it's a property of the filter, which is independent of SSL listeners.

          > Would it be possible to make this available to FsShell as well as DistCp using the ssl.client.* config?
          Yes, other clients can also make use of ssl.client.* configs. But the scope of this JIRA is limited to DistCp.

          > The parsing of the X509 distinguished name using String::split
          Those corner cases wouldn't arise for this application since leading and trailing whitespace in names can't be accommodated anyway (the name field in ssl-permission.xml strips it off).

          > Has this been tested at scale?
          No.

          Your other comments are incorporated in a new patch 4284_20081007_85.patch. Please take a look.

          Kan Zhang added a comment -

          Adding a new patch. The sample filter in the previous patch is excluded. We are thinking of using a different filter and will not be contributing it this time.

          Kan Zhang added a comment -

          re-submit to trigger automatic testing

          Tsz Wo Nicholas Sze added a comment -
          • DistCp.checkSrcPath(...) seems not a good place to set the conf and system properties values.
          • There are repeating System.setProperty(...) calls in the patch. Could we define a method for all these common codes?
          • New protected class DummyHostnameVerifier needs javadoc.
          Kan Zhang added a comment -

          Attached a new patch with added javadoc for DummyHostnameVerifier.

          > DistCp.checkSrcPath(...) seems not a good place to set the conf and system properties values.
          Any suggestion?

          > There are repeating System.setProperty(...) calls in the patch. Could we define a method for all these common codes?
          Where to put such a method so that it can be conveniently shared?

          Tsz Wo Nicholas Sze added a comment -

          >> DistCp.checkSrcPath(...) seems not a good place to set the conf and system properties values.
          > Any suggestion?
          I think it belongs in DistCp.setup(...). Since the new code is not short, you might want to define a method called setupSsl(...) and invoke it in setup(...).

          >Where to put such a method so that it can be conveniently shared?
          How about defining a new class, say SslUtil in org.apache.hadoop.security?
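A shared helper along the lines Nicholas suggests might look like the sketch below. Everything here is an assumption for illustration: the class name, the plain Map standing in for Hadoop's Configuration, and the example key names; the patch ultimately took a different approach (see the replies that follow).

```java
import java.util.Map;

// Hypothetical utility that centralizes the repeated
// System.setProperty(...) calls for ssl.client.* style settings.
public class SslUtilSketch {
    // keyMapping maps each JVM system property to the config key it
    // should be populated from; missing config entries are skipped.
    public static void setupSslProperties(Map<String, String> conf,
                                          Map<String, String> keyMapping) {
        for (Map.Entry<String, String> e : keyMapping.entrySet()) {
            String value = conf.get(e.getValue());
            if (value != null) {
                System.setProperty(e.getKey(), value);
            }
        }
    }

    public static void main(String[] args) {
        setupSslProperties(
            Map.of("ssl.client.keystore.location", "/etc/hadoop/keystore.jks"),
            Map.of("javax.net.ssl.keyStore", "ssl.client.keystore.location"));
        System.out.println(System.getProperty("javax.net.ssl.keyStore"));
    }
}
```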

          BTW, what are the files ssl-client.xml.example and ssl-server.xml.example for? They seem templates but not examples.

          Chris Douglas added a comment -

          BTW, what are the files ssl-client.xml.example and ssl-server.xml.example for? They seem templates but not examples.

          x.template files are automatically copied to x if x doesn't exist (build.xml:240), which is unnecessary, even undesirable here. This patch fixes the ssl setup so it no longer loads if the config is merely present, but .example is probably still preferred.

          There are repeating System.setProperty(...) calls in the patch. Could we define a method for all these common codes?

          The possibilities here are pretty meager... the most straightforward refactoring might take a prefix for the config properties, but that doesn't exactly make the code more readable. A public class for a utility method shared between client, datanode, and namesystem seems like a lot for so little payoff, but I don't have strong objections to this or another, better approach.

          Yes, other clients can also make use of ssl.client.* configs. But the scope of this JIRA is limited to DistCp.

          Users will miss this functionality... since using System.setProperty mandates that there can be only one active instance of HsftpFileSystem, would it make sense to move the client initialization into setConf or initialize? Unfortunately, this complicates the client/mapred.child separation, since HsftpFileSystem would need to know its context. The current organization is OK for now, I think; the quirks of HsftpFileSystem are sufficient to keep it sequestered to distcp.

          Overall, this looks good to me. There's still no documentation, though. It can be a separate JIRA, but please mark it as a blocker for 0.20.

          Kan Zhang added a comment -

          Attached a new patch 4284_20081016_96.patch

          > I think it belongs to DistCp.setup(...).
          It's too late to put it in setup(), since checkSrcPath() needs to use it before setup() is called. I did refactor it into a new method setupSsl() in the new patch. Thanks!

          > How about defining a new class, say SslUtil in org.apache.hadoop.security?
          I don't think it's worth the trouble. Note that although DistCp and Child set the same set of System properties, they use different ssl-client conf options. On the server side, most of the options are not set as System properties, but used to call addSslListener().
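          The single-active-configuration constraint mentioned above can be sketched in a few lines. The javax.net.ssl.* property names are the standard JSSE keys; the setupSsl method name and the paths below are illustrative stand-ins, not code from the patch:

```java
// Sketch of why System.setProperty-based SSL setup permits only one
// active configuration per JVM: the javax.net.ssl.* properties are
// process-wide, so a second setup call clobbers the first.
public class SslPropsSketch {
  // Illustrative stand-in for the setupSsl() method discussed above.
  static void setupSsl(String trustStorePath) {
    System.setProperty("javax.net.ssl.trustStore", trustStorePath);
  }

  public static void main(String[] args) {
    setupSsl("/path/to/first.jks");
    setupSsl("/path/to/second.jks"); // silently overrides the first setup
    // Any client reading the property afterwards sees only the second value:
    System.out.println(System.getProperty("javax.net.ssl.trustStore"));
  }
}
```

          This is the sense in which two HsftpFileSystem instances with different keystores cannot coexist in one process: the later initialization wins for both.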

          > BTW, what are the files ssl-client.xml.example and ssl-server.xml.example for? They seem templates but not examples.
          I renamed them to be templates.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12392321/4284_20081016_96.patch
          against trunk revision 705430.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3480/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3480/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3480/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3480/console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          > I did refactor it into a new method setupSsl() in the new patch.
          setupSsl() should not be invoked in DistCp.checkSrcPath(...). Please find a better place. It will be hard to understand why checkSrcPath(...) somehow changes the system properties for setting up ssl.

          BTW, as stated in the title, this issue is supposed to deal with global filters, not ssl. Could you create another issue for ssl? Otherwise, parties interested in ssl may miss it.

          Chris Douglas added a comment -

          > BTW, as stated in the title, this issue is supposed to deal with global filters, not ssl. Could you create another issue for ssl? Otherwise, parties interested in ssl may miss it.

          +1 created HADOOP-4453 for the improvements to HsftpFileSystem and ssl support.

          Chris Douglas added a comment -

          +1 This looks good.

          I just committed this. Thanks, Kan

          Hudson added a comment -

          Integrated in Hadoop-trunk #640 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/640/ )

            People

            • Assignee: Kan Zhang
            • Reporter: Kan Zhang
            • Votes: 0
            • Watchers: 5