Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Introduced independent HSFTP proxy server for authenticated access to clusters.

      Description

      Currently, an HDFS cluster itself exposes an HTTP/HTTPS interface for reading files (i.e., HFTP/HSFTP). However, there is a need for an HTTPS proxy server running independently from the cluster and serving data to users who otherwise can't access the cluster directly. This proxy will authenticate its users via SSL certificates and implements the HSFTP interface for reading files.

      1. lib.tgz
        1.78 MB
        Kan Zhang
      2. 4575-1.patch
        114 kB
        Kan Zhang
      3. 4575-1-contrib.patch
        94 kB
        Kan Zhang
      4. 4575-1-core.patch
        20 kB
        Kan Zhang
      5. 4575-2-contrib.patch
        105 kB
        Kan Zhang
      6. 4575-2-core.patch
        9 kB
        Kan Zhang
      7. 4575-2.patch
        114 kB
        Kan Zhang
      8. 4575-3-contrib.patch
        106 kB
        Kan Zhang
      9. 4575-3-core.patch
        9 kB
        Kan Zhang
      10. 4575-3.patch
        115 kB
        Kan Zhang

        Issue Links

          Activity

          Hide
          Chris Douglas added a comment -

          On core changes:

          • There is a race condition in UserGroupInformationManager::getUgiForUser; the critical section protects the get and put, but simultaneous requests for an absent ugi will likely create duplicate entries and insert the last acquirer of the latter lock. o.a.h.mapred.IndexCache uses a standard idiom that applies here. Alternatively, simply making the method synchronized would perform adequately and avoid considerable complication, particularly given the cleanup semantics for this cache.
          • A cleanup thread is overkill. It should be sufficient to verify the timestamp of the item out of the cache and discard it if it's too old (refreshing the ugi in the cache). If the size of the cache is a concern, setting a threshold triggering cleanup on insertion is a sound strategy. The ugi+timestamp is almost certainly less than 100 bytes of memory, and closer to half that in most cases. If the limit is only checked on insertion and entries aren't pushed out of the cache, there's no risk for the degenerate case where the cache is scanned for each access. Further, only valid users are inserted, so the DoS threat is low.
          • Since there are no references to UserGroupInformationManager in core, perhaps it can live with the proxy in contrib (including the changes to hadoop-default.xml)
          • I don't understand the changes to FileDataServlet. It introduces a race condition in the jspHelper field initialization, which should be fixed. Is this getting around a static initialization order problem? If so, a proper pattern is preferred.
          • UnixUserGroupInformation::getUgiForUser is a dangerous method:
            
              public static UnixUserGroupInformation getUgiForUser(String userName)
                  throws IOException {
                String[] cmd = new String[] { "bash", "-c", "id -Gn " + userName };
                String[] groups = Shell.execCommand(cmd).split("\\s+");
                return new UnixUserGroupInformation(userName, groups);
              }
            

            Though the proxy does a fair amount of sanity checking and input sanitizing, future callers may not be so careful (e.g. userName = "foo ; rm -rf /"). Further, the login functionality is more general and a public static method on UnixUGI will impede future design. As a separate issue, replacing Shell::getGroupsCommand with "id -Gn" and the StringTokenizer with String::split are sound improvements. For this issue, though I can see why this functionality is associated with UnixUGI, looking up groups for arbitrary users belongs in the proxy code and is not required in core. The method should also do some of its own sanity checks.

          • HftpFileSystem.ran should be final

          On contrib:

          • Why does ProxyHttpServer not extend or own an HttpServer? There is considerable overlap in their interfaces and implementations.
          • I don't understand why the proxy would return a status of 402 when the client doesn't provide a cert. Is there a canonical response for this case?
          • Given that authentication is bypassed for testing:
            +    } else { // http request, set ugi for servlets, only for testing purposes
            +      String ugi = rqst.getParameter("ugi");
            +      rqst.setAttribute("authorized.ugi", new UnixUserGroupInformation(ugi
            +          .split(",")));
            +    }
            

            if a non-ssl listener is attached, this should probably log at WARN level on init at least.

          • Calling System.exit from createHdfsProxy is unnecessarily forceful; returning null for success and throwing on failure is clearer. Similarly, the switch stmt is less readable than humble if stmts.
          Show
          Chris Douglas added a comment - On core changes: There is a race condition in UserGroupInformationManager::getUgiForUser; the critical section protects the get and put, but simultaneous requests for an absent ugi will likely create duplicate entries and insert the last acquirer of the latter lock. o.a.h.mapred.IndexCache uses a standard idiom that applies here. Alternatively, simply making the method synchronized would perform adequately and avoid considerable complication, particularly given the cleanup semantics for this cache. A cleanup thread is overkill. It should be sufficient to verify the timestamp of the item out of the cache and discard it if it's too old (refreshing the ugi in the cache). If the size of the cache is a concern, setting a threshold triggering cleanup on insertion is a sound strategy. The ugi+timestamp is almost certainly less than 100 bytes of memory, and closer to half that in most cases. If the limit is only checked on insertion and entries aren't pushed out of the cache, there's no risk for the degenerate case where the cache is scanned for each access. Further, only valid users are inserted, so the DoS threat is low. Since there are no references to UserGroupInformationManager in core, perhaps it can live with the proxy in contrib (including the changes to hadoop-default.xml) I don't understand the changes to FileDataServlet. It introduces a race condition in the jspHelper field initialization, which should be fixed. Is this getting around a static initialization order problem? If so, a proper pattern is preferred. UnixUserGroupInformation::getUgiForUser is a dangerous method: public static UnixUserGroupInformation getUgiForUser( String userName) throws IOException { String [] cmd = new String [] { "bash" , "-c" , "id -Gn " + userName }; String [] groups = Shell.execCommand(cmd).split( "\\s+" ); return new UnixUserGroupInformation(userName, groups); } Though the proxy does a fair amount of sanity checking and input sanitizing, future callers may not be so careful (e.g. userName = "foo ; rm -rf /" ). Further, the login functionality is more general and a public static method on UnixUGI will impede future design. As a separate issue, replacing Shell::getGroupsCommand with "id -Gn" and the StringTokenizer with String::split are sound improvements. For this issue, though I can see why this functionality is associated with UnixUGI, looking up groups for arbitrary users belongs in the proxy code and is not required in core. The method should also do some of its own sanity checks. HftpFileSystem.ran should be final On contrib: Why does ProxyHttpServer not extend or own an HttpServer? There is considerable overlap in their interfaces and implementations. I don't understand why the proxy would return a status of 402 when the client doesn't provide a cert. Is there a canonical response for this case? Given that authentication is bypassed for testing: + } else { // http request, set ugi for servlets, only for testing purposes + String ugi = rqst.getParameter("ugi"); + rqst.setAttribute("authorized.ugi", new UnixUserGroupInformation(ugi + .split(","))); + } if a non-ssl listener is attached, this should probably log at WARN level on init at least. Calling System.exit from createHdfsProxy is unnecessarily forceful; returning null for success and throwing on failure is clearer. Similarly, the switch stmt is less readable than humble if stmts.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12393231/4575-1-core.patch
          against trunk revision 709609.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12393231/4575-1-core.patch against trunk revision 709609. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3521/console This message is automatically generated.
          Hide
          Kan Zhang added a comment -

          > There is a race condition in UserGroupInformationManager::getUgiForUser
          I was aware of it and chose to punt on it since it doesn't affect correctness (the worst can happen is a few extra ugi objects get created and immediately garbage collected) and I was reluctant to make it a synchronized method since it may launch a shell command. Now that you view it as a problem, I've changed the code to use synchronized methods and no longer synchronize on the ugiCache object. This problem should go away.

          > It introduces a race condition in the jspHelper field initialization
          This is essentially the same issue as the above and I again chose to punt on it. As we discussed, the proper fix should be making jspHelper a singleton. I'll leave that to a separate JIRA and simply noted it in the source code.

          > Why does ProxyHttpServer not extend or own an HttpServer?
          For easier porting to 17, 18 and 19.

          > I don't understand why the proxy would return a status of 402
          Just a return code I picked that doesn't already have a defined semantics.

          > Calling System.exit from createHdfsProxy is unnecessarily forceful...
          I still prefer the code in the patch.

          Your other comments are incorporated into the new patch. Thanks!

          Show
          Kan Zhang added a comment - > There is a race condition in UserGroupInformationManager::getUgiForUser I was aware of it and chose to punt on it since it doesn't affect correctness (the worst can happen is a few extra ugi objects get created and immediately garbage collected) and I was reluctant to make it a synchronized method since it may launch a shell command. Now that you view it as a problem, I've changed the code to use synchronized methods and no longer synchronize on the ugiCache object. This problem should go away. > It introduces a race condition in the jspHelper field initialization This is essentially the same issue as the above and I again chose to punt on it. As we discussed, the proper fix should be making jspHelper a singleton. I'll leave that to a separate JIRA and simply noted it in the source code. > Why does ProxyHttpServer not extend or own an HttpServer? For easier porting to 17, 18 and 19. > I don't understand why the proxy would return a status of 402 Just a return code I picked that doesn't already have a defined semantics. > Calling System.exit from createHdfsProxy is unnecessarily forceful... I still prefer the code in the patch. Your other comments are incorporated into the new patch. Thanks!
          Hide
          Chris Douglas added a comment -

          The core changes look fine. The contrib changes are OK for now, but in the future:

          • If there were a limit on even the number of elements in the cache, checking that condition is less expensive than iterating over all the users in the cache for each insertion.
          • Where there is not a compatibility requirement for previous versions, the code duplication in ProxyHttpServer should be reduced. In particular, 0.20 should have a version with the "right" abstractions.
          • No information would be an improvement over a reserved HTTP error code. Something generic, like 400, perhaps?

          Compiling against trunk, I got no errors when applying only the patch. Are the jars in lib.tgz supposed to go into src/contrib/hdfsproxy/lib ?

          Show
          Chris Douglas added a comment - The core changes look fine. The contrib changes are OK for now, but in the future: If there were a limit on even the number of elements in the cache, checking that condition is less expensive than iterating over all the users in the cache for each insertion. Where there is not a compatibility requirement for previous versions, the code duplication in ProxyHttpServer should be reduced. In particular, 0.20 should have a version with the "right" abstractions. No information would be an improvement over a reserved HTTP error code. Something generic, like 400, perhaps? Compiling against trunk, I got no errors when applying only the patch. Are the jars in lib.tgz supposed to go into src/contrib/hdfsproxy/lib ?
          Hide
          Kan Zhang added a comment -

          > If there were a limit on even the number of elements in the cache...
          > Something generic, like 400, perhaps?
          I attached a new patch (4575-3) that addressed the above concerns.

          > Are the jars in lib.tgz supposed to go into src/contrib/hdfsproxy/lib ?
          Yes.

          Show
          Kan Zhang added a comment - > If there were a limit on even the number of elements in the cache... > Something generic, like 400, perhaps? I attached a new patch (4575-3) that addressed the above concerns. > Are the jars in lib.tgz supposed to go into src/contrib/hdfsproxy/lib ? Yes.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12393458/4575-3.patch
          against trunk revision 711734.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 10 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12393458/4575-3.patch against trunk revision 711734. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 10 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3544/console This message is automatically generated.
          Hide
          Chris Douglas added a comment -

          I just committed this. Thanks, Kan

          Show
          Chris Douglas added a comment - I just committed this. Thanks, Kan
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Seems that this introduced some javadoc warnings. See HADOOP-4621.

          Show
          Tsz Wo Nicholas Sze added a comment - Seems that this introduced some javadoc warnings. See HADOOP-4621 .
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk #655 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/655/)
          . Add a proxy service for relaying HsftpFileSystem requests.
          Includes client authentication via user certificates and config-based
          access control. (Kan Zhang via cdouglas)

          Show
          Hudson added a comment - Integrated in Hadoop-trunk #655 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/655/ ) . Add a proxy service for relaying HsftpFileSystem requests. Includes client authentication via user certificates and config-based access control. (Kan Zhang via cdouglas)

            People

            • Assignee:
              Kan Zhang
              Reporter:
              Kan Zhang
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development