Hadoop Common
HADOOP-5363

Proxying for multiple HDFS clusters of different versions

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      New HDFS proxy server (Tomcat based) allows clients controlled access to clusters with different versions. See HADOOP-5366 for information on using curl and wget.

      Description

      A single hdfsproxy server should be able to proxy for multiple HDFS clusters, whose Hadoop versions may be different from each other.

      1. 5363-backport.1.patch
        3 kB
        Kan Zhang
      2. ProxyTestPlan.html
        16 kB
        gary murry
      3. HADOOP-5363.patch
        37 kB
        zhiyong zhang
      4. HADOOP-5363.patch
        35 kB
        zhiyong zhang
      5. HADOOP-5363.patch
        36 kB
        zhiyong zhang
      6. HADOOP-5363.patch
        34 kB
        zhiyong zhang

          Activity

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          Kan Zhang added a comment -

          attaching a new patch for back-porting to 0.20 (not for checking in)

          Kan Zhang added a comment -

          attaching a partially backported patch for 0.20 release, which removes pickOneAddress functionality from HFTP client.

          Hudson added a comment -

          Integrated in Hadoop-trunk #796 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/796/ )
          Chris Douglas added a comment -

          I committed this. Thanks, Zhiyong

          Chris Douglas added a comment -

          +1 Looks good

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 18 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
          zhiyong zhang added a comment -

          Made a new ProxyFileForward.java as a subclass of ProxyForwardServlet to handle /file requests.

          Thanks, Chris

          zhiyong zhang added a comment -

          That's a good suggestion.
          Further, to make the structure clearer and more readable, I would make ProxyForwardServlet handle only forwarding cases, and add a new servlet, ProxyFileServlet.java, parallel to ProxyListPathsServlet, ProxyFileDataServlet, and ProxyStreamFile. ProxyFileServlet would handle /file requests and internally make a single redirect to ProxyStreamFile. That way, support for "/file" requests moves into each proxy version (18, 20, 21, etc.) instead of the forwarding part, and the forwarding functionality becomes independent of the proxy versions and can be deployed in a standalone fashion.

          Chris Douglas added a comment -

          For the forwarding servlets, I was thinking that a single servlet could handle requests for anything but "/file", which gets handled by a different servlet registered for those requests. The servlet handling "/file" can be a subclass of the normal forwarding servlet, overriding a function that contains the "else" clause in the base class, with the "/file" behavior in the subclass servlet. So in ProxyForwardServlet, forwardRequest calls out to some function buildForwardPath (or whatever), which in ProxyForwardServlet contains the equivalent of the "else":

          +      if (request.getPathInfo() != null) {
          +        path += request.getPathInfo();
          +      }
          +      if (request.getQueryString() != null) {
          +        path += "?" + request.getQueryString();
          +      }
          

          but in the subclass registered for "/file" requests, buildForwardPath contains the equivalent to:

          +      // use streamFile for file access
          +      path = "/streamFile";
          +      path += "?filename=" + request.getPathInfo();
          +      UnixUserGroupInformation ugi = (UnixUserGroupInformation)request.getAttribute("authorized.ugi");
          +      if (ugi != null) {
          +        path += "&ugi=" + ugi.toString();
          +      }
          

          So then the behavior is bound to the servlets registered to these paths, rather than making the behavior of this servlet vary depending on the paths to which it is bound.
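          The override pattern described in this comment can be sketched in plain Java. This is a sketch only, not the committed patch: the names follow the discussion above (buildForwardPath was explicitly "or whatever"), and the servlet plumbing is omitted so the path-building logic stands alone.

```java
// Sketch of the discussed design, servlet plumbing omitted.
// The base class builds the forward path from the request's path info and
// query string; the subclass bound to "/file" rewrites to /streamFile.
class ForwardPathBuilder {
    // Equivalent of the "else" clause: forward path and query unchanged.
    String buildForwardPath(String pathInfo, String queryString, String ugi) {
        String path = "";
        if (pathInfo != null) {
            path += pathInfo;
        }
        if (queryString != null) {
            path += "?" + queryString;
        }
        return path;
    }
}

// Registered for "/file" requests: use streamFile for file access and
// carry the authorized ugi, mirroring the second snippet above.
class FileForwardPathBuilder extends ForwardPathBuilder {
    @Override
    String buildForwardPath(String pathInfo, String queryString, String ugi) {
        String path = "/streamFile?filename=" + pathInfo;
        if (ugi != null) {
            path += "&ugi=" + ugi;
        }
        return path;
    }
}
```

          With this shape, the behavior is selected by which class is registered for a path, rather than by inspecting the path inside a single servlet.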

          Chris Douglas added a comment -

          Unfortunately, HADOOP-5390 conflicts with this patch. Would you mind regenerating it?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12403566/HADOOP-5363.patch
          against trunk revision 758232.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/136/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/136/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/136/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/136/console

          This message is automatically generated.

          zhiyong zhang added a comment -

          Forgot to remove the commented-out lines; updated with the new attachment.

          zhiyong zhang added a comment -
          • Since pickOneAddress was added in support of hdfsproxy (HADOOP-4575), if it's no longer needed then it should be removed from HftpFileSystem
              • Yes.
          • Instead of pulling the configuration out of the context in doGet, it can be saved in HttpServlet::init in ProxyForwardServlet.
              • Yes, that should be more efficient.
          • Unless doForward has special meaning, it should probably not be confused with doGet, doHead, etc. that are part of HttpServlet. Either putting the body of that method in doGet or renaming it would make it more readable. doGet should have an @Override annotation
              • Yes, changed the name now.
          • Please include javadoc for public methods, where reasonable.
              • Yes.
          • I'm confused by getServletPath on the servlets; isn't this usually on the HttpServletRequest, set by the server to the matching part of the URI? Unless it has special meaning: rather than overloading getServletPath to return a fixed String, then changing behavior of doForward based on whether it matches "/path", it might make more sense to put the logic in the "else" block in a method, then override it in the ProxyFileForward servlet.
              • Very good suggestion; I changed the structure so that all forwarding points to one forward servlet. The code looks much thinner.
          • ProxyUtil::main should print usage when it has too few arguments rather than throwing an exception for -get.
              • Yes.

          Thanks.

          Chris Douglas added a comment -

          This looks good. Just a few small notes:

          • Since pickOneAddress was added in support of hdfsproxy (HADOOP-4575), if it's no longer needed then it should be removed from HftpFileSystem
          • Instead of pulling the configuration out of the context in doGet, it can be saved in HttpServlet::init in ProxyForwardServlet.
          • Unless doForward has special meaning, it should probably not be confused with doGet, doHead, etc. that are part of HttpServlet. Either putting the body of that method in doGet or renaming it would make it more readable. doGet should have an @Override annotation
          • Please include javadoc for public methods, where reasonable.
          • I'm confused by getServletPath on the servlets; isn't this usually on the HttpServletRequest, set by the server to the matching part of the URI? Unless it has special meaning: rather than overloading getServletPath to return a fixed String, then changing behavior of doForward based on whether it matches "/path", it might make more sense to put the logic in the "else" block in a method, then override it in the ProxyFileForward servlet.
          • ProxyUtil::main should print usage when it has too few arguments rather than throwing an exception for -get.
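          The init-versus-doGet point above can be illustrated with a minimal sketch. This is an assumed shape only, not the committed code: Configuration here is a stand-in for the real Hadoop class, and the servlet lifecycle is reduced to plain Java.

```java
// Sketch: read the configuration once when the servlet is loaded and cache
// it in a field, instead of fetching it from the servlet context on every
// request.
class Configuration {
    // Stand-in for org.apache.hadoop.conf.Configuration.
}

class ProxyForwardServletSketch {
    private Configuration conf;  // cached once in init, reused per request

    // Analogous to HttpServlet::init: called once by the container.
    void init(Configuration fromContext) {
        this.conf = fromContext;
    }

    // Analogous to the per-request doGet path: uses the cached field,
    // no per-request context lookup.
    boolean hasConfiguration() {
        return conf != null;
    }
}
```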
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12402178/HADOOP-5363.patch
          against trunk revision 757625.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 19 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/125/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/125/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/125/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/125/console

          This message is automatically generated.

          zhiyong zhang added a comment -

          General Design Idea

          1. Use a forward servlet to forward requests to the servlet for the right Hadoop version.
          2. Package each version's servlet into a separate war file.
          3. Communication between the forwarding war and the version-specific war passes only built-in types, to avoid class-loading issues.
          4. Use a configuration file (hdfsproxy-site.xml) to identify the war file path. Format example (suppose www.apache1.com and www.apache2.com point to the same IP address through some DNS configuration):

          <property>
          <name>www.apache1.com</name>
          <value>/v18</value>
          <description>one hostname corresponds to one web application archive
          </description>
          </property>

          <property>
          <name>www.apache2.com</name>
          <value>/v21</value>
          <description>one hostname corresponds to one web application archive
          </description>
          </property>

          5. Each war file packages one proxy, which corresponds to one source cluster. Even the same Hadoop version can use different war files to point to different source clusters; i.e., the war files are source-cluster oriented, not Hadoop-version oriented.
          6. Tomcat needs crossContext="true" set in its context.xml config file to allow cross-war forwarding.
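          A minimal sketch of how the forwarding war might use such a mapping: the Host header of the incoming request is looked up to pick the target web application context, per the hdfsproxy-site.xml example above. This is assumed logic only, not the committed code, and HostContextResolver is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps a request's Host header to the context path of
// the version-specific war, as configured in hdfsproxy-site.xml.
class HostContextResolver {
    private final Map<String, String> hostToContext = new HashMap<>();

    // One hostname corresponds to one web application archive.
    void addMapping(String host, String contextPath) {
        hostToContext.put(host, contextPath);
    }

    // Returns e.g. "/v18" for "www.apache1.com", or null if unmapped.
    String resolve(String host) {
        return hostToContext.get(host);
    }
}
```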

            People

            • Assignee: zhiyong zhang
            • Reporter: Kan Zhang
            • Votes: 0
            • Watchers: 3
