Hadoop Map/Reduce / MAPREDUCE-1824

JobTracker should reuse file system handle for delegation token renewal

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.24.0
    • Component/s: None
    • Labels:
      None

      Description

      In trunk, DelegationTokenRenewal obtains the file system handle by building a URI from the service field of the token, which is ip:port. The intention of this jira is to use the host name of the namenode instead, so that the file system handle already in the cache on the JobTracker can be reused. This jira was created because this optimization already exists in the 0.20 code; the attached patch is a direct port of that code.
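
To illustrate why the form of the service string matters for handle reuse, here is a minimal sketch. It assumes, for illustration only, that file system handles are cached by the URI's scheme and authority; the real cache key in Hadoop may include more (for example, the user). `HandleCacheSketch` and its methods are hypothetical names, not Hadoop APIs, and the host name and IP address are made up.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a file system handle cache; not a Hadoop API.
final class HandleCacheSketch {
    // Assumption: handles are cached under "scheme://authority", lower-cased.
    private final Map<String, Object> cache = new HashMap<>();

    static String cacheKey(URI uri) {
        return (uri.getScheme() + "://" + uri.getAuthority()).toLowerCase(Locale.ROOT);
    }

    // Returns the cached handle for this URI, creating one on first use.
    Object get(URI uri) {
        return cache.computeIfAbsent(cacheKey(uri), k -> new Object());
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        HandleCacheSketch cache = new HandleCacheSketch();
        // URI by host name, as a job client might use it (made-up host).
        URI byHost = URI.create("hdfs://nn.example.com:8020");
        // URI built from a token service of the form ip:port (made-up address).
        URI byIp = URI.create("hdfs://10.0.0.1:8020");

        Object h1 = cache.get(byHost);
        Object h2 = cache.get(byIp);
        // Both URIs may name the same namenode, but the keys differ,
        // so the cache ends up holding two handles instead of one.
        System.out.println("same handle: " + (h1 == h2) + ", cached handles: " + cache.size());
    }
}
```

Under these assumptions, a renewal URI built from ip:port never hits the entry created under the host name, which is the reuse problem described above.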

      1. MR-1824.1.patch
        2 kB
        Jitendra Nath Pandey

          Activity

          Suresh Srinivas added a comment -

          This will not be fixed in 0.20.205. Daryn is going to create a new jira for the issue he intended to fix.

          Suresh Srinivas added a comment -

          Jitendra, can you please update the description of this jira with the problem being addressed?

          Jitendra Nath Pandey added a comment -

          I had a discussion with Devraj and realized that the issue being addressed here is enabling the JT to renew tokens with a namenode running a different version.
          Please file a different jira for that issue.

          Jitendra Nath Pandey added a comment -

          The proposed solution attempts to solve two issues:
          1) The uri will probably contain the hostname, so no dns lookup is needed to get the hostname and re-use the dfs handle.
          2) The file system is currently assumed to be hdfs; the uri will contain a scheme and will point to the correct file system.

          For 1), my opinion is that we don't really need to solve that. Even if we don't re-use the existing handle, it will just cause one additional dfs object corresponding to the namenode's ip address. Any subsequent renewals will re-use that handle, so I think the extra cost is just one dfs handle if we don't do a dns lookup. Also, dns caching will keep the dns lookup cost pretty low, so it may also be fine to leave the current code as it is.

          For 2), we also renew the token over https, so we cannot assume that we can simply renew over the uri. The token service is used in many contexts, and a change there needs extensive testing.
          A less risky approach could be to use the "kind" of the token and map it to a file system. This can be done in a utility class.
          I also think the hdfs assumption is not very wrong, because hdfs is the only FileSystem implementation that issues tokens right now.
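
A utility class that maps a token's kind to a file system could look roughly like the sketch below. `TokenKindMapper`, the kind string `HDFS_DELEGATION_TOKEN`, and the single map entry are illustrative assumptions, not code from the patch; the service is assumed to be in the ip:port form discussed in this jira.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical utility; the names and the kind string are assumptions.
final class TokenKindMapper {
    // Maps a token kind to the URI scheme of the file system that issued it.
    private static final Map<String, String> KIND_TO_SCHEME = new HashMap<>();
    static {
        KIND_TO_SCHEME.put("HDFS_DELEGATION_TOKEN", "hdfs");
    }

    // Builds the URI to renew against from the token's kind and its
    // "ip:port" service. Returns empty when the kind is not known.
    static Optional<URI> renewalUri(String kind, String service) {
        String scheme = KIND_TO_SCHEME.get(kind);
        if (scheme == null) {
            return Optional.empty();
        }
        return Optional.of(URI.create(scheme + "://" + service));
    }

    public static void main(String[] args) {
        // Made-up service address for illustration.
        System.out.println(renewalUri("HDFS_DELEGATION_TOKEN", "10.0.0.1:8020"));
        System.out.println(renewalUri("SOME_OTHER_KIND", "10.0.0.1:8020"));
    }
}
```

The point of this design is that the token service format stays untouched; supporting another token-issuing file system would only need a new map entry.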

          Daryn Sharp added a comment -

          The basic approach I'm taking is that delegation tokens will not have a service of "ip:port". Instead the service field will be the uri of the issuing DFS. The JT's HDFS token renewal will no longer "guess" the uri of the DFS, because it will be contained in the token's service.

          Minor modification of the token selectors is needed to allow the RPC layer to continue to use the socket to match against the authority of the uri in the service field.
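
As a rough sketch of the selector change described above, the following compares an RPC socket address against the authority of a uri stored in the service field. `ServiceMatchSketch` and `matches` are hypothetical names, and the comparison is purely textual (no dns resolution), which is a simplifying assumption; a real selector would likely need to reconcile host names and ip addresses.

```java
import java.net.URI;

// Hypothetical sketch of a selector check; not the actual Hadoop selector.
final class ServiceMatchSketch {
    // Returns true when the authority of the token's service uri names the
    // same host and port as the socket the RPC layer is connecting to.
    // Assumption: both sides use the same textual form of the host.
    static boolean matches(String tokenServiceUri, String socketHost, int socketPort) {
        URI service = URI.create(tokenServiceUri);
        return service.getHost() != null
                && service.getHost().equalsIgnoreCase(socketHost)
                && service.getPort() == socketPort;
    }

    public static void main(String[] args) {
        // Made-up namenode address for illustration.
        String service = "hdfs://nn.example.com:8020";
        System.out.println(matches(service, "nn.example.com", 8020));
        System.out.println(matches(service, "nn.example.com", 50070));
    }
}
```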

          Daryn Sharp added a comment -

          A variant of this code appears to have already been integrated. The code is commented with "...THIS IS A WORKAROUND FOR NOW. NEED TO SOLVE THIS PROBLEM IN A BETTER WAY". The issue is the lack of clear traceability of a token to its issuer. The renewal should be improved to avoid "guessing" the URI of the NN.

          Jitendra Nath Pandey added a comment -

          Patch for trunk.


            People

            • Assignee:
              Daryn Sharp
              Reporter:
              Jitendra Nath Pandey
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated: