  1. Hadoop HDFS
  2. HDFS-6475

WebHdfs clients fail without retry because of incorrect handling of StandbyException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: ha, webhdfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When WebHdfs clients connect to an HA HDFS service, the delegation token has already been initialized with the active NN.

      When a client issues a request, the NNs it may contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). The client contacts the NNs in the order of that map, so the first one it reaches is likely to be the standby NN. If the standby NN doesn't have the updated client credential, it throws a SecurityException that wraps a StandbyException.

      The client is expected to retry with another NN, but because the SecurityException described above is not handled properly, the request fails instead.
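
      The expected behaviour can be sketched as follows. This is a minimal, self-contained illustration and not the code from the attached patches: `FailoverSketch`, `causedByStandby`, `callWithFailover`, the host names, and the local `StandbyException` stand-in are all hypothetical. It assumes the failure reaches the caller as a `SecurityException` whose cause chain contains a `StandbyException`, and it moves on to the next NN only in that case.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class FailoverSketch {
    /** Local stand-in for Hadoop's StandbyException (hypothetical, for illustration only). */
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    /** Walks the cause chain and reports whether a StandbyException is buried in it. */
    static boolean causedByStandby(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof StandbyException) {
                return true;
            }
        }
        return false;
    }

    /** Tries each NN in map order; moves to the next one only on a standby-related failure. */
    static String callWithFailover(Map<String, String> nameNodes,
                                   Function<String, String> request) {
        RuntimeException last = null;
        for (Map.Entry<String, String> nn : nameNodes.entrySet()) {
            try {
                return request.apply(nn.getValue());
            } catch (RuntimeException e) {
                if (!causedByStandby(e)) {
                    throw e; // unrelated failure: do not mask it by retrying
                }
                last = e;    // standby NN: try the next address
            }
        }
        throw new IllegalStateException("no active NameNode found", last);
    }

    public static void main(String[] args) {
        // Insertion order matters: the standby NN is deliberately listed first.
        Map<String, String> nns = new LinkedHashMap<>();
        nns.put("nn1", "standby-host:8020");
        nns.put("nn2", "active-host:8020");

        // Simulated request: the standby wraps StandbyException inside SecurityException.
        Function<String, String> request = addr -> {
            if (addr.startsWith("standby")) {
                throw new SecurityException("Failed to obtain user group information",
                        new IOException("InvalidToken",
                                new StandbyException("NN is in standby state")));
            }
            return "OK from " + addr;
        };

        System.out.println(callWithFailover(nns, request)); // prints "OK from active-host:8020"
    }
}
```

      Rethrowing anything that is not standby-related keeps genuine authentication failures visible instead of silently retrying them.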

      Example message:

      {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}}
      
      org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
              at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
              at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
              at kclient1.kclient$1.run(kclient.java:64)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:356)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
              at kclient1.kclient.main(kclient.java:58)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
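
      In the message above, the reported javaClassName is java.lang.SecurityException and the StandbyException appears only inside the message text, so a check on the exception class alone would not see it. The sketch below shows one way a client could classify such a response. It is an illustration under stated assumptions, not the fix applied in the patches: `RemoteErrorSketch` and `looksLikeStandby` are hypothetical names, and matching on the message string is a fragile heuristic.

```java
public class RemoteErrorSketch {
    /**
     * Returns true when the remote error fields suggest the NN was in standby state.
     * Hypothetical helper: it inspects only the two strings carried in the response.
     */
    static boolean looksLikeStandby(String javaClassName, String message) {
        if (javaClassName == null || message == null) {
            return false;
        }
        // Direct case: the server reported StandbyException itself.
        if (javaClassName.endsWith("StandbyException")) {
            return true;
        }
        // Wrapped case from this issue: SecurityException whose text names StandbyException.
        return javaClassName.equals("java.lang.SecurityException")
                && message.contains("StandbyException");
    }

    public static void main(String[] args) {
        String wrapped = "Failed to obtain user group information: "
                + "org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException";
        System.out.println(looksLikeStandby("java.lang.SecurityException", wrapped));   // prints "true"
        System.out.println(looksLikeStandby("java.lang.SecurityException", "denied"));  // prints "false"
    }
}
```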
      

        Attachments

        1. HDFS-6475.009.patch
          8 kB
          Yongjun Zhang
        2. HDFS-6475.008.patch
          8 kB
          Yongjun Zhang
        3. HDFS-6475.007.patch
          10 kB
          Yongjun Zhang
        4. HDFS-6475.006.patch
          13 kB
          Yongjun Zhang
        5. HDFS-6475.005.patch
          12 kB
          Yongjun Zhang
        6. HDFS-6475.004.patch
          11 kB
          Yongjun Zhang
        7. HDFS-6475.003.patch
          11 kB
          Yongjun Zhang
        8. HDFS-6475.003.patch
          11 kB
          Yongjun Zhang
        9. HDFS-6475.002.patch
          5 kB
          Yongjun Zhang
        10. HDFS-6475.001.patch
          2 kB
          Yongjun Zhang


            People

            • Assignee:
              Yongjun Zhang (yzhangal)
            • Reporter:
              Yongjun Zhang (yzhangal)
            • Votes:
              0
            • Watchers:
              6
