  1. Hadoop HDFS
  2. HDFS-10799

NameNode should use loginUser(hdfs) to serve iNotify requests

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None
    • Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0

    Description

      When a NameNode serves iNotify requests from a client, it verifies that the client has superuser permission and then uses the client's Kerberos principal to read edits from the JournalNodes.

      However, if the client does not renew its Kerberos TGT, the connection from the NameNode to the JournalNodes may fail. In that case, the NameNode concludes that the edits are corrupt and prints an alarming error message:
      "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!"

      The edits are actually intact. The NameNode should not report edit log corruption just because an iNotify client's TGT has expired.

      I think an easy fix is for the NameNode, after it verifies that the client has superuser permission, to call SecurityUtil.doAsLoginUser and then read the edits. This ensures the operation does not fail because of an expired client ticket.
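
      The idea can be illustrated with a toy model that runs without Hadoop on the classpath. This is only a sketch, not NameNode code: User, checkSuperuser, readEditsAs, getEditsAsCaller, and getEditsAsLoginUser are hypothetical names invented for illustration, and the ticket check is a plain boolean rather than real Kerberos. In the NameNode itself, the switch of identity would presumably be made by wrapping the read in SecurityUtil.doAsLoginUser.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Toy model of the proposed fix. None of these types are Hadoop classes. */
public class LoginUserReadSketch {

    /** Hypothetical stand-in for a Kerberos-authenticated user. */
    static final class User {
        final String name;
        final boolean superuser;
        final boolean ticketValid; // false models an expired, unrenewed TGT

        User(String name, boolean superuser, boolean ticketValid) {
            this.name = name;
            this.superuser = superuser;
            this.ticketValid = ticketValid;
        }
    }

    /** Hypothetical permission check: rejects callers that are not superusers. */
    static void checkSuperuser(User caller) throws IOException {
        if (!caller.superuser) {
            throw new IOException("Access denied for user " + caller.name);
        }
    }

    /** Hypothetical JournalNode read: fails when the reading identity's ticket is expired. */
    static List<Long> readEditsAs(User reader, long fromTxid, long toTxid) throws IOException {
        if (!reader.ticketValid) {
            throw new IOException("Authentication failed for " + reader.name);
        }
        List<Long> txids = new ArrayList<>();
        for (long txid = fromTxid; txid <= toTxid; txid++) {
            txids.add(txid);
        }
        return txids;
    }

    /** Behaviour described in the report: the read uses the caller's credentials. */
    static List<Long> getEditsAsCaller(User caller, long fromTxid, long toTxid)
            throws IOException {
        checkSuperuser(caller);
        return readEditsAs(caller, fromTxid, toTxid);
    }

    /** Proposed behaviour: authorize the caller, then read as the NameNode login user. */
    static List<Long> getEditsAsLoginUser(User caller, User loginUser,
                                          long fromTxid, long toTxid) throws IOException {
        checkSuperuser(caller);
        return readEditsAs(loginUser, fromTxid, toTxid);
    }

    public static void main(String[] args) throws IOException {
        User client = new User("inotify-client", true, false); // superuser, expired TGT
        User hdfs = new User("hdfs", true, true);               // NameNode login user

        try {
            getEditsAsCaller(client, 11577603L, 11577606L);
        } catch (IOException e) {
            System.out.println("read as caller failed: " + e.getMessage());
        }
        System.out.println("read as login user: "
                + getEditsAsLoginUser(client, hdfs, 11577603L, 11577606L));
    }
}
```

      In this model the authorization decision still depends on the caller, so a non-superuser is rejected either way. Only the identity used for the read changes, which is why an expired client ticket no longer affects the outcome.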

      Excerpt of related logs:

      2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs@EXAMPLE.COM (auth:KERBEROS) cause:java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.  During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one!  The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606.  If you continue, metadata will be lost forever!
      2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 112 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client IP:port] Call#73 Retry#0
      java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.  During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one!  The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606.  If you continue, metadata will be lost forever!
              at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
              at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736)
              at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475)
              at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
      

      Attachments

        1. HDFS-10799.001.patch
          2 kB
          Wei-Chiu Chuang

            People

              Assignee: Wei-Chiu Chuang (weichiu)
              Reporter: Wei-Chiu Chuang (weichiu)
              Votes: 0
              Watchers: 5
