ACCUMULO-2971

ChangeSecret tool should refuse to run if no write access to HDFS


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.5.1, 1.6.0
    • Fix Version/s: 1.8.0
    • Component/s: None

    Description

      Currently, the ChangeSecret tool doesn't check that the user running it has permission to write to /accumulo/instance_id.

      In the event that an admin knows the instance secret but runs the command as a user who cannot write to the instance_id, the result is an unhelpful error message and a disconnect between HDFS and ZooKeeper.

      Example for a cluster with an instance named "foobar":

      [busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
      Found 1 items
      -rw-r--r--   3 accumulo accumulo          0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
      [busbey@edge ~]$ accumulo org.apache.accumulo.server.util.ChangeSecret
      old zookeeper password: 
      new zookeeper password: 
      Thread "org.apache.accumulo.server.util.ChangeSecret" died Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
      
      org.apache.hadoop.security.AccessControlException: Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
      
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
      	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
      	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
      	at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1489)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:355)
      	at org.apache.accumulo.server.util.ChangeSecret.updateHdfs(ChangeSecret.java:150)
      	at org.apache.accumulo.server.util.ChangeSecret.main(ChangeSecret.java:66)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.accumulo.start.Main$1.run(Main.java:141)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
      	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
      
      	at org.apache.hadoop.ipc.Client.call(Client.java:1238)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
      	at $Proxy16.delete(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:408)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
      	at $Proxy17.delete(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1487)
      	... 9 more
      [busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
      Found 1 items
      -rw-r--r--   3 accumulo accumulo          0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
      [busbey@edge ~]$ zookeeper-client
      Connecting to localhost:2181
      Welcome to ZooKeeper!
      JLine support is enabled
      
      WATCHER::
      
      WatchedEvent state:SyncConnected type:None path:null
      [zk: localhost:2181(CONNECTED) 0] get /accumulo/instances/foobar
      1528cc95-2600-4649-a50e-1645404e9d6c
      cZxid = 0xe00034f45
      ctime = Wed Jul 02 09:27:58 PDT 2014
      mZxid = 0xe00034f45
      mtime = Wed Jul 02 09:27:58 PDT 2014
      pZxid = 0xe00034f45
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0x0
      dataLength = 36
      numChildren = 0
      [zk: localhost:2181(CONNECTED) 1] ls /accumulo/1528cc95-2600-4649-a50e-1645404e9d6c
      [users, monitor, problems, root_tablet, gc, hdfs_reservations, table_locks, namespaces, recovery, fate, tservers, tables, next_file, tracers, config, dead, bulk_failed_copyq, masters]
      [zk: localhost:2181(CONNECTED) 2] ls /accumulo/cb977c77-3e13-4522-b718-2b487d722fd4
      [users, problems, monitor, root_tablet, hdfs_reservations, gc, table_locks, namespaces, recovery, fate, tservers, tables, next_file, tracers, config, masters, bulk_failed_copyq, dead]
      
      

      What's worse, in this condition the cluster will come up cleanly and appear healthy if the old instance secret is used.

      However, clients and servers will now end up looking at different ZooKeeper nodes, depending on whether they read the instance_id from HDFS or resolve it via a ZooKeeper instance-name lookup, so long as each uses the corresponding instance secret.
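
      To make the divergence concrete, here is a hypothetical standalone probe (not part of Accumulo; the instance name "foobar", the ZooKeeper quorum, and the paths are assumptions taken from the session above) that compares the two lookup paths:

      import java.nio.charset.StandardCharsets;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.zookeeper.ZooKeeper;

      public class InstanceIdDivergenceCheck {
        public static void main(String[] args) throws Exception {
          // Lookup 1: HDFS. The name of the lone file under
          // /accumulo/instance_id is the instance id.
          FileSystem fs = FileSystem.get(new Configuration());
          FileStatus[] ids = fs.listStatus(new Path("/accumulo/instance_id"));
          String hdfsId = ids[0].getPath().getName();

          // Lookup 2: ZooKeeper. The data of /accumulo/instances/<name>
          // is the instance id.
          ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
          String zkId = new String(
              zk.getData("/accumulo/instances/foobar", false, null),
              StandardCharsets.UTF_8);
          zk.close();

          // After a half-applied ChangeSecret these disagree, as shown above.
          System.out.printf("hdfs=%s zk=%s match=%b%n",
              hdfsId, zkId, hdfsId.equals(zkId));
        }
      }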

      Furthermore, if an admin runs the CleanZookeeper utility after this failure, it will destroy the ZooKeeper nodes that the server processes are actually using.

      The utility should sanity-check that /accumulo/instance_id is writable before touching ZooKeeper. It should also wait to update the instance-name-to-instance_id pointer in ZooKeeper until after HDFS has been updated.
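
      A minimal sketch of such a pre-flight check, assuming a Hadoop release that has FileSystem#access (2.6+); the class and method names here are illustrative, not the committed fix:

      import java.io.IOException;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.permission.FsAction;
      import org.apache.hadoop.security.AccessControlException;

      public class ChangeSecretPreCheck {
        /**
         * Fail fast if the caller can't write to the instance_id directory.
         * ChangeSecret deletes the old id file and creates a new one, both
         * of which require WRITE on the directory itself.
         */
        static void verifyHdfsWritePermission(FileSystem fs, Path instanceIdDir)
            throws IOException {
          try {
            fs.access(instanceIdDir, FsAction.WRITE);
          } catch (AccessControlException e) {
            throw new IOException("ChangeSecret needs write access to "
                + instanceIdDir + "; rerun as the HDFS user that owns the "
                + "Accumulo directories.", e);
          }
        }
      }

      Running the check before any ZooKeeper writes means a permissions mistake aborts the tool cleanly, so HDFS and ZooKeeper can never end up disagreeing about the instance id.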

      Workaround: manually edit the instance_id in HDFS to match the new instance id found in ZooKeeper for the instance name, then proceed as though the secret change had succeeded.
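
      For example, a one-off repair along those lines might look like the following, run as the HDFS user that owns /accumulo (the UUIDs are the ones from the session above; substitute your own):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class RepairInstanceId {
        public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path dir = new Path("/accumulo/instance_id");
          // Drop the stale id file left behind by the failed run...
          fs.delete(new Path(dir, "cb977c77-3e13-4522-b718-2b487d722fd4"), false);
          // ...and create an empty file named with the id now in ZooKeeper.
          fs.createNewFile(new Path(dir, "1528cc95-2600-4649-a50e-1645404e9d6c"));
        }
      }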

Attachments

Issue Links

Activity

People

    Assignee: milleruntime Michael Miller
    Reporter: busbey Sean Busbey
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved:

Time Tracking

    Estimated: Not Specified
    Remaining: 0h
    Logged: 40m