Apache Ozone / HDDS-10734

[HBase Ozone] ImportTSV fails during OM Rolling Restart with "SecretManager$InvalidToken: Tampered/Invalid token."


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: OM
    • Labels: None

    Description

      ImportTSV fails when it is triggered during an OM rolling restart.

      After debugging, the issue is reproducible every time ImportTSV is running its reducers while an OM rolling restart is in progress.

      2024-04-22 10:15:41,159|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:15:41 INFO mapreduce.Job:  map 100% reduce 69%
      2024-04-22 10:15:43,169|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:15:43 INFO mapreduce.Job:  map 100% reduce 70%
      2024-04-22 10:15:49,198|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:15:49 INFO mapreduce.Job:  map 100% reduce 71%
      2024-04-22 10:16:29,396|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|24/04/22 10:16:29 INFO mapreduce.Job: Task Id : attempt_1713778160624_0007_r_000072_0, Status : FAILED
      2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Error: org.apache.hadoop.security.token.SecretManager$InvalidToken: Tampered/Invalid token.
      2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      2024-04-22 10:16:29,434|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
      2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
      2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:253)
      2024-04-22 10:16:29,435|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:115)
      2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.<init>(BasicRootedOzoneClientAdapterImpl.java:201)
      2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.RootedOzoneClientAdapterImpl.<init>(RootedOzoneClientAdapterImpl.java:51)
      2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.RootedOzoneFileSystem.createAdapter(RootedOzoneFileSystem.java:111)
      2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.initialize(BasicRootedOzoneFileSystem.java:189)
      2024-04-22 10:16:29,436|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3451)
      2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:161)
      2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
      2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3503)
      2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:521)
      2024-04-22 10:16:29,437|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:269)
      2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
      2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at java.security.AccessController.doPrivileged(Native Method)
      2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at javax.security.auth.Subject.doAs(Subject.java:422)
      2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
      2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
      2024-04-22 10:16:29,438|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Tampered/Invalid token.
      2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1616)
      2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.Client.call(Client.java:1562)
      2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.Client.call(Client.java:1459)
      2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
      2024-04-22 10:16:29,439|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
      2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at com.sun.proxy.$Proxy17.submitRequest(Unknown Source)
      2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
      2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2024-04-22 10:16:29,440|INFO|Thread-37|machine.py:205 - run()||GUID=51e988c6-6805-43d1-9290-eb6f667ac2dd|at java.lang.reflect.Method.invoke(Method.java:498) 
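
Every line of the client-side log above carries a test-harness prefix (timestamp, level, thread, source location, GUID) ahead of the actual job output. The following is a minimal Python sketch for stripping that prefix so the raw MapReduce output and stack trace are easier to read. It assumes the pipe-delimited layout seen in this excerpt holds for all lines; the name `strip_harness_prefix` is hypothetical and not part of any tool mentioned here.

```python
import re

# Assumed harness prefix, inferred from the excerpt above:
# "<date> <time>,<ms>|<LEVEL>|<thread>|<file>:<line> - <func>()||GUID=<uuid>|"
_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"  # harness timestamp
    r"\|[A-Z]+"                                    # log level
    r"\|[^|]*"                                     # thread name
    r"\|[^|]*"                                     # source location
    r"\|\|GUID=[0-9a-f-]+\|"                       # empty field, then GUID
)


def strip_harness_prefix(line: str) -> str:
    """Return the line without the harness prefix.

    Lines that do not match the assumed layout are returned unchanged
    (apart from surrounding whitespace being trimmed).
    """
    return _PREFIX.sub("", line.strip(), count=1)
```

Applied to the failing-task line, this would leave only `Error: org.apache.hadoop.security.token.SecretManager$InvalidToken: Tampered/Invalid token.`.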

      Checking the leader OM logs shows the following:

      2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.133.64:46032:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
      2024-04-22 10:16:24,671 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.68.1:43592:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
      2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.170.2:41974:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
      2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.133.64:46020:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
      2024-04-22 10:16:24,672 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.11.131:50274:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.)
      2024-04-22 10:16:24,675 WARN [Socket Reader #1 for port 9862]-SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.140.11.131:50290:null (DIGEST-MD5: IO error acquiring password) with true cause: (OM:om102 is not the leader. Could not determine the leader node.) 
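
To check whether every rejected connection shares the same underlying cause, the OM log lines above can be grouped by their "true cause" text. The sketch below is a hedged illustration only: the regular expression is inferred from the six lines quoted here, not from a documented log format, and the names `parse_auth_failure` and `summarize_causes` are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed layout of the SecurityLogger "Auth failed" line quoted above.
_AUTH_FAILED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN .*?"
    r"Auth failed for (?P<client>\d+\.\d+\.\d+\.\d+):(?P<port>\d+):\S* "
    r"\((?P<mechanism>[^:]+): (?P<error>[^)]*)\) "
    r"with true cause: \((?P<cause>.*)\)\s*$"
)


def parse_auth_failure(line: str) -> Optional[Dict[str, str]]:
    """Parse one OM log line; return None if it is not an auth failure."""
    match = _AUTH_FAILED.match(line.strip())
    return match.groupdict() if match else None


def summarize_causes(lines: Iterable[str]) -> Counter:
    """Count auth failures per 'true cause' string."""
    causes = Counter()
    for line in lines:
        parsed = parse_auth_failure(line)
        if parsed is not None:
            causes[parsed["cause"]] += 1
    return causes
```

If all failures map to the single cause `OM:om102 is not the leader. Could not determine the leader node.`, that would suggest the client-side "Tampered/Invalid token" error is raised because the OM could not resolve a leader while looking up the token password, rather than because the token itself was altered.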

          People

            Assignee: Siyao Meng (smeng)
            Reporter: Pratyush Bhatt (pratyush.bhatt)