Apache Ozone / HDDS-7593 Supporting HSync and lease recovery / HDDS-10497

[hsync] Refresh block token immediately if block token expires

    Description

      HDDS-9734 and HDDS-7930 improved error handling when an input stream fails to read because of an expired block token. However, the stream refreshes the block token only after retrying every datanode in the pipeline, which adds log spew and also increases 99.9th-percentile tail latency.

      The input stream should request a new block token immediately after it hits an expired block token, instead of first trying the remaining datanodes with the same expired token.
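
      A minimal sketch of the proposed retry order, to make the intent concrete. This is not the Ozone implementation: `TokenRefreshSketch`, `ExpiredTokenException`, `ChunkReader`, and `readWithImmediateRefresh` are hypothetical names, tokens and datanodes are plain strings, and the refresh callback stands in for whatever the client does to obtain a new block token.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative only; none of these types exist in Ozone. */
final class TokenRefreshSketch {

  /** Hypothetical stand-in for a BLOCK_TOKEN_VERIFICATION_FAILED response. */
  static class ExpiredTokenException extends IOException {
    ExpiredTokenException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical stand-in for a ReadChunk call against one datanode. */
  interface ChunkReader {
    byte[] read(String datanode, String token) throws IOException;
  }

  static byte[] readWithImmediateRefresh(List<String> datanodes,
      String initialToken, Supplier<String> refreshToken, ChunkReader reader)
      throws IOException {
    String token = initialToken;
    boolean refreshed = false;
    IOException last = null;
    int i = 0;
    while (i < datanodes.size()) {
      try {
        return reader.read(datanodes.get(i), token);
      } catch (ExpiredTokenException e) {
        if (refreshed) {
          // The fresh token was rejected too; another datanode will not help.
          throw e;
        }
        // Refresh right away and retry the SAME datanode, rather than
        // sending the expired token to the rest of the pipeline.
        token = refreshToken.get();
        refreshed = true;
      } catch (IOException e) {
        // Any other failure: fall back to the next datanode.
        last = e;
        i++;
      }
    }
    throw last != null ? last : new IOException("no datanodes available");
  }

  public static void main(String[] args) throws IOException {
    int[] attempts = {0};
    ChunkReader reader = (dn, tok) -> {
      attempts[0]++;
      if (!"fresh".equals(tok)) {
        throw new ExpiredTokenException("Expired token");
      }
      return new byte[] {42};
    };
    byte[] data = readWithImmediateRefresh(
        List.of("dn1", "dn2", "dn3"), "stale", () -> "fresh", reader);
    // One failed attempt with the stale token, one success after refresh.
    System.out.println("attempts=" + attempts[0] + " bytes=" + data.length);
  }
}
```

      In this sketch an expired token costs one extra round trip and produces one failure, whereas sending the same expired token to all three datanodes first would produce a failure per datanode, as in the logs below.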

      Relevant logs:

      2024-03-08 23:03:20,109 WARN org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read chunk 113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 113750153625603061 bcsId: 129941 from 5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133); will try another datanode.
      org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase (auth:SIMPLE)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
              at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
              at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
              at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:425)
              at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkDataIntoBuffers(ChunkInputStream.java:402)
              at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:387)
              at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:319)
              at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:173)
              at org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
              at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:367)
      ...
      
      2024-03-08 23:03:20,112 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 04646212-c013-4f8c-9ada-80580c189135, Nodes: 5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20), ReplicationConfig: STANDALONE/THREE, State:OPEN, leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc, CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].
      
      2024-03-08 23:03:20,113 WARN org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read chunk 113750153625603061_chunk_1 (len=1048576) conID: 4 locID: 113750153625603061 bcsId: 129941 from 98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18); will try another datanode.
      org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase (auth:SIMPLE)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
              at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
              at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
              at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
              at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
      ...
      2024-03-08 23:03:20,116 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 04646212-c013-4f8c-9ada-80580c189135, Nodes: 5fa1d092-1f11-4f6e-af4a-cf2785a8cae4(ccycloud-1.weichiu-hbase.root.comops.site/10.140.131.133)98e5528d-c790-465e-91e0-f47d4cabe3bc(ccycloud-3.weichiu-hbase.root.comops.site/10.140.103.18)0238996a-9361-4b83-aaa8-e99fd9523ad0(ccycloud-2.weichiu-hbase.root.comops.site/10.140.135.20), ReplicationConfig: STANDALONE/THREE, State:OPEN, leaderId:98e5528d-c790-465e-91e0-f47d4cabe3bc, CreationTimestamp2024-03-08T18:59:15.755Z[UTC]].
      2024-03-08 23:03:20,390 INFO org.apache.hadoop.hdds.scm.storage.BlockInputStream: Unable to read information for block conID: 3 locID: 113750153625603098 bcsId: 459126 from pipeline PipelineID=eb1d2690-75a6-48d7-9eec-60675b907fc0: BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase (auth:SIMPLE)
      

            People

              weichiu Wei-Chiu Chuang