Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-1753

Datanode unable to find chunk while replication data using ratis.

    XMLWordPrintableJSON

Details

    Description

      Leader datanode is unable to read chunk from the datanode while replicating data from leader to follower.
      Please note that deletion of keys is also happening while the data is being replicated.

      2019-07-02 19:39:22,604 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1
      014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
      2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3
      -4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
      2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already h
      as the append entries (first index: 1)
      2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1
      014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
      2019-07-02 19:39:22,605 INFO  keyvalue.KeyValueHandler (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the c
      hunk file. chunk info ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
      2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot (9770) already h
      as the append entries (first index: 2)
      2019-07-02 19:39:22,606 INFO  impl.RaftServerImpl (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. Reply:76a3eb0f-d7cd-477b-8973-db1
      014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
      19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | ret=FAILURE
      java.lang.Exception: Unable to find the chunk file. chunk info ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:476) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$getCachedStateMachineData$2(ContainerStateMachine.java:495) ~[hadoop-hdds-container-service-0.5.0-SN
      APSHOT.jar:?]
              at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) ~[guava-11.0.2.jar:?]
              at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) ~[guava-11.0.2.jar:?]
              at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) ~[guava-11.0.2.jar:?]
              at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) ~[guava-11.0.2.jar:?]
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) ~[guava-11.0.2.jar:?]
              at com.google.common.cache.LocalCache.get(LocalCache.java:3965) ~[guava-11.0.2.jar:?]
              at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) ~[guava-11.0.2.jar:?]
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.getCachedStateMachineData(ContainerStateMachine.java:494) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.ja
      r:?]
              at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$4(ContainerStateMachine.java:542) ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) [?:1.8.0_171]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
      

      Attachments

        1. HDDS-1753.000.patch
          33 kB
          Shashikant Banerjee

        Issue Links

          Activity

            People

              shashikant Shashikant Banerjee
              msingh Mukul Kumar Singh
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 5.5h
                  5.5h