Apache Ozone / HDDS-6061

Peer datanode cannot add group for pipeline in secure env


Details

    Description

      Extracted from HDDS-3907.

      Secure acceptance tests intermittently fail at test cases where data is being written.

      See https://github.com/elek/ozone-build-results/tree/master/2021/08/19/9810/acceptance-secure for logs.

      robot log.html
      07:19:23.258	INFO	Running command 'ozone freon randomkeys --num-of-volumes 5 --num-of-buckets 5 --num-of-keys 5 --num-of-threads 1 --replication-type RATIS --factor THREE --validate-writes 2>&1'.	
      07:24:23.225	FAIL	Test timeout 5 minutes exceeded.
      
      datanode_3  | 2021-08-19 05:20:09,598 [java.util.concurrent.ThreadPoolExecutor$Worker@5f5ccab7[State = -1, empty queue]] WARN server.GrpcLogAppender: 1c7f86b2-ded3-441b-9f20-84ba3ff60d2d@group-74FBCD15D899->25dd9de7-1caa-448d-a35a-2b29afced1cc-GrpcLogAppender:  appendEntries Timeout, request=AppendEntriesRequest:cid=8,entriesCount=1,lastEntry=(t:3, i:0)
      ...
      datanode_3  | 2021-08-19 05:23:56,577 [Thread-181] INFO client.GrpcClientProtocolService: Failed RaftClientRequest:client-14C4D4C86555->1c7f86b2-ded3-441b-9f20-84ba3ff60d2d@group-74FBCD15D899, cid=102, seq=0, Watch-ALL_COMMITTED(131), Message:<EMPTY>, reply=RaftClientReply:client-14C4D4C86555->1c7f86b2-ded3-441b-9f20-84ba3ff60d2d@group-74FBCD15D899, cid=102, FAILED org.apache.ratis.protocol.exceptions.NotReplicatedException: Request with call Id 102 and log index 131 is not yet replicated to ALL_COMMITTED, logIndex=131, commits[1c7f86b2-ded3-441b-9f20-84ba3ff60d2d:c132, 64230e6f-d613-4ced-8084-22c404c29d15:c132, 25dd9de7-1caa-448d-a35a-2b29afced1cc:c127]
      
      datanode_2  | 2021-08-19 05:18:42,242 [Command processor thread] WARN commandhandler.CreatePipelineCommandHandler: Add group failed for 1c7f86b2-ded3-441b-9f20-84ba3ff60d2d{ip: 172.18.0.9, host: ozonesecure_datanode_3.ozonesecure_default, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}
      datanode_2  | java.io.IOException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
      
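To make the datanode_3 excerpt easier to read, the following is a minimal sketch (not part of Ozone or Ratis) that parses the NotReplicatedException message quoted above and lists the peers whose commit index is still below the watched log index. The helper name `lagging_peers` is hypothetical, and the comparison assumes that an ALL_COMMITTED watch completes only once every peer's commit index has reached the log index.

```python
import re

# Excerpt of the NotReplicatedException message from the datanode_3 log above.
MESSAGE = (
    "Request with call Id 102 and log index 131 is not yet replicated to "
    "ALL_COMMITTED, logIndex=131, commits["
    "1c7f86b2-ded3-441b-9f20-84ba3ff60d2d:c132, "
    "64230e6f-d613-4ced-8084-22c404c29d15:c132, "
    "25dd9de7-1caa-448d-a35a-2b29afced1cc:c127]"
)


def lagging_peers(message):
    """Return {peer_id: commit_index} for peers whose commit index is below logIndex.

    Hypothetical helper for reading the log line; it only understands the
    'logIndex=N, commits[id:cN, ...]' shape shown in this issue.
    """
    log_index = int(re.search(r"logIndex=(\d+)", message).group(1))
    commits = re.search(r"commits\[(.*?)\]", message).group(1)
    peers = re.findall(r"([0-9a-f-]{36}):c(\d+)", commits)
    return {peer: int(commit) for peer, commit in peers if int(commit) < log_index}


print(lagging_peers(MESSAGE))
```

For the message above, the only peer reported is 25dd9de7-1caa-448d-a35a-2b29afced1cc at commit index 127, which is the same peer targeted by the appendEntries timeout in the GrpcLogAppender warning.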

            People

              adoroszlai Attila Doroszlai