Hadoop Distributed Data Store
  HDDS-3293

read operation failing when two container replicas are corrupted




      Steps taken:

      1) Mounted the noise-injection FUSE on all datanodes.

      2) Wrote a key (multiple blocks).

      3) Selected one of the container IDs and injected errors on 2 container replicas for that container ID.

      4) Ran the GET key operation.

      The GET key operation fails intermittently.
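For context, the scenario above appears to expect that a read still succeeds when two of the three replicas are corrupted, by falling back to the remaining healthy one. The following is a minimal Python sketch of that idea, not Ozone's actual client code: the names `crc32_per_slice`, `read_with_failover`, `ChecksumMismatch`, and the node labels `dn1` to `dn3` are all hypothetical, and the data and slice size are toy values.

```python
import zlib


class ChecksumMismatch(Exception):
    """Raised when no replica passes checksum verification."""


def crc32_per_slice(data, bytes_per_checksum):
    """Return one CRC32 per bytes_per_checksum-sized slice of data."""
    return [
        zlib.crc32(data[i:i + bytes_per_checksum])
        for i in range(0, len(data), bytes_per_checksum)
    ]


def read_with_failover(replicas, expected, bytes_per_checksum):
    """Return (node, data) for the first replica whose checksums match.

    replicas is a list of (node_name, data) pairs, tried in order.
    """
    rejected = []
    for node, data in replicas:
        if crc32_per_slice(data, bytes_per_checksum) == expected:
            return node, data
        rejected.append(node)
    raise ChecksumMismatch(f"no valid replica; rejected {rejected}")


# Hypothetical scenario: three replicas, two of them corrupted.
good = b"ozone" * 1000
corrupted = bytearray(good)
corrupted[10] ^= 0xFF  # flip one byte, as the noise injection might
expected = crc32_per_slice(good, 1024)

node, data = read_with_failover(
    [("dn1", bytes(corrupted)), ("dn2", bytes(corrupted)), ("dn3", good)],
    expected,
    1024,
)
print(node)
```

In this toy run the first two replicas are rejected and the third is returned. The log in this report shows a DEADLINE_EXCEEDED timeout rather than a checksum error, so the sketch illustrates only the expected fallback and does not model the actual failure.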

      Error seen:



      20/03/27 18:30:40 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
      20/03/27 18:30:40 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
      20/03/27 18:30:40 INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
      20/03/27 18:31:12 ERROR scm.XceiverClientGrpc: Failed to execute command cmdType: ReadChunk
      traceID: "f80a51eaec481a1c:cbb8e92869015a53:f80a51eaec481a1c:0"
      containerID: 67
      datanodeUuid: "96101390-2446-40e6-a54e-36e170497e57"
      readChunk {
      blockID {
      containerID: 67
      localID: 103896435892617248
      blockCommitSequenceId: 1010
      }
      chunkData {
      chunkName: "103896435892617248_chunk_28"
      offset: 113246208
      len: 4194304
      checksumData {
      type: CRC32
      bytesPerChecksum: 1048576
      checksums: "\034\376\313\031"
      checksums: ";U\225\037"
      checksums: "\327m\332."
      checksums: "|\307\004E"
      }
      }
      }
      on the pipeline Pipeline[ Id: bce6316c-9690-452b-80e3-0f3590533444, Nodes: 96101390-2446-40e6-a54e-36e170497e57{ip:, host: quasar-olrywk-3.quasar-olrywk.root.hwx.site, networkLocation: /default-rack, certSerialId: null}3e85204d-2399-43b5-952a-55b837eb4c1d{ip:, host: quasar-olrywk-1.quasar-olrywk.root.hwx.site, networkLocation: /default-rack, certSerialId: null}5af0340a-6fee-4ce8-9f68-37fa35566a5a{ip:, host: quasar-olrywk-9.quasar-olrywk.root.hwx.site, networkLocation: /default-rack, certSerialId: null}, Type:STAND_ALONE, Factor:THREE, State:OPEN, leaderId:96101390-2446-40e6-a54e-36e170497e57, CreationTimestamp2020-03-27T03:36:51.880Z].
      Unexpected OzoneException: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 84603913ns. [remote_addr=/]]
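As a reading aid for the ReadChunk request in the log above, the short Python sketch below checks that its fields are internally consistent. The numbers and checksum strings are copied from the log. The variable names are hypothetical, and the chunk index calculation assumes every earlier chunk in the block has the same length, which the log does not confirm.

```python
# Values copied from the ReadChunk request in the log.
offset = 113246208
length = 4194304
bytes_per_checksum = 1048576

# Checksum strings from the log, written as Python bytes literals
# (the octal escapes are the same in both notations).
raw_checksums = [
    b"\034\376\313\031",
    b";U\225\037",
    b"\327m\332.",
    b"|\307\004E",
]

# Zero-based position of this chunk, ASSUMING uniform chunk length.
chunk_index = offset // length

# One checksum is expected per bytes_per_checksum slice (ceiling division).
expected_count = -(-length // bytes_per_checksum)

# Each CRC32 value should be 4 bytes long.
checksum_hex = [c.hex() for c in raw_checksums]

print(chunk_index, expected_count, checksum_hex)
```

Under that assumption the chunk index is 27, which is consistent with the name `chunk_28` if chunk names are numbered from 1, and the four 4-byte checksums match a 4 MiB chunk with 1 MiB per checksum. The request itself therefore looks well-formed.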





              Assignee:
                Shashikant Banerjee
                Nilotpal Nandi
