Apache Ozone / HDDS-10652

[Upgrade][EC] Reconstruction failing with "java.io.IOException: None of the block data have checksum"


Details

    Description

      Upgrade versions:
      Pre-upgrade hash: https://github.com/apache/ozone/commit/6ee6c357678676661ebb3181a56622c79b487bc1
      Post-upgrade hash: https://github.com/apache/ozone/commit/46b6f3def1d84ca769affb4d3f0d84dece6e8567

      Scenario:
      Before the upgrade, write a 5 GB EC file (RS-3-2-1024k policy in this case). After the upgrade, shut down either two parity nodes (as in this case) or two data nodes, since the policy tolerates the failure of up to two DNs. Then check whether reconstruction happens after some time.
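
For readers less familiar with EC policy names, the sketch below shows how a name like `rs-3-2-1024k` relates to the `data`, `parity`, `ecChunkSize` and `requiredNodes` fields in the container info further down, and why losing any two of the five replicas should still be recoverable. `EcPolicySketch` and its methods are hypothetical illustrations, not Ozone's `ECReplicationConfig` API, and the name format `<codec>-<data>-<parity>-<size>k` is an assumption.

```java
import java.util.Locale;

// Hypothetical helper (not Ozone's ECReplicationConfig): parses names like "rs-3-2-1024k".
public final class EcPolicySketch {
    final String codec;
    final int data;
    final int parity;
    final int chunkSizeBytes;

    EcPolicySketch(String codec, int data, int parity, int chunkSizeBytes) {
        this.codec = codec;
        this.data = data;
        this.parity = parity;
        this.chunkSizeBytes = chunkSizeBytes;
    }

    // Assumed format: <codec>-<data>-<parity>-<size>k, case-insensitive.
    static EcPolicySketch parse(String name) {
        String[] parts = name.toLowerCase(Locale.ROOT).split("-");
        if (parts.length != 4 || !parts[3].endsWith("k")) {
            throw new IllegalArgumentException("Unexpected policy name: " + name);
        }
        int kib = Integer.parseInt(parts[3].substring(0, parts[3].length() - 1));
        return new EcPolicySketch(parts[0].toUpperCase(Locale.ROOT),
                Integer.parseInt(parts[1]), Integer.parseInt(parts[2]), kib * 1024);
    }

    // Each replica index 1..data+parity needs its own datanode.
    int requiredNodes() {
        return data + parity;
    }

    // Indexes 1..data hold data chunks; data+1..data+parity hold parity chunks.
    boolean isParityIndex(int replicaIndex) {
        return replicaIndex > data && replicaIndex <= data + parity;
    }

    // Reed-Solomon can rebuild the group from any `data` surviving replicas.
    boolean canReconstruct(int survivingReplicas) {
        return survivingReplicas >= data;
    }

    public static void main(String[] args) {
        EcPolicySketch p = parse("RS-3-2-1024K");
        // Prints: RS data=3 parity=2 ecChunkSize=1048576 requiredNodes=5
        System.out.println(p.codec + " data=" + p.data + " parity=" + p.parity
                + " ecChunkSize=" + p.chunkSizeBytes + " requiredNodes=" + p.requiredNodes());
        System.out.println("index 4 is parity: " + p.isParityIndex(4)); // true
        System.out.println("3 survivors enough: " + p.canReconstruct(3)); // true
    }
}
```

With three of five replicas surviving, `canReconstruct(3)` holds, which is why reconstruction is expected to succeed in this scenario.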

      Observed Behavior:
      1. Data was successfully written pre-upgrade using Freon. 
      File name: o3://ozone1711558189/ec-construct-vol/ec-construct-buck/ec-construction/0
      2. After the upgrade, stop two of the DNs; in this case, the parity nodes of one of the containers storing the above file's data.

      ozone admin container info 1004 --json
      2024-03-27 21:35:15,065|INFO|MainThread|machine.py:232 - run()||GUID=183f2d10-e3a7-407f-adb5-b87f3e3af53b|Exit Code: 0
      2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:723 - find_ec_data_parity_hosts()|parity hosts: ['DN-4', 'DN-3']
      2024-03-27 21:35:15,098|INFO|MainThread|ozone.py:724 - find_ec_data_parity_hosts()|data hosts: ['DN-8', 'DN-5', 'DN-1'] 
      2024-03-27 21:35:15,311|INFO|MainThread|cm_apilib.py:1214 - stopComponent()|Initiating stop of OZONE_DATANODE at host DN-4
      2024-03-27 21:35:15,349|INFO|MainThread|cm_apilib.py:1218 - stopComponent()|Command name = Stop , ID = 2860  
      2024-03-27 21:35:15,580|INFO|MainThread|cm_apilib.py:1214 - stopComponent()|Initiating stop of OZONE_DATANODE at host DN-3
      2024-03-27 21:35:15,609|INFO|MainThread|cm_apilib.py:1218 - stopComponent()|Command name = Stop , ID = 2862  

      Nodes DN-3 and DN-4 are stopped.

      3. Read the file's data (online reconstruction) and compute its checksum: the checksum matched.
      4. Wait for reconstruction to happen. The test waited 20 minutes, but even then only 3 DNs held replicas of the container:

      ['DN-5', 'DN-1', 'DN-8']

      In fact, even after 10 hours (at the time of writing), there are still only 3 DNs:

      date
      Thu Mar 28 08:39:16 UTC 2024
      ozone admin container info 1004 --json
      {
        "containerInfo" : {
          "state" : "CLOSED",
          "stateEnterTime" : "2024-03-27T18:43:51.934Z",
          "replicationConfig" : {
            "data" : 3,
            "parity" : 2,
            "ecChunkSize" : 1048576,
            "codec" : "RS",
            "requiredNodes" : 5,
            "replicationType" : "EC"
          },
          "usedBytes" : 1342177280,
          "numberOfKeys" : 5,
          "lastUsed" : "2024-03-28T08:39:24.535189Z",
          "owner" : "om1",
          "containerID" : 1004,
          "deleteTransactionId" : 0,
          "sequenceId" : 0,
          "deleted" : false,
          "open" : false
        },
        "pipeline" : {
          "id" : {
            "id" : "73532c14-40ac-4924-9353-2f18ab0d63f2"
          },
          "replicationConfig" : {
            "data" : 3,
            "parity" : 2,
            "ecChunkSize" : 1048576,
            "codec" : "RS",
            "requiredNodes" : 5,
            "replicationType" : "EC"
          },
          "nodesInOrder" : [ {
            "level" : 0,
            "cost" : 0,
            "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "ipAddress" : "10.140.37.12",
            "hostName" : "DN-5",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -662262523,
            "networkLocation" : "/default",
            "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
            "numOfLeaves" : 1
          }, {
            "level" : 0,
            "cost" : 0,
            "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "ipAddress" : "10.140.40.9",
            "hostName" : "DN-1",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -1387859873,
            "networkLocation" : "/default",
            "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "numOfLeaves" : 1
          }, {
            "level" : 0,
            "cost" : 0,
            "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "ipAddress" : "10.140.137.128",
            "hostName" : "DN-8",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : 1098159392,
            "networkLocation" : "/default",
            "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "numOfLeaves" : 1
          } ],
          "creationTimestamp" : "2024-03-28T08:39:24.480Z",
          "stateEnterTime" : "2024-03-28T08:39:24.545517Z",
          "leaderNode" : {
            "level" : 0,
            "cost" : 0,
            "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "ipAddress" : "10.140.37.12",
            "hostName" : "DN-5",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -662262523,
            "networkLocation" : "/default",
            "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
            "numOfLeaves" : 1
          },
          "firstNode" : {
            "level" : 0,
            "cost" : 0,
            "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "ipAddress" : "10.140.37.12",
            "hostName" : "DN-5",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -662262523,
            "networkLocation" : "/default",
            "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
            "numOfLeaves" : 1
          },
          "closestNode" : {
            "level" : 0,
            "cost" : 0,
            "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "ipAddress" : "10.140.37.12",
            "hostName" : "DN-5",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -662262523,
            "networkLocation" : "/default",
            "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
            "numOfLeaves" : 1
          },
          "allocationTimeout" : false,
          "healthy" : true,
          "pipelineState" : "ALLOCATED",
          "nodes" : [ {
            "level" : 0,
            "cost" : 0,
            "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "ipAddress" : "10.140.37.12",
            "hostName" : "DN-5",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -662262523,
            "networkLocation" : "/default",
            "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
            "numOfLeaves" : 1
          }, {
            "level" : 0,
            "cost" : 0,
            "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "ipAddress" : "10.140.40.9",
            "hostName" : "DN-1",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -1387859873,
            "networkLocation" : "/default",
            "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "numOfLeaves" : 1
          }, {
            "level" : 0,
            "cost" : 0,
            "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "ipAddress" : "10.140.137.128",
            "hostName" : "DN-8",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : 1098159392,
            "networkLocation" : "/default",
            "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "numOfLeaves" : 1
          } ],
          "empty" : false,
          "type" : "EC"
        },
        "replicas" : [ {
          "containerID" : 1004,
          "state" : "CLOSED",
          "datanodeDetails" : {
            "level" : 0,
            "cost" : 0,
            "uuid" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "uuidString" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "ipAddress" : "10.140.37.12",
            "hostName" : "DN-5",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -662262523,
            "networkLocation" : "/default",
            "networkName" : "6179347f-5824-41d4-b722-f1dbc5f14880",
            "networkFullPath" : "/default/6179347f-5824-41d4-b722-f1dbc5f14880",
            "numOfLeaves" : 1
          },
          "placeOfBirth" : "6179347f-5824-41d4-b722-f1dbc5f14880",
          "sequenceId" : 0,
          "keyCount" : 5,
          "bytesUsed" : 1342177280,
          "replicaIndex" : 2
        }, {
          "containerID" : 1004,
          "state" : "CLOSED",
          "datanodeDetails" : {
            "level" : 0,
            "cost" : 0,
            "uuid" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "uuidString" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "ipAddress" : "10.140.40.9",
            "hostName" : "DN-1",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : -1387859873,
            "networkLocation" : "/default",
            "networkName" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "networkFullPath" : "/default/d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
            "numOfLeaves" : 1
          },
          "placeOfBirth" : "d8afb52b-5f4c-4d94-9286-7c3cfd6c315c",
          "sequenceId" : 0,
          "keyCount" : 5,
          "bytesUsed" : 1342177280,
          "replicaIndex" : 3
        }, {
          "containerID" : 1004,
          "state" : "CLOSED",
          "datanodeDetails" : {
            "level" : 0,
            "cost" : 0,
            "uuid" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "uuidString" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "ipAddress" : "10.140.137.128",
            "hostName" : "DN-8",
            "ports" : [ {
              "name" : "HTTPS",
              "value" : 9883
            }, {
              "name" : "CLIENT_RPC",
              "value" : 9864
            }, {
              "name" : "REPLICATION",
              "value" : 9886
            }, {
              "name" : "RATIS",
              "value" : 9858
            }, {
              "name" : "RATIS_ADMIN",
              "value" : 9857
            }, {
              "name" : "RATIS_SERVER",
              "value" : 9856
            }, {
              "name" : "STANDALONE",
              "value" : 9859
            } ],
            "setupTime" : 0,
            "persistedOpState" : "IN_SERVICE",
            "persistedOpStateExpiryEpochSec" : 0,
            "initialVersion" : 0,
            "currentVersion" : 1,
            "decommissioned" : false,
            "maintenance" : false,
            "signature" : 1098159392,
            "networkLocation" : "/default",
            "networkName" : "ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "networkFullPath" : "/default/ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e",
            "numOfLeaves" : 1
          },
          "placeOfBirth" : "711656cf-a99e-4b2c-8c35-f015ee94889c",
          "sequenceId" : 0,
          "keyCount" : 5,
          "bytesUsed" : 1342177280,
          "replicaIndex" : 1
        } ]
      } 
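
The SCM command below lists `missingIndexes: [4, 5]`. Given the `replicaIndex` values reported above (2, 3 and 1) and a 3+2 policy, that set is the indexes 1..5 for which no live replica is reported. A minimal sketch of that set difference; `MissingIndexSketch.missingIndexes` is a hypothetical name, not the Replication Manager's code, which likely also weighs other factors (for example replica state or nodes being decommissioned).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: which replica indexes of an EC group have no live replica?
public final class MissingIndexSketch {

    static List<Integer> missingIndexes(int data, int parity, List<Integer> liveReplicaIndexes) {
        Set<Integer> present = new HashSet<>(liveReplicaIndexes);
        List<Integer> missing = new ArrayList<>();
        // Replica indexes are 1-based and run from 1 to data + parity.
        for (int i = 1; i <= data + parity; i++) {
            if (!present.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // replicaIndex values of container 1004 on DN-5, DN-1 and DN-8.
        List<Integer> live = List.of(2, 3, 1);
        System.out.println(missingIndexes(3, 2, live)); // [4, 5]
    }
}
```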

      The SCM logs show that it is still sending reconstructECContainersCommand:

      2024-03-28 08:36:56,748 INFO [Under Replicated Processor]-org.apache.hadoop.hdds.scm.container.replication.ReplicationManager: Sending command [reconstructECContainersCommand: containerID: 1004, replicationConfig: EC{rs-3-2-1024k}, sources: [ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128) replicaIndex: 1, 6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12) replicaIndex: 2, d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9) replicaIndex: 3], targets: [572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130), 711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)], missingIndexes: [4, 5]] for container ContainerInfo{id=#1004, state=CLOSED, stateEnterTime=2024-03-27T18:43:51.934Z, pipelineID=PipelineID=53f5587f-9e6c-465d-a0cb-b82d10c227d3, owner=om1} to 572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130) with datanode deadline 1711615886747 and scm deadline 1711615916747 
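
The two deadlines in the command are epoch timestamps in milliseconds. The sketch below converts them with `java.time.Instant` so they can be compared with the log timestamps; it assumes those timestamps are in UTC, as the `date` output above suggests for this host. Under that assumption, the datanode deadline falls about 14.5 minutes after the command was sent, and the SCM deadline is 30 seconds after the datanode deadline.

```java
import java.time.Duration;
import java.time.Instant;

public final class DeadlineSketch {
    public static void main(String[] args) {
        // Values copied from the SCM log line above (epoch milliseconds).
        Instant datanodeDeadline = Instant.ofEpochMilli(1711615886747L);
        Instant scmDeadline = Instant.ofEpochMilli(1711615916747L);
        // Timestamp of the "Sending command" log line, assumed to be UTC.
        Instant sentAt = Instant.parse("2024-03-28T08:36:56.748Z");

        System.out.println(datanodeDeadline); // 2024-03-28T08:51:26.747Z
        System.out.println(scmDeadline);      // 2024-03-28T08:51:56.747Z
        System.out.println(Duration.between(sentAt, datanodeDeadline).toMillis()
                + " ms from send to datanode deadline");  // 869999 ms ...
        System.out.println(Duration.between(datanodeDeadline, scmDeadline).toMillis()
                + " ms between the two deadlines");       // 30000 ms ...
    }
}
```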

      One of the target DNs, DN-7, logs the following warnings:

      2024-03-28 08:37:14,982 WARN [ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask: FAILED reconstructECContainersCommand: containerID=1004, replication=rs-3-2-1024k, missingIndexes=[4, 5], sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128), 2=6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12), 3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9)}, targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130), 5=711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)} after 10639 ms
      java.io.IOException: None of the block data have checksum which means 2(parity)+1 blocks are not present
              at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:156)
              at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECBlockGroup(ECReconstructionCoordinator.java:325)
              at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinator.reconstructECContainerGroup(ECReconstructionCoordinator.java:171)
              at org.apache.hadoop.ozone.container.ec.reconstruction.ECReconstructionCoordinatorTask.runTask(ECReconstructionCoordinatorTask.java:68)
              at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:359)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:834)
      2024-03-28 08:37:14,982 WARN [ContainerReplicationThread-5]-org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: Failed FAILED reconstructECContainersCommand: containerID=1004, replication=rs-3-2-1024k, missingIndexes=[4, 5], sources={1=ef7ae3e9-5ec3-49d6-9b93-1c687009bc1e(DN-8/10.140.137.128), 2=6179347f-5824-41d4-b722-f1dbc5f14880(DN-5/10.140.37.12), 3=d8afb52b-5f4c-4d94-9286-7c3cfd6c315c(DN-1/10.140.40.9)}, targets={4=572ed33d-a834-4d80-be35-7b1b19c8bd74(DN-7/10.140.234.130), 5=711656cf-a99e-4b2c-8c35-f015ee94889c(DN-2/10.140.45.129)} 
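
The exception text implies that, when the reconstructed blocks are written, the code looks for checksum information among the available block data and gives up when none of it carries any. The sketch below illustrates that kind of guard so the failure mode is easier to reason about. It is an assumption based only on the message, not the actual `ECBlockOutputStream.executePutBlock` code, and the `BlockDataStub` type and `firstWithChecksum` method are hypothetical.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of the guard implied by the error message; not Ozone code.
public final class ChecksumGuardSketch {

    // Stand-in for per-replica block metadata; `checksum` is null when none was stored.
    record BlockDataStub(int replicaIndex, byte[] checksum) { }

    static BlockDataStub firstWithChecksum(List<BlockDataStub> blockData, int parity) throws IOException {
        Optional<BlockDataStub> found = blockData.stream()
                .filter(b -> b.checksum() != null && b.checksum().length > 0)
                .findFirst();
        if (found.isEmpty()) {
            // Same wording as the DN-7 log; `parity` fills the "2(parity)" part.
            throw new IOException("None of the block data have checksum which means "
                    + parity + "(parity)+1 blocks are not present");
        }
        return found.get();
    }

    public static void main(String[] args) {
        // Assumed situation: three surviving replicas, none carrying checksum data.
        List<BlockDataStub> survivors = List.of(
                new BlockDataStub(1, null), new BlockDataStub(2, null), new BlockDataStub(3, null));
        try {
            firstWithChecksum(survivors, 2);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this report replica index 1 is still present on DN-8, yet the guard fires, which is what makes the failure look like a bug rather than genuine data loss.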

      Expected Behavior: Reconstruction should have succeeded, restoring the missing replicas (indexes 4 and 5) on the target DNs.

      Note: This issue is reproducible almost every time.

      cc: siddhant 


            People

              Siddhant Sangwan (siddhant)
              Pratyush Bhatt (pratyush.bhatt)
              Votes: 0
              Watchers: 5
