  1. Apache Ozone
  2. HDDS-5548

Keep downloaded container .tar.gz file for debugging purposes


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      There are many container import failures in the production logs, for example:

      2021-08-03 21:48:12,311 [ContainerReplicationThread-9] INFO org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Starting replication of container 66315 from [4e613295-6d55-4bf9-bdc9-1668fd24741c{ip: 11.61.44.244, host: 11.61.44.244, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, STANDALONE=9859], networkLocation: /rack582702, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, 7694e208-c887-4d8e-b249-28a176b4d7b7{ip: 11.61.45.38, host: 11.61.45.38, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, STANDALONE=9859], networkLocation: /rack582788, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}]
      2021-08-03 21:48:17,462 [grpc-default-executor-12557] INFO org.apache.hadoop.ozone.container.replication.GrpcReplicationClient: Container 66315 is downloaded to /data/ozoneadmin/ozoneenv/ozone-temp/container-66315.tar.gz
      2021-08-03 21:48:17,462 [ContainerReplicationThread-9] INFO org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Container 66315 is downloaded with size 6154503, starting to import.
      2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator: Container 66315 replication was unsuccessful.
      java.io.IOException: Container descriptor is missing from the container archive.
      at org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.unpackContainerDescriptor(TarContainerPacker.java:190)
      at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.importContainer(DownloadAndImportReplicator.java:76)
      at org.apache.hadoop.ozone.container.replication.DownloadAndImportReplicator.replicate(DownloadAndImportReplicator.java:125)
      at org.apache.hadoop.ozone.container.replication.MeasuredReplicator.replicate(MeasuredReplicator.java:69)
      at org.apache.hadoop.ozone.container.replication.ReplicationSupervisor$TaskRunner.run(ReplicationSupervisor.java:139)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      2021-08-03 21:48:17,582 [ContainerReplicationThread-9] ERROR org.apache.hadoop.ozone.container.replication.ReplicationSupervisor: Container 66315 can't be downloaded from any of the datanodes.

      In the above case, container 66315 on the source datanode does have the container descriptor on disk, so the root cause of this error is still unclear.
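One way to narrow this down, once a downloaded archive is available, is to list its entries and check whether the descriptor is really absent. The following is a minimal sketch using only the JDK, not the Ozone code path (the stack trace shows that TarContainerPacker does the real unpacking). The class name `TarGzLister` and the entry name passed to `containsEntry` are assumptions made for illustration. The reader only handles plain ustar-style headers: GNU long-name entries, pax extended headers, and base-256 sizes are not interpreted.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical debugging helper; not part of Apache Ozone.
final class TarGzLister {
  private static final int BLOCK = 512; // tar headers and data are block-aligned

  /** Returns the entry names found in a gzip-compressed tar archive. */
  static List<String> listEntries(Path archive) throws IOException {
    List<String> names = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(
        new GZIPInputStream(Files.newInputStream(archive)))) {
      byte[] header = new byte[BLOCK];
      while (true) {
        try {
          in.readFully(header);
        } catch (EOFException e) {
          break; // empty or truncated archive: report what was seen so far
        }
        if (header[0] == 0) {
          break; // a header starting with NUL marks the end of the archive
        }
        names.add(field(header, 0, 100)); // entry name: bytes 0..99
        String sizeField = field(header, 124, 12).trim(); // octal size: bytes 124..135
        long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
        long padded = (size + BLOCK - 1) / BLOCK * BLOCK;
        skipFully(in, padded); // jump over the entry data to the next header
      }
    }
    return names;
  }

  /** True if the archive has an entry with exactly this name. */
  static boolean containsEntry(Path archive, String entryName) throws IOException {
    return listEntries(archive).contains(entryName);
  }

  // Decodes a NUL-terminated ASCII header field.
  private static String field(byte[] block, int offset, int length) {
    String raw = new String(block, offset, length, StandardCharsets.US_ASCII);
    int nul = raw.indexOf('\0');
    return nul >= 0 ? raw.substring(0, nul) : raw;
  }

  // Reads and discards exactly n bytes, failing if the stream ends early.
  private static void skipFully(InputStream in, long n) throws IOException {
    byte[] buf = new byte[8192];
    while (n > 0) {
      int r = in.read(buf, 0, (int) Math.min(buf.length, n));
      if (r < 0) {
        throw new EOFException("archive truncated inside an entry");
      }
      n -= r;
    }
  }
}
```

For the archive in the log above, a call such as `TarGzLister.listEntries(Paths.get("/data/ozoneadmin/ozoneenv/ozone-temp/container-66315.tar.gz"))` would show which entries actually arrived. An empty or short list would point at the transfer or packing step rather than at the source container.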

      This task is to keep the downloaded tar file for investigation, at the cost of extra storage space.
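A minimal sketch of what keeping the file could look like, assuming the archive is moved aside only when the import fails and is deleted as usual on success. This is not the actual DownloadAndImportReplicator code: `FailedArchiveKeeper`, the `Importer` interface, and the `debugDir` parameter are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not part of Apache Ozone.
final class FailedArchiveKeeper {

  /** Stand-in for the real import step (hypothetical). */
  interface Importer {
    void importArchive(Path archive) throws IOException;
  }

  /**
   * Runs the import. On success the archive is deleted; on failure it is
   * moved into debugDir so it can be inspected later, and the original
   * exception is rethrown.
   */
  static void importKeepingFailures(Path archive, Path debugDir, Importer importer)
      throws IOException {
    try {
      importer.importArchive(archive);
    } catch (IOException e) {
      try {
        Files.createDirectories(debugDir);
        Files.move(archive, debugDir.resolve(archive.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      } catch (IOException moveError) {
        e.addSuppressed(moveError); // keep the import failure as the primary error
      }
      throw e;
    }
    Files.deleteIfExists(archive); // normal path: no need to keep the archive
  }
}
```

Keeping only failed archives bounds the storage cost mentioned above to the failure cases. A real change would probably also need a size or age limit on the debug directory, which this sketch leaves out.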

            People

              Sammi Chen
