Apache Ozone / HDDS-3314

Fix ContainerOperationClient#readContainer to use the gRPC client to read from the datanode

Details

    Description

      Configuration set before running the command:

      "ozone.scm.stale.node.interval": "2m",
      "ozone.scm.dead.node.interval": "4m",
      "hdds.scm.replication.thread.interval": "12s",
      "ozone.scm.container.size": "1GB"
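
      The interval settings above control how quickly SCM reacts once the datanodes are shut down in step 2. The sketch below models that under stated assumptions: SCM marks a datanode STALE after it has sent no heartbeat for ozone.scm.stale.node.interval and DEAD after ozone.scm.dead.node.interval, and heartbeat-processing delays are ignored. parse_duration and node_state are hypothetical helpers for illustration, not Ozone APIs.

```python
from datetime import timedelta

# Unit suffixes accepted by this hypothetical parser (e.g. "12s", "2m").
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> timedelta:
    """Convert a duration string such as '2m' or '12s' into a timedelta."""
    value = value.strip()
    # Try longer suffixes first so "500ms" is not matched by the "s" suffix.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return timedelta(seconds=float(value[: -len(suffix)]) * _UNITS[suffix])
    raise ValueError(f"unsupported duration: {value!r}")


def node_state(silent_for: timedelta, stale: timedelta, dead: timedelta) -> str:
    """Classify a datanode by how long it has been silent (simplified model)."""
    if silent_for >= dead:
        return "DEAD"
    if silent_for >= stale:
        return "STALE"
    return "HEALTHY"


config = {
    "ozone.scm.stale.node.interval": "2m",
    "ozone.scm.dead.node.interval": "4m",
    "hdds.scm.replication.thread.interval": "12s",
}
stale = parse_duration(config["ozone.scm.stale.node.interval"])
dead = parse_duration(config["ozone.scm.dead.node.interval"])
interval = parse_duration(config["hdds.scm.replication.thread.interval"])

for minutes in (1, 3, 5):
    state = node_state(timedelta(minutes=minutes), stale, dead)
    print(f"silent for {minutes}m -> {state}")

# Number of replication monitor passes that run before a silent node counts as DEAD.
print(f"replication monitor passes before DEAD: {dead // interval}")
```

      Under this model, a datanode stopped in step 2 stays in the cluster's view as merely STALE for two minutes and is only treated as DEAD after four, so a query issued shortly after the shutdown still sees all three replicas listed.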

      Steps taken:

      1) Write a key smaller than one block.

      2) Shut down two of the datanodes holding the container's replicas.

      3) Query the container info.

      The container info command failed.
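
      The fix proposed in the title rests on the difference between routing ReadContainer through the Ratis pipeline and sending it straight to one datanode over gRPC. The sketch below is a simplified model, not Ozone code: it assumes a Raft-routed request needs an elected leader, and a Raft group can only elect one while a strict majority of its members is reachable, so after step 2 (one of three alive) it fails; a direct read can instead fall back through the replicas until one answers. Replica, read_via_quorum and read_via_any_replica are hypothetical names, and the container contents are placeholder data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    """A datanode holding a copy of the container (hypothetical model)."""
    node_id: str
    alive: bool
    container_info: Dict[str, str]


class ReadContainerError(Exception):
    pass


def read_via_quorum(replicas: List[Replica]) -> Dict[str, str]:
    """Model of a Raft-routed read: requires a strict majority of live members."""
    alive = [r for r in replicas if r.alive]
    if len(alive) * 2 <= len(replicas):
        raise ReadContainerError(
            f"no leader: only {len(alive)} of {len(replicas)} members reachable")
    # Simplification: treat the first live member as the elected leader.
    return alive[0].container_info


def read_via_any_replica(replicas: List[Replica]) -> Dict[str, str]:
    """Model of a direct (gRPC-style) read: return the first live replica's answer."""
    errors = []
    for replica in replicas:
        if replica.alive:
            return replica.container_info
        errors.append(f"{replica.node_id}: unreachable")
    raise ReadContainerError("; ".join(errors))


# Step 2 of the reproduction: two of three replica datanodes are shut down.
info = {"containerID": "1", "state": "OPEN"}  # placeholder data
replicas = [
    Replica("dn-1", alive=False, container_info=info),
    Replica("dn-2", alive=False, container_info=info),
    Replica("dn-3", alive=True, container_info=info),
]

try:
    read_via_quorum(replicas)
except ReadContainerError as err:
    print("quorum read failed:", err)

print("direct read:", read_via_any_replica(replicas))
```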

      ozone debug chunkinfo <KeyUri> 
      Failed to execute command cmdType: ReadContainer

      SCM log during that time range:

      2020-04-01 10:09:29,665 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for hrt_qa@ROOT.HWX.SITE (auth:KERBEROS)
      2020-04-01 10:09:29,706 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for hrt_qa@ROOT.HWX.SITE (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol
      2020-04-01 10:09:55,283 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/quasar-fjgcwr-2.quasar-fjgcwr.root.hwx.site@ROOT.HWX.SITE (auth:KERBEROS)
      2020-04-01 10:09:55,287 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/quasar-fjgcwr-2.quasar-fjgcwr.root.hwx.site@ROOT.HWX.SITE (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
      2020-04-01 10:09:55,474 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Starting Replication Monitor Thread.
      2020-04-01 10:09:55,486 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 10 milliseconds for processing 33 containers.
      2020-04-01 10:10:07,488 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 2 milliseconds for processing 33 containers.
      2020-04-01 10:10:17,996 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for dn/quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site@ROOT.HWX.SITE (auth:KERBEROS)
      2020-04-01 10:10:18,001 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for dn/quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site@ROOT.HWX.SITE (auth:KERBEROS) for protocol=interface org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
      2020-04-01 10:10:19,491 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 3 milliseconds for processing 33 containers.
      2020-04-01 10:10:31,494 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 2 milliseconds for processing 33 containers.
      2020-04-01 10:10:43,495 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 1 milliseconds for processing 33 containers.
      2020-04-01 10:10:47,987 ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline action CLOSE for Pipeline[ Id: 763bd379-a703-4dc0-85c5-bf385cdc0b18, Nodes: 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1{ip: 172.27.120.0, host: quasar-fjgcwr-1.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack, certSerialId: null}b60097cf-7dff-44dc-800f-3500dda636f6{ip: 172.27.123.128, host: quasar-fjgcwr-4.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack, certSerialId: null}ea2322d9-8ede-4f48-a72d-693e809d2b95{ip: 172.27.12.195, host: quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN, leaderId:b60097cf-7dff-44dc-800f-3500dda636f6, CreationTimestamp2020-04-01T10:04:47.723688Z] from datanode ea2322d9-8ede-4f48-a72d-693e809d2b95{ip: 172.27.12.195, host: quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack, certSerialId: 12651664310640168}. Reason : ea2322d9-8ede-4f48-a72d-693e809d2b95 is in candidate state for 61616ms
      2020-04-01 10:10:47,988 INFO org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager: Destroying pipeline:Pipeline[ Id: 763bd379-a703-4dc0-85c5-bf385cdc0b18, Nodes: 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1{ip: 172.27.120.0, host: quasar-fjgcwr-1.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack, certSerialId: null}b60097cf-7dff-44dc-800f-3500dda636f6{ip: 172.27.123.128, host: quasar-fjgcwr-4.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack, certSerialId: null}ea2322d9-8ede-4f48-a72d-693e809d2b95{ip: 172.27.12.195, host: quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN, leaderId:b60097cf-7dff-44dc-800f-3500dda636f6, CreationTimestamp2020-04-01T10:04:47.723688Z]
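
      The ReplicationManager lines in this log run roughly 12 seconds apart, which matches hdds.scm.replication.thread.interval. A small sketch that checks this from the log text, assuming the timestamp layout shown here (date, time, comma, milliseconds) and considering only the "Replication Monitor Thread took ..." lines; replication_run_gaps is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import List

# Matches the ReplicationManager "took N milliseconds" lines from the SCM log.
RUN_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.hdds\.scm\.container\.ReplicationManager: "
    r"Replication Monitor Thread took (\d+) milliseconds")


def replication_run_gaps(log_text: str) -> List[float]:
    """Return the seconds elapsed between consecutive Replication Monitor runs."""
    times = []
    for line in log_text.splitlines():
        match = RUN_LINE.match(line.strip())
        if match:
            # Swap the comma before the milliseconds for a dot so %f can parse it.
            stamp = match.group(1).replace(",", ".")
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Lines copied from the SCM log above.
excerpt = """
2020-04-01 10:09:55,486 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 10 milliseconds for processing 33 containers.
2020-04-01 10:10:07,488 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 2 milliseconds for processing 33 containers.
2020-04-01 10:10:19,491 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 3 milliseconds for processing 33 containers.
2020-04-01 10:10:31,494 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 2 milliseconds for processing 33 containers.
2020-04-01 10:10:43,495 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor Thread took 1 milliseconds for processing 33 containers.
"""

print(replication_run_gaps(excerpt))
```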

    People
      Sadanand Shenoy (sadanand_shenoy)
      Nilotpal Nandi (nilotpalnandi)

    Time Tracking
      Original estimate: Not specified
      Remaining: 0h
      Logged: 10m