Apache Ozone / HDDS-9823

Pipeline failure should trigger heartbeat immediately


    Description

      XceiverServerRatis#handlePipelineFailure is called in the following ContainerStateMachine (CSM) failure scenarios:

      • XceiverServerRatis#handleNodeSlowness
        • Triggered by StateMachine#notifyFollowerSlowness
        • Timeout configured by hdds.ratis.rpc.slowness.timeout (Ozone default: 300s)
          • Note: the Ratis default is 60s
      • XceiverServerRatis#handleNoLeader
        • Triggered by StateMachine#notifyExtendedNoLeader
        • Timeout configured by hdds.ratis.notification.no-leader.timeout (Ozone default: 300s)
          • Note: the Ratis default is 60s
      • XceiverServerRatis#handleInstallSnapshotFromLeader
        • Triggered by StateMachine#notifyInstallSnapshotFromLeader
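      The three paths above all end in the same pipeline-failure handler. The sketch below models that fan-in under stated assumptions: PipelineFailureRoutingSketch, FailureSource, the on* methods and queuedCloseActions are hypothetical names, not Ozone's actual classes or the Ratis StateMachine signatures. The default timeouts are the Ozone values quoted in this issue; no timeout is listed here for the install-snapshot path.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: three Ratis notifications funnel into one
 * pipeline-failure handler. Names are illustrative, not Ozone's API.
 */
public class PipelineFailureRoutingSketch {

    /** Failure sources, with the Ozone default timeout quoted in the issue (if any). */
    enum FailureSource {
        FOLLOWER_SLOWNESS(Duration.ofSeconds(300)),   // hdds.ratis.rpc.slowness.timeout
        EXTENDED_NO_LEADER(Duration.ofSeconds(300)),  // hdds.ratis.notification.no-leader.timeout
        INSTALL_SNAPSHOT_FROM_LEADER(null);           // no timeout listed in the issue

        private final Duration defaultTimeout;

        FailureSource(Duration defaultTimeout) {
            this.defaultTimeout = defaultTimeout;
        }

        Optional<Duration> defaultTimeout() {
            return Optional.ofNullable(defaultTimeout);
        }
    }

    /** Close-pipeline actions waiting to be reported to SCM (hypothetical). */
    final List<String> queuedCloseActions = new ArrayList<>();

    // Each callback only delegates, mirroring handleNodeSlowness, handleNoLeader
    // and handleInstallSnapshotFromLeader all ending in handlePipelineFailure.
    void onFollowerSlowness(String pipelineId) {
        handlePipelineFailure(pipelineId, FailureSource.FOLLOWER_SLOWNESS);
    }

    void onExtendedNoLeader(String pipelineId) {
        handlePipelineFailure(pipelineId, FailureSource.EXTENDED_NO_LEADER);
    }

    void onInstallSnapshotFromLeader(String pipelineId) {
        handlePipelineFailure(pipelineId, FailureSource.INSTALL_SNAPSHOT_FROM_LEADER);
    }

    /** Queues a close action; in the current behaviour it waits for the next heartbeat. */
    private void handlePipelineFailure(String pipelineId, FailureSource source) {
        queuedCloseActions.add("CLOSE " + pipelineId + " (" + source + ")");
    }
}
```

      Whatever the source, the outcome is the same: one close-pipeline action queued for SCM.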

      Currently, XceiverServerRatis#handlePipelineFailure does not trigger a heartbeat to SCM immediately. Instead, the close pipeline action is only sent with the next scheduled heartbeat (default interval 60s). During this window, SCM may still allocate blocks on these "failed" pipelines, which can affect clients writing to those blocks.

      To minimize the impact on clients and on the datanodes in the failed pipeline, I suggest that the datanode trigger a heartbeat immediately for every close pipeline action raised due to a pipeline failure, so that SCM receives the action without waiting for the next scheduled heartbeat.
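      A minimal sketch of the proposed behaviour, assuming a heartbeat loop that sleeps for a fixed interval but can be woken early through a condition variable. HeartbeatTriggerSketch, handlePipelineFailure(String, boolean) and deliveredToScm are hypothetical names, not Ozone's datanode state machine API; the queue stands in for SCM receiving heartbeats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: a heartbeat loop that normally waits a fixed interval,
 * but is woken early when a pipeline failure queues a close-pipeline action.
 */
public class HeartbeatTriggerSketch implements AutoCloseable {
    private final long intervalMillis;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeUp = lock.newCondition();
    private final List<String> pendingCloseActions = new ArrayList<>(); // guarded by lock
    private boolean triggered = false;                                  // guarded by lock
    private boolean running = true;                                     // guarded by lock
    /** Stands in for SCM: each element is the batch of close actions in one heartbeat. */
    final BlockingQueue<List<String>> deliveredToScm = new LinkedBlockingQueue<>();
    private final Thread loop;

    public HeartbeatTriggerSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
        this.loop = new Thread(this::runLoop, "heartbeat-sketch");
        this.loop.setDaemon(true);
        this.loop.start();
    }

    /** Queues a close action; if requested, wakes the heartbeat loop right away. */
    public void handlePipelineFailure(String pipelineId, boolean triggerImmediately) {
        lock.lock();
        try {
            pendingCloseActions.add(pipelineId);
            if (triggerImmediately) {
                triggered = true; // flag avoids a lost wake-up if the loop is not waiting yet
                wakeUp.signal();
            }
        } finally {
            lock.unlock();
        }
    }

    private void runLoop() {
        while (true) {
            List<String> batch;
            lock.lock();
            try {
                long remaining = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
                // Wait until the interval elapses, a trigger arrives, or we are closed.
                while (running && !triggered && remaining > 0) {
                    try {
                        remaining = wakeUp.awaitNanos(remaining);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (!running) {
                    return;
                }
                triggered = false;
                batch = new ArrayList<>(pendingCloseActions);
                pendingCloseActions.clear();
            } finally {
                lock.unlock();
            }
            deliveredToScm.add(batch); // "send" the heartbeat outside the lock
        }
    }

    @Override
    public void close() {
        lock.lock();
        try {
            running = false;
            wakeUp.signal();
        } finally {
            lock.unlock();
        }
    }
}
```

      With a 60 000 ms interval, calling handlePipelineFailure("pipeline-1", true) delivers the action almost at once, while passing false leaves it queued until the interval elapses, which is the current behaviour described above. Using a flag plus a condition (rather than only interrupting the thread) keeps a trigger that arrives before the loop starts waiting from being lost.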


            People

              Assignee: Ivan Andika (ivanandika)
              Reporter: Ivan Andika (ivanandika)
              Votes: 0
              Watchers: 1