Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-5794

The misleading "No available thread in pool for past * sencond" log message in DN StateContext

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.2.0
    • None

    Description

      There are a lot of following messages in DN log file, which makes me think there is something wrong with the heartbeat channel between the DN and SCM.

      2021-09-28 10:59:36,871 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,345 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,379 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,584 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,598 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,631 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,650 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,652 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,668 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,741 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,918 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,925 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,930 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,954 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,955 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:37,995 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:38,005 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:38,097 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:38,114 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.
      2021-09-28 10:59:38,174 [Datanode State Machine Thread - 0] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.

      After add some extra debug info, I found the thing is not as the LOG shows, the heartbeat channel is very healthy and this "No available thread in pool for past 30 seconds" is quite misleading. The corrent part of the LOG is "No available thread in pool", the wrong part is "for the past 30 seconds".

      2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719157366
      2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 24
      2021-09-29 11:28:01,257 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Sending heartbeat message
      2021-09-29 11:28:01,261 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received heartbeat response
      2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished executing the task 24
      2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: Sleep 29996 ms
      2021-09-29 11:28:05,622 [ChunkWriter-125-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,622 [ChunkWriter-71-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161731
      2021-09-29 11:28:05,622 [ChunkWriter-165-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,622 [ChunkWriter-53-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 25
      2021-09-29 11:28:05,622 [ChunkWriter-31-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,622 [ChunkWriter-41-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,622 [ChunkWriter-25-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] ERROR org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: execute task.wait is interrupted.
      2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161732
      2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 26
      2021-09-29 11:28:05,623 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Sending heartbeat message
      2021-09-29 11:28:05,641 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received heartbeat response
      2021-09-29 11:28:05,642 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Sending heartbeat message
      2021-09-29 11:28:05,652 [EndpointStateMachine task thread for /9.186.21.245:9861 - 0 ] INFO org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received heartbeat response
      2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished executing the task 26
      2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: Sleep 29971 ms
      2021-09-29 11:28:05,709 [ChunkWriter-73-0] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: triggerHeartbeat: interrupt stateMachineThread for ICR report
      2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161818
      2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to execute the task 27
      2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] ERROR org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: execute task.wait is interrupted.
      2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: heartbeat interval : 30000, next HB 6719161818
      2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] WARN org.apache.hadoop.ozone.container.common.statemachine.StateContext: No available thread in pool for past 30 seconds.

      Attachments

        Issue Links

          Activity

            People

              Sammi Sammi Chen
              Sammi Sammi Chen
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: