Apache Ozone / HDDS-5916

DNs in a pipeline Raft group get stuck in infinite leader election in a Kubernetes environment




      During the chaos test, 10% of the DNs were killed to mimic a possible accident.


      Environment: Kubernetes + PV + Prometheus



      The key write rate dropped sharply and flattened into a horizontal line.

      Even after the chaos injection ended, the rate did not recover.

      In addition, the scm_pipeline_metrics_num_pipeline_allocated metric showed that new pipelines were being created periodically, without end.

      Datanodes held leader elections continuously and could not stabilize even after a leader was elected.



      The DN pods were killed once, and each revived pod might not come back with the same IP address as before. Because the DN UUID is preserved on the PV, the SCM still receives heartbeats from these pods and treats them as healthy. However, the SCM currently does not update the IP in DatanodeDetails, so it hands out stale address information for the datanodes in newly allocated pipelines.

      For example, take a Raft group with three peers A, B, and C. A was revived with a new IP address: A can reach B and C, but B and C cannot reach A. A therefore never receives heartbeats from leader B (or C) and oscillates between follower and candidate. Each time A becomes a candidate, it increments its term and sends a RequestVote, which does reach B and C. The leader, on receiving a RequestVote with a higher term, steps down, triggering a re-election. This is why the Raft group backing the pipeline never stabilizes.
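The disruption cycle above can be sketched as a toy model (hypothetical, not Ratis code): A's inbound traffic is dropped, so every election timeout on A bumps the cluster term and forces B and C to re-elect, indefinitely.

```java
// Toy model of the election churn: peer A never receives heartbeats or vote
// replies (its stored IP is stale), so each round it times out, increments
// the term, and its RequestVote knocks down the current leader among B/C.
public class ElectionChurn {
    /** Simulates the cycle; returns the number of forced re-elections. */
    static int simulate(int rounds) {
        int term = 1;       // B is leader at term 1
        int elections = 0;
        for (int r = 0; r < rounds; r++) {
            term++;         // A times out, becomes candidate with a higher term
            // The leader sees A's higher-term RequestVote and steps down;
            // A never wins because vote replies to A are lost.
            term++;         // B or C wins a fresh election at a newer term
            elections++;
            // The new leader's heartbeats to A are lost again, so it repeats.
        }
        return elections;
    }

    public static void main(String[] args) {
        System.out.println("elections after 5 rounds: " + simulate(5));
    }
}
```

The model shows that the term grows without bound and the group never holds a leader for longer than one of A's election timeouts.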

      Meanwhile, a short-lived leader could still report the pipeline as ready to the SCM, so the SCM believed the pipeline was ready for chunk writes, causing writes to block.


      Possible solution:

      have the SCM check the reported DatanodeDetails on each heartbeat and update the IP if it has changed.
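A minimal sketch of that idea, using hypothetical classes rather than the real Ozone API: the SCM keys datanodes by their stable UUID and refreshes the stored IP whenever a heartbeat reports a different one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the proposed fix: on every heartbeat, the SCM
// compares the reported IP with the DatanodeDetails it has stored (keyed by
// the stable DN UUID from the PV) and updates it when a revived pod comes
// back with a new address.
public class ScmNodeRegistry {
    static final class DatanodeDetails {
        final UUID uuid;
        String ipAddress;
        DatanodeDetails(UUID uuid, String ip) { this.uuid = uuid; this.ipAddress = ip; }
    }

    private final Map<UUID, DatanodeDetails> nodes = new HashMap<>();

    /** Registers or refreshes a datanode; returns true if the IP changed. */
    boolean onHeartbeat(UUID uuid, String reportedIp) {
        DatanodeDetails dn =
            nodes.computeIfAbsent(uuid, id -> new DatanodeDetails(id, reportedIp));
        if (!dn.ipAddress.equals(reportedIp)) {
            dn.ipAddress = reportedIp;  // pod was rescheduled with a new IP
            return true;
        }
        return false;
    }

    String ipOf(UUID uuid) { return nodes.get(uuid).ipAddress; }

    public static void main(String[] args) {
        ScmNodeRegistry scm = new ScmNodeRegistry();
        UUID dn = UUID.randomUUID();
        scm.onHeartbeat(dn, "10.0.0.5");                   // initial registration
        boolean changed = scm.onHeartbeat(dn, "10.0.0.9"); // pod revived, new IP
        System.out.println("ip updated: " + changed + ", now " + scm.ipOf(dn));
    }
}
```

With the stored address kept current, the SCM would hand out the correct IP when allocating new pipelines, so peers B and C could reach the revived A.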






              sokui Shawn
              Nibiruxu Xu Shao Hong