  1. Apache Ozone
  2. HDDS-6379

Not deducting the STANDALONE pipelines when counting pipelines on each datanode to check the pipeline limit.




      I found this bug while adding robot tests for the ozone debug CLI, and I was able to reproduce it locally. I had three datanodes and created a new pipeline with the ozone admin pipeline create command, which chose a datanode and created a STANDALONE/ONE pipeline on it. I then stopped a datanode and waited until it reached the DEAD state. After I started it again, no RATIS/THREE pipeline was created, even though there were three healthy datanodes and no existing RATIS/THREE pipeline.

      In the docker-config, the ozone.scm.datanode.pipeline.limit property is set to 1 (the default is 2) due to multi-raft support. When creating a pipeline, we build a list of healthy datanodes and filter it based on the pipeline limit. The current pipeline count on a datanode is calculated like this:

      int currentPipelineCount(DatanodeDetails datanodeDetails, int nodesRequired) {
          // Datanodes from pipeline in some states can also be considered available
          // for pipeline allocation. Thus the number of these pipeline shall be
          // deducted from total heaviness calculation.
          int pipelineNumDeductable = 0;
          Set<PipelineID> pipelines = nodeManager.getPipelines(datanodeDetails);
          for (PipelineID pid : pipelines) {
            Pipeline pipeline;
            try {
              pipeline = stateManager.getPipeline(pid);
            } catch (PipelineNotFoundException e) {
              LOG.debug("Pipeline not found in pipeline state manager during" +
                  " pipeline creation. PipelineID: {}", pid, e);
              continue;
            }
            if (pipeline != null &&
                // single node pipeline are not accounted for while determining
                // the pipeline limit for dn
                pipeline.getType() == HddsProtos.ReplicationType.RATIS &&
                (RatisReplicationConfig
                    .hasFactor(pipeline.getReplicationConfig(), ReplicationFactor.ONE)
                    ||
                    pipeline.getReplicationConfig().getRequiredNodes()
                        == nodesRequired &&
                        pipeline.getPipelineState()
                            == Pipeline.PipelineState.CLOSED)) {
              pipelineNumDeductable++;
            }
          }
          return pipelines.size() - pipelineNumDeductable;
      }

      Only pipelines with the RATIS replication type are deducted (because of this condition: pipeline.getType() == HddsProtos.ReplicationType.RATIS), so the STANDALONE/ONE pipeline is still counted. The datanode therefore reaches its pipeline limit, and no RATIS/THREE pipeline is created.

      We should deduct all the single node pipelines in this check.
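
      To illustrate the difference, here is a minimal, self-contained sketch. It is not the actual Ozone code or the final patch: SimplePipeline, Type, State, currentCount and proposedCount are hypothetical stand-ins that model only the fields this check looks at, and the "count < limit" comparison in main is an assumption about how the limit filter uses the result.

```java
import java.util.Arrays;
import java.util.List;

class PipelineCountSketch {
    // Hypothetical stand-ins for HddsProtos.ReplicationType and Pipeline.PipelineState.
    enum Type { RATIS, STANDALONE }
    enum State { OPEN, CLOSED }

    // Hypothetical, simplified pipeline: only the fields the check looks at.
    static final class SimplePipeline {
        final Type type;
        final int requiredNodes;
        final State state;

        SimplePipeline(Type type, int requiredNodes, State state) {
            this.type = type;
            this.requiredNodes = requiredNodes;
            this.state = state;
        }
    }

    // True for a CLOSED pipeline with the same node count as the one being created.
    static boolean closedWithSameSize(SimplePipeline p, int nodesRequired) {
        return p.requiredNodes == nodesRequired && p.state == State.CLOSED;
    }

    // Models the behaviour described in this issue: only RATIS pipelines are deducted.
    static int currentCount(List<SimplePipeline> pipelines, int nodesRequired) {
        int deductable = 0;
        for (SimplePipeline p : pipelines) {
            if (p.type == Type.RATIS
                    && (p.requiredNodes == 1 || closedWithSameSize(p, nodesRequired))) {
                deductable++;
            }
        }
        return pipelines.size() - deductable;
    }

    // Models the proposal: deduct every single-node pipeline, whatever its type.
    static int proposedCount(List<SimplePipeline> pipelines, int nodesRequired) {
        int deductable = 0;
        for (SimplePipeline p : pipelines) {
            boolean singleNode = p.requiredNodes == 1;
            boolean closedRatis = p.type == Type.RATIS && closedWithSameSize(p, nodesRequired);
            if (singleNode || closedRatis) {
                deductable++;
            }
        }
        return pipelines.size() - deductable;
    }

    public static void main(String[] args) {
        int limit = 1; // ozone.scm.datanode.pipeline.limit as set in the docker-config
        // The datanode from the reproduction: it holds one STANDALONE/ONE pipeline.
        List<SimplePipeline> onDatanode =
            Arrays.asList(new SimplePipeline(Type.STANDALONE, 1, State.OPEN));

        int current = currentCount(onDatanode, 3);
        int proposed = proposedCount(onDatanode, 3);
        // Assumption: a datanode is eligible when its count is below the limit.
        System.out.println("current count = " + current + ", below limit: " + (current < limit));
        System.out.println("proposed count = " + proposed + ", below limit: " + (proposed < limit));
    }
}
```

      With a limit of 1, the current logic reports a count of 1 for that datanode, so it is filtered out, while the proposed logic reports 0 and keeps the datanode available for a RATIS/THREE pipeline.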


              zitadombi Zita Dombi