  1. Apache Ozone
  2. HDDS-2646

Start acceptance tests only if at least one THREE pipeline is available

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • None
    • 0.5.0
    • None

    Description

      After HDDS-2034 (or possibly even before?), pipeline creation (more precisely, the state transition from ALLOCATED to OPEN) requires at least one pipeline report from each of the datanodes. This means that the cluster might not be usable even if it is out of safe mode AND there are at least three datanodes.

      It makes all the acceptance tests unstable.

      For example, in this run:

      scm_1         | 2019-11-28 11:22:54,401 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 548f146f-2166-440a-b9f1-83086591ae26
      scm_1         | 2019-11-28 11:22:54,402 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c
      scm_1         | 2019-11-28 11:22:54,404 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 47dbb8e4-bbde-4164-a798-e47e8c696fb5
      scm_1         | 2019-11-28 11:22:54,405 INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 8dc4aeb6-5ae2-46a0-948d-287c97dd81fb, Nodes: 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}47dbb8e4-bbde-4164-a798-e47e8c696fb5{ip: 172.24.0.2, host: ozoneperf_datanode_2.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED]
      scm_1         | 2019-11-28 11:22:56,975 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:22:58,018 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:23:01,871 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:23:02,817 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:23:02,847 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null} 

      As you can see, the pipeline is created, but the cluster is not usable because the pipeline has not yet been reported back by datanode_2:

      scm_1         | 2019-11-28 11:23:13,879 WARN block.BlockManagerImpl: Pipeline creation failed for type:RATIS factor:THREE. Retrying get pipelines call once.
      scm_1         | org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot create pipeline of factor 3 using 0 nodes.
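The behaviour visible in the logs above can be illustrated with a small model. This is only a sketch under assumptions: `PipelineModel` and its methods are hypothetical names made up for illustration and are not the actual SCM classes. It shows why repeated reports from datanode_3 and datanode_1 are not enough, and why the pipeline stays ALLOCATED until datanode_2 reports too.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; not the real SCM implementation.
class PipelineModel {
    enum State { ALLOCATED, OPEN }

    private final Set<String> members;
    private final Set<String> reported = new HashSet<>();

    PipelineModel(List<String> memberIds) {
        this.members = new HashSet<>(memberIds);
    }

    // Record a pipeline report. Reports from non-members are ignored,
    // and repeated reports from the same datanode count only once.
    void onReport(String datanodeId) {
        if (members.contains(datanodeId)) {
            reported.add(datanodeId);
        }
    }

    // The pipeline is OPEN only after every member has reported at least once.
    State state() {
        return reported.containsAll(members) ? State.OPEN : State.ALLOCATED;
    }

    public static void main(String[] args) {
        PipelineModel p = new PipelineModel(
            List.of("datanode_1", "datanode_2", "datanode_3"));

        // Same report order as in the log excerpt: datanode_2 never reports.
        for (String dn : List.of("datanode_3", "datanode_1",
                                 "datanode_3", "datanode_3", "datanode_1")) {
            p.onReport(dn);
        }
        System.out.println(p.state()); // prints ALLOCATED

        p.onReport("datanode_2");
        System.out.println(p.state()); // prints OPEN
    }
}
```

In this model a client request arriving before the last report finds no OPEN pipeline, which corresponds to the window in which the acceptance tests fail.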

      The quick fix is to configure all the compose clusters to wait until at least one THREE pipeline is available. This can be done by adjusting the number of required datanodes:

      // We only care about THREE replica pipeline
      int minHealthyPipelines = minDatanodes /
          HddsProtos.ReplicationFactor.THREE_VALUE; 
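As a worked example of the arithmetic in the snippet above: because the division is an integer division, any required datanode count below three yields zero required healthy pipelines, and a count of three yields one. The sketch below is standalone; the class name `MinPipelines` is hypothetical, and the constant is hard-coded to 3 on the assumption that it matches the numeric value of `HddsProtos.ReplicationFactor.THREE_VALUE`.

```java
// Hypothetical standalone illustration; not part of the Ozone code base.
class MinPipelines {
    // Assumption: same numeric value as HddsProtos.ReplicationFactor.THREE_VALUE.
    static final int THREE_VALUE = 3;

    // Integer division rounds down, so fewer than three datanodes
    // means no healthy THREE pipeline is required at all.
    static int minHealthyPipelines(int minDatanodes) {
        return minDatanodes / THREE_VALUE;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 5, 6}) {
            System.out.println(n + " required datanode(s) -> "
                + minHealthyPipelines(n) + " required pipeline(s)");
        }
    }
}
```

Under this reading, a compose cluster that requires only one datanode would not wait for any THREE pipeline, whereas requiring three datanodes makes it wait for one.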

       

          People

            Assignee: Marton Elek (elek)
            Reporter: Marton Elek (elek)
              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 10m