Uploaded image for project: 'Hadoop Distributed Data Store'
  1. Hadoop Distributed Data Store
  2. HDDS-2646

Start acceptance tests only if at least one THREE pipeline is available

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.2
    • Component/s: None

      Description

      After HDDS-2034 (or even before?) pipeline creation (or the status transition from ALLOCATE to OPEN) requires at least one pipeline report from all of the datanodes. Which means that the cluster might not be usable even if it's out from the safe mode AND there are at least three datanodes.

      It makes all the acceptance tests unstable.

      For example in this run.

      scm_1         | 2019-11-28 11:22:54,401 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 548f146f-2166-440a-b9f1-83086591ae26
      scm_1         | 2019-11-28 11:22:54,402 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c
      scm_1         | 2019-11-28 11:22:54,404 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 47dbb8e4-bbde-4164-a798-e47e8c696fb5
      scm_1         | 2019-11-28 11:22:54,405 INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 8dc4aeb6-5ae2-46a0-948d-287c97dd81fb, Nodes: 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}47dbb8e4-bbde-4164-a798-e47e8c696fb5{ip: 172.24.0.2, host: ozoneperf_datanode_2.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED]
      scm_1         | 2019-11-28 11:22:56,975 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:22:58,018 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:23:01,871 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:23:02,817 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
      scm_1         | 2019-11-28 11:23:02,847 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null} 

      As you can see the pipeline is created but the the cluster is not usable as it's not yet reporter back by datanode_2:

      scm_1         | 2019-11-28 11:23:13,879 WARN block.BlockManagerImpl: Pipeline creation failed for type:RATIS factor:THREE. Retrying get pipelines c
      all once.
      scm_1         | org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot create pipeline of factor 3 using 0 nodes.

       The quick fix is to configure all the compose clusters to wait until one pipeline is available. This can be done by adjusting the number of the required datanodes:

      // We only care about THREE replica pipeline
      int minHealthyPipelines = minDatanodes /
          HddsProtos.ReplicationFactor.THREE_VALUE; 

       

        Attachments

          Activity

            People

            • Assignee:
              elek Marton Elek
              Reporter:
              elek Marton Elek
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 10m
                10m