Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-2606

Acceptance tests are flaky due to async pipeline creation

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Problem
    • None
    • None
    • SCM

    Description

      Some of the acceptance tests are failing because the first Ratis.THREE pipeline is not created on time:

      For example in a HA proxy acceptance test the first block allocation is tarted before moving the Ratis.THREE pipeline to OPEN state.

      scm_1       | 2019-11-20 14:45:38,359 INFO pipeline.PipelineStateManager: Pipeline Pipeline[ Id: 4d27f405-d257-4b1b-a7b3-51bbd57356db, Nodes: 2c010ef7-8c0e-45e3-b230-326bf759773b{ip: 172.25.0.7, host: ozones3-haproxy_datanode_2.ozones3-haproxy_default, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:ONE, State:OPEN] moved to OPEN state
      scm_1       | 2019-11-20 14:45:46,944 WARN block.BlockManagerImpl: Pipeline creation failed for type:RATIS factor:THREE. Retrying get pipelines call once.
      scm_1       | org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot create pipeline of factor 3 using 0 nodes.
      scm_1       | 	at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:153)
      scm_1       | 	at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:58)
      scm_1       | 	at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.createPipeline(SCMPipelineManager.java:155)
      scm_1       | 	at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:198)
      scm_1       | 	at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:187)
      scm_1       | 	at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:159)
      scm_1       | 	at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:117)
      scm_1       | 	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
      scm_1       | 	at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:98)
      scm_1       | 	at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
      scm_1       | 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
      scm_1       | 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
      scm_1       | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
      scm_1       | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
      scm_1       | 	at java.base/java.security.AccessController.doPrivileged(Native Method)
      scm_1       | 	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
      scm_1       | 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
      scm_1       | 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
      scm_1       | 2019-11-20 14:45:46,944 INFO block.BlockManagerImpl: Could not find available pipeline of type:RATIS and factor:THREE even after retrying
      scm_1       | 2019-11-20 14:45:46,945 ERROR block.BlockManagerImpl: Unable to allocate a block for the size: 268435456, type: RATIS, factor: THREE
      scm_1       | 2019-11-20 14:45:49,274 INFO pipeline.PipelineStateManager: Pipeline Pipeline[ Id: e208588c-9a16-4519-89dc-7bad5bae4331, Nodes: 1da10d32-12be-4328-bc0a-f3d8de21b056{ip: 172.25.0.8, host: ozones3-haproxy_datanode_3.ozones3-haproxy_default, networkLocation: /default-rack, certSerialId: null}d9edb776-ee02-48a1-8c73-40f33bc0d128{ip: 172.25.0.5, host: ozones3-haproxy_datanode_1.ozones3-haproxy_default, networkLocation: /default-rack, certSerialId: null}2c010ef7-8c0e-45e3-b230-326bf759773b{ip: 172.25.0.7, host: ozones3-haproxy_datanode_2.ozones3-haproxy_default, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] moved to OPEN state 

      Attachments

        1. robot-ozones3-haproxy-ozones3-haproxy-basic-scm.xml
          15 kB
          Marton Elek
        2. docker-ozones3-haproxy-ozones3-haproxy-basic-scm.log
          187 kB
          Marton Elek
        3. report.html
          245 kB
          Marton Elek
        4. log.html
          1.21 MB
          Marton Elek
        5. summary.html
          1.21 MB
          Marton Elek

        Activity

          People

            elek Marton Elek
            elek Marton Elek
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: