  Apache Ozone / HDDS-5626 Track and Address Flaky tests / HDDS-5971

[disabled] TestHDDSUpgrade fails to allocate pipeline after finalization


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      TestHDDSUpgrade frequently hits the Maven global test timeout (about 1 hr), causing the integration (filesystem-hdds) check to fail. The class's JUnit timeout is set to 11000000 ms (3 hrs+).
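A quick check of those numbers: 11000000 ms is about 3.06 hours, roughly three times the ~1 hour Maven limit, so a stuck test case reaches the outer Maven limit first, before the class-level JUnit timeout could fire and report which test hung. The sketch below only does that arithmetic; `TimeoutBudget` is a hypothetical helper, and treating the Maven limit as exactly 1 hour is an assumption.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper comparing the class-level JUnit timeout with the outer Maven limit. */
class TimeoutBudget {
    // Class-level JUnit timeout quoted in the description.
    static final long JUNIT_TIMEOUT_MS = 11_000_000L;
    // "About 1 hr" Maven global test timeout; exactly 1 hour is an assumption.
    static final long MAVEN_TIMEOUT_MS = TimeUnit.HOURS.toMillis(1);

    /** Converts milliseconds to fractional hours. */
    static double toHours(long millis) {
        return millis / (double) TimeUnit.HOURS.toMillis(1);
    }

    /** True when the outer limit is shorter than the inner one, so it is reached first. */
    static boolean outerFiresFirst(long innerMs, long outerMs) {
        return outerMs < innerMs;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps "." as the decimal separator regardless of the JVM's default locale.
        System.out.printf(Locale.ROOT, "JUnit timeout: %.2f h%n", toHours(JUNIT_TIMEOUT_MS)); // 3.06 h
        System.out.printf(Locale.ROOT, "Maven timeout: %.2f h%n", toHours(MAVEN_TIMEOUT_MS)); // 1.00 h
        System.out.println("Maven limit reached first: "
                + outerFiresFirst(JUNIT_TIMEOUT_MS, MAVEN_TIMEOUT_MS)); // true
    }
}
```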

      I've seen this at least 3 times recently in CI runs for new PRs. We need to investigate why some test cases can stay stuck for so long. When I ran the test class locally in IntelliJ, it finished in 5 min 55 sec (see the attached screenshot-1.jpg).
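One way a test can stay stuck this long is a poll loop with no upper bound: the log below shows the test thread logging "Waiting for at least one open pipeline after SCM finalization." every 5 seconds while pipeline creation keeps failing. A minimal sketch of a poll loop with an explicit deadline follows, which would turn such a hang into a quick failure; `BoundedWait.waitFor` and its parameters are hypothetical, and this is not the actual SCMUpgradeFinalizer logic.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical bounded wait; not the actual SCMUpgradeFinalizer logic. */
class BoundedWait {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition was met, false on timeout or interrupt.
     */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // give up instead of waiting forever
            }
            try {
                // Sleep one interval, capped at the time left before the deadline.
                Thread.sleep(Math.min(intervalMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stuck case, as in the log: no pipeline ever opens, so this times out.
        System.out.println("Pipeline opened: " + waitFor(() -> false, 50, 200)); // prints "Pipeline opened: false"

        // A simulated pipeline that "opens" on the third check.
        AtomicInteger checks = new AtomicInteger();
        boolean openedLater = waitFor(() -> checks.incrementAndGet() >= 3, 10, 1_000);
        System.out.println("Pipeline opened later: " + openedLater);
    }
}
```

The deadline is tracked with `System.nanoTime()` rather than wall-clock time so that clock adjustments on the CI host cannot stretch or shorten the wait.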

      CC: avijayan, erose

      Failing run:

      https://github.com/apache/ozone/runs/4160837361

      Found this in the above run's artifact bundle; "No healthy node found to allocate container" may be the cause:

      org.apache.hadoop.hdds.upgrade.TestHDDSUpgrade-output.txt
      2021-11-10 04:46:13,552 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:18,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:18,569 [RatisPipelineUtilsThread - 0] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:filterNodesWithSpace(171)) - Unable to find enough nodes that meet the space requirement of 1073741824 bytes for metadata and 5368709120 bytes for data in healthy node set. Required 3. Found 2.
      2021-11-10 04:46:23,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:24,033 [ReplicationMonitor] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:chooseDatanodes(140)) - No healthy node found to allocate container.
      2021-11-10 04:46:24,033 [ReplicationMonitor] WARN  container.ReplicationManager (ReplicationManager.java:handleUnderReplicatedContainer(1199)) - Exception while replicating container 2.
      org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
      	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:141)
      	at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodes(SCMContainerPlacementRandom.java:78)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:1163)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:519)
      	at java.util.ArrayList.forEach(ArrayList.java:1259)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.processAll(ReplicationManager.java:369)
      	at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:383)
      	at java.lang.Thread.run(Thread.java:748)
      2021-11-10 04:46:24,033 [ReplicationMonitor] INFO  container.ReplicationManager (ReplicationManager.java:processAll(371)) - Replication Monitor Thread took 3 milliseconds for processing 2 containers.
      2021-11-10 04:46:28,554 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
      2021-11-10 04:46:33,556 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
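The ERROR line from RatisPipelineUtilsThread suggests why no pipeline opens: a pipeline needs 3 nodes, but only 2 healthy nodes meet both space requirements (1073741824 bytes = 1 GiB for metadata, 5368709120 bytes = 5 GiB for data). A minimal sketch of that kind of filter, assuming a node qualifies only if its free space is at least each requirement; `SpaceFilterSketch`, `Node`, and the free-space numbers are hypothetical, and the real SCMCommonPlacementPolicy.filterNodesWithSpace code may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model of the free-space check reported by the placement policy log line. */
class SpaceFilterSketch {
    // Requirements taken from the log message.
    static final long METADATA_REQUIRED = 1L << 30; // 1073741824 bytes = 1 GiB
    static final long DATA_REQUIRED = 5L << 30;     // 5368709120 bytes = 5 GiB

    /** Hypothetical free-space view of one healthy datanode (not an Ozone class). */
    static final class Node {
        final String name;
        final long freeMetadataBytes;
        final long freeDataBytes;

        Node(String name, long freeMetadataBytes, long freeDataBytes) {
            this.name = name;
            this.freeMetadataBytes = freeMetadataBytes;
            this.freeDataBytes = freeDataBytes;
        }
    }

    /** Keeps only nodes that meet both the metadata and the data space requirement. */
    static List<Node> nodesWithEnoughSpace(List<Node> healthy, long metadataReq, long dataReq) {
        return healthy.stream()
                .filter(n -> n.freeMetadataBytes >= metadataReq && n.freeDataBytes >= dataReq)
                .collect(Collectors.toList());
    }

    /** A pipeline can be allocated only if at least `required` nodes qualify (3 in the log). */
    static boolean canAllocatePipeline(List<Node> healthy, int required) {
        return nodesWithEnoughSpace(healthy, METADATA_REQUIRED, DATA_REQUIRED).size() >= required;
    }

    public static void main(String[] args) {
        // Hypothetical free-space numbers: node "dn3" is short on data space.
        List<Node> healthy = Arrays.asList(
                new Node("dn1", 2L << 30, 10L << 30),
                new Node("dn2", 2L << 30, 10L << 30),
                new Node("dn3", 2L << 30, 1L << 30));
        int found = nodesWithEnoughSpace(healthy, METADATA_REQUIRED, DATA_REQUIRED).size();
        System.out.println("Required 3. Found " + found + ".");                        // Required 3. Found 2.
        System.out.println("Pipeline possible: " + canAllocatePipeline(healthy, 3));   // Pipeline possible: false
    }
}
```

The log does not say which of the two requirements the third node misses, so it may be worth checking the volume sizes the mini cluster gives its datanodes in CI versus locally.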
      
      

      Attachments

        1. 4390545403-it-filesystem-hdds.zip (332 kB, Siyao Meng)
        2. screenshot-1.jpg (67 kB, Siyao Meng)


          People

            Assignee: Unassigned
            Reporter: Siyao Meng (smeng)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: